summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/controllers/cpuacct.txt32
-rw-r--r--Documentation/ftrace.txt62
-rw-r--r--Documentation/kernel-parameters.txt8
-rw-r--r--Documentation/lockstat.txt51
-rw-r--r--Documentation/markers.txt14
-rw-r--r--Documentation/scheduler/sched-arch.txt4
-rw-r--r--Documentation/tracepoints.txt92
7 files changed, 174 insertions, 89 deletions
diff --git a/Documentation/controllers/cpuacct.txt b/Documentation/controllers/cpuacct.txt
new file mode 100644
index 000000000000..bb775fbe43d7
--- /dev/null
+++ b/Documentation/controllers/cpuacct.txt
@@ -0,0 +1,32 @@
+CPU Accounting Controller
+-------------------------
+
+The CPU accounting controller is used to group tasks using cgroups and
+account the CPU usage of these groups of tasks.
+
+The CPU accounting controller supports multi-hierarchy groups. An accounting
+group accumulates the CPU usage of all of its child groups and the tasks
+directly present in its group.
+
+Accounting groups can be created by first mounting the cgroup filesystem.
+
+# mkdir /cgroups
+# mount -t cgroup -ocpuacct none /cgroups
+
+With the above step, the initial or the parent accounting group
+becomes visible at /cgroups. At bootup, this group includes all the
+tasks in the system. /cgroups/tasks lists the tasks in this cgroup.
+/cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by
+this group which is essentially the CPU time obtained by all the tasks
+in the system.
+
+New accounting groups can be created under the parent group /cgroups.
+
+# cd /cgroups
+# mkdir g1
+# echo $$ > g1
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it. CPU time consumed by this bash and its children
+can be obtained from g1/cpuacct.usage and the same is accumulated in
+/cgroups/cpuacct.usage also.
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 9cc4d685dde5..35a78bc6651d 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -82,7 +82,7 @@ of ftrace. Here is a list of some of the key files:
tracer is not adding more data, they will display
the same information every time they are read.
- iter_ctrl: This file lets the user control the amount of data
+ trace_options: This file lets the user control the amount of data
that is displayed in one of the above output
files.
@@ -94,10 +94,10 @@ of ftrace. Here is a list of some of the key files:
only be recorded if the latency is greater than
the value in this file. (in microseconds)
- trace_entries: This sets or displays the number of bytes each CPU
+ buffer_size_kb: This sets or displays the number of kilobytes each CPU
buffer can hold. The tracer buffers are the same size
for each CPU. The displayed number is the size of the
- CPU buffer and not total size of all buffers. The
+ CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory
that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes
@@ -316,23 +316,23 @@ The above is mostly meaningful for kernel developers.
The rest is the same as the 'trace' file.
-iter_ctrl
----------
+trace_options
+-------------
-The iter_ctrl file is used to control what gets printed in the trace
+The trace_options file is used to control what gets printed in the trace
output. To see what is available, simply cat the file:
- cat /debug/tracing/iter_ctrl
+ cat /debug/tracing/trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
- noblock nostacktrace nosched-tree
+ noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj
To disable one of the options, echo in the option prepended with "no".
- echo noprint-parent > /debug/tracing/iter_ctrl
+ echo noprint-parent > /debug/tracing/trace_options
To enable an option, leave off the "no".
- echo sym-offset > /debug/tracing/iter_ctrl
+ echo sym-offset > /debug/tracing/trace_options
Here are the available options:
@@ -378,6 +378,20 @@ Here are the available options:
When a trace is recorded, so is the stack of functions.
This allows for back traces of trace sites.
+ userstacktrace - This option changes the trace.
+ It records a stacktrace of the current userspace thread.
+
+ sym-userobj - when user stacktrace are enabled, look up which object the
+ address belongs to, and print a relative address
+ This is especially useful when ASLR is on, otherwise you don't
+ get a chance to resolve the address to object/file/line after the app is no
+ longer running
+
+ The lookup is performed when you read trace,trace_pipe,latency_trace. Example:
+
+ a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
+x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
+
sched-tree - TBD (any users??)
@@ -1299,41 +1313,29 @@ trace entries
-------------
Having too much or not enough data can be troublesome in diagnosing
-an issue in the kernel. The file trace_entries is used to modify
+an issue in the kernel. The file buffer_size_kb is used to modify
the size of the internal trace buffers. The number listed
is the number of entries that can be recorded per CPU. To know
the full size, multiply the number of possible CPUS with the
number of entries.
- # cat /debug/tracing/trace_entries
-65620
+ # cat /debug/tracing/buffer_size_kb
+1408 (units kilobytes)
Note, to modify this, you must have tracing completely disabled. To do that,
echo "nop" into the current_tracer. If the current_tracer is not set
to "nop", an EINVAL error will be returned.
# echo nop > /debug/tracing/current_tracer
- # echo 100000 > /debug/tracing/trace_entries
- # cat /debug/tracing/trace_entries
-100045
-
-
-Notice that we echoed in 100,000 but the size is 100,045. The entries
-are held in individual pages. It allocates the number of pages it takes
-to fulfill the request. If more entries may fit on the last page
-then they will be added.
-
- # echo 1 > /debug/tracing/trace_entries
- # cat /debug/tracing/trace_entries
-85
-
-This shows us that 85 entries can fit in a single page.
+ # echo 10000 > /debug/tracing/buffer_size_kb
+ # cat /debug/tracing/buffer_size_kb
+10000 (units kilobytes)
The number of pages which will be allocated is limited to a percentage
of available memory. Allocating too much will produce an error.
- # echo 1000000000000 > /debug/tracing/trace_entries
+ # echo 1000000000000 > /debug/tracing/buffer_size_kb
-bash: echo: write error: Cannot allocate memory
- # cat /debug/tracing/trace_entries
+ # cat /debug/tracing/buffer_size_kb
85
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e0f346d201ed..2919a2e91938 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -750,6 +750,14 @@ and is between 256 and 4096 characters. It is defined in the file
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
+ ftrace=[tracer]
+ [ftrace] will set and start the specified tracer
+ as early as possible in order to facilitate early
+ boot debugging.
+
+ ftrace_dump_on_oops
+ [ftrace] will dump the trace buffers on oops.
+
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
support via parallel port (up to 5 devices per port)
diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt
index 4ba4664ce5c3..9cb9138f7a79 100644
--- a/Documentation/lockstat.txt
+++ b/Documentation/lockstat.txt
@@ -71,35 +71,50 @@ Look at the current lock statistics:
# less /proc/lock_stat
-01 lock_stat version 0.2
+01 lock_stat version 0.3
02 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
03 class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total
04 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
05
-06 &inode->i_data.tree_lock-W: 15 21657 0.18 1093295.30 11547131054.85 58 10415 0.16 87.51 6387.60
-07 &inode->i_data.tree_lock-R: 0 0 0.00 0.00 0.00 23302 231198 0.25 8.45 98023.38
-08 --------------------------
-09 &inode->i_data.tree_lock 0 [<ffffffff8027c08f>] add_to_page_cache+0x5f/0x190
-10
-11 ...............................................................................................................................................................................................
-12
-13 dcache_lock: 1037 1161 0.38 45.32 774.51 6611 243371 0.15 306.48 77387.24
-14 -----------
-15 dcache_lock 180 [<ffffffff802c0d7e>] sys_getcwd+0x11e/0x230
-16 dcache_lock 165 [<ffffffff802c002a>] d_alloc+0x15a/0x210
-17 dcache_lock 33 [<ffffffff8035818d>] _atomic_dec_and_lock+0x4d/0x70
-18 dcache_lock 1 [<ffffffff802beef8>] shrink_dcache_parent+0x18/0x130
+06 &mm->mmap_sem-W: 233 538 18446744073708 22924.27 607243.51 1342 45806 1.71 8595.89 1180582.34
+07 &mm->mmap_sem-R: 205 587 18446744073708 28403.36 731975.00 1940 412426 0.58 187825.45 6307502.88
+08 ---------------
+09 &mm->mmap_sem 487 [<ffffffff8053491f>] do_page_fault+0x466/0x928
+10 &mm->mmap_sem 179 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
+11 &mm->mmap_sem 279 [<ffffffff80210a57>] sys_mmap+0x75/0xce
+12 &mm->mmap_sem 76 [<ffffffff802a490b>] sys_munmap+0x32/0x59
+13 ---------------
+14 &mm->mmap_sem 270 [<ffffffff80210a57>] sys_mmap+0x75/0xce
+15 &mm->mmap_sem 431 [<ffffffff8053491f>] do_page_fault+0x466/0x928
+16 &mm->mmap_sem 138 [<ffffffff802a490b>] sys_munmap+0x32/0x59
+17 &mm->mmap_sem 145 [<ffffffff802a6200>] sys_mprotect+0xcd/0x21d
+18
+19 ...............................................................................................................................................................................................
+20
+21 dcache_lock: 621 623 0.52 118.26 1053.02 6745 91930 0.29 316.29 118423.41
+22 -----------
+23 dcache_lock 179 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
+24 dcache_lock 113 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
+25 dcache_lock 99 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
+26 dcache_lock 104 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
+27 -----------
+28 dcache_lock 192 [<ffffffff80378274>] _atomic_dec_and_lock+0x34/0x54
+29 dcache_lock 98 [<ffffffff802ca0dc>] d_rehash+0x1b/0x44
+30 dcache_lock 72 [<ffffffff802cc17b>] d_alloc+0x19a/0x1eb
+31 dcache_lock 112 [<ffffffff802cbca0>] d_instantiate+0x36/0x8a
This excerpt shows the first two lock class statistics. Line 01 shows the
output version - each time the format changes this will be updated. Line 02-04
-show the header with column descriptions. Lines 05-10 and 13-18 show the actual
+show the header with column descriptions. Lines 05-18 and 20-31 show the actual
statistics. These statistics come in two parts; the actual stats separated by a
-short separator (line 08, 14) from the contention points.
+short separator (line 08, 13) from the contention points.
-The first lock (05-10) is a read/write lock, and shows two lines above the
+The first lock (05-18) is a read/write lock, and shows two lines above the
short separator. The contention points don't match the column descriptors,
-they have two: contentions and [<IP>] symbol.
+they have two: contentions and [<IP>] symbol. The second set of contention
+points are the points we're contending with.
+The integer part of the time values is in us.
View the top contending locks:
diff --git a/Documentation/markers.txt b/Documentation/markers.txt
index 089f6138fcd9..6d275e4ef385 100644
--- a/Documentation/markers.txt
+++ b/Documentation/markers.txt
@@ -70,6 +70,20 @@ a printk warning which identifies the inconsistency:
"Format mismatch for probe probe_name (format), marker (format)"
+Another way to use markers is to simply define the marker without generating any
+function call to actually call into the marker. This is useful in combination
+with tracepoint probes in a scheme like this :
+
+void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk);
+
+DEFINE_MARKER_TP(marker_eventname, tracepoint_name, probe_tracepoint_name,
+ "arg1 %u pid %d");
+
+notrace void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk)
+{
+ struct marker *marker = &GET_MARKER(kernel_irq_entry);
+ /* write data to trace buffers ... */
+}
* Probe / marker example
diff --git a/Documentation/scheduler/sched-arch.txt b/Documentation/scheduler/sched-arch.txt
index 941615a9769b..d43dbcbd163b 100644
--- a/Documentation/scheduler/sched-arch.txt
+++ b/Documentation/scheduler/sched-arch.txt
@@ -8,7 +8,7 @@ Context switch
By default, the switch_to arch function is called with the runqueue
locked. This is usually not a problem unless switch_to may need to
take the runqueue lock. This is usually due to a wake up operation in
-the context switch. See include/asm-ia64/system.h for an example.
+the context switch. See arch/ia64/include/asm/system.h for an example.
To request the scheduler call switch_to with the runqueue unlocked,
you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file
@@ -23,7 +23,7 @@ disabled. Interrupts may be enabled over the call if it is likely to
introduce a significant interrupt latency by adding the line
`#define __ARCH_WANT_INTERRUPTS_ON_CTXSW` in the same place as for
unlocked context switches. This define also implies
-`__ARCH_WANT_UNLOCKED_CTXSW`. See include/asm-arm/system.h for an
+`__ARCH_WANT_UNLOCKED_CTXSW`. See arch/arm/include/asm/system.h for an
example.
diff --git a/Documentation/tracepoints.txt b/Documentation/tracepoints.txt
index 5d354e167494..2d42241a25c3 100644
--- a/Documentation/tracepoints.txt
+++ b/Documentation/tracepoints.txt
@@ -3,28 +3,30 @@
Mathieu Desnoyers
-This document introduces Linux Kernel Tracepoints and their use. It provides
-examples of how to insert tracepoints in the kernel and connect probe functions
-to them and provides some examples of probe functions.
+This document introduces Linux Kernel Tracepoints and their use. It
+provides examples of how to insert tracepoints in the kernel and
+connect probe functions to them and provides some examples of probe
+functions.
* Purpose of tracepoints
-A tracepoint placed in code provides a hook to call a function (probe) that you
-can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
-"off" (no probe is attached). When a tracepoint is "off" it has no effect,
-except for adding a tiny time penalty (checking a condition for a branch) and
-space penalty (adding a few bytes for the function call at the end of the
-instrumented function and adds a data structure in a separate section). When a
-tracepoint is "on", the function you provide is called each time the tracepoint
-is executed, in the execution context of the caller. When the function provided
-ends its execution, it returns to the caller (continuing from the tracepoint
-site).
+A tracepoint placed in code provides a hook to call a function (probe)
+that you can provide at runtime. A tracepoint can be "on" (a probe is
+connected to it) or "off" (no probe is attached). When a tracepoint is
+"off" it has no effect, except for adding a tiny time penalty
+(checking a condition for a branch) and space penalty (adding a few
+bytes for the function call at the end of the instrumented function
+and adds a data structure in a separate section). When a tracepoint
+is "on", the function you provide is called each time the tracepoint
+is executed, in the execution context of the caller. When the function
+provided ends its execution, it returns to the caller (continuing from
+the tracepoint site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
-which prototypes are described in a tracepoint declaration placed in a header
-file.
+which prototypes are described in a tracepoint declaration placed in a
+header file.
They can be used for tracing and performance accounting.
@@ -42,7 +44,7 @@ In include/trace/subsys.h :
#include <linux/tracepoint.h>
-DEFINE_TRACE(subsys_eventname,
+DECLARE_TRACE(subsys_eventname,
TPPTOTO(int firstarg, struct task_struct *p),
TPARGS(firstarg, p));
@@ -50,6 +52,8 @@ In subsys/file.c (where the tracing statement must be added) :
#include <trace/subsys.h>
+DEFINE_TRACE(subsys_eventname);
+
void somefct(void)
{
...
@@ -61,31 +65,41 @@ Where :
- subsys_eventname is an identifier unique to your event
- subsys is the name of your subsystem.
- eventname is the name of the event to trace.
-- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the function
- called by this tracepoint.
-- TPARGS(firstarg, p) are the parameters names, same as found in the prototype.
-Connecting a function (probe) to a tracepoint is done by providing a probe
-(function to call) for the specific tracepoint through
-register_trace_subsys_eventname(). Removing a probe is done through
-unregister_trace_subsys_eventname(); it will remove the probe sure there is no
-caller left using the probe when it returns. Probe removal is preempt-safe
-because preemption is disabled around the probe call. See the "Probe example"
-section below for a sample probe module.
-
-The tracepoint mechanism supports inserting multiple instances of the same
-tracepoint, but a single definition must be made of a given tracepoint name over
-all the kernel to make sure no type conflict will occur. Name mangling of the
-tracepoints is done using the prototypes to make sure typing is correct.
-Verification of probe type correctness is done at the registration site by the
-compiler. Tracepoints can be put in inline functions, inlined static functions,
-and unrolled loops as well as regular functions.
-
-The naming scheme "subsys_event" is suggested here as a convention intended
-to limit collisions. Tracepoint names are global to the kernel: they are
-considered as being the same whether they are in the core kernel image or in
-modules.
+- TPPTOTO(int firstarg, struct task_struct *p) is the prototype of the
+ function called by this tracepoint.
+- TPARGS(firstarg, p) are the parameters names, same as found in the
+ prototype.
+
+Connecting a function (probe) to a tracepoint is done by providing a
+probe (function to call) for the specific tracepoint through
+register_trace_subsys_eventname(). Removing a probe is done through
+unregister_trace_subsys_eventname(); it will remove the probe.
+
+tracepoint_synchronize_unregister() must be called before the end of
+the module exit function to make sure there is no caller left using
+the probe. This, and the fact that preemption is disabled around the
+probe call, make sure that probe removal and module unload are safe.
+See the "Probe example" section below for a sample probe module.
+
+The tracepoint mechanism supports inserting multiple instances of the
+same tracepoint, but a single definition must be made of a given
+tracepoint name over all the kernel to make sure no type conflict will
+occur. Name mangling of the tracepoints is done using the prototypes
+to make sure typing is correct. Verification of probe type correctness
+is done at the registration site by the compiler. Tracepoints can be
+put in inline functions, inlined static functions, and unrolled loops
+as well as regular functions.
+
+The naming scheme "subsys_event" is suggested here as a convention
+intended to limit collisions. Tracepoint names are global to the
+kernel: they are considered as being the same whether they are in the
+core kernel image or in modules.
+
+If the tracepoint has to be used in kernel modules, an
+EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
+used to export the defined tracepoints.
* Probe / tracepoint example