Weak functions started causing havoc as they showed up in the
"available_filter_functions" and this confused people as to why some
functions marked as "notrace" were listed, but when enabled they did
nothing. This was because weak functions can still have fentry calls, and
these addresses get added to the "available_filter_functions" file.
kallsyms is what converts those addresses to names, and since the weak
functions are not listed in kallsyms, it would just pick the function
before that.
To solve this, there was a trick to detect weak functions listed, and
these records would be marked as DISABLED so that they do not get enabled
and are mostly ignored. As the processing of the list of all functions to
figure out what is weak or not can take a long time, this process is put
off into a kernel thread and run in parallel with the rest of start up.
Now the issue happens when function tracing is enabled via the kernel
command line. As it starts very early in boot up, it can be enabled before
the records that are weak are marked to be disabled. This causes an issue
in the accounting, as the weak records are enabled by the command line
function tracing, but after boot up, they are not disabled.
The ftrace records have several accounting flags and a ref count. The
DISABLED flag is just one. If the record is enabled before it is marked
DISABLED it will get an ENABLED flag and also have its ref counter
incremented. After it is marked for DISABLED, neither the ENABLED flag nor
the ref counter is cleared. There are sanity checks on the records that are
performed after an ftrace function is registered or unregistered, and this
detected that there were records marked as ENABLED with a ref counter that
should not have been.
Note, the module loading code uses the DISABLED flag as well to keep its
functions from being modified while it's being loaded and some of these
flags may get set in this process. So changing the verification code to
ignore DISABLED records is a no go, as it still needs to verify that the
module records are working too.
Also, the weak functions still are calling a trampoline. Even though they
should never be called, it is dangerous to leave these weak functions
calling a trampoline that is freed, so they should still be set back to
nops.
There are two places that need to not skip records that have the ENABLED
and the DISABLED flags set. That is where the ftrace_ops is processed and
sets the records ref counts, and then later when the function itself is to
be updated, and the ENABLED flag gets removed. Add a helper function
"skip_record()" that returns true if the record has the DISABLED flag set
but not the ENABLED flag.
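The check described above can be sketched in user space as follows. This is only an illustration: the flag bit values and the struct layout are assumptions, not the kernel's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration; the kernel's real values differ. */
#define REC_FL_ENABLED   (1UL << 0)
#define REC_FL_DISABLED  (1UL << 1)

struct rec {
	unsigned long flags;
};

/*
 * Skip a record only when it is DISABLED and was never ENABLED. A record
 * that is both DISABLED and ENABLED still has to be processed so that its
 * ref count and ENABLED flag can be cleaned up.
 */
static bool skip_record(const struct rec *rec)
{
	assert(rec != NULL);
	return (rec->flags & REC_FL_DISABLED) &&
	       !(rec->flags & REC_FL_ENABLED);
}
```

A weak function that was enabled early from the command line has both flags set, so it is not skipped and can be set back to a nop.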
Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In order to enable namespaces or any sort of isolation within
user_events the register lock and pages need to be broken up into
groups. Each event and file now has a group pointer which stores the
actual pages to map, lookup data and synchronization objects.
This only enables a single group that maps to init_user_ns, as IMA
namespace has done. This enables user_events to start the work of
supporting namespaces by walking the namespaces up to the init_user_ns.
Future patches will address other user namespaces and will align to the
approaches the IMA namespace uses.
Link: https://lore.kernel.org/linux-kernel/20220915193221.1728029-15-stefanb@linux.ibm.com/#t
Link: https://lkml.kernel.org/r/20221001001016.2832-2-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Masami has been maintaining kprobes for a while now and that code has
been an integral part of tracing. He has also been an excellent reviewer
of all the tracing code and contributor as well.
The tracing subsystem needs another active maintainer to keep it running
smoothly, and I do not know anyone more qualified for the job than Masami.
Ingo has also told me that he has not been active in the tracing code for
some time and said he could be removed from the TRACING portion of the
MAINTAINERS file.
Link: https://lkml.kernel.org/r/20220930124131.7b6432dd@gandalf.local.home
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Reported by Clang [-Wunused-but-set-variable]
'commit c193707dde77 ("tracing: Remove code which merges duplicates")'
This commit removed the code which merges duplicates in detect_dups(),
but forgot to delete the variable 'dups' which was used to merge
duplicates in the loop.
Now only 'total_dups' is needed, so remove 'dups' to clean up the code.
Link: https://lkml.kernel.org/r/20220930103236.253985-1-chenzhongjin@huawei.com
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since I'm actively involved in a number of arch bits that intersect
ftrace (e.g. the actual arch implementation on arm64, stacktracing,
entry management, and general instrumentation safety), add myself as a
reviewer of the core ftrace code so that I have the chance to catch any
potential problems early.
I spoke with Steven about this at LPC, and it seemed to make sense to
add me as a reviewer.
Link: https://lkml.kernel.org/r/20220928114621.248038-1-mark.rutland@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The ring buffer is broken up into sub buffers (currently of page size).
Each sub buffer has a pointer to its "tail" (the last event written to the
sub buffer). When a new event is requested, the tail is locally
incremented to cover the size of the new event. This is done in a way that
there is no need for locking.
If the tail goes past the end of the sub buffer, the process of moving to
the next sub buffer takes place. After setting the current sub buffer to
the next one, the previous one that had the tail go past the end of the
sub buffer needs to be reset back to the original tail location (before
the new event was requested) and the rest of the sub buffer needs to be
"padded".
The race happens when a reader takes control of the sub buffer. As readers
do a "swap" of sub buffers from the ring buffer to get exclusive access to
the sub buffer, it replaces the "head" sub buffer with an empty sub buffer
that goes back into the writable portion of the ring buffer. This swap can
happen as soon as the writer moves to the next sub buffer and before it
updates the last sub buffer with padding.
Because the sub buffer can be released to the reader while the writer is
still updating the padding, it is possible for the reader to see the event
that goes past the end of the sub buffer. This can cause obvious issues.
To fix this, add a few memory barriers so that the reader definitely sees
the updates to the sub buffer, and also waits until the writer has put
the "tail" of the sub buffer back to the last event that was written
on it.
To be paranoid, it will only spin for 1 second, otherwise it will
warn and shut down the ring buffer code. 1 second should be enough as
the writer does have preemption disabled. If the writer doesn't move
within 1 second (with preemption disabled) something is horribly
wrong. No interrupt should last 1 second!
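The bounded wait on the reader side can be sketched roughly as below. This is a user-space approximation using C11 atomics, not the ring buffer code itself: the struct, the SUBBUF_SIZE value and the wait_for_tail() name are all assumptions made for illustration, and time() only has one second granularity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define SUBBUF_SIZE 4096u   /* hypothetical sub buffer size */

struct subbuf {
	_Atomic unsigned int tail;  /* write offset; may briefly exceed SUBBUF_SIZE */
};

/*
 * Reader side: spin until the writer has put the tail back inside the sub
 * buffer. Returns false if that does not happen within timeout_sec, in
 * which case the caller should warn and stop using the buffer.
 */
static bool wait_for_tail(struct subbuf *sb, double timeout_sec)
{
	assert(sb != NULL);
	time_t start = time(NULL);

	/* The acquire load pairs with the writer's release store of the tail. */
	while (atomic_load_explicit(&sb->tail, memory_order_acquire) > SUBBUF_SIZE) {
		if (difftime(time(NULL), start) >= timeout_sec)
			return false;
	}
	return true;
}
```

The timeout turns a potential endless spin into a detectable failure, which is the same trade-off the 1 second limit above makes.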
Link: https://lore.kernel.org/all/20220830120854.7545-1-jiazi.li@transsion.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216369
Link: https://lkml.kernel.org/r/20220929104909.0650a36c@gandalf.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: c7b0930857e22 ("ring-buffer: prevent adding write in discarded area")
Reported-by: Jiazi.Li <jiazi.li@transsion.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Update the documentation to reflect the new ABI requirements and how to
use the byte index with the mask properly to check event status.
Link: https://lkml.kernel.org/r/20220728233309.1896-7-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
User processes may require many events and when they do the cache
performance of a byte index status check is less ideal than a bit index.
The previous event limit per-page was 4096, the new limit is 32,768.
This change adds a bitwise index to the user_reg struct. Programs check
that the bit at status_bit is set within the status page(s).
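As a rough sketch of such a check from user space, assuming a 4096 byte status page and 8-bit bytes (which is where 4096 * 8 = 32,768 comes from). The event_enabled() helper and the macro names are made up for this example and are not part of the ABI.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define STATUS_PAGE_BYTES 4096u                           /* assumed page size */
#define MAX_STATUS_BITS   (STATUS_PAGE_BYTES * CHAR_BIT)  /* 32,768 with 8-bit bytes */

/* Test one event's status bit: the byte index is bit / 8, the mask is bit % 8. */
static bool event_enabled(const unsigned char *status_page, unsigned int status_bit)
{
	assert(status_page != NULL && status_bit < MAX_STATUS_BITS);
	return (status_page[status_bit / CHAR_BIT] &
		(1u << (status_bit % CHAR_BIT))) != 0;
}
```

Packing eight events into each byte means a process with many events touches fewer cache lines when checking them.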
Link: https://lkml.kernel.org/r/20220728233309.1896-6-beaub@linux.microsoft.com
Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
User processes could open up enough event references to cause rollovers.
These could cause use after free scenarios, which we do not want.
Switching to refcount APIs prevents this, but will leak memory once
saturated.
Once saturated, user processes can still use the events. This prevents
a bad user process from stopping existing telemetry from being emitted.
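The saturation behavior can be illustrated with a small, non-atomic sketch. This is not the kernel's refcount_t: the REF_SATURATED sentinel, the struct and the helper names are assumptions chosen for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define REF_SATURATED UINT_MAX  /* hypothetical sentinel value */

struct ref {
	unsigned int count;
};

/* Take a reference; once the counter reaches the sentinel it stays there. */
static void ref_get(struct ref *r)
{
	assert(r != NULL);
	if (r->count != REF_SATURATED)
		r->count++;
}

/* Returns true when the last reference was dropped and the object may be freed. */
static bool ref_put(struct ref *r)
{
	assert(r != NULL && r->count > 0);
	if (r->count == REF_SATURATED)
		return false;  /* saturated: leak rather than risk a use after free */
	return --r->count == 0;
}
```

A saturated counter never reaches zero, so the object is never freed, but it also never wraps around to a value that would free it while still in use.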
Link: https://lkml.kernel.org/r/20220728233309.1896-5-beaub@linux.microsoft.com
Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
User processes can provide bad strings that may cause issues or leak
kernel details back out. Don't trust the content of these strings
when formatting strings for matching.
This also moves to a consistent dynamic length string creation model.
Link: https://lkml.kernel.org/r/20220728233309.1896-4-beaub@linux.microsoft.com
Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
import_single_range expects the direction/rw to be where it came from,
not the protection/limit. Since the import is in a write path use WRITE.
Link: https://lkml.kernel.org/r/20220728233309.1896-3-beaub@linux.microsoft.com
Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Trivial fix to ensure strstr checks use NULL instead of 0.
Link: https://lkml.kernel.org/r/20220728233309.1896-2-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There is a spelling mistake in the trace text. Fix it.
Link: https://lkml.kernel.org/r/20220928215828.66325-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When tracing is disabled, there's no reason that waiters should stay
waiting. Wake them up, otherwise tasks get stuck when they should be
flushing the buffers.
Cc: stable@vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If a process is waiting on the ring buffer for data, there currently isn't
a clean way to force it to wake up. Add an ioctl call that will force any
tasks that are waiting on the trace_pipe_raw file to wake up.
Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the file that represents the ring buffer is closed, there may be
waiters waiting on more input from the ring buffer. Call
ring_buffer_wake_waiters() to wake up any waiters when the file is
closed.
Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
On closing of a file that represents a ring buffer or flushing the file,
there may be waiters on the ring buffer that need to be woken up and exit
the ring_buffer_wait() function.
Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer
and allow them to exit the wait loop.
Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The code that wakes up waiters only checks the "wakeup_full" variable and not the
"full_waiters_pending". The full_waiters_pending is set when a waiter is
added to the wait queue. The wakeup_full is only set when an event is
triggered, and it clears the full_waiters_pending to avoid multiple calls
to irq_work_queue().
The irq_work callback really needs to check both wakeup_full as well as
full_waiters_pending such that this code can be used to wake up waiters
when a file is closed that represents the ring buffer and the waiters need
to be woken up.
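A simplified model of the callback logic described above, with the wait queue replaced by a counter. The struct and function names are placeholders for this sketch, not the ring buffer's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiters {
	bool wakeup_full;           /* set when an event asks for a wake up */
	bool full_waiters_pending;  /* set when a waiter is added to the queue */
	int  full_wakeups;          /* stands in for waking the queue in this sketch */
};

static void wake_up_callback(struct waiters *w)
{
	assert(w != NULL);
	/* Wake if an event requested it OR if waiters are still queued. */
	if (w->wakeup_full || w->full_waiters_pending) {
		w->wakeup_full = false;
		w->full_waiters_pending = false;
		w->full_wakeups++;
	}
}
```

Checking only wakeup_full would leave a queued waiter asleep when the callback is run for a reason other than a new event, such as a file being closed.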
Link: https://lkml.kernel.org/r/20220927231824.209460321@goodmis.org
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The logic to know when the shortest waiters on the ring buffer should be
woken up or not uses a less than instead of a greater than compare,
which causes the shortest_full to actually be the longest.
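The intended comparison can be sketched like this, where 0 means "not set yet". The note_waiter() name and the standalone int are assumptions for illustration; in the ring buffer the value lives in a per CPU structure.

```c
#include <assert.h>
#include <stddef.h>

/* Remember the smallest "percent full" that any waiter asked for. */
static void note_waiter(int *shortest_full, int full)
{
	assert(shortest_full != NULL && full > 0);
	/* Replace the stored value only if it is unset or larger than the new one. */
	if (!*shortest_full || *shortest_full > full)
		*shortest_full = full;
}
```

With the comparison reversed, a waiter asking for 10% after one asking for 50% would be ignored and the stored value would track the largest request instead.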
Link: https://lkml.kernel.org/r/20220927231823.718039222@goodmis.org
Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If a page is partially read, and then the splice system call is run
against the ring buffer, it will always fail to read, no matter how much
is in the ring buffer. That's because the code path for a partial read of
the page will fail if the "full" flag is set.
The splice system call wants full pages, so if the read of the ring buffer
is not yet full, it should return zero, and the splice will block. But if
a previous read was done, where the beginning has been consumed, it should
still be given to the splice caller if the rest of the page has been
written to.
This caused the splice command to never consume data in this scenario, and
let the ring buffer just fill up and lose events.
Link: https://lkml.kernel.org/r/20220927144317.46be6b80@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 8789a9e7df6bf ("ring-buffer: read page interface")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Naveen reported recursive locking of direct_mutex with sample
ftrace-direct-modify.ko:
[ 74.762406] WARNING: possible recursive locking detected
[ 74.762887] 6.0.0-rc6+ #33 Not tainted
[ 74.763216] --------------------------------------------
[ 74.763672] event-sample-fn/1084 is trying to acquire lock:
[ 74.764152] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
register_ftrace_function+0x1f/0x180
[ 74.764922]
[ 74.764922] but task is already holding lock:
[ 74.765421] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
modify_ftrace_direct+0x34/0x1f0
[ 74.766142]
[ 74.766142] other info that might help us debug this:
[ 74.766701] Possible unsafe locking scenario:
[ 74.766701]
[ 74.767216] CPU0
[ 74.767437] ----
[ 74.767656] lock(direct_mutex);
[ 74.767952] lock(direct_mutex);
[ 74.768245]
[ 74.768245] *** DEADLOCK ***
[ 74.768245]
[ 74.768750] May be due to missing lock nesting notation
[ 74.768750]
[ 74.769332] 1 lock held by event-sample-fn/1084:
[ 74.769731] #0: ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \
modify_ftrace_direct+0x34/0x1f0
[ 74.770496]
[ 74.770496] stack backtrace:
[ 74.770884] CPU: 4 PID: 1084 Comm: event-sample-fn Not tainted ...
[ 74.771498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
[ 74.772474] Call Trace:
[ 74.772696] <TASK>
[ 74.772896] dump_stack_lvl+0x44/0x5b
[ 74.773223] __lock_acquire.cold.74+0xac/0x2b7
[ 74.773616] lock_acquire+0xd2/0x310
[ 74.773936] ? register_ftrace_function+0x1f/0x180
[ 74.774357] ? lock_is_held_type+0xd8/0x130
[ 74.774744] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.775213] __mutex_lock+0x99/0x1010
[ 74.775536] ? register_ftrace_function+0x1f/0x180
[ 74.775954] ? slab_free_freelist_hook.isra.43+0x115/0x160
[ 74.776424] ? ftrace_set_hash+0x195/0x220
[ 74.776779] ? register_ftrace_function+0x1f/0x180
[ 74.777194] ? kfree+0x3e1/0x440
[ 74.777482] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.777941] ? __schedule+0xb40/0xb40
[ 74.778258] ? register_ftrace_function+0x1f/0x180
[ 74.778672] ? my_tramp1+0xf/0xf [ftrace_direct_modify]
[ 74.779128] register_ftrace_function+0x1f/0x180
[ 74.779527] ? ftrace_set_filter_ip+0x33/0x70
[ 74.779910] ? __schedule+0xb40/0xb40
[ 74.780231] ? my_tramp1+0xf/0xf [ftrace_direct_modify]
[ 74.780678] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.781147] ftrace_modify_direct_caller+0x5b/0x90
[ 74.781563] ? 0xffffffffa0201000
[ 74.781859] ? my_tramp1+0xf/0xf [ftrace_direct_modify]
[ 74.782309] modify_ftrace_direct+0x1b2/0x1f0
[ 74.782690] ? __schedule+0xb40/0xb40
[ 74.783014] ? simple_thread+0x2a/0xb0 [ftrace_direct_modify]
[ 74.783508] ? __schedule+0xb40/0xb40
[ 74.783832] ? my_tramp2+0x11/0x11 [ftrace_direct_modify]
[ 74.784294] simple_thread+0x76/0xb0 [ftrace_direct_modify]
[ 74.784766] kthread+0xf5/0x120
[ 74.785052] ? kthread_complete_and_exit+0x20/0x20
[ 74.785464] ret_from_fork+0x22/0x30
[ 74.785781] </TASK>
Fix this by using register_ftrace_function_nolock in
ftrace_modify_direct_caller.
Link: https://lkml.kernel.org/r/20220927004146.1215303-1-song@kernel.org
Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function")
Reported-and-tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When executing the following commands as described in the documentation,
the log "#### all functions enabled ####" was not shown as expected:
1. Set a 'mod' filter:
$ echo 'write*:mod:ext3' > /sys/kernel/tracing/set_ftrace_filter
2. Invert above filter:
$ echo '!write*:mod:ext3' >> /sys/kernel/tracing/set_ftrace_filter
3. Read the file:
$ cat /sys/kernel/tracing/set_ftrace_filter
After some debugging, I found that the flag FTRACE_HASH_FL_MOD was not unset
after the inversion in step 2 above, so the result of ftrace_hash_empty()
is incorrect.
Link: https://lkml.kernel.org/r/20220926152008.2239274-1-zhengyejian1@huawei.com
Cc: <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 8c08f0d5c6fb ("ftrace: Have cached module filters be an active filter")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allocating the event directory fails when the event name is not set, for
example when using the command:
"echo "e:esys/ syscalls/sys_enter_openat file=\$filename:string"
>> dynamic_events"
It seems that the dir name "syscalls/sys_enter_openat" is not allowed
in debugfs. So just use "sys_enter_openat" as the event name.
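One way to derive such a name is to take everything after the last '/'. The helper below is a hypothetical user-space sketch of that idea and is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the part after the last '/', or the whole string if there is none. */
static const char *default_event_name(const char *target)
{
	assert(target != NULL);
	const char *slash = strrchr(target, '/');
	return slash ? slash + 1 : target;
}
```

For "syscalls/sys_enter_openat" this yields "sys_enter_openat", which contains no path separator and is therefore usable as a directory name.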
Link: https://lkml.kernel.org/r/1664028814-45923-1-git-send-email-chentao.kernel@linux.alibaba.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Linyu Yuan <quic_linyyuan@quicinc.com>
Cc: Tao Chen <chentao.kernel@linux.alibaba.com>
Cc: stable@vger.kernel.org
Fixes: 95c104c378dc ("tracing: Auto generate event name when creating a group of events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
An unused macro was reported by [-Wunused-macros].
This macro was used to access the sp in pt_regs because at that time
x86_32 could only get sp by kernel_stack_pointer(regs).
'3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")'
This commit unified the pt_regs, so sp can now be read from
pt_regs with regs->sp easily. Nothing uses this macro anymore.
Referencing pt_regs directly is clearer. Remove this macro to
clean up the code.
Link: https://lkml.kernel.org/r/20220924072629.104759-1-chenzhongjin@huawei.com
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The "trace" member of "struct task_struct" has not been used since
commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the
bitmap like events do"), and the functions that handle flags for
current->trace are useless, so remove them.
Link: https://lkml.kernel.org/r/20220923090012.505990-1-cuigaosheng1@huawei.com
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
It was found that some tracing functions in kernel/trace/trace.c acquire
an arch_spinlock_t with preemption and irqs enabled. An example is the
tracing_saved_cmdlines_size_read() function which intermittently causes
a "BUG: using smp_processor_id() in preemptible" warning when the LTP
read_all_proc test is run.
That can be problematic in case preemption happens after acquiring the
lock. Add the necessary preemption or interrupt disabling code in the
appropriate places before acquiring an arch_spinlock_t.
The convention here is to disable preemption for trace_cmdline_lock and
interrupts for max_lock.
Link: https://lkml.kernel.org/r/20220922145622.1744826-1-longman@redhat.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org
Fixes: a35873a0993b ("tracing: Add conditional snapshot")
Fixes: 939c7a4f04fc ("tracing: Introduce saved_cmdlines_size file")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add missing __init/__exit annotations to module init/exit funcs.
Link: https://lkml.kernel.org/r/20220922103208.162869-1-xiujianfeng@huawei.com
Fixes: 24bce201d798 ("tools/rv: Add dot2k")
Fixes: 8812d21219b9 ("rv/monitor: Add the wip monitor skeleton created by dot2k")
Fixes: ccc319dcb450 ("rv/monitor: Add the wwnr monitor")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There is a recursive lock on the cpu_hotplug_lock.
In kernel/trace/trace_osnoise.c:<start/stop>_per_cpu_kthreads:
- start_per_cpu_kthreads calls cpus_read_lock() and if
start_kthreads returns an error it will call stop_per_cpu_kthreads.
- stop_per_cpu_kthreads then calls cpus_read_lock() again causing
deadlock.
Fix this by calling cpus_read_unlock() before calling
stop_per_cpu_kthreads. This behavior can also be seen in commit
f46b16520a08 ("trace/hwlat: Implement the per-cpu mode").
This error was noticed during the LTP ftrace-stress-test:
WARNING: possible recursive locking detected
--------------------------------------------
sh/275006 is trying to acquire lock:
ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: stop_per_cpu_kthreads
but task is already holding lock:
ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: start_per_cpu_kthreads
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock);
lock(cpu_hotplug_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by sh/275006:
#0: ffff8881023f0470 (sb_writers#24){.+.+}-{0:0}, at: ksys_write
#1: ffffffffb084f430 (trace_types_lock){+.+.}-{3:3}, at: rb_simple_write
#2: ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: start_per_cpu_kthreads
Link: https://lkml.kernel.org/r/20220919144932.3064014-1-npache@redhat.com
Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Nico Pache <npache@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
For now, this selftest module only works on x86 because the kprobe
command hard-codes x86 register names.
Adapt it to the register names used on arm and riscv, so that
this module also works on those platforms.
Link: https://lkml.kernel.org/r/20220919125629.238242-3-zouyipeng@huawei.com
Cc: <linux-riscv@lists.infradead.org>
Cc: <mingo@redhat.com>
Cc: <paul.walmsley@sifive.com>
Cc: <palmer@dabbelt.com>
Cc: <aou@eecs.berkeley.edu>
Cc: <zanussi@kernel.org>
Cc: <liaochang1@huawei.com>
Cc: <chris.zjh@huawei.com>
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Correct the event parameter used to clear gen_kretprobe_test on module
exit. With the wrong parameter the event cannot be deleted.
Link: https://lkml.kernel.org/r/20220919125629.238242-2-zouyipeng@huawei.com
Cc: <linux-riscv@lists.infradead.org>
Cc: <mingo@redhat.com>
Cc: <paul.walmsley@sifive.com>
Cc: <palmer@dabbelt.com>
Cc: <aou@eecs.berkeley.edu>
Cc: <zanussi@kernel.org>
Cc: <liaochang1@huawei.com>
Cc: <chris.zjh@huawei.com>
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
All uses of arch_kprobe_override_function() have been removed by
commit 540adea3809f ("error-injection: Separate error-injection
from kprobe"), so remove the declaration, too.
Link: https://lkml.kernel.org/r/20220914110437.1436353-3-cuigaosheng1@huawei.com
Cc: <mingo@redhat.com>
Cc: <tglx@linutronix.de>
Cc: <bp@alien8.de>
Cc: <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: <hpa@zytor.com>
Cc: <mhiramat@kernel.org>
Cc: <peterz@infradead.org>
Cc: <ast@kernel.org>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
All uses of modifying_ftrace_code have been removed by
commit 768ae4406a5c ("x86/ftrace: Use text_poke()"),
so remove the declaration, too.
Link: https://lkml.kernel.org/r/20220914110437.1436353-2-cuigaosheng1@huawei.com
Cc: <mingo@redhat.com>
Cc: <tglx@linutronix.de>
Cc: <bp@alien8.de>
Cc: <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: <hpa@zytor.com>
Cc: <mhiramat@kernel.org>
Cc: <peterz@infradead.org>
Cc: <ast@kernel.org>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
tracepoint_module_coming()
The memory allocation of 'tp_mod' does not require mutex_lock()
protection, move it out.
Link: https://lkml.kernel.org/r/20220914061416.1630-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Due to retpolines, indirect calls are much more expensive than direct
calls. The filters have a select set of functions they use for the
predicates. Instead of using function pointers to call them, create a
filter_pred_fn_call() function that uses a switch statement to call the
predicate functions directly. This gives almost a 10% speedup to the
filter logic.
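The shape of this change can be sketched as below. The enum values, struct and predicate functions are hypothetical stand-ins for illustration and do not match the names in trace_events_filter.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pred_fn { PRED_EQ, PRED_LT, PRED_GT };  /* hypothetical predicate ids */

struct pred {
	enum pred_fn fn;  /* an id stored in place of a function pointer */
	long val;
};

static bool pred_eq(const struct pred *p, long v) { return v == p->val; }
static bool pred_lt(const struct pred *p, long v) { return v <  p->val; }
static bool pred_gt(const struct pred *p, long v) { return v >  p->val; }

/* Each case names its callee, so every call site is a direct call. */
static bool pred_fn_call(const struct pred *p, long v)
{
	assert(p != NULL);
	switch (p->fn) {
	case PRED_EQ: return pred_eq(p, v);
	case PRED_LT: return pred_lt(p, v);
	case PRED_GT: return pred_gt(p, v);
	}
	return false;  /* unknown id: treat as no match */
}
```

The cost is that adding a predicate now means touching both the enum and the switch, which is acceptable when the set of functions is small and fixed.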
Using the histogram benchmark:
Before:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 113 } hitcount: 272
{ delta: 114 } hitcount: 840
{ delta: 118 } hitcount: 344
{ delta: 119 } hitcount: 25428
{ delta: 120 } hitcount: 350590
{ delta: 121 } hitcount: 1892484
{ delta: 122 } hitcount: 6205004
{ delta: 123 } hitcount: 11583521
{ delta: 124 } hitcount: 37590979
{ delta: 125 } hitcount: 108308504
{ delta: 126 } hitcount: 131672461
{ delta: 127 } hitcount: 88700598
{ delta: 128 } hitcount: 65939870
{ delta: 129 } hitcount: 45055004
{ delta: 130 } hitcount: 33174464
{ delta: 131 } hitcount: 31813493
{ delta: 132 } hitcount: 29011676
{ delta: 133 } hitcount: 22798782
{ delta: 134 } hitcount: 22072486
{ delta: 135 } hitcount: 17034113
{ delta: 136 } hitcount: 8982490
{ delta: 137 } hitcount: 2865908
{ delta: 138 } hitcount: 980382
{ delta: 139 } hitcount: 1651944
{ delta: 140 } hitcount: 4112073
{ delta: 141 } hitcount: 3963269
{ delta: 142 } hitcount: 1712508
{ delta: 143 } hitcount: 575941
After:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 103 } hitcount: 60
{ delta: 104 } hitcount: 16966
{ delta: 105 } hitcount: 396625
{ delta: 106 } hitcount: 3223400
{ delta: 107 } hitcount: 12053754
{ delta: 108 } hitcount: 20241711
{ delta: 109 } hitcount: 14850200
{ delta: 110 } hitcount: 4946599
{ delta: 111 } hitcount: 3479315
{ delta: 112 } hitcount: 18698299
{ delta: 113 } hitcount: 62388733
{ delta: 114 } hitcount: 95803834
{ delta: 115 } hitcount: 58278130
{ delta: 116 } hitcount: 15364800
{ delta: 117 } hitcount: 5586866
{ delta: 118 } hitcount: 2346880
{ delta: 119 } hitcount: 1131091
{ delta: 120 } hitcount: 620896
{ delta: 121 } hitcount: 236652
{ delta: 122 } hitcount: 105957
{ delta: 123 } hitcount: 119107
{ delta: 124 } hitcount: 54494
{ delta: 125 } hitcount: 63856
{ delta: 126 } hitcount: 64454
{ delta: 127 } hitcount: 34818
{ delta: 128 } hitcount: 41446
{ delta: 129 } hitcount: 51242
{ delta: 130 } hitcount: 28361
{ delta: 131 } hitcount: 23926
The peak before was 126ns per event, after the peak is 114ns, and the
fastest time went from 113ns to 103ns.
Link: https://lkml.kernel.org/r/20220906225529.781407172@goodmis.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The structure filter_pred and the typedef of the function used are only
referenced by trace_events_filter.c. There's no reason to have it in an
external header file. Move them into the only file they are used in.
Link: https://lkml.kernel.org/r/20220906225529.598047132@goodmis.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Due to retpolines, indirect calls are much more expensive than direct
calls. The histograms have a select set of functions they use. Instead
of using function pointers to call them, create a
hist_fn_call() function that uses a switch statement to call the histogram
functions directly. This gives a 13% speedup to the histogram logic.
Using the histogram benchmark:
Before:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 129 } hitcount: 2213
{ delta: 130 } hitcount: 285965
{ delta: 131 } hitcount: 1146545
{ delta: 132 } hitcount: 5185432
{ delta: 133 } hitcount: 19896215
{ delta: 134 } hitcount: 53118616
{ delta: 135 } hitcount: 83816709
{ delta: 136 } hitcount: 68329562
{ delta: 137 } hitcount: 41859349
{ delta: 138 } hitcount: 46257797
{ delta: 139 } hitcount: 54400831
{ delta: 140 } hitcount: 72875007
{ delta: 141 } hitcount: 76193272
{ delta: 142 } hitcount: 49504263
{ delta: 143 } hitcount: 38821072
{ delta: 144 } hitcount: 47702679
{ delta: 145 } hitcount: 41357297
{ delta: 146 } hitcount: 22058238
{ delta: 147 } hitcount: 9720002
{ delta: 148 } hitcount: 3193542
{ delta: 149 } hitcount: 927030
{ delta: 150 } hitcount: 850772
{ delta: 151 } hitcount: 1477380
{ delta: 152 } hitcount: 2687977
{ delta: 153 } hitcount: 2865985
{ delta: 154 } hitcount: 1977492
{ delta: 155 } hitcount: 2475607
{ delta: 156 } hitcount: 3403612
After:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 113 } hitcount: 272
{ delta: 114 } hitcount: 840
{ delta: 118 } hitcount: 344
{ delta: 119 } hitcount: 25428
{ delta: 120 } hitcount: 350590
{ delta: 121 } hitcount: 1892484
{ delta: 122 } hitcount: 6205004
{ delta: 123 } hitcount: 11583521
{ delta: 124 } hitcount: 37590979
{ delta: 125 } hitcount: 108308504
{ delta: 126 } hitcount: 131672461
{ delta: 127 } hitcount: 88700598
{ delta: 128 } hitcount: 65939870
{ delta: 129 } hitcount: 45055004
{ delta: 130 } hitcount: 33174464
{ delta: 131 } hitcount: 31813493
{ delta: 132 } hitcount: 29011676
{ delta: 133 } hitcount: 22798782
{ delta: 134 } hitcount: 22072486
{ delta: 135 } hitcount: 17034113
{ delta: 136 } hitcount: 8982490
{ delta: 137 } hitcount: 2865908
{ delta: 138 } hitcount: 980382
{ delta: 139 } hitcount: 1651944
{ delta: 140 } hitcount: 4112073
{ delta: 141 } hitcount: 3963269
{ delta: 142 } hitcount: 1712508
{ delta: 143 } hitcount: 575941
{ delta: 144 } hitcount: 351427
{ delta: 145 } hitcount: 218077
{ delta: 146 } hitcount: 167297
{ delta: 147 } hitcount: 146198
{ delta: 148 } hitcount: 116122
{ delta: 149 } hitcount: 58993
{ delta: 150 } hitcount: 40228
The delta above is in nanoseconds. It brings the fastest time down from
129ns to 113ns, and the peak from 141ns to 126ns.
Link: https://lkml.kernel.org/r/20220906225529.411545333@goodmis.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
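The switch-based dispatch described in the hist_fn_call() commit above can
be sketched roughly as follows. This is a minimal user-space illustration,
not the kernel code: every identifier (toy_fn_num, toy_field, toy_fn_call()
and the three handlers) is hypothetical, and the handlers only stand in for
the real histogram functions. The point it shows is that the field stores an
enum value rather than a function pointer, so each case is a direct call.

```c
#include <stdint.h>

/* Hypothetical: one enum value per known handler. */
enum toy_fn_num {
	TOY_FN_CONST,
	TOY_FN_COUNTER,
	TOY_FN_LOG2,
};

struct toy_field {
	enum toy_fn_num fn_num;	/* stored instead of a function pointer */
	uint64_t constant;
	uint64_t counter;
};

/* Returns the constant stored in the field; ignores the input. */
static uint64_t toy_const(struct toy_field *f, uint64_t in)
{
	(void)in;
	return f->constant;
}

/* Increments and returns the per-field counter; ignores the input. */
static uint64_t toy_counter(struct toy_field *f, uint64_t in)
{
	(void)in;
	return ++f->counter;
}

/* Returns floor(log2(in)) for in > 0, and 0 for in == 0. */
static uint64_t toy_log2(struct toy_field *f, uint64_t in)
{
	uint64_t r = 0;

	(void)f;
	while (in >>= 1)
		r++;
	return r;
}

/* Dispatch on the enum: every case is a direct call to a known function. */
static uint64_t toy_fn_call(struct toy_field *f, uint64_t in)
{
	switch (f->fn_num) {
	case TOY_FN_CONST:
		return toy_const(f, in);
	case TOY_FN_COUNTER:
		return toy_counter(f, in);
	case TOY_FN_LOG2:
		return toy_log2(f, in);
	}
	return 0;
}
```

A caller would set fn_num once when the field is created and then call
toy_fn_call() in the hot path. Whether this is faster than a pointer call
depends on the build; the gain reported above applies to kernels built with
retpolines.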
|
|
In order to test filtering and histograms via the trace event
benchmark, record the delta time of the last event as a numeric value
(currently, it is only saved within the string) so that filters and
histograms can use it.
Link: https://lkml.kernel.org/r/20220906225529.213677569@goodmis.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Following Daniel's suggestion, fix the similar warning
in the template files, which prevents new monitors
from triggering the same warning.
Link: https://lkml.kernel.org/r/20220824034357.2014202-3-zengheng4@huawei.com
Cc: <mingo@redhat.com>
Fixes: 24bce201d798 ("tools/rv: Add dot2k")
Suggested-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The sparse tool complains as follows:
kernel/trace/rv/monitors/wwnr/wwnr.c:18:19:
warning: symbol 'rv_wwnr' was not declared. Should it be static?
The `rv_wwnr` symbol is not referenced by any other file,
so add the static qualifier to it.
The same applies to the wip module.
Link: https://lkml.kernel.org/r/20220824034357.2014202-2-zengheng4@huawei.com
Cc: <mingo@redhat.com>
Fixes: ccc319dcb450 ("rv/monitor: Add the wwnr monitor")
Fixes: 8812d21219b9 ("rv/monitor: Add the wip monitor skeleton created by dot2k")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
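The kind of change the sparse warning in the commit above asks for can be
sketched as below. This is an illustrative stand-alone example, not the
monitor source: struct toy_monitor, toy_wwnr and the two accessor functions
are hypothetical names. It shows a file-scope object that is used only
within its own file being given internal linkage with static.

```c
/* Hypothetical stand-in for a monitor descriptor. */
struct toy_monitor {
	const char *name;
	int enabled;
};

/*
 * Only this file refers to the object, so it gets internal linkage.
 * Without 'static', sparse would report that the symbol was not
 * declared and ask whether it should be static.
 */
static struct toy_monitor toy_wwnr = {
	.name = "wwnr",
	.enabled = 0,
};

/* Marks the monitor as enabled and returns the new state. */
int toy_monitor_enable(void)
{
	toy_wwnr.enabled = 1;
	return toy_wwnr.enabled;
}

/* Returns the monitor name. */
const char *toy_monitor_name(void)
{
	return toy_wwnr.name;
}
```

Other files can still reach the object through the accessor functions,
while the symbol itself no longer appears in the global namespace.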
|
|
Add a syntax error test case for eprobe, the same as for kprobes.
Link: https://lkml.kernel.org/r/165932115471.2850673.8014722990775242727.stgit@devnote2
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add the filter option to the event probe. This is useful if the user wants
to derive a new event based on a condition of the original event.
E.g.
echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \
runtime=$runtime:u32 if cpu < 4' >> ../dynamic_events
With this, the new event fires only on the first 4 cores.
Note that the fields used for 'if' must be fields of the original
event, not of the eprobe event.
Link: https://lkml.kernel.org/r/165932114513.2850673.2592206685744598080.stgit@devnote2
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Regression and bug fixes:
- Performance regression fix from 5.18 on a Raspberry Pi
- Fix extent parsing bug which triggers a BUG_ON when a (corrupted)
extent tree has a non-root node with zero entries.
- Fix a livelock where in the right (wrong) circumstances a large
number of nfsd threads can try to write to a nearly full file
system, and retry for hours(!)"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: limit the number of retries after discarding preallocations blocks
ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0
ext4: use buckets for cr 1 block scan instead of rbtree
ext4: use locality group preallocation for small closed files
ext4: make directory inode spreading reflect flexbg size
ext4: avoid unnecessary spreading of allocations among groups
ext4: make mballoc try target group first even with mb_optimize_scan
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull NVDIMM and DAX fixes from Dan Williams:
"A recently discovered one-line fix for devdax that further addresses a
v5.5 regression, and (a bit embarrassing) a small batch of fixes that
have been sitting in my fixes tree for weeks.
The older fixes have soaked in linux-next during that time and address
an fsdax infinite loop and some other minor fixups.
- Fix an infinite loop bug in fsdax
- Fix memory-type detection for devdax (EINJ regression)
- Small cleanups"
* tag 'dax-and-nvdimm-fixes-v6.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
devdax: Fix soft-reservation memory description
fsdax: Fix infinite loop in dax_iomap_rw()
nvdimm/namespace: drop nested variable in create_namespace_pmem()
ndtest: Cleanup all of blk namespace specific code
pmem: fix a name collision
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C driver bugfixes for mlxbf and imx, a few documentation fixes after
the rework this cycle, and one hardening for the i2c-mux core"
* tag 'i2c-for-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mux: harden i2c_mux_alloc() against integer overflows
i2c: mlxbf: Fix frequency calculation
i2c: mlxbf: prevent stack overflow in mlxbf_i2c_smbus_start_transaction()
i2c: mlxbf: incorrect base address passed during io write
Documentation: i2c: fix references to other documents
MAINTAINERS: remove Nehal Shah from AMD MP2 I2C DRIVER
i2c: imx: If pm_runtime_get_sync() returned 1 device access is possible
|
|
Pick up another "Soft Reservation" fix for v6.0-final on top of some
straggling nvdimm fixes that missed v5.19.
|
|
The "hmem" platform-devices that are created to represent the
platform-advertised "Soft Reserved" memory ranges end up inserting a
resource that causes the iomem_resource tree to look like this:
340000000-43fffffff : hmem.0
340000000-43fffffff : Soft Reserved
340000000-43fffffff : dax0.0
This is because insert_resource() reparents ranges when they completely
intersect an existing range.
This matters because code that uses region_intersects() to scan for a
given IORES_DESC will only check that top-level 'hmem.0' resource and
not the 'Soft Reserved' descendant.
So, to support EINJ (via einj_error_inject()) to inject errors into
memory hosted by a dax-device, be sure to describe the memory as
IORES_DESC_SOFT_RESERVED. This is a follow-on to:
commit b13a3e5fd40b ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP")
...that fixed EINJ support for "Soft Reserved" ranges in the first
instance.
Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration")
Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Omar Avelar <omar.avelar@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mark Gross <markgross@kernel.org>
Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
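The lookup problem described in the devdax commit above can be modelled
with a toy resource tree. This is a simplified sketch under stated
assumptions, not the kernel's resource code: struct toy_resource,
enum toy_desc and toy_top_level_intersects() are hypothetical, and the scan
deliberately checks only the top-level entries, which is the behaviour the
commit message attributes to region_intersects().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor values, standing in for IORES_DESC_*. */
enum toy_desc {
	TOY_DESC_NONE,
	TOY_DESC_SOFT_RESERVED,
};

struct toy_resource {
	uint64_t start;
	uint64_t end;			/* inclusive */
	enum toy_desc desc;
	struct toy_resource *child;	/* first nested range */
	struct toy_resource *sibling;	/* next range at the same level */
};

/*
 * Returns 1 if a top-level range under 'root' overlaps [start, end]
 * and carries 'desc'. Nested ranges are intentionally not visited.
 */
static int toy_top_level_intersects(const struct toy_resource *root,
				    uint64_t start, uint64_t end,
				    enum toy_desc desc)
{
	const struct toy_resource *r;

	for (r = root->child; r; r = r->sibling) {
		if (r->end < start || r->start > end)
			continue;	/* no overlap with the query */
		if (r->desc == desc)
			return 1;
	}
	return 0;
}
```

In this model, a top-level "hmem.0" entry with TOY_DESC_NONE hides a nested
"Soft Reserved" child from the scan, while giving the top-level entry
TOY_DESC_SOFT_RESERVED makes the same query succeed.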
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix build error for the combination of SYSTEM_TRUSTED_KEYRING=y and
X509_CERTIFICATE_PARSER=m
- Fix DEBUG_INFO_SPLIT to generate debug info for GCC 11+ and Clang 12+
- Revive debug info for assembly files
- Remove unused code
* tag 'kbuild-fixes-v6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Makefile.debug: re-enable debug info for .S files
Makefile.debug: set -g unconditional on CONFIG_DEBUG_INFO_SPLIT
certs: make system keyring depend on built-in x509 parser
Kconfig: remove unused function 'menu_get_root_menu'
scripts/clang-tools: remove unused module
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fix from Vasily Gorbik:
- Fix potential hangs in VFIO AP driver
* tag 's390-6.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: bypass unnecessary processing of AP resources
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix an uninitialized variable usage in the operating performance
points code and add missing DT bindings for it.
Specifics:
- Fix uninitialized variable usage in dev_pm_opp_config_clks_simple()
(Christophe JAILLET)
- Add missing OPP DT properties (Rob Herring)"
* tag 'pm-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
dt-bindings: opp: Add missing (unevaluated|additional)Properties on child nodes
OPP: Fix an un-initialized variable usage
|