summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2020-01-14ring-bufer: kernel-doc warning fixesFabian Frederick
Also fixes a couple of typos Link: http://lkml.kernel.org/r/1401992525-10417-1-git-send-email-fabf@skynet.be Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> [ Found this deep in the abyss of my INBOX ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-14tracing/uprobe: Fix double perf_event linking on multiprobe uprobeMasami Hiramatsu
Fix double perf_event linking to trace_uprobe_filter on multiple uprobe event by moving trace_uprobe_filter under trace_probe_event. In uprobe perf event, trace_uprobe_filter data structure is managing target mm filters (in perf_event) related to each uprobe event. Since commit 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe") left the trace_uprobe_filter data structure in trace_uprobe, if a trace_probe_event has multiple trace_uprobe (multi-probe event), a perf_event is added to different trace_uprobe_filter on each trace_uprobe. This leads a linked list corruption. To fix this issue, move trace_uprobe_filter to trace_probe_event and link it once on each event instead of each probe. Link: http://lkml.kernel.org/r/157862073931.1800.3800576241181489174.stgit@devnote2 Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: =?utf-8?q?Toke_H=C3=B8iland-J?= =?utf-8?b?w7hyZ2Vuc2Vu?= <thoiland@redhat.com> Cc: Jean-Tsung Hsiao <jhsiao@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: stable@vger.kernel.org Fixes: 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe") Link: https://lkml.kernel.org/r/20200108171611.GA8472@kernel.org Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-14Merge branch 'dhowells' (patches from DavidH)Linus Torvalds
Merge misc fixes from David Howells. Two afs fixes and a key refcounting fix. * dhowells: afs: Fix afs_lookup() to not clobber the version on a new dentry afs: Fix use-after-loss-of-ref keys: Fix request_key() cache
2020-01-14bpf: Fix seq_show for BPF_MAP_TYPE_STRUCT_OPSMartin KaFai Lau
Instead of using bpf_struct_ops_map_lookup_elem() which is not implemented, bpf_struct_ops_map_seq_show_elem() should also use bpf_struct_ops_map_sys_lookup_elem() which does an inplace update to the value. The change allocates a value to pass to bpf_struct_ops_map_sys_lookup_elem(). [root@arch-fb-vm1 bpf]# cat /sys/fs/bpf/dctcp {{{1}},BPF_STRUCT_OPS_STATE_INUSE,{{00000000df93eebc,00000000df93eebc},0,2, ... Fixes: 85d33df357b6 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200114072647.3188298-1-kafai@fb.com
2020-01-14keys: Fix request_key() cacheDavid Howells
When the key cached by request_key() and co. is cleaned up on exit(), the code looks in the wrong task_struct, and so clears the wrong cache. This leads to anomalies in key refcounting when doing, say, a kernel build on an afs volume, that then trigger kasan to report a use-after-free when the key is viewed in /proc/keys. Fix this by making exit_creds() look in the passed-in task_struct rather than in current (the task_struct cleanup code is deferred by RCU and potentially run in another task). Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14mm/mmu_notifier: Rename struct mmu_notifier_mm to mmu_notifier_subscriptionsJason Gunthorpe
The name mmu_notifier_mm implies that the thing is a mm_struct pointer, and is difficult to abbreviate. The struct is actually holding the interval tree and hlist containing the notifiers subscribed to a mm. Use 'subscriptions' as the variable name for this struct instead of the really terrible and misleading 'mmn_mm'. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-14fs/proc: Introduce /proc/pid/timens_offsetsAndrei Vagin
API to set time namespace offsets for children processes, i.e.: echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-28-dima@arista.com
2020-01-14x86/vdso: Zap vvar pages when switching to a time namespaceDmitry Safonov
The VVAR page layout depends on whether a task belongs to the root or non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com
2020-01-14time: Allocate per-timens vvar pageDmitry Safonov
VDSO support for Time namespace needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page contains time namespace clock offsets and it has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. Allocate the timens page during namespace creation. Setup the offsets when the first task enters the ns and freeze them to guarantee the pace of monotonic/boottime clocks and to avoid breakage of applications. The design decision is to have a global offset_lock which is used during namespace offsets setup and to freeze offsets when the first task joins the new time namespace. That is better in terms of memory usage compared to having a per namespace mutex that's used only during the setup period. Suggested-by: Andy Lutomirski <luto@kernel.org> Based-on-work-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-24-dima@arista.com
2020-01-14posix-timers: Make clock_nanosleep() time namespace awareAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time, if the TIMER_ABSTIME flag is set. This value is in the tasks time namespace, which has to be converted to the host time namespace. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-18-dima@arista.com
2020-01-14hrtimers: Prepare hrtimer_nanosleep() for time namespacesAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com
2020-01-14alarmtimer: Make nanosleep() time namespace awareAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time when the TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace and has to be converted to the host's time. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-16-dima@arista.com
2020-01-14posix-timers: Make timer_settime() time namespace awareAndrei Vagin
Wire timer_settime() syscall into time namespace virtualization. sys_timer_settime() calls the ktime->timer_set() callback. Right now, common_timer_set() is the only implementation for the callback. The user-supplied expiry value is converted from timespec64 to ktime and then timens_ktime_to_host() can be used to convert namespace's time to the host time. Inside a time namespace kernel's time differs by a fixed offset from a user-supplied time, but only absolute values (TIMER_ABSTIME) must be converted. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-15-dima@arista.com
2020-01-14time: Add do_timens_ktime_to_host() helperAndrei Vagin
The helper subtracts namespace's clock offset from the given time and ensures that the result is within [0, KTIME_MAX]. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-13-dima@arista.com
2020-01-14posix-clocks: Wire up clock_gettime() with timens offsetsAndrei Vagin
Adjust monotonic and boottime clocks with per-timens offsets. As the result a process inside time namespace will see timers and clocks corrected to offsets that were set when the namespace was created Note that applications usually go through vDSO to get time, which is not yet adjusted. Further changes will complete time namespace virtualisation with vDSO support. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-12-dima@arista.com
2020-01-14posix-timers: Use clock_get_ktime() in common_timer_get()Andrei Vagin
Now, when the clock_get_ktime() callback exists, the suboptimal timespec64-based conversion can be removed from common_timer_get(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-11-dima@arista.com
2020-01-14posix-clocks: Introduce clock_get_ktime() callbackAndrei Vagin
The callsite in common_timer_get() has already a comment: /* * The timespec64 based conversion is suboptimal, but it's not * worth to implement yet another callback. */ kc->clock_get(timr->it_clock, &ts64); now = timespec64_to_ktime(ts64); The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-10-dima@arista.com
2020-01-14alarmtimer: Provide get_timespec() callbackAndrei Vagin
The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() Wire up alarm bases with get_timespec(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-9-dima@arista.com
2020-01-14alarmtimer: Rename gettime() callback to get_ktime()Andrei Vagin
The upcoming support for time namespaces requires to have access to: - The time in a tasks time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() struct alarm_base needs to follow the same naming convention, so rename .gettime() callback into get_ktime() as a preparation for introducing get_timespec(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-8-dima@arista.com
2020-01-14posix-clocks: Rename .clock_get_timespec() callbacks accordinglyAndrei Vagin
The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format in (struct k_clock). As a preparation ground for introducing clock_get_ktime(), the original callback clock_get() was renamed into clock_get_timespec(). Reflect the renaming into the callback implementations. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-7-dima@arista.com
2020-01-14posix-clocks: Rename the clock_get() callback to clock_get_timespec()Andrei Vagin
The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format, rather than in (struct timespec). Rename the clock_get() callback to clock_get_timespec() as a preparation for introducing clock_get_ktime(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-6-dima@arista.com
2020-01-14time: Add timens_offsets to be used for tasks in time namespaceAndrei Vagin
Introduce offsets for time namespace. They will contain an adjustment needed to convert clocks to/from host's. A new namespace is created with the same offsets as the time namespace of the current process. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-5-dima@arista.com
2020-01-14ns: Introduce Time NamespaceAndrei Vagin
Time Namespace isolates clock values. The kernel provides access to several clocks CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc. CLOCK_REALTIME System-wide clock that measures real (i.e., wall-clock) time. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. CLOCK_BOOTTIME Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. For many users, the time namespace means the ability to changes date and time in a container (CLOCK_REALTIME). Providing per namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value. But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks. A time namespace is similar to a pid namespace in the way how it is created: unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it to the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace, before any processes appear in it. All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls. [ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ] Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://criu.org/Time_namespace Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-13tracing: trigger: Replace unneeded RCU-list traversalsMasami Hiramatsu
With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings when I ran ftracetest trigger testcases. ----- # dmesg -c > /dev/null # ./ftracetest test.d/trigger ... # dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " " kernel/trace/trace_events_hist.c:6070 kernel/trace/trace_events_hist.c:1760 kernel/trace/trace_events_hist.c:5911 kernel/trace/trace_events_trigger.c:504 kernel/trace/trace_events_hist.c:1810 kernel/trace/trace_events_hist.c:3158 kernel/trace/trace_events_hist.c:3105 kernel/trace/trace_events_hist.c:5518 kernel/trace/trace_events_hist.c:5998 kernel/trace/trace_events_hist.c:6019 kernel/trace/trace_events_hist.c:6044 kernel/trace/trace_events_trigger.c:1500 kernel/trace/trace_events_trigger.c:1540 kernel/trace/trace_events_trigger.c:539 kernel/trace/trace_events_trigger.c:584 ----- I investigated those warnings and found that the RCU-list traversals in event trigger and hist didn't need to use RCU version because those were called only under event_mutex. I also checked other RCU-list traversals related to event trigger list, and found that most of them were called from event_hist_trigger_func() or hist_unregister_trigger() or register/unregister functions except for a few cases. Replace these unneeded RCU-list traversals with normal list traversal macro and lockdep_assert_held() to check the event_mutex is held. Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2 Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13pid: Implement pidfd_getfd syscallSargun Dhillon
This syscall allows for the retrieval of file descriptors from other processes, based on their pidfd. This is possible using ptrace, and injection of parasitic code to inject code which leverages SCM_RIGHTS to move file descriptors between a tracee and a tracer. Unfortunately, ptrace comes with a high cost of requiring the process to be stopped, and breaks debuggers. This does not require stopping the process under manipulation. One reason to use this is to allow sandboxers to take actions on file descriptors on the behalf of another process. For example, this can be combined with seccomp-bpf's user notification to do on-demand fd extraction and take privileged actions. One such privileged action is binding a socket to a privileged port. /* prototype */ /* flags is currently reserved and should be set to 0 */ int sys_pidfd_getfd(int pidfd, int fd, unsigned int flags); /* testing */ Ran self-test suite on x86_64 Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-3-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13tracing/boot: Add function tracer filter optionsMasami Hiramatsu
Add below function-tracer filter options to boot-time tracing. - ftrace.[instance.INSTANCE.]ftrace.filters This will take an array of tracing function filter rules - ftrace.[instance.INSTANCE.]ftrace.notraces This will take an array of NON-tracing function filter rules Link: http://lkml.kernel.org/r/157867244841.17873.10933616628243103561.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing/boot: Add cpu_mask option supportMasami Hiramatsu
Add ftrace.cpumask option support to boot-time tracing. This sets cpumask for each instance. - ftrace.[instance.INSTANCE.]cpumask = CPUMASK; Set the trace cpumask. Note that the CPUMASK should be a string which <tracefs>/tracing_cpumask can accepts. Link: http://lkml.kernel.org/r/157867243625.17873.13613922641273149372.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing/boot: Add instance node supportMasami Hiramatsu
Add instance node support to boot-time tracing. User can set some options and event nodes under instance node. - ftrace.instance.INSTANCE[...] Add new INSTANCE instance. Some options and event nodes are acceptable for instance node. Link: http://lkml.kernel.org/r/157867242413.17873.9814204526141500278.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing/boot: Add synthetic event supportMasami Hiramatsu
Add synthetic event node support to boot time tracing. The synthetic event is a kind of event node, but the group name is "synthetic". - ftrace.event.synthetic.EVENT.fields = FIELD[, FIELD2...] Defines new synthetic event with FIELDs. Each field should be "type varname". The synthetic node requires "fields" string arraies, which defines the fields as same as tracing/synth_events interface. Link: http://lkml.kernel.org/r/157867241236.17873.12411615143321557709.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing/boot Add kprobe event supportMasami Hiramatsu
Add kprobe event support on event node to boot-time tracing. If the group name of event is "kprobes", the boot-time tracing defines new probe event according to "probes" values. - ftrace.event.kprobes.EVENT.probes = PROBE[, PROBE2...] Defines new kprobe event based on PROBEs. It is able to define multiple probes on one event, but those must have same type of arguments. For example, ftrace.events.kprobes.myevent { probes = "vfs_read $arg1 $arg2"; enable; } This will add kprobes:myevent on vfs_read with the 1st and the 2nd arguments. Link: http://lkml.kernel.org/r/157867240104.17873.9712052065426433111.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing/boot: Add per-event settingsMasami Hiramatsu
Add per-event settings for boottime tracing. User can set filter, actions and enable on each event on boot. The event entries are under ftrace.event.GROUP.EVENT node (note that the option key includes event's group name and event name.) This supports below configs. - ftrace.event.GROUP.EVENT.enable Enables GROUP:EVENT tracing. - ftrace.event.GROUP.EVENT.filter = FILTER Set FILTER rule to the GROUP:EVENT. - ftrace.event.GROUP.EVENT.actions = ACTION[, ACTION2...] Set ACTIONs to the GROUP:EVENT. For example, ftrace.event.sched.sched_process_exec { filter = "pid < 128" enable } this will enable tracing "sched:sched_process_exec" event with "pid < 128" filter. Link: http://lkml.kernel.org/r/157867238942.17873.11177628789184546198.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing/boot: Add boot-time tracingMasami Hiramatsu
Setup tracing options via extra boot config in addition to kernel command line. This adds following commands support. These are applied to the global trace instance. - ftrace.options = OPT1[,OPT2...] Enable given ftrace options. - ftrace.trace_clock = CLOCK Set given CLOCK to ftrace's trace_clock. - ftrace.buffer_size = SIZE Configure ftrace buffer size to SIZE. You can use "KB" or "MB" for that SIZE. - ftrace.events = EVENT[, EVENT2...] Enable given events on boot. You can use a wild card in EVENT. - ftrace.tracer = TRACER Set TRACER to current tracer on boot. (e.g. function) Note that this is NOT replacing the kernel parameters, because this boot config based setting is later than that. If you want to trace earlier boot events, you still need kernel parameters. Link: http://lkml.kernel.org/r/157867237723.17873.17494943526320587488.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing: Add NULL trace-array check in print_synth_event()Masami Hiramatsu
Add NULL trace-array check in print_synth_event(), because if we enable tp_printk option, iter->tr can be NULL. Link: http://lkml.kernel.org/r/157867236536.17873.12529350542460184019.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing: Accept different type for synthetic event fieldsMasami Hiramatsu
Make the synthetic event accepts a different type field to record. However, the size and signed flag must be same. Link: http://lkml.kernel.org/r/157867235358.17873.61732996461602171.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing: kprobes: Register to dynevent earlier stageMasami Hiramatsu
Register kprobe event to dynevent in subsys_initcall level. This will allow kernel to register new kprobe events in fs_initcall level via trace_run_command. Link: http://lkml.kernel.org/r/157867234213.17873.18039000024374948737.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing: kprobes: Output kprobe event to printk bufferMasami Hiramatsu
Since kprobe-events use event_trigger_unlock_commit_regs() directly, that events doesn't show up in printk buffer if "tp_printk" is set. Use trace_event_buffer_commit() in kprobe events so that it can invoke output_printk() as same as other trace events. Link: http://lkml.kernel.org/r/157867233085.17873.5210928676787339604.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> [ Adjusted data var declaration placement in __kretprobe_trace_func() ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing: Apply soft-disabled and filter to tracepoints printkMasami Hiramatsu
Apply soft-disabled and the filter rule of the trace events to the printk output of tracepoints (a.k.a. tp_printk kernel parameter) as same as trace buffer output. Link: http://lkml.kernel.org/r/157867231876.17873.15825819592284704068.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing: Make struct ring_buffer less ambiguousSteven Rostedt (VMware)
As there's two struct ring_buffers in the kernel, it causes some confusion. The other one being the perf ring buffer. It was agreed upon that as neither of the ring buffers are generic enough to be used globally, they should be renamed as: perf's ring_buffer -> perf_buffer ftrace's ring_buffer -> trace_buffer This implements the changes to the ring buffer that ftrace uses. Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing: Rename trace_buffer to array_bufferSteven Rostedt (VMware)
As we are working to remove the generic "ring_buffer" name that is used by both tracing and perf, the ring_buffer name for tracing will be renamed to trace_buffer, and perf's ring buffer will be renamed to perf_buffer. As there already exists a trace_buffer that is used by the trace_arrays, it needs to be first renamed to array_buffer. Link: https://lore.kernel.org/r/20191213153553.GE20583@krava Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13perf: Make struct ring_buffer less ambiguousSteven Rostedt (VMware)
eBPF requires needing to know the size of the perf ring buffer structure. But it unfortunately has the same name as the generic ring buffer used by tracing and oprofile. To make it less ambiguous, rename the perf ring buffer structure to "perf_buffer". As other parts of the ring buffer code has "perf_" as the prefix, it only makes sense to give the ring buffer the "perf_" prefix as well. Link: https://lore.kernel.org/r/20191213153553.GE20583@krava Acked-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-11Merge tag 'clone3-tls-v5.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread fixes from Christian Brauner: "This contains a series of patches to fix CLONE_SETTLS when used with clone3(). The clone3() syscall passes the tls argument through struct clone_args instead of a register. This means, all architectures that do not implement copy_thread_tls() but still support CLONE_SETTLS via copy_thread() expecting the tls to be located in a register argument based on clone() are currently unfortunately broken. Their tls value will be garbage. The patch series fixes this on all architectures that currently define __ARCH_WANT_SYS_CLONE3. It also adds a compile-time check to ensure that any architecture that enables clone3() in the future is forced to also implement copy_thread_tls(). My ultimate goal is to get rid of the copy_thread()/copy_thread_tls() split and just have copy_thread_tls() at some point in the not too distant future (Maybe even renaming copy_thread_tls() back to simply copy_thread() once the old function is ripped from all arches). This is dependent now on all arches supporting clone3(). While all relevant arches do that now there are still four missing: ia64, m68k, sh and sparc. They have the system call reserved, but not implemented. Once they all implement clone3() we can get rid of ARCH_WANT_SYS_CLONE3 and HAVE_COPY_THREAD_TLS. This series also includes a minor fix for the arm64 uapi headers which caused __NR_clone3 to be missing from the exported user headers. Unfortunately the series came in a little late especially given that it touches a range of architectures. Due to the holidays not all arch maintainers responded in time probably due to their backlog. Will and Arnd have thankfully acked the arm specific changes. Given that the changes are straightforward and rather minimal combined with the fact the that clone3() with CLONE_SETTLS is broken I decided to send them post rc3 nonetheless" * tag 'clone3-tls-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: um: Implement copy_thread_tls clone3: ensure copy_thread_tls is implemented xtensa: Implement copy_thread_tls riscv: Implement copy_thread_tls parisc: Implement copy_thread_tls arm: Implement copy_thread_tls arm64: Implement copy_thread_tls arm64: Move __ARCH_WANT_SYS_CLONE3 definition to uapi headers
2020-01-10Merge branch 'timers/urgent' into timers/coreThomas Gleixner
Pick up upstream VDSO fix before adding more VDSO changes.
2020-01-10bpf: Introduce function-by-function verificationAlexei Starovoitov
New llvm and old llvm with libbpf help produce BTF that distinguish global and static functions. Unlike arguments of static function the arguments of global functions cannot be removed or optimized away by llvm. The compiler has to use exactly the arguments specified in a function prototype. The argument type information allows the verifier validate each global function independently. For now only supported argument types are pointer to context and scalars. In the future pointers to structures, sizes, pointer to packet data can be supported as well. Consider the following example: static int f1(int ...) { ... } int f3(int b); int f2(int a) { f1(a) + f3(a); } int f3(int b) { ... } int main(...) { f1(...) + f2(...) + f3(...); } The verifier will start its safety checks from the first global function f2(). It will recursively descend into f1() because it's static. Then it will check that arguments match for the f3() invocation inside f2(). It will not descend into f3(). It will finish f2() that has to be successfully verified for all possible values of 'a'. Then it will proceed with f3(). That function also has to be safe for all possible values of 'b'. Then it will start subprog 0 (which is main() function). It will recursively descend into f1() and will skip full check of f2() and f3(), since they are global. The order of processing global functions doesn't affect safety, since all global functions must be proven safe based on their arguments only. Such function by function verification can drastically improve speed of the verification and reduce complexity. Note that the stack limit of 512 still applies to the call chain regardless whether functions were static or global. The nested level of 8 also still applies. The same recursion prevention checks are in place as well. The type information and static/global kind is preserved after the verification hence in the above example global function f2() and f3() can be replaced later by equivalent functions with the same types that are loaded and verified later without affecting safety of this main() program. Such replacement (re-linking) of global functions is a subject of future patches. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
2020-01-10PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"Colin Ian King
There is a spelling mistake in a pr_info message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-09kunit: allow kunit tests to be loaded as a moduleAlan Maguire
As tests are added to kunit, it will become less feasible to execute all built tests together. By supporting modular tests we provide a simple way to do selective execution on a running system; specifying CONFIG_KUNIT=y CONFIG_KUNIT_EXAMPLE_TEST=m ...means we can simply "insmod example-test.ko" to run the tests. To achieve this we need to do the following: o export the required symbols in kunit o string-stream tests utilize non-exported symbols so for now we skip building them when CONFIG_KUNIT_TEST=m. o drivers/base/power/qos-test.c contains a few unexported interface references, namely freq_qos_read_value() and freq_constraints_init(). Both of these could be potentially defined as static inline functions in include/linux/pm_qos.h, but for now we simply avoid supporting module build for that test suite. o support a new way of declaring test suites. Because a module cannot do multiple late_initcall()s, we provide a kunit_test_suites() macro to declare multiple suites within the same module at once. o some test module names would have been too general ("test-test" and "example-test" for kunit tests, "inode-test" for ext4 tests); rename these as appropriate ("kunit-test", "kunit-example-test" and "ext4-inode-test" respectively). Also define kunit_test_suite() via kunit_test_suites() as callers in other trees may need the old definition. Co-developed-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4 bits Acked-by: David Gow <davidgow@google.com> # For list-test Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The ungrafting from PRIO bug fixes in net, when merged into net-next, merge cleanly but create a build failure. The resolution used here is from Petr Machata. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Missing netns pointer init in arp_tables, from Florian Westphal. 2) Fix normal tcp SACK being treated as D-SACK, from Pengcheng Yang. 3) Fix divide by zero in sch_cake, from Wen Yang. 4) Len passed to skb_put_padto() is wrong in qrtr code, from Carl Huang. 5) cmd->obj.chunk is leaked in sctp code error paths, from Xin Long. 6) cgroup bpf programs can be released out of order, fix from Roman Gushchin. 7) Make sure stmmac debugfs entry name is changed when device name changes, from Jiping Ma. 8) Fix memory leak in vlan_dev_set_egress_priority(), from Eric Dumazet. 9) SKB leak in lan78xx usb driver, also from Eric Dumazet. 10) Ridiculous TCA_FQ_QUANTUM values configured can cause loops in fq packet scheduler, reject them. From Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) tipc: fix wrong connect() return code tipc: fix link overflow issue at socket shutdown netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present netfilter: conntrack: dccp, sctp: handle null timeout argument atm: eni: fix uninitialized variable warning macvlan: do not assume mac_header is set in macvlan_broadcast() net: sch_prio: When ungrafting, replace with FIFO mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFO MAINTAINERS: Remove myself as co-maintainer for qcom-ethqos gtp: fix bad unlock balance in gtp_encap_enable_socket pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM tipc: remove meaningless assignment in Makefile tipc: do not add socket.o to tipc-y twice net: stmmac: dwmac-sun8i: Allow all RGMII modes net: stmmac: dwmac-sunxi: Allow all RGMII modes net: usb: lan78xx: fix possible skb leak net: stmmac: Fixed link does not need MDIO Bus vlan: vlan_changelink() should propagate errors vlan: fix memory leak in vlan_dev_set_egress_priority stmmac: debugfs entry name is not be changed when udev rename device name. ...
2020-01-09time/sched_clock: Disable interrupts in sched_clock_register()Paul Cercueil
Instead of issueing a warning if sched_clock_register() is called from a context where IRQs are enabled, the code now ensures that IRQs are indeed disabled. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200107010630.954648-1-paul@crapouillou.net
2020-01-09time/posix-stubs: Provide compat itimer supoprt for alphaArnd Bergmann
Using compat_sys_getitimer and compat_sys_setitimer on alpha causes a link failure in the Alpha tinyconfig and other configurations that turn off CONFIG_POSIX_TIMERS. Use the same #ifdef check for the stub version as well. Fixes: 4c22ea2b9120 ("y2038: use compat_{get,set}_itimer on alpha") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20191207191043.656328-1-arnd@arndb.de
2020-01-09genirq: Add missing __must_hold() sparse annotationJules Irenge
Add __must_hold() annotation to address the following sparse warning: warning: context imbalance in irq_wait_for_poll - unexpected unlock Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191216144208.29852-2-jbi.octave@gmail.com