summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2019-10-21tracing: Fix race in perf_trace_buf initializationPrateek Sood
A race condition exists while initialiazing perf_trace_buf from perf_trace_init() and perf_kprobe_init(). CPU0 CPU1 perf_trace_init() mutex_lock(&event_mutex) perf_trace_event_init() perf_trace_event_reg() total_ref_count == 0 buf = alloc_percpu() perf_trace_buf[i] = buf tp_event->class->reg() //fails perf_kprobe_init() goto fail perf_trace_event_init() perf_trace_event_reg() fail: total_ref_count == 0 total_ref_count == 0 buf = alloc_percpu() perf_trace_buf[i] = buf tp_event->class->reg() total_ref_count++ free_percpu(perf_trace_buf[i]) perf_trace_buf[i] = NULL Any subsequent call to perf_trace_event_reg() will observe total_ref_count > 0, causing the perf_trace_buf to be always NULL. This can result in perf_trace_buf getting accessed from perf_trace_buf_alloc() without being initialized. Acquiring event_mutex in perf_kprobe_init() before calling perf_trace_event_init() should fix this race. The race caused the following bug: Unable to handle kernel paging request at virtual address 0000003106f2003c Mem abort info: ESR = 0x96000045 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000045 CM = 0, WnR = 1 user pgtable: 4k pages, 39-bit VAs, pgdp = ffffffc034b9b000 [0000003106f2003c] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000045 [#1] PREEMPT SMP Process syz-executor (pid: 18393, stack limit = 0xffffffc093190000) pstate: 80400005 (Nzcv daif +PAN -UAO) pc : __memset+0x20/0x1ac lr : memset+0x3c/0x50 sp : ffffffc09319fc50 __memset+0x20/0x1ac perf_trace_buf_alloc+0x140/0x1a0 perf_trace_sys_enter+0x158/0x310 syscall_trace_enter+0x348/0x7c0 el0_svc_common+0x11c/0x368 el0_svc_handler+0x12c/0x198 el0_svc+0x8/0xc Ramdumps showed the following: total_ref_count = 3 perf_trace_buf = ( 0x0 -> NULL, 0x0 -> NULL, 0x0 -> NULL, 0x0 -> NULL) Link: http://lkml.kernel.org/r/1571120245-4186-1-git-send-email-prsood@codeaurora.org Cc: stable@vger.kernel.org Fixes: e12f03d7031a9 ("perf/core: Implement the 'perf_kprobe' PMU") Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Prateek Sood <prsood@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-18tracing: Fix "gfp_t" format for synthetic eventsZhengjun Xing
In the format of synthetic events, the "gfp_t" is shown as "signed:1", but in fact the "gfp_t" is "unsigned", should be shown as "signed:0". The issue can be reproduced by the following commands: echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' > /sys/kernel/debug/tracing/synthetic_events cat /sys/kernel/debug/tracing/events/synthetic/memlatency/format name: memlatency ID: 2233 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:u64 lat; offset:8; size:8; signed:0; field:unsigned int order; offset:16; size:4; signed:0; field:gfp_t gfp_flags; offset:24; size:4; signed:1; field:int migratetype; offset:32; size:4; signed:1; print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC->lat, REC->order, REC->gfp_flags, REC->migratetype Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Initialize iter->seq after zeroing in tracing_read_pipe()Petr Mladek
A customer reported the following softlockup: [899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464] [899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4 [899688.160002] RIP: 0010:up_write+0x1a/0x30 [899688.160002] Kernel panic - not syncing: softlockup: hung tasks [899688.160002] RIP: 0010:up_write+0x1a/0x30 [899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12 [899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000 [899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00 [899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000 [899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8 [899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000 [899688.160002] tracing_read_pipe+0x336/0x3c0 [899688.160002] __vfs_read+0x26/0x140 [899688.160002] vfs_read+0x87/0x130 [899688.160002] SyS_read+0x42/0x90 [899688.160002] do_syscall_64+0x74/0x160 It caught the process in the middle of trace_access_unlock(). There is no loop. So, it must be looping in the caller tracing_read_pipe() via the "waitagain" label. Crashdump analyze uncovered that iter->seq was completely zeroed at this point, including iter->seq.seq.size. It means that print_trace_line() was never able to print anything and there was no forward progress. The culprit seems to be in the code: /* reset all but tr, trace, and overruns */ memset(&iter->seq, 0, sizeof(struct trace_iterator) - offsetof(struct trace_iterator, seq)); It was added by the commit 53d0aa773053ab182877 ("ftrace: add logic to record overruns"). It was v2.6.27-rc1. It was the time when iter->seq looked like: struct trace_seq { unsigned char buffer[PAGE_SIZE]; unsigned int len; }; There was no "size" variable and zeroing was perfectly fine. The solution is to reinitialize the structure after or without zeroing. Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing/hwlat: Don't ignore outer-loop duration when calculating max_latencySrivatsa S. Bhat (VMware)
max_latency is intended to record the maximum ever observed hardware latency, which may occur in either part of the loop (inner/outer). So we need to also consider the outer-loop sample when updating max_latency. Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing/hwlat: Report total time spent in all NMIs during the sampleSrivatsa S. Bhat (VMware)
nmi_total_ts is supposed to record the total time spent in *all* NMIs that occur on the given CPU during the (active portion of the) sampling window. However, the code seems to be overwriting this variable for each NMI, thereby only recording the time spent in the most recent NMI. Fix it by accumulating the duration instead. Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Add locked_down checks to the open calls of files created for tracefsSteven Rostedt (VMware)
Added various checks on open tracefs calls to see if tracefs is in lockdown mode, and if so, to return -EPERM. Note, the event format files (which are basically standard on all machines) as well as the enabled_functions file (which shows what is currently being traced) are not lockde down. Perhaps they should be, but it seems counter intuitive to lockdown information to help you know if the system has been modified. Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Add tracing_check_open_get_tr()Steven Rostedt (VMware)
Currently, most files in the tracefs directory test if tracing_disabled is set. If so, it should return -ENODEV. The tracing_disabled is called when tracing is found to be broken. Originally it was done in case the ring buffer was found to be corrupted, and we wanted to prevent reading it from crashing the kernel. But it's also called if a tracing selftest fails on boot. It's a one way switch. That is, once it is triggered, tracing is disabled until reboot. As most tracefs files can also be used by instances in the tracefs directory, they need to be carefully done. Each instance has a trace_array associated to it, and when the instance is removed, the trace_array is freed. But if an instance is opened with a reference to the trace_array, then it requires looking up the trace_array to get its ref counter (as there could be a race with it being deleted and the open itself). Once it is found, a reference is added to prevent the instance from being removed (and the trace_array associated with it freed). Combine the two checks (tracing_disabled and trace_array_get()) into a single helper function. This will also make it easier to add lockdown to tracefs later. Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Have trace events system open call tracing_open_generic_tr()Steven Rostedt (VMware)
Instead of having the trace events system open call open code the taking of the trace_array descriptor (with trace_array_get()) and then calling trace_open_generic(), have it use the tracing_open_generic_tr() that does the combination of the two. This requires making tracing_open_generic_tr() global. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Get trace_array reference for available_tracers filesSteven Rostedt (VMware)
As instances may have different tracers available, we need to look at the trace_array descriptor that shows the list of the available tracers for the instance. But there's a race between opening the file and an admin deleting the instance. The trace_array_get() needs to be called before accessing the trace_array. Cc: stable@vger.kernel.org Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12ftrace: Get a reference counter for the trace_array on filter filesSteven Rostedt (VMware)
The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for an instance now. They need to take a reference to the instance otherwise there could be a race between accessing the files and deleting the instance. It wasn't until the :mod: caching where these file operations started referencing the trace_array directly. Cc: stable@vger.kernel.org Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-30Merge tag 'trace-v5.4-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few more tracing fixes: - Fix a buffer overflow by checking nr_args correctly in probes - Fix a warning that is reported by clang - Fix a possible memory leak in error path of filter processing - Fix the selftest that checks for failures, but wasn't failing - Minor clean up on call site output of a memory trace event" * tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests/ftrace: Fix same probe error test mm, tracing: Print symbol name for call_site in trace events tracing: Have error path in predicate_parse() free its allocated memory tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro tracing/probe: Fix to check the difference of nr_args before adding probe
2019-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Sanity check URB networking device parameters to avoid divide by zero, from Oliver Neukum. 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6 don't work properly. Longer term this needs a better fix tho. From Vijay Khemka. 3) Small fixes to selftests (use ping when ping6 is not present, etc.) from David Ahern. 4) Bring back rt_uses_gateway member of struct rtable, it's semantics were not well understood and trying to remove it broke things. From David Ahern. 5) Move usbnet snaity checking, ignore endpoints with invalid wMaxPacketSize. From Bjørn Mork. 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan. 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel, Alex Vesker, and Yevgeny Kliteynik 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron. 9) Fix crash when removing sch_cbs entry while offloading is enabled, from Vinicius Costa Gomes. 10) Signedness bug fixes, generally in looking at the result given by of_get_phy_mode() and friends. From Dan Crapenter. 11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet. 12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern. 13) Fix quantization code in tcp_bbr, from Kevin Yang. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits) net: tap: clean up an indentation issue nfp: abm: fix memory leak in nfp_abm_u32_knode_replace tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state sk_buff: drop all skb extensions on free and skb scrubbing tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions Documentation: Clarify trap's description mlxsw: spectrum: Clear VLAN filters during port initialization net: ena: clean up indentation issue NFC: st95hf: clean up indentation issue net: phy: micrel: add Asym Pause workaround for KSZ9021 net: socionext: ave: Avoid using netdev_err() before calling register_netdev() ptp: correctly disable flags on old ioctls lib: dimlib: fix help text typos net: dsa: microchip: Always set regmap stride to 1 nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled net: sched: sch_sfb: don't call qdisc_put() while holding tree lock ...
2019-09-28tracing: Have error path in predicate_parse() free its allocated memoryNavid Emamdoost
In predicate_parse, there is an error path that is not going to out_free instead it returns directly which leads to a memory leak. Link: http://lkml.kernel.org/r/20190920225800.3870-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macroNathan Chancellor
After r372664 in clang, the IF_ASSIGN macro causes a couple hundred warnings along the lines of: kernel/trace/trace_output.c:1331:2: warning: converting the enum constant to a boolean [-Wint-in-bool-context] kernel/trace/trace.h:409:3: note: expanded from macro 'trace_assign_type' IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry, ^ kernel/trace/trace.h:371:14: note: expanded from macro 'IF_ASSIGN' WARN_ON(id && (entry)->type != id); \ ^ 264 warnings generated. This warning can catch issues with constructs like: if (state == A || B) where the developer really meant: if (state == A || state == B) This is currently the only occurrence of the warning in the kernel tree across defconfig, allyesconfig, allmodconfig for arm32, arm64, and x86_64. Add the implicit '!= 0' to the WARN_ON statement to fix the warnings and find potential issues in the future. Link: https://github.com/llvm/llvm-project/commit/28b38c277a2941e9e891b2db30652cfd962f070b Link: https://github.com/ClangBuiltLinux/linux/issues/686 Link: http://lkml.kernel.org/r/20190926162258.466321-1-natechancellor@gmail.com Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28tracing/probe: Fix to check the difference of nr_args before adding probeMasami Hiramatsu
Steven reported that a test triggered: ================================================================== BUG: KASAN: slab-out-of-bounds in trace_kprobe_create+0xa9e/0xe40 Read of size 8 at addr ffff8880c4f25a48 by task ftracetest/4798 CPU: 2 PID: 4798 Comm: ftracetest Not tainted 5.3.0-rc6-test+ #30 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: dump_stack+0x7c/0xc0 ? trace_kprobe_create+0xa9e/0xe40 print_address_description+0x6c/0x332 ? trace_kprobe_create+0xa9e/0xe40 ? trace_kprobe_create+0xa9e/0xe40 __kasan_report.cold.6+0x1a/0x3b ? trace_kprobe_create+0xa9e/0xe40 kasan_report+0xe/0x12 trace_kprobe_create+0xa9e/0xe40 ? print_kprobe_event+0x280/0x280 ? match_held_lock+0x1b/0x240 ? find_held_lock+0xac/0xd0 ? fs_reclaim_release.part.112+0x5/0x20 ? lock_downgrade+0x350/0x350 ? kasan_unpoison_shadow+0x30/0x40 ? __kasan_kmalloc.constprop.6+0xc1/0xd0 ? trace_kprobe_create+0xe40/0xe40 ? trace_kprobe_create+0xe40/0xe40 create_or_delete_trace_kprobe+0x2e/0x60 trace_run_command+0xc3/0xe0 ? trace_panic_handler+0x20/0x20 ? kasan_unpoison_shadow+0x30/0x40 trace_parse_run_command+0xdc/0x163 vfs_write+0xe1/0x240 ksys_write+0xba/0x150 ? __ia32_sys_read+0x50/0x50 ? tracer_hardirqs_on+0x61/0x180 ? trace_hardirqs_off_caller+0x43/0x110 ? mark_held_locks+0x29/0xa0 ? do_syscall_64+0x14/0x260 do_syscall_64+0x68/0x260 Fix to check the difference of nr_args before adding probe on existing probes. This also may set the error log index bigger than the number of command parameters. In that case it sets the error position is next to the last parameter. Link: http://lkml.kernel.org/r/156966474783.3478.13217501608215769150.stgit@devnote2 Fixes: ca89bc071d5e ("tracing/kprobe: Add multi-probe per event support") Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28Merge branch 'next-lockdown' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull kernel lockdown mode from James Morris: "This is the latest iteration of the kernel lockdown patchset, from Matthew Garrett, David Howells and others. From the original description: This patchset introduces an optional kernel lockdown feature, intended to strengthen the boundary between UID 0 and the kernel. When enabled, various pieces of kernel functionality are restricted. Applications that rely on low-level access to either hardware or the kernel may cease working as a result - therefore this should not be enabled without appropriate evaluation beforehand. The majority of mainstream distributions have been carrying variants of this patchset for many years now, so there's value in providing a doesn't meet every distribution requirement, but gets us much closer to not requiring external patches. There are two major changes since this was last proposed for mainline: - Separating lockdown from EFI secure boot. Background discussion is covered here: https://lwn.net/Articles/751061/ - Implementation as an LSM, with a default stackable lockdown LSM module. This allows the lockdown feature to be policy-driven, rather than encoding an implicit policy within the mechanism. The new locked_down LSM hook is provided to allow LSMs to make a policy decision around whether kernel functionality that would allow tampering with or examining the runtime state of the kernel should be permitted. The included lockdown LSM provides an implementation with a simple policy intended for general purpose use. This policy provides a coarse level of granularity, controllable via the kernel command line: lockdown={integrity|confidentiality} Enable the kernel lockdown feature. If set to integrity, kernel features that allow userland to modify the running kernel are disabled. If set to confidentiality, kernel features that allow userland to extract confidential information from the kernel are also disabled. This may also be controlled via /sys/kernel/security/lockdown and overriden by kernel configuration. New or existing LSMs may implement finer-grained controls of the lockdown features. Refer to the lockdown_reason documentation in include/linux/security.h for details. The lockdown feature has had signficant design feedback and review across many subsystems. This code has been in linux-next for some weeks, with a few fixes applied along the way. Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode") is missing a Signed-off-by from its author. Matthew responded that he is providing this under category (c) of the DCO" * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits) kexec: Fix file verification on S390 security: constify some arrays in lockdown LSM lockdown: Print current->comm in restriction messages efi: Restrict efivar_ssdt_load when the kernel is locked down tracefs: Restrict tracefs when the kernel is locked down debugfs: Restrict debugfs when the kernel is locked down kexec: Allow kexec_file() with appropriate IMA policy when locked down lockdown: Lock down perf when in confidentiality mode bpf: Restrict bpf when kernel lockdown is in confidentiality mode lockdown: Lock down tracing and perf kprobes when in confidentiality mode lockdown: Lock down /proc/kcore x86/mmiotrace: Lock down the testmmiotrace module lockdown: Lock down module params that specify hardware parameters (eg. ioport) lockdown: Lock down TIOCSSERIAL lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down acpi: Disable ACPI table override if the kernel is locked down acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down ACPI: Limit access to custom_method when the kernel is locked down x86/msr: Restrict MSR access when the kernel is locked down x86: Lock down IO port access when the kernel is locked down ...
2019-09-27bpf: Fix bpf_event_output re-entry issueAllan Zhang
BPF_PROG_TYPE_SOCK_OPS program can reenter bpf_event_output because it can be called from atomic and non-atomic contexts since we don't have bpf_prog_active to prevent it happen. This patch enables 3 levels of nesting to support normal, irq and nmi context. We can easily reproduce the issue by running netperf crr mode with 100 flows and 10 threads from netperf client side. Here is the whole stack dump: [ 515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220 [ 515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G W 4.15.0-smp-fixpanic #44 [ 515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018 [ 515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220 [ 515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246 [ 515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000 [ 515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80 [ 515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012 [ 515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002 [ 515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0 [ 515.228910] FS: 00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000 [ 515.228910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0 [ 515.228911] Call Trace: [ 515.228913] <IRQ> [ 515.228919] [<ffffffff82c6c6cb>] bpf_sockopt_event_output+0x3b/0x50 [ 515.228923] [<ffffffff8265daee>] ? bpf_ktime_get_ns+0xe/0x10 [ 515.228927] [<ffffffff8266fda5>] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100 [ 515.228930] [<ffffffff82cf90a5>] ? tcp_init_transfer+0x125/0x150 [ 515.228933] [<ffffffff82cf9159>] ? tcp_finish_connect+0x89/0x110 [ 515.228936] [<ffffffff82cf98e4>] ? tcp_rcv_state_process+0x704/0x1010 [ 515.228939] [<ffffffff82c6e263>] ? sk_filter_trim_cap+0x53/0x2a0 [ 515.228942] [<ffffffff82d90d1f>] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0 [ 515.228945] [<ffffffff82d92160>] ? tcp_v6_do_rcv+0x1c0/0x460 [ 515.228947] [<ffffffff82d93558>] ? tcp_v6_rcv+0x9f8/0xb30 [ 515.228951] [<ffffffff82d737c0>] ? ip6_route_input+0x190/0x220 [ 515.228955] [<ffffffff82d5f7ad>] ? ip6_protocol_deliver_rcu+0x6d/0x450 [ 515.228958] [<ffffffff82d60246>] ? ip6_rcv_finish+0xb6/0x170 [ 515.228961] [<ffffffff82d5fb90>] ? ip6_protocol_deliver_rcu+0x450/0x450 [ 515.228963] [<ffffffff82d60361>] ? ipv6_rcv+0x61/0xe0 [ 515.228966] [<ffffffff82d60190>] ? ipv6_list_rcv+0x330/0x330 [ 515.228969] [<ffffffff82c4976b>] ? __netif_receive_skb_one_core+0x5b/0xa0 [ 515.228972] [<ffffffff82c497d1>] ? __netif_receive_skb+0x21/0x70 [ 515.228975] [<ffffffff82c4a8d2>] ? process_backlog+0xb2/0x150 [ 515.228978] [<ffffffff82c4aadf>] ? net_rx_action+0x16f/0x410 [ 515.228982] [<ffffffff830000dd>] ? __do_softirq+0xdd/0x305 [ 515.228986] [<ffffffff8252cfdc>] ? irq_exit+0x9c/0xb0 [ 515.228989] [<ffffffff82e02de5>] ? smp_call_function_single_interrupt+0x65/0x120 [ 515.228991] [<ffffffff82e020e1>] ? call_function_single_interrupt+0x81/0x90 [ 515.228992] </IRQ> [ 515.228996] [<ffffffff82a11ff0>] ? io_serial_in+0x20/0x20 [ 515.229000] [<ffffffff8259c040>] ? console_unlock+0x230/0x490 [ 515.229003] [<ffffffff8259cbaa>] ? vprintk_emit+0x26a/0x2a0 [ 515.229006] [<ffffffff8259cbff>] ? vprintk_default+0x1f/0x30 [ 515.229008] [<ffffffff8259d9f5>] ? vprintk_func+0x35/0x70 [ 515.229011] [<ffffffff8259d4bb>] ? printk+0x50/0x66 [ 515.229013] [<ffffffff82637637>] ? bpf_event_output+0xb7/0x220 [ 515.229016] [<ffffffff82c6c6cb>] ? bpf_sockopt_event_output+0x3b/0x50 [ 515.229019] [<ffffffff8265daee>] ? bpf_ktime_get_ns+0xe/0x10 [ 515.229023] [<ffffffff82c29e87>] ? release_sock+0x97/0xb0 [ 515.229026] [<ffffffff82ce9d6a>] ? tcp_recvmsg+0x31a/0xda0 [ 515.229029] [<ffffffff8266fda5>] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100 [ 515.229032] [<ffffffff82ce77c1>] ? tcp_set_state+0x191/0x1b0 [ 515.229035] [<ffffffff82ced10e>] ? tcp_disconnect+0x2e/0x600 [ 515.229038] [<ffffffff82cecbbb>] ? tcp_close+0x3eb/0x460 [ 515.229040] [<ffffffff82d21082>] ? inet_release+0x42/0x70 [ 515.229043] [<ffffffff82d58809>] ? inet6_release+0x39/0x50 [ 515.229046] [<ffffffff82c1f32d>] ? __sock_release+0x4d/0xd0 [ 515.229049] [<ffffffff82c1f3e5>] ? sock_close+0x15/0x20 [ 515.229052] [<ffffffff8273b517>] ? __fput+0xe7/0x1f0 [ 515.229055] [<ffffffff8273b66e>] ? ____fput+0xe/0x10 [ 515.229058] [<ffffffff82547bf2>] ? task_work_run+0x82/0xb0 [ 515.229061] [<ffffffff824086df>] ? exit_to_usermode_loop+0x7e/0x11f [ 515.229064] [<ffffffff82408171>] ? do_syscall_64+0x111/0x130 [ 515.229067] [<ffffffff82e0007c>] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops") Signed-off-by: Allan Zhang <allanzhang@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20190925234312.94063-2-allanzhang@google.com
2019-09-26Merge tag 'trace-v5.4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Srikar Dronamraju fixed a bug in the newmulti probe code" * tag 'trace-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/probe: Fix same probe event argument matching
2019-09-25tracing/probe: Fix same probe event argument matchingSrikar Dronamraju
Commit fe60b0ce8e73 ("tracing/probe: Reject exactly same probe event") tries to reject a event which matches an already existing probe. However it currently continues to match arguments and rejects adding a probe even when the arguments don't match. Fix this by only rejecting a probe if and only if all the arguments match. Link: http://lkml.kernel.org/r/20190924114906.14038-1-srikar@linux.vnet.ibm.com Fixes: fe60b0ce8e73 ("tracing/probe: Reject exactly same probe event") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-20Merge tag 'powerpc-5.4-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "This is a bit late, partly due to me travelling, and partly due to a power outage knocking out some of my test systems *while* I was travelling. - Initial support for running on a system with an Ultravisor, which is software that runs below the hypervisor and protects guests against some attacks by the hypervisor. - Support for building the kernel to run as a "Secure Virtual Machine", ie. as a guest capable of running on a system with an Ultravisor. - Some changes to our DMA code on bare metal, to allow devices with medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of DMA space. - Support for firmware assisted crash dumps on bare metal (powernv). - Two series fixing bugs in and refactoring our PCI EEH code. - A large series refactoring our exception entry code to use gas macros, both to make it more readable and also enable some future optimisations. As well as many cleanups and other minor features & fixups. Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand, Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar, Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm, Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu, Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom Lendacky, Vasant Hegde" * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits) powerpc/mm/mce: Keep irqs disabled during lockless page table walk powerpc: Use ftrace_graph_ret_addr() when unwinding powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR ftrace: Look up the address of return_to_handler() using helpers powerpc: dump kernel log before carrying out fadump or kdump docs: powerpc: Add missing documentation reference powerpc/xmon: Fix output of XIVE IPI powerpc/xmon: Improve output of XIVE interrupts powerpc/mm/radix: remove useless kernel messages powerpc/fadump: support holes in kernel boot memory area powerpc/fadump: remove RMA_START and RMA_END macros powerpc/fadump: update documentation about option to release opalcore powerpc/fadump: consider f/w load area powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel powerpc/fadump: improve how crashed kernel's memory is reserved powerpc/fadump: consider reserved ranges while releasing memory powerpc/fadump: make crash memory ranges array allocation generic ...
2019-09-20Merge tag 'trace-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Addition of multiprobes to kprobe and uprobe events (allows for more than one probe attached to the same location) - Addition of adding immediates to probe parameters - Clean up of the recordmcount.c code. This brings us closer to merging recordmcount into objtool, and reuse code. - Other small clean ups * tag 'trace-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) selftests/ftrace: Update kprobe event error testcase tracing/probe: Reject exactly same probe event tracing/probe: Fix to allow user to enable events on unloaded modules selftests/ftrace: Select an existing function in kprobe_eventname test tracing/kprobe: Fix NULL pointer access in trace_porbe_unlink() tracing: Make sure variable reference alias has correct var_ref_idx tracing: Be more clever when dumping hex in __print_hex() ftrace: Simplify ftrace hash lookup code in clear_func_from_hash() tracing: Add "gfp_t" support in synthetic_events tracing: Rename tracing_reset() to tracing_reset_cpu() tracing: Document the stack trace algorithm in the comments tracing/arm64: Have max stack tracer handle the case of return address after data recordmcount: Clarify what cleanup() does recordmcount: Remove redundant cleanup() calls recordmcount: Kernel style formatting recordmcount: Kernel style function signature formatting recordmcount: Rewrite error/success handling selftests/ftrace: Add syntax error test for multiprobe selftests/ftrace: Add syntax error test for immediates selftests/ftrace: Add a testcase for kprobe multiprobe event ...
2019-09-19tracing/probe: Reject exactly same probe eventMasami Hiramatsu
Reject exactly same probe events as existing probes. Multiprobe allows user to define multiple probes on same event. If user appends a probe which exactly same definition (same probe address and same arguments) on existing event, the event will record same probe information twice. That can be confusing users, so reject it. Link: http://lkml.kernel.org/r/156879694602.31056.5533024778165036763.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-19tracing/probe: Fix to allow user to enable events on unloaded modulesMasami Hiramatsu
Fix to allow user to enable probe events on unloaded modules. This operations was allowed before commit 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe"), because if users need to probe module init functions, they have to enable those probe events before loading module. Link: http://lkml.kernel.org/r/156879693733.31056.9331322616994665167.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski. 2) Use bio_vec in the networking instead of custom skb_frag_t, from Matthew Wilcox. 3) Make use of xmit_more in r8169 driver, from Heiner Kallweit. 4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen. 5) Support all variants of 5750X bnxt_en chips, from Michael Chan. 6) More RTNL avoidance work in the core and mlx5 driver, from Vlad Buslov. 7) Add TCP syn cookies bpf helper, from Petar Penkov. 8) Add 'nettest' to selftests and use it, from David Ahern. 9) Add extack support to drop_monitor, add packet alert mode and support for HW drops, from Ido Schimmel. 10) Add VLAN offload to stmmac, from Jose Abreu. 11) Lots of devm_platform_ioremap_resource() conversions, from YueHaibing. 12) Add IONIC driver, from Shannon Nelson. 13) Several kTLS cleanups, from Jakub Kicinski. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits) mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer mlxsw: spectrum: Register CPU port with devlink mlxsw: spectrum_buffers: Prevent changing CPU port's configuration net: ena: fix incorrect update of intr_delay_resolution net: ena: fix retrieval of nonadaptive interrupt moderation intervals net: ena: fix update of interrupt moderation register net: ena: remove all old adaptive rx interrupt moderation code from ena_com net: ena: remove ena_restore_ethtool_params() and relevant fields net: ena: remove old adaptive interrupt moderation code from ena_netdev net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*() net: ena: enable the interrupt_moderation in driver_supported_features net: ena: reimplement set/get_coalesce() net: ena: switch to dim algorithm for rx adaptive interrupt moderation net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable ethtool: implement Energy Detect Powerdown support via phy-tunable xen-netfront: do not assume sk_buff_head list is empty in error handling s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb” net: ena: don't wake up tx queue when down drop_monitor: Better sanitize notified packets ...
2019-09-18ftrace: Look up the address of return_to_handler() using helpersNaveen N. Rao
This ensures that we use the right address on architectures that use function descriptors. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8f6f14d192a994008ac370ce14036bbe67224c7d.1567707399.git.naveen.n.rao@linux.vnet.ibm.com
2019-09-17tracing/kprobe: Fix NULL pointer access in trace_porbe_unlink()Masami Hiramatsu
Fix NULL pointer access in trace_probe_unlink() by initializing trace_probe.list correctly in trace_probe_init(). In the error case of trace_probe_init(), it can call trace_probe_unlink() before initializing trace_probe.list member. This causes NULL pointer dereference at list_del_init() in trace_probe_unlink(). Syzbot reported : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8633 Comm: syz-executor797 Not tainted 5.3.0-rc8-next-20190915 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__list_del_entry_valid+0x85/0xf5 lib/list_debug.c:51 Code: 0f 84 e1 00 00 00 48 b8 22 01 00 00 00 00 ad de 49 39 c4 0f 84 e2 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 75 53 49 8b 14 24 4c 39 f2 0f 85 99 00 00 00 49 8d 7d RSP: 0018:ffff888090a7f9d8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88809b6f90c0 RCX: ffffffff817c0ca9 RDX: 0000000000000000 RSI: ffffffff817c0a73 RDI: ffff88809b6f90c8 RBP: ffff888090a7f9f0 R08: ffff88809a04e600 R09: ffffed1015d26aed R10: ffffed1015d26aec R11: ffff8880ae935763 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88809b6f90c0 R15: ffff88809b6f90d0 FS: 0000555556f99880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006cc090 CR3: 00000000962b2000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __list_del_entry include/linux/list.h:131 [inline] list_del_init include/linux/list.h:190 [inline] trace_probe_unlink+0x1f/0x200 kernel/trace/trace_probe.c:959 trace_probe_cleanup+0xd3/0x110 kernel/trace/trace_probe.c:973 trace_probe_init+0x3f2/0x510 kernel/trace/trace_probe.c:1011 alloc_trace_uprobe+0x5e/0x250 kernel/trace/trace_uprobe.c:353 create_local_trace_uprobe+0x109/0x4a0 kernel/trace/trace_uprobe.c:1508 perf_uprobe_init+0x131/0x210 kernel/trace/trace_event_perf.c:314 perf_uprobe_event_init+0x106/0x1a0 kernel/events/core.c:8898 perf_try_init_event+0x135/0x590 kernel/events/core.c:10184 perf_init_event kernel/events/core.c:10228 [inline] perf_event_alloc.part.0+0x1b89/0x33d0 kernel/events/core.c:10505 perf_event_alloc kernel/events/core.c:10887 [inline] __do_sys_perf_event_open+0xa2d/0x2d00 kernel/events/core.c:10989 __se_sys_perf_event_open kernel/events/core.c:10871 [inline] __x64_sys_perf_event_open+0xbe/0x150 kernel/events/core.c:10871 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Link: http://lkml.kernel.org/r/156869709721.22406.5153754822203046939.stgit@devnote2 Reported-by: syzbot+2f807f4d3a2a4e87f18f@syzkaller.appspotmail.com Fixes: ca89bc071d5e ("tracing/kprobe: Add multi-probe per event support") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-17tracing: Make sure variable reference alias has correct var_ref_idxTom Zanussi
Original changelog from Steve Rostedt (except last sentence which explains the problem, and the Fixes: tag): I performed a three way histogram with the following commands: echo 'irq_lat u64 lat pid_t pid' > synthetic_events echo 'wake_lat u64 lat u64 irqlat pid_t pid' >> synthetic_events echo 'hist:keys=common_pid:irqts=common_timestamp.usecs if function == 0xffffffff81200580' > events/timer/hrtimer_start/trigger echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$irqts:onmatch(timer.hrtimer_start).irq_lat($lat,pid) if common_flags & 1' > events/sched/sched_waking/trigger echo 'hist:keys=pid:wakets=common_timestamp.usecs,irqlat=lat' > events/synthetic/irq_lat/trigger echo 'hist:keys=next_pid:lat=common_timestamp.usecs-$wakets,irqlat=$irqlat:onmatch(synthetic.irq_lat).wake_lat($lat,$irqlat,next_pid)' > events/sched/sched_switch/trigger echo 1 > events/synthetic/wake_lat/enable Basically I wanted to see: hrtimer_start (calling function tick_sched_timer) Note: # grep tick_sched_timer /proc/kallsyms ffffffff81200580 t tick_sched_timer And save the time of that, and then record sched_waking if it is called in interrupt context and with the same pid as the hrtimer_start, it will record the latency between that and the waking event. I then look at when the task that is woken is scheduled in, and record the latency between the wakeup and the task running. At the end, the wake_lat synthetic event will show the wakeup to scheduled latency, as well as the irq latency in from hritmer_start to the wakeup. The problem is that I found this: <idle>-0 [007] d... 190.485261: wake_lat: lat=27 irqlat=190485230 pid=698 <idle>-0 [005] d... 190.485283: wake_lat: lat=40 irqlat=190485239 pid=10 <idle>-0 [002] d... 190.488327: wake_lat: lat=56 irqlat=190488266 pid=335 <idle>-0 [005] d... 190.489330: wake_lat: lat=64 irqlat=190489262 pid=10 <idle>-0 [003] d... 190.490312: wake_lat: lat=43 irqlat=190490265 pid=77 <idle>-0 [005] d... 190.493322: wake_lat: lat=54 irqlat=190493262 pid=10 <idle>-0 [005] d... 190.497305: wake_lat: lat=35 irqlat=190497267 pid=10 <idle>-0 [005] d... 190.501319: wake_lat: lat=50 irqlat=190501264 pid=10 The irqlat seemed quite large! Investigating this further, if I had enabled the irq_lat synthetic event, I noticed this: <idle>-0 [002] d.s. 249.429308: irq_lat: lat=164968 pid=335 <idle>-0 [002] d... 249.429369: wake_lat: lat=55 irqlat=249429308 pid=335 Notice that the timestamp of the irq_lat "249.429308" is awfully similar to the reported irqlat variable. In fact, all instances were like this. It appeared that: irqlat=$irqlat Wasn't assigning the old $irqlat to the new irqlat variable, but instead was assigning the $irqts to it. The issue is that assigning the old $irqlat to the new irqlat variable creates a variable reference alias, but the alias creation code forgets to make sure the alias uses the same var_ref_idx to access the reference. Link: http://lkml.kernel.org/r/1567375321.5282.12.camel@kernel.org Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: 7e8b88a30b085 ("tracing: Add hist trigger support for variable reference aliases") Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-17tracing: Be more clever when dumping hex in __print_hex()Andy Shevchenko
Hex dump as many as 16 bytes at once in trace_print_hex_seq() instead of byte-by-byte approach. Link: http://lkml.kernel.org/r/20190806151543.86061-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-17ftrace: Simplify ftrace hash lookup code in clear_func_from_hash()Changbin Du
Function ftrace_lookup_ip() will check empty hash table. So we don't need extra check outside. Link: http://lkml.kernel.org/r/20190910143336.13472-1-changbin.du@gmail.com Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-16Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...
2019-09-16Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "This cycle's RCU changes were: - A few more RCU flavor consolidation cleanups. - Updates to RCU's list-traversal macros improving lockdep usability. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - Miscellaneous fixes. - Torture-test updates. - minor LKMM updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits) MAINTAINERS: Update from paulmck@linux.ibm.com to paulmck@kernel.org rcu: Don't include <linux/ktime.h> in rcutiny.h rcu: Allow rcu_do_batch() to dynamically adjust batch sizes rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload rcu/nocb: Reduce __call_rcu_nocb_wake() leaf rcu_node ->lock contention rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended rcu/nocb: Add bypass callback queueing rcu/nocb: Atomic ->len field in rcu_segcblist structure rcu/nocb: Unconditionally advance and wake for excessive CBs rcu/nocb: Reduce ->nocb_lock contention with separate ->nocb_gp_lock rcu/nocb: Reduce contention at no-CBs invocation-done time rcu/nocb: Reduce contention at no-CBs registry-time CB advancement rcu/nocb: Round down for number of no-CBs grace-period kthreads rcu/nocb: Avoid ->nocb_lock capture by corresponding CPU rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread rcu/nocb: Make __call_rcu_nocb_wake() safe for many callbacks ...
2019-09-16Merge branch 'sched/rt' into sched/core, to pick up -rt changesIngo Molnar
Pick up the first couple of patches working towards PREEMPT_RT. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows to use an arbitrary buffer size and place whenever there is a free address in the umem. Helps more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing which avoids busy-spinning of the latter, useful when app and driver is located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05tracing: Add "gfp_t" support in synthetic_eventsZhengjun Xing
Add "gfp_t" support in synthetic_events, then the "gfp_t" type parameter in some functions can be traced. Prints the gfp flags as hex in addition to the human-readable flag string. Example output: whoopsie-630 [000] ...1 78.969452: testevent: bar=b20 (GFP_ATOMIC|__GFP_ZERO) rcuc/0-11 [000] ...1 81.097555: testevent: bar=a20 (GFP_ATOMIC) rcuc/0-11 [000] ...1 81.583123: testevent: bar=a20 (GFP_ATOMIC) Link: http://lkml.kernel.org/r/20190712015308.9908-1-zhengjun.xing@linux.intel.com Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com> [ Added printing of flag names ] Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing: Rename tracing_reset() to tracing_reset_cpu()Steven Rostedt (VMware)
The name tracing_reset() was a misnomer, as it really only reset a single CPU buffer. Rename it to tracing_reset_cpu() and also make it static and remove the prototype from trace.h, as it is only used in a single function. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing: Document the stack trace algorithm in the commentsSteven Rostedt (VMware)
As the max stack tracer algorithm is not that easy to understand from the code, add comments that explain the algorithm and mentions how ARCH_FTRACE_SHIFT_STACK_TRACER affects it. Link: http://lkml.kernel.org/r/20190806123455.487ac02b@gandalf.local.home Suggested-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/arm64: Have max stack tracer handle the case of return address after ↵Steven Rostedt (VMware)
data Most archs (well at least x86) store the function call return address on the stack before storing the local variables for the function. The max stack tracer depends on this in its algorithm to display the stack size of each function it finds in the back trace. Some archs (arm64), may store the return address (from its link register) just before calling a nested function. There's no reason to save the link register on leaf functions, as it wont be updated. This breaks the algorithm of the max stack tracer. Add a new define ARCH_FTRACE_SHIFT_STACK_TRACER that an architecture may set if it stores the return address (link register) after it stores the function's local variables, and have the stack trace shift the values of the mapped stack size to the appropriate functions. Link: 20190802094103.163576-1-jiping.ma2@windriver.com Reported-by: Jiping Ma <jiping.ma2@windriver.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/probe: Add immediate string parameter supportMasami Hiramatsu
Add immediate string parameter (\"string") support to probe events. This allows you to specify an immediate (or dummy) parameter instead of fetching a string from memory. This feature looks odd, but imagine that you put a probe on a code to trace some string data. If the code is compiled into 2 instructions and 1 instruction has a string on memory but other has no string since it is optimized out. In that case, you can not fold those into one event, even if ftrace supported multiple probes on one event. With this feature, you can set a dummy string like foo=\"(optimized)":string instead of something like foo=+0(+0(%bp)):string. Link: http://lkml.kernel.org/r/156095691687.28024.13372712423865047991.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/probe: Add immediate parameter supportMasami Hiramatsu
Add immediate value parameter (\1234) support to probe events. This allows you to specify an immediate (or dummy) parameter instead of fetching from memory or register. This feature looks odd, but imagine when you put a probe on a code to trace some data. If the code is compiled into 2 instructions and 1 instruction has a value but other has nothing since it is optimized out. In that case, you can not fold those into one event, even if ftrace supported multiple probes on one event. With this feature, you can set a dummy value like foo=\deadbeef instead of something like foo=%di. Link: http://lkml.kernel.org/r/156095690733.28024.13258186548822649469.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/uprobe: Add per-probe delete from eventMasami Hiramatsu
Add per-probe delete method from one event passing the head of definition. In other words, the events which match the head N parameters are deleted. Link: http://lkml.kernel.org/r/156095689811.28024.221706761151739433.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/kprobe: Add per-probe delete from eventMasami Hiramatsu
Allow user to delete a probe from event. This is done by head match. For example, if we have 2 probes on an event $ cat kprobe_events p:kprobes/testprobe _do_fork r1=%ax r2=%dx p:kprobes/testprobe idle_fork r1=%ax r2=%cx Then you can remove one of them by passing the head of definition which identify the probe. $ echo "-:kprobes/testprobe idle_fork" >> kprobe_events Link: http://lkml.kernel.org/r/156095688848.28024.15798690082378432435.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/uprobe: Add multi-probe per uprobe event supportMasami Hiramatsu
Allow user to define several probes on one uprobe event. Note that this only support appending method. So deleting event will delete all probes on the event. Link: http://lkml.kernel.org/r/156095687876.28024.13840331032234992863.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/kprobe: Add multi-probe per event supportMasami Hiramatsu
Add multi-probe per one event support to kprobe events. User can define several different probes on one trace event if those events have same "event signature", e.g. # echo p:testevent _do_fork > kprobe_events # echo p:testevent fork_idle >> kprobe_events # kprobe_events p:kprobes/testevent _do_fork p:kprobes/testevent fork_idle The event signature is defined by kprobe type (retprobe or not), the number of args, argument names, and argument types. Note that this only support appending method. Delete event operation will delete all probes on the event. Link: http://lkml.kernel.org/r/156095686913.28024.9357292202316540742.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/dynevent: Pass extra arguments to match operationMasami Hiramatsu
Pass extra arguments to match operation for checking exact match. If the event doesn't support exact match, it will be ignored. Link: http://lkml.kernel.org/r/156095685930.28024.10405547027475590975.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/dynevent: Delete all matched eventsMasami Hiramatsu
When user gives an event name to delete, delete all matched events instead of the first one. This means if there are several events which have same name but different group (subsystem) name, those are removed if user passed only the event name, e.g. # cat kprobe_events p:group1/testevent _do_fork p:group2/testevent fork_idle # echo -:testevent >> kprobe_events # cat kprobe_events # Link: http://lkml.kernel.org/r/156095684958.28024.16597826267117453638.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/probe: Split trace_event related data from trace_probeMasami Hiramatsu
Split the trace_event related data from trace_probe data structure and introduce trace_probe_event data structure for its folder. This trace_probe_event data structure can have multiple trace_probe. Link: http://lkml.kernel.org/r/156095683995.28024.7552150340561557873.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing: Correct kdoc formatsJakub Kicinski
Fix the following kdoc warnings: kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single' kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single' kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single' kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer' kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo' kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo' kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch' kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk' kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk' kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk' Link: http://lkml.kernel.org/r/20190828052549.2472-2-jakub.kicinski@netronome.com Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing/probe: Fix null pointer dereferenceXinpeng Liu
BUG: KASAN: null-ptr-deref in trace_probe_cleanup+0x8d/0xd0 Read of size 8 at addr 0000000000000000 by task syz-executor.0/9746 trace_probe_cleanup+0x8d/0xd0 free_trace_kprobe.part.14+0x15/0x50 alloc_trace_kprobe+0x23e/0x250 Link: http://lkml.kernel.org/r/1565220563-980-1-git-send-email-danielliu861@gmail.com Fixes: e3dc9f898ef9c ("tracing/probe: Add trace_event_call accesses APIs") Signed-off-by: Xinpeng Liu <danielliu861@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31tracing: Make exported ftrace_set_clr_event non-staticDenis Efremov
The function ftrace_set_clr_event is declared static and marked EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the function was decided to be a part of API, this commit removes the static attribute and adds the declaration to the header. Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances") Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-30ftrace: Check for successful allocation of hashNaveen N. Rao
In register_ftrace_function_probe(), we are not checking the return value of alloc_and_copy_ftrace_hash(). The subsequent call to ftrace_match_records() may end up dereferencing the same. Add a check to ensure this doesn't happen. Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.1562249521.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>