summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - remove of our own implementation of architecture-specific relocation code and leveraging existing code in the module loader to perform arch-dependent work, from Jessica Yu. The relevant patches have been acked by Rusty (for module.c) and Heiko (for s390). - live patching support for ppc64le, which is a joint work of Michael Ellerman and Torsten Duwe. This is coming from topic branch that is share between livepatching.git and ppc tree. - addition of livepatching documentation from Petr Mladek * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: make object/func-walking helpers more robust livepatch: Add some basic livepatch documentation powerpc/livepatch: Add live patching support on ppc64le powerpc/livepatch: Add livepatch stack to struct thread_info powerpc/livepatch: Add livepatch header livepatch: Allow architectures to specify an alternate ftrace location ftrace: Make ftrace_location_range() global livepatch: robustify klp_register_patch() API error checking Documentation: livepatch: outline Elf format and requirements for patch modules livepatch: reuse module loader code to write relocations module: s390: keep mod_arch_specific for livepatch modules module: preserve Elf information for livepatch modules Elf: add livepatch-specific Elf constants
2016-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support SPI based w5100 devices, from Akinobu Mita. 2) Partial Segmentation Offload, from Alexander Duyck. 3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE. 4) Allow cls_flower stats offload, from Amir Vadai. 5) Implement bpf blinding, from Daniel Borkmann. 6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is actually using FASYNC these atomics are superfluous. From Eric Dumazet. 7) Run TCP more preemptibly, also from Eric Dumazet. 8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e driver, from Gal Pressman. 9) Allow creating ppp devices via rtnetlink, from Guillaume Nault. 10) Improve BPF usage documentation, from Jesper Dangaard Brouer. 11) Support tunneling offloads in qed, from Manish Chopra. 12) aRFS offloading in mlx5e, from Maor Gottlieb. 13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo Leitner. 14) Add MSG_EOR support to TCP, this allows controlling packet coalescing on application record boundaries for more accurate socket timestamp sampling. From Martin KaFai Lau. 15) Fix alignment of 64-bit netlink attributes across the board, from Nicolas Dichtel. 16) Per-vlan stats in bridging, from Nikolay Aleksandrov. 17) Several conversions of drivers to ethtool ksettings, from Philippe Reynes. 18) Checksum neutral ILA in ipv6, from Tom Herbert. 19) Factorize all of the various marvell dsa drivers into one, from Vivien Didelot 20) Add VF support to qed driver, from Yuval Mintz" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits) Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m" Revert "phy dp83867: Make rgmii parameters optional" r8169: default to 64-bit DMA on recent PCIe chips phy dp83867: Make rgmii parameters optional phy dp83867: Fix compilation with CONFIG_OF_MDIO=m bpf: arm64: remove callee-save registers use for tmp registers asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions switchdev: pass pointer to fib_info instead of copy net_sched: close another race condition in tcf_mirred_release() tipc: fix nametable publication field in nl compat drivers: net: Don't print unpopulated net_device name qed: add support for dcbx. ravb: Add missing free_irq() calls to ravb_close() qed: Remove a stray tab net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fec-mpc52xx: use phydev from struct net_device bpf, doc: fix typo on bpf_asm descriptions stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fs-enet: use phydev from struct net_device ...
2016-05-17Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block layer updates from Jens Axboe: "This is the core block IO changes for this merge window. Nothing earth shattering in here, it's mostly just fixes. In detail: - Fix for a long standing issue where wrong ordering in blk-mq caused order_to_size() to spew a warning. From Bart. - Async discard support from Christoph. Basically just splitting our sync interface into a submit + wait part. - Add a cleaner interface for flagging whether a device has a write back cache or not. We've previously overloaded blk_queue_flush() with this, but let's make it more explicit. Drivers cleaned up and updated in the drivers pull request. From me. - Fix for a double check for whether IO accounting is enabled or not. From Michael Callahan. - Fix for the async discard from Mike Snitzer, reinstating the early EOPNOTSUPP return if the device doesn't support discards. - Also from Mike, export bio_inc_remaining() so dm can drop it's private copy of it. - From Ming Lin, add support for passing in an offset for request payloads. - Tag function export from Sagi, which will be used in NVMe in the drivers pull. - Two blktrace related fixes from Shaohua. - Propagate NOMERGE flag when making a request from a bio, also from Shaohua. - An optimization to not parse cgroup paths in blk-throttle, if we don't need to. From Shaohua" * 'for-4.7/core' of git://git.kernel.dk/linux-block: blk-mq: fix undefined behaviour in order_to_size() blk-throttle: don't parse cgroup path if trace isn't enabled blktrace: add missed mask name blktrace: delete garbage for message trace block: make bio_inc_remaining() interface accessible again block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard block: Minor blk_account_io_start usage cleanup block: add __blkdev_issue_discard block: remove struct bio_batch block: copy NOMERGE flag from bio to request block: add ability to flag write back caching on a device blk-mq: Export tagset iter function block: add offset in blk_add_request_payload() writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-17Merge tag 'trace-fixes-v4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing ring-buffer fixes from Steven Rostedt: "Hao Qin reported an integer overflow possibility with signed and unsigned numbers in the ring-buffer code. https://bugzilla.kernel.org/show_bug.cgi?id=118001 At first I did not think this was too much of an issue, because the overflow would be caught later when either too much data was allocated or it would trigger RB_WARN_ON() which shuts down the ring buffer. But looking closer into it, I found that the right settings could bypass the checks and crash the kernel. Luckily, this is only accessible by root. The first fix is to convert all the variables into long, such that we don't get into issues between 32 bit variables being assigned 64 bit ones. This fixes the RB_WARN_ON() triggering. The next fix is to get rid of a duplicate DIV_ROUND_UP() that when called twice with the right value, can cause a kernel crash. The first DIV_ROUND_UP() is to normalize the input and it is checked against the minimum allowable value. But then DIV_ROUND_UP() is called again, which can overflow due to the (a + b - 1)/b, logic. The first called upped the value, the second can overflow (with the +b part). The second call to DIV_ROUND_UP() came in via a second change a while ago and the code is cleaned up to remove it" * tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Prevent overflow of size in ring_buffer_resize() ring-buffer: Use long for nr_pages to avoid overflow failures
2016-05-16Merge tag 'pm-4.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The majority of changes go into the cpufreq subsystem this time. To me, quite obviously, the biggest ticket item is the new "schedutil" governor. Interestingly enough, it's the first new cpufreq governor since the beginning of the git era (except for some out-of-the-tree ones). There are two main differences between it and the existing governors. First, it uses the information provided by the scheduler directly for making its decisions, so it doesn't have to track anything by itself. Second, it can invoke drivers (supporting that feature) to adjust CPU performance right away without having to spawn work items to be executed in process context or similar. Currently, the acpi-cpufreq driver is the only one supporting that mode of operation, but then it is used on a large number of systems. The "schedutil" governor as included here is very simple and mostly regarded as a foundation for future work on the integration of the scheduler with CPU power management (in fact, there is work in progress on top of it already). Nevertheless it works and the preliminary results obtained with it are encouraging. There also is some consolidation of CPU frequency management for ARM platforms that can add their machine IDs the the new stub dt-platdev driver now and that will take care of creating the requisite platform device for cpufreq-dt, so it is not necessary to do that in platform code any more. Several ARM platforms are switched over to using this generic mechanism. In addition to that, the intel_pstate driver is now going to respect CPU frequency limits set by the platform firmware (or a BMC) and provided via the ACPI _PPC object. The devfreq subsystem is getting a new "passive" governor for SoCs subsystems that will depend on somebody else to manage their voltage rails and its support for Samsung Exynos SoCs is consolidated. The rest is support for new hardware (Intel Broxton support in intel_idle for one example), bug fixes, optimizations and cleanups in a number of places. Specifics: - New cpufreq "schedutil" governor (making decisions based on CPU utilization information provided by the scheduler and capable of switching CPU frequencies right away if the underlying driver supports that) and support for fast frequency switching in the acpi-cpufreq driver (Rafael Wysocki) - Consolidation of CPU frequency management on ARM platforms allowing them to get rid of some platform-specific boilerplate code if they are going to use the cpufreq-dt driver (Viresh Kumar, Finley Xiao, Marc Gonzalez) - Support for ACPI _PPC and CPU frequency limits in the intel_pstate driver (Srinivas Pandruvada) - Fixes and cleanups in the cpufreq core and generic governor code (Rafael Wysocki, Sai Gurrappadi) - intel_pstate driver optimizations and cleanups (Rafael Wysocki, Philippe Longepe, Chen Yu, Joe Perches) - cpufreq powernv driver fixes and cleanups (Akshay Adiga, Shilpasri Bhat) - cpufreq qoriq driver fixes and cleanups (Jia Hongtao) - ACPI cpufreq driver cleanups (Viresh Kumar) - Assorted cpufreq driver updates (Ashwin Chaugule, Geliang Tang, Javier Martinez Canillas, Paul Gortmaker, Sudeep Holla) - Assorted cpufreq fixes and cleanups (Joe Perches, Arnd Bergmann) - Fixes and cleanups in the OPP (Operating Performance Points) framework, mostly related to OPP sharing, and reorganization of OF-dependent code in it (Viresh Kumar, Arnd Bergmann, Sudeep Holla) - New "passive" governor for devfreq (for SoC subsystems that will rely on someone else for the management of their power resources) and consolidation of devfreq support for Exynos platforms, coding style and typo fixes for devfreq (Chanwoo Choi, MyungJoo Ham) - PM core fixes and cleanups, mostly to make it work better with the generic power domains (genpd) framework, and updates for that framework (Ulf Hansson, Thierry Reding, Colin Ian King) - Intel Broxton support for the intel_idle driver (Len Brown) - cpuidle core optimization and fix (Daniel Lezcano, Dave Gerlach) - ARM cpuidle cleanups (Jisheng Zhang) - Intel Kabylake support for the RAPL power capping driver (Jacob Pan) - AVS (Adaptive Voltage Switching) rockchip-io driver update (Heiko Stuebner) - Updates for the cpupower tool (Arjun Sreedharan, Colin Ian King, Mattia Dongili, Thomas Renninger)" * tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (112 commits) intel_pstate: Clean up get_target_pstate_use_performance() intel_pstate: Use sample.core_avg_perf in get_avg_pstate() intel_pstate: Clarify average performance computation intel_pstate: Avoid unnecessary synchronize_sched() during initialization cpufreq: schedutil: Make default depend on CONFIG_SMP cpufreq: powernv: del_timer_sync when global and local pstate are equal cpufreq: powernv: Move smp_call_function_any() out of irq safe block intel_pstate: Clean up intel_pstate_get() cpufreq: schedutil: Make it depend on CONFIG_SMP cpufreq: governor: Fix handling of special cases in dbs_update() PM / OPP: Move CONFIG_OF dependent code in a separate file cpufreq: intel_pstate: Ignore _PPC processing under HWP cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_table cpufreq: tango: Use generic platdev driver PM / OPP: pass cpumask by reference cpufreq: Fix GOV_LIMITS handling for the userspace governor cpupower: fix potential memory leak PM / devfreq: style/typo fixes PM / devfreq: exynos: Add the detailed correlation for Exynos5422 bus ..
2016-05-16Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: (63 commits) intel_pstate: Clean up get_target_pstate_use_performance() intel_pstate: Use sample.core_avg_perf in get_avg_pstate() intel_pstate: Clarify average performance computation intel_pstate: Avoid unnecessary synchronize_sched() during initialization cpufreq: schedutil: Make default depend on CONFIG_SMP cpufreq: powernv: del_timer_sync when global and local pstate are equal cpufreq: powernv: Move smp_call_function_any() out of irq safe block intel_pstate: Clean up intel_pstate_get() cpufreq: schedutil: Make it depend on CONFIG_SMP cpufreq: governor: Fix handling of special cases in dbs_update() cpufreq: intel_pstate: Ignore _PPC processing under HWP cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table cpufreq: tango: Use generic platdev driver cpufreq: Fix GOV_LIMITS handling for the userspace governor cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/ cpufreq: dt: Kill platform-data mvebu: Use dev_pm_opp_set_sharing_cpus() to mark OPP tables as shared cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2 cpufreq: governor: Change confusing struct field and variable names cpufreq: intel_pstate: Enable PPC enforcement for servers ...
2016-05-13ring-buffer: Prevent overflow of size in ring_buffer_resize()Steven Rostedt (Red Hat)
If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE then the DIV_ROUND_UP() will return zero. Here's the details: # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb tracing_entries_write() processes this and converts kb to bytes. 18014398509481980 << 10 = 18446744073709547520 and this is passed to ring_buffer_resize() as unsigned long size. size = DIV_ROUND_UP(size, BUF_PAGE_SIZE); Where DIV_ROUND_UP(a, b) is (a + b - 1)/b BUF_PAGE_SIZE is 4080 and here 18446744073709547520 + 4080 - 1 = 18446744073709551599 where 18446744073709551599 is still smaller than 2^64 2^64 - 18446744073709551599 = 17 But now 18446744073709551599 / 4080 = 4521260802379792 and size = size * 4080 = 18446744073709551360 This is checked to make sure its still greater than 2 * 4080, which it is. Then we convert to the number of buffer pages needed. nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE) but this time size is 18446744073709551360 and 2^64 - (18446744073709551360 + 4080 - 1) = -3823 Thus it overflows and the resulting number is less than 4080, which makes 3823 / 4080 = 0 an nr_pages is set to this. As we already checked against the minimum that nr_pages may be, this causes the logic to fail as well, and we crash the kernel. There's no reason to have the two DIV_ROUND_UP() (that's just result of historical code changes), clean up the code and fix this bug. Cc: stable@vger.kernel.org # 3.5+ Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-13ring-buffer: Use long for nr_pages to avoid overflow failuresSteven Rostedt (Red Hat)
The size variable to change the ring buffer in ftrace is a long. The nr_pages used to update the ring buffer based on the size is int. On 64 bit machines this can cause an overflow problem. For example, the following will cause the ring buffer to crash: # cd /sys/kernel/debug/tracing # echo 10 > buffer_size_kb # echo 8556384240 > buffer_size_kb Then you get the warning of: WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260 Which is: RB_WARN_ON(cpu_buffer, nr_removed); Note each ring buffer page holds 4080 bytes. This is because: 1) 10 causes the ring buffer to have 3 pages. (10kb requires 3 * 4080 pages to hold) 2) (2^31 / 2^10 + 1) * 4080 = 8556384240 The value written into buffer_size_kb is shifted by 10 and then passed to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE which is 4080. 8761737461760 / 4080 = 2147484672 4) nr_pages is subtracted from the current nr_pages (3) and we get: 2147484669. This value is saved in a signed integer nr_pages_to_update 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int turns into the value of -2147482627 6) As the value is a negative number, in update_pages_handler() it is negated and passed to rb_remove_pages() and 2147482627 pages will be removed, which is much larger than 3 and it causes the warning because not all the pages asked to be removed were removed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001 Cc: stable@vger.kernel.org # 2.6.28+ Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer") Reported-by: Hao Qin <QEver.cn@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-11Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10blktrace: add missed mask nameShaohua Li
BLK_TC_NOTIFY is missed in mask_maps, so we can't print out notify or set mask with 'notify' name. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-10blktrace: delete garbage for message traceShaohua Li
commit f4a1d08ce65 introduces a regression. Originally for BLK_TN_MESSAGE, we add message in trace and return. The commit ignores the early return and add garbage info. Signed-off-by: Shaohua Li <shli@fb.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03tracing: Use temp buffer when filtering eventsSteven Rostedt (Red Hat)
Filtering of events requires the data to be written to the ring buffer before it can be decided to filter or not. This is because the parameters of the filter are based on the result that is written to the ring buffer and not on the parameters that are passed into the trace functions. The ftrace ring buffer is optimized for writing into the ring buffer and committing. The discard procedure used when filtering decides the event should be discarded is much more heavy weight. Thus, using a temporary filter when filtering events can speed things up drastically. Without a temp buffer we have: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.790706626 seconds time elapsed ( +- 0.71% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.566904059 seconds time elapsed ( +- 0.27% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.690598511 seconds time elapsed ( +- 0.19% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.707486364 seconds time elapsed ( +- 0.30% ) The first run above is without any tracing, just to get a based figure. hackbench takes ~0.79 seconds to run on the system. The second run enables tracing all events where nothing is filtered. This increases the time by 100% and hackbench takes 1.57 seconds to run. The third run filters all events where the preempt count will equal "20" (this should never happen) thus all events are discarded. This takes 1.69 seconds to run. This is 10% slower than just committing the events! The last run enables all events and filters where the filter will commit all events, and this takes 1.70 seconds to run. The filtering overhead is approximately 10%. Thus, the discard and commit of an event from the ring buffer may be about the same time. With this patch, the numbers change: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.778233033 seconds time elapsed ( +- 0.38% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.582102692 seconds time elapsed ( +- 0.28% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.309230710 seconds time elapsed ( +- 0.22% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.786001924 seconds time elapsed ( +- 0.20% ) The first run is again the base with no tracing. The second run is all tracing with no filtering. It is a little slower, but that may be well within the noise. The third run shows that discarding all events only took 1.3 seconds. This is a speed up of 23%! The discard is much faster than even the commit. The one downside is shown in the last run. Events that are not discarded by the filter will take longer to add, this is due to the extra copy of the event. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-03tracing: Don't display trigger file for events that can't be enabledChunyu Hu
Currently register functions for events will be called through the 'reg' field of event class directly without any check when seting up triggers. Triggers for events that don't support register through debug fs (events under events/ftrace are for trace-cmd to read event format, and most of them don't have a register function except events/ftrace/functionx) can't be enabled at all, and an oops will be hit when setting up trigger for those events, so just not creating them is an easy way to avoid the oops. Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com Cc: stable@vger.kernel.org # 3.14+ Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-02tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logicSteven Rostedt (Red Hat)
Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove unused function trace_current_buffer_lock_reserve()Steven Rostedt (Red Hat)
trace_current_buffer_lock_reserve() has no more users. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove one use of trace_current_buffer_lock_reserve()Steven Rostedt (Red Hat)
The only user of trace_current_buffer_lock_reserve() is in the boot up self tests. Restructure the code a little to have that code use what everything else uses: trace_event_buffer_lock_reserve(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Have trace_buffer_unlock_commit() call the _regs version with NULLSteven Rostedt (Red Hat)
There's no real difference between trace_buffer_unlock_commit() and trace_buffer_unlock_commit_regs() except that the former passes NULL to ftrace_stack_trace() instead of regs. Have the former be a static inline of the latter which passes NULL for regs. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove unused function trace_current_buffer_discard_commit()Steven Rostedt (Red Hat)
The function trace_current_buffer_discard_commit() has no callers, remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Move trace_buffer_unlock_commit{_regs}() to local headerSteven Rostedt (Red Hat)
The functions trace_buffer_unlock_commit() and the _regs() version are only used within the kernel/trace directory. Move them to the local header and remove the export as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Fold filter_check_discard() into its only userSteven Rostedt (Red Hat)
The function filter_check_discard() is small and only called by one user, its code can be folded into that one caller and make the code a bit less comlplex. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-27tracing: Make filter_check_discard() localSteven Rostedt (Red Hat)
Nothing outside of the tracing directory calls filter_check_discard() or check_filter_check_discard(). They should not be called by modules. Move their prototypes into the local tracing header and remove their EXPORT_SYMBOL() macros. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-26tracing: Move event_trigger_unlock_commit{_regs}() to local headerSteven Rostedt (Red Hat)
The functions event_trigger_unlock_commit() and event_trigger_unlock_commit_regs() are no longer used outside the tracing system. Move them out of the generic headers and into the local one. Along with __event_trigger_test_discard() that is only used by them. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-27ftrace: Match dot symbols when searching functions on ppc64Thiago Jung Bauermann
In the ppc64 big endian ABI, function symbols point to function descriptors. The symbols which point to the function entry points have a dot in front of the function name. Consequently, when the ftrace filter mechanism searches for the symbol corresponding to an entry point address, it gets the dot symbol. As a result, ftrace filter users have to be aware of this ABI detail on ppc64 and prepend a dot to the function name when setting the filter. The perf probe command insulates the user from this by ignoring the dot in front of the symbol name when matching function names to symbols, but the sysfs interface does not. This patch makes the ftrace filter mechanism do the same when searching symbols. Fixes the following failure in ftracetest's kprobe_ftrace.tc: .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument That failure is on this line of kprobe_ftrace.tc: echo _do_fork > set_ftrace_filter This is because there's no _do_fork entry in the functions list: # cat available_filter_functions | grep _do_fork ._do_fork This change introduces no regressions on the perf and ftracetest testsuite results. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-26tracing: Don't use the address of the buffer array name in copy_from_userWang Xiaoqiang
With the following code snippet: ... char buf[64]; ... if (copy_from_user(&buf, ubuf, cnt)) ... Even though the value of "&buf" equals "buf", but there is no need to get the address of the "buf" again. Use "buf" instead of "&buf". Link: http://lkml.kernel.org/r/20160418152329.18b72bea@debian Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-26tracing: Handle tracing_map_alloc_elts() error path correctlyTom Zanussi
If tracing_map_elt_alloc() fails, it will return ERR_PTR() instead of NULL, so change the check to IS_ERROR(). We also need to set the failed entry in the map->elts array to NULL instead of ERR_PTR() so tracing_map_free_elts() doesn't try freeing an ERR_PTR(). tracing_map_free_elts() should also zero out what it frees so a reentrant call won't find previously freed elements. Link: http://lkml.kernel.org/r/f29d03b00bce3aac8cf151a8a30e6c83e5fee66d.1461610073.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-26tracing: Add check for NULL event field when creating hist fieldTom Zanussi
Smatch flagged create_hist_field() as possibly being able to dereference a NULL pointer, although the current code exits in all cases where the event field could be NULL, so it's not actually a problem. Still, to prevent future changes to the code from overlooking new cases, make the NULL pointer check explicit and warn once in that case. Link: http://lkml.kernel.org/r/cfbc003f534a3e441b4313272fd412310aba6336.1461610073.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-26tracing: checking for NULL instead of IS_ERR()Dan Carpenter
tracing_map_elt_alloc() returns ERR_PTRs on error, never NULL. Fixes: 08d43a5fa063 ('tracing: Add lock-free tracing_map') Link: http://lkml.kernel.org/r/20160423102347.GA11136@mwanda Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-25tracing: Do not inherit event-fork option for instancesSteven Rostedt (Red Hat)
As the event-fork option requires doing work when enabled and disabled, it can not be passed down to created instances. The instance must clear this flag when it is created, and must clear it when its removed. As more options may be created with this need, a macro ZEROED_TRACE_FLAGS is created that holds the flags that must not be inherited by the top level instance, and must be cleared on removal of instances. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19bpf: add event output helper for notifications/sampling/loggingDaniel Borkmann
This patch adds a new helper for cls/act programs that can push events to user space applications. For networking, this can be f.e. for sampling, debugging, logging purposes or pushing of arbitrary wake-up events. The idea is similar to a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example"). The eBPF program utilizes a perf event array map that user space populates with fds from perf_event_open(), the eBPF program calls into the helper f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw)) so that the raw data is pushed into the fd f.e. at the map index of the current CPU. User space can poll/mmap/etc on this and has a data channel for receiving events that can be post-processed. The nice thing is that since the eBPF program and user space application making use of it are tightly coupled, they can define their own arbitrary raw data format and what/when they want to push. While f.e. packet headers could be one part of the meta data that is being pushed, this is not a substitute for things like packet sockets as whole packet is not being pushed and push is only done in a single direction. Intention is more of a generically usable, efficient event pipe to applications. Workflow is that tc can pin the map and applications can attach themselves e.g. after cls/act setup to one or multiple map slots, demuxing is done by the eBPF program. Adding this facility is with minimal effort, it reuses the helper introduced in a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and we get its functionality for free by overloading its BPF_FUNC_ identifier for cls/act programs, ctx is currently unused, but will be made use of in future. Example will be added to iproute2's BPF example files. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_outputDaniel Borkmann
Add a BPF_F_CURRENT_CPU flag to optimize the use-case where user space has per-CPU ring buffers and the eBPF program pushes the data into the current CPU's ring buffer which saves us an extra helper function call in eBPF. Also, make sure to properly reserve the remaining flags which are not used. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19tracing: Fix unsigned comparison to zero in hist trigger codeSteven Rostedt (Red Hat)
Fengguang Wu's bot found two comparisons of unsigned integers to zero. These were real bugs, as it would miss error conditions returned to zero. trace_events_hist.c:426:6-9: WARNING: Unsigned expression compared with zero: idx < 0 trace_events_hist.c:568:5-14: WARNING: Unsigned expression compared with zero: n_entries < 0 Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'log2' modifierNamhyung Kim
Allow users to have numeric fields displayed as log2 values in case value range is very wide by appending '.log2' to field names. For example, # echo 'hist:key=bytes_req' > kmalloc/trigger # cat kmalloc/hist { bytes_req: 504 } hitcount: 1 { bytes_req: 11 } hitcount: 1 { bytes_req: 104 } hitcount: 1 { bytes_req: 48 } hitcount: 1 { bytes_req: 2048 } hitcount: 1 { bytes_req: 4096 } hitcount: 1 { bytes_req: 240 } hitcount: 1 { bytes_req: 392 } hitcount: 1 { bytes_req: 13 } hitcount: 1 { bytes_req: 28 } hitcount: 1 { bytes_req: 12 } hitcount: 1 { bytes_req: 64 } hitcount: 2 { bytes_req: 128 } hitcount: 2 { bytes_req: 32 } hitcount: 2 { bytes_req: 8 } hitcount: 11 { bytes_req: 10 } hitcount: 13 { bytes_req: 24 } hitcount: 25 { bytes_req: 160 } hitcount: 29 { bytes_req: 16 } hitcount: 33 { bytes_req: 80 } hitcount: 36 When using '.log2' modifier, the output looks like: # echo 'hist:key=bytes_req.log2' > kmalloc/trigger # cat kmalloc/hist { bytes_req: ~ 2^12 } hitcount: 1 { bytes_req: ~ 2^11 } hitcount: 1 { bytes_req: ~ 2^9 } hitcount: 2 { bytes_req: ~ 2^6 } hitcount: 3 { bytes_req: ~ 2^3 } hitcount: 13 { bytes_req: ~ 2^5 } hitcount: 19 { bytes_req: ~ 2^8 } hitcount: 49 { bytes_req: ~ 2^7 } hitcount: 57 { bytes_req: ~ 2^4 } hitcount: 74 Link: http://lkml.kernel.org/r/7ff396b246c6a881f46b979735fddf05a0d6c71a.1457029949.git.tom.zanussi@linux.intel.com Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for named hist triggersTom Zanussi
Allow users to define 'named' hist triggers. All triggers created with the same 'name=xxx' option will update the same shared histogram data. This expands the hist trigger syntax from this: # echo hist:keys=xxx ... [ if filter] > event/trigger to this: # echo hist:name=xxx:keys=xxx ... [ if filter] > event/trigger Named histograms must use a 'compatible' set of keys and values, which means each event added to a set of named triggers must have the same names and types. Reading the 'hist' file of any of the participating events will produce the same output as any other participating event, which is to be expected since they share the same data. Link: http://lkml.kernel.org/r/1dbc84ee3322a75daaf5b3ef1d0cc0a2fb682fc7.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for named triggersTom Zanussi
Named triggers are sets of triggers that share a common set of trigger data. An example of functionality that could benefit from this type of capability would be a set of inlined probes that would each contribute event counts, for example, to a shared counter data structure. The first named trigger registered with a given name owns the common trigger data that the others subsequently registered with the same name will reference. The functions defined here allow users to add, delete, and find named triggers. It also adds functions to pause and unpause named triggers; since named triggers act upon common data, they should also be paused and unpaused as a group. Link: http://lkml.kernel.org/r/c09ff648360f65b10a3e321eddafe18060b4a04f.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for multiple hist triggers per eventTom Zanussi
Allow users to define any number of hist triggers per trace event. Any number of hist triggers may be added for a given event, which may differ by key, value, or filter. Reading the event's 'hist' file will display the output of all the hist triggers defined on an event concatenated in the order they were defined. Link: http://lkml.kernel.org/r/48a0c8dd34c344571de880fb35e211c6d9a28961.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add enable_hist/disable_hist triggersTom Zanussi
Similar to enable_event/disable_event triggers, these triggers enable and disable the aggregation of events into maps rather than enabling and disabling their writing into the trace buffer. They can be used to automatically start and stop hist triggers based on a matching filter condition. If there's a paused hist trigger on system:event, the following would start it when the filter condition was hit: # echo enable_hist:system:event [ if filter] > event/trigger And the following would disable a running system:event hist trigger: # echo disable_hist:system:event [ if filter] > event/trigger See Documentation/trace/events.txt for real examples. Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Remove restriction on string position in hist trigger keysTom Zanussi
If we assume the maximum size for a string field, we don't have to worry about its position. Since we only allow two keys in a compound key and having more than one string key in a given compound key doesn't make much sense anyway, trading a bit of extra space instead of introducing an arbitrary restriction makes more sense. We also need to use the event field size for static strings when copying the contents, otherwise we get random garbage in the key. Also, cast string return values to avoid warnings on 32-bit compiles. Finally, rearrange the code without changing any functionality by moving the compound key updating code into a separate function. Link: http://lkml.kernel.org/r/8976e1ab04b66bc2700ad1ed0768a2de85ac1983.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Support string type key properlyNamhyung Kim
The string in a trace event is usually recorded as dynamic array which is variable length. But current hist code only support fixed length array so it cannot support most strings. This patch fixes it by checking filter_type of the field and get proper pointer with it. With this, it can get a histogram of exec() based on filenames like below: # cd /sys/kernel/tracing/events/sched/sched_process_exec # cat 'hist:key=filename' > trigger # ps PID TTY TIME CMD 1 ? 00:00:00 init 29 ? 00:00:00 sh 38 ? 00:00:00 ps # ls enable filter format hist id trigger # cat hist # trigger info: hist:keys=filename:vals=hitcount:sort=hitcount:size=2048 [active] { filename: /usr/bin/ps } hitcount: 1 { filename: /usr/bin/ls } hitcount: 1 { filename: /usr/bin/cat } hitcount: 1 Totals: Hits: 3 Entries: 3 Dropped: 0 Link: http://lkml.kernel.org/r/610180d6df0cfdf11ee205452f3b241dea657233.1457029949.git.tom.zanussi@linux.intel.com Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ Added (unsigned long) typecast to fix compile warning ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for stacktraces as keysTom Zanussi
It's often useful to be able to use a stacktrace as a hash key, for keeping a count of the number of times a particular call path resulted in a trace event, for instance. Add a special key named 'stacktrace' which can be used as key in a 'keys=' param for this purpose: # echo hist:keys=stacktrace ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87515e90b3785232a874a12156174635a348edb1.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'syscall' modifierTom Zanussi
Allow users to have syscall id fields displayed as syscall names in the output by appending '.syscall' to field names: # echo hist:keys=aaa.syscall ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2bab1e59933d76a14b545bd2e02f80b8b08ac4d3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'execname' modifierTom Zanussi
Allow users to have common_pid field values displayed as program names in the output by appending '.execname' to a common_pid field name: # echo hist:keys=common_pid.execname ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/e172e81f10f5b8d1f08450e3763c850f39fbf698.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'sym' and 'sym-offset' modifiersTom Zanussi
Allow users to have address fields displayed as symbols in the output by appending '.sym' or 'sym-offset' to field names: # echo hist:keys=aaa.sym,bbb.sym-offset ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87d4935821491c0275513f0fbfb9bab8d3d3f079.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'hex' modifier for displaying numeric fieldsTom Zanussi
Allow users to have numeric fields displayed as hex values in the output by appending '.hex' to field names: # echo hist:keys=aaa,bbb.hex:vals=ccc.hex ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/67bd431edda2af5798d7694818f7e8d71b6b3463.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for clearing a traceTom Zanussi
Allow users to append 'clear' to an existing trigger in order to have the hash table cleared. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont/clear \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/ae15dd0d9b2f7af07a37c1ff682063e2dbcdf160.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for pausing and continuing a traceTom Zanussi
Allow users to append 'pause' or 'continue' to an existing trigger in order to have it paused or to have a paused trace continue. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/b672a92c14702cb924cdf6fc27ea1809bed04907.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for user-defined sorting ('sort=' param)Tom Zanussi
Allow users to specify keys and/or values to sort on. With this addition, keys and values specified using the 'keys=' and 'vals=' keywords can be used to sort the hist trigger output via a new 'sort=' keyword. If multiple sort keys are specified, the output will be sorted using the second key as a secondary sort key, etc. The default sort order is ascending; if the user wants a different sort order, '.descending' can be appended to the specific sort key. Before this addition, output was always sorted by 'hitcount' in ascending order. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy \ [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/b30a41db66ba486979c4f987aff5fab500ea53b3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for compound keysTom Zanussi
Allow users to specify multiple trace event fields to use in keys by allowing multiple fields in the 'keys=' keyword. With this addition, any unique combination of any of the fields named in the 'keys' keyword will result in a new entry being added to the hash table. Link: http://lkml.kernel.org/r/0cfa24e6ac3b0dcece7737d94aa1f322ae3afc4b.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for multiple values ('vals=' param)Tom Zanussi
Allow users to specify trace event fields to use in aggregated sums via a new 'vals=' keyword. Before this addition, the only aggregated sum supported was the implied value 'hitcount'. With this addition, 'hitcount' is also supported as an explicit value field, as is any numeric trace event field. This expands the hist trigger syntax from this: # echo hist:keys=xxx [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2a5d1adb5ba6c65d7bb2148e379f2fed47f29a68.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add 'hist' event trigger commandTom Zanussi
'hist' triggers allow users to continually aggregate trace events, which can then be viewed afterwards by simply reading a 'hist' file containing the aggregation in a human-readable format. The basic idea is very simple and boils down to a mechanism whereby trace events, rather than being exhaustively dumped in raw form and viewed directly, are automatically 'compressed' into meaningful tables completely defined by the user. This is done strictly via single-line command-line commands and without the aid of any kind of programming language or interpreter. A surprising number of typical use cases can be accomplished by users via this simple mechanism. In fact, a large number of the tasks that users typically do using the more complicated script-based tracing tools, at least during the initial stages of an investigation, can be accomplished by simply specifying a set of keys and values to be used in the creation of a hash table. The Linux kernel trace event subsystem happens to provide an extensive list of keys and values ready-made for such a purpose in the form of the event format files associated with each trace event. By simply consulting the format file for field names of interest and by plugging them into the hist trigger command, users can create an endless number of useful aggregations to help with investigating various properties of the system. See Documentation/trace/events.txt for examples. hist triggers are implemented on top of the existing event trigger infrastructure, and as such are consistent with the existing triggers from a user's perspective as well. The basic syntax follows the existing trigger syntax. Users start an aggregation by writing a 'hist' trigger to the event of interest's trigger file: # echo hist:keys=xxx [ if filter] > event/trigger Once a hist trigger has been set up, by default it continually aggregates every matching event into a hash table using the event key and a value field named 'hitcount'. To view the aggregation at any point in time, simply read the 'hist' file in the same directory as the 'trigger' file: # cat event/hist The detailed syntax provides additional options for user control, and is described exhaustively in Documentation/trace/events.txt and in the virtual tracing/README file in the tracing subsystem. Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>