summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2016-04-23lockdep: Fix lock_chain::base sizePeter Zijlstra
lock_chain::base is used to store an index into the chain_hlocks[] array, however that array contains more elements than can be indexed using the u16. Change the lock_chain structure to use a bitfield to encode the data it needs and add BUILD_BUG_ON() assertions to check the fields are wide enough. Also, for DEBUG_LOCKDEP, assert that we don't run out of elements of that array; as that would wreck the collision detectoring. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alfredo Alvarez Fernandez <alfredoalvarezfernandez@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160330093659.GS3408@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23locking/lockdep: Fix ->irq_context calculationBoqun Feng
task_irq_context() returns the encoded irq_context of the task, the return value is encoded in the same as ->irq_context of held_lock. Always return 0 if !(CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING) Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: sasha.levin@oracle.com Link: http://lkml.kernel.org/r/1455602265-16490-2-git-send-email-boqun.feng@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentationPeter Zijlstra
Markus reported that 0 should also disable the throttling we per Documentation/sysctl/kernel.txt. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 91a612eea9a3 ("perf/core: Fix dynamic interrupt throttle") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22time: Introduce do_sys_settimeofday64()Baolin Wang
The do_sys_settimeofday() function uses a timespec, which is not year 2038 safe on 32bit systems. Thus this patch introduces do_sys_settimeofday64(), which allows us to transition users of do_sys_settimeofday() to using 64bit time types. Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Baolin Wang <baolin.wang@linaro.org> [jstultz: Include errno-base.h to avoid build issue on some arches] Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-04-22cpu/hotplug: Fix rollback during error-out in __cpu_disable()Sebastian Andrzej Siewior
The recent introduction of the hotplug thread which invokes the callbacks on the plugged cpu, cased the following regression: If takedown_cpu() fails, then we run into several issues: 1) The rollback of the target cpu states is not invoked. That leaves the smp threads and the hotplug thread in disabled state. 2) notify_online() is executed due to a missing skip_onerr flag. That causes that both CPU_DOWN_FAILED and CPU_ONLINE notifications are invoked which confuses quite some notifiers. 3) The CPU_DOWN_FAILED notification is not invoked on the target CPU. That's not an issue per se, but it is inconsistent and in consequence blocks the patches which rely on these states being invoked on the target CPU and not on the controlling cpu. It also does not preserve the strict call order on rollback which is problematic for the ongoing state machine conversion as well. To fix this we add a rollback flag to the remote callback machinery and invoke the rollback including the CPU_DOWN_FAILED notification on the remote cpu. Further mark the notify online state with 'skip_onerr' so we don't get a double invokation. This workaround will go away once we moved the unplug invocation to the target cpu itself. [ tglx: Massaged changelog and moved the CPU_DOWN_FAILED notifiaction to the target cpu ] Fixes: 4cb28ced23c4 ("cpu/hotplug: Create hotplug threads") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-s390@vger.kernel.org Cc: rt@linutronix.de Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: http://lkml.kernel.org/r/20160408124015.GA21960@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-22locking/rwsem: Provide down_write_killable()Michal Hocko
Now that all the architectures implement the necessary glue code we can introduce down_write_killable(). The only difference wrt. regular down_write() is that the slow path waits in TASK_KILLABLE state and the interruption by the fatal signal is reported as -EINTR to the caller. Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Zankel <chris@zankel.net> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Signed-off-by: Jason Low <jason.low2@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/1460041951-22347-12-git-send-email-mhocko@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-21Merge branches 'doc.2016.04.19a', 'exp.2016.03.31d', 'fixes.2016.03.31d' and ↵Paul E. McKenney
'torture.2016.04.21a' into HEAD doc.2016.04.19a: Documentation updates exp.2016.03.31d: Expedited grace-period updates fixes.2016.03.31d: Miscellaneous fixes torture.2016.004.21a Torture-test updates
2016-04-21rcutorture: Add irqs-disabled test for call_rcu()Paul E. McKenney
Mutation testing carried out by Iftekhar Ahmed of Oregon State University showed that rcutorture is failing to test invocations of call_rcu() having interrupts disabled. This commit therefore adds interrupt disabling around one of the existing invocations of call_rcu() (and friends). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-04-21rcutorture: Dump trace buffer upon shutdownPaul E. McKenney
When running from the scripts, rcutorture is completely headless, so there is no way to to manually dump the trace buffer. This commit therefore unconditionally dumps the trace buffer upon timed shutdown. However, if you are using rmmod to end the test, it is still up to you to manually dump the trace buffer. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix memory leak in iwlwifi, from Matti Gottlieb. 2) Add missing registration of netfilter arp_tables into initial namespace, from Florian Westphal. 3) Fix potential NULL deref in DecNET routing code. 4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry Ivanov. 5) Fix dst ref counting in VRF, from David Ahern. 6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck. 7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause. 8) Ravalidate IPV6 datagram socket cached routes properly, particularly with UDP, from Martin KaFai Lau. 9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang. 10) Fix stats typing in bcmgenet driver, from Eric Dumazet. 11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing, from Joe Stringer. 12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown. 13) atl2 doesn't actually support scatter-gather, so don't advertise the feature. From Ben Hucthings. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits) openvswitch: use flow protocol when recalculating ipv6 checksums Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets atl2: Disable unimplemented scatter/gather feature net/mlx4_en: Split SW RX dropped counter per RX ring net/mlx4_core: Don't allow to VF change global pause settings net/mlx4_core: Avoid repeated calls to pci enable/disable net/mlx4_core: Implement pci_resume callback net: phy: spi_ks8895: Don't leak references to SPI devices net: ethernet: davinci_emac: Fix platform_data overwrite net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable qede: Fix single MTU sized packet from firmware GRO flow qede: Fix setting Skb network header qede: Fix various memory allocation error flows for fastpath tcp: Merge tx_flags and tskey in tcp_shifted_skb tcp: Merge tx_flags and tskey in tcp_collapse_retrans drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks openvswitch: Orphan skbs before IPv6 defrag Revert "Prevent NUll pointer dereference with two PHYs on cpsw" VSOCK: Only check error on skb_recv_datagram when skb is NULL ...
2016-04-21perf, bpf: minimize the size of perf_trace_() tracepoint handlerAlexei Starovoitov
move trace_call_bpf() into helper function to minimize the size of perf_trace_*() tracepoint handlers. text data bss dec hex filename 10541679 5526646 2945024 19013349 1221ee5 vmlinux_before 10509422 5526646 2945024 18981092 121a0e4 vmlinux_after It may seem that perf_fetch_caller_regs() can also be moved, but that is incorrect, since ip/sp will be wrong. bpf+tracepoint performance is not affected, since perf_swevent_put_recursion_context() is now inlined. export_symbol_gpl can also be dropped. No measurable change in normal perf tracepoints. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21genirq: Dont allow affinity mask to be updated on IPIsMatt Redfearn
The IPI domain re-purposes the IRQ affinity to signify the mask of CPUs that this IPI will deliver to. This must not be modified before the IPI is destroyed again, so set the IRQ_NO_BALANCING flag to prevent the affinity being overwritten by setup_affinity(). Without this, if an IPI is reserved for a single target CPU, then allocated using __setup_irq(), the affinity is overwritten with cpu_online_mask. When ipi_destroy() is subsequently called on a multi-cpu system, it will attempt to free cpumask_weight() IRQs that were never allocated, and crash. Fixes: d17bf24e6952 ("genirq: Add a new generic IPI reservation code to irq core") Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: ralf@linux-mips.org Cc: Qais Yousef <qsyousef@gmail.com> Cc: lisa.parratt@imgtec.com Link: http://lkml.kernel.org/r/1461229712-13057-1-git-send-email-matt.redfearn@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-21futex: Acknowledge a new waiter in counter before plistDavidlohr Bueso
Otherwise an incoming waker on the dest hash bucket can miss the waiter adding itself to the plist during the lockless check optimization (small window but still the correct way of doing this); similarly to the decrement counterpart. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: bigeasy@linutronix.de Cc: dvhart@infradead.org Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1461208164-29150-1-git-send-email-dave@stgolabs.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-20futex: Handle unlock_pi race gracefullySebastian Andrzej Siewior
If userspace calls UNLOCK_PI unconditionally without trying the TID -> 0 transition in user space first then the user space value might not have the waiters bit set. This opens the following race: CPU0 CPU1 uval = get_user(futex) lock(hb) lock(hb) futex |= FUTEX_WAITERS .... unlock(hb) cmpxchg(futex, uval, newval) So the cmpxchg fails and returns -EINVAL to user space, which is wrong because the futex value is valid. To handle this (yes, yet another) corner case gracefully, check for a flag change and retry. [ tglx: Massaged changelog and slightly reworked implementation ] Fixes: ccf9e6a80d9e ("futex: Make unlock_pi more robust") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1460723739-5195-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-20Merge tag 'arm-memremap-for-v4.7' of ↵Russell King
git://git.linaro.org/people/ard.biesheuvel/linux-arm into devel-stable This series wires up the generic memremap() function for ARM in a way that allows it to be used as intended, i.e., without regard for whether the region being mapped is covered by a struct page and/or the linear mapping (lowmem)
2016-04-19bpf: add event output helper for notifications/sampling/loggingDaniel Borkmann
This patch adds a new helper for cls/act programs that can push events to user space applications. For networking, this can be f.e. for sampling, debugging, logging purposes or pushing of arbitrary wake-up events. The idea is similar to a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example"). The eBPF program utilizes a perf event array map that user space populates with fds from perf_event_open(), the eBPF program calls into the helper f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw)) so that the raw data is pushed into the fd f.e. at the map index of the current CPU. User space can poll/mmap/etc on this and has a data channel for receiving events that can be post-processed. The nice thing is that since the eBPF program and user space application making use of it are tightly coupled, they can define their own arbitrary raw data format and what/when they want to push. While f.e. packet headers could be one part of the meta data that is being pushed, this is not a substitute for things like packet sockets as whole packet is not being pushed and push is only done in a single direction. Intention is more of a generically usable, efficient event pipe to applications. Workflow is that tc can pin the map and applications can attach themselves e.g. after cls/act setup to one or multiple map slots, demuxing is done by the eBPF program. Adding this facility is with minimal effort, it reuses the helper introduced in a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") and we get its functionality for free by overloading its BPF_FUNC_ identifier for cls/act programs, ctx is currently unused, but will be made use of in future. Example will be added to iproute2's BPF example files. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_outputDaniel Borkmann
Add a BPF_F_CURRENT_CPU flag to optimize the use-case where user space has per-CPU ring buffers and the eBPF program pushes the data into the current CPU's ring buffer which saves us an extra helper function call in eBPF. Also, make sure to properly reserve the remaining flags which are not used. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19tracing: Fix unsigned comparison to zero in hist trigger codeSteven Rostedt (Red Hat)
Fengguang Wu's bot found two comparisons of unsigned integers to zero. These were real bugs, as it would miss error conditions returned to zero. trace_events_hist.c:426:6-9: WARNING: Unsigned expression compared with zero: idx < 0 trace_events_hist.c:568:5-14: WARNING: Unsigned expression compared with zero: n_entries < 0 Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'log2' modifierNamhyung Kim
Allow users to have numeric fields displayed as log2 values in case value range is very wide by appending '.log2' to field names. For example, # echo 'hist:key=bytes_req' > kmalloc/trigger # cat kmalloc/hist { bytes_req: 504 } hitcount: 1 { bytes_req: 11 } hitcount: 1 { bytes_req: 104 } hitcount: 1 { bytes_req: 48 } hitcount: 1 { bytes_req: 2048 } hitcount: 1 { bytes_req: 4096 } hitcount: 1 { bytes_req: 240 } hitcount: 1 { bytes_req: 392 } hitcount: 1 { bytes_req: 13 } hitcount: 1 { bytes_req: 28 } hitcount: 1 { bytes_req: 12 } hitcount: 1 { bytes_req: 64 } hitcount: 2 { bytes_req: 128 } hitcount: 2 { bytes_req: 32 } hitcount: 2 { bytes_req: 8 } hitcount: 11 { bytes_req: 10 } hitcount: 13 { bytes_req: 24 } hitcount: 25 { bytes_req: 160 } hitcount: 29 { bytes_req: 16 } hitcount: 33 { bytes_req: 80 } hitcount: 36 When using '.log2' modifier, the output looks like: # echo 'hist:key=bytes_req.log2' > kmalloc/trigger # cat kmalloc/hist { bytes_req: ~ 2^12 } hitcount: 1 { bytes_req: ~ 2^11 } hitcount: 1 { bytes_req: ~ 2^9 } hitcount: 2 { bytes_req: ~ 2^6 } hitcount: 3 { bytes_req: ~ 2^3 } hitcount: 13 { bytes_req: ~ 2^5 } hitcount: 19 { bytes_req: ~ 2^8 } hitcount: 49 { bytes_req: ~ 2^7 } hitcount: 57 { bytes_req: ~ 2^4 } hitcount: 74 Link: http://lkml.kernel.org/r/7ff396b246c6a881f46b979735fddf05a0d6c71a.1457029949.git.tom.zanussi@linux.intel.com Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for named hist triggersTom Zanussi
Allow users to define 'named' hist triggers. All triggers created with the same 'name=xxx' option will update the same shared histogram data. This expands the hist trigger syntax from this: # echo hist:keys=xxx ... [ if filter] > event/trigger to this: # echo hist:name=xxx:keys=xxx ... [ if filter] > event/trigger Named histograms must use a 'compatible' set of keys and values, which means each event added to a set of named triggers must have the same names and types. Reading the 'hist' file of any of the participating events will produce the same output as any other participating event, which is to be expected since they share the same data. Link: http://lkml.kernel.org/r/1dbc84ee3322a75daaf5b3ef1d0cc0a2fb682fc7.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for named triggersTom Zanussi
Named triggers are sets of triggers that share a common set of trigger data. An example of functionality that could benefit from this type of capability would be a set of inlined probes that would each contribute event counts, for example, to a shared counter data structure. The first named trigger registered with a given name owns the common trigger data that the others subsequently registered with the same name will reference. The functions defined here allow users to add, delete, and find named triggers. It also adds functions to pause and unpause named triggers; since named triggers act upon common data, they should also be paused and unpaused as a group. Link: http://lkml.kernel.org/r/c09ff648360f65b10a3e321eddafe18060b4a04f.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add support for multiple hist triggers per eventTom Zanussi
Allow users to define any number of hist triggers per trace event. Any number of hist triggers may be added for a given event, which may differ by key, value, or filter. Reading the event's 'hist' file will display the output of all the hist triggers defined on an event concatenated in the order they were defined. Link: http://lkml.kernel.org/r/48a0c8dd34c344571de880fb35e211c6d9a28961.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add enable_hist/disable_hist triggersTom Zanussi
Similar to enable_event/disable_event triggers, these triggers enable and disable the aggregation of events into maps rather than enabling and disabling their writing into the trace buffer. They can be used to automatically start and stop hist triggers based on a matching filter condition. If there's a paused hist trigger on system:event, the following would start it when the filter condition was hit: # echo enable_hist:system:event [ if filter] > event/trigger And the following would disable a running system:event hist trigger: # echo disable_hist:system:event [ if filter] > event/trigger See Documentation/trace/events.txt for real examples. Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Remove restriction on string position in hist trigger keysTom Zanussi
If we assume the maximum size for a string field, we don't have to worry about its position. Since we only allow two keys in a compound key and having more than one string key in a given compound key doesn't make much sense anyway, trading a bit of extra space instead of introducing an arbitrary restriction makes more sense. We also need to use the event field size for static strings when copying the contents, otherwise we get random garbage in the key. Also, cast string return values to avoid warnings on 32-bit compiles. Finally, rearrange the code without changing any functionality by moving the compound key updating code into a separate function. Link: http://lkml.kernel.org/r/8976e1ab04b66bc2700ad1ed0768a2de85ac1983.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Support string type key properlyNamhyung Kim
The string in a trace event is usually recorded as dynamic array which is variable length. But current hist code only support fixed length array so it cannot support most strings. This patch fixes it by checking filter_type of the field and get proper pointer with it. With this, it can get a histogram of exec() based on filenames like below: # cd /sys/kernel/tracing/events/sched/sched_process_exec # cat 'hist:key=filename' > trigger # ps PID TTY TIME CMD 1 ? 00:00:00 init 29 ? 00:00:00 sh 38 ? 00:00:00 ps # ls enable filter format hist id trigger # cat hist # trigger info: hist:keys=filename:vals=hitcount:sort=hitcount:size=2048 [active] { filename: /usr/bin/ps } hitcount: 1 { filename: /usr/bin/ls } hitcount: 1 { filename: /usr/bin/cat } hitcount: 1 Totals: Hits: 3 Entries: 3 Dropped: 0 Link: http://lkml.kernel.org/r/610180d6df0cfdf11ee205452f3b241dea657233.1457029949.git.tom.zanussi@linux.intel.com Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ Added (unsigned long) typecast to fix compile warning ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for stacktraces as keysTom Zanussi
It's often useful to be able to use a stacktrace as a hash key, for keeping a count of the number of times a particular call path resulted in a trace event, for instance. Add a special key named 'stacktrace' which can be used as key in a 'keys=' param for this purpose: # echo hist:keys=stacktrace ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87515e90b3785232a874a12156174635a348edb1.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'syscall' modifierTom Zanussi
Allow users to have syscall id fields displayed as syscall names in the output by appending '.syscall' to field names: # echo hist:keys=aaa.syscall ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2bab1e59933d76a14b545bd2e02f80b8b08ac4d3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'execname' modifierTom Zanussi
Allow users to have common_pid field values displayed as program names in the output by appending '.execname' to a common_pid field name: # echo hist:keys=common_pid.execname ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/e172e81f10f5b8d1f08450e3763c850f39fbf698.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'sym' and 'sym-offset' modifiersTom Zanussi
Allow users to have address fields displayed as symbols in the output by appending '.sym' or 'sym-offset' to field names: # echo hist:keys=aaa.sym,bbb.sym-offset ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/87d4935821491c0275513f0fbfb9bab8d3d3f079.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger 'hex' modifier for displaying numeric fieldsTom Zanussi
Allow users to have numeric fields displayed as hex values in the output by appending '.hex' to field names: # echo hist:keys=aaa,bbb.hex:vals=ccc.hex ... \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/67bd431edda2af5798d7694818f7e8d71b6b3463.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for clearing a traceTom Zanussi
Allow users to append 'clear' to an existing trigger in order to have the hash table cleared. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause/cont/clear \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/ae15dd0d9b2f7af07a37c1ff682063e2dbcdf160.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for pausing and continuing a traceTom Zanussi
Allow users to append 'pause' or 'continue' to an existing trigger in order to have it paused or to have a paused trace continue. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] >> event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \ [ if filter] >> event/trigger Link: http://lkml.kernel.org/r/b672a92c14702cb924cdf6fc27ea1809bed04907.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for user-defined sorting ('sort=' param)Tom Zanussi
Allow users to specify keys and/or values to sort on. With this addition, keys and values specified using the 'keys=' and 'vals=' keywords can be used to sort the hist trigger output via a new 'sort=' keyword. If multiple sort keys are specified, the output will be sorted using the second key as a secondary sort key, etc. The default sort order is ascending; if the user wants a different sort order, '.descending' can be appended to the specific sort key. Before this addition, output was always sorted by 'hitcount' in ascending order. This expands the hist trigger syntax from this: # echo hist:keys=xxx:vals=yyy \ [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ [ if filter] > event/trigger Link: http://lkml.kernel.org/r/b30a41db66ba486979c4f987aff5fab500ea53b3.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for compound keysTom Zanussi
Allow users to specify multiple trace event fields to use in keys by allowing multiple fields in the 'keys=' keyword. With this addition, any unique combination of any of the fields named in the 'keys' keyword will result in a new entry being added to the hash table. Link: http://lkml.kernel.org/r/0cfa24e6ac3b0dcece7737d94aa1f322ae3afc4b.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add hist trigger support for multiple values ('vals=' param)Tom Zanussi
Allow users to specify trace event fields to use in aggregated sums via a new 'vals=' keyword. Before this addition, the only aggregated sum supported was the implied value 'hitcount'. With this addition, 'hitcount' is also supported as an explicit value field, as is any numeric trace event field. This expands the hist trigger syntax from this: # echo hist:keys=xxx [ if filter] > event/trigger to this: # echo hist:keys=xxx:vals=yyy [ if filter] > event/trigger Link: http://lkml.kernel.org/r/2a5d1adb5ba6c65d7bb2148e379f2fed47f29a68.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add 'hist' event trigger commandTom Zanussi
'hist' triggers allow users to continually aggregate trace events, which can then be viewed afterwards by simply reading a 'hist' file containing the aggregation in a human-readable format. The basic idea is very simple and boils down to a mechanism whereby trace events, rather than being exhaustively dumped in raw form and viewed directly, are automatically 'compressed' into meaningful tables completely defined by the user. This is done strictly via single-line command-line commands and without the aid of any kind of programming language or interpreter. A surprising number of typical use cases can be accomplished by users via this simple mechanism. In fact, a large number of the tasks that users typically do using the more complicated script-based tracing tools, at least during the initial stages of an investigation, can be accomplished by simply specifying a set of keys and values to be used in the creation of a hash table. The Linux kernel trace event subsystem happens to provide an extensive list of keys and values ready-made for such a purpose in the form of the event format files associated with each trace event. By simply consulting the format file for field names of interest and by plugging them into the hist trigger command, users can create an endless number of useful aggregations to help with investigating various properties of the system. See Documentation/trace/events.txt for examples. hist triggers are implemented on top of the existing event trigger infrastructure, and as such are consistent with the existing triggers from a user's perspective as well. The basic syntax follows the existing trigger syntax. Users start an aggregation by writing a 'hist' trigger to the event of interest's trigger file: # echo hist:keys=xxx [ if filter] > event/trigger Once a hist trigger has been set up, by default it continually aggregates every matching event into a hash table using the event key and a value field named 'hitcount'. To view the aggregation at any point in time, simply read the 'hist' file in the same directory as the 'trigger' file: # cat event/hist The detailed syntax provides additional options for user control, and is described exhaustively in Documentation/trace/events.txt and in the virtual tracing/README file in the tracing subsystem. Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Update some tracing_map constants and commentsTom Zanussi
Make it clear exactly how many keys and values are supported through better defines, and add 1 to the vals count, since normally clients want support for at least a hitcount and two other values. Also, note the error return value for tracing_map_add_key/val_field() in the comments. Link: http://lkml.kernel.org/r/6696fa02ebc716aa344c27a571a2afaa25e5b4d4.1457029949.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Fix TRACING_MAP KconfigSteven Rostedt (Red Hat)
The config option for TRACING_MAP has "default n", which is not needed because the default of configs is 'n'. Also, since the TRACING_MAP has no config prompt, there's no reason to include "If in doubt, say N" in the help text. Fixed a typo in the comments of tracing_map.h. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add lock-free tracing_mapTom Zanussi
Add tracing_map, a special-purpose lock-free map for tracing. tracing_map is designed to aggregate or 'sum' one or more values associated with a specific object of type tracing_map_elt, which is associated by the map to a given key. It provides various hooks allowing per-tracer customization and is separated out into a separate file in order to allow it to be shared between multiple tracers, but isn't meant to be generally used outside of that context. The tracing_map implementation was inspired by lock-free map algorithms originated by Dr. Cliff Click: http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf Link: http://lkml.kernel.org/r/b43d68d1add33582a396f553c8ef705a33a6a748.1449767187.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Add infrastructure to allow set_event_pid to follow childrenSteven Rostedt
Add the infrastructure needed to have the PIDs in set_event_pid to automatically add PIDs of the children of the tasks that have their PIDs in set_event_pid. This will also remove PIDs from set_event_pid when a task exits This is implemented by adding hooks into the fork and exit tracepoints. On fork, the PIDs are added to the list, and on exit, they are removed. Add a new option called event_fork that when set, PIDs in set_event_pid will automatically get their children PIDs added when they fork, as well as any task that exits will have its PID removed from set_event_pid. This works for instances as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Use pid bitmap instead of a pid array for set_event_pidSteven Rostedt
In order to add the ability to let tasks that are filtered by the events have their children also be traced on fork (and then not traced on exit), convert the array into a pid bitmask. Most of the time the number of pids is only 32768 pids or a 4k bitmask, which is the same size as the default list currently is, and that list could grow if more pids are listed. This also greatly simplifies the code. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19tracing: Rename check_ignore_pid() to ignore_this_task()Steven Rostedt
The name "check_ignore_pid" is confusing in trying to figure out if the pid should be ignored or not. Rename it to "ignore_this_task" which is pretty straight forward, as a task (not a pid) is passed in, and should if true should be ignored. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19locking/pvqspinlock: Fix division by zero in qstat_read()Davidlohr Bueso
While playing with the qstat statistics (in <debugfs>/qlockstat/) I ran into the following splat on a VM when opening pv_hash_hops: divide error: 0000 [#1] SMP ... RIP: 0010:[<ffffffff810b61fe>] [<ffffffff810b61fe>] qstat_read+0x12e/0x1e0 ... Call Trace: [<ffffffff811cad7c>] ? mem_cgroup_commit_charge+0x6c/0xd0 [<ffffffff8119750c>] ? page_add_new_anon_rmap+0x8c/0xd0 [<ffffffff8118d3b9>] ? handle_mm_fault+0x1439/0x1b40 [<ffffffff811937a9>] ? do_mmap+0x449/0x550 [<ffffffff811d3de3>] ? __vfs_read+0x23/0xd0 [<ffffffff811d4ab2>] ? rw_verify_area+0x52/0xd0 [<ffffffff811d4bb1>] ? vfs_read+0x81/0x120 [<ffffffff811d5f12>] ? SyS_read+0x42/0xa0 [<ffffffff815720f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8 Fix this by verifying that qstat_pv_kick_unlock is in fact non-zero, similarly to what the qstat_pv_latency_wake case does, as if nothing else, this can come from resetting the statistics, thus having 0 kicks should be quite valid in this context. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Waiman Long <Waiman.Long@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: waiman.long@hpe.com Link: http://lkml.kernel.org/r/1460961103-24953-1-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-18bpf: avoid warning for wrong pointer castArnd Bergmann
Two new functions in bpf contain a cast from a 'u64' to a pointer. This works on 64-bit architectures but causes a warning on all 32-bit architectures: kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output_tp': kernel/trace/bpf_trace.c:350:13: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] u64 ctx = *(long *)r1; This changes the cast to first convert the u64 argument into a uintptr_t, which is guaranteed to be the same size as a pointer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 9940d67c93b5 ("bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint programs") Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18Merge branch 'topic/livepatch' into nextMichael Ellerman
Merge the support for live patching on ppc64le using mprofile-kernel. This branch has also been merged into the livepatching tree for v4.7.
2016-04-16Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixlet from Ingo Molnar: "Fixes a build warning on certain Kconfig combinations" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix print_collision() unused warning
2016-04-15Merge branch 'topic/livepatch' of ↵Jiri Kosina
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into for-4.7/livepatching-ppc64le Pull livepatching support for ppc64 architecture from Michael Ellerman. Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-14bpf: convert relevant helper args to ARG_PTR_TO_RAW_STACKDaniel Borkmann
This patch converts all helpers that can use ARG_PTR_TO_RAW_STACK as argument type. For tc programs this is bpf_skb_load_bytes(), bpf_skb_get_tunnel_key(), bpf_skb_get_tunnel_opt(). For tracing, this optimizes bpf_get_current_comm() and bpf_probe_read(). The check in bpf_skb_load_bytes() for MAX_BPF_STACK can also be removed since the verifier already makes sure we stay within bounds on stack buffers. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14bpf, verifier: add ARG_PTR_TO_RAW_STACK typeDaniel Borkmann
When passing buffers from eBPF stack space into a helper function, we have ARG_PTR_TO_STACK argument type for helpers available. The verifier makes sure that such buffers are initialized, within boundaries, etc. However, the downside with this is that we have a couple of helper functions such as bpf_skb_load_bytes() that fill out the passed buffer in the expected success case anyway, so zero initializing them prior to the helper call is unneeded/wasted instructions in the eBPF program that can be avoided. Therefore, add a new helper function argument type called ARG_PTR_TO_RAW_STACK. The idea is to skip the STACK_MISC check in check_stack_boundary() and color the related stack slots as STACK_MISC after we checked all call arguments. Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every path of the helper function will fill the provided buffer area, so that we cannot leak any uninitialized stack memory. This f.e. means that error paths need to memset() the buffers, but the expected fast-path doesn't have to do this anymore. Since there's no such helper needing more than at most one ARG_PTR_TO_RAW_STACK argument, we can keep it simple and don't need to check for multiple areas. Should in future such a use-case really appear, we have check_raw_mode() that will make sure we implement support for it first. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14bpf, verifier: add bpf_call_arg_meta for passing meta dataDaniel Borkmann
Currently, when the verifier checks calls in check_call() function, we call check_func_arg() for all 5 arguments e.g. to make sure expected types are correct. In some cases, we collect meta data (here: map pointer) to perform additional checks such as checking stack boundary on key/value sizes for subsequent arguments. As we're going to extend the meta data, add a generic struct bpf_call_arg_meta that we can use for passing into check_func_arg(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>