summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2009-04-09kthread: move sched-realeted initialization from kthreadd contextOleg Nesterov
kthreadd is the single thread which implements ths "create" request, move sched_setscheduler/etc from create_kthread() to kthread_create() to improve the scalability. We should be careful with sched_setscheduler(), use _nochek helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-09kthread: Don't looking for a task in create_kthread() #2Vitaliy Gusev
Remove the unnecessary find_task_by_pid_ns(). kthread() can just use "current" to get the same result. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-08ptrace: some checkpatch fixesRoland McGrath
This fixes all the checkpatch --file complaints about kernel/ptrace.c and also removes an unused #include. I've verified that there are no changes to the compiled code on x86_64. Signed-off-by: Roland McGrath <roland@redhat.com> [ Removed the parts that just split a line - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-08perf_counter: allow for data addresses to be recordedPeter Zijlstra
Paul suggested we allow for data addresses to be recorded along with the traditional IPs as power can provide these. For now, only the software pagefault events provide data addresses, but in the future power might as well for some events. x86 doesn't seem capable of providing this atm. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.394816925@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: move PERF_RECORD_TIMEPeter Zijlstra
Move PERF_RECORD_TIME so that all the fixed length items come before the variable length ones. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.307926436@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: track task-comm dataPeter Zijlstra
Similar to the mmap data stream, add one that tracks the task COMM field, so that the userspace reporting knows what to call a task. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.127422406@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: use misc field to widen typePeter Zijlstra
Push the PERF_EVENT_COUNTER_OVERFLOW bit into the misc field so that we can have the full 32bit for PERF_RECORD_ bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130408.891867663@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: provide misc bits in the event headerPeter Zijlstra
Limit the size of each record to 64k (or should we count in multiples of u64 and have a 512K limit?), this gives 16 bits or spare room in the header, which we can use for misc bits, so as to not have to grow the record with u64 every time we have a few bits to report. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130408.769271806@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: fix NMI race in task clockPeter Zijlstra
We should not be updating ctx->time from NMI context, work around that. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130408.681326666@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)Oleg Nesterov
update_rlimit_cpu() tries to optimize out set_process_cpu_timer() in case when we already have CPUCLOCK_PROF timer which should expire first. But it uses cputime_lt() instead of cputime_gt(). Test case: int main(void) { struct itimerval it = { .it_value = { .tv_sec = 1000 }, }; assert(!setitimer(ITIMER_PROF, &it, NULL)); struct rlimit rl = { .rlim_cur = 1, .rlim_max = 1, }; assert(!setrlimit(RLIMIT_CPU, &rl)); for (;;) ; return 0; } Without this patch, the task is not killed as RLIMIT_CPU demands. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Peter Lojkin <ia6432@inbox.ru> Cc: Roland McGrath <roland@redhat.com> Cc: stable@kernel.org LKML-Reference: <20090327000610.GA10108@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08posix-timers: fix RLIMIT_CPU && fork()Oleg Nesterov
See http://bugzilla.kernel.org/show_bug.cgi?id=12911 copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus fastpath_timer_check() returns false unless we have other expired cpu timers. Change copy_signal() to set cputime_expires.prof_exp if we have RLIMIT_CPU. Also, set cputimer.running = 1 in that case. This is not strictly necessary, but imho makes sense. Reported-by: Peter Lojkin <ia6432@inbox.ru> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Peter Lojkin <ia6432@inbox.ru> Cc: Roland McGrath <roland@redhat.com> Cc: stable@kernel.org LKML-Reference: <20090327000607.GA10104@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08Merge commit 'v2.6.30-rc1' into sched/urgentIngo Molnar
Merge reason: update to latest upstream to queue up fix Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08Merge commit 'v2.6.30-rc1' into core/urgentIngo Molnar
Merge reason: need latest upstream to queue up dependent fix Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08futex: fixup unlocked requeue pi caseDarren Hart
Thomas's testing caught a problem when the requeue target futex is unowned and multiple tasks are requeued to it. This patch ensures the FUTEX_WAITERS bit gets set if futex_requeue() will requeue one or more tasks in addition to the one acquiring the lock. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-04-08Merge commit 'v2.6.30-rc1' into perfcounters/coreIngo Molnar
Conflicts: arch/powerpc/include/asm/systbl.h arch/powerpc/include/asm/unistd.h include/linux/init_task.h Merge reason: the conflicts are non-trivial: PowerPC placement of sys_perf_counter_open has to be mixed with the new preadv/pwrite syscalls. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07Merge branch 'core/softlockup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/softlockup' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softlockup: make DETECT_HUNG_TASK default depend on DETECT_SOFTLOCKUP softlockup: move 'one' to the softlockup section in sysctl.c softlockup: ensure the task has been switched out once softlockup: remove timestamp checking from hung_task softlockup: convert read_lock in hung_task to rcu_read_lock softlockup: check all tasks in hung_task softlockup: remove unused definition for spawn_softlockup_task softlockup: fix potential race in hung_task when resetting timeout softlockup: fix to allow compiling with !DETECT_HUNG_TASK softlockup: decouple hung tasks check from softlockup detection
2009-04-07Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y branch tracer: Fix for enabling branch profiling makes sparse unusable ftrace: Correct a text align for event format output Update /debug/tracing/README tracing/ftrace: alloc the started cpumask for the trace file tracing, x86: remove duplicated #include ftrace: Add check of sched_stopped for probe_sched_wakeup function-graph: add proper initialization for init task tracing/ftrace: fix missing include string.h tracing: fix incorrect return type of ns2usecs() tracing: remove CALLER_ADDR2 from wakeup tracer blktrace: fix pdu_len when tracing packet command requests blktrace: small cleanup in blk_msg_write() blktrace: NUL-terminate user space messages tracing: move scripts/trace/power.pl to scripts/tracing/power.pl
2009-04-07Merge branch 'irq/threaded' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq/threaded' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: fix devres.o build for GENERIC_HARDIRQS=n genirq: provide old request_irq() for CONFIG_GENERIC_HARDIRQ=n genirq: threaded irq handlers review fixups genirq: add support for threaded interrupts to devres genirq: add threaded interrupt handler support
2009-04-07Merge commit 'origin/master' into for-linus/xen/masterJeremy Fitzhardinge
* commit 'origin/master': (4825 commits) Fix build errors due to CONFIG_BRANCH_TRACER=y parport: Use the PCI IRQ if offered tty: jsm cleanups Adjust path to gpio headers KGDB_SERIAL_CONSOLE check for module Change KCONFIG name tty: Blackin CTS/RTS Change hardware flow control from poll to interrupt driven Add support for the MAX3100 SPI UART. lanana: assign a device name and numbering for MAX3100 serqt: initial clean up pass for tty side tty: Use the generic RS485 ioctl on CRIS tty: Correct inline types for tty_driver_kref_get() splice: fix deadlock in splicing to file nilfs2: support nanosecond timestamp nilfs2: introduce secondary super block nilfs2: simplify handling of active state of segments nilfs2: mark minor flag for checkpoint created by internal operation nilfs2: clean up sketch file nilfs2: super block operations fix endian bug ... Conflicts: arch/x86/include/asm/thread_info.h arch/x86/lguest/boot.c drivers/xen/manage.c
2009-04-07kprobes: support per-kprobe disablingMasami Hiramatsu
Add disable_kprobe() and enable_kprobe() to disable/enable kprobes temporarily. disable_kprobe() asynchronously disables probe handlers of specified kprobe. So, after calling it, some handlers can be called at a while. enable_kprobe() enables specified kprobe. aggr_pre_handler and aggr_post_handler check disabled probes. On the other hand aggr_break_handler and aggr_fault_handler don't check it because these handlers will be called while executing pre or post handlers and usually those help error handling. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07kprobes: rename kprobe_enabled to kprobes_all_disarmedMasami Hiramatsu
Rename kprobe_enabled to kprobes_all_disarmed and invert logic due to avoiding naming confusion from per-probe disabling. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07kprobes: move EXPORT_SYMBOL_GPL just after function definitionsMasami Hiramatsu
Clean up positions of EXPORT_SYMBOL_GPL in kernel/kprobes.c according to checkpatch.pl. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07kprobes: cleanup aggr_kprobe related codeMasami Hiramatsu
Currently, kprobes can disable all probes at once, but can't disable it individually (not unregister, just disable an kprobe, because unregistering needs to wait for scheduler synchronization). These patches introduce APIs for on-the-fly per-probe disabling and re-enabling by dis-arming/re-arming its breakpoint instruction. This patch: Change old_p to ap in add_new_kprobe() for readability, copy flags member in add_aggr_kprobe(), and simplify the code flow of register_aggr_kprobe(). Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07mm: add /proc controls for pdflush threadsPeter W Morreale
Add /proc entries to give the admin the ability to control the minimum and maximum number of pdflush threads. This allows finer control of pdflush on both large and small machines. The rationale is simply one size does not fit all. Admins on large and/or small systems may want to tune the min/max pdflush thread count to best suit their needs. Right now the min/max is hardcoded to 2/8. While probably a fair estimate for smaller machines, large machines with large numbers of CPUs and large numbers of filesystems/block devices may benefit from larger numbers of threads working on different block devices. Even if the background flushing algorithm is radically changed, it is still likely that multiple threads will be involved and admins would still desire finer control on the min/max other than to have to recompile the kernel. The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions. The minimum value for nr_pdflush_threads_min is 1 and the maximum value is the current value of nr_pdflush_threads_max. This minimum is required since additional thread creation is performed in a pdflush thread itself. The minimum value for nr_pdflush_threads_max is the current value of nr_pdflush_threads_min and the maximum value can be 1000. Documentation/sysctl/vm.txt is also updated. [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly] Signed-off-by: Peter W Morreale <pmorreale@novell.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07ftrace: clean up enable logic for sched_switchZhaolei
Unify sched_switch and sched_wakeup's action to following logic: Do record_cmdline when start_cmdline_record() is called. Start tracing events when the tracer is started. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> LKML-Reference: <49D1C596.5050203@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07function-graph: use int instead of atomic for ftrace_graph_activeSteven Rostedt
Impact: cleanup The variable ftrace_graph_active is only modified under the ftrace_lock mutex, thus an atomic is not necessary for modification. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07tracing/ftrace: factorize the tracing files creationFrederic Weisbecker
Impact: cleanup Most of the tracing files creation follow the same pattern: ret = debugfs_create_file(...) if (!ret) pr_warning("Couldn't create ... entry\n") Unify it! Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1238109938-11840-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07tracing: use macros to denote usec and nsec per secondLi Zefan
Impact: cleanup Use USEC_PER_SEC and NSEC_PER_SEC instead of 1000000 and 1000000000. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <49CC7870.9000309@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-07Merge branch 'tracing/urgent' into tracing/ftraceIngo Molnar
2009-04-07ftrace: Correct a text align for event format outputZhaolei
If we cat debugfs/tracing/events/ftrace/bprint/format, we'll see: name: bprint ID: 6 format: field:unsigned char common_type; offset:0; size:1; field:unsigned char common_flags; offset:1; size:1; field:unsigned char common_preempt_count; offset:2; size:1; field:int common_pid; offset:4; size:4; field:int common_tgid; offset:8; size:4; field:unsigned long ip; offset:12; size:4; field:char * fmt; offset:16; size:4; field: char buf; offset:20; size:0; print fmt: "%08lx (%d) fmt:%p %s" There is an inconsistent blank before char buf. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> LKML-Reference: <49D5E3EE.70201@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07Update /debug/tracing/READMENikanth Karthikesan
Some of the tracers have been renamed, which was not updated in the in-kernel run-time README file. Update it. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> LKML-Reference: <200903231158.32151.knikanth@suse.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07tracing/ftrace: alloc the started cpumask for the trace fileFrederic Weisbecker
Impact: fix a crash while cat trace file Currently we are using a cpumask to remind each cpu where a trace occured. It lets us notice the user that a cpu just had its first trace. But on latest -tip we have the following crash once we cat the trace file: IP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP last sysfs file: /sys/class/net/eth0/carrier Pid: 3897, comm: cat Not tainted (2.6.29-tip-02825-g0f22972-dirty #81) EIP: 0060:[<c0270c4a>] EFLAGS: 00010297 CPU: 0 EIP is at print_trace_fmt+0x45/0xe7 EAX: 00000000 EBX: 00000000 ECX: c12d9e98 EDX: ccdb7010 ESI: d31f4000 EDI: 00322401 EBP: d31f3f10 ESP: d31f3efc DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process cat (pid: 3897, ti=d31f2000 task=d3b3cf20 task.ti=d31f2000) Stack: d31f4080 ccdb7010 d31f4000 d691fe70 ccdb7010 d31f3f24 c0270e5c d31f4000 d691fe70 d31f4000 d31f3f34 c02718e8 c12d9e98 d691fe70 d31f3f70 c02bfc33 00001000 09130000 d3b46e00 d691fe98 00000000 00000079 00000001 00000000 Call Trace: [<c0270e5c>] ? print_trace_line+0x170/0x17c [<c02718e8>] ? s_show+0xa7/0xbd [<c02bfc33>] ? seq_read+0x24a/0x327 [<c02bf9e9>] ? seq_read+0x0/0x327 [<c02ab18b>] ? vfs_read+0x86/0xe1 [<c02ab289>] ? sys_read+0x40/0x65 [<c0202d8f>] ? sysenter_do_call+0x12/0x3c Code: 00 00 00 89 45 ec f7 c7 00 20 00 00 89 55 f0 74 4e f6 86 98 10 00 00 02 74 45 8b 86 8c 10 00 00 8b 9e a8 10 00 00 e8 52 f3 ff ff <0f> a3 03 19 c0 85 c0 75 2b 8b 86 8c 10 00 00 8b 9e a8 10 00 00 EIP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 SS:ESP 0068:d31f3efc CR2: 0000000000000000 ---[ end trace aa9cf38e5ebed9dd ]--- This is because we alloc the iter->started cpumask on tracing_pipe_open but not on tracing_open. It hadn't been noticed until now because we need to have ring buffer overruns to activate the starting of cpu buffer detection. Also, we need a check to not print the messagge for the first trace on the file. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1238619188-6109-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07ftrace: Add check of sched_stopped for probe_sched_wakeupZhaolei
The wakeup tracing in sched_switch does not stop when a user disables tracing. This is because the probe_sched_wakeup() is missing the check to prevent the wakeup from being traced. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> LKML-Reference: <49D1C543.3010307@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07tracing/ftrace: fix missing include string.hFrederic Weisbecker
Building a kernel with tracing can raise the following warning on tip/master: kernel/trace/trace.c:1249: error: implicit declaration of function 'vbin_printf' We are missing an include to string.h Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1238160130-7437-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07tracing: fix incorrect return type of ns2usecs()Lai Jiangshan
Impact: fix time output bug in 32bits system ns2usecs() returns 'long', it's incorrect. (In i386) ... <idle>-0 [000] 521.442100: _spin_lock <-tick_do_update_jiffies64 <idle>-0 [000] 521.442101: do_timer <-tick_do_update_jiffies64 <idle>-0 [000] 521.442102: update_wall_time <-do_timer <idle>-0 [000] 521.442102: update_xtime_cache <-update_wall_time .... (It always print the time less than 2200 seconds besides ...) Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds) ... <idle>-0 [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq <idle>-0 [001] 4154502640.134760: _local_bh_enable <-__do_softirq <idle>-0 [001] 4154502640.134761: idle_cpu <-irq_exit ... (very large value) Because 'long' is a signed type and it is 32bits in i386. Changes in v2: return 'unsigned long long' instead of 'cycle_t' Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <49D05D10.4030009@cn.fujitsu.com> Reported-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07tracing: remove CALLER_ADDR2 from wakeup tracerSteven Rostedt
Maneesh Soni was getting a crash when running the wakeup tracer. We debugged it down to the recording of the function with the CALLER_ADDR2 macro. This is used to get the location of the caller to schedule. But the problem comes when schedule is called by assmebly. In the case that Maneesh had, retint_careful would call schedule. But retint_careful does not set up a proper frame pointer. CALLER_ADDR2 is defined as __builtin_return_address(2). This produces the following assembly in the wakeup tracer code. mov 0x0(%rbp),%rcx <--- get the frame pointer of the caller mov %r14d,%r8d mov 0xf2de8e(%rip),%rdi mov 0x8(%rcx),%rsi <-- this is __builtin_return_address(1) mov 0x28(%rdi,%rax,8),%rbx mov (%rcx),%rax <-- get the frame pointer of the caller's caller mov %r12,%rcx mov 0x8(%rax),%rdx <-- this is __builtin_return_address(2) At the reading of 0x8(%rax) Maneesh's machine would take a fault. The reason is that retint_careful did not set up the return address and the content of %rax here was zero. To verify this, I sent Maneesh a patch to create a frame pointer in retint_careful. He ran the test again but this time he would take the same type of fault from sysret_careful. The retint_careful was no longer an issue, but there are other callers that still have issues. Instead of adding frame pointers for all callers to schedule (in possibly all archs), it is much safer to simply not use CALLER_ADDR2. This loses out on knowing what called schedule, but the function tracer will help there if needed. Reported-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07Merge branch 'linus' into tracing/coreIngo Molnar
Merge reason: update to upstream tracing facilities Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07Merge branch 'tracing/blktrace-fixes' into tracing/urgentIngo Molnar
Merge reason: this used to be a tracing/blktrace-v2 devel topic still cooking during the merge window - has propagated to fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07x86, ptrace: add bts context unconditionallyMarkus Metzger
Add the ptrace bts context field to task_struct unconditionally. Initialize the field directly in copy_process(). Remove all the unneeded functionality used to initialize that field. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144603.292754000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07x86, hw-branch-tracer: allocate selftest iterator on heapMarkus Metzger
Allocate the trace_iterator for the hw-branch-tracer selftest on the heap. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144556.578777000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07x86, bts, hw-branch-tracer: add _noirq variants to the debug store interfaceMarkus Metzger
The hw-branch-tracer uses debug store functions from an on_each_cpu() context, which is simply wrong since the functions may sleep. Add _noirq variants for most functions, which may be called with interrupts disabled. Separate per-cpu and per-task tracing and allow per-cpu tracing to be controlled from any cpu. Make the hw-branch-tracer use the new debug store interface, synchronize with hotplug cpu event using get/put_online_cpus(), and remove the unnecessary spinlock. Make the ptrace bts and the ds selftest code use the new interface. Defer the ds selftest. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144555.658136000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07sched, hw-branch-tracer: add wait_task_context_switch() function to sched.hMarkus Metzger
Add a function to wait until some other task has been switched out at least once. This differs from wait_task_inactive() subtly, in that the latter will wait until the task has left the CPU. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Cc: markus.t.metzger@gmail.com Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144549.794157000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07Merge branch 'linus' into tracing/hw-branch-tracingIngo Molnar
Merge reason: update to latest tracing and ptrace APIs Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07Merge branch 'linus' into perfcounters/coreIngo Molnar
Merge reason: need the upstream facility added by: 7f1e2ca: hrtimer: fix rq->lock inversion (again) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07Merge branch 'linus' into core/softlockupIngo Molnar
Conflicts: kernel/sysctl.c
2009-04-07perf_counter: minimize context time updatesPeter Zijlstra
Push the update_context_time() calls up the stack so that we get less invokations and thereby a less noisy output: before: # ./perfstat -e 1:0 -e 1:1 -e 1:1 -e 1:1 -l ls > /dev/null Performance counter stats for 'ls': 10.163691 cpu clock ticks (msecs) (scaled from 98.94%) 10.215360 task clock ticks (msecs) (scaled from 98.18%) 10.185549 task clock ticks (msecs) (scaled from 98.53%) 10.183581 task clock ticks (msecs) (scaled from 98.71%) Wall-clock time elapsed: 11.912858 msecs after: # ./perfstat -e 1:0 -e 1:1 -e 1:1 -e 1:1 -l ls > /dev/null Performance counter stats for 'ls': 9.316630 cpu clock ticks (msecs) 9.280789 task clock ticks (msecs) 9.280789 task clock ticks (msecs) 9.280789 task clock ticks (msecs) Wall-clock time elapsed: 9.574872 msecs Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090406094518.618876874@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07perf_counter: remove rq->lock usagePeter Zijlstra
Now that all the task runtime clock users are gone, remove the ugly rq->lock usage from perf counters, which solves the nasty deadlock seen when a software task clock counter was read from an NMI overflow context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090406094518.531137582@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07perf_counter: rework the task clock software counterPeter Zijlstra
Rework the task clock software counter to use the context time instead of the task runtime clock, this removes the last such user. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090406094518.445450972@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07perf_counter: rework context timePeter Zijlstra
Since perf_counter_context is switched along with tasks, we can maintain the context time without using the task runtime clock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090406094518.353552838@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07perf_counter: change event definitionPeter Zijlstra
Currently the definition of an event is slightly ambiguous. We have wakeup events, for poll() and SIGIO, which are either generated when a record crosses a page boundary (hw_events.wakeup_events == 0), or every wakeup_events new records. Now a record can be either a counter overflow record, or a number of different things, like the mmap PROT_EXEC region notifications. Then there is the PERF_COUNTER_IOC_REFRESH event limit, which only considers counter overflows. This patch changes then wakeup_events and SIGIO notification to only consider overflow events. Furthermore it changes the SIGIO notification to report SIGHUP when the event limit is reached and the counter will be disabled. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090406094518.266679874@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>