summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2009-10-05time: Remove xtime_cachejohn stultz
With the prior logarithmic time accumulation patch, xtime will now always be within one "tick" of the current time, instead of possibly half a second off. This removes the need for the xtime_cache value, which always stored the time at the last interrupt, so this patch cleans that up removing the xtime_cache related code. This is a bit simpler, but still could use some wider testing. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1254525855.7741.95.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05time: Implement logarithmic time accumulationjohn stultz
Accumulating one tick at a time works well unless we're using NOHZ. Then it can be an issue, since we may have to run through the loop a few thousand times, which can increase timer interrupt caused latency. The current solution was to accumulate in half-second intervals with NOHZ. This kept the number of loops down, however it did slightly change how we make NTP adjustments. While not an issue with NTPd users, as NTPd makes adjustments over a longer period of time, other adjtimex() users have noticed the half-second granularity with which we can apply frequency changes to the clock. For instance, if a application tries to apply a 100ppm frequency correction for 20ms to correct a 2us offset, with NOHZ they either get no correction, or a 50us correction. Now, there will always be some granularity error for applying frequency corrections. However with users sensitive to this error have seen a 50-500x increase with NOHZ compared to running without NOHZ. So I figured I'd try another approach then just simply increasing the interval. My approach is to consume the time interval logarithmically. This reduces the number of times through the loop needed keeping latency down, while still preserving the original granularity error for adjtimex() changes. Further, this change allows us to remove the xtime_cache code (patch to follow), as xtime is always within one tick of the current time, instead of the half-second updates it saw before. An earlier version of this patch has been shipping to x86 users in the RedHat MRG releases for awhile without issue, but I've reworked this version to be even more careful about avoiding possible overflows if the shift value gets too large. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1254525473.7741.88.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05sched: Set correct normal_prio and prio values in sched_fork()Peter Williams
normal_prio should be updated if policy changes from RT to SCHED_MORMAL or if static_prio/nice is changed. Some paths through sched_fork() ignore this requirement and may result in normal_prio having an invalid value. Fixing this issue allows the call to effective_prio() in wake_up_new_task() to be removed. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <f8f46736fd4e7f090ac0.1253774830@mudlark.pw.nest> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05tracing: Use free_percpu instead of kfreeFrederic Weisbecker
In the event->profile_enable() failure path, we release the per cpu buffers using kfree which is wrong because they are per cpu pointers. Although free_percpu only wraps kfree for now, that may change in the future so lets use the correct way. Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-05tracing: Check total refcount before releasing bufs in profile_enable failureFrederic Weisbecker
When we call the profile_enable() callback of an event, we release the shared perf event tracing buffers unconditionnaly in the failure path. This is wrong because there may be other users of these. Then check the total refcount before doing this. Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-04Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits) Revert "Seperate read and write statistics of in_flight requests" cfq-iosched: don't delay async queue if it hasn't dispatched at all block: Topology ioctls cfq-iosched: use assigned slice sync value, not default cfq-iosched: rename 'desktop' sysfs entry to 'low_latency' cfq-iosched: implement slower async initiate and queue ramp up cfq-iosched: delay async IO dispatch, if sync IO was just done cfq-iosched: add a knob for desktop interactiveness Add a tracepoint for block request remapping block: allow large discard requests block: use normal I/O path for discard requests swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL fs/bio.c: move EXPORT* macros to line after function Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs cciss: fix build when !PROC_FS block: Do not clamp max_hw_sectors for stacking devices block: Set max_sectors correctly for stacking devices cciss: cciss_host_attr_groups should be const cciss: Dynamically allocate the drive_info_struct for each logical drive. cciss: Add usage_count attribute to each logical drive in /sys ...
2009-10-04HWPOISON: Clean up PR_MCE_KILL interfaceAndi Kleen
While writing the manpage I noticed some shortcomings in the current interface. - Define symbolic names for all the different values - Boundary check the kill mode values - For symmetry add a get interface too. This allows library code to get/set the current state. - For consistency define a PR_MCE_KILL_DEFAULT value Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-10-03this_cpu: Use this_cpu operations in RCUChristoph Lameter
RCU does not do dynamic allocations but it increments per cpu variables a lot. These instructions results in a move to a register and then back to memory. This patch will make it use the inc/dec instructions on x86 that do not need a register. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-03tracing/ftrace: Fix to check create_event_dir() when adding new eventsMasami Hiramatsu
Check result of event_create_dir() and add ftrace_event_call to ftrace_events list only if it is succeeded. Thanks to Li for pointing it out. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@redhat.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <20090925182054.10157.55219.stgit@omoto> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-10-03tracing/kprobes: Use global event perf buffers in kprobe tracerMasami Hiramatsu
Use new percpu global event buffer instead of stack in kprobe tracer while tracing through perf. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Baron <jbaron@redhat.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <20090925182011.10157.60140.stgit@omoto> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-10-02percpu: kill legacy percpu allocatorTejun Heo
With ia64 converted, there's no arch left which still uses legacy percpu allocator. Kill it. Signed-off-by: Tejun Heo <tj@kernel.org> Delightedly-acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org>
2009-10-01memcg: some modification to softlimit under hierarchical memory reclaim.KAMEZAWA Hiroyuki
This patch clean up/fixes for memcg's uncharge soft limit path. Problems: Now, res_counter_charge()/uncharge() handles softlimit information at charge/uncharge and softlimit-check is done when event counter per memcg goes over limit. Now, event counter per memcg is updated only when memory usage is over soft limit. Here, considering hierarchical memcg management, ancesotors should be taken care of. Now, ancerstors(hierarchy) are handled in charge() but not in uncharge(). This is not good. Prolems: 1. memcg's event counter incremented only when softlimit hits. That's bad. It makes event counter hard to be reused for other purpose. 2. At uncharge, only the lowest level rescounter is handled. This is bug. Because ancesotor's event counter is not incremented, children should take care of them. 3. res_counter_uncharge()'s 3rd argument is NULL in most case. ops under res_counter->lock should be small. No "if" sentense is better. Fixes: * Removed soft_limit_xx poitner and checks in charge and uncharge. Do-check-only-when-necessary scheme works enough well without them. * make event-counter of memcg incremented at every charge/uncharge. (per-cpu area will be accessed soon anyway) * All ancestors are checked at soft-limit-check. This is necessary because ancesotor's event counter may never be modified. Then, they should be checked at the same time. Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01cgroup: catch bad css refcnt at css_putKAMEZAWA Hiroyuki
__css_put() doesn't check a bug as refcnt goes to minus. I think it should be caught. This patch adds a check for it. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01const: constify remaining file_operationsAlexey Dobriyan
[akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01module: fix up CONFIG_KALLSYMS=n build.Paul Mundt
Starting from commit 4a4962263f07d14660849ec134ee42b63e95ea9a "reduce symbol table for loaded modules (v2)", the kernel/module.c build is broken with CONFIG_KALLSYMS disabled. CC kernel/module.o kernel/module.c:1995: warning: type defaults to 'int' in declaration of 'Elf_Hdr' kernel/module.c:1995: error: expected ';', ',' or ')' before '*' token kernel/module.c: In function 'load_module': kernel/module.c:2203: error: 'strmap' undeclared (first use in this function) kernel/module.c:2203: error: (Each undeclared identifier is reported only once kernel/module.c:2203: error: for each function it appears in.) kernel/module.c:2239: error: 'symoffs' undeclared (first use in this function) kernel/module.c:2239: error: implicit declaration of function 'layout_symtab' kernel/module.c:2240: error: 'stroffs' undeclared (first use in this function) make[1]: *** [kernel/module.o] Error 1 make: *** [kernel/module.o] Error 2 There are three different issues: - layout_symtab() takes a const Elf_Ehdr - layout_symtab() needs to return a value - symoffs/stroffs/strmap are referenced by the load_module() code despite being ifdefed out, which seems unnecessary given the noop behaviour of layout_symtab()/add_kallsyms() in the case of CONFIG_KALLSYMS=n. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Jan Beulich <jbeulich@novell.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01Add a tracepoint for block request remappingJun'ichi Nomura
Since 2.6.31 now has request-based device-mapper, it's useful to have a tracepoint for request-remapping as well as bio-remapping. This patch adds a tracepoint for request-remapping, trace_block_rq_remap(). Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfsZdenek Kabelac
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82. Release kobject also in case the request_fn is NULL. Problem was noticed via kmemleak backtrace when some sysfs entries were note properly destroyed during device removal: unreferenced object 0xffff88001aa76640 (size 80): comm "lvcreate", pid 2120, jiffies 4294885144 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e...... 90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S..... backtrace: [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60 [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0 [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120 [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0 [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0 [<ffffffff81197d93>] sysfs_create_group+0x13/0x20 [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20 [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0 [<ffffffff812447e4>] add_disk+0x94/0x160 [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod] [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod] [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod] [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod] [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0 [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01core, x86: Add user return notifiersAvi Kivity
Add a general per-cpu notifier that is called whenever the kernel is about to return to userspace. The notifier uses a thread_info flag and existing checks, so there is no impact on user return or context switch fast paths. This will be used initially to speed up KVM task switching by lazily updating MSRs. Signed-off-by: Avi Kivity <avi@redhat.com> LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-01kmemtrace: Fix up tracer registrationPaul Mundt
Commit ddc1637af217dbd8bc51f30e6d24e84476a869a6 ("kmemtrace: Print binary output only if 'bin' option is set") ended up inverting the error detection logic. register_tracer() returns 0 on success, which this change caused to treat as an error, resulting in: [ 0.132000] Warning: could not register the kmem tracer as well as bailing out of the initcall with an error value. This restores the old logic. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <20090928075540.GD6668@linux-sh.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01Merge branch 'tracing/urgent' into tracing/coreIngo Molnar
Merge reason: Pick up latest fixes and update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01perf_event: Clean up perf_event_init_task()Xiao Guangrong
While at it: we can traverse ctx->group_list to get all group leader, it should be safe since we hold ctx->mutex. Changlog v1->v2: - remove WARN_ON_ONCE() according to Peter Zijlstra's suggestion Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <4ABC5AF9.6060808@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01perf_event: Fix event group handling in __perf_event_sched_*()Xiao Guangrong
Paul Mackerras says: "Actually, looking at this more closely, it has to be a group leader anyway since it's at the top level of ctx->group_list. In fact I see four places where we do: list_for_each_entry(event, &ctx->group_list, group_entry) { if (event == event->group_leader) ... or the equivalent, three of which appear to have been introduced by afedadf2 ("perf_counter: Optimize sched in/out of counters") back in May by Peter Z. As far as I can see the if () is superfluous in each case (a singleton event will be a group of 1 and will have its group_leader pointing to itself)." [ See: http://marc.info/?l=linux-kernel&m=125361238901442&w=2 ] And Peter Zijlstra points out this is a bugfix: "The intent was to call event_sched_{in,out}() for single event groups because that's cheaper than group_sched_{in,out}(), however.. - as you noticed, I got the condition wrong, it should have read: list_empty(&event->sibling_list) - it failed to call group_can_go_on() which deals with ->exclusive. - it also doesn't call hw_perf_group_sched_in() which might break power." [ See: http://marc.info/?l=linux-kernel&m=125369523318583&w=2 ] Changelog v1->v2: - Fix the title name according to Peter Zijlstra's suggestion - Remove the comments and WARN_ON_ONCE() as Peter Zijlstra's suggestion Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <4ABC5A55.7000208@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01tracing: Fix infinite recursion in ftrace_update_pid_func()Matt Fleming
When CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST is enabled __ftrace_trace_function contains the current trace function, not ftrace_trace_function. In ftrace_update_pid_func() we currently incorrectly assign the value of ftrace_trace_function to __ftrace_trace_funcion before returning. Without this patch it is possible to execute an infinite recursion whereby ftrace_test_stop_func() calls __ftrace_trace_function, which was assigned ftrace_test_stop_func() in ftrace_update_pid_func(). Signed-off-by: Matt Fleming <matthew.fleming@imgtec.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1254152581-18347-1-git-send-email-matt@console-pimps.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-30sched_clock: Fix atomicity/continuity bug by using cmpxchg64()Eric Dumazet
Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed cmpxchg() of 64bit values was available on X86_32. That is not so - and causes some subtle scheduler misbehavior due to incorrect timestamps off to up by ~4 seconds. Two symptoms are known right now: - interactivity problems seen by Arjan: up to 600 msecs latencies instead of the expected 20-40 msecs. These latencies are very visible on the desktop. - incorrect CPU stats: occasionally too high percentages in 'top', and crazy CPU usage stats. Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090930170754.0886ff2e@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-27const: mark struct vm_struct_operationsAlexey Dobriyan
* mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Eliminate needless reprogramming of clock events device
2009-09-26Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Add memory barrier commentary to futex_wait_queue_me() futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me() futex: Correct futex_q woken state commentary futex: Make function kernel-doc commentary consistent futex: Correct queue_me and unqueue_me commentary futex: Correct futex_wait_requeue_pi() commentary
2009-09-26Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: Resume clocksource without taking the clocksource mutex
2009-09-26Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: modules, tracing: Remove stale struct marker signature from module_layout() tracing/workqueue: Use %pf in workqueue trace events tracing: Fix a comment and a trivial format issue in tracepoint.h tracing: Fix failure path in ftrace_regex_open() tracing: Fix failure path in ftrace_graph_write() tracing: Check the return value of trace_get_user() tracing: Fix off-by-one in trace_get_user()
2009-09-26hrtimer: Remove overly verbose "switch to high res mode" messageRoland Dreier
On big systems, printing <number of CPUs> copies of Switched to high resolution mode on CPU nnn clutters up the kernel log for minimal gain. Just get rid of them. Signed-off-by: Roland Dreier <rolandd@cisco.com> LKML-Reference: <ada1vlw126s.fsf_-_@cisco.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller
Conflicts: drivers/staging/Kconfig drivers/staging/Makefile drivers/staging/cpc-usb/TODO drivers/staging/cpc-usb/cpc-usb_drv.c drivers/staging/cpc-usb/cpc.h drivers/staging/cpc-usb/cpc_int.h drivers/staging/cpc-usb/cpcusb.h
2009-09-24clocksource: Resume clocksource without taking the clocksource mutexMartin Schwidefsky
git commit 75c5158f70c065b9 converted the clocksource spinlock to a mutex. This causes the following BUG: BUG: sleeping function called from invalid context at kernel/mutex.c:280 in_atomic(): 0, irqs_disabled(): 1, pid: 2473, name: pm-suspend 2 locks held by pm-suspend/2473: #0: (&buffer->mutex){......}, at: [<ffffffff8115ab13>] sysfs_write_file+0x3c/0x137 #1: (pm_mutex){......}, at: [<ffffffff810865b5>] enter_state+0x39/0x130 Pid: 2473, comm: pm-suspend Not tainted 2.6.31 #1 Call Trace: [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24 [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43 [<ffffffff81073537>] clocksource_resume+0x1c/0x60 [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8 [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf [<ffffffff812aef79>] sysdev_resume+0x6d/0xae [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af [<ffffffff8108665b>] enter_state+0xdf/0x130 [<ffffffff81085dc3>] state_store+0xb6/0xd3 [<ffffffff81204c73>] kobj_attr_store+0x17/0x19 [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137 [<ffffffff811057d2>] vfs_write+0xae/0x10b [<ffffffff81208392>] ? __up_read+0x1a/0x7f [<ffffffff811058ef>] sys_write+0x4a/0x6e [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b clocksource_resume is called early in the resume process, there is only one cpu, no processes are running and the interrupts are disabled. It is therefore possible to resume the clocksources without taking the clocksource mutex. Reported-by: Xiaotian Feng <xtfeng@gmail.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Cc: Xiaotian Feng <xtfeng@gmail.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24futex: Add memory barrier commentary to futex_wait_queue_me()Darren Hart
The memory barrier semantics of futex_wait_queue_me() are non-obvious. Add some commentary to try and clarify it. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090924185447.694.38948.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24tracing/filters: Unify the regex parsing helpersFrederic Weisbecker
The filter code has stolen the regex parsing function from ftrace to get the regex support. We have duplicated this code, so factorize it in the filter area and make it generally available, as the filter code is the most suited to host this feature. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-09-24tracing/filters: Provide basic regex supportFrederic Weisbecker
This patch provides basic support for regular expressions in filters. It supports the following types of regexp: - *match_beginning - *match_middle* - match_end* - !don't match Example: cd /debug/tracing/events/bkl/lock_kernel echo 'file == "*reiserfs*"' > filter echo 1 > enable gedit-4941 [000] 457.735437: lock_kernel: depth: 0, fs/reiserfs/namei.c:334 reiserfs_lookup() sync_supers-227 [001] 461.379985: lock_kernel: depth: 0, fs/reiserfs/super.c:69 reiserfs_sync_fs() sync_supers-227 [000] 461.383096: lock_kernel: depth: 0, fs/reiserfs/journal.c:1069 flush_commit_list() reiserfs/1-1369 [001] 461.479885: lock_kernel: depth: 0, fs/reiserfs/journal.c:3509 flush_async_commits() Every string is now handled as a regexp in the filter framework, which helps to factorize the code for handling both simple strings and regexp comparisons. (The regexp parsing code has been wildly cherry picked from ftrace.c written by Steve.) v2: Simplify the whole and drop the filter_regex file Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-09-24Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (24 commits) microblaze: Disable heartbeat/enable emaclite in defconfigs microblaze: Support simpleImage.dts make target microblaze: Fix _start symbol to physical address microblaze: Use LOAD_OFFSET macro to get correct LMA for all sections microblaze: Create the LOAD_OFFSET macro used to compute VMA vs LMA offsets microblaze: Copy ppc asm-compat.h for clean handling of constants in asm and C microblaze: Actually show KiB rather than pages in "Freeing initrd memory:" microblaze: Support ptrace syscall tracing. microblaze: Updated CPU version and FPGA family codes in PVR microblaze: Generate correct signal and siginfo for integer div-by-zero microblaze: Don't be noisy when userspace causes hardware exceptions microblaze: Remove ipc.h file which points to non-existing asm-generic file microblaze: Clear sticky FSR register after generating exception signals microblaze: Ensure CPU usermode is set on new userspace processes microblaze: Use correct kbuild variable KBUILD_CFLAGS microblaze: Save and restore msr in hw exception microblaze: Add architectural support for USB EHCI host controllers microblaze: Implement include/asm/syscall.h. microblaze: Improve checking mechanism for MSR instruction microblaze: Add checking mechanism for MSR instruction ...
2009-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: module: don't call percpu_modfree on NULL pointer. module: fix memory leak when load fails after srcversion/version allocated module: preferred way to use MODULE_AUTHOR param: allow whitespace as kernel parameter separator module: reduce string table for loaded modules (v2) module: reduce symbol table for loaded modules (v2)
2009-09-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-currentLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: lsm: Use a compressed IPv6 string format in audit events Audit: send signal info if selinux is disabled Audit: rearrange audit_context to save 16 bytes per struct Audit: reorganize struct audit_watch to save 8 bytes
2009-09-25module: don't call percpu_modfree on NULL pointer.Rusty Russell
The general one handles NULL, the static obsolescent (CONFIG_HAVE_LEGACY_PER_CPU_AREA) one in module.c doesn't; Eric's commit 720eba31 assumed it did, and various frobbings since then kept that assumption. All other callers in module.c all protect it with an if; this effectively does the same as free_init is only goto if we fail percpu_modalloc(). Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
2009-09-25module: fix memory leak when load fails after srcversion/version allocatedRusty Russell
Normally the twisty paths of sysfs will free the attributes, but not if we fail before we hook it into sysfs (which is the last thing we do in load_module). (This sysfs code is a turd, no doubt there are other issues lurking too). Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
2009-09-25param: allow whitespace as kernel parameter separatorPeter Oberparleiter
Some boot mechanisms require that kernel parameters are stored in a separate file which is loaded to memory without further processing (e.g. the "Load from FTP" method on s390). When such a file contains newline characters, the kernel parameter preceding the newline might not be correctly parsed (due to the newline being stuck to the end of the actual parameter value) which can lead to boot failures. This patch improves kernel command line usability in such a situation by allowing generic whitespace characters as separators between kernel parameters. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25module: reduce string table for loaded modules (v2)Jan Beulich
Also remove all parts of the string table (referenced by the symbol table) that are not needed for kallsyms use (i.e. which were only referenced by symbols discarded by the previous patch, or not referenced at all for whatever reason). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25module: reduce symbol table for loaded modules (v2)Jan Beulich
Discard all symbols not interesting for kallsyms use: absolute, section, and in the common case (!KALLSYMS_ALL) data ones. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24Merge branch 'hwpoison' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits) HWPOISON: Enable error_remove_page on btrfs HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs HWPOISON: Add madvise() based injector for hardware poisoned pages v4 HWPOISON: Enable error_remove_page for NFS HWPOISON: Enable .remove_error_page for migration aware file systems HWPOISON: The high level memory error handler in the VM v7 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process HWPOISON: shmem: call set_page_dirty() with locked page HWPOISON: Define a new error_remove_page address space op for async truncation HWPOISON: Add invalidate_inode_page HWPOISON: Refactor truncate to allow direct truncating of page v2 HWPOISON: check and isolate corrupted free pages v2 HWPOISON: Handle hardware poisoned pages in try_to_unmap HWPOISON: Use bitmask/action code for try_to_unmap behaviour HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 HWPOISON: Add poison check to page fault handling HWPOISON: Add basic support for poisoned pages in fault handler v3 HWPOISON: Add new SIGBUS error codes for hardware poison signals HWPOISON: Add support for poison swap entries v2 HWPOISON: Export some rmap vma locking to outside world ...
2009-09-24task_struct cleanup: move binfmt field to mm_structHiroshi Shimamoto
Because the binfmt is not different between threads in the same process, it can be moved from task_struct to mm_struct. And binfmt moudle is handled per mm_struct instead of task_struct. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24aio: ifdef fields in mm_structAlexey Dobriyan
->ioctx_lock and ->ioctx_list are used only under CONFIG_AIO. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24pidns: deny CLONE_PARENT|CLONE_NEWPID combinationSukadev Bhattiprolu
CLONE_PARENT was used to implement an older threading model. For consistency with the CLONE_THREAD check in copy_pid_ns(), disable CLONE_PARENT with CLONE_NEWPID, at least until the required semantics of pid namespaces are clear. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Oren Laadan <orenl@cs.columbia.edu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24fork(): disable CLONE_PARENT for initSukadev Bhattiprolu
When global or container-init processes use CLONE_PARENT, they create a multi-rooted process tree. Besides siblings of global init remain as zombies on exit since they are not reaped by their parent (swapper). So prevent global and container-inits from creating siblings. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Oren Laadan <orenl@cs.columbia.edu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24sysctl: remove "struct file *" argument of ->proc_handlerAlexey Dobriyan
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24signals: inline __fatal_signal_pendingRoland McGrath
__fatal_signal_pending inlines to one instruction on x86, probably two instructions on other machines. It takes two longer x86 instructions just to call it and test its return value, not to mention the function itself. On my random x86_64 config, this saved 70 bytes of text (59 of those being __fatal_signal_pending itself). Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>