summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2010-11-23cpu: Remove incorrect BUG_ONPeter Zijlstra
Oleg mentioned that there is no actual guarantee the dying cpu's migration thread is actually finished running when we get there, so replace the BUG_ON() with a spinloop waiting for it. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-23cpu: Remove unused variableDhaval Giani
GCC warns us about: kernel/cpu.c: In function ‘take_cpu_down’: kernel/cpu.c:200:15: warning: unused variable ‘cpu’ This variable is unused since param->hcpu is directly used later on in cpu_notify. Signed-off-by: Dhaval Giani <dhaval_giani@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290091494.1145.5.camel@gondor.retis> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-23sched: Fix UP build breakagePeter Zijlstra
The recent cgroup-scheduling rework caused a UP build problem. Cc: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-23sched: Make task dump print all 15 chars of proc commErik Gilling
Signed-off-by: Erik Gilling <konkers@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290218934-8544-3-git-send-email-john.stultz@linaro.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-19Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking"Linus Torvalds
This reverts commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a. It turns out that this can break certain existing user land setups. Quoth Sarah Sharp: "On Wednesday, I updated my branch to commit 460781b from linus' tree, and my box would not boot. klogd segfaulted, which stalled the whole system. At first I thought it actually hung the box, but it continued booting after 5 minutes, and I was able to log in. It dropped back to the text console instead of the graphical bootup display for that period of time. dmesg surprisingly still works. I've bisected the problem down to this commit (commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a) The box is running klogd 1.5.5ubuntu3 (from Jaunty). Yes, I know that's old. I read the bit in the commit about changing the permissions of kallsyms after boot, but if I can't boot that doesn't help." So let's just keep the old default, and encourage distributions to do the "chmod -r /proc/kallsyms" in their bootup scripts. This is not worth a kernel option to change default behavior, since it's so easily done in user space. Reported-and-bisected-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: Marcus Meissner <meissner@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-19tracing/events: Show real number in array fieldsSteven Rostedt
Currently we have in something like the sched_switch event: field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1; When a userspace tool such as perf tries to parse this, the TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro simply uses a #len to show the string of the length. When the length is an enum, we get a string that means nothing for tools. By adding a static buffer and a mutex to protect it, we can store the string into that buffer with snprintf and show the actual number. Now we get: field:char prev_comm[16]; offset:12; size:16; signed:1; Something much more useful. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-11-18Merge branch 'perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
2010-11-18Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb,ppc: Fix regression in evr register handling kgdb,x86: fix regression in detach handling kdb: fix crash when KDB_BASE_CMD_MAX is exceeded kdb: fix memory leak in kdb_main.c
2010-11-18tracing: New flag to allow non privileged users to use a trace eventFrederic Weisbecker
This adds a new trace event internal flag that allows them to be used in perf by non privileged users in case of task bound tracing. This is desired for syscalls tracepoint because they don't leak global system informations, like some other tracepoints. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jason Baron <jbaron@redhat.com>
2010-11-18x86: Add RO/NX protection for loadable kernel modulesmatthieu castet
This patch is a logical extension of the protection provided by CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting module_core and module_init into three logical parts each and setting appropriate page access permissions for each individual section: 1. Code: RO+X 2. RO data: RO+NX 3. RW data: RW+NX In order to achieve proper protection, layout_sections() have been modified to align each of the three parts mentioned above onto page boundary. Next, the corresponding page access permissions are set right before successful exit from load_module(). Further, free_module() and sys_init_module have been modified to set module_core and module_init as RW+NX right before calling module_free(). By default, the original section layout and access flags are preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y, the patch will page-align each group of sections to ensure that each page contains only one type of content and will enforce RO/NX for each group of pages. -v1: Initial proof-of-concept patch. -v2: The patch have been re-written to reduce the number of #ifdefs and to make it architecture-agnostic. Code formatting has also been corrected. -v3: Opportunistic RO/NX protection is now unconditional. Section page-alignment is enabled when CONFIG_DEBUG_RODATA=y. -v4: Removed most macros and improved coding style. -v5: Changed page-alignment and RO/NX section size calculation -v6: Fixed comments. Restricted RO/NX enforcement to x86 only -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added calls to set_all_modules_text_rw() and set_all_modules_text_ro() in ftrace -v8: updated for compatibility with linux 2.6.33-rc5 -v9: coding style fixes -v10: more coding style fixes -v11: minor adjustments for -tip -v12: minor adjustments for v2.6.35-rc2-tip -v13: minor adjustments for v2.6.37-rc1-tip Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F914.9070106@free.fr> [ minor cleanliness edits, -v14: build failure fix ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Update tg->shares after cpu.shares writePaul Turner
Formerly sched_group_set_shares would force a rebalance by overflowing domain share sums. Now that per-cpu averages are maintained we can set the true value by issuing an update_cfs_shares() following a tg->shares update. Also initialize tg se->load to 0 for consistency since we'll now set correct weights on enqueue. Signed-off-by: Paul Turner <pjt@google.com?> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.465521344@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Allow update_cfs_load() to update global loadPaul Turner
Refactor the global load updates from update_shares_cpu() so that update_cfs_load() can update global load when it is more than ~10% out of sync. The new global_load parameter allows us to force an update, regardless of the error factor so that we can synchronize w/ update_shares(). Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.377473595@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Implement demand based update_cfs_load()Paul Turner
When the system is busy, dilation of rq->next_balance makes lb->update_shares() insufficiently frequent for threads which don't sleep (no dequeue/enqueue updates). Adjust for this by making demand based updates based on the accumulation of execution time sufficient to wrap our averaging window. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.291159744@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Update shares on idle_balancePaul Turner
Since shares updates are no longer expensive and effectively local, update them at idle_balance(). This allows us to more quickly redistribute shares to another cpu when our load becomes idle. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.204191702@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Add sysctl_sched_shares_windowPaul Turner
Introduce a new sysctl for the shares window and disambiguate it from sched_time_avg. A 10ms window appears to be a good compromise between accuracy and performance. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.112173964@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Introduce hierarchal order on shares update listPaul Turner
Avoid duplicate shares update calls by ensuring children always appear before parents in rq->leaf_cfs_rq_list. This allows us to do a single in-order traversal for update_shares(). Since we always enqueue in bottom-up order this reduces to 2 cases: 1) Our parent is already in the list, e.g. root \ b /\ c d* (root->b->c already enqueued) Since d's parent is enqueued we push it to the head of the list, implicitly ahead of b. 2) Our parent does not appear in the list (or we have no parent) In this case we enqueue to the tail of the list, if our parent is subsequently enqueued (bottom-up) it will appear to our right by the same rule. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.022488865@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Fix update_cfs_load() synchronizationPaul Turner
Using cfs_rq->nr_running is not sufficient to synchronize update_cfs_load with the put path since nr_running accounting occurs at deactivation. It's also not safe to make the removal decision based on load_avg as this fails with both high periods and low shares. Resolve this by clipping history after 4 periods without activity. Note: the above will always occur from update_shares() since in the last-task-sleep-case that task will still be cfs_rq->curr when update_cfs_load is called. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.933428187@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Fix load corruption from update_cfs_shares()Paul Turner
As part of enqueue_entity both a new entity weight and its contribution to the queuing cfs_rq / rq are updated. Since update_cfs_shares will only update the queueing weights when the entity is on_rq (which in this case it is not yet), there's a dependency loop here: update_cfs_shares needs account_entity_enqueue to update cfs_rq->load.weight account_entity_enqueue needs the updated weight for the queuing cfs_rq load[*] Fix this and avoid spurious dequeue/enqueues by issuing update_cfs_shares as if we had accounted the enqueue already. This was also resulting in rq->load corruption previously. [*]: this dependency also exists when using the group cfs_rq w/ update_cfs_shares as the weight of the enqueued entity changes without the load being updated. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.844900206@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Make tg_shares_up() walk on-demandPeter Zijlstra
Make tg_shares_up() use the active cgroup list, this means we cannot do a strict bottom-up walk of the hierarchy, but assuming its a very wide tree with a small number of active groups it should be a win. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.754159484@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Implement on-demand (active) cfs_rq listPeter Zijlstra
Make certain load-balance actions scale per number of active cgroups instead of the number of existing cgroups. This makes wakeup/sleep paths more expensive, but is a win for systems where the vast majority of existing cgroups are idle. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.666535048@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Rewrite tg_shares_up)Peter Zijlstra
By tracking a per-cpu load-avg for each cfs_rq and folding it into a global task_group load on each tick we can rework tg_shares_up to be strictly per-cpu. This should improve cpu-cgroup performance for smp systems significantly. [ Paul: changed to use queueing cfs_rq + bug fixes ] Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.580480400@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Simplify cpu-hot-unplug task migrationPeter Zijlstra
While discussing the need for sched_idle_next(), Oleg remarked that since try_to_wake_up() ensures sleeping tasks will end up running on a sane cpu, we can do away with migrate_live_tasks(). If we then extend the existing hack of migrating current from CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the need for the sched_idle_next() abomination disappears as well, since idle will be the only possible thread left after the migration thread stops. This greatly simplifies the hot-unplug task migration path, as can be seen from the resulting code reduction (and about half the new lines are comments). Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1289851597.2109.547.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18Merge commit 'v2.6.37-rc2' into sched/coreIngo Molnar
Merge reason: Move to a .37-rc base. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18irq_work: Drop cmpxchg() resultSergio Aguirre
The compiler warned us about: kernel/irq_work.c: In function 'irq_work_run': kernel/irq_work.c:148: warning: value computed is not used Dropping the cmpxchg() result is indeed weird, but correct - so annotate away the warning. Signed-off-by: Sergio Aguirre <saaguirre@ti.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1289930567-17828-1-git-send-email-saaguirre@ti.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18perf: Fix owner-list vs exitPeter Zijlstra
Oleg noticed that a perf-fd keeping a reference on the creating task leads to a few funny side effects. There's two different aspects to this: - kernel based perf-events, these should not take out a reference on the creating task and appear on the task's event list since they're not bound to fds nor visible to userspace. - fork() and pthread_create(), these can lead to the creating task dying (and thus the task's event-list becomming useless) but keeping the list and ref alive until the event is closed. Combined they lead to malfunction of the ptrace hw_tracepoints. Cure this by not considering kernel based perf_events for the owner-list and destroying the owner-list when the owner dies. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <1289576883.2084.286.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Fix idle balancingNikhil Rao
An earlier commit reverts idle balancing throttling reset to fix a 30% regression in volanomark throughput. We still need to reset idle_stamp when we pull a task in newidle balance. Reported-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Nikhil Rao <ncrao@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290022924-3548-1-git-send-email-ncrao@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18sched: Fix volanomark performance regressionAlex Shi
Commit fab4762 triggers excessive idle balancing, causing a ~30% loss in volanomark throughput. Remove idle balancing throttle reset. Originally-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Nikhil Rao <ncrao@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1289928732.5169.211.camel@maggy.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18Merge branch 'perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
2010-11-18x86, nmi_watchdog: Remove the old nmi_watchdogDon Zickus
Now that we have a new nmi_watchdog that is more generic and sits on top of the perf subsystem, we really do not need the old nmi_watchdog any more. In addition, the old nmi_watchdog doesn't really work if you are using the default clocksource, hpet. The old nmi_watchdog code relied on local apic interrupts to determine if the cpu is still alive. With hpet as the clocksource, these interrupts don't increment any more and the old nmi_watchdog triggers false postives. This piece removes the old nmi_watchdog code and stubs out any variables and functions calls. The stubs are the same ones used by the new nmi_watchdog code, so it should be well tested. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18Merge branch 'tip/perf/urgent-3' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2010-11-17rcu: move TINY_RCU from softirq to kthreadPaul E. McKenney
If RCU priority boosting is to be meaningful, callback invocation must be boosted in addition to preempted RCU readers. Otherwise, in presence of CPU real-time threads, the grace period ends, but the callbacks don't get invoked. If the callbacks don't get invoked, the associated memory doesn't get freed, so the system is still subject to OOM. But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit moves the callback invocations to a kthread, which can be boosted easily. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-11-17kdb: fix crash when KDB_BASE_CMD_MAX is exceededJovi Zhang
When the number of dyanmic kdb commands exceeds KDB_BASE_CMD_MAX, the kernel will fault. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-11-17kdb: fix memory leak in kdb_main.cJovi Zhang
Call kfree in the error path as well as the success path in kdb_ll(). Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-11-17BKL: remove extraneous #include <smp_lock.h>Arnd Bergmann
The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-16kernel: make /proc/kallsyms mode 400 to reduce ease of attackingMarcus Meissner
Making /proc/kallsyms readable only for root by default makes it slightly harder for attackers to write generic kernel exploits by removing one source of knowledge where things are in the kernel. This is the second submit, discussion happened on this on first submit and mostly concerned that this is just one hole of the sieve ... but one of the bigger ones. Changing the permissions of at least System.map and vmlinux is also required to fix the same set, but a packaging issue. Target of this starter patch and follow ups is removing any kind of kernel space address information leak from the kernel. [ Side note: the default of root-only reading is the "safe" value, and it's easy enough to then override at any time after boot. The /proc filesystem allows root to change the permissions with a regular chmod, so you can "revert" this at run-time by simply doing chmod og+r /proc/kallsyms as root if you really want regular users to see the kernel symbols. It does help some tools like "perf" figure them out without any setup, so it may well make sense in some situations. - Linus ] Signed-off-by: Marcus Meissner <meissner@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Eugene Teo <eugeneteo@kernel.org> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-16Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix cross-sched-class wakeup preemption sched: Fix runnable condition for stoptask sched: Use group weight, idle cpu metrics to fix imbalances during idle
2010-11-16Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / PM QoS: Fix reversed min and max PM / OPP: Hide OPP configuration when SoCs do not provide an implementation PM: Allow devices to be removed during late suspend and early resume
2010-11-16Merge branch 'futexes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Address compiler warnings in exit_robust_list
2010-11-16console: move for_each_console to linux/console.hJiri Slaby
Move it out of printk.c so that we can use it all over the code. There are some potential users which will be converted to that macro in next patches. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-16Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] kprobes: Fix the return address of multiple kretprobes [S390] kprobes: disable interrupts throughout [S390] ftrace: build without frame pointers on s390 [S390] mm: add devmem_is_allowed() for STRICT_DEVMEM checking [S390] vmlogrdr: purge after recording is switched off [S390] cio: fix incorrect ccw_device_init_count [S390] tape: add medium state notifications [S390] fix get_user_pages_fast
2010-11-16kernel/sysctl.c: Fix build failure with !CONFIG_PRINTKJoe Perches
Sigh... Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-15capabilities/syslog: open code cap_syslog logic to fix build failureEric Paris
The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build failure when CONFIG_PRINTK=n. This is because the capabilities code which used the new option was built even though the variable in question didn't exist. The patch here fixes this by moving the capabilities checks out of the LSM and into the caller. All (known) LSMs should have been calling the capabilities hook already so it actually makes the code organization better to eliminate the hook altogether. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-15PM / PM QoS: Fix reversed min and maxColin Cross
pm_qos_get_value had min and max reversed, causing all pm_qos requests to have no effect. Signed-off-by: Colin Cross <ccross@android.com> Acked-by: mark <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
2010-11-12tracing: Fix recursive user stack traceSteven Rostedt
The user stack trace can fault when examining the trace. Which would call the do_page_fault handler, which would trace again, which would do the user stack trace, which would fault and call do_page_fault again ... Thus this is causing a recursive bug. We need to have a recursion detector here. [ Resubmitted by Jiri Olsa ] [ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ] Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <1289390172-9730-3-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-11-12Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits) block: remove unused copy_io_context() Documentation: remove anticipatory scheduler info block: remove REQ_HARDBARRIER ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2) ioprio: fix RCU locking around task dereference block: ioctl: fix information leak to userland block: read i_size with i_size_read() cciss: fix proc warning on attempt to remove non-existant directory bio: take care not overflow page count when mapping/copying user data block: limit vec count in bio_kmalloc() and bio_alloc_map_data() block: take care not to overflow when calculating total iov length block: check for proper length of iov entries in blk_rq_map_user_iov() cciss: remove controllers supported by hpsa cciss: use usleep_range not msleep for small sleeps cciss: limit commands allocated on reset_devices cciss: Use kernel provided PCI state save and restore functions cciss: fix board status waiting code drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code ...
2010-11-12Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation perf_events: Fix time tracking in samples perf trace: update usage perf trace: update Documentation with new perf trace variants perf trace: live-mode command-line cleanup perf trace record: handle commands correctly perf record: make the record options available outside perf record perf trace scripting: remove system-wide param from shell scripts perf trace scripting: fix some small memory leaks and missing error checks perf: Fix usages of profile_cpu in builtin-top.c to use cpu_list perf, ui: Eliminate stack-smashing protection compiler complaint
2010-11-12Restrict unprivileged access to kernel syslogDan Rosenberg
The kernel syslog contains debugging information that is often useful during exploitation of other vulnerabilities, such as kernel heap addresses. Rather than futilely attempt to sanitize hundreds (or thousands) of printk statements and simultaneously cripple useful debugging functionality, it is far simpler to create an option that prevents unprivileged users from reading the syslog. This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the dmesg_restrict sysctl. When set to "0", the default, no restrictions are enforced. When set to "1", only users with CAP_SYS_ADMIN can read the kernel syslog via dmesg(8) or other mechanisms. [akpm@linux-foundation.org: explain the config option in kernel.txt] Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Eugene Teo <eugeneteo@kernel.org> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-12latencytop: fix per task accumulatorKen Chen
Per task latencytop accumulator prematurely terminates due to erroneous placement of latency_record_count. It should be incremented whenever a new record is allocated instead of increment on every latencytop event. Also fix search iterator to only search known record events instead of blindly searching all pre-allocated space. Signed-off-by: Ken Chen <kenchen@google.com> Reviewed-by: Arjan van de Ven <arjan@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-12kernel/range.c: fix clean_sort_range() for the case of full arrayAlexey Khoroshilov
clean_sort_range() should return a number of nonempty elements of range array, but if the array is full clean_sort_range() returns 0. The problem is that the number of nonempty elements is evaluated by finding the first empty element of the array. If there is no such element it returns an initial value of local variable nr_range that is zero. The fix is trivial: it changes initial value of nr_range to size of the array. The bug can lead to loss of information regarding all ranges, since typically returned value of clean_sort_range() is considered as an actual number of ranges in the array after a series of add/subtract operations. Found by Analytical Verification project of Linux Verification Center (linuxtesting.org), thanks to Alexander Kolosov. Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-12perf,hw_breakpoint: Initialize hardware api earlierJason Wessel
When using early debugging, the kernel does not initialize the hw_breakpoint API early enough and causes the late initialization of the kernel debugger to fail. The boot arguments are: earlyprintk=vga ekgdboc=kbd kgdbwait Then simply type "go" at the kdb prompt and boot. The kernel will later emit the message: kgdb: Could not allocate hwbreakpoints And at that point the kernel debugger will cease to work correctly. The solution is to initialize the hw_breakpoint at the same time that all the other perf call backs are initialized instead of using a core_initcall() initialization which happens well after the kernel debugger can make use of hardware breakpoints. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4CD3396D.1090308@windriver.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>