path: root/kernel
2012-10-29 vtime: Make vtime_account_system() irqsafe (Frederic Weisbecker)
vtime_account_system() currently has only one caller with vtime_account() which is irq safe. Now we are going to call it from other places like kvm where irqs are not always disabled by the time we account the cputime. So let's make it irqsafe. The arch implementation part is now prefixed with "__". vtime_account_idle() arch implementation is prefixed accordingly to stay consistent. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
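A sketch of the resulting wrapper (matching the naming described above; the per-arch details differ):

  void vtime_account_system(struct task_struct *tsk)
  {
  	unsigned long flags;

  	local_irq_save(flags);
  	__vtime_account_system(tsk);	/* arch implementation, now "__"-prefixed */
  	local_irq_restore(flags);
  }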
2012-10-29 Merge 3.7-rc3 into tty-next (Greg Kroah-Hartman)
This merges the tty changes in 3.7-rc3 into tty-next Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-27 rcutorture: Use DEFINE_STATIC_SRCU() (Lai Jiangshan)
Use DEFINE_STATIC_SRCU() to simplify the rcutorture.c SRCU test code. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
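A hedged before/after illustration of the conversion (macro semantics per the companion SRCU patches):

  /* Before: separate definition plus runtime init_srcu_struct(&srcu_ctl). */
  static struct srcu_struct srcu_ctl;

  /* After: one statically initialized definition. */
  DEFINE_STATIC_SRCU(srcu_ctl);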
2012-10-26 freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() (Oleg Nesterov)
try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks to ensure that a task doing a STOPPED/TRACED -> RUNNING transition can't escape freezing. This mostly works, but ptrace_stop() does not necessarily call schedule(); it can change task->state back to RUNNING and check freezing() without any lock/barrier in between. We could add the necessary barrier, but this patch changes ptrace_stop() and do_signal_stop() to use freezable_schedule(). This fixes the race: freezer_count() and freezer_should_skip() carefully avoid it. It also simplifies the code: try_to_freeze_tasks/update_if_frozen no longer need the task_is_stopped_or_traced() checks with their non-trivial assumptions; we can rely on the mechanism which was specially designed to mark a sleeping task as "frozen enough". v2: As Tejun pointed out, we can also change get_signal_to_deliver() and move try_to_freeze() up before the 'relock' label. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
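For reference, a sketch of what freezable_schedule() amounts to (per include/linux/freezer.h of this era):

  /* Like schedule(), but the freezer may treat the sleeping task as frozen. */
  static inline void freezable_schedule(void)
  {
  	freezer_do_not_count();
  	schedule();
  	freezer_count();	/* re-checks freezing() on the way out */
  }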
2012-10-25 Merge branch 'akpm' (Andrew's fixes) (Linus Torvalds)
Merge misc fixes from Andrew Morton: "18 total. 15 fixes and some updates to a device_cgroup patchset which bring it up to date with the version which I should have merged in the first place."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (18 patches)
  fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check
  gen_init_cpio: avoid stack overflow when expanding
  drivers/rtc/rtc-imxdi.c: add missing spin lock initialization
  mm, numa: avoid setting zone_reclaim_mode unless a node is sufficiently distant
  pidns: limit the nesting depth of pid namespaces
  drivers/dma/dw_dmac: make driver's endianness configurable
  mm/mmu_notifier: allocate mmu_notifier in advance
  tools/testing/selftests/epoll/test_epoll.c: fix build
  UAPI: fix tools/vm/page-types.c
  mm/page_alloc.c:alloc_contig_range(): return early for err path
  rbtree: include linux/compiler.h for definition of __always_inline
  genalloc: stop crashing the system when destroying a pool
  backlight: ili9320: add missing SPI dependency
  device_cgroup: add proper checking when changing default behavior
  device_cgroup: stop using simple_strtoul()
  device_cgroup: rename deny_all to behavior
  cgroup: fix invalid rcu dereference
  mm: fix XFS oops due to dirty pages without buffers on s390
2012-10-25 Makefile: Documentation for external tool should be correct (H. Peter Anvin)
If one includes documentation for an external tool, it should be correct. This is not:

1. Overriding the input to rngd should typically be neither necessary nor desired. This is especially so since newer versions of rngd support a number of different *types* of sources.
2. The default kernel-exported device is called /dev/hwrng, not /dev/hwrandom nor /dev/hw_random (both of which were used in the past; however, kernel and udev seem to have converged on /dev/hwrng.)

Overall it is better if the documentation for rngd is kept with rngd rather than in a kernel Makefile. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: David Howells <dhowells@redhat.com> Cc: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-25 pidns: limit the nesting depth of pid namespaces (Andrew Vagin)
'struct pid' is a "variable sized struct" - a header with an array of upids at the end. The size of the array depends on the level (depth) of pid namespaces. Currently the nesting level of a pidns is not limited, so 'struct pid' can grow beyond one page; it looks reasonable that it should stay below a page. MAX_PID_NS_LEVEL is not calculated from PAGE_SIZE, because in that case it would depend on architectures and config options, and it would shrink whenever someone added new fields to struct pid or struct upid. I suggest setting MAX_PID_NS_LEVEL = 32, because that preserves the ability to expand "struct pid" and is more than enough for all use-cases known to me. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. In addition, this reduces the effect of another problem: when we have many nested namespaces and the oldest one starts dying, zap_pid_ns_processes() will be called for each namespace and find_vpid() will be called for each process in a namespace - at least max_level^2 / 2 times in total. The reason is that when we find a bit set in the pidmap, we can't tell whether this pidns is the topmost one for that process or not. find_vpid() is a heavy operation, so a fork bomb that creates many nested namespaces can make a system inaccessible for a long time. For example, my system becomes inaccessible for a few minutes with 4000 processes. [akpm@linux-foundation.org: return -EINVAL in response to excessive nesting, not -ENOMEM] Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
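A sketch of the shape of the check, assuming it lands in create_pid_namespace() (the allocation path is elided behind a hypothetical helper):

  #define MAX_PID_NS_LEVEL 32	/* keeps struct pid comfortably under a page */

  static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
  {
  	unsigned int level = parent_pid_ns->level + 1;

  	if (level > MAX_PID_NS_LEVEL)
  		return ERR_PTR(-EINVAL);	/* excessive nesting: -EINVAL, not -ENOMEM */

  	return do_create_pid_namespace(parent_pid_ns, level);	/* hypothetical helper */
  }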
2012-10-25 uprobes: Fix misleading log entry (Jovi Zhang)
Uprobe event naming has no 'r' prefix, so remove it from the log message. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2012-10-24 Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup (Linus Torvalds)
Pull cgroup fixes from Tejun Heo: "This pull request contains three fixes. Two are reverts of task_lock() removal in cgroup fork path. The optimizations incorrectly assumed that threadgroup_lock can protect process forks (as opposed to thread creations) too. Further cleanup of cgroup fork path is scheduled. The third fixes cgroup emptiness notification loss."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Revert "cgroup: Remove task_lock() from cgroup_post_fork()"
  Revert "cgroup: Drop task_lock(parent) on cgroup_fork()"
  cgroup: notify_on_release may not be triggered in some cases
2012-10-24 Merge branch 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq (Linus Torvalds)
Pull workqueue fix from Tejun Heo: "This pull request contains one patch from Dan Magenheimer to fix cancel_delayed_work() regression introduced by its reimplementation using try_to_grab_pending(). The reimplementation made it incorrectly return %true when the work item is idle. There aren't too many consumers of the return value but it broke at least ramster."

* 'for-3.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: cancel_delayed_work() should return %false if work item is idle
2012-10-24 workqueue: cancel_delayed_work() should return %false if work item is idle (Dan Magenheimer)
57b30ae77b ("workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()") made cancel_delayed_work() always return %true unless someone else is also trying to cancel the work item, which is broken - if the target work item is idle, the return value should be %false. try_to_grab_pending() indicates that the target work item was idle with a zero return value, so use that as the return value. Note that this brings cancel_delayed_work() in line with __cancel_work_timer() in return value handling. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <444a6439-b1a4-4740-9e7e-bc37267cfe73@default>
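A hedged sketch of the corrected function (the pending-state bookkeeping between the grab and the irq restore is elided):

  bool cancel_delayed_work(struct delayed_work *dwork)
  {
  	unsigned long flags;
  	int ret;

  	do {
  		ret = try_to_grab_pending(&dwork->work, true, &flags);
  	} while (unlikely(ret == -EAGAIN));

  	if (unlikely(ret < 0))
  		return false;

  	/* ... clear the pending state ... */
  	local_irq_restore(flags);
  	return ret;	/* 1: cancelled a pending item; 0: item was idle -> %false */
  }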
2012-10-24 audit: remove bogus tty name check (Alan Cox)
The tty name is an array, not a pointer, so checking it against NULL is bogus. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-24 lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name() (Cyrill Gorcunov)
Not a big deal, but since the other __get_key_name() callers use a KSYM_NAME_LEN-sized buffer, let's be consistent. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20121020190519.GH25467@moon Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Describe CFS load-balancer (Peter Zijlstra)
Add some scribbles on how and why the load-balancer works. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341316406.23484.64.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking (Paul Turner)
Per-entity load-tracking is generally useful beyond computing shares distribution, e.g. for runnable-based load-balance (in progress), governors, power-management, etc., but these facilities are not yet consumers of the data. This dependency may be trivially reverted when the information is required; until then, avoid paying the overhead for calculations we will not use. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.422162369@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Make __update_entity_runnable_avg() fast (Paul Turner)
__update_entity_runnable_avg() forms the core of maintaining an entity's runnable load average. In this function we charge the accumulated run-time since the last update and handle the appropriate decay. In some cases, e.g. a waking task, this time interval may be much larger than our period unit. Fortunately we can exploit some properties of our series to perform decay for a blocked update in constant time and account the contribution for a running update in essentially-constant* time.

[*]: For any running entity they should be performing updates at the tick, which gives us a soft limit of 1 jiffy between updates, and we can compute up to a 32 jiffy update in a single pass.

C program to generate the magic constants in the arrays:

  #include <math.h>
  #include <stdio.h>

  #define N 32
  #define WMULT_SHIFT 32

  const long WMULT_CONST = ((1UL << N) - 1);
  double y;

  long runnable_avg_yN_inv[N];

  void calc_mult_inv()
  {
  	int i;
  	double yn = 0;

  	printf("inverses\n");
  	for (i = 0; i < N; i++) {
  		yn = (double)WMULT_CONST * pow(y, i);
  		runnable_avg_yN_inv[i] = yn;
  		printf("%2d: 0x%8lx\n", i, runnable_avg_yN_inv[i]);
  	}
  	printf("\n");
  }

  long mult_inv(long c, int n)
  {
  	return (c * runnable_avg_yN_inv[n]) >> WMULT_SHIFT;
  }

  void calc_yn_sum(int n)
  {
  	int i;
  	double sum = 0, sum_fl = 0, diff = 0;

  	/*
  	 * We take the floored sum to ensure the sum of partial sums is never
  	 * larger than the actual sum.
  	 */
  	printf("sum y^n\n");
  	printf(" %8s %8s %8s\n", "exact", "floor", "error");
  	for (i = 1; i <= n; i++) {
  		sum = (y * sum + y * 1024);
  		sum_fl = floor(y * sum_fl + y * 1024);
  		printf("%2d: %8.0f %8.0f %8.0f\n", i, sum, sum_fl, sum_fl - sum);
  	}
  	printf("\n");
  }

  void calc_conv(long n)
  {
  	long old_n;
  	int i = -1;

  	printf("convergence (LOAD_AVG_MAX, LOAD_AVG_MAX_N)\n");
  	do {
  		old_n = n;
  		n = mult_inv(n, 1) + 1024;
  		i++;
  	} while (n != old_n);
  	printf("%d> %ld\n", i - 1, n);
  	printf("\n");
  }

  void main()
  {
  	y = pow(0.5, 1/(double)N);

  	calc_mult_inv();
  	calc_conv(1024);
  	calc_yn_sum(N);
  }

[ Compile with -lm ]

Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.277808946@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Update_cfs_shares at period edge (Paul Turner)
Now that our measurement intervals are small (~1ms), we can amortize the posting of update_shares() to roughly once per period overflow. This is a large cost saving for frequently switching tasks. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.200772172@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Refactor update_shares_cpu() -> update_blocked_avgs() (Paul Turner)
Now that running entities maintain their own load-averages, the work we must do in update_shares() is largely restricted to the periodic decay of blocked entities. This allows us to be a little less pessimistic regarding our occupancy on rq->lock and the associated rq->clock updates required. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.133999170@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Replace update_shares weight distribution with per-entity computation (Paul Turner)
Now that the machinery is in place to compute contributed load in a bottom-up fashion, replace the shares distribution code within update_shares() accordingly. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141507.061208672@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Maintain runnable averages across throttled periods (Paul Turner)
With bandwidth control, tracked entities may cease execution according to user-specified bandwidth limits. Charging this time as either throttled or blocked, however, is incorrect and would falsely skew in either direction. What we actually want is for any throttled periods to be "invisible" to load-tracking, as the entities are removed from the system for that interval and contribute normally otherwise. Do this by moderating the progression of time to omit any periods in which the entity belonged to a throttled hierarchy. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.998912151@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Normalize tg load contributions against runnable time (Paul Turner)
Entities of equal weight should receive equitable distribution of cpu time. This is challenging in the case of a task_group's shares as execution may be occurring on multiple cpus simultaneously. To handle this we divide up the shares into weights proportionate with the load on each cfs_rq. This does not, however, account for the fact that the sum of the parts may be less than one cpu, and so we need to normalize:

  load(tg) = min(runnable_avg(tg), 1) * tg->shares

where runnable_avg is the aggregate time in which the task_group had runnable children. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.930124292@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
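A hedged fixed-point sketch of that normalization; tg_runnable_avg() is a hypothetical helper returning the group's aggregate runnable fraction scaled to NICE_0_LOAD:

  /* load(tg) = min(runnable_avg(tg), 1) * tg->shares, in fixed point */
  static long tg_normalized_load(struct task_group *tg)
  {
  	long runnable = min(tg_runnable_avg(tg), (long)NICE_0_LOAD);

  	return tg->shares * runnable / NICE_0_LOAD;
  }

For example, a group with tg->shares = 1024 whose children are runnable only half the time contributes 512, not the full 1024.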
2012-10-24 sched: Compute load contribution by a group entity (Paul Turner)
Unlike task entities, which have a fixed weight, group entities own a fraction of their parenting task_group's shares as their contributed weight. Compute this fraction so that we can correctly account hierarchies and shared entity nodes. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.855074415@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Aggregate total task_group load (Paul Turner)
Maintain a global running sum of the average load seen on each cfs_rq belonging to each task group so that it may be used in calculating an appropriate shares:weight distribution. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.792901086@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Account for blocked load waking back up (Paul Turner)
When a running entity blocks, we migrate its tracked load to cfs_rq->blocked_runnable_avg. In the sleep case this occurs while holding rq->lock and so is a natural transition. Wake-ups, however, are potentially asynchronous in the presence of migration, and so special care must be taken. We use an atomic counter to track such migrated load, taking care to match this with the previously introduced decay counters so that we don't migrate too much load. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.726077467@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Add an rq migration call-back to sched_class (Paul Turner)
Since we are now doing bottom-up load accumulation, we need explicit notification when a task has been re-parented so that the old hierarchy can be updated. Adds: migrate_task_rq(struct task_struct *p, int next_cpu) (The alternative is to do this out of __set_task_cpu, but it was suggested that this would be a cleaner encapsulation.) Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.660023400@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Maintain the load contribution of blocked entities (Paul Turner)
We are currently maintaining:

  runnable_load(cfs_rq) = \Sum task_load(t)

for all running children t of cfs_rq. While this can be naturally updated for tasks in a runnable state (as they are scheduled), it does not account for the load contributed by blocked task entities. This can be solved by introducing a separate accounting for blocked load:

  blocked_load(cfs_rq) = \Sum runnable(b) * weight(b)

Obviously we do not want to iterate over all blocked entities to account for their decay; we instead observe that:

  runnable_load(t) = \Sum p_i * y^i

and that to account for an additional idle period we only need to compute:

  y * runnable_load(t)

This means that we can compute all blocked entities at once by evaluating:

  blocked_load(cfs_rq)' = y * blocked_load(cfs_rq)

Finally we maintain a decay counter so that when a sleeping entity re-awakens we can determine how much of its load should be removed from the blocked sum. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.585389902@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
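A hedged sketch of the constant-time decay; decay_load(val, n), computing val * y^n, stands in for the shift-based helper this series introduces:

  /* Age the whole blocked sum at once instead of per blocked entity. */
  static void decay_blocked_load(struct cfs_rq *cfs_rq, u64 idle_periods)
  {
  	cfs_rq->blocked_load_avg =
  		decay_load(cfs_rq->blocked_load_avg, idle_periods);
  }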
2012-10-24 sched: Aggregate load contributed by task entities on parenting cfs_rq (Paul Turner)
For a given task t, we can compute its contribution to load as:

  task_load(t) = runnable_avg(t) * weight(t)

On a parenting cfs_rq we can then aggregate:

  runnable_load(cfs_rq) = \Sum task_load(t), for all runnable children t

Maintain this bottom up, with task entities adding their contributed load to the parenting cfs_rq sum. When a task entity's load changes we add the same delta to the maintained sum. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.514678907@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Maintain per-rq runnable averages (Ben Segall)
Since runqueues do not have a corresponding sched_entity we instead embed a sched_avg structure directly. Signed-off-by: Ben Segall <bsegall@google.com> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.442637130@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-24 sched: Track the runnable average on a per-task entity basis (Paul Turner)
Instead of tracking the average of the load parented by a cfs_rq, we can track entity load directly; the load of a given cfs_rq is then the sum of its children's. To do this we represent the historical contribution to runnable average within each trailing 1024us of execution as the coefficients of a geometric series. We can express this for a given task t as:

  runnable_sum(t) = \Sum u_i * y^i
  runnable_avg_period(t) = \Sum 1024 * y^i
  load(t) = weight_t * runnable_sum(t) / runnable_avg_period(t)

where u_i is the usage in the i-th most recent 1024us period (approximately 1ms) and y is chosen such that y^k = 1/2. We currently choose k = 32, which roughly translates to about a scheduling period. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120823141506.372695337@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
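A hedged sketch of one update step (field names approximate those this patch adds to struct sched_avg; decay_load() as in the companion patches):

  #define RUNNABLE_AVG_PERIOD 1024	/* us, ~1ms */

  static void accumulate_one_period(struct sched_avg *sa, int runnable)
  {
  	/* Age both series by one period: multiply by y, where y^32 = 1/2. */
  	sa->runnable_avg_sum = decay_load(sa->runnable_avg_sum, 1);
  	sa->runnable_avg_period = decay_load(sa->runnable_avg_period, 1);

  	if (runnable)
  		sa->runnable_avg_sum += RUNNABLE_AVG_PERIOD;	/* the new u_0 */
  	sa->runnable_avg_period += RUNNABLE_AVG_PERIOD;
  }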
2012-10-24 timers, sched: Correct the comments for tick_sched_timer() (Chuansheng Liu)
In the comments of tick_sched_timer(), the claim "timer->base->cpu_base->lock held" is not right: in __run_hrtimer(), the cpu_base->lock is dropped before timer->function() is called. Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Cc: fei.li@intel.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-23 console: use might_sleep in console_lock (Daniel Vetter)
Instead of BUG_ON(in_interrupt()), since that doesn't check for all the newfangled stuff like preempt. Note that this is valid since the console_sem is essentially used like a real mutex with only two twists:

- we allow trylock from hardirq context
- across suspend/resume we lock the logical console_lock, but drop the semaphore protecting the locking state.

Now that doesn't guarantee that no one is playing tricks in single-thread atomic contexts at suspend/resume/boot time, but

- I couldn't find anything suspicious with some grepping,
- might_sleep shouldn't die,
- and I think the upside of catching more potential issues is worth the risk of getting a might_sleep backtrace that would have been safe (and then dealing with that fallout).

Cc: Dave Airlie <airlied@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
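A hedged sketch of the change (console_sem is the real semaphore; the locked-state bookkeeping is elided):

  void console_lock(void)
  {
  	might_sleep();	/* replaces BUG_ON(in_interrupt()); also catches preempt-off */
  	down(&console_sem);
  	/* ... mark the console locked ... */
  }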
2012-10-24 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds)
Pull perf fixes from Ingo Molnar: "Most of these are uprobes race fixes from Oleg, and their preparatory cleanups. (It's larger than what I'd normally send for an -rc kernel, but they looked significant enough to not delay them.) There's also an oprofile fix and an uncore PMU fix."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  perf/x86: Disable uncore on virtualized CPUs
  oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()
  ring-buffer: Check for uninitialized cpu buffer before resizing
  uprobes: Fix the racy uprobe->flags manipulation
  uprobes: Fix prepare_uprobe() race with itself
  uprobes: Introduce prepare_uprobe()
  uprobes: Fix handle_swbp() vs unregister() + register() race
  uprobes: Do not delete uprobe if uprobe_unregister() fails
  uprobes: Don't return success if alloc_uprobe() fails
  uprobes/x86: Only rep+nop can be emulated correctly
  uprobes: Simplify is_swbp_at_addr(), remove stale comments
  uprobes: Kill set_orig_insn()->is_swbp_at_addr()
  uprobes: Introduce copy_opcode(), kill read_opcode()
  uprobes: Kill set_swbp()->is_swbp_at_addr()
  uprobes: Restrict valid_vma(false) to skip VM_SHARED vmas
  uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC
  uprobes: Change write_opcode() to use FOLL_FORCE
  uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()
  uprobes: Kill UTASK_BP_HIT state
  uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp()
  ...
2012-10-23 rcu: Dump number of callbacks in stall warning messages (Paul E. McKenney)
In theory, if a grace period manages to get started despite there being no callbacks on any of the CPUs, all CPUs could go into dyntick-idle mode, so that the grace period would never end. This commit updates the RCU CPU stall warning messages to detect this condition by summing up the number of callbacks on all CPUs. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
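A hedged sketch of the summation (the helper name is invented; rcu_data's per-CPU queue-length field is assumed to be ->qlen):

  static long rcu_total_callbacks(struct rcu_state *rsp)
  {
  	long totqlen = 0;
  	int cpu;

  	for_each_possible_cpu(cpu)
  		totqlen += per_cpu_ptr(rsp->rda, cpu)->qlen;

  	return totqlen;	/* reported in the stall message, e.g. "q=%ld" */
  }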
2012-10-23 rcu: Add grace-period information to RCU CPU stall warnings (Paul E. McKenney)
This commit causes the last grace period started and completed to be printed on RCU CPU stall warning messages in order to aid diagnosis. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23 rcu: Print remote CPU's stacks in stall warnings (Paul E. McKenney)
The RCU CPU stall warnings rely on trigger_all_cpu_backtrace() to do NMI-based dump of the stack traces of all CPUs. Unfortunately, a number of architectures do not implement trigger_all_cpu_backtrace(), in which case RCU falls back to just dumping the stack of the running CPU. This is unhelpful in the case where the running CPU has detected that some other CPU has stalled. This commit therefore makes the running CPU dump the stacks of the tasks running on the stalled CPUs. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23 srcu: Export process_srcu() (Lai Jiangshan)
Because process_srcu() will be used in DEFINE_SRCU(), which is a macro that could be expanded pretty much anywhere, it can no longer be static. Note that process_srcu() is still internal to srcu.h. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23 srcu: Credit Lai Jiangshan with SRCU rewrite (Lai Jiangshan)
Lai Jiangshan rewrote SRCU, so this commit ensures that he gets his proper share of blame^Wcredit. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23 rcu: Fix precedence error in cpu_needs_another_gp() (Paul E. McKenney)
The fix introduced by a10d206e (rcu: Fix day-one dyntick-idle stall-warning bug) has a C-language precedence error. It turns out that this error is harmless in that the same result is computed for all inputs, but the code is nevertheless a potential source of confusion. This commit therefore introduces parentheses in order to force the execution of the code to reflect the intent. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
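A generic illustration of the rule at issue (invented flags, not the actual cpu_needs_another_gp() expression): && binds tighter than ||, so explicit parentheses document the grouping the compiler uses anyway.

  static int demo(int a, int b, int c)
  {
  	/* "a || b && c" parses as "a || (b && c)", not "(a || b) && c". */
  	return a || (b && c);
  }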
2012-10-23 rcu: Add a module parameter to force use of expedited RCU primitives (Antti P Miettinen)
There have been some embedded applications that would benefit from use of expedited grace-period primitives. In some ways, this is similar to synchronize_net() doing either a normal or an expedited grace period depending on lock state, but with control outside of the kernel. This commit therefore adds rcu_expedited boot and sysfs parameters that cause the kernel to substitute expedited primitives for the normal grace-period primitives. [ paulmck: Add trace/event/rcu.h to kernel/srcu.c to avoid build error. Get rid of infinite loop through contention path.] Signed-off-by: Antti P Miettinen <amiettinen@nvidia.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
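A hedged sketch of the substitution the parameter enables (synchronize_rcu_expedited() and wait_rcu_gp() exist in this era's tree; the exact plumbing may differ):

  void synchronize_rcu(void)
  {
  	if (rcu_expedited)
  		synchronize_rcu_expedited();	/* IPI-driven, low latency */
  	else
  		wait_rcu_gp(call_rcu);		/* normal grace period */
  }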
2012-10-23 rcu: Remove rcu_switch() (Frederic Weisbecker)
It's only there to call rcu_user_hooks_switch(). Let's just call rcu_user_hooks_switch() directly, we don't need this function in the middle. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23 rcu: Make rcutorture give diagnostics if CPU offline fails (Paul E. McKenney)
This commit causes rcutorture to print the errno if cpu_down() fails when the rcutorture "verbose" module parameter is specified. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-23 rcu: Fix comment about _rcu_barrier()/orphanage exclusion (Paul E. McKenney)
In the old days, _rcu_barrier() acquired ->onofflock to exclude rcu_send_cbs_to_orphanage(), which allowed the latter to avoid memory barriers in callback handling. However, _rcu_barrier() recently started doing get_online_cpus() to lock out CPU-hotplug operations entirely, which means that the comment in rcu_send_cbs_to_orphanage() that talks about ->onofflock is now obsolete. This commit therefore fixes the comment. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-22 console: implement lockdep support for console_lock (Daniel Vetter)
Dave Airlie recently discovered a locking bug in the fbcon layer, where a timer_del_sync (for the blinking cursor) deadlocks with the timer itself, since both (want to) hold the console_lock: https://lkml.org/lkml/2012/8/21/36

Unfortunately the console_lock isn't a plain mutex and hence has no lockdep support, which resulted in a few days wasted tracking down this bug (complicated by the fact that printk doesn't show anything when the console is locked) instead of noticing the bug much earlier with the lockdep splat. Hence I've figured I need to fix that for the next deadlock involving console_lock - and with kms/drm growing ever more complex locking, that'll eventually happen.

Now the console_lock has rather funky semantics, so after a quick irc discussion with Thomas Gleixner and Dave Airlie I've quickly ditched the original idea of switching to a real mutex (since it won't work) and instead opted to annotate the console_lock with lockdep information manually. There are a few special cases:

- The console_lock state is protected by the console_sem, and usually grabbed/dropped at _lock/_unlock time. But the suspend/resume code drops the semaphore without dropping the console_lock (see suspend_console/resume_console). But since the same thread that did the suspend will do the resume, we don't need to fix up anything.

- In the printk code there's a special trylock, only used to kick off the logbuffer printk'ing in console_unlock. But all that happens while lockdep is disabled (since printk does a few other evil tricks). So no issue there, either.

- The console_lock can also be acquired from irq context (but only with a trylock). lockdep already handles that.

This all leaves us with annotating the normal console_lock, _unlock and _trylock functions. And yes, it works - simply unloading a drm kms driver resulted in lockdep complaining about the deadlock in fbcon_deinit:

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.6.0-rc2+ #552 Not tainted
  -------------------------------------------------------
  kms-reload/3577 is trying to acquire lock:
   ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7

  but task is already holding lock:
   (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (console_lock){+.+.+.}:
         [<ffffffff81087440>] lock_acquire+0x95/0x105
         [<ffffffff81040190>] console_lock+0x59/0x5b
         [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
         [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
         [<ffffffff810584a2>] worker_thread+0x1a7/0x24b
         [<ffffffff8105ca29>] kthread+0x7f/0x87
         [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10

  -> #0 ((&info->queue)){+.+...}:
         [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
         [<ffffffff81087440>] lock_acquire+0x95/0x105
         [<ffffffff81058cab>] wait_on_work+0x3b/0xa7
         [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
         [<ffffffff81058e33>] cancel_work_sync+0xb/0xd
         [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
         [<ffffffff81264793>] bind_con_driver+0x145/0x263
         [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
         [<ffffffff8126540c>] store_bind+0x1ad/0x1c1
         [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
         [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
         [<ffffffff811145b2>] vfs_write+0x9b/0xfd
         [<ffffffff811147b7>] sys_write+0x3e/0x6b
         [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b

  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(console_lock);
                                 lock((&info->queue));
                                 lock(console_lock);
    lock((&info->queue));

   *** DEADLOCK ***

v2: Mark the lockdep_map static, noticed by Jani Nikula.

Cc: Dave Airlie <airlied@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-23 PM / QoS: Introduce request and constraint data types for PM QoS flags (Rafael J. Wysocki)
Introduce struct pm_qos_flags_request and struct pm_qos_flags representing PM QoS flags request type and PM QoS flags constraint type, respectively. With these definitions the data structures will be arranged so that the list member of a struct pm_qos_flags object will contain the head of a list of struct pm_qos_flags_request objects representing all of the "flags" requests present for the given device. Then, the effective_flags member of a struct pm_qos_flags object will contain the bitwise OR of the flags members of all the struct pm_qos_flags_request objects in the list. Additionally, introduce helper function pm_qos_update_flags() allowing the caller to manage the list of struct pm_qos_flags_request pointed to by the list member of struct pm_qos_flags. The flags are of type s32 so that the request's "value" field is always of the same type regardless of what kind of request it is (latency requests already have value fields of type s32). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Jean Pihet <j-pihet@ti.com> Acked-by: mark gross <markgross@thegnar.org>
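A sketch of the two types as described (field names follow the description; see include/linux/pm_qos.h for the authoritative definitions):

  struct pm_qos_flags_request {
  	struct list_head node;
  	s32 flags;	/* s32 so the "value" field has one type across request kinds */
  };

  struct pm_qos_flags {
  	struct list_head list;	/* of pm_qos_flags_request */
  	s32 effective_flags;	/* bitwise OR of all requests' flags */
  };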
2012-10-22 module_signing: fix printk format warning (Randy Dunlap)
Fix the warning:

  kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'

by using the proper 'z' modifier for printing a size_t. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
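A one-line illustration (hypothetical wrapper; the 'z' modifier is standard for size_t):

  static void log_siglen(size_t len)
  {
  	/* '%lu' warns where size_t != unsigned long; '%zu' matches size_t everywhere */
  	printk(KERN_INFO "signature length = %zu\n", len);
  }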
2012-10-21 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent (Ingo Molnar)
Pull ftrace ring-buffer resizing fix from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-21 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/urgent (Ingo Molnar)
Pull various uprobes bugfixes from Oleg Nesterov - mostly race and failure path fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-21 Merge branch 'nohz/core' of git://github.com/fweisbec/linux-dynticks into timers/core (Ingo Molnar)
Pull uncontroversial cleanup/refactoring nohz patches from Frederic Weisbecker. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-20 cgroup_freezer: don't use cgroup_lock_live_group() (Tejun Heo)
freezer_read/write() used cgroup_lock_live_group() to synchronize against task migration into and out of the target cgroup. cgroup_lock_live_group() grabs the internal cgroup lock and using it from outside cgroup core leads to complex and fragile locking dependency issues which are difficult to resolve. Now that freezer_can_attach() is replaced with freezer_attach() and update_if_frozen() updated, nothing requires excluding migration against freezer state reads and changes. This patch removes cgroup_lock_live_group() and the matching cgroup_unlock() usages. The prone-to-bitrot, already outdated and unnecessary global lock hierarchy documentation is replaced with documentation in local scope. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Li Zefan <lizefan@huawei.com>
2012-10-20 cgroup_freezer: prepare update_if_frozen() for locking change (Tejun Heo)
Locking will change such that migration can happen while freezer_read/write() is in progress. This means that update_if_frozen() can no longer assume that all tasks in the cgroup conform to the current freezer state - newly migrated tasks which haven't finished freezer_attach() yet might be in any state. This patch updates update_if_frozen() such that it no longer verifies task states against freezer state. It now simply decides whether the FREEZING stage is complete. This removal of verification makes it meaningless to call it from freezer_change_state(). Drop that call and move the fast exit test from freezer_read() - the only remaining caller - into update_if_frozen(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Li Zefan <lizefan@huawei.com>