git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2013-01-25	MODSIGN: Simplify Makefile with a Kconfig helper	Michal Marek
	Signed-off-by: Michal Marek <mmarek@suse.cz> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-24	tracing: Mark tracing_dentry_percpu() static	Josh Triplett
	Nothing outside of kernel/trace/trace.c references tracing_dentry_percpu(). Link: http://lkml.kernel.org/r/1353302917-13995-7-git-send-email-josh@joshtriplett.org Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-24	cgroup: remove bogus comments in cgroup_diput()	Li Zefan
	Since commit 48ddbe194623ae089cc0576e60363f2d2e85662a ("cgroup: make css->refcnt clearing on cgroup removal optional"), each css holds a ref on cgroup's dentry, so cgroup_diput() won't be called until all css' refs go down to 0, which invalids the comments. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-24	cgroup: remove synchronize_rcu() from cgroup_diput()	Li Zefan
	Free cgroup via call_rcu(). The actual work is done through workqueue. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-24	cgroup: remove duplicate RCU free on struct cgroup	Li Zefan
	When destroying a cgroup, though in cgroup_diput() we've called synchronize_rcu(), we then still have to free it via call_rcu(). The story is, long ago to fix a race between reading /proc/sched_debug and freeing cgroup, the code was changed to utilize call_rcu(). See commit a47295e6bc42ad35f9c15ac66f598aa24debd4e2 ("cgroups: make cgroup_path() RCU-safe") As we've fixed cpu cgroup that cpu_cgroup_offline_css() is used to unregister a task_group so there won't be concurrent access to this task_group after synchronize_rcu() in diput(). Now we can just kfree(cgrp). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-24	sched: remove redundant NULL cgroup check in task_group_path()	Li Zefan
	A task_group won't be online (thus no one can see it) until cpu_cgroup_css_online(), and at that time tg->css.cgroup has been initialized, so this NULL check is redundant. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-24	sched: split out css_online/css_offline from tg creation/destruction	Li Zefan
	This is a preparaton for later patches. - What do we gain from cpu_cgroup_css_online(): After ss->css_alloc() and before ss->css_online(), there's a small window that tg->css.cgroup is NULL. With this change, tg won't be seen before ss->css_online(), where it's added to the global list, so we're guaranteed we'll never see NULL tg->css.cgroup. - What do we gain from cpu_cgroup_css_offline(): tg is freed via RCU, so is cgroup. Without this change, This is how synchronization works: cgroup_rmdir() no ss->css_offline() diput() syncornize_rcu() ss->css_free() <-- unregister tg, and free it via call_rcu() kfree_rcu(cgroup) <-- wait possible refs to cgroup, and free cgroup We can't just kfree(cgroup), because tg might access tg->css.cgroup. With this change: cgroup_rmdir() ss->css_offline() <-- unregister tg diput() synchronize_rcu() <-- wait possible refs to tg and cgroup ss->css_free() <-- free tg kfree_rcu(cgroup) <-- free cgroup As you see, kfree_rcu() is redundant now. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	cgroup: initialize cgrp->dentry before css_alloc()	Li Zefan
	With this change, we're guaranteed that cgroup_path() won't see NULL cgrp->dentry, and thus we can remove the NULL check in it. (Well, it's not strictly true, because dummptop.dentry is always NULL but we already handle that separately.) Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-24	uprobes: remove redundant check	Sasha Levin
	We checked for uprobe==NULL earlier, no need to redo that. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1356030701-16284-22-git-send-email-sasha.levin@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24	workqueue: post global_cwq removal cleanups	Tejun Heo
	Remove remaining references to gcwq. * __next_gcwq_cpu() steals __next_wq_cpu() name. The original __next_wq_cpu() became __next_cwq_cpu(). * s/for_each_gcwq_cpu/for_each_wq_cpu/ s/for_each_online_gcwq_cpu/for_each_online_wq_cpu/ * s/gcwq_mayday_timeout/pool_mayday_timeout/ * s/gcwq_unbind_fn/wq_unbind_fn/ * Drop references to gcwq in comments. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: rename nr_running variables	Tejun Heo
	Rename per-cpu and unbound nr_running variables such that they match the pool variables. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: remove global_cwq	Tejun Heo
	global_cwq is now nothing but a container for per-cpu standard worker_pools. Declare the worker pools directly as cpu/unbound_std_worker_pools[] and remove global_cwq. * ____cacheline_aligned_in_smp moved from global_cwq to worker_pool. This probably would have made sense even before this change as we want each pool to be aligned. * get_gcwq() is replaced with std_worker_pools() which returns the pointer to the standard pool array for a given CPU. * __alloc_workqueue_key() updated to use get_std_worker_pool() instead of open-coding pool determination. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. v2: Joonsoo pointed out that it'd better to align struct worker_pool rather than the array so that every pool is aligned. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Joonsoo Kim <js1304@gmail.com>
2013-01-24	workqueue: remove worker_pool->gcwq	Tejun Heo
	The only remaining user of pool->gcwq is std_worker_pool_pri(). Reimplement it using get_gcwq() and remove worker_pool->gcwq. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: replace for_each_worker_pool() with for_each_std_worker_pool()	Tejun Heo
	for_each_std_worker_pool() takes @cpu instead of @gcwq. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: make freezing/thawing per-pool	Tejun Heo
	Instead of holding locks from both pools and then processing the pools together, make freezing/thwaing per-pool - grab locks of one pool, process it, release it and then proceed to the next pool. While this patch changes processing order across pools, order within each pool remains the same. As each pool is independent, this shouldn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: make hotplug processing per-pool	Tejun Heo
	Instead of holding locks from both pools and then processing the pools together, make hotplug processing per-pool - grab locks of one pool, process it, release it and then proceed to the next pool. rebind_workers() is updated to take and process @pool instead of @gcwq which results in a lot of de-indentation. gcwq_claim_assoc_and_lock() and its counterpart are replaced with in-line per-pool locking. While this patch changes processing order across pools, order within each pool remains the same. As each pool is independent, this shouldn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: move global_cwq->lock to worker_pool	Tejun Heo
	Move gcwq->lock to pool->lock. The conversion is mostly straight-forward. Things worth noting are * In many places, this removes the need to use gcwq completely. pool is used directly instead. get_std_worker_pool() is added to help some of these conversions. This also leaves get_work_gcwq() without any user. Removed. * In hotplug and freezer paths, the pools belonging to a CPU are often processed together. This patch makes those paths hold locks of all pools, with highpri lock nested inside, to keep the conversion straight-forward. These nested lockings will be removed by following patches. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: move global_cwq->cpu to worker_pool	Tejun Heo
	Move gcwq->cpu to pool->cpu. This introduces a couple places where gcwq->pools[0].cpu is used. These will soon go away as gcwq is further reduced. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: move busy_hash from global_cwq to worker_pool	Tejun Heo
	There's no functional necessity for the two pools on the same CPU to share the busy hash table. It's also likely to be a bottleneck when implementing pools with user-specified attributes. This patch makes busy_hash per-pool. The conversion is mostly straight-forward. Changes worth noting are, * Large block of changes in rebind_workers() is moving the block inside for_each_worker_pool() as now there are separate hash tables for each pool. This changes the order of operations but doesn't break anything. * Thre for_each_worker_pool() loops in gcwq_unbind_fn() are combined into one. This again changes the order of operaitons but doesn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: record pool ID instead of CPU in work->data when off-queue	Tejun Heo
	Currently, when a work item is off-queue, work->data records the CPU it was last on, which is used to locate the last executing instance for non-reentrance, flushing, etc. We're in the process of removing global_cwq and making worker_pool the top level abstraction. This patch makes work->data point to the pool it was last associated with instead of CPU. After the previous WORK_OFFQ_POOL_CPU and worker_poo->id additions, the conversion is fairly straight-forward. WORK_OFFQ constants and functions are modified to record and read back pool ID instead. worker_pool_by_id() is added to allow looking up pool from ID. get_work_pool() replaces get_work_gcwq(), which is reimplemented using get_work_pool(). get_work_pool_id() replaces work_cpu(). This patch shouldn't introduce any observable behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: add worker_pool->id	Tejun Heo
	Add worker_pool->id which is allocated from worker_pool_idr. This will be used to record the last associated worker_pool in work->data. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: introduce WORK_OFFQ_CPU_NONE	Tejun Heo
	Currently, when a work item is off queue, high bits of its data encodes the last CPU it was on. This is scheduled to be changed to pool ID, which will make it impossible to use WORK_CPU_NONE to indicate no association. This patch limits the number of bits which are used for off-queue cpu number to 31 (so that the max fits in an int) and uses the highest possible value - WORK_OFFQ_CPU_NONE - to indicate no association. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: make GCWQ_FREEZING a pool flag	Tejun Heo
	Make GCWQ_FREEZING a pool flag POOL_FREEZING. This patch doesn't change locking - FREEZING on both pools of a CPU are set or clear together while holding gcwq->lock. It shouldn't cause any functional difference. This leaves gcwq->flags w/o any flags. Removed. While at it, convert BUG_ON()s in freeze_workqueue_begin() and thaw_workqueues() to WARN_ON_ONCE(). This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: make GCWQ_DISASSOCIATED a pool flag	Tejun Heo
	Make GCWQ_DISASSOCIATED a pool flag POOL_DISASSOCIATED. This patch doesn't change locking - DISASSOCIATED on both pools of a CPU are set or clear together while holding gcwq->lock. It shouldn't cause any functional difference. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: use std_ prefix for the standard per-cpu pools	Tejun Heo
	There are currently two worker pools per cpu (including the unbound cpu) and they are the only pools in use. New class of pools are scheduled to be added and some pool related APIs will be added inbetween. Call the existing pools the standard pools and prefix them with std_. Do this early so that new APIs can use std_ prefix from the beginning. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	workqueue: unexport work_cpu()	Tejun Heo
	This function no longer has any external users. Unexport it. It will be removed later on. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-01-24	cgroup: remove a NULL check in cgroup_exit()	Li Zefan
	init_task.cgroups is initialized at boot phase, and whenver a ask is forked, it's cgroups pointer is inherited from its parent, and it's never set to NULL afterwards. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-24	sched/fair: Set se->vruntime directly in place_entity()	Viresh Kumar
	We are first storing the new vruntime in a variable and then storing it in se->vruntime. Simply update se->vruntime directly. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-dev@lists.linaro.org Cc: patches@linaro.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/ae59db1945518d6f6250920d46eb1f1a9cc0024e.1352361704.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	x86/MSI: Support multiple MSIs in presense of IRQ remapping	Alexander Gordeev
	The MSI specification has several constraints in comparison with MSI-X, most notable of them is the inability to configure MSIs independently. As a result, it is impossible to dispatch interrupts from different queues to different CPUs. This is largely devalues the support of multiple MSIs in SMP systems. Also, a necessity to allocate a contiguous block of vector numbers for devices capable of multiple MSIs might cause a considerable pressure on x86 interrupt vector allocator and could lead to fragmentation of the interrupt vectors space. This patch overcomes both drawbacks in presense of IRQ remapping and lets devices take advantage of multiple queues and per-IRQ affinity assignments. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c8bd86ff56b5fc118257436768aaa04489ac0a4c.1353324359.git.agordeev@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	sched/rt: Add reschedule check to switched_from_rt()	Kirill Tkhai
	Reschedule rq->curr if the first RT task has just been pulled to the rq. Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tkhai Kirill <tkhai@yandex.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/118761353614535@web28f.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	profiling: Remove unused timer hook	Frederic Weisbecker
	The last remaining user was oprofile and its use has been removed a while ago in commit bc078e4eab65f11bba ("oprofile: convert oprofile from timer_hook to hrtimer"). There doesn't seem to be any upstream user of this hook for about two years now. And I'm not even aware of any out of tree user. Let's remove it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1356191991-2251-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	sched: Fix the broken sched_rr_get_interval()	Zhu Yanhai
	The caller of sched_sliced() should pass se.cfs_rq and se as the arguments, however in sched_rr_get_interval() we gave it rq.cfs_rq and se, which made the following computation obviously wrong. The change was introduced by commit: 77034937dc45 sched: fix crash in sys_sched_rr_get_interval() ... 5 years ago, while it had been the correct 'cfs_rq_of' before the commit. The change seems to be irrelevant to the commit msg, which was to return a 0 timeslice for tasks that are on an idle runqueue. So I believe that was just a plain typo. Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1357621012-15039-1-git-send-email-gaoyang.zyh@taobao.com [ Since this is an ABI and an old bug, we'll test this via a slow upstream route, to hopefully discover any app breakage. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	Merge branch 'tip/perf/core' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull small function-tracing smatch fixlet from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	tracing: Fix unsigned int compare of zero in recursion check	Steven Rostedt
	Dan's smatch found a compare bug with the result of the trace_test_and_set_recursion() and comparing to less than zero. If the function fails, it returns -1, but was saved in an unsigned int, which will never be less than zero and will ignore the result of the test if a recursion did happen. Luckily this is the last of the recursion tests, as the infrastructure of ftrace would catch recursions before it got here, except for some few exceptions. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-24	Merge branch 'tip/perf/core' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull tracing updates from Steve Rostedt. This commit: tracing: Remove the extra 4 bytes of padding in events changes the ABI. All involved parties seem to agree that it's safe to do now, but the devil is in the details ... Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	Merge branch 'rcu/urgent' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fixes from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	Merge branch 'core/irq_work' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/core irq_work fixes and cleanups, in preparation for full dyntics support. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24	Merge tag 'v3.8-rc4' into irq/core	Ingo Molnar
	Merge Linux 3.8-rc4 before pulling in new commits - we were on an old v3.7 base. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-23	async: replace list of active domains with global list of pending items	Tejun Heo
	Global synchronization - async_synchronize_full() - is currently implemented by keeping a list of all active registered domains and syncing them one by one until no domain is active. While this isn't necessarily a complex scheme, it can easily be simplified by keeping global list of the pending items of all registered active domains instead of list of domains and simply using the globl pending list the same way as domain syncing. This patch replaces async_domains with async_global_pending and update lowest_in_progress() to use the global pending list if @domain is %NULL. async_synchronize_full_domain(NULL) is now allowed and equivalent to async_synchronize_full(). As no one is calling with NULL domain, this doesn't affect any existing users. async_register_mutex is no longer necessary and dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-23	async: keep pending tasks on async_domain and remove async_pending	Tejun Heo
	Async kept single global pending list and per-domain running lists. When an async item is queued, it's put on the global pending list. The item is moved to the per-domain running list when its execution starts. At this point, this design complicates execution and synchronization without bringing any benefit. The list only matters for synchronization which doesn't care whether a given async item is pending or executing. Also, global synchronization is done by iterating through all active registered async_domains, so the global async_pending list doesn't help anything either. Rename async_domain->running to async_domain->pending and put async items directly there and remove when execution completes. This simplifies lowest_in_progress() a lot - the first item on the pending list is the one with the lowest cookie, and async_run_entry_fn() doesn't have to mess with moving the item from pending to running. After the change, whether a domain is empty or not can be trivially determined by looking at async_domain->pending. Remove async_domain->count and use list_empty() on pending instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-23	async: use ULLONG_MAX for infinity cookie value	Tejun Heo
	Currently, next_cookie is used as the infinity value. In most cases, this should work fine but it theoretically could bring subtle behavior difference between async_synchronize_full() and async_synchronize_full_domain(). async_synchronize_full() keeps waiting until there's no registered async_entry left regardless of what next_cookie was when the function was called. It guarantees that the queue is completely drained at least once before returning. However, async_synchronize_full_domain() doesn't. It synchronizes upto next_cookie and if further async jobs are queued after the next_cookie value to synchronize is decided, they won't be waited for. For unrelated async jobs, the behavior difference doesn't matter; however, if async jobs which are related (nested or otherwise) to the executing ones are queued while sychronization is in progress, the resulting behavior difference could be problematic. This can be easily fixed by using ULLONG_MAX as the infinity value instead. Define ASYNC_COOKIE_MAX as ULLONG_MAX and use it as the infinity value for synchronization. This makes async_synchronize_full_domain() fully drain the domain at least once before returning, making its behavior match async_synchronize_full(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-23	async: bring sanity to the use of words domain and running	Tejun Heo
	In the beginning, running lists were literal struct list_heads. Later on, struct async_domain was added. For some reason, while the conversion substituted list_heads with async_domains, the variable names weren't fully converted. In more places, "running" was used for struct async_domain while other places adopted new "domain" name. The situation is made much worse by having async_domain's running list named "domain" and async_entry's field pointing to async_domain named "running". So, we end up with mix of "running" and "domain" for variable names for async_domain, with the field names of async_domain and async_entry swapped between "running" and "domain". It feels almost intentionally made to be as confusing as possible. Bring some sanity by * Renaming all async_domain variables "domain". * s/async_running/async_dfl_domain/ * s/async_domain->domain/async_domain->running/ * s/async_entry->running/async_entry->domain/ Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Dan Williams <djbw@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-23	Merge branch 'master' into for-3.9-async	Tejun Heo
	To receive f56c3196f251012de9b3ebaff55732a9074fdaae ("async: fix __lowest_in_progress()"). Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-22	ring-buffer: Remove trace.h from ring_buffer.c	Steven Rostedt
	ring_buffer.c use to require declarations from trace.h, but these have moved to the generic header files. There's nothing in trace.h that ring_buffer.c requires. There's some headers that trace.h included that ring_buffer.c needs, but it's best that it includes them directly, and not include trace.h. Also, some things may use ring_buffer.c without having tracing configured. This removes the dependency that may come in the future. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22	ring-buffer: User context bit recursion checking	Steven Rostedt
	Using context bit recursion checking, we can help increase the performance of the ring buffer. Before this patch: # echo function > /debug/tracing/current_tracer # for i in `seq 10`; do ./hackbench 50; done Time: 10.285 Time: 10.407 Time: 10.243 Time: 10.372 Time: 10.380 Time: 10.198 Time: 10.272 Time: 10.354 Time: 10.248 Time: 10.253 (average: 10.3012) Now we have: # echo function > /debug/tracing/current_tracer # for i in `seq 10`; do ./hackbench 50; done Time: 9.712 Time: 9.824 Time: 9.861 Time: 9.827 Time: 9.962 Time: 9.905 Time: 9.886 Time: 10.088 Time: 9.861 Time: 9.834 (average: 9.876) a 4% savings! Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22	ftrace: Use only the preempt version of function tracing	Steven Rostedt
	The function tracer had two different versions of function tracing. The disabling of irqs version and the preempt disable version. As function tracing in very intrusive and can cause nasty recursion issues, it has its own recursion protection. But the old method to do this was a flat layer. If it detected that a recursion was happening then it would just return without recording. This made the preempt version (much faster than the irq disabling one) not very useful, because if an interrupt were to occur after the recursion flag was set, the interrupt would not be traced at all, because every function that was traced would think it recursed on itself (due to the context it preempted setting the recursive flag). Now that we have a recursion flag for every context level, we no longer need to worry about that. We can disable preemption, set the current context recursion check bit, and go on. If an interrupt were to come along, it would check its own context bit and happily continue to trace. As the preempt version is faster than the irq disable version, there's no more reason to keep the preempt version around. And the irq disable version still had an issue with missing out on tracing NMI code. Remove the irq disable function tracer version and have the preempt disable version be the default (and only version). Before this patch we had from running: # echo function > /debug/tracing/current_tracer # for i in `seq 10`; do ./hackbench 50; done Time: 12.028 Time: 11.945 Time: 11.925 Time: 11.964 Time: 12.002 Time: 11.910 Time: 11.944 Time: 11.929 Time: 11.941 Time: 11.924 (average: 11.9512) Now we have: # echo function > /debug/tracing/current_tracer # for i in `seq 10`; do ./hackbench 50; done Time: 10.285 Time: 10.407 Time: 10.243 Time: 10.372 Time: 10.380 Time: 10.198 Time: 10.272 Time: 10.354 Time: 10.248 Time: 10.253 (average: 10.3012) a 13.8% savings! Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22	tracing: Avoid unnecessary multiple recursion checks	Steven Rostedt
	When function tracing occurs, the following steps are made: If arch does not support a ftrace feature: call internal function (uses INTERNAL bits) which calls... If callback is registered to the "global" list, the list function is called and recursion checks the GLOBAL bits. then this function calls... The function callback, which can use the FTRACE bits to check for recursion. Now if the arch does not suppport a feature, and it calls the global list function which calls the ftrace callback all three of these steps will do a recursion protection. There's no reason to do one if the previous caller already did. The recursion that we are protecting against will go through the same steps again. To prevent the multiple recursion checks, if a recursion bit is set that is higher than the MAX bit of the current check, then we know that the check was made by the previous caller, and we can skip the current check. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22	tracing: Make the trace recursion bits into enums	Steven Rostedt
	Convert the bits into enums which makes the code a little easier to maintain. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22	ftrace: Add context level recursion bit checking	Steven Rostedt
	Currently for recursion checking in the function tracer, ftrace tests a task_struct bit to determine if the function tracer had recursed or not. If it has, then it will will return without going further. But this leads to races. If an interrupt came in after the bit was set, the functions being traced would see that bit set and think that the function tracer recursed on itself, and would return. Instead add a bit for each context (normal, softirq, irq and nmi). A check of which context the task is in is made before testing the associated bit. Now if an interrupt preempts the function tracer after the previous context has been set, the interrupt functions can still be traced. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-22	ftrace: Optimize the function tracer list loop	Steven Rostedt
	There is lots of places that perform: op = rcu_dereference_raw(ftrace_control_list); while (op != &ftrace_list_end) { Add a helper macro to do this, and also optimize for a single entity. That is, gcc will optimize a loop for either no iterations or more than one iteration. But usually only a single callback is registered to the function tracer, thus the optimized case should be a single pass. to do this we now do: op = rcu_dereference_raw(list); do { [...] } while (likely(op = rcu_dereference_raw((op)->next)) && unlikely((op) != &ftrace_list_end)); An op is always registered (ftrace_list_end when no callbacks is registered), thus when a single callback is registered, the link list looks like: top => callback => ftrace_list_end => NULL. The likely(op = op->next) still must be performed due to the race of removing the callback, where the first op assignment could equal ftrace_list_end. In that case, the op->next would be NULL. But this is unlikely (only happens in a race condition when removing the callback). But it is very likely that the next op would be ftrace_list_end, unless more than one callback has been registered. This tells gcc what the most common case is and makes the fast path with the least amount of branches. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>