summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2014-02-21sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mmMartin Schwidefsky
The finish_arch_post_lock_switch is called at the end of the task switch after all locks have been released. In concept it is paired with the switch_mm function, but the current code only does the call in finish_task_switch. Add the call to idle_task_exit and use_mm. One use case for the additional calls is s390 which will use finish_arch_post_lock_switch to wait for the completion of TLB flush operations. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Quite a few fixes this time. Three locking fixes, all marked for -stable. A couple error path fixes and some misc fixes. Hugh found a bug in memcg offlining sequence and we thought we could fix that from cgroup core side but that turned out to be insufficient and got reverted. A different fix has been applied to -mm" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: update cgroup_enable_task_cg_lists() to grab siglock Revert "cgroup: use an ordered workqueue for cgroup destruction" cgroup: protect modifications to cgroup_idr with cgroup_mutex cgroup: fix locking in cgroup_cfts_commit() cgroup: fix error return from cgroup_create() cgroup: fix error return value in cgroup_mount() cgroup: use an ordered workqueue for cgroup destruction nfs: include xattr.h from fs/nfs/nfs3proc.c cpuset: update MAINTAINERS entry arm, pm, vmpressure: add missing slab.h includes
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Two workqueue fixes. One for an unlikely but possible critical bug during kworker shutdown and the other to make lockdep names a bit more descriptive" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: ensure @task is valid across kthread_stop() workqueue: add args to workqueue lockdep name
2014-02-20user_namespace.c: Remove duplicated word in commentBrian Campbell
Signed-off-by: Brian Campbell <brian.campbell@editshare.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-20Merge branch 'master' into for-nextJiri Kosina
2014-02-19genirq: Update the a comment typoChuansheng Liu
Change the comment "chasnge" to "change". Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Link: http://lkml.kernel.org/r/1392020037-5484-2-git-send-email-chuansheng.liu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19genirq: Provide irq_wake_thread()Thomas Gleixner
In course of the sdhci/sdio discussion with Russell about killing the sdio kthread hackery we discovered the need to be able to wake an interrupt thread from software. The rationale for this is, that sdio hardware can lack proper interrupt support for certain features. So the driver needs to poll the status registers, but at the same time it needs to be woken up by an hardware interrupt. To be able to get rid of the home brewn kthread construct of sdio we need a way to wake an irq thread independent of an actual hardware interrupt. Provide an irq_wake_thread() function which wakes up the thread which is associated to a given dev_id. This allows sdio to invoke the irq thread from the hardware irq handler via the IRQ_WAKE_THREAD return value and provides a possibility to wake it via a timer for the polling scenarios. That allows to simplify the sdio logic significantly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Chris Ball <chris@printf.net> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140215003823.772565780@linutronix.de
2014-02-19genirq: Provide synchronize_hardirq()Thomas Gleixner
synchronize_irq() waits for hard irq and threaded handlers to complete before returning. For some special cases we only need to make sure that the hard interrupt part of the irq line is not in progress when we disabled the - possibly shared - interrupt at the device level. A proper use case for this was provided by Russell. The sdhci driver requires some irq triggered functions to be run in thread context. The current implementation of the thread context is a sdio private kthread construct, which has quite some shortcomings. These can be avoided when the thread is directly associated to the device interrupt via the generic threaded irq infrastructure. Though there is a corner case related to run time power management where one side disables the device interrupts at the device level and needs to make sure, that an already running hard interrupt handler has completed before proceeding further. Though that hard interrupt handler might wake the associated thread, which in turn can request the runtime PM to reenable the device. Using synchronize_irq() leads to an immediate deadlock of the irq thread waiting for the PM lock and the synchronize_irq() waiting for the irq thread to complete. Due to the fact that it is sufficient for this case to ensure that no hard irq handler is executing a new function which avoids the check for the thread is required. Add a function, which just monitors the hard irq parts and ignores the threaded handlers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Russell King <linux@arm.linux.org.uk> Cc: Chris Ball <chris@printf.net> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140215003823.653236081@linutronix.de
2014-02-19sched_clock: Prevent callers from seeing half-updated dataStephen Boyd
The generic sched_clock registration function was previously done lockless, due to the fact that it was expected to be called only once. However, now there are systems that may register multiple sched_clock sources, for which the lack of locking has casued problems: If two sched_clock sources are registered we may end up in a situation where a call to sched_clock() may be accessing the epoch cycle count for the old counter and the cycle count for the new counter. This can lead to confusing results where sched_clock() values jump and then are reset to 0 (due to the way the registration function forces the epoch_ns to be 0). Fix this by reorganizing the registration function to hold the seqlock for as short a time as possible while we update the clock_data structure for a new counter. We also put any accumulated time into epoch_ns instead of resetting the time to 0 so that the clock doesn't reset after each successful registration. [jstultz: Added extra context to the commit message] Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Cartwright <joshc@codeaurora.org> Link: http://lkml.kernel.org/r/1392662736-7803-2-git-send-email-john.stultz@linaro.org Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19user_namespace.c: Remove duplicated word in commentBrian Campbell
Signed-off-by: Brian Campbell <brian.campbell@editshare.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-19treewide: Fix typo in Documentation/DocBookMasanari Iida
This patch fix spelling typo in Documentation/DocBook. It is because .html and .xml files are generated by make htmldocs, I have to fix a typo within the source files. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-18cgroup: update cgroup_enable_task_cg_lists() to grab siglockTejun Heo
Currently, there's nothing preventing cgroup_enable_task_cg_lists() from missing set PF_EXITING and race against cgroup_exit(). Depending on the timing, cgroup_exit() may finish with the task still linked on css_set leading to list corruption. Fix it by grabbing siglock in cgroup_enable_task_cg_lists() so that PF_EXITING is guaranteed to be visible. This whole on-demand cg_list optimization is extremely fragile and has ample possibility to lead to bugs which can cause things like once-a-year oops during boot. I'm wondering whether the better approach would be just adding "cgroup_disable=all" handling which disables the whole cgroup rather than tempting fate with this on-demand craziness. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org
2014-02-18workqueue: ensure @task is valid across kthread_stop()Lai Jiangshan
When a kworker should die, the kworkre is notified through WORKER_DIE flag instead of kthread_should_stop(). This, IIRC, is primarily to keep the test synchronized inside worker_pool lock. WORKER_DIE is first set while holding pool->lock, the lock is dropped and kthread_stop() is called. Unfortunately, this means that there's a slight chance that the target kworker may see WORKER_DIE before kthread_stop() finishes and exits and frees the target task before or during kthread_stop(). Fix it by pinning the target task before setting WORKER_DIE and putting it after kthread_stop() is done. tj: Improved patch description and comment. Moved pinning above WORKER_DIE for better signify what it's protecting. CC: stable@vger.kernel.org Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-02-18inotify: Fix reporting of cookies for inotify eventsJan Kara
My rework of handling of notification events (namely commit 7053aee26a35 "fsnotify: do not share events between notification groups") broke sending of cookies with inotify events. We didn't propagate the value passed to fsnotify() properly and passed 4 uninitialized bytes to userspace instead (so it is also an information leak). Sadly I didn't notice this during my testing because inotify cookies aren't used very much and LTP inotify tests ignore them. Fix the problem by passing the cookie value properly. Fixes: 7053aee26a3548ebaba046ae2e52396ccf56ac6c Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-17rcu: Optimize RCU_FAST_NO_HZ for RCU_NOCB_CPU_ALLPaul E. McKenney
If CONFIG_RCU_NOCB_CPU_ALL=y, then no CPU will ever have RCU callbacks because these callbacks will instead be handled by the rcuo kthreads. However, the current version of RCU_FAST_NO_HZ nevertheless checks for RCU callbacks. This commit therefore creates static inline implementations of rcu_prepare_for_idle() and rcu_cleanup_after_idle() that are no-ops when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17rcu: Optimize rcu_needs_cpu() for RCU_NOCB_CPU_ALLPaul E. McKenney
If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_needs_cpu() will always return false, however, the current version nevertheless checks for RCU callbacks. This commit therefore creates a static inline implementation of rcu_needs_cpu() that unconditionally returns false when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17rcu: Optimize rcu_is_nocb_cpu() for RCU_NOCB_CPU_ALLPaul E. McKenney
If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_is_nocb_cpu() will always return true, however, the current version nevertheless checks rcu_nocb_mask. This commit therefore creates a static inline implementation of rcu_is_nocb_cpu() that unconditionally returns true when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17rcu: Move SRCU grace period work to power efficient workqueueShaibal Dutta
For better use of CPU idle time, allow the scheduler to select the CPU on which the SRCU grace period work would be scheduled. This improves idle residency time and conserves power. This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected. Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Shaibal Dutta <shaibal.dutta@broadcom.com> [zoran.markovic@linaro.org: Rebased to latest kernel version. Added commit message. Fixed code alignment.] Signed-off-by: Zoran Markovic <zoran.markovic@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17rcu: Disambiguate CONFIG_RCU_NOCB_CPUsPaul Bolle
This commit fixes a grammar issue in the rcu_nohz_full_cpu() comment header, so that it is clear that the plural is CPUs not Kconfig options. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17rcu: Remove ACCESS_ONCE() from jiffiesPaul E. McKenney
Because jiffies is one of a very few variables marked "volatile", there is no need to use ACCESS_ONCE() when accessing it. This commit therefore removes the redundant ACCESS_ONCE() wrappers. Reported by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17rcu: Stop tracking FSF's postal addressPaul E. McKenney
All of the RCU source files have the usual GPL header, which contains a long-obsolete postal address for FSF. To avoid the need to track the FSF office's movements, this commit substitutes the URL where GPL may be found. Reported-by: Greg KH <gregkh@linuxfoundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17rcu: Add ACCESS_ONCE() to ->n_force_qs_lh accessesPaul E. McKenney
The ->n_force_qs_lh field is accessed without the benefit of any synchronization, so this commit adds the needed ACCESS_ONCE() wrappers. Yes, increments to ->n_force_qs_lh can be lost, but contention should be low and the field is strictly statistical in nature, so this is not a problem. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17printk: fix syslog() overflowing user bufferLinus Torvalds
This is not a buffer overflow in the traditional sense: we don't overflow any *kernel* buffers, but we do mis-count the amount of data we copy back to user space for the SYSLOG_ACTION_READ_ALL case. In particular, if the user buffer is too small to hold everything, and *if* there is a continuation line at just the right place, we can end up giving the user more data than he asked for. The reason is that we first count up the number of bytes all the log records contains, then we walk the records again until we've skipped the records at the beginning that won't fit, and then we walk the rest of the records and copy them to the user space buffer. And in between that "skip the initial records that won't fit" and the "copy the records that *will* fit to user space", we reset the 'prev' variable that contained the record information for the last record not copied. That meant that when we started copying to user space, we now had a different character count than what we had originally calculated in the first record walk-through. The fix is to simply not clear the 'prev' flags value (in both cases where we had the same logic: syslog_print_all and kmsg_dump_get_buffer: the latter is used for pstore-like dumping) Reported-and-tested-by: Debabrata Banerjee <dbanerje@akamai.com> Acked-by: Kay Sievers <kay@vrfy.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-15Merge branches 'irq-urgent-for-linus' and 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq update from Thomas Gleixner: "Fix from the urgent branch: a trivial oneliner adding the missing Kconfig dependency curing build failures which have been discovered by several build robots. The update in the irq-core branch provides a new function in the irq/devres code, which is a prerequisite for driver developers to get rid of boilerplate code all over the place. Not a bugfix, but it has zero impact on the current kernel due to the lack of users. It's simpler to provide the infrastructure to interested parties via your tree than fulfilling the wishlist of driver maintainers on which particular commit or tag this should be based on" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Add devm_request_any_context_irq()
2014-02-15Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "The following trilogy of patches brings you: - fix for a long standing math overflow issue with HZ < 60 - an onliner fix for a corner case in the dreaded tick broadcast mechanism affecting a certain range of AMD machines which are infested with the infamous automagic C1E power control misfeature - a fix for one of the ARM platforms which allows the kernel to proceed and boot instead of stupidly panicing for no good reason. The patch is slightly larger than necessary, but it's less ugly than the alternative 5 liner" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Clear broadcast pending bit when switching to oneshot clocksource: Kona: Print warning rather than panic time: Fix overflow when HZ is smaller than 60
2014-02-14nohz: ensure users are aware boot CPU is not NO_HZ_FULLPaul Gortmaker
This bit of information is in the Kconfig help text: "Note the boot CPU will still be kept outside the range to handle the timekeeping duty." However neither the variable NO_HZ_FULL_ALL, or the prompt convey this important detail, so lets add it to the prompt to make it more explicitly obvious to the average user. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1391711781-7466-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-02-14timer: Spare IPI when deferrable timer is queued on idle remote targetsViresh Kumar
When a timer is enqueued or modified on a remote target, the latter is expected to see and handle this timer on its next tick. However if the target is idle and CONFIG_NO_HZ_IDLE=y, the CPU may be sleeping tickless and the timer may be ignored. wake_up_nohz_cpu() takes care of that by setting TIF_NEED_RESCHED and sending an IPI to idle targets so that the tick is reevaluated on the idle loop through the tick_nohz_idle_*() APIs. Now this is all performed regardless of the power properties of the timer. If the timer is deferrable, idle targets don't need to be woken up. Only the next buzy tick needs to care about it, and no IPI kick is needed for that to happen. So lets spare the IPI on idle targets when the timer is deferrable. Meanwhile we keep the current behaviour on full dynticks targets. We can spare IPIs on idle full dynticks targets as well but some tricky races against idle_cpu() must be dealt all along to make sure that the timer is well handled after idle exit. We can deal with that later since NO_HZ_FULL already has more important powersaving issues. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAKohpomMZ0TAN2e6N76_g4ZRzxd5vZ1XfuZfxrP7GMxfTNiLVw@mail.gmail.com Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-02-13lto: Disable LTO for sys_niAndi Kleen
The assembler alias code in cond_syscall does not work when compiled for LTO. Just disable LTO for that file. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391846481-31491-6-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13lto: Handle LTO common symbols in module loaderJoe Mario
Here is the workaround I made for having the kernel not reject modules built with -flto. The clean solution would be to get the compiler to not emit the symbol. Or if it has to emit the symbol, then emit it as initialized data but put it into a comdat/linkonce section. Minor tweaks by AK over Joe's patch. Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391846481-31491-5-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage: Make trace_hardirqs_on/off_caller visibleAndi Kleen
These functions are called from assembler, and thus need to be __visible. Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-12-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage Make __stack_chk_failed and memcmp visibleAndi Kleen
In LTO symbols implicitely referenced by the compiler need to be visible. Earlier these symbols were visible implicitely from being exported, but we disabled implicit visibility fo EXPORTs when modules are disabled to improve code size. So now these symbols have to be marked visible explicitely. Do this for __stack_chk_fail (with stack protector) and memcmp. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage: Mark rwsem functions that can be called from assembler asmlinkageAndi Kleen
Mark the rwsem functions that can be called from assembler asmlinkage. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-9-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage: Make main_extable_sort_needed visibleAndi Kleen
main_extable_sort_needed is used by the build system and needs to be a normal ELF symbol. Make it visible so that LTO does not remove or mangle it. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-8-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage, mutex: Mark __visibleAndi Kleen
Various kernel/mutex.c functions can be called from inline assembler, so they should be all global and __visible. Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-7-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage: Make trace_hardirq visibleAndi Kleen
Can be called from assembler code. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-6-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage: Make lockdep_sys_exit asmlinkageAndi Kleen
lockdep_sys_exit can be called from assembler code, so make it asmlinkage. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-5-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13asmlinkage: Make jiffies visibleAndi Kleen
Jiffies is referenced by the linker script, so it has to be visible. Handled both the generic and the x86 version. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-3-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13tick: Clear broadcast pending bit when switching to oneshotThomas Gleixner
AMD systems which use the C1E workaround in the amd_e400_idle routine trigger the WARN_ON_ONCE in the broadcast code when onlining a CPU. The reason is that the idle routine of those AMD systems switches the cpu into forced broadcast mode early on before the newly brought up CPU can switch over to high resolution / NOHZ mode. The timer related CPU1 bringup looks like this: clockevent_register_device(local_apic); tick_setup(local_apic); ... idle() tick_broadcast_on_off(FORCE); tick_broadcast_oneshot_control(ENTER) cpumask_set(cpu, broadcast_oneshot_mask); halt(); Now the broadcast interrupt on CPU0 sets CPU1 in the broadcast_pending_mask and wakes CPU1. So CPU1 continues: local_apic_timer_interrupt() tick_handle_periodic(); softirq() tick_init_highres(); cpumask_clr(cpu, broadcast_oneshot_mask); tick_broadcast_oneshot_control(ENTER) WARN_ON(cpumask_test(cpu, broadcast_pending_mask); So while we remove CPU1 from the broadcast_oneshot_mask when we switch over to highres mode, we do not clear the pending bit, which then triggers the warning when we go back to idle. The reason why this is only visible on C1E affected AMD systems is that the other machines enter the deep sleep states via acpi_idle/intel_idle and exit the broadcast mode before executing the remote triggered local_apic_timer_interrupt. So the pending bit is already cleared when the switch over to highres mode is clearing the oneshot mask. The solution is simple: Clear the pending bit together with the mask bit when we switch over to highres mode. Stanislaw came up independently with the same patch by enforcing the C1E workaround and debugging the fallout. I picked mine, because mine has a changelog :) Reported-by: poma <pomidorabelisima@gmail.com> Debugged-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Olaf Hering <olaf@aepfle.de> Cc: Dave Jones <davej@redhat.com> Cc: Justin M. Forbes <jforbes@redhat.com> Cc: Josh Boyer <jwboyer@redhat.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402111434180.21991@ionos.tec.linutronix.de Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-12Revert "cgroup: use an ordered workqueue for cgroup destruction"Tejun Heo
This reverts commit ab3f5faa6255a0eb4f832675507d9e295ca7e9ba. Explanation from Hugh: It's because more thorough testing, by others here, found that it wasn't always solving the problem: so I asked Tejun privately to hold off from sending it in, until we'd worked out why not. Most of our testing being on a v3,11-based kernel, it was perfectly possible that the problem was merely our own e.g. missing Tejun's 8a2b75384444 ("workqueue: fix ordered workqueues in NUMA setups"). But that turned out not to be enough to fix it either. Then Filipe pointed out how percpu_ref_kill_and_confirm() uses call_rcu_sched() before we ever get to put the offline on to the workqueue: by the time we get to the workqueue, the ordering has already been lost. So, thanks for the Acks, but I'm afraid that this ordered workqueue solution is just not good enough: we should simply forget that patch and provide a different answer." Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Hugh Dickins <hughd@google.com>
2014-02-11ring-buffer: Fix first commit on sub-buffer having non-zero deltaSteven Rostedt (Red Hat)
Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on that page use a 27 bit delta against that timestamp in order to save on bits written to the ring buffer. If the time between events is larger than what the 27 bits can hold, a "time extend" event is added to hold the entire 64 bit timestamp again and the events after that hold a delta from that timestamp. As a "time extend" is always paired with an event, it is logical to just allocate the event with the time extend, to make things a bit more efficient. Unfortunately, when the pairing code was written, it removed the "delta = 0" from the first commit on a page, causing the events on the page to be slightly skewed. Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together" Cc: stable@vger.kernel.org # 2.6.37+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-11cgroup: protect modifications to cgroup_idr with cgroup_mutexLi Zefan
Setup cgroupfs like this: # mount -t cgroup -o cpuacct xxx /cgroup # mkdir /cgroup/sub1 # mkdir /cgroup/sub2 Then run these two commands: # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } & # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } & After seconds you may see this warning: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0() idr_remove called for id=6 which is not allocated. ... Call Trace: [<ffffffff8156063c>] dump_stack+0x7a/0x96 [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0 [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50 [<ffffffff81300aa7>] sub_remove+0x87/0x1b0 [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0 [<ffffffff81300bf5>] idr_remove+0x25/0xd0 [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0 [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0 [<ffffffff8107cdbc>] process_one_work+0x26c/0x550 [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0 [<ffffffff81085f96>] kthread+0xe6/0xf0 [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0 ---[ end trace 2d1577ec10cf80d0 ]--- It's because allocating/removing cgroup ID is not properly synchronized. The bug was introduced when we converted cgroup_ida to cgroup_idr. While synchronization is already done inside ida_simple_{get,remove}(), users are responsible for concurrent calls to idr_{alloc,remove}(). tj: Refreshed on top of b58c89986a77 ("cgroup: fix error return from cgroup_create()"). Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr") Cc: <stable@vger.kernel.org> #3.12+ Reported-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-02-11genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=nPaul Gortmaker
In allmodconfig builds for sparc and any other arch which does not set CONFIG_SPARSE_IRQ, the following will be seen at modpost: CC [M] lib/cpu-notifier-error-inject.o CC [M] lib/pm-notifier-error-inject.o ERROR: "irq_to_desc" [drivers/gpio/gpio-mcp23s08.ko] undefined! make[2]: *** [__modpost] Error 1 This happens because commit 3911ff30f5 ("genirq: export handle_edge_irq() and irq_to_desc()") added one export for it, but there were actually two instances of it, in an if/else clause for CONFIG_SPARSE_IRQ. Add the second one. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: stable@vger.kernel.org # 3.4+ Link: http://lkml.kernel.org/r/1392057610-11514-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-11sched/idle: Move cpu/idle.c to sched/idle.cNicolas Pitre
Integration of cpuidle with the scheduler requires that the idle loop be closely integrated with the scheduler proper. Moving cpu/idle.c into the sched directory will allow for a smoother integration, and eliminate a subdirectory which contained only one source file. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1401301102210.1652@knanqh.ubzr Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11sched/idle: Move the cpuidle entry point to the generic idle loopNicolas Pitre
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code. Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver. In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and allowing for the warning to trig. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Olof Johansson <olof@lixom.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1401291526320.1652@knanqh.ubzr Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11sched: Add statistic for newidle load balance costAlex Shi
Tracking rq->max_idle_balance_cost and sd->max_newidle_lb_cost. It's useful to know these values in debug mode. Signed-off-by: Alex Shi <alex.shi@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/52E0F3BF.5020904@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11sched: Delete is_same_group() outside CONFIG_FAIR_GROUP_SCHEDDietmar Eggemann
Since is_same_group() is only used in the group scheduling code, there is no need to define it outside CONFIG_FAIR_GROUP_SCHED. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391005773-29493-1-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11sched: Push down pre_schedule() and idle_balance()Peter Zijlstra
This patch both merged idle_balance() and pre_schedule() and pushes both of them into pick_next_task(). Conceptually pre_schedule() and idle_balance() are rather similar, both are used to pull more work onto the current CPU. We cannot however first move idle_balance() into pre_schedule_fair() since there is no guarantee the last runnable task is a fair task, and thus we would miss newidle balances. Similarly, the dl and rt pre_schedule calls must be ran before idle_balance() since their respective tasks have higher priority and it would not do to delay their execution searching for less important tasks first. However, by noticing that pick_next_tasks() already traverses the sched_class hierarchy in the right order, we can get the right behaviour and do away with both calls. We must however change the special case optimization to also require that prev is of sched_class_fair, otherwise we can miss doing a dl or rt pull where we needed one. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-a8k6vvaebtn64nie345kx1je@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11PM / QoS: Introcuce latency tolerance device PM QoS typeRafael J. Wysocki
Add a new latency tolerance device PM QoS type to be use for specifying active state (RPM_ACTIVE) memory access (DMA) latency tolerance requirements for devices. It may be used to prevent hardware from choosing overly aggressive energy-saving operation modes (causing too much latency to appear) for the whole platform. This feature reqiures hardware support, so it only will be available for devices having a new .set_latency_tolerance() callback in struct dev_pm_info populated, in which case the routine pointed to by it should implement whatever is necessary to transfer the effective requirement value to the hardware. Whenever the effective latency tolerance changes for the device, its .set_latency_tolerance() callback will be executed and the effective value will be passed to it. If that value is negative, which means that the list of latency tolerance requirements for the device is empty, the callback is expected to switch the underlying hardware latency tolerance control mechanism to an autonomous mode if available. If that value is PM_QOS_LATENCY_ANY, in turn, and the hardware supports a special "no requirement" setting, the callback is expected to use it. That allows software to prevent the hardware from automatically updating the device's latency tolerance in response to its power state changes (e.g. during transitions from D3cold to D0), which generally may be done in the autonomous latency tolerance control mode. If .set_latency_tolerance() is present for the device, a new pm_qos_latency_tolerance_us attribute will be present in the devivce's power directory in sysfs. Then, user space can use that attribute to specify its latency tolerance requirement for the device, if any. Writing "any" to it means "no requirement, but do not let the hardware control latency tolerance" and writing "auto" to it allows the hardware to be switched to the autonomous mode if there are no other requirements from the kernel side in the device's list. This changeset includes a fix from Mika Westerberg. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-11PM / QoS: Add no_constraints_value field to struct pm_qos_constraintsRafael J. Wysocki
Add a new field, no_constraints_value, to struct pm_qos_constraints representing a list of PM QoS constraint requests to be returned by pm_qos_get_value() when that list of requests is empty. That field will be equal to default_value for all of the existing global PM QoS classes and for the resume latency device PM QoS type, but it will be different from default_value for the new latency tolerance device PM QoS type introduced by the next changeset. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-10sched: Clean up idle task SMP logicPeter Zijlstra
The idle post_schedule flag is just a vile waste of time, furthermore it appears unneeded, move the idle_enter_fair() call into pick_next_task_idle(). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: alex.shi@linaro.org Cc: mingo@kernel.org Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-aljykihtxJt3mkokxi0qZurb@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>