path: root/kernel
Age  Commit message  (Author)
2015-07-17  genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD  (Thomas Gleixner)
The resend mechanism happily calls the interrupt handler of interrupts which are marked IRQ_NESTED_THREAD from softirq context. This can result in crashes because the interrupt handler is not the proper way to invoke the device handlers. They must be invoked via handle_nested_irq. Prevent the resend even if the interrupt has no valid parent irq set. It's better to have a lost interrupt than a crashing machine. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2015-07-15  rcutorture: Add RCU-tasks qualifier to dereference  (Paul E. McKenney)
Although RCU-tasks isn't really designed to support rcu_dereference() and list manipulation, that is how rcutorture tests it. Which means that lockdep-RCU complains about the rcu_dereference_check() invocations because RCU-tasks doesn't have read-side markers. This commit therefore creates a torturing_tasks() to silence the lockdep-RCU complaints from rcu_dereference_check() when RCU-tasks is being tortured. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcutorture: Fix rcu_torture_cbflood() for callback-free RCU  (Paul E. McKenney)
The rcu_torture_cbflood() function correctly checks for flavors of RCU that lack analogs to call_rcu() and rcu_barrier(), but in that case it fails to terminate correctly. In fact, it terminates so incorrectly that segfaults can result. This commit therefore causes rcu_torture_cbflood() to do the proper wait-for-stop procedure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcutorture: Bounds-check rcutorture.shuffle_interval  (Paul E. McKenney)
Specifying a negative rcutorture.shuffle_interval value will cause a negative value to be used as a sleep time. This commit therefore refuses to start shuffling unless the rcutorture.shuffle_interval value is greater than zero. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcutorture: Check nfakewriters parameter  (Paul E. McKenney)
Currently, a negative value for rcutorture.nfakewriters= can cause rcutorture to pass a negative size to the memory allocator, which is not really a particularly good thing to do. This commit therefore adds bounds checking to this parameter, so that values that are less than or equal to zero disable fake writing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcutorture: Better bounds checking for n_barrier_cbs  (Paul E. McKenney)
A negative value for rcutorture.n_barrier_cbs can pass a negative value to the memory allocator, so this commit instead causes rcu_barrier() testing to be disabled in this case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
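The three rcutorture bounds-checking commits above share one pattern: validate a signed parameter before it is used as a count or allocation size. The following is a minimal user-space sketch of that pattern, not the kernel code; `struct fake_writer` and `alloc_fake_writers()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-writer state. */
struct fake_writer {
	int id;
};

/*
 * Allocate state for n writers. A value <= 0 disables the feature
 * instead of reaching the allocator: converting a negative int to
 * size_t would otherwise request an enormous block.
 * Returns 0 on success or when disabled, -1 on allocation failure.
 */
int alloc_fake_writers(int n, struct fake_writer **out)
{
	*out = NULL;
	if (n <= 0)
		return 0;	/* feature disabled, nothing allocated */
	*out = calloc((size_t)n, sizeof(**out));
	return *out ? 0 : -1;
}
```

A caller treats a NULL result with a zero return value as "feature off" rather than as an error.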
2015-07-15  rcu: Simplify arithmetic to calculate number of RCU nodes  (Alexander Gordeev)
This update makes the arithmetic that calculates the number of RCU nodes more straightforward and easier to read. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Limit count of static data to the number of RCU levels  (Alexander Gordeev)
Although the number of RCU levels may be less than the current maximum of four, some static data associated with each level are allocated for all four levels. As a result, the extra data never get accessed and just waste memory. This update limits the count of allocated items to the number of used RCU levels. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Remove unnecessary fields from rcu_state structure  (Alexander Gordeev)
Members rcu_state::levelcnt[] and rcu_state::levelspread[] are only used at init. There is no reason to keep them afterwards. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Limit rcu_capacity[] size to RCU_NUM_LVLS items  (Alexander Gordeev)
The number of items in the rcu_capacity[] array is defined by the macro MAX_RCU_LVLS. However, that array is never accessed beyond the RCU_NUM_LVLS index. Therefore, we can limit the array to RCU_NUM_LVLS items and eliminate MAX_RCU_LVLS. As a result, memory is conserved in most cases. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Limit rcu_state::levelcnt[] to RCU_NUM_LVLS items  (Alexander Gordeev)
Variable rcu_num_lvls is limited by the RCU_NUM_LVLS macro. In turn, the rcu_state::levelcnt[] array is never accessed beyond rcu_num_lvls. Thus, rcu_state::levelcnt[] is safe to limit to RCU_NUM_LVLS items. Since rcu_num_lvls could be changed during boot (as a result of a rcutree.rcu_fanout_leaf kernel parameter update), one might assume a new value could exceed RCU_NUM_LVLS. However, that is not the case, since leaf-level fanout is only permitted to increase, so rcu_num_lvls can only decrease. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Simplify rcu_init_geometry() capacity arithmetics  (Alexander Gordeev)
Current code suggests that introducing the extra level to rcu_capacity[] array makes some of the arithmetic easier. Well, in fact it appears rather confusing and unnecessary. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Cleanup rcu_init_geometry() code and arithmetics  (Alexander Gordeev)
This update simplifies the rcu_init_geometry() code flow and makes the calculation of the total number of rcu_node structures easier to read. The update relies on the fact that num_rcu_lvl[] is never accessed beyond the rcu_num_lvls index by the rest of the code. Therefore, there is no need to initialize the whole num_rcu_lvl[]. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Remove superfluous local variable in rcu_init_geometry()  (Alexander Gordeev)
Local variable 'n' mimics 'nr_cpu_ids' while both are used within one function. There is no reason for 'n' to exist whatsoever. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Panic if RCU tree can not accommodate all CPUs  (Alexander Gordeev)
Currently, the condition where the RCU tree is unable to accommodate the configured number of CPUs is not permitted and causes a fall back to compile-time values. However, the code has no means to exceed the RCU tree capacity either at compile time or at run time. Therefore, if the condition is met at run time, it indicates a serious problem elsewhere and should be handled with a panic. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
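The rcu_init_geometry() commits above all revolve around one piece of arithmetic: how many tree levels are needed so that the leaves cover every CPU. The sketch below illustrates that arithmetic under assumptions of mine; `rcu_levels_needed()` and `SKETCH_MAX_LVLS` are hypothetical names, and the real kernel code is organized differently.

```c
#include <assert.h>

#define SKETCH_MAX_LVLS 4	/* "current maximum of four" per the log */

/*
 * Return the number of tree levels needed so that the leaves cover
 * nr_cpus, or -1 if even SKETCH_MAX_LVLS levels are too few (the
 * condition that the last commit above turns into a panic).
 */
int rcu_levels_needed(long nr_cpus, long fanout_leaf, long fanout)
{
	long capacity = fanout_leaf;	/* CPUs covered by a one-level tree */
	int lvl;

	for (lvl = 1; lvl <= SKETCH_MAX_LVLS; lvl++) {
		if (nr_cpus <= capacity)
			return lvl;
		capacity *= fanout;	/* each extra level multiplies coverage */
	}
	return -1;
}
```

With a leaf fanout of 16 and an inner fanout of 64, 1025 CPUs need three levels; raising the leaf fanout to 32 drops that to two, which matches the observation that a larger leaf fanout can only decrease the level count.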
2015-07-15  rcu: Provide more diagnostics for stalled GP kthread  (Paul E. McKenney)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  rcu: Change return type to bool  (Nicholas Mc Guire)
Type-checking coccinelle spatches are being used to locate type mismatches between function signatures and return values. In this case they produced: ./kernel/rcu/srcu.c:271 WARNING: return of wrong type int != unsigned long. srcu_readers_active() returns an int that is the sum of per-CPU unsigned long counters, but its only user is cleanup_srcu_struct(), which uses it as a boolean condition to see whether there are any readers rather than using the approximate number of readers. The theoretically possible unsigned long overflow case does not need to be handled explicitly: if we had 4G++ readers then something else went wrong a long time ago. Proposal: change the return type to boolean. The function name is left unchanged as it fits the naming expectation for a boolean. The patch was compile tested for x86_64_defconfig (implies CONFIG_SRCU=y) and is against 4.1-rc5 (localversion-next is -next-20150525). Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
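The type mismatch described above can be illustrated in user space. This is a sketch under assumptions, not the SRCU implementation: `readers_active()` is a hypothetical function, and a plain array stands in for the per-CPU counters.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sum per-CPU counters and report only whether any reader is active.
 * Returning bool means a nonzero unsigned long sum is never silently
 * truncated the way an int return value could be.
 */
bool readers_active(const unsigned long *per_cpu_count, size_t nr_cpus)
{
	unsigned long sum = 0;
	size_t cpu;

	/* Unsigned addition wraps, so a +1 on one CPU and a -1 on
	 * another (stored as ULONG_MAX) cancel out to zero. */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		sum += per_cpu_count[cpu];
	return sum != 0;
}
```

The conversion to bool happens once, at the return statement, so the caller can use the result directly as a condition.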
2015-07-15  rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC  (Denys Vlasenko)
DEBUG_LOCK_ALLOC=y is not a production setting, but it is not very unusual either. Many developers routinely use kernels built with it enabled. Apart from being selected by hand, it is also auto-selected by the PROVE_LOCKING "Lock debugging: prove locking correctness" and LOCK_STAT "Lock usage statistics" config options. LOCK_STAT is necessary for "perf lock" to work. I wouldn't spend too much time optimizing it, but this particular function has a very large cost in code size: when it is deinlined, code size decreases by 830,000 bytes:
    text     data      bss       dec     hex filename
85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
84837612 22294424 20627456 127759492 79d7484 vmlinux
(with this config: http://busybox.net/~vda/kernel_config) Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Lai Jiangshan <laijs@cn.fujitsu.com> CC: Tejun Heo <tj@kernel.org> CC: Oleg Nesterov <oleg@redhat.com> CC: linux-kernel@vger.kernel.org Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15  seccomp: swap hard-coded zeros to defined name  (Kees Cook)
For clarity, if CONFIG_SECCOMP isn't defined, seccomp_mode() is returning "disabled". This makes that more clear, along with another 0-use, and results in no operational change. Signed-off-by: Kees Cook <keescook@chromium.org>
2015-07-15  seccomp: add ptrace options for suspend/resume  (Tycho Andersen)
This patch is the first step in enabling checkpoint/restore of processes with seccomp enabled. One of the things CRIU does while dumping tasks is inject code into them via ptrace to collect information that is only available to the process itself. However, if we are in a seccomp mode where these processes are prohibited from making these syscalls, then what CRIU does kills the task. This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp filters to disable (and re-enable) seccomp filters for another task so that they can be successfully dumped (and restored). We restrict the set of processes that can disable seccomp through ptrace because although today ptrace can be used to bypass seccomp, there is some discussion of closing this loophole in the future and we would like this patch to not depend on that behavior and be future proofed for when it is removed. Note that seccomp can be suspended before any filters are actually installed; this behavior is useful on criu restore, so that we can suspend seccomp, restore the filters, unmap our restore code from the restored process' address space, and then resume the task by detaching and have the filters resumed as well.
v2 changes:
* require that the tracer have no seccomp filters installed
* drop TIF_NOTSC manipulation from the patch
* change from ptrace command to a ptrace option and use this ptrace option as the flag to check. This means that as soon as the tracer detaches/dies, seccomp is re-enabled and, as a corollary, that one can not disable seccomp across PTRACE_ATTACHs.
v3 changes:
* get rid of various #ifdefs everywhere
* report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly used
v4 changes:
* get rid of may_suspend_seccomp() in favor of a capable() check in ptrace directly
v5 changes:
* check that seccomp is not enabled (or suspended) on the tracer
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com> CC: Will Drewry <wad@chromium.org> CC: Roland McGrath <roland@hack.frob.com> CC: Pavel Emelyanov <xemul@parallels.com> CC: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> [kees: access seccomp.mode through seccomp_mode() instead] Signed-off-by: Kees Cook <keescook@chromium.org>
2015-07-15  seccomp: Replace smp_read_barrier_depends() with lockless_dereference()  (Pranith Kumar)
Recently lockless_dereference() was added which can be used in place of hard-coding smp_read_barrier_depends(). The following PATCH makes the change. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2015-07-15  Merge tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace  (Linus Torvalds)
Pull tracing fix from Steven Rostedt: "Fengguang Wu discovered a crash that happened to be because of the branch tracer (traces unlikely and likely branches) when enabled with certain debug options. What happened was that various debug options like lockdep and DEBUG_PREEMPT can cause parts of the branch tracer to recurse outside its recursion protection. In fact, part of its recursion protection used these features that caused the lockup. This cleans up the code a little and makes the recursion protection a bit more robust" * tag 'trace-v4.2-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have branch tracer use recursive field of task struct
2015-07-15  genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now  (Thomas Gleixner)
Boris reported that the sparse_irq protection around __cpu_up() in the generic code causes a regression on Xen. Xen allocates interrupts and some more in the xen_cpu_up() function, so it deadlocks on the sparse_irq_lock. There is no simple fix for this and we really should have the protection for all architectures, but for now the only solution is to move it to x86 where actual wreckage due to the lack of protection has been observed. Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down' Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: xen-devel <xen-devel@lists.xenproject.org>
2015-07-14  cgroup: implement the PIDs subsystem  (Aleksa Sarai)
Adds a new single-purpose PIDs subsystem to limit the number of tasks that can be forked inside a cgroup. Essentially this is an implementation of RLIMIT_NPROC that applies to a cgroup rather than a process tree. However, it should be noted that organisational operations (adding and removing tasks from a PIDs hierarchy) will *not* be prevented. Rather, the number of tasks in the hierarchy cannot exceed the limit through forking. This is due to the fact that, in the unified hierarchy, attach cannot fail (and it is not possible for a task to overcome its PIDs cgroup policy limit by attaching to a child cgroup -- even if migrating mid-fork it must be able to fork in the parent first). PIDs are fundamentally a global resource, and it is possible to reach PID exhaustion inside a cgroup without hitting any reasonable kmemcg policy. Once you've hit PID exhaustion, you're only in a marginally better state than OOM. This subsystem allows PID exhaustion inside a cgroup to be prevented. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Tejun Heo <tj@kernel.org>
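The limit described above applies to a whole subtree: a fork is charged to the task's cgroup and to every ancestor, and it fails if any of them would exceed its limit. The following single-threaded sketch shows that hierarchical charge with rollback. It is an illustration under assumptions, not the subsystem's code: `struct pids_node`, `pids_try_charge()` and `pids_uncharge()` are hypothetical names, and the locking or atomics that a real implementation needs are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node in a cgroup-like hierarchy. */
struct pids_node {
	struct pids_node *parent;
	long count;	/* tasks currently charged here and below */
	long limit;	/* maximum allowed; negative means unlimited */
};

/* Charge one task to node and all ancestors; roll back on failure. */
int pids_try_charge(struct pids_node *node)
{
	struct pids_node *p, *q;

	for (p = node; p; p = p->parent) {
		if (p->limit >= 0 && p->count + 1 > p->limit)
			goto revert;
		p->count++;
	}
	return 0;

revert:
	/* Undo only the levels charged before the failing one. */
	for (q = node; q != p; q = q->parent)
		q->count--;
	return -1;	/* the fork would be rejected */
}

/* Uncharge after a task exits or a fork is cancelled. */
void pids_uncharge(struct pids_node *node)
{
	for (; node; node = node->parent)
		node->count--;
}
```

Because the charge walks up to the root, a child with no limit of its own is still bounded by the tightest limit among its ancestors.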
2015-07-14  cgroup: allow a cgroup subsystem to reject a fork  (Aleksa Sarai)
Add a new cgroup subsystem callback can_fork that conditionally states whether or not the fork is accepted or rejected by a cgroup policy. In addition, add a cancel_fork callback so that if an error occurs later in the forking process, any state modified by can_fork can be reverted. Allow for a private opaque pointer to be passed from cgroup_can_fork to cgroup_post_fork, allowing for the fork state to be stored by each subsystem separately. Also add a tagging system for cgroup_subsys.h to allow for CGROUP_<TAG> enumerations to be defined and used. In addition, explicitly add a CGROUP_CANFORK_COUNT macro to make arrays easier to define. This is in preparation for implementing the pids cgroup subsystem. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Tejun Heo <tj@kernel.org>
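The can_fork/cancel_fork pairing described above is a check-then-undo pattern: every subsystem gets a chance to veto, and on a veto the subsystems that already accepted are asked to revert their state. The sketch below shows the pattern only. All names (`struct fork_subsys`, `can_fork_all()`, the demo callbacks) are hypothetical, and the real callbacks take different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table; the kernel's signatures differ. */
struct fork_subsys {
	int (*can_fork)(void **priv);	/* 0 to accept, nonzero to reject */
	void (*cancel_fork)(void *priv);	/* undo what can_fork did */
};

/* Ask each subsystem in turn; on a rejection, undo the earlier ones. */
int can_fork_all(const struct fork_subsys *ss, size_t n, void **priv)
{
	size_t i, j;
	int ret = 0;

	for (i = 0; i < n; i++) {
		if (!ss[i].can_fork)
			continue;
		ret = ss[i].can_fork(&priv[i]);
		if (ret)
			goto undo;
	}
	return 0;

undo:
	for (j = 0; j < i; j++)
		if (ss[j].cancel_fork)
			ss[j].cancel_fork(priv[j]);
	return ret;
}

/* Demo subsystems: one charges a counter, one always rejects. */
int demo_charged;

static int demo_accept(void **priv)
{
	demo_charged++;
	*priv = &demo_charged;	/* opaque state handed to cancel_fork */
	return 0;
}

static void demo_cancel(void *priv)
{
	(*(int *)priv)--;
}

static int demo_reject(void **priv)
{
	(void)priv;
	return -1;
}

const struct fork_subsys demo_subsys[2] = {
	{ demo_accept, demo_cancel },
	{ demo_reject, NULL },
};
```

The per-subsystem `priv` slot plays the role of the private opaque pointer mentioned in the commit message: whatever can_fork stored is handed back to cancel_fork.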
2015-07-14  livepatch: Improve error handling in klp_disable_func()  (Minfei Huang)
In case of func->state or func->old_addr not having expected values, we'd rather bail out immediately from klp_disable_func(). This can't really happen with the current codebase, but fix this anyway for the sake of robustness. [jkosina@suse.com: reworded the changelog a bit] Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2015-07-14  PM / autosleep: Use workqueue for user space wakeup sources garbage collector  (SungEun Kim)
The synchronous synchronize_rcu() in wakeup_source_remove() sometimes blocks a user process that writes to /sys/kernel/wake_unlock. For example, when Android eventhub tries to release a wakelock, this blocking can occur, and eventhub can't get input events for a while. Using a work item instead of a direct function call in pm_wake_unlock() prevents this unnecessary delay. Signed-off-by: SungEun Kim <cleaneye.kim@lge.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-14  tick: Move the export of tick_broadcast_oneshot_control to the proper place  (Thomas Gleixner)
tick_broadcast_oneshot_control got moved from tick-broadcast to tick-common, but the export stayed in the old place. Fix it up. Fixes: f32dd1170511 'tick/broadcast: Make idle check independent from mode and config' Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-13  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Conflicts: net/bridge/br_mdb.c Minor conflict in br_mdb.c, in 'net' we added a memset of the on-stack 'ip' variable whereas in 'net-next' we assign a new member 'vid'. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-13  ebpf: remove self-assignment in interpreter's tail call  (Daniel Borkmann)
ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1] and thus has no effect. Add a comment instead, explaining what happens and why it's okay to just remove it. Since from user space side, a tail call is invoked as a pseudo helper function via bpf_tail_call_proto, the verifier checks the arguments just like with any other helper function and makes sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-12  Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull timer fixes from Thomas Gleixner: "This update from the timer department contains:
- A series of patches which address a shortcoming in the tick broadcast code. If the broadcast device is not available or an hrtimer emulated broadcast device, some of the original assumptions lead to boot failures. I rather plugged all of the corner cases instead of only addressing the issue reported, so the change got a little larger. Has been extensively tested on x86 and arm.
- Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
- A regression fix for the imx clocksource driver
- An update to the new state callbacks mechanism for clockevents. This is required to simplify the conversion, which will take place in 4.3"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/broadcast: Prevent NULL pointer dereference
  time: Get rid of do_posix_clock_monotonic_gettime
  cris: Replace do_posix_clock_monotonic_gettime()
  tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
  tick/broadcast: Handle spurious interrupts gracefully
  tick/broadcast: Check for hrtimer broadcast active early
  tick/broadcast: Return busy when IPI is pending
  tick/broadcast: Return busy if periodic mode and hrtimer broadcast
  tick/broadcast: Move the check for periodic mode inside state handling
  tick/broadcast: Prevent deep idle if no broadcast device available
  tick/broadcast: Make idle check independent from mode and config
  tick/broadcast: Sanity check the shutdown of the local clock_event
  tick/broadcast: Prevent hrtimer recursion
  clockevents: Allow set-state callbacks to be optional
  clocksource/imx: Define clocksource for mx27
2015-07-12  Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull irq fix from Thomas Gleixner: "A single fix for a cpu hotplug race vs. interrupt descriptors: Prevent irq setup/teardown across the cpu starting/dying parts of cpu hotplug so that the starting/dying cpu has a stable view of the descriptor space. This has been an issue for all architectures in the cpu dying phase, where interrupts are migrated away from the dying cpu. In the starting phase it's mostly an x86 issue vs the vector space update" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hotplug: Prevent alloc/free of irq descriptors during cpu up/down
2015-07-11  genirq: Remove the irq argument from setup_affinity()  (Jiang Liu)
Unused except for the alpha wrapper, which can retrieve it from the irq descriptor. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1433391238-19471-21-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Provide and use __irq_can_set_affinity()  (Jiang Liu)
Provide an irq_desc based variant of irq_can_set_affinity() to avoid a redundant lookup for the core code users. [ tglx: Split out from combo patch ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove the irq argument from note_interrupt()  (Jiang Liu)
Only required for the slow path. Retrieve it from irq descriptor if necessary. [ tglx: Split out from combo patch. Left [try_]misrouted_irq() untouched as there is no win in the slow path ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/1433391238-19471-19-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove irq argument from try_one_irq()  (Jiang Liu)
Unused argument. [ tglx: Split out from combo patch ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove irq argument from report_bad_irq()  (Jiang Liu)
Not really a hotpath, so __report_bad_irq() can retrieve the irq number from the irq descriptor. [ tglx: Split out from combo patch ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove irq argument from suspend/resume_irq()  (Jiang Liu)
Unused argument in both functions. [ tglx: Split out from combo patch ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove irq argument from __enable/__disable_irq()  (Jiang Liu)
Solely used for debug output. Can be retrieved from irq descriptor if necessary. [ tglx: Split out from combo patch ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove irq arg from __irq_set_trigger()  (Jiang Liu)
It's only required for debug output and can be retrieved from the irq descriptor if necessary. [ tglx: Split out from combo patch ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove the irq argument from check_irq_resend()  (Jiang Liu)
It's only used in the software resend case and can be retrieved from irq_desc if necessary. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1433391238-19471-18-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  genirq: Remove the parameter 'irq' of kstat_incr_irqs_this_cpu()  (Jiang Liu)
The first parameter 'irq' is never used by kstat_incr_irqs_this_cpu(). Remove it. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1433391238-19471-16-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-11  tick/broadcast: Prevent NULL pointer dereference  (Thomas Gleixner)
Dan reported that the recent changes to the broadcast code introduced a potential NULL dereference. Add the proper check. Fixes: e0454311903d "tick/broadcast: Sanity check the shutdown of the local clock_event" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-10  vfs: Commit to never having exectuables on proc and sysfs.  (Eric W. Biederman)
Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no executable files on proc or sysfs. Having any executable files show up on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that executables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incorporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding executable files or changes to chattr that would modify existing files, as no matter what the individual files say they will not be treated as executable files by the vfs. Not having to vet changes too closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-09  module: Fix load_module() error path  (Peter Zijlstra)
The load_module() error path frees a module but forgot to take it out of the mod_tree, leaving a dangling entry in the tree, causing havoc. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Tested-by: Arthur Marsh <arthur.marsh@internode.on.net> Fixes: 93c2e105f6bc ("module: Optimize __module_address() using a latched RB-tree") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-07-08  Fix broken audit tests for exec arg len  (Linus Torvalds)
The "fix" in commit 0b08c5e5944 ("audit: Fix check of return value of strnlen_user()") didn't fix anything, it broke things. As reported by Steven Rostedt: "Yes, strnlen_user() returns 0 on fault, but if you look at what len is set to, then you would notice that on fault len would be -1" because we just subtracted one from the return value. So testing against 0 doesn't test for a fault condition, it tests against a perfectly valid empty string. Also fix up the usual braindamage wrt using WARN_ON() inside a conditional - make it part of the conditional and remove the explicit unlikely() (which is already part of the WARN_ON*() logic, exactly so that you don't have to write unreadable code). Reported-and-tested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Paul Moore <pmoore@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
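The off-by-one described above is easy to reproduce in user space. In this sketch, `fake_strnlen_user()` is a hypothetical stand-in that follows the convention stated in the message (length including the terminating NUL, or 0 on a fault), with a NULL pointer modelling the fault. `arg_len()` is likewise a name made up for illustration, not the audit code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for strnlen_user(): returns the string length INCLUDING the
 * terminating NUL, or 0 on a fault. A NULL pointer models the fault.
 */
static long fake_strnlen_user(const char *s, long max)
{
	long n = 0;

	if (!s)
		return 0;
	while (n < max && s[n])
		n++;
	return n + 1;
}

/* Returns the argument length without the NUL, or -1 on a fault. */
long arg_len(const char *s, long max)
{
	long len = fake_strnlen_user(s, max) - 1;

	/* A fault yields 0 - 1 == -1; an empty string yields a valid 0,
	 * so the fault test must be "len < 0", not "len == 0". */
	if (len < 0)
		return -1;
	return len;
}
```

Testing `len == 0` here would reject the empty string and let the fault case through, which is the breakage the commit describes.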
2015-07-08  tracing: Have branch tracer use recursive field of task struct  (Steven Rostedt (Red Hat))
Fengguang Wu's tests triggered a bug in the branch tracer's start up test when CONFIG_DEBUG_PREEMPT is set. This was because that config adds some debug logic in the per cpu field, which calls back into the branch tracer. The branch tracer has its own recursive checks, but uses a per cpu variable to implement it. If retrieving the per cpu variable calls back into the branch tracer, you can see how things will break. Instead of using a per cpu variable, use the trace_recursion field of the current task struct. Simply set a bit when entering the branch tracing and clear it when leaving. If the bit is set on entry, just don't do the tracing. There's also the case with lockdep, as the local_irq_save() called before the recursion can also trigger code that can call back into the function. Changing that to a raw_local_irq_save() will protect that as well. This prevents the recursion and the inevitable crash that follows. Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com Cc: stable@vger.kernel.org # 3.10+ Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
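The set-a-bit-on-entry scheme described above can be sketched in a few lines. This is a user-space illustration with assumed names: `struct task` stands in for the task struct, the bit position of `BRANCH_TRACE_BIT` is arbitrary, and `record_event()` models debug code that itself runs into a traced branch.

```c
#include <assert.h>

/* Hypothetical per-task state; the commit uses the task's trace_recursion. */
struct task {
	unsigned long trace_recursion;
};

#define BRANCH_TRACE_BIT (1UL << 0)	/* bit position is an assumption */

int events_recorded;

void trace_branch(struct task *t, int depth);

/* Models work done while tracing that hits another traced branch. */
static void record_event(struct task *t, int depth)
{
	events_recorded++;
	if (depth > 0)
		trace_branch(t, depth - 1);	/* re-entry attempt */
}

void trace_branch(struct task *t, int depth)
{
	/* Already inside the tracer for this task: do nothing. */
	if (t->trace_recursion & BRANCH_TRACE_BIT)
		return;

	t->trace_recursion |= BRANCH_TRACE_BIT;
	record_event(t, depth);
	t->trace_recursion &= ~BRANCH_TRACE_BIT;
}
```

Because the guard lives in the task's own state, reading it does not go through any helper that could itself be traced, which is the weakness the per-CPU variable had.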
2015-07-08  hotplug: Prevent alloc/free of irq descriptors during cpu up/down  (Thomas Gleixner)
When a cpu goes up some architectures (e.g. x86) have to walk the irq space to set up the vector space for the cpu. While this needs extra protection at the architecture level we can avoid a few race conditions by preventing the concurrent allocation/free of irq descriptors and the associated data. When a cpu goes down it moves the interrupts which are targeted to this cpu away by reassigning the affinities. While this happens interrupts can be allocated and freed, which opens a can of race conditions in the code which reassigns the affinities because interrupt descriptors might be freed underneath. Example:
CPU1                            CPU2
cpu_up/down
 irq_desc = irq_to_desc(irq);
                                remove_from_radix_tree(desc);
 raw_spin_lock(&desc->lock);
                                free(desc);
We could protect the irq descriptors with RCU, but that would require a full tree change of all accesses to interrupt descriptors. But fortunately these kinds of race conditions are rather limited to a few things like cpu hotplug. The normal setup/teardown is very well serialized. So the simpler and obvious solution is: Prevent allocation and freeing of interrupt descriptors across cpu hotplug. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
2015-07-07  tick/broadcast: Handle spurious interrupts gracefully  (Thomas Gleixner)
Andriy reported that on a virtual machine the warning about negative expiry time in the clock events programming code triggered:
hpet: hpet0 irq 40 for MSI
hpet: hpet1 irq 41 for MSI
Switching to clocksource hpet
WARNING: at kernel/time/clockevents.c:239
[<ffffffff810ce6eb>] clockevents_program_event+0xdb/0xf0
[<ffffffff810cf211>] tick_handle_periodic_broadcast+0x41/0x50
[<ffffffff81016525>] timer_interrupt+0x15/0x20
When the second hpet is installed as a per cpu timer the broadcast event is no longer required and is stopped, which sets the next_evt of the broadcast device to KTIME_MAX. If after that a spurious interrupt happens on the broadcast device, then the current code blindly handles it and tries to reprogram the broadcast device afterwards, which adds the period to next_evt. KTIME_MAX + period results in a negative expiry value causing the WARN_ON in the clockevents code to trigger. Add a proper check for the state of the broadcast device into the interrupt handler and return if the interrupt is spurious. [ Folded in pointer fix from Sudeep ] Reported-by: Andriy Gapon <avg@FreeBSD.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20150705205221.802094647@linutronix.de
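The failure mode above is a signed 64-bit overflow: adding any positive period to the largest representable expiry cannot produce a valid later time. The sketch below models the guard in user space under assumptions of mine. `struct bc_dev`, `enum bc_state`, `bc_handle_periodic()` and `add_would_overflow()` are hypothetical names, and `INT64_MAX` stands in for KTIME_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_KTIME_MAX INT64_MAX	/* assumed model of KTIME_MAX */

enum bc_state { BC_SHUTDOWN, BC_PERIODIC };

/* Hypothetical broadcast device: state plus next expiry in ns. */
struct bc_dev {
	enum bc_state state;
	int64_t next_evt;
};

/* True if next_evt + period would exceed the signed 64-bit range. */
bool add_would_overflow(int64_t next_evt, int64_t period)
{
	return period > 0 && next_evt > INT64_MAX - period;
}

/*
 * Handle a periodic broadcast interrupt. Returns false and leaves the
 * device untouched when the device is shut down, i.e. the interrupt
 * is spurious; otherwise advances the expiry by one period.
 */
bool bc_handle_periodic(struct bc_dev *dev, int64_t period)
{
	if (dev->state == BC_SHUTDOWN)
		return false;
	dev->next_evt += period;
	return true;
}
```

Checking the device state first means the addition is never attempted on a stopped device whose expiry sits at the maximum value.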
2015-07-07  tick/broadcast: Check for hrtimer broadcast active early  (Thomas Gleixner)
If the current cpu is the one which has the hrtimer based broadcast queued then we better return busy immediately instead of going through loops and hoops to figure that out. [ Split out from a larger combo patch ] Tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Suzuki Poulose <Suzuki.Poulose@arm.com> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com> Cc: Catalin Marinas <Catalin.Marinas@arm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos