path: root/kernel
Age | Commit message | Author
2017-07-26 | device property: export irqchip_fwnode_ops | Arnd Bergmann
The newly added irqchip_fwnode_ops structure is not exported, which can lead to link errors: ERROR: "irqchip_fwnode_ops" [drivers/gpio/gpio-xgene-sb.ko] undefined! I checked that all other such symbols that were introduced are exported if they need to be, this is the only missing one. Fixes: db3e50f3234b (device property: Get rid of struct fwnode_handle type field) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-26 | cpufreq: schedutil: Set dynamic_switching to true | Viresh Kumar
Set dynamic_switching to 'true' to disallow use of schedutil governor for platforms with transition_latency set to CPUFREQ_ETERNAL, as they may not want to do automatic dynamic frequency switching. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-25 | rcu: Move callback-list warning to irq-disable region | Paul E. McKenney
After adopting callbacks from a newly offlined CPU, the adopting CPU checks to make sure that its callback list's count is zero only if the list has no callbacks and vice versa. Unfortunately, it does so after enabling interrupts, which means that false positives are possible due to interrupt handlers invoking call_rcu(). Although these false positives are improbable, rcutorture did make it happen once. This commit therefore moves this check to an irq-disabled region of code, thus suppressing the false positive. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Remove unused RCU list functions | Paul E. McKenney
Given changes to callback migration, rcu_cblist_head(), rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(), rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are no longer used. This commit therefore removes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Localize rcu_state ->orphan_pend and ->orphan_done | Paul E. McKenney
Given that the rcu_state structure's ->orphan_pend and ->orphan_done fields are used only during migration of callbacks from the recently offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() are combined, these fields can become local variables in the combined function. This commit therefore combines rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new rcu_segcblist_merge() function and removes the ->orphan_pend and ->orphan_done fields. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Advance callbacks after migration | Paul E. McKenney
When migrating callbacks from a newly offlined CPU, we are already holding the root rcu_node structure's lock, so it costs almost nothing to advance and accelerate the newly migrated callbacks. This patch therefore makes this advancing and acceleration happen. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Eliminate rcu_state ->orphan_lock | Paul E. McKenney
The ->orphan_lock is acquired and released only within the rcu_migrate_callbacks() function, which now acquires the root rcu_node structure's ->lock. This commit therefore eliminates the ->orphan_lock in favor of the root rcu_node structure's ->lock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Advance outgoing CPU's callbacks before migrating them | Paul E. McKenney
It is possible that the outgoing CPU is unaware of recent grace periods, and so it is also possible that some of its pending callbacks are actually ready to be invoked. The current callback-migration code would needlessly force these callbacks to pass through another grace period. This commit therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in order to give them full credit for having passed through any recent grace periods. This also fixes an odd theoretical bug where there are no callbacks in the system except for those on the outgoing CPU, none of those callbacks have yet been associated with a grace-period number, there is never again another callback registered, and the surviving CPU never again takes a scheduling-clock interrupt, never goes idle, and never enters nohz_full userspace execution. Yes, this is (just barely) possible. It requires that the surviving CPU be a nohz_full CPU, that its scheduler-clock interrupt be shut off, and that it loop forever in the kernel. You get bonus points if you can make this one happen! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU | Paul E. McKenney
RCU's CPU-hotplug callback-migration code first moves the outgoing CPU's callbacks to ->orphan_done and ->orphan_pend, and only then moves them to the NOCB callback list. This commit avoids the extra step (and simplifies the code) by moving the callbacks directly from the outgoing CPU's callback list to the NOCB callback list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Check for NOCB CPUs and empty lists earlier in CB migration | Paul E. McKenney
The current CPU-hotplug RCU-callback-migration code checks for the source (newly offlined) CPU being a NOCBs CPU down in rcu_send_cbs_to_orphanage(). This commit simplifies callback migration a bit by moving this check up to rcu_migrate_callbacks(). This commit also adds a check for the source CPU having no callbacks, which eases analysis of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Remove orphan/adopt event-tracing fields | Paul E. McKenney
The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields are updated, but never read. This commit therefore removes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | torture: Fix typo suppressing CPU-hotplug statistics | Paul E. McKenney
The torture status line contains a series of values preceded by "onoff:". The last value in that line, the one preceding the "HZ=" string, is always zero. The reason that it is always zero is that torture_offline() was incrementing the sum_offl pointer instead of the value that this pointer referenced. This commit therefore makes this increment operate on the statistic rather than the pointer to the statistic. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
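The class of bug described above can be shown in a few lines of plain C. This is only a sketch: `record_offline_buggy()` and `record_offline_fixed()` are hypothetical names, not the actual torture_offline() code, and the increment amount is simplified to one.

```c
/*
 * Sketch only (hypothetical helpers, not the kernel's torture_offline()).
 * sum_offl points at a statistics counter owned by the caller.
 */
static void record_offline_buggy(unsigned long *sum_offl)
{
	sum_offl++;		/* advances the local pointer; the counter is untouched */
	(void)sum_offl;		/* silence "set but not used" warnings */
}

static void record_offline_fixed(unsigned long *sum_offl)
{
	(*sum_offl)++;		/* increments the statistic the pointer refers to */
}
```

Calling the buggy variant on a counter that starts at zero leaves it at zero, which matches the always-zero value seen in the status line; the fixed variant updates the counter itself.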
2017-07-25 | rcu: Make expedited GPs correctly handle hardware CPU insertion | Paul E. McKenney
The update of the ->expmaskinitnext and of ->ncpus are unsynchronized, with the value of ->ncpus being incremented long before the corresponding ->expmaskinitnext mask is updated. If an RCU expedited grace period sees ->ncpus change, it will update the ->expmaskinit masks from the new ->expmaskinitnext masks. But it is possible that ->ncpus has already been updated, but the ->expmaskinitnext masks still have their old values. For the current expedited grace period, no harm done. The CPU could not have been online before the grace period started, so there is no need to wait for its non-existent pre-existing readers. But the next RCU expedited grace period is in a world of hurt. The value of ->ncpus has already been updated, so this grace period will assume that the ->expmaskinitnext masks have not changed. But they have, and they won't be taken into account until the next never-been-online CPU comes online. This means that RCU will be ignoring some CPUs that it should be paying attention to. The solution is to update ->ncpus and ->expmaskinitnext while holding the ->lock for the rcu_node structure containing the ->expmaskinitnext mask. Because smp_store_release() is now used to update ->ncpus and smp_load_acquire() is now used to locklessly read it, if the expedited grace period sees ->ncpus change, then the updating CPU has to already be holding the corresponding ->lock. Therefore, when the expedited grace period later acquires that ->lock, it is guaranteed to see the new value of ->expmaskinitnext. On the other hand, if the expedited grace period loads ->ncpus just before an update, earlier full memory barriers guarantee that the incoming CPU isn't far enough along to be running any RCU readers. This commit therefore makes the required change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
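The ordering argument above can be approximated in userspace with C11 atomics and a pthread mutex. This is a rough sketch under stated assumptions: `node_sketch`, `ncpus_sketch`, `cpu_first_online()` and `refresh_mask()` are hypothetical stand-ins, the release store and acquire load play the roles of smp_store_release() and smp_load_acquire(), and the mutex plays the role of the rcu_node ->lock.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the rcu_node fields named in the commit. */
struct node_sketch {
	pthread_mutex_t lock;		/* plays the role of rcu_node ->lock */
	unsigned long expmaskinitnext;	/* protected by lock */
};

static atomic_int ncpus_sketch;		/* plays the role of ->ncpus */

/* Incoming-CPU side: update the mask and the count while holding the lock. */
static void cpu_first_online(struct node_sketch *np, unsigned long bit)
{
	int n;

	pthread_mutex_lock(&np->lock);
	np->expmaskinitnext |= bit;
	n = atomic_load_explicit(&ncpus_sketch, memory_order_relaxed);
	/* Release store: a reader that sees the new count knows the lock is held. */
	atomic_store_explicit(&ncpus_sketch, n + 1, memory_order_release);
	pthread_mutex_unlock(&np->lock);
}

/*
 * Expedited-grace-period side: lockless acquire load of the count. If it
 * changed, take the lock and copy the mask. Returns 1 if the mask was refreshed.
 */
static int refresh_mask(struct node_sketch *np, int *snap,
			unsigned long *expmaskinit)
{
	int n = atomic_load_explicit(&ncpus_sketch, memory_order_acquire);

	if (n == *snap)
		return 0;
	*snap = n;
	pthread_mutex_lock(&np->lock);	/* waits for the updater to finish */
	*expmaskinit = np->expmaskinitnext;
	pthread_mutex_unlock(&np->lock);
	return 1;
}
```

The design point is that the count is only ever published from inside the locked region, so a reader that observes the new count and then acquires the same lock cannot see a stale mask.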
2017-07-25 | rcu: Migrate callbacks earlier in the CPU-offline timeline | Paul E. McKenney
RCU callbacks must be migrated away from an outgoing CPU, and this is done near the end of the CPU-hotplug operation, after the outgoing CPU is long gone. Unfortunately, this means that other CPU-hotplug callbacks can execute while the outgoing CPU's callbacks are still immobilized on the long-gone CPU's callback lists. If any of these CPU-hotplug callbacks must wait, either directly or indirectly, for the invocation of any of the immobilized RCU callbacks, the system will hang. This commit avoids such hangs by migrating the callbacks away from the outgoing CPU immediately upon its departure, shortly after the return from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these callbacks and invoke them, which allows all the after-the-fact CPU-hotplug callbacks to wait on these RCU callbacks without risk of a hang. While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including dead code on the one hand and to avoid define-without-use warnings on the other hand. Reported-by: Jeffrey Hugo <jhugo@codeaurora.org> Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Richard Weinberger <richard@nod.at>
2017-07-25 | workqueue: implicit ordered attribute should be overridable | Tejun Heo
5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") automatically enabled ordered attribute for unbound workqueues w/ max_active == 1. Because ordered workqueues reject max_active and some attribute changes, this implicit ordered mode broke cases where the user creates an unbound workqueue w/ max_active == 1 and later explicitly changes the related attributes. This patch distinguishes explicit and implicit ordered settings and lets attribute changes override the ordered setting if it was implicit. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
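One way to picture the explicit/implicit distinction is with a pair of flags. The sketch below is an illustration only: the `*_SKETCH` flag names, `wq_sketch`, and both helpers are hypothetical and do not reproduce the kernel's workqueue code.

```c
#include <errno.h>

/* Hypothetical flags; the real workqueue flag names and values differ. */
enum {
	WQ_UNBOUND_SKETCH          = 1 << 0,
	WQ_ORDERED_SKETCH          = 1 << 1,	/* ordered, possibly only implicitly */
	WQ_ORDERED_EXPLICIT_SKETCH = 1 << 2,	/* the user asked for ordering */
};

struct wq_sketch {
	unsigned int flags;
	int max_active;
};

static void wq_init_sketch(struct wq_sketch *wq, unsigned int flags,
			   int max_active)
{
	/* Implicit ordering: unbound workqueue created with max_active == 1. */
	if ((flags & WQ_UNBOUND_SKETCH) && max_active == 1)
		flags |= WQ_ORDERED_SKETCH;
	wq->flags = flags;
	wq->max_active = max_active;
}

static int wq_set_max_active_sketch(struct wq_sketch *wq, int max_active)
{
	/* Explicitly ordered workqueues still reject the change. */
	if (wq->flags & WQ_ORDERED_EXPLICIT_SKETCH)
		return -EINVAL;
	/* Implicit ordering is dropped once the user changes the attribute. */
	wq->flags &= ~WQ_ORDERED_SKETCH;
	wq->max_active = max_active;
	return 0;
}
```

With this split, a workqueue that only became ordered as a side effect of max_active == 1 can later be reconfigured, while one that was created as ordered on purpose keeps rejecting the change.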
2017-07-25 | cgroup: add comment to cgroup_enable_threaded() | Tejun Heo
Explain cgroup_enable_threaded() and note that the function can never be called on the root cgroup. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Waiman Long <longman@redhat.com>
2017-07-25 | cgroup: remove unnecessary empty check when enabling threaded mode | Tejun Heo
cgroup_enable_threaded() checks that the cgroup doesn't have any tasks or children and fails the operation if so. This test is unnecessary because the first part is already checked by cgroup_can_be_thread_root() and the latter is unnecessary. The latter actually causes a behavioral oddity. Please consider the following hierarchy. All cgroups are domains.

     A
    / \
   B   C
        \
         D

If B is made threaded, C and D become invalid domains. Due to the no children restriction, threaded mode can't be enabled on C. For C and D, the only thing the user can do is removal. There is no reason for this restriction. Remove it. Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-25 | task_work: Replace spin_unlock_wait() with lock/unlock pair | Oleg Nesterov
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in task_work_run() with a spin_lock_irq() and a spin_unlock_irq() around the cmpxchg() dequeue loop. This should be safe from a performance perspective because ->pi_lock is local to the task and because calls to the other side of the race, task_work_cancel(), should be rare. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 | rcu: Use timer as backstop for NOCB deferred wakeups | Paul E. McKenney
The handling of RCU's no-CBs CPUs has a maintenance headache, namely that if call_rcu() is invoked with interrupts disabled, the rcuo kthread wakeup must be deferred to a point where we can be sure that scheduler locks are not held. Of course, there are a lot of code paths leading from an interrupts-disabled invocation of call_rcu(), and missing any one of these can result in excessive callback-invocation latency, and potentially even system hangs. This commit therefore uses a timer to guarantee that the wakeup will eventually occur. If one of the deferred-wakeup points kicks in, then the timer is simply cancelled. This commit also fixes up an incomplete removal of commits that were intended to plug remaining exit paths, which should have the added benefit of reducing the overhead of RCU's context-switch hooks. In addition, it simplifies leader-to-follower callback-list handoff by introducing locking. The call_rcu()-to-leader handoff continues to use atomic operations in order to maintain good real-time latency for common-case use of call_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]
2017-07-25 | module: fix ddebug_remove_module() | Zhou Chengming
ddebug_remove_module() uses mod->name to find the ddebug_table of the module and remove it. But dynamic_debug_setup() uses the first _ddebug->modname to create the ddebug_table for the module. This is fine when _ddebug->modname is the same as mod->name. But a livepatch module is special: it may contain _ddebugs of other modules, whose modname differs from the name of the livepatch module. So ddebug_remove_module() can't use mod->name to find the right ddebug_table and remove it. This can cause a kernel crash when we cat the file <debugfs>/dynamic_debug/control. Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2017-07-25 | sched/core: Fix some documentation build warnings | Jonathan Corbet
The kerneldoc comments for try_to_wake_up_local() were out of date, leading to these documentation build warnings: ./kernel/sched/core.c:2080: warning: No description found for parameter 'rf' ./kernel/sched/core.c:2080: warning: Excess function parameter 'cookie' description in 'try_to_wake_up_local' Update the comment to reflect current reality and give us some peace and quiet. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20170724135628.695cecfc@lwn.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-25 | sync to Linus v4.13-rc2 for subsystem developers to work against | James Morris
2017-07-24 | bpf: dev_map_alloc() shouldn't return NULL | Dan Carpenter
We forgot to set the error code on two error paths which means that we return ERR_PTR(0) which is NULL. The caller, find_and_alloc_map(), is not expecting that and will have a NULL dereference. Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
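Why ERR_PTR(0) ends up as NULL is easy to see with a userspace re-creation of the idiom. The sketch below assumes simplified versions of ERR_PTR() and IS_ERR() written for illustration, and `alloc_map_sketch()` is a hypothetical allocator, not dev_map_alloc().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical allocator. When set_err is 0 it mimics the buggy error
 * paths that forget to assign an error code before returning.
 */
static void *alloc_map_sketch(int fail, int set_err)
{
	static int map;		/* stands in for a successfully allocated map */
	int err = 0;

	if (fail) {
		if (set_err)
			err = -ENOMEM;
		return ERR_PTR(err);	/* ERR_PTR(0) is just NULL */
	}
	return &map;
}
```

A caller that only tests IS_ERR() treats the NULL from the buggy path as success and then dereferences it, which is the failure mode the commit describes.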
2017-07-24 | rcutorture: Invoke call_rcu() from timer handler | Paul E. McKenney
The Linux kernel invokes call_rcu() from various interrupt/softirq handlers, but rcutorture does not. This commit therefore adds this behavior to rcutorture's repertoire. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | rcu: Add last-CPU to GP-kthread starvation messages | Paul E. McKenney
This commit augments the grace-period-kthread starvation debugging messages by adding the last CPU that ran the kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | rcutorture: Eliminate unused ts_rem local from rcu_trace_clock_local() | Paul E. McKenney
This commit removes an unused local variable named ts_rem that is marked __maybe_unused. Yes, the variable was assigned to, but it was never used beyond that point, hence not needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | rcutorture: Add task's CPU for rcutorture writer stalls | Paul E. McKenney
It appears that at least some of the rcutorture writer stall messages coincide with unusually long CPU-online operations, for example, no fewer than 205 seconds in a recent test. It is of course possible that the writer stall is not unrelated to this unusually long CPU-hotplug operation, and so this commit adds the rcutorture writer task's CPU to the stall message to gain more information about this possible connection. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | rcutorture: Place event-traced strings into trace buffer | Paul E. McKenney
Strings used in event tracing need to be specially handled, for example, being copied to the trace buffer instead of being pointed to by the trace buffer. Although the TPS() macro can be used to "launder" pointed-to strings, this might not be all that effective within a loadable module. This commit therefore copies rcutorture's strings to the trace buffer. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2017-07-24 | rcutorture: Enable SRCU readers from timer handler | Paul E. McKenney
Now that it is legal to invoke srcu_read_lock() and srcu_read_unlock() for a given srcu_struct from both process context and {soft,}irq handlers, it is time to test it. This commit therefore enables testing of SRCU readers from rcutorture's timer handler, using in_task() to determine whether or not it is safe to sleep in the SRCU read-side critical sections. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | rcu: Remove CONFIG_TASKS_RCU ifdef from rcuperf.c | Paul E. McKenney
The synchronize_rcu_tasks() and call_rcu_tasks() APIs are now available regardless of kernel configuration, so this commit removes the CONFIG_TASKS_RCU ifdef from rcuperf.c. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | rcutorture: Print SRCU lock/unlock totals | Paul E. McKenney
This commit adds printing of SRCU lock/unlock totals, which are just the sums of the per-CPU counts. Saves a bit of mental arithmetic. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | rcutorture: Move SRCU status printing to SRCU implementations | Paul E. McKenney
This commit gets rid of some ugly #ifdefs in rcutorture.c by moving the SRCU status printing to the SRCU implementations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | srcu: Make process_srcu() be static | Paul E. McKenney
The function process_srcu() is not invoked outside of srcutree.c, so this commit makes it static and drops the EXPORT_SYMBOL_GPL(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | srcu: Move rcu_scheduler_starting() from Tiny RCU to Tiny SRCU | Paul E. McKenney
Other than lockdep support, Tiny RCU has no need for the scheduler status. However, Tiny SRCU will need this to control boot-time behavior independent of lockdep. Therefore, this commit moves rcu_scheduler_starting() from kernel/rcu/tiny_plugin.h to kernel/rcu/srcutiny.c. This in turn allows the complete removal of kernel/rcu/tiny_plugin.h. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 | PM / suspend: Define pr_fmt() in suspend.c | Rafael J. Wysocki
Define a common prefix ("PM:") for messages printed by the code in kernel/power/suspend.c. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
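The prefix mechanism relies on compile-time string-literal concatenation. A minimal userspace sketch follows; `pr_info_sketch()` is a hypothetical stand-in for the kernel's pr_*() helpers that writes into a caller-supplied buffer, and it requires a literal format string plus at least one argument.

```c
#include <stdio.h>

/* The prefix is glued onto the format string at compile time. */
#define pr_fmt(fmt) "PM: " fmt

/* Hypothetical stand-in for pr_info(); fmt must be a string literal. */
#define pr_info_sketch(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)
```

For example, `pr_info_sketch(buf, sizeof(buf), "entering %s sleep", "deep")` would leave `PM: entering deep sleep` in `buf`. Defining the macro once near the top of a file gives every message in that file the same prefix without repeating it at each call site.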
2017-07-24 | PM / suspend: Use mem_sleep_labels[] strings in messages | Rafael J. Wysocki
Some messages in suspend.c currently print state names from pm_states[], but that may be confusing if the mem_sleep sysfs attribute is changed to anything different from "mem", because in those cases the messages will say either "freeze" or "standby" after writing "mem" to /sys/power/state. To avoid the confusion, use mem_sleep_labels[] strings in those messages instead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2017-07-24 | PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG | Rafael J. Wysocki
The pm_test sysfs attribute is under CONFIG_PM_DEBUG, but it doesn't make sense to provide it if CONFIG_PM_SLEEP is unset, so put it under CONFIG_PM_SLEEP_DEBUG instead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-24 | PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq() | Rafael J. Wysocki
Restore the pm_wakeup_pending() check in __device_suspend_noirq() removed by commit eed4d47efe95 (ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle) as that allows the function to return earlier if there's a wakeup event pending already (so that it may spend less time on carrying out operations that will be reversed shortly anyway) and rework the main suspend-to-idle loop to take that optimization into account. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-24 | PM / s2idle: Rearrange the main suspend-to-idle loop | Rafael J. Wysocki
As a preparation for subsequent changes, rearrange the core suspend-to-idle code by moving the initial invocation of dpm_suspend_noirq() into s2idle_loop(). This also causes debug messages from that code to appear in a less confusing order. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-24 | bpf/verifier: fix min/max handling in BPF_SUB | Edward Cree
We have to subtract the src max from the dst min, and vice-versa, since (e.g.) the smallest result comes from the largest subtrahend. Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
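The interval arithmetic behind this fix can be written out directly. The sketch below uses a hypothetical `bounds_sketch` type rather than the verifier's register state, and it ignores overflow handling, which a real implementation would need.

```c
#include <stdint.h>

/* Hypothetical closed interval [min, max] for a register value. */
struct bounds_sketch {
	int64_t min;
	int64_t max;
};

/*
 * Bounds of dst - src. The smallest result pairs the smallest dst with
 * the largest src, and the largest result pairs the largest dst with the
 * smallest src. Overflow is deliberately not handled in this sketch.
 */
static struct bounds_sketch sub_bounds(struct bounds_sketch dst,
				       struct bounds_sketch src)
{
	struct bounds_sketch r;

	r.min = dst.min - src.max;
	r.max = dst.max - src.min;
	return r;
}
```

With dst in [10, 20] and src in [1, 5], the result is [5, 19]. Subtracting min from min and max from max would instead give [9, 15], which wrongly excludes reachable values such as 20 - 1 = 19.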
2017-07-24 | signal: Fix sending signals with siginfo | Eric W. Biederman
Today sending a signal with rt_sigqueueinfo and receiving it on a signalfd does not work reliably. The issue is that reading a signalfd returns a signalfd_siginfo instead of a siginfo, and the kernel must convert from one to the other. The kernel does not currently have the code to deduce which union members of struct siginfo are in use.

In this patchset I fix that by introducing a new function siginfo_layout that can look at a siginfo and report which union member of struct siginfo is in use. Before that I clean up how we populate struct siginfo.

The siginfo structure has two key members si_signo and si_code. Some si_codes are signal specific and for those it takes si_signo and si_code to indicate the members of siginfo that are valid. The rest of the si_code values are signal independent like SI_USER, SI_KERNEL, SI_QUEUE, and SI_TIMER and only si_code is needed to indicate which members of siginfo are valid. At least that is how POSIX documents them, and how common sense would indicate they should function.

In practice we have been rather sloppy about maintaining the ABI in linux and we have some exceptions. We have a couple of buggy architectures that make SI_USER mean something different when combined with SIGFPE or SIGTRAP. Worse we have fcntl(F_SETSIG) which results in the si_codes POLL_IN, POLL_OUT, POLL_MSG, POLL_ERR, POLL_PRI, POLL_HUP being sent with any arbitrary signal, while the values are in a range that overlaps the signal specific si_codes.

Thankfully the ambiguous cases with the POLL_NNN si_codes are for things no sane person would do, so we can rectify the situation. AKA no one cares so we won't cause a regression fixing it.

As part of fixing this I stop leaking the __SI_xxxx codes to userspace and stop storing them in the high 16bits of si_code, making the kernel code fundamentally simpler.

We have already confirmed that the one application that would see this difference in kernel behavior, CRIU, won't be affected by this change as it copies values verbatim from one kernel interface to another.

v3:
- Corrected the patches so they bisect properly

v2:
- Benchmarked the code to confirm no performance changes are visible.
- Reworked the first couple of patches so that TRAP_FIXME and FPE_FIXME are not exported to userspace.
- Rebased on top of the siginfo cleanup that came in v4.13-rc1
- Updated alpha to use both TRAP_FIXME and FPE_FIXME

Eric W. Biederman (7):
  signal/alpha: Document a conflict with SI_USER for SIGTRAP
  signal/ia64: Document a conflict with SI_USER with SIGFPE
  signal/sparc: Document a conflict with SI_USER with SIGFPE
  signal/mips: Document a conflict with SI_USER with SIGFPE
  signal/testing: Don't look for __SI_FAULT in userspace
  fcntl: Don't use ambiguous SIG_POLL si_codes
  signal: Remove kernel interal si_code magic

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-07-24 | signal: Remove kernel interal si_code magic | Eric W. Biederman
struct siginfo is a union and the kernel since 2.4 has been hiding a union tag in the high 16bits of si_code using the values:

  __SI_KILL
  __SI_TIMER
  __SI_POLL
  __SI_FAULT
  __SI_CHLD
  __SI_RT
  __SI_MESGQ
  __SI_SYS

While this looks plausible on the surface, in practice this situation has not worked well.

- Injected positive signals are not copied to user space properly unless they have these magic high bits set.
- Injected positive signals are not reported properly by signalfd unless they have these magic high bits set.
- These kernel internal values leaked to userspace via ptrace_peek_siginfo.
- It was possible to inject these kernel internal values and cause the kernel to misbehave.
- Kernel developers got confused and expected these kernel internal values in userspace in kernel self tests.
- Kernel developers got confused and set si_code to __SI_FAULT which is SI_USER in userspace which causes userspace to think an ordinary user sent the signal and that it was not kernel generated.
- The values make it impossible to reorganize the code to transform siginfo_copy_to_user into a plain copy_to_user, as si_code must be massaged before being passed to userspace.

So remove these kernel internal si codes and make the kernel code simpler and more maintainable.

To replace these kernel internal magic si_codes introduce the helper function siginfo_layout, that takes a signal number and an si_code and computes which union member of siginfo is being used. Have siginfo_layout return an enumeration so that gcc will have enough information to warn if a switch statement does not handle all of the union members.

A couple of architectures have a messed up ABI that defines signal specific duplications of SI_USER which causes more special cases in siginfo_layout than I would like. The good news is only problem architectures pay the cost.

Update all of the code that used the previous magic __SI_ values to use the new SIL_ values and to call siginfo_layout to get those values. Except where not all of the cases are handled, remove the defaults in the switch statements so that if a new case is missed in the future the lack will show up at compile time.

Modify the code that copies siginfo si_code to userspace to just copy the value and not cast si_code to a short first. The high bits are no longer used to hold a magic union member.

Fixup the siginfo header files to stop including the __SI_ values in their constants and for the headers that were missing it to properly update the number of si_codes for each signal type.

The fixes to copy_siginfo_from_user32 implementations have the interesting property that several of them previously should never have worked as the __SI_ values they depended upon were kernel internal. With that dependency gone those implementations should work much better.

The idea of not passing the __SI_ values out to userspace and then not reinserting them has been tested with criu and criu worked without changes.

Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
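The enumeration-plus-switch pattern described above can be sketched in a heavily simplified form. Everything named `*_SK` or `*_sketch` here is hypothetical: the enumeration covers only a subset of layouts, the `SK_SI_*` constants merely mirror the usual Linux values, and the mapping omits the architecture-specific special cases that the real siginfo_layout has to handle.

```c
#include <signal.h>

/* Hypothetical subset of the union layouts a siginfo can use. */
enum siginfo_layout_sketch {
	SIL_KILL_SK,
	SIL_TIMER_SK,
	SIL_FAULT_SK,
	SIL_RT_SK,
};

/* Signal-independent si_code values (assumed to mirror Linux). */
enum { SK_SI_USER = 0, SK_SI_QUEUE = -1, SK_SI_TIMER = -2 };

/* Compute the layout from the signal number and si_code alone. */
static enum siginfo_layout_sketch layout_sketch(int sig, int si_code)
{
	if (si_code == SK_SI_TIMER)
		return SIL_TIMER_SK;
	if (si_code < 0)
		return SIL_RT_SK;	/* simplification: all other negative codes */
	if (si_code == SK_SI_USER)
		return SIL_KILL_SK;
	/* Positive si_codes are signal specific. */
	if (sig == SIGILL || sig == SIGFPE || sig == SIGSEGV)
		return SIL_FAULT_SK;
	return SIL_KILL_SK;
}

/* No default case: gcc's -Wswitch can flag an enumerator that is not handled. */
static const char *layout_name_sketch(enum siginfo_layout_sketch layout)
{
	switch (layout) {
	case SIL_KILL_SK:
		return "kill";
	case SIL_TIMER_SK:
		return "timer";
	case SIL_FAULT_SK:
		return "fault";
	case SIL_RT_SK:
		return "rt";
	}
	return "unknown";
}
```

Because the layout is derived on demand, nothing extra needs to be stored in the high bits of si_code, and adding a new enumerator without updating a switch like the one in `layout_name_sketch()` can be caught at compile time.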
2017-07-23 | cgroup: fix error return value from cgroup_subtree_control() | Tejun Heo
While refactoring, f7b2814bb9b6 ("cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()") broke error return value from the function. The return value from the last operation is always overridden to zero. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-23 | PM / timekeeping: Print debug messages when requested | Rafael J. Wysocki
The messages printed by tk_debug_account_sleep_time() are basically useful for system sleep debugging, so print them only when the other debug messages from the core suspend/hibernate code are enabled. While at it, make it clear that the messages from tk_debug_account_sleep_time() are about timekeeping suspend duration, because in general timekeeping may be suspended and resumed multiple times during one system suspend-resume cycle. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-22 | PM / sleep: Mark suspend/hibernation start and finish | Rafael J. Wysocki
Regardless of whether or not debug messages from the core system suspend/hibernation code are enabled, it is useful to know when system-wide transitions start and finish (or fail), so print "info" messages at these points. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Mark Salyzyn <salyzyn@android.com>
2017-07-22 | PM / sleep: Do not print debug messages by default | Rafael J. Wysocki
Debug messages from the system suspend/hibernation infrastructure can fill up the entire kernel log buffer in some cases and anyway they are only useful for debugging. They depend on CONFIG_PM_DEBUG, but that is set as a rule as some generally useful diagnostic facilities depend on it too. For this reason, avoid printing those messages by default, but make it possible to turn them on as needed with the help of a new sysfs attribute under /sys/power/. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-22 | PM / suspend: Export pm_suspend_target_state | Florian Fainelli
Have the core suspend/resume framework store the system-wide suspend state (suspend_state_t) we are about to enter, and expose it to drivers via pm_suspend_target_state in order to retrieve that. The state is assigned in suspend_devices_and_enter(). This is useful for platform specific drivers that may need to take a slightly different suspend/resume path based on the system's suspend/resume state being entered. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-22 | cpufreq: Use transition_delay_us for legacy governors as well | Viresh Kumar
The policy->transition_delay_us field is used only by the schedutil governor currently, and this field describes how fast the driver wants the cpufreq governor to change CPUs frequency. It should rather be a common thing across all governors, as it doesn't have any schedutil dependency here. Create a new helper cpufreq_policy_transition_delay_us() to get the transition delay across all governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
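A helper of this kind might look roughly like the sketch below. The commit message does not spell out the fallback rule, so the behaviour when transition_delay_us is unset (deriving a delay from the transition latency with a fixed multiplier) is an assumption made for illustration, as are `policy_sketch`, `LATENCY_MULTIPLIER_SKETCH`, and the function name.

```c
#define NSEC_PER_USEC_SKETCH		1000U
#define LATENCY_MULTIPLIER_SKETCH	1000U	/* assumed fallback multiplier */

/* Hypothetical, trimmed-down stand-in for struct cpufreq_policy. */
struct policy_sketch {
	unsigned int transition_delay_us;	/* driver preference, 0 if unset */
	unsigned int transition_latency_ns;	/* hardware switching latency */
};

/* One place for every governor to ask how often it may change frequency. */
static unsigned int transition_delay_us_sketch(const struct policy_sketch *p)
{
	unsigned int latency_us;

	/* Prefer the value the driver asked for. */
	if (p->transition_delay_us)
		return p->transition_delay_us;

	/* Assumed fallback: scale the latency, or use a fixed default. */
	latency_us = p->transition_latency_ns / NSEC_PER_USEC_SKETCH;
	if (latency_us)
		return latency_us * LATENCY_MULTIPLIER_SKETCH;

	return LATENCY_MULTIPLIER_SKETCH;
}
```

The point of the design is that the driver's preference lives in one place, so legacy governors and schedutil make the same decision instead of each computing its own rate limit.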
2017-07-22 | device property: Get rid of struct fwnode_handle type field | Sakari Ailus
Instead of relying on the struct fwnode_handle type field, define fwnode_operations structs for all separate types of fwnodes. To find out the type, compare to the ops field to relevant ops structs. This change has two benefits: 1. it avoids adding the type field to each and every instance of struct fwnode_handle, thus saving memory and 2. makes the ops field the single factor that defines both the types of the fwnode as well as defines the implementation of its operations, decreasing the possibility of bugs when developing code dealing with fwnode internals. Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
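Identifying a node's type by comparing its ops pointer can be sketched as follows. All names ending in `_sketch` are hypothetical and the structures are reduced to the minimum needed to show the comparison; they are not the kernel's fwnode definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operations table; one static instance exists per node type. */
struct fwnode_ops_sketch {
	const char *(*type_name)(void);
};

/* The handle carries only the ops pointer, with no separate type field. */
struct fwnode_sketch {
	const struct fwnode_ops_sketch *ops;
};

static const char *of_type_name(void)   { return "of"; }
static const char *acpi_type_name(void) { return "acpi"; }

static const struct fwnode_ops_sketch of_ops_sketch   = { .type_name = of_type_name };
static const struct fwnode_ops_sketch acpi_ops_sketch = { .type_name = acpi_type_name };

/* The type check is a pointer comparison against the relevant ops struct. */
static bool is_of_node_sketch(const struct fwnode_sketch *fw)
{
	return fw && fw->ops == &of_ops_sketch;
}

static bool is_acpi_node_sketch(const struct fwnode_sketch *fw)
{
	return fw && fw->ops == &acpi_ops_sketch;
}
```

Since the ops pointer both selects the implementation and identifies the type, the two can no longer disagree, which is the second benefit the commit message lists. It also means each ops structure must be visible to any code that needs the type check.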
2017-07-21 | Merge tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds
Pull tracing fixes from Steven Rostedt:
 "Three minor updates
  - Use the new GFP_RETRY_MAYFAIL to be more aggressive in allocating memory for the ring buffer without causing OOMs
  - Fix a memory leak in adding and removing instances
  - Add __rcu annotation to be able to debug RCU usage of function tracing a bit better"

* tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  trace: fix the errors caused by incompatible type of RCU variables
  tracing: Fix kmemleak in instance_rmdir
  tracing/ring_buffer: Try harder to allocate