summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2018-03-09bug: use %pB in BUG and stack protector failureKees Cook
The BUG and stack protector reports were still using a raw %p. This changes it to %pB for more meaningful output. Link: http://lkml.kernel.org/r/20180301225704.GA34198@beast Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Richard Weinberger <richard.weinberger@gmail.com>, Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-09arm64: signal: Ensure si_code is valid for all fault signalsDave Martin
Currently, as reported by Eric, an invalid si_code value 0 is passed in many signals delivered to userspace in response to faults and other kernel errors. Typically 0 is passed when the fault is insufficiently diagnosable or when there does not appear to be any sensible alternative value to choose. This appears to violate POSIX, and is intuitively wrong for at least two reasons arising from the fact that 0 == SI_USER: 1) si_code is a union selector, and SI_USER (and si_code <= 0 in general) implies the existence of a different set of fields (siginfo._kill) from that which exists for a fault signal (siginfo._sigfault). However, the code raising the signal typically writes only the _sigfault fields, and the _kill fields make no sense in this case. Thus when userspace sees si_code == 0 (SI_USER) it may legitimately inspect fields in the inactive union member _kill and obtain garbage as a result. There appears to be software in the wild relying on this, albeit generally only for printing diagnostic messages. 2) Software that wants to be robust against spurious signals may discard signals where si_code == SI_USER (or <= 0), or may filter such signals based on the si_uid and si_pid fields of siginfo._sigkill. In the case of fault signals, this means that important (and usually fatal) error conditions may be silently ignored. In practice, many of the faults for which arm64 passes si_code == 0 are undiagnosable conditions such as exceptions with syndrome values in ESR_ELx to which the architecture does not yet assign any meaning, or conditions indicative of a bug or error in the kernel or system and thus that are unrecoverable and should never occur in normal operation. The approach taken in this patch is to translate all such undiagnosable or "impossible" synchronous fault conditions to SIGKILL, since these are at least probably localisable to a single process. Some of these conditions should really result in a kernel panic, but due to the lack of diagnostic information it is difficult to be certain: this patch does not add any calls to panic(), but this could change later if justified. Although si_code will not reach userspace in the case of SIGKILL, it is still desirable to pass a nonzero value so that the common siginfo handling code can detect incorrect use of si_code == 0 without false positives. In this case the si_code dependent siginfo fields will not be correctly initialised, but since they are not passed to userspace I deem this not to matter. A few faults can reasonably occur in realistic userspace scenarios, and _should_ raise a regular, handleable (but perhaps not ignorable/blockable) signal: for these, this patch attempts to choose a suitable standard si_code value for the raised signal in each case instead of 0. arm64 was the only arch to define a BUS_FIXME code, so after this patch nobody defines it. This patch therefore also removes the relevant code from siginfo_layout(). Cc: James Morse <james.morse@arm.com> Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-09softirq: Consolidate common code in tasklet_[hi]_action()Ingo Molnar
tasklet_action() + tasklet_hi_action() are almost identical. Move the common code from both function into __tasklet_action_common() and let both functions invoke it with different arguments. [ bigeasy: Splitted out from RT's "tasklet: Prevent tasklets from going into infinite spin in RT" and added commit message] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Julia Cartwright <juliac@eso.teric.us> Link: https://lkml.kernel.org/r/20180227164808.10093-3-bigeasy@linutronix.de
2018-03-09softirq: Consolidate common code in __tasklet_[hi]_schedule()Ingo Molnar
__tasklet_schedule() and __tasklet_hi_schedule() are almost identical. Move the common code from both function into __tasklet_schedule_common() and let both functions invoke it with different arguments. [ bigeasy: Splitted out from RT's "tasklet: Prevent tasklets from going into infinite spin in RT" and added commit message. Use this_cpu_ptr(headp) in __tasklet_schedule_common() as suggested by Julia Cartwright ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Julia Cartwright <juliac@eso.teric.us> Link: https://lkml.kernel.org/r/20180227164808.10093-2-bigeasy@linutronix.de
2018-03-09genirq/irq_sim: Return the base of the irq range from irq_sim_init()Bartosz Golaszewski
Returning the base of the allocated interrupt range from irq_sim_init() and devm_irq_sim_init() allows users to handle the logic of associating irq numbers with any other driver-specific resources without having to use irq_sim_irqnum(). Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20180304121018.640-4-brgl@bgdev.pl
2018-03-09genirq/irq_sim: Check the return value of irq_sim_init() for error codesBartosz Golaszewski
As discussed with Marc Zyngier: irq_sim_init() and its devres variant should return the base of the allocated interrupt range on success rather than 0. Make devm_irq_sim_init() check for an error code. This is a preparatory change for modifying irq_sim_init() itself. Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20180304121018.640-3-brgl@bgdev.pl
2018-03-09genirq/irq_sim: Explicitly include slab.hBartosz Golaszewski
kfree() is used in the irq_sim code but slab.h is pulled in indirectly via irq.h. Include it explicitly. Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/20180304121018.640-2-brgl@bgdev.pl
2018-03-09rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsitesBoqun Feng
When running rcutorture with TREE03 config, CONFIG_PROVE_LOCKING=y, and kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a HARDIRQ-safe->HARDIRQ-unsafe deadlock: ================================ WARNING: inconsistent lock state 4.16.0-rc4+ #1 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. takes: __schedule+0xbe/0xaf0 {IN-HARDIRQ-W} state was registered at: _raw_spin_lock+0x2a/0x40 scheduler_tick+0x47/0xf0 ... other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&rq->lock); <Interrupt> lock(&rq->lock); *** DEADLOCK *** 1 lock held by rcu_torture_rea/724: rcu_torture_read_lock+0x0/0x70 stack backtrace: CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: lock_acquire+0x90/0x200 ? __schedule+0xbe/0xaf0 _raw_spin_lock+0x2a/0x40 ? __schedule+0xbe/0xaf0 __schedule+0xbe/0xaf0 preempt_schedule_irq+0x2f/0x60 retint_kernel+0x1b/0x2d RIP: 0010:rcu_read_unlock_special+0x0/0x680 ? rcu_torture_read_unlock+0x60/0x60 __rcu_read_unlock+0x64/0x70 rcu_torture_read_unlock+0x17/0x60 rcu_torture_reader+0x275/0x450 ? rcutorture_booster_init+0x110/0x110 ? rcu_torture_stall+0x230/0x230 ? kthread+0x10e/0x130 kthread+0x10e/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ? call_usermodehelper_exec_async+0x11a/0x150 ret_from_fork+0x3a/0x50 This happens with the following even sequence: preempt_schedule_irq(); local_irq_enable(); __schedule(): local_irq_disable(); // irq off ... rcu_note_context_switch(): rcu_note_preempt_context_switch(): rcu_read_unlock_special(): local_irq_save(flags); ... raw_spin_unlock_irqrestore(...,flags); // irq remains off rt_mutex_futex_unlock(): raw_spin_lock_irq(); ... raw_spin_unlock_irq(); // accidentally set irq on <return to __schedule()> rq_lock(): raw_spin_lock(); // acquiring rq->lock with irq on which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause deadlocks in scheduler code. This problem was introduced by commit 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints"). That brought the user of rt_mutex_futex_unlock() with irq off. To fix this, replace the *lock_irq() in rt_mutex_futex_unlock() with *lock_irq{save,restore}() to make it safe to call rt_mutex_futex_unlock() with irq off. Fixes: 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints") Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20180309065630.8283-1-boqun.feng@gmail.com
2018-03-09bpf: comment why dots in filenames under BPF virtual FS are not allowedQuentin Monnet
When pinning a file under the BPF virtual file system (traditionally /sys/fs/bpf), using a dot in the name of the location to pin at is not allowed. For example, trying to pin at "/sys/fs/bpf/foo.bar" will be rejected with -EPERM. This check was introduced at the same time as the BPF file system itself, with commit b2197755b263 ("bpf: add support for persistent maps/progs"). At this time, it was checked in a function called "bpf_dname_reserved()", which made clear that using a dot was reserved for future extensions. This function disappeared and the check was moved elsewhere with commit 0c93b7d85d40 ("bpf: reject invalid names right in ->lookup()"), and the meaning of the dot ban was lost. The present commit simply adds a comment in the source to explain to the reader that the usage of dots is reserved for future usage. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09perf/core: Fix ctx_event_type in ctx_resched()Song Liu
In ctx_resched(), EVENT_FLEXIBLE should be sched_out when EVENT_PINNED is added. However, ctx_resched() calculates ctx_event_type before checking this condition. As a result, pinned events will NOT get higher priority than flexible events. The following shows this issue on an Intel CPU (where ref-cycles can only use one hardware counter). 1. First start: perf stat -C 0 -e ref-cycles -I 1000 2. Then, in the second console, run: perf stat -C 0 -e ref-cycles:D -I 1000 The second perf uses pinned events, which is expected to have higher priority. However, because it failed in ctx_resched(). It is never run. This patch fixes this by calculating ctx_event_type after re-evaluating event_type. Reported-by: Ephraim Park <ephiepark@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <jolsa@redhat.com> Cc: <kernel-team@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 487f05e18aa4 ("perf/core: Optimize event rescheduling on active contexts") Link: http://lkml.kernel.org/r/20180306055504.3283731-1-songliubraving@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/completions: Use bool in try_wait_for_completion()gaurav jindal
Since the return type of the function is bool, the internal 'ret' variable should be bool too. Signed-off-by: Gaurav Jindal<gauravjindal1104@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180221125407.GA14292@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Update blocked load when newly idleVincent Guittot
When NEWLY_IDLE load balance is not triggered, we might need to update the blocked load anyway. We can kick an ilb so an idle CPU will take care of updating blocked load or we can try to update them locally before entering idle. In the latter case, we reuse part of the nohz_idle_balance. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: brendan.jackman@arm.com Cc: dietmar.eggemann@arm.com Cc: morten.rasmussen@foss.arm.com Cc: valentin.schneider@arm.com Link: http://lkml.kernel.org/r/1518622006-16089-4-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Move idle_balance()Peter Zijlstra
We're going to want to call nohz_idle_balance() or parts thereof from idle_balance(). Since we already have a forward declaration of idle_balance() move it down such that it's below nohz_idle_balance() avoiding the need for a forward declaration for that. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/nohz: Merge CONFIG_NO_HZ_COMMON blocksPeter Zijlstra
Now that we have two back-to-back NO_HZ_COMMON blocks, merge them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Move rebalance_domains()Peter Zijlstra
This pure code movement results in two #ifdef CONFIG_NO_HZ_COMMON sections landing next to each other. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/nohz: Optimize nohz_idle_balance()Peter Zijlstra
Avoid calling update_blocked_averages() when it does not in fact have any by re-using/extending update_nohz_stats(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Reduce the periodic update durationVincent Guittot
Instead of using the cfs_rq_is_decayed() which monitors all *_avg and *_sum, we create a cfs_rq_has_blocked() which only takes care of util_avg and load_avg. We are only interested by these 2 values which are decaying faster than the *_sum so we can stop the periodic update earlier. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: brendan.jackman@arm.com Cc: dietmar.eggemann@arm.com Cc: morten.rasmussen@foss.arm.com Cc: valentin.schneider@arm.com Link: http://lkml.kernel.org/r/1518517879-2280-3-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/nohz: Stop NOHZ stats when decayedVincent Guittot
Stopped the periodic update of blocked load when all idle CPUs have fully decayed. We introduce a new nohz.has_blocked that reflect if some idle CPUs has blocked load that have to be periodiccally updated. nohz.has_blocked is set everytime that a Idle CPU can have blocked load and it is then clear when no more blocked load has been detected during an update. We don't need atomic operation but only to make cure of the right ordering when updating nohz.idle_cpus_mask and nohz.has_blocked. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: brendan.jackman@arm.com Cc: dietmar.eggemann@arm.com Cc: morten.rasmussen@foss.arm.com Cc: valentin.schneider@arm.com Link: http://lkml.kernel.org/r/1518517879-2280-2-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/cpufreq: Provide migration hintPeter Zijlstra
It was suggested that a migration hint might be usefull for the CPU-freq governors. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/nohz: Clean up nohz enter/exitPeter Zijlstra
The primary observation is that nohz enter/exit is always from the current CPU, therefore NOHZ_TICK_STOPPED does not in fact need to be an atomic. Secondary is that we appear to have 2 nearly identical hooks in the nohz enter code, set_cpu_sd_state_idle() and nohz_balance_enter_idle(). Fold the whole set_cpu_sd_state thing into nohz_balance_{enter,exit}_idle. Removes an atomic op from both enter and exit paths. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Update blocked load from NEWIDLEPeter Zijlstra
Since we already iterate CPUs looking for work on NEWIDLE, use this iteration to age the blocked load. If the domain for which this is done completely spand the idle set, we can push the ILB based aging forward. Suggested-by: Brendan Jackman <brendan.jackman@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Add NOHZ stats balancingPeter Zijlstra
Teach the idle balancer about the need to update statistics which have a different periodicity from regular balancing. Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Restructure nohz_balance_kick()Peter Zijlstra
The current: if (nohz_kick_needed()) nohz_balancer_kick() is pointless complexity, fold them into a single call and avoid the various conditions at the call site. When we introduce multiple different needs to kick the ilb, the above construct also becomes a problem. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Add NOHZ_STATS_KICKPeter Zijlstra
Split the NOHZ idle balancer into doing two separate actions: - update blocked load statistic - actually load-balance Since the latter requires the former, ensure this happens. For now always tag both bits at the same time. Prepares for a future where we can toggle only the STATS bit. Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/core: Convert nohz_flags to atomic_tPeter Zijlstra
Using atomic_t allows us to use the more flexible bitops provided there. Also its smaller. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09cpufreq/schedutil: Rewrite CPUFREQ_RT supportPeter Zijlstra
Instead of trying to duplicate scheduler state to track if an RT task is running, directly use the scheduler runqueue state for it. This vastly simplifies things and fixes a number of bugs related to sugov and the scheduler getting out of sync wrt this state. As a consequence we not also update the remove cfs/dl state when iterating the shared mask. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09cpufreq/schedutil: Remove unused CPUFREQ_DLPeter Zijlstra
Bitrot... Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09sched/fair: Add ';' after label attributesNorbert Manthey
Due to using GCC defines for configuration, some labels might be unused in certain configurations. While adding a __maybe_unused to the label is fine in general, the line has to be terminated with ';'. This is also reflected in the GCC documentation, but GCC parsed the previous variant without an error message. This has been spotted while compiling with goto-cc, the compiler for the CPROVER tool suite. Signed-off-by: Norbert Manthey <nmanthey@amazon.de> Signed-off-by: Michael Tautschnig <tautschn@amazon.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1519717660-16157-1-git-send-email-nmanthey@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-08audit: link denied should not directly generate PATH recordRichard Guy Briggs
Audit link denied events generate duplicate PATH records which disagree in different ways from symlink and hardlink denials. audit_log_link_denied() should not directly generate PATH records. See: https://github.com/linux-audit/audit-kernel/issues/21 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-03-08audit: make ANOM_LINK obey audit_enabled and audit_dummy_contextRichard Guy Briggs
Audit link denied events emit disjointed records when audit is disabled. No records should be emitted when audit is disabled. See: https://github.com/linux-audit/audit-kernel/issues/21 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-03-08module: propagate error in modules_open()Leon Yu
otherwise kernel can oops later in seq_release() due to dereferencing null file->private_data which is only set if seq_open() succeeds. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 IP: seq_release+0xc/0x30 Call Trace: close_pdeo+0x37/0xd0 proc_reg_release+0x5d/0x60 __fput+0x9d/0x1d0 ____fput+0x9/0x10 task_work_run+0x75/0x90 do_exit+0x252/0xa00 do_group_exit+0x36/0xb0 SyS_exit_group+0xf/0x10 Fixes: 516fb7f2e73d ("/proc/module: use the same logic as /proc/kallsyms for address exposure") Cc: Jessica Yu <jeyu@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-03-08Merge tag 'mips_fixes_4.16_4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips Pull MIPS fixes from James Hogan: "A miscellaneous pile of MIPS fixes for 4.16: - move put_compat_sigset() to evade hardened usercopy warnings (4.16) - select ARCH_HAVE_PC_{SERIO,PARPORT} for Loongson64 platforms (4.16) - fix kzalloc() failure handling in ath25 (3.19) and Octeon (4.0) - fix disabling of IPIs during BMIPS suspend (3.19)" * tag 'mips_fixes_4.16_4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: MIPS: BMIPS: Do not mask IPIs during suspend MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_SERIO MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_PARPORT signals: Move put_compat_sigset to compat.h to silence hardened usercopy MIPS: OCTEON: irq: Check for null return on kzalloc allocation MIPS: ath25: Check for kzalloc allocation failure
2018-03-08panic: Add closing panic marker parenthesisBorislav Petkov
Otherwise it looks unbalanced. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180306094920.16917-2-bp@alien8.de
2018-03-08bpf: add support to read sample address in bpf programTeng Qin
This commit adds new field "addr" to bpf_perf_event_data which could be read and used by bpf programs attached to perf events. The value of the field is copied from bpf_perf_event_data_kern.addr and contains the address value recorded by specifying sample_type with PERF_SAMPLE_ADDR when calling perf_event_open. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-07usb, signal, security: only pass the cred, not the secid, to ↵Stephen Smalley
kill_pid_info_as_cred and security_task_kill commit d178bc3a708f39cbfefc3fab37032d3f2511b4ec ("user namespace: usb: make usb urbs user namespace aware (v2)") changed kill_pid_info_as_uid to kill_pid_info_as_cred, saving and passing a cred structure instead of uids. Since the secid can be obtained from the cred, drop the secid fields from the usb_dev_state and async structures, and drop the secid argument to kill_pid_info_as_cred. Replace the secid argument to security_task_kill with the cred. Update SELinux, Smack, and AppArmor to use the cred, which avoids the need for Smack and AppArmor to use a secid at all in this hook. Further changes to Smack might still be required to take full advantage of this change, since it should now be possible to perform capability checking based on the supplied cred. The changes to Smack and AppArmor have only been compile-tested. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2018-03-06kernel/memremap: Remove stale devres_free() callOliver O'Halloran
devm_memremap_pages() was re-worked in e8d513483300 "memremap: change devm_memremap_pages interface to use struct dev_pagemap" to take a caller allocated struct dev_pagemap as a function parameter. A call to devres_free() was left in the error cleanup path which results in a kernel panic if the remap fails for some reason. Remove it to fix the panic and let devm_memremap_pages() fail gracefully. Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-06audit: do not panic on invalid boot parameterGreg Edwards
If you pass in an invalid audit boot parameter value, e.g. "audit=off", the kernel panics very early in boot before the regular console is initialized. Unless you have earlyprintk enabled, there is no indication of what the problem is on the console. Convert the panic() calls to pr_err(), and leave auditing enabled if an invalid parameter value was passed in. Modify the parameter to also accept "on" or "off" as valid values, and update the documentation accordingly. Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-03-06Merge tag 'v4.16-rc4' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to make care that the resouce size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Use an appropriate TSQ pacing shift in mac80211, from Toke Høiland-Jørgensen. 2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk in ip6_route_me_harder, from Eric Dumazet. 3) Fix several shutdown races and similar other problems in l2tp, from James Chapman. 4) Handle missing XDP flush properly in tuntap, for real this time. From Jason Wang. 5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel Borkmann. 6) Fix phy_resume() locking, from Andrew Lunn. 7) IFLA_MTU values are ignored on newlink for some tunnel types, fix from Xin Long. 8) Revert F-RTO middle box workarounds, they only handle one dimension of the problem. From Yuchung Cheng. 9) Fix socket refcounting in RDS, from Ka-Cheong Poon. 10) Don't allow ppp unit registration to an unregistered channel, from Guillaume Nault. 11) Various hv_netvsc fixes from Stephen Hemminger. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits) hv_netvsc: propagate rx filters to VF hv_netvsc: filter multicast/broadcast hv_netvsc: defer queue selection to VF hv_netvsc: use napi_schedule_irqoff hv_netvsc: fix race in napi poll when rescheduling hv_netvsc: cancel subchannel setup before halting device hv_netvsc: fix error unwind handling if vmbus_open fails hv_netvsc: only wake transmit queue if link is up hv_netvsc: avoid retry on send during shutdown virtio-net: re enable XDP_REDIRECT for mergeable buffer ppp: prevent unregistered channels from connecting to PPP units tc-testing: skbmod: fix match value of ethertype mlxsw: spectrum_switchdev: Check success of FDB add operation net: make skb_gso_*_seglen functions private net: xfrm: use skb_gso_validate_network_len() to check gso sizes net: sched: tbf: handle GSO_BY_FRAGS case in enqueue net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len rds: Incorrect reference counting in TCP socket creation net: ethtool: don't ignore return from driver get_fecparam method vrf: check forwarding on the original netdevice when generating ICMP dest unreachable ...
2018-03-04Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of fixes from the timer departement: - Add a missing timer wheel clock forward when migrating timers off a unplugged CPU to prevent operating on a stale clock base and missing timer deadlines. - Use the proper shift count to extract data from a register value to prevent evaluating unrelated bits - Make the error return check in the FSL timer driver work correctly. Checking an unsigned variable for less than zero does not really work well. - Clarify the confusing comments in the ARC timer code" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Forward timer base before migrating timers clocksource/drivers/arc_timer: Update some comments clocksource/drivers/mips-gic-timer: Use correct shift count to extract data clocksource/drivers/fsl_ftm_timer: Fix error return checking
2018-03-04sched/core: Undefine tracepoint creation at the end of core.cIngo Molnar
Make it easier to concatenate all the scheduler .c files for single-module compilation. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-04sched/deadline, rt: Rename queue_push_tasks/queue_pull_task to create ↵Ingo Molnar
separate namespace There are similarly named functions in both of these modules: kernel/sched/deadline.c:static inline void queue_push_tasks(struct rq *rq) kernel/sched/deadline.c:static inline void queue_pull_task(struct rq *rq) kernel/sched/deadline.c:static inline void queue_push_tasks(struct rq *rq) kernel/sched/deadline.c:static inline void queue_pull_task(struct rq *rq) kernel/sched/deadline.c: queue_push_tasks(rq); kernel/sched/deadline.c: queue_pull_task(rq); kernel/sched/deadline.c: queue_push_tasks(rq); kernel/sched/deadline.c: queue_pull_task(rq); kernel/sched/rt.c:static inline void queue_push_tasks(struct rq *rq) kernel/sched/rt.c:static inline void queue_pull_task(struct rq *rq) kernel/sched/rt.c:static inline void queue_push_tasks(struct rq *rq) kernel/sched/rt.c: queue_push_tasks(rq); kernel/sched/rt.c: queue_pull_task(rq); kernel/sched/rt.c: queue_push_tasks(rq); kernel/sched/rt.c: queue_pull_task(rq); ... which makes it harder to grep for them. Prefix them with deadline_ and rt_, respectively. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-04sched/idle: Merge kernel/sched/idle.c and kernel/sched/idle_task.cIngo Molnar
Merge these two small .c modules as they implement two aspects of idle task handling. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-04sched/headers: Simplify and clean up header usage in the schedulerIngo Molnar
Do the following cleanups and simplifications: - sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again. - order the <linux/sched/*.h> headers alphabetically - add all <linux/sched/*.h> headers to kernel/sched/sched.h - remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h. Finally, make all scheduler .c files use a single common header: #include "sched.h" ... which now contains a union of the relied upon headers. This makes the various .c files easier to read and easier to handle. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-03Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A 4.16 regression fix, three fixes for -stable, and a cleanup fix: - During the merge window support for the new ACPI NVDIMM Platform Capabilities structure disabled support for "deep flush", a force-unit- access like mechanism for persistent memory. Restore that mechanism. - VFIO like RDMA is yet one more memory registration / pinning interface that is incompatible with Filesystem-DAX. Disable long term pins of Filesystem-DAX mappings via VFIO. - The Filesystem-DAX detection to prevent long terms pins mistakenly also disabled Device-DAX pins which are not subject to the same block- map collision concerns. - Similar to the setup path, softlockup warnings can trigger in the shutdown path for large persistent memory namespaces. Teach for_each_device_pfn() to perform cond_resched() in all cases. - Boaz noticed that the might_sleep() in dax_direct_access() is stale as of the v4.15 kernel. These have received a build success notification from the 0day robot, and the longterm pin fixes have appeared in -next. However, I recently rebased the tree to remove some other fixes that need to be reworked after review feedback. * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: memremap: fix softlockup reports at teardown libnvdimm: re-enable deep flush for pmem devices via fsync() vfio: disable filesystem-dax page pinning dax: fix vma_is_fsdax() helper dax: ->direct_access does not sleep anymore
2018-03-03sched: Clean up and harmonize the coding style of the scheduler code baseIngo Molnar
A good number of small style inconsistencies have accumulated in the scheduler core, so do a pass over them to harmonize all these details: - fix speling in comments, - use curly braces for multi-line statements, - remove unnecessary parentheses from integer literals, - capitalize consistently, - remove stray newlines, - add comments where necessary, - remove invalid/unnecessary comments, - align structure definitions and other data types vertically, - add missing newlines for increased readability, - fix vertical tabulation where it's misaligned, - harmonize preprocessor conditional block labeling and vertical alignment, - remove line-breaks where they uglify the code, - add newline after local variable definitions, No change in functionality: md5: 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-03sched/deadline: Clean up various coding style detailsMario Leinweber
- Fixed style error: Missing space before the open parenthesis - Fixed style warnings: 2x Missing blank line after declaration One warning left: else after return (I don't feel comfortable fixing that without side effects) Signed-off-by: Mario Leinweber <marioleinweber@web.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20180302182007.28691-1-marioleinweber@web.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-02memremap: fix softlockup reports at teardownDan Williams
The cond_resched() currently in the setup path needs to be duplicated in the teardown path. Rather than require each instance of for_each_device_pfn() to open code the same sequence, embed it in the helper. Link: https://github.com/intel/ixpdimm_sw/issues/11 Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Fixes: 71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page()...") Signed-off-by: Dan Williams <dan.j.williams@intel.com>