summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2016-09-05futex: Add some more function commentryThomas Gleixner
Add some more comments and reformat existing ones to kernel doc style. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Link: http://lkml.kernel.org/r/1464770609-30168-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-24locking/hung_task: Show all locksVegard Nossum
When we get a hung task it can often be valuable to see _all_ the held locks on the system (in case we are being blocked on trying to acquire one), e.g. with this patch we can immediately see where the problem is below: INFO: task trinity-c3:14933 blocked for more than 120 seconds. Not tainted 4.8.0-rc1+ #135 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. trinity-c3 D ffff88010c16fc88 0 14933 1 0x00080004 ffff88010c16fc88 000000003b9aca00 0000000000000000 0000000000000296 00000000776cdf88 ffff88011a520ae0 ffff88011a520b08 ffff88011a520198 ffffffff867d7f00 ffff88011942c080 ffff880116841580 ffff88010c168000 Call Trace: [<ffffffff845e9d37>] schedule+0x77/0x230 [<ffffffff833cb8b9>] __lock_sock+0x129/0x250 [<ffffffff833cb790>] ? __sk_destruct+0x450/0x450 [<ffffffff81408ac0>] ? wake_bit_function+0x2e0/0x2e0 [<ffffffff833d832b>] lock_sock_nested+0xeb/0x120 [<ffffffff83bad815>] irda_setsockopt+0x65/0xb40 [<ffffffff833c6c09>] SyS_setsockopt+0x139/0x230 [<ffffffff833c6ad0>] ? SyS_recv+0x20/0x20 [<ffffffff81004660>] ? trace_event_raw_event_sys_enter+0xb90/0xb90 [<ffffffff823c7023>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8162ee60>] ? __context_tracking_exit.part.3+0x30/0x1b0 [<ffffffff833c6ad0>] ? SyS_recv+0x20/0x20 [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0 [<ffffffff845f84aa>] entry_SYSCALL64_slow_path+0x25/0x25 Showing all locks held in the system: 2 locks held by khungtaskd/563: #0: (rcu_read_lock){......}, at: [<ffffffff81534ce6>] watchdog+0x106/0x910 #1: (tasklist_lock){......}, at: [<ffffffff8141b3c4>] debug_show_all_locks+0x74/0x360 1 lock held by trinity-c0/19280: #0: (sk_lock-AF_IRDA){......}, at: [<ffffffff83bab7c6>] irda_accept+0x176/0x10f0 1 lock held by trinity-c0/12865: #0: (sk_lock-AF_IRDA){......}, at: [<ffffffff83bab7c6>] irda_accept+0x176/0x10f0 Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471538460-7505-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18locking/rwsem: Scan the wait_list for readers only onceDavidlohr Bueso
When wanting to wakeup readers, __rwsem_mark_wakeup() currently iterates the wait_list twice while looking to wakeup the first N queued reader-tasks. While this can be quite inefficient, it was there such that a awoken reader would be first and foremost acknowledged by the lock counter. Keeping the same logic, we can further benefit from the use of wake_qs and avoid entirely the first wait_list iteration that sets the counter as wake_up_process() isn't going to occur right away, and therefore we maintain the counter->list order of going about things. Other than saving cycles with O(n) "scanning", this change also nicely cleans up a good chunk of __rwsem_mark_wakeup(); both visually and less tedious to read. For example, the following improvements where seen on some will it scale microbenchmarks, on a 48-core Haswell: v4.7 v4.7-rwsem-v1 Hmean signal1-processes-8 5792691.42 ( 0.00%) 5771971.04 ( -0.36%) Hmean signal1-processes-12 6081199.96 ( 0.00%) 6072174.38 ( -0.15%) Hmean signal1-processes-21 3071137.71 ( 0.00%) 3041336.72 ( -0.97%) Hmean signal1-processes-48 3712039.98 ( 0.00%) 3708113.59 ( -0.11%) Hmean signal1-processes-79 4464573.45 ( 0.00%) 4682798.66 ( 4.89%) Hmean signal1-processes-110 4486842.01 ( 0.00%) 4633781.71 ( 3.27%) Hmean signal1-processes-141 4611816.83 ( 0.00%) 4692725.38 ( 1.75%) Hmean signal1-processes-172 4638157.05 ( 0.00%) 4714387.86 ( 1.64%) Hmean signal1-processes-203 4465077.80 ( 0.00%) 4690348.07 ( 5.05%) Hmean signal1-processes-224 4410433.74 ( 0.00%) 4687534.43 ( 6.28%) Stddev signal1-processes-8 6360.47 ( 0.00%) 8455.31 ( 32.94%) Stddev signal1-processes-12 4004.98 ( 0.00%) 9156.13 (128.62%) Stddev signal1-processes-21 3273.14 ( 0.00%) 5016.80 ( 53.27%) Stddev signal1-processes-48 28420.25 ( 0.00%) 26576.22 ( -6.49%) Stddev signal1-processes-79 22038.34 ( 0.00%) 18992.70 (-13.82%) Stddev signal1-processes-110 23226.93 ( 0.00%) 17245.79 (-25.75%) Stddev signal1-processes-141 6358.98 ( 0.00%) 7636.14 ( 20.08%) Stddev signal1-processes-172 9523.70 ( 0.00%) 4824.75 (-49.34%) Stddev signal1-processes-203 13915.33 ( 0.00%) 9326.33 (-32.98%) Stddev signal1-processes-224 15573.94 ( 0.00%) 10613.82 (-31.85%) Other runs that saw improvements include context_switch and pipe; and as expected, this is particularly highlighted on larger thread counts as it becomes more expensive to walk the list twice. No change in wakeup ordering or semantics. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman.Long@hp.com Cc: dave@stgolabs.net Cc: jason.low2@hpe.com Cc: wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1470384285-32163-4-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18locking/rwsem: Remove a few useless commentsDavidlohr Bueso
Our rwsem code (xadd, at least) is rather well documented, but there are a few really annoying comments in there that serve no purpose and we shouldn't bother with them. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman.Long@hp.com Cc: dave@stgolabs.net Cc: jason.low2@hpe.com Cc: wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1470384285-32163-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18locking/rwsem: Return void in __rwsem_mark_wake()Davidlohr Bueso
We currently return a rw_semaphore structure, which is the same lock we passed to the function's argument in the first place. While there are several functions that choose this return value, the callers use it, for example, for things like ERR_PTR. This is not the case for __rwsem_mark_wake(), and in addition this function is really about the lock waiters (which we know there are at this point), so its somewhat odd to be returning the sem structure. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman.Long@hp.com Cc: dave@stgolabs.net Cc: jason.low2@hpe.com Cc: wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1470384285-32163-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()Peter Zijlstra
The current percpu-rwsem read side is entirely free of serializing insns at the cost of having a synchronize_sched() in the write path. The latency of the synchronize_sched() is too high for cgroups. The commit 1ed1328792ff talks about the write path being a fairly cold path but this is not the case for Android which moves task to the foreground cgroup and back around binder IPC calls from foreground processes to background processes, so it is significantly hotter than human initiated operations. Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the problem, hopefully it should not be that slow after another commit: 80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global impact"). We could just add rcu_sync_enter() into cgroup_init() but we do not want another synchronize_sched() at boot time, so this patch adds the new helper which doesn't block but currently can only be called before the first use. Reported-by: John Stultz <john.stultz@linaro.org> Reported-by: Dmitry Shmidt <dimitrysh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Colin Cross <ccross@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rom Lemarchand <romlem@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Link: http://lkml.kernel.org/r/20160811165413.GA22807@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10locking/percpu-rwsem: Optimize readers and reduce global impactPeter Zijlstra
Currently the percpu-rwsem switches to (global) atomic ops while a writer is waiting; which could be quite a while and slows down releasing the readers. This patch cures this problem by ordering the reader-state vs reader-count (see the comments in __percpu_down_read() and percpu_down_write()). This changes a global atomic op into a full memory barrier, which doesn't have the global cacheline contention. This also enables using the percpu-rwsem with rcu_sync disabled in order to bias the implementation differently, reducing the writer latency by adding some cost to readers. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org [ Fixed modular build. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10locking/pvstat: Separate wait_again and spurious wakeup statsWaiman Long
Currently there are overlap in the pvqspinlock wait_again and spurious_wakeup stat counters. Because of lock stealing, it is no longer possible to accurately determine if spurious wakeup has happened in the queue head. As they track both the queue node and queue head status, it is also hard to tell how many of those comes from the queue head and how many from the queue node. This patch changes the accounting rules so that spurious wakeup is only tracked in the queue node. The wait_again count, however, is only tracked in the queue head when the vCPU failed to acquire the lock after a vCPU kick. This should give a much better indication of the wait-kick dynamics in the queue node and the queue head. Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1464713631-1066-2-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10locking/qspinlock: Improve readabilityPeter Zijlstra
Restructure pv_queued_spin_steal_lock() as I found it hard to read. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10locking/pvqspinlock: Fix a bug in qstat_read()Pan Xinhui
It's obviously wrong to set stat to NULL. So lets remove it. Otherwise it is always zero when we check the latency of kick/wake. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <Waiman.Long@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1468405414-3700-1-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10locking/pvqspinlock: Fix double hash raceWanpeng Li
When the lock holder vCPU is racing with the queue head: CPU 0 (lock holder) CPU1 (queue head) =================== ================= spin_lock(); spin_lock(); pv_kick_node(): pv_wait_head_or_lock(): if (!lp) { lp = pv_hash(lock, pn); xchg(&l->locked, _Q_SLOW_VAL); } WRITE_ONCE(pn->state, vcpu_halted); cmpxchg(&pn->state, vcpu_halted, vcpu_hashed); WRITE_ONCE(l->locked, _Q_SLOW_VAL); (void)pv_hash(lock, pn); In this case, lock holder inserts the pv_node of queue head into the hash table and set _Q_SLOW_VAL unnecessary. This patch avoids it by restoring/setting vcpu_hashed state after failing adaptive locking spinning. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hpe.com> Link: http://lkml.kernel.org/r/1468484156-4521-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10Merge branch 'linus' into locking/urgent, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-09Revert "printk: create pr_<level> functions"Linus Torvalds
This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142. Geert Uytterhoeven reports: "This change seems to have an (unintendent?) side-effect. Before, pr_*() calls without a trailing newline characters would be printed with a newline character appended, both on the console and in the output of the dmesg command. After this commit, no new line character is appended, and the output of the next pr_*() call of the same type may be appended, like in: - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000 - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM) + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)" Joe Perches says: "No, that is not intentional. The newline handling code inside vprintk_emit is a bit involved and for now I suggest a revert until this has all the same behavior as earlier" Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Requested-by: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-08printk: Remove unnecessary #ifdef CONFIG_PRINTKAndreas Ziegler
In commit 874f9c7da9a4 ("printk: create pr_<level> functions"), new pr_level defines were added to printk.c. These new defines are guarded by an #ifdef CONFIG_PRINTK - however, there is already a surrounding #ifdef CONFIG_PRINTK starting a lot earlier in line 249 which means the newly introduced #ifdef is unnecessary. Let's remove it to avoid confusion. Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Cc: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-06Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Mostly tooling fixes and some late tooling updates, plus two perf related printk message fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tests bpf: Use SyS_epoll_wait alias perf tests: objdump output can contain multi byte chunks perf record: Add --sample-cpu option perf hists: Introduce output_resort_cb method perf tools: Move config/Makefile into Makefile.config perf tests: Add test for bitmap_scnprintf function tools lib: Add bitmap_and function tools lib: Add bitmap_scnprintf function tools lib: Add bitmap_alloc function tools lib traceevent: Ignore generated library files perf tools: Fix build failure on perl script context perf/core: Change log level for duration warning to KERN_INFO perf annotate: Plug filename string leak perf annotate: Introduce strerror for handling symbol__disassemble() errors perf annotate: Rename symbol__annotate() to symbol__disassemble() perf/x86: Modify error message in virtualized environment perf target: str_error_r() always returns the buffer it receives perf annotate: Use pipe + fork instead of popen perf evsel: Introduce constructor for cycles event
2016-08-05Merge tag 'powerpc-4.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "These were delayed for various reasons, so I let them sit in next a bit longer, rather than including them in my first pull request. Fixes: - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan - Move register_process_table() out of ppc_md from Michael Ellerman Use jump_label use for [cpu|mmu]_has_feature(): - Add mmu_early_init_devtree() from Michael Ellerman - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman - Do hash device tree scanning earlier from Michael Ellerman - Do radix device tree scanning earlier from Michael Ellerman - Do feature patching before MMU init from Michael Ellerman - Check features don't change after patching from Michael Ellerman - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V - Convert mmu_has_feature() to returning bool from Michael Ellerman - Convert cpu_has_feature() to returning bool from Michael Ellerman - Define radix_enabled() in one place & use static inline from Michael Ellerman - Add early_[cpu|mmu]_has_feature() from Michael Ellerman - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V - Remove mfvtb() from Kevin Hao - Move cpu_has_feature() to a separate file from Kevin Hao - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman - Add option to use jump label for cpu_has_feature() from Kevin Hao - Add option to use jump label for mmu_has_feature() from Kevin Hao - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V - Annotate jump label assembly from Michael Ellerman TLB flush enhancements from Aneesh Kumar K.V: - radix: Implement tlb mmu gather flush efficiently - Add helper for finding SLBE LLP encoding - Use hugetlb flush functions - Drop multiple definition of mm_is_core_local - radix: Add tlb flush of THP ptes - radix: Rename function and drop unused arg - radix/hugetlb: Add helper for finding page size - hugetlb: Add flush_hugetlb_tlb_range - remove flush_tlb_page_nohash Add new ptrace regsets from Anshuman Khandual and Simon Guo: - elf: Add powerpc specific core note sections - Add the function flush_tmregs_to_thread - Enable in transaction NT_PRFPREG ptrace requests - Enable in transaction NT_PPC_VMX ptrace requests - Enable in transaction NT_PPC_VSX ptrace requests - Adapt gpr32_get, gpr32_set functions for transaction - Enable support for NT_PPC_CGPR - Enable support for NT_PPC_CFPR - Enable support for NT_PPC_CVMX - Enable support for NT_PPC_CVSX - Enable support for TM SPR state - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR - Enable support for EBB registers - Enable support for Performance Monitor registers" * tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits) powerpc/mm: Move register_process_table() out of ppc_md powerpc/perf: Fix incorrect event codes in power9-event-list powerpc/32: Fix early access to cpu_spec relocation powerpc/ptrace: Enable support for Performance Monitor registers powerpc/ptrace: Enable support for EBB registers powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR powerpc/ptrace: Enable support for TM SPR state powerpc/ptrace: Enable support for NT_PPC_CVSX powerpc/ptrace: Enable support for NT_PPC_CVMX powerpc/ptrace: Enable support for NT_PPC_CFPR powerpc/ptrace: Enable support for NT_PPC_CGPR powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests powerpc/process: Add the function flush_tmregs_to_thread elf: Add powerpc specific core note sections powerpc/mm: remove flush_tlb_page_nohash powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range ...
2016-08-04Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "The only interesting thing here is Jessica's patch to add ro_after_init support to modules. The rest are all trivia" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: extable.h: add stddef.h so "NULL" definition is not implicit modules: add ro_after_init support jump_label: disable preemption around __module_text_address(). exceptions: fork exception table content from module.h into extable.h modules: Add kernel parameter to blacklist modules module: Do a WARN_ON_ONCE() for assert module mutex not held Documentation/module-signing.txt: Note need for version info if reusing a key module: Invalidate signatures on force-loaded modules module: Issue warnings when tainting kernel module: fix redundant test. module: fix noreturn attribute for __module_put_and_exit()
2016-08-04jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABELJason Baron
The current jump_label.h includes bug.h for things such as WARN_ON(). This makes the header problematic for inclusion by kernel.h or any headers that kernel.h includes, since bug.h includes kernel.h (circular dependency). The inclusion of atomic.h is similarly problematic. Thus, this should make jump_label.h 'includable' from most places. Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04tree-wide: replace config_enabled() with IS_ENABLED()Masahiro Yamada
The use of config_enabled() against config options is ambiguous. In practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the author might have used it for the meaning of IS_ENABLED(). Using IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention clearer. This commit replaces config_enabled() with IS_ENABLED() where possible. This commit is only touching bool config options. I noticed two cases where config_enabled() is used against a tristate option: - config_enabled(CONFIG_HWMON) [ drivers/net/wireless/ath/ath10k/thermal.c ] - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE) [ drivers/gpu/drm/gma500/opregion.c ] I did not touch them because they should be converted to IS_BUILTIN() in order to keep the logic, but I was not sure it was the authors' intention. Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Stas Sergeev <stsp@list.ru> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Joshua Kinard <kumba@gentoo.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Will Drewry <wad@chromium.org> Cc: Nikolay Martynov <mar.kolya@gmail.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: Rafal Milecki <zajec5@gmail.com> Cc: James Cowgill <James.Cowgill@imgtec.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alex Smith <alex.smith@imgtec.com> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Qais Yousef <qais.yousef@imgtec.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Tony Wu <tung7970@gmail.com> Cc: Huaitong Han <huaitong.han@intel.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Gelmini <andrea.gelmini@gelma.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Rabin Vincent <rabin@rab.in> Cc: "Maciej W. Rozycki" <macro@imgtec.com> Cc: David Daney <david.daney@cavium.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04modules: add ro_after_init supportJessica Yu
Add ro_after_init support for modules by adding a new page-aligned section in the module layout (after rodata) for ro_after_init data and enabling RO protection for that section after module init runs. Signed-off-by: Jessica Yu <jeyu@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2016-08-04jump_label: disable preemption around __module_text_address().Rusty Russell
Steven reported a warning caused by not holding module_mutex or rcu_read_lock_sched: his backtrace was corrupted but a quick audit found this possible cause. It's wrong anyway... Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2016-08-04modules: Add kernel parameter to blacklist modulesPrarit Bhargava
Blacklisting a module in linux has long been a problem. The current procedure is to use rd.blacklist=module_name, however, that doesn't cover the case after the initramfs and before a boot prompt (where one is supposed to use /etc/modprobe.d/blacklist.conf to blacklist runtime loading). Using rd.shell to get an early prompt is hit-or-miss, and doesn't cover all situations AFAICT. This patch adds this functionality of permanently blacklisting a module by its name via the kernel parameter module_blacklist=module_name. [v2]: Rusty, use core_param() instead of __setup() which simplifies things. [v3]: Rusty, undo wreckage from strsep() [v4]: Rusty, simpler version of blacklisted() Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-doc@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2016-08-04module: Do a WARN_ON_ONCE() for assert module mutex not heldSteven Rostedt
When running with lockdep enabled, I triggered the WARN_ON() in the module code that asserts when module_mutex or rcu_read_lock_sched are not held. The issue I have is that this can also be called from the dump_stack() code, causing us to enter an infinite loop... ------------[ cut here ]------------ WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 ffff880215e8fa70 ffff880215e8fa70 ffffffff812fc8e3 0000000000000000 ffffffff81d3e55b ffff880215e8fac0 ffffffff8104fc88 ffffffff8104fcab 0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001 Call Trace: [<ffffffff812fc8e3>] dump_stack+0x67/0x90 [<ffffffff8104fc88>] __warn+0xcb/0xe9 [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f ------------[ cut here ]------------ WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 ffff880215e8f7a0 ffff880215e8f7a0 ffffffff812fc8e3 0000000000000000 ffffffff81d3e55b ffff880215e8f7f0 ffffffff8104fc88 ffffffff8104fcab 0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001 Call Trace: [<ffffffff812fc8e3>] dump_stack+0x67/0x90 [<ffffffff8104fc88>] __warn+0xcb/0xe9 [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f ------------[ cut here ]------------ WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 ffff880215e8f4d0 ffff880215e8f4d0 ffffffff812fc8e3 0000000000000000 ffffffff81d3e55b ffff880215e8f520 ffffffff8104fc88 ffffffff8104fcab 0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001 Call Trace: [<ffffffff812fc8e3>] dump_stack+0x67/0x90 [<ffffffff8104fc88>] __warn+0xcb/0xe9 [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f ------------[ cut here ]------------ WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e [...] Which gives us rather useless information. Worse yet, there's some race that causes this, and I seldom trigger it, so I have no idea what happened. This would not be an issue if that warning was a WARN_ON_ONCE(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2016-08-03Merge tag 'trace-v4.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few updates and fixes: - move the suppressing of the __builtin_return_address >0 warning to the tracing directory only. - metag recordmcount fix for newer glibc's - two tracing histogram fixes that were reported by KASAN" * tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix use-after-free in hist_register_trigger() tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all Makefile: Mute warning for __builtin_return_address(>0) for tracing only ftrace/recordmcount: Work around for addition of metag magic but not relocations
2016-08-02config: add android config fragmentsRob Herring
Copy the config fragments from the AOSP common kernel android-4.4 branch. It is becoming possible to run mainline kernels with Android, but the kernel defconfigs don't work as-is and debugging missing config options is a pain. Adding the config fragments into the kernel tree, makes configuring a mainline kernel as simple as: make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config The following non-upstream config options were removed: CONFIG_NETFILTER_XT_MATCH_QTAGUID CONFIG_NETFILTER_XT_MATCH_QUOTA2 CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG CONFIG_PPPOLAC CONFIG_PPPOPNS CONFIG_SECURITY_PERF_EVENTS_RESTRICT CONFIG_USB_CONFIGFS_F_MTP CONFIG_USB_CONFIGFS_F_PTP CONFIG_USB_CONFIGFS_F_ACC CONFIG_USB_CONFIGFS_F_AUDIO_SRC CONFIG_USB_CONFIGFS_UEVENT CONFIG_INPUT_KEYCHORD CONFIG_INPUT_KEYRESET Link: http://lkml.kernel.org/r/1466708235-28593-1-git-send-email-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02relay: add global mode support for buffer-only channelsAkash Goel
Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early logging") added support to use channels with no associated files. This is useful when the exact location of relay file is not known or the the parent directory of relay file is not available, while creating the channel and the logging has to start right from the boot. But there was no provision to use global mode with buffer-only channels, which is added by this patch, without modifying the interface where initially there will be a dummy invocation of create_buf_file callback through which kernel client can convey the need of a global buffer. For the use case where drivers/kernel clients want a simple interface for the userspace, which enables them to capture data/logs from relay file inorder & without any post processing, support of Global buffer mode is warranted. Modules, like i915, using relay_open() in early init would have to later register their buffer-only relays, once debugfs is available, by calling relay_late_setup_files(). Hence relay_late_setup_files() symbol also needs to be exported. Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com Signed-off-by: Akash Goel <akash.goel@intel.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kexec: add restriction on kexec_load() segment sizeszhong jiang
I hit the following issue when run trinity in my system. The kernel is 3.4 version, but mainline has the same issue. The root cause is that the segment size is too large so the kerenl spends too long trying to allocate a page. Other cases will block until the test case quits. Also, OOM conditions will occur. Call Trace: __alloc_pages_nodemask+0x14c/0x8f0 alloc_pages_current+0xaf/0x120 kimage_alloc_pages+0x10/0x60 kimage_alloc_control_pages+0x5d/0x270 machine_kexec_prepare+0xe5/0x6c0 ? kimage_free_page_list+0x52/0x70 sys_kexec_load+0x141/0x600 ? vfs_write+0x100/0x180 system_call_fastpath+0x16/0x1b The patch changes sanity_check_segment_list() to verify that the usage by all segments does not exceed half of memory. [akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment] Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kexec: add a kexec_crash_loaded() functionPetr Tesarik
Provide a wrapper function to be used by kernel code to check whether a crash kernel is loaded. It returns the same value that can be seen in /sys/kernel/kexec_crash_loaded by userspace programs. I'm exporting the function, because it will be used by Xen, and it is possible to compile Xen modules separately to enable the use of PV drivers with unmodified bare-metal kernels. Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kexec: use core_param for crash_kexec_post_notifiers boot optionHidehiro Kawai
crash_kexec_post_notifiers ia a boot option which controls whether the 1st kernel calls panic notifiers or not before booting the 2nd kernel. However, there is no need to limit it to being modifiable only at boot time. So, use core_param instead of early_param. Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kexec: allow architectures to override boot mappingRussell King
kexec physical addresses are the boot-time view of the system. For certain ARM systems (such as Keystone 2), the boot view of the system does not match the kernel's view of the system: the boot view uses a special alias in the lower 4GB of the physical address space. To cater for these kinds of setups, we need to translate between the boot view physical addresses and the normal kernel view physical addresses. This patch extracts the current transation points into linux/kexec.h, and allows an architecture to override the functions. Due to the translations required, we unfortunately end up with six translation functions, which are reduced down to four that the architecture can override. [akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()] Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_tRussell King
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB physical on 32-bit architectures, so we need a wider type than "unsigned long" here. Arrange for paddr_vmcoreinfo_note() to return a phys_addr_t, thereby allowing it to be located above 4GB. This makes no difference for kexec-tools, as they already assume a 64-bit type when reading from this file. Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kexec: ensure user memory sizes do not wrapRussell King
Ensure that user memory sizes do not wrap around when validating the user input, which can lead to the following input validation working incorrectly. [akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch] Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kexec: return error number directlyMinfei Huang
This is a cleanup patch to make kexec more clear to return error number directly. The variable result is useless, because there is no other function's return value assignes to it. So remove it. Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com Signed-off-by: Minfei Huang <mnghuan@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kernel/exit.c: quieten greatest stack depth printkAnton Blanchard
Many targets enable CONFIG_DEBUG_STACK_USAGE, and while the information is useful, it isn't worthy of pr_warn(). Reduce it to pr_info(). Link: http://lkml.kernel.org/r/1466982072-29836-1-git-send-email-anton@ozlabs.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: add kernel parameter to control writes to /dev/kmsgBorislav Petkov
Add a "printk.devkmsg" kernel command line parameter which controls how userspace writes into /dev/kmsg. It has three options: * ratelimit - ratelimit logging from userspace. * on - unlimited logging from userspace * off - logging from userspace gets ignored The default setting is to ratelimit the messages written to it. This changes the kernel default setting of "on" to "ratelimit" and we do that because we want to keep userspace spamming /dev/kmsg to sane levels. This is especially moot when a small kernel log buffer wraps around and messages get lost. So the ratelimiting setting should be a sane setting where kernel messages should have a bit higher chance of survival from all the spamming. It additionally does not limit logging to /dev/kmsg while the system is booting if we haven't disabled it on the command line. Furthermore, we can control the logging from a lower priority sysctl interface - kernel.printk_devkmsg. That interface will succeed only if printk.devkmsg *hasn't* been supplied on the command line. If it has, then printk.devkmsg is a one-time setting which remains for the duration of the system lifetime. This "locking" of the setting is to prevent userspace from changing the logging on us through sysctl(2). This patch is based on previous patches from Linus and Steven. [bp@suse.de: fixes] Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dave Young <dyoung@redhat.com> Cc: Franck Bui <fbui@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: include <asm/sections.h> instead of <asm-generic/sections.h>Christoph Hellwig
asm-generic headers are generic implementations for architecture specific code and should not be included by common code. Thus use the asm/ version of sections.h to get at the linker sections. Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: introduce suppress_message_printing()Sergey Senozhatsky
Messages' levels and console log level are inspected when the actual printing occurs, which may provoke console_unlock() and console_cont_flush() to waste CPU cycles on every message that has loglevel above the current console_loglevel. Schematically, console_unlock() does the following: console_unlock() { ... for (;;) { ... raw_spin_lock_irqsave(&logbuf_lock, flags); skip: msg = log_from_idx(console_idx); if (msg->flags & LOG_NOCONS) { ... goto skip; } level = msg->level; len += msg_print_text(); >> sprintfs memcpy, etc. if (nr_ext_console_drivers) { ext_len = msg_print_ext_header(); >> scnprintf ext_len += msg_print_ext_body(); >> scnprintfs etc. } ... raw_spin_unlock(&logbuf_lock); call_console_drivers(level, ext_text, ext_len, text, len) { if (level >= console_loglevel && >> drop the message !ignore_loglevel) return; console->write(...); } local_irq_restore(flags); } ... } The thing here is this deferred `level >= console_loglevel' check. We are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages that we will eventually drop. This can be huge when we register a new CON_PRINTBUFFER console, for instance. For every such a console register_console() resets the console_seq, console_idx, console_prev and sets a `exclusive console' pointer to replay the log buffer to that just-registered console. And there can be a lot of messages to replay, in the worst case most of which can be dropped after console_loglevel test. We know messages' levels long before we call msg_print_text() and friends, so we can just move console_loglevel check out of call_console_drivers() and format a new message only if we are sure that it won't be dropped. The patch factors out loglevel check into suppress_message_printing() function and tests message->level and console_loglevel before formatting functions in console_unlock() and console_cont_flush() are getting executed. This improves things not only for exclusive CON_PRINTBUFFER consoles, but for every console_unlock() that attempts to print a message of level above the console_loglevel. Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Calvin Owens <calvinowens@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: create pr_<level> functionsJoe Perches
Using functions instead of macros can reduce overall code size by eliminating unnecessary "KERN_SOH<digit>" prefixes from format strings. defconfig x86-64: $ size vmlinux* text data bss dec hex filename 10193570 4331464 1105920 15630954 ee826a vmlinux.new 10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old As the return value are unimportant and unused in the kernel tree, these new functions return void. Miscellanea: - change pr_<level> macros to call new __pr_<level> functions - change vprintk_nmi and vprintk_default to add LOGLEVEL_<level> argument [akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe] Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02printk: do not include interrupt.hSergey Senozhatsky
A trivial cosmetic change: interrupt.h header is redundant since commit 6b898c07cb1d ("console: use might_sleep in console_lock"). Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02dynamic_debug: only add header when usedLuis de Bethencourt
kernel.h header doesn't directly use dynamic debug, instead we can include it in module.c (which used it via kernel.h). printk.h only uses it if CONFIG_DYNAMIC_DEBUG is on, changing the inclusion to only happen in that case. Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com [luisbg@osg.samsung.com: include dynamic_debug.h in drb_int.h] Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_worksOleg Nesterov
Change task_work_cancel() to use lockless_dereference(), this is what the code really wants but we didn't have this helper when it was written. Also add the fast-path task->task_works == NULL check, in the likely case this task has no pending works and we can avoid spin_lock(task->pi_lock). While at it, change other users of ACCESS_ONCE() to use READ_ONCE(). Link: http://lkml.kernel.org/r/20160610150042.GA13868@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02tracing: Fix use-after-free in hist_register_trigger()Tom Zanussi
This fixes a use-after-free case flagged by KASAN; make sure the test happens before the potential free in this case. Link: http://lkml.kernel.org/r/48fd74ab61bebd7dca9714386bb47d7c5ccd6a7b.1467247517.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_allSteven Rostedt
While running tools/testing/selftests test suite with KASAN, Dmitry Vyukov hit the following use-after-free report: ================================================================== BUG: KASAN: use-after-free in hist_unreg_all+0x1a1/0x1d0 at addr ffff880031632cc0 Read of size 8 by task ftracetest/7413 ================================================================== BUG kmalloc-128 (Not tainted): kasan: bad access detected ------------------------------------------------------------------ This fixes the problem, along with the same problem in hist_enable_unreg_all(). Link: http://lkml.kernel.org/r/c3d05b79e42555b6e36a3a99aae0e37315ee5304.1467247517.git.tom.zanussi@linux.intel.com Cc: Dmitry Vyukov <dvyukov@google.com> [Copied Steve's hist_enable_unreg_all() fix to hist_unreg_all()] Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02Makefile: Mute warning for __builtin_return_address(>0) for tracing onlySteven Rostedt
With the latest gcc compilers, they give a warning if __builtin_return_address() parameter is greater than 0. That is because if it is used by a function called by a top level function (or in the case of the kernel, by assembly), it can try to access stack frames outside the stack and crash the system. The tracing system uses __builtin_return_address() of up to 2! But it is well aware of the dangers that it may have, and has even added precautions to protect against it (see the thunk code in arch/x86/entry/thunk*.S) Linus originally added KBUILD_CFLAGS that would suppress the warning for the entire kernel, as simply adding KBUILD_CFLAGS to the tracing directory wouldn't work. The tracing directory plays a bit with the CFLAGS and requires a little more logic. This adds that special logic to only suppress the warning for the tracing directory. If it is used anywhere else outside of tracing, the warning will still be triggered. Link: http://lkml.kernel.org/r/20160728223043.51996267@grimm.local.home Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-08-02perf/core: Change log level for duration warning to KERN_INFODavid Ahern
When the perf interrupt handler exceeds a threshold warning messages are displayed on console: [12739.31793] perf interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 50000 [71340.165065] perf interrupt took too long (5005 > 5000), lowering kernel.perf_event_max_sample_rate to 25000 Many customers and users are confused by the message wondering if something is wrong or they need to take action to fix a problem. Since a user can not do anything to fix the issue, the message is really more informational than a warning. Adjust the log level accordingly. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1470084569-438-1-git-send-email-dsa@cumulusnetworks.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01jump_label: Make it possible for arches to invoke jump_label_init() earlierKevin Hao
Some arches (powerpc at least) would like to invoke jump_label_init() much earlier in boot. So check static_key_initialized in order to make sure this function runs only once. LGTM-by: Ingo (http://marc.info/?l=linux-kernel&m=144049104329961&w=2) Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-30Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc fixes from Thomas Gleixner: "This update contains: - a fix for stomp-machine so the nmi_watchdog wont trigger on the cpu waiting for the others to execute the callback - various fixes and updates to objtool including an resync of the instruction decoder to match the kernel's decoder" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Un-capitalize "Warning" for out-of-sync instruction decoder objtool: Resync x86 instruction decoder with the kernel's objtool: Support new GCC 6 switch jump table pattern stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE objtool: Add 'fixdep' to objtool/.gitignore
2016-07-29Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit updates from Paul Moore: "Six audit patches for 4.8. There are a couple of style and minor whitespace tweaks for the logs, as well as a minor fixup to catch errors on user filter rules, however the major improvements are a fix to the s390 syscall argument masking code (reviewed by the nice s390 folks), some consolidation around the exclude filtering (less code, always a win), and a double-fetch fix for recording the execve arguments" * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit: audit: fix a double fetch in audit_log_single_execve_arg() audit: fix whitespace in CWD record audit: add fields to exclude filter by reusing user filter s390: ensure that syscall arguments are properly masked on s390 audit: fix some horrible switch statement style crimes audit: fixup: log on errors from filter user rules
2016-07-29Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - TPM core and driver updates/fixes - IPv6 security labeling (CALIPSO) - Lots of Apparmor fixes - Seccomp: remove 2-phase API, close hole where ptrace can change syscall #" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits) apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family) tpm: Factor out common startup code tpm: use devm_add_action_or_reset tpm2_i2c_nuvoton: add irq validity check tpm: read burstcount from TPM_STS in one 32-bit transaction tpm: fix byte-order for the value read by tpm2_get_tpm_pt tpm_tis_core: convert max timeouts from msec to jiffies apparmor: fix arg_size computation for when setprocattr is null terminated apparmor: fix oops, validate buffer size in apparmor_setprocattr() apparmor: do not expose kernel stack apparmor: fix module parameters can be changed after policy is locked apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: add missing id bounds check on dfa verification apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task apparmor: use list_next_entry instead of list_entry_next apparmor: fix refcount race when finding a child profile apparmor: fix ref count leak when profile sha1 hash is read apparmor: check that xindex is in trans_table bounds ...