summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-01-14locking/atomic, kref: Add kref_read()Peter Zijlstra
Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/atomic, kref: Add KREF_INIT()Peter Zijlstra
Since we need to change the implementation, stop exposing internals. Provide KREF_INIT() to allow static initialization of struct kref. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/ww_mutex: Fix compilation of __WW_MUTEX_INITIALIZERChris Wilson
From conflicting macro parameters, passing the wrong name to __MUTEX_INITIALIZER and a stray '\', #define __WW_MUTEX_INITIALIZER was very unhappy. One unnecessary change was to choose to pass &ww_class instead of implicitly taking the address of the class within the macro. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 1b375dc30710 ("mutex: Move ww_mutex definitions to ww_mutex.h") Link: http://lkml.kernel.org/r/20161201114711.28697-2-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14math64, timers: Fix 32bit mul_u64_u32_shr() and friendsPeter Zijlstra
It turns out that while GCC-4.4 manages to generate 32x32->64 mult instructions for the 32bit mul_u64_u32_shr() code, any GCC after that fails horribly. Fix this by providing an explicit mul_u32_u32() function which can be architcture provided. Reported-by: Chris Metcalf <cmetcalf@mellanox.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Cc: Christopher S. Hall <christopher.s.hall@intel.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: John Stultz <john.stultz@linaro.org> Cc: Laurent Vivier <lvivier@redhat.com> Cc: Liav Rehana <liavr@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Parit Bhargava <prarit@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161209083011.GD15765@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/mutex, sched/wait: Add mutex_lock_io()Tejun Heo
We sometimes end up propagating IO blocking through mutexes; however, because there currently is no way of annotating mutex sleeps as iowait, there are cases where iowait and /proc/stat:procs_blocked report misleading numbers obscuring the actual state of the system. This patch adds mutex_lock_io() so that mutex sleeps can be marked as iowait in those cases. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adilger.kernel@dilger.ca Cc: jack@suse.com Cc: kernel-team@fb.com Cc: mingbo@fb.com Cc: tytso@mit.edu Link: http://lkml.kernel.org/r/1477673892-28940-4-git-send-email-tj@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/core: Separate out io_schedule_prepare() and io_schedule_finish()Tejun Heo
Now that IO schedule accounting is done inside __schedule(), io_schedule() can be split into three steps - prep, schedule, and finish - where the schedule part doesn't need any special annotation. This allows marking a sleep as iowait by simply wrapping an existing blocking function with io_schedule_prepare() and io_schedule_finish(). Because task_struct->in_iowait is single bit, the caller of io_schedule_prepare() needs to record and the pass its state to io_schedule_finish() to be safe regarding nesting. While this isn't the prettiest, these functions are mostly gonna be used by core functions and we don't want to use more space for ->in_iowait. While at it, as it's simple to do now, reimplement io_schedule() without unnecessarily going through io_schedule_timeout(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adilger.kernel@dilger.ca Cc: jack@suse.com Cc: kernel-team@fb.com Cc: mingbo@fb.com Cc: tytso@mit.edu Link: http://lkml.kernel.org/r/1477673892-28940-3-git-send-email-tj@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/clock: Delay switching sched_clock to stablePeter Zijlstra
Currently we switch to the stable sched_clock if we guess the TSC is usable, and then switch back to the unstable path if it turns out TSC isn't stable during SMP bringup after all. Delay switching to the stable path until after SMP bringup is complete. This way we'll avoid switching during the time we detect the worst of the TSC offences. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/clock, clocksource: Add optional cs::mark_unstable() methodThomas Gleixner
PeterZ reported that we'd fail to mark the TSC unstable when the clocksource watchdog finds it unsuitable. Allow a clocksource to run a custom action when its being marked unstable and hook up the TSC unstable code. Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debuggingNicolai Hähnle
Help catch cases where mutex_lock is used directly on w/w mutexes, which otherwise result in the w/w tasks reading uninitialized data. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-12-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/ww_mutex: Add waiters in stamp orderNicolai Hähnle
Add regular waiters in stamp order. Keep adding waiters that have no context in FIFO order and take care not to starve them. While adding our task as a waiter, back off if we detect that there is a waiter with a lower stamp in front of us. Make sure to call lock_contended even when we back off early. For w/w mutexes, being first in the wait list is only stable when taking the lock without a context. Therefore, the purpose of the first flag is split into two: 'first' remains to indicate whether we want to spin optimistically, while 'handoff' indicates that we should be prepared to accept a handoff. For w/w locking with a context, we always accept handoffs after the first schedule(), to handle the following sequence of events: 1. Task #0 unlocks and hands off to Task #2 which is first in line 2. Task #1 adds itself in front of Task #2 3. Task #2 wakes up and must accept the handoff even though it is no longer first in line Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <Nicolai.Haehnle@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-7-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/ww_mutex: Remove the __ww_mutex_lock*() inline wrappersNicolai Hähnle
Keep the documentation in the header file since there is no good place for it in mutex.c: there are two rather different implementations with different EXPORT_SYMBOLs for each function. Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <Nicolai.Haehnle@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-6-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/ww_mutex: Set use_ww_ctx even when locking without a contextNicolai Hähnle
We will add a new field to struct mutex_waiter. This field must be initialized for all waiters if any waiter uses the ww_use_ctx path. So there is a trade-off: Keep ww_mutex locking without a context on the faster non-use_ww_ctx path, at the cost of adding the initialization to all mutex locks (including non-ww_mutexes), or avoid the additional cost for non-ww_mutex locks, at the cost of adding additional checks to the use_ww_ctx path. We take the latter choice. It may be worth eliminating the users of ww_mutex_lock(lock, NULL), but there are a lot of them. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-5-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/mutex: Fix mutex handoffPeter Zijlstra
While reviewing the ww_mutex patches, I noticed that it was still possible to (incorrectly) succeed for (incorrect) code like: mutex_lock(&a); mutex_lock(&a); This was possible if the second mutex_lock() would block (as expected) but then receive a spurious wakeup. At that point it would find itself at the front of the queue, request a handoff and instantly claim ownership and continue, since owner would point to itself. Avoid this scenario and simplify the code by introducing a third low bit to signal handoff pickup. So once we request handoff, unlock clears the handoff bit and sets the pickup bit along with the new owner. This also removes the need for the .handoff argument to __mutex_trylock(), since that becomes superfluous with PICKUP. In order to guarantee enough low bits, ensure task_struct alignment is at least L1_CACHE_BYTES (which seems a good ideal regardless). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 9d659ae14b54 ("locking/mutex: Add lock handoff to avoid starvation") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14locking/percpu-rwsem: Replace waitqueue with rcuwaitDavidlohr Bueso
The use of any kind of wait queue is an overkill for pcpu-rwsems. While one option would be to use the less heavy simple (swait) flavor, this is still too much for what pcpu-rwsems needs. For one, we do not care about any sort of queuing in that the only (rare) time writers (and readers, for that matter) are queued is when trying to acquire the regular contended rw_sem. There cannot be any further queuing as writers are serialized by the rw_sem in the first place. Given that percpu_down_write() must not be called after exit_notify(), we can replace the bulky waitqueue with rcuwait such that a writer can wait for its turn to take the lock. As such, we can avoid the queue handling and locking overhead. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1484148146-14210-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/wait, RCU: Introduce rcuwait machineryDavidlohr Bueso
rcuwait provides support for (single) RCU-safe task wait/wake functionality, with the caveat that it must not be called after exit_notify(), such that we avoid racing with rcu delayed_put_task_struct callbacks, task_struct being rcu unaware in this context -- for which we similarly have task_rcu_dereference() magic, but with different return semantics, which can conflict with the wakeup side. The interfaces are quite straightforward: rcuwait_wait_event() rcuwait_wake_up() More details are in the comments, but it's perhaps worth mentioning at least, that users must provide proper serialization when waiting on a condition, and avoid corrupting a concurrent waiter. Also care must be taken between the task and the condition for when calling the wakeup -- we cannot miss wakeups. When porting users, this is for example, a given when using waitqueues in that everything is done under the q->lock. As such, it can remove sources of non preemptable unbounded work for realtime. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/core: Remove set_task_state()Davidlohr Bueso
This is a nasty interface and setting the state of a foreign task must not be done. As of the following commit: be628be0956 ("bcache: Make gc wakeup sane, remove set_task_state()") ... everyone in the kernel calls set_task_state() with current, allowing the helper to be removed. However, as the comment indicates, it is still around for those archs where computing current is more expensive than using a pointer, at least in theory. An important arch that is affected is arm64, however this has been addressed now [1] and performance is up to par making no difference with either calls. Of all the callers, if any, it's the locking bits that would care most about this -- ie: we end up passing a tsk pointer to a lot of the lock slowpath, and setting ->state on that. The following numbers are based on two tests: a custom ad-hoc microbenchmark that just measures latencies (for ~65 million calls) between get_task_state() vs get_current_state(). Secondly for a higher overview, an unlink microbenchmark was used, which pounds on a single file with open, close,unlink combos with increasing thread counts (up to 4x ncpus). While the workload is quite unrealistic, it does contend a lot on the inode mutex or now rwsem. [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com == 1. x86-64 == Avg runtime set_task_state(): 601 msecs Avg runtime set_current_state(): 552 msecs vanilla dirty Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%) Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%) Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%) Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%) Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%) Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%) Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%) Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%) Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%) Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%) Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%) Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%) Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%) == 2. ppc64le == Avg runtime set_task_state(): 938 msecs Avg runtime set_current_state: 940 msecs vanilla dirty Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%) Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%) Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%) Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%) Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%) Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%) Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%) Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%) Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%) Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%) Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%) Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%) Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%) Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%) Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%) Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%) x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching) and ppc64 (with paca) show similar improvements in the unlink microbenches. The small delta for ppc64 (2ms), does not represent the gains on the unlink runs. In the case of x86, there was a decent amount of variation in the latency runs, but always within a 20 to 50ms increase), ppc was more constant. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14perf/x86/intel: Account interrupts for PEBS errorsJiri Olsa
It's possible to set up PEBS events to get only errors and not any data, like on SNB-X (model 45) and IVB-EP (model 62) via 2 perf commands running simultaneously: taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10 This leads to a soft lock up, because the error path of the intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt for error PEBS interrupts, so in case you're getting ONLY errors you don't have a way to stop the event when it's over the max_samples_per_tick limit: NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816] ... RIP: 0010:[<ffffffff81159232>] [<ffffffff81159232>] smp_call_function_single+0xe2/0x140 ... Call Trace: ? trace_hardirqs_on_caller+0xf5/0x1b0 ? perf_cgroup_attach+0x70/0x70 perf_install_in_context+0x199/0x1b0 ? ctx_resched+0x90/0x90 SYSC_perf_event_open+0x641/0xf90 SyS_perf_event_open+0x9/0x10 do_syscall_64+0x6c/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 Add perf_event_account_interrupt() which does the interrupt and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s error path. We keep the pending_kill and pending_wakeup logic only in the __perf_event_overflow() path, because they make sense only if there's any data to deliver. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/cputime: Rename vtime_account_user() to vtime_flush()Frederic Weisbecker
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y used to accumulate user time and account it on ticks and context switches only through the vtime_account_user() function. Now this model has been generalized on the 3 archs for all kind of cputime (system, irq, ...) and all the cputime flushing happens under vtime_account_user(). So let's rename this function to better reflect its new role. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-11-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/cputime: Export account_guest_time()Frederic Weisbecker
In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay cputime accounting to the tick, let's allow archs to account cputime directly to gtime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/cputime: Allow accounting system time using cpustat indexFrederic Weisbecker
In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay cputime accounting to the tick, let's provide APIs to account system time to precise contexts: hardirq, softirq, pure system, ... Inspired-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14kprobes, extable: Identify kprobes trampolines as kernel text areaMasami Hiramatsu
Improve __kernel_text_address()/kernel_text_address() to return true if the given address is on a kprobe's instruction slot trampoline. This can help stacktraces to determine the address is on a text area or not. To implement this atomically in is_kprobe_*_slot(), also change the insn_cache page list to an RCU list. This changes timings a bit (it delays page freeing to the RCU garbage collection phase), but none of that is in the hot path. Note: this change can add small overhead to stack unwinders because it adds 2 additional checks to __kernel_text_address(). However, the impact should be very small, because kprobe_insn_pages list has 1 entry per 256 probes(on x86, on arm/arm64 it will be 1024 probes), and kprobe_optinsn_pages has 1 entry per 32 probes(on x86). In most use cases, the number of kprobe events may be less than 20, which means that is_kprobe_*_slot() will check just one entry. Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/148388747896.6869.6354262871751682264.stgit@devbox [ Improved the changelog and coding style. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14x86/platform/intel: Remove PMIC GPIO block supportAndy Shevchenko
Moorestown support was removed by commit: 1a8359e411eb ("x86/mid: Remove Intel Moorestown") Remove this leftover. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: platform-driver-x86@vger.kernel.org Link: http://lkml.kernel.org/r/20170112112331.93236-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-13tcp: remove thin_dupack featureYuchung Cheng
Thin stream DUPACK is to start fast recovery on only one DUPACK provided the connection is a thin stream (i.e., low inflight). But this older feature is now subsumed with RACK. If a connection receives only a single DUPACK, RACK would arm a reordering timer and soon starts fast recovery instead of timeout if no further ACKs are received. The socket option (THIN_DUPACK) is kept as a nop for compatibility. Note that this patch does not change another thin-stream feature which enables linear RTO. Although it might be good to generalize that in the future (i.e., linear RTO for the first say 3 retries). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: remove early retransmitYuchung Cheng
This patch removes the support of RFC5827 early retransmit (i.e., fast recovery on small inflight with <3 dupacks) because it is subsumed by the new RACK loss detection. More specifically when RACK receives DUPACKs, it'll arm a reordering timer to start fast recovery after a quarter of (min)RTT, hence it covers the early retransmit except RACK does not limit itself to specific inflight or dupack numbers. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: remove forward retransmit featureYuchung Cheng
Forward retransmit is an esoteric feature in RFC3517 (condition(3) in the NextSeg()). Basically if a packet is not considered lost by the current criteria (# of dupacks etc), but the congestion window has room for more packets, then retransmit this packet. However it actually conflicts with the rest of recovery design. For example, when reordering is detected we want to be conservative in retransmitting packets but forward-retransmit feature would break that to force more retransmission. Also the implementation is fairly complicated inside the retransmission logic inducing extra iterations in the write queue. With RACK losses are being detected timely and this heuristic is no longer necessary. There this patch removes the feature. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: enable RACK loss detection to trigger recoveryYuchung Cheng
This patch changes two things: 1. Start fast recovery with RACK in addition to other heuristics (e.g., DUPACK threshold, FACK). Prior to this change RACK is enabled to detect losses only after the recovery has started by other algorithms. 2. Disable TCP early retransmit. RACK subsumes the early retransmit with the new reordering timer feature. A latter patch in this series removes the early retransmit code. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: use sequence to break TS ties for RACK loss detectionYuchung Cheng
The packets inside a jumbo skb (e.g., TSO) share the same skb timestamp, even though they are sent sequentially on the wire. Since RACK is based on time, it can not detect some packets inside the same skb are lost. However, we can leverage the packet sequence numbers as extended timestamps to detect losses. Therefore, when RACK timestamp is identical to skb's timestamp (i.e., one of the packets of the skb is acked or sacked), we use the sequence numbers of the acked and unacked packets to break ties. We can use the same sequence logic to advance RACK xmit time as well to detect more losses and avoid timeout. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: add reordering timer in RACK loss detectionYuchung Cheng
This patch makes RACK install a reordering timer when it suspects some packets might be lost, but wants to delay the decision a little bit to accomodate reordering. It does not create a new timer but instead repurposes the existing RTO timer, because both are meant to retransmit packets. Specifically it arms a timer ICSK_TIME_REO_TIMEOUT when the RACK timing check fails. The wait time is set to RACK.RTT + RACK.reo_wnd - (NOW - Packet.xmit_time) + fudge This translates to expecting a packet (Packet) should take (RACK.RTT + RACK.reo_wnd + fudge) to deliver after it was sent. When there are multiple packets that need a timer, we use one timer with the maximum timeout. Therefore the timer conservatively uses the maximum window to expire N packets by one timeout, instead of N timeouts to expire N packets sent at different times. The fudge factor is 2 jiffies to ensure when the timer fires, all the suspected packets would exceed the deadline and be marked lost by tcp_rack_detect_loss(). It has to be at least 1 jiffy because the clock may tick between calling icsk_reset_xmit_timer(timeout) and actually hang the timer. The next jiffy is to lower-bound the timeout to 2 jiffies when reo_wnd is < 1ms. When the reordering timer fires (tcp_rack_reo_timeout): If we aren't in Recovery we'll enter fast recovery and force fast retransmit. This is very similar to the early retransmit (RFC5827) except RACK is not constrained to only enter recovery for small outstanding flights. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: record most recent RTT in RACK loss detectionYuchung Cheng
Record the most recent RTT in RACK. It is often identical to the "ca_rtt_us" values in tcp_clean_rtx_queue. But when the packet has been retransmitted, RACK choses to believe the ACK is for the (latest) retransmitted packet if the RTT is over minimum RTT. This requires passing the arrival time of the most recent ACK to RACK routines. The timestamp is now recorded in the "ack_time" in tcp_sacktag_state during the ACK processing. This patch does not change the RACK algorithm itself. It only adds the RTT variable to prepare the next main patch. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13tcp: new helper for RACK to detect lossYuchung Cheng
Create a new helper tcp_rack_detect_loss to prepare the upcoming RACK reordering timer patch. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13Merge branch 'for-linus-4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are all over the place. The tracepoint part of the pull fixes a crash and adds a little more information to two tracepoints, while the rest are good old fashioned fixes" * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: make tracepoint format strings more compact Btrfs: add truncated_len for ordered extent tracepoints Btrfs: add 'inode' for extent map tracepoint btrfs: fix crash when tracepoint arguments are freed by wq callbacks Btrfs: adjust outstanding_extents counter properly when dio write is split Btrfs: fix lockdep warning about log_mutex Btrfs: use down_read_nested to make lockdep silent btrfs: fix locking when we put back a delayed ref that's too new btrfs: fix error handling when run_delayed_extent_op fails btrfs: return the actual error value from from btrfs_uuid_tree_iterate
2017-01-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: - fix for module unload vs deferred jump labels (note: there might be other buggy modules!) - two NULL pointer dereferences from syzkaller - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem made worse during this merge window, "just" kernel memory leak on releases - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix emulation of "MOV SS, null selector" KVM: x86: fix NULL deref in vcpu_scan_ioapic KVM: eventfd: fix NULL deref irqbypass consumer KVM: x86: Introduce segmented_write_std KVM: x86: flush pending lapic jump label updates on module unload jump_labels: API for flushing deferred jump label updates
2017-01-13block: add blk_rq_payload_bytesChristoph Hellwig
Add a helper to calculate the actual data transfer size for special payload requests. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-13tcp: fix tcp_fastopen unaligned access complaints on sparcShannon Nelson
Fix up a data alignment issue on sparc by swapping the order of the cookie byte array field with the length field in struct tcp_fastopen_cookie, and making it a proper union to clean up the typecasting. This addresses log complaints like these: log_unaligned: 113 callbacks suppressed Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360 Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360 Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360 Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13pinctrl: core: Fix regression caused by delayed work for hogsTony Lindgren
Commit df61b366af26 ("pinctrl: core: Use delayed work for hogs") caused a regression at least with sh-pfc that is also a GPIO controller as noted by Geert Uytterhoeven <geert@linux-m68k.org>. As the original pinctrl_register() has issues calling pin controller driver functions early before the controller has finished registering, we can't just revert commit df61b366af26. That would break the drivers using GENERIC_PINCTRL_GROUPS or GENERIC_PINMUX_FUNCTIONS. So let's fix the issue with the following steps as a single patch: 1. Revert the late_init parts of commit df61b366af26. The late_init clearly won't work and we have to just give up on fixing pinctrl_register() for GENERIC_PINCTRL_GROUPS and GENERIC_PINMUX_FUNCTIONS. 2. Split pinctrl_register() into two parts By splitting pinctrl_register() into pinctrl_init_controller() and pinctrl_create_and_start() we have better control over when it's safe to call pinctrl_create(). 3. Introduce a new pinctrl_register_and_init() function As suggested by Linus Walleij <linus.walleij@linaro.org>, we can just introduce a new function for the controllers that need pinctrl_create() called later. 4. Convert the four known problem cases to use new function Let's convert pinctrl-imx, pinctrl-single, sh-pfc and ti-iodelay to use the new function to fix the issues. The rest of the drivers can be converted later. Let's also update Documentation/pinctrl.txt accordingly because of the known issues with pinctrl_register(). Fixes: df61b366af26 ("pinctrl: core: Use delayed work for hogs") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gary Bisson <gary.bisson@boundarydevices.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-01-13KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systemsJintack Lim
Current KVM world switch code is unintentionally setting wrong bits to CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical timer. Bit positions of CNTHCTL_EL2 are changing depending on HCR_EL2.E2H bit. EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is not set, but they are 11th and 10th bits respectively when E2H is set. In fact, on VHE we only need to set those bits once, not for every world switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE == 1, which makes those bits have no effect for the host kernel execution. So we just set those bits once for guests, and that's it. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13cfg80211: Fix documentation for connect resultJouni Malinen
The function documentation for cfg80211_connect_bss() and cfg80211_connect_result() was still claiming that they are used only for a success case while these functions can now be used to report both success and various failure cases. The actual use cases were already described in the connect() documentation. Update the function specific comments to note the failure cases and also describe how the special status == -1 case is used in cfg80211_connect_bss() to indicate a connection timeout based on the internal implementation in cfg80211_connect_timeout(). Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> [use tabs for indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13cfg80211: Specify the reason for connect timeoutPurushottam Kushwaha
This enhances the connect timeout API to also carry the reason for the timeout. These reason codes for the connect time out are represented by enum nl80211_timeout_reason and are passed to user space through a new attribute NL80211_ATTR_TIMEOUT_REASON (u32). Signed-off-by: Purushottam Kushwaha <pkushwah@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> [keep gfp_t argument last] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13cfg80211: Add support to sched scan to report better BSSsvamsi krishna
Enhance sched scan to support option of finding a better BSS while in connected state. Firmware scans the medium and reports when it finds a known BSS which has better RSSI than the current connected BSS. New attributes to specify the relative RSSI (compared to the current BSS) are added to the sched scan to implement this. Signed-off-by: vamsi krishna <vamsin@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13cfg80211: Add support for randomizing TA of Public Action framesvamsi krishna
Add support to use a random local address (Address 2 = TA in transmit and the same address in receive functionality) for Public Action frames in order to improve privacy of WLAN clients. Applications fill the random transmit address in the frame buffer in the NL80211_CMD_FRAME command. This can be used only with the drivers that indicate support for random local address by setting the new NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA and/or NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA_CONNECTED in ext_features. The driver needs to configure receive behavior to accept frames to the specified random address during the time the frame exchange is pending and such frames need to be acknowledged similarly to frames sent to the local permanent address when this random address functionality is not used. Signed-off-by: vamsi krishna <vamsin@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13wext: uninline stream addition functionsJohannes Berg
With 78, 111 and 85 bytes respectively (on x86-64), the functions iwe_stream_add_event(), iwe_stream_add_point() and iwe_stream_add_value() really shouldn't be inlines. It appears that at least my compiler already decided the same, and created a single instance of each one of them for each file using it, but that's still a number of instances in the system overall, which this reduces. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-12Merge tag 'sound-4.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This time we got a few more fixes than the previous rc's, and most of commits were about ASoC. The only significant change in the core side is the regression fix wrt the aux device list handling, and all the rest are driver-specific small / trivial fixes" * tag 'sound-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add a quirk for Plantronics BT600 ASoC: rt5645: set sel_i2s_pre_div1 to 2 ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused ASoC: Intel: Skylake: Release FW ctx in cleanup ASoC: Intel: bytcr-rt5640: fix settings in internal clock mode ASoC: fsl_ssi: set fifo watermark to more reliable value ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL ASoC: nau8825: correct the function name of register ASoC: Intel: Skylake: Fix to fail safely if module not available in path ASoC: tlv320aic3x: Mark the RESET register as volatile ASoC: Fix binding and probing of auxiliary components ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data ASoC: Intel: bytcr_rt5640: fallback mechanism if MCLK is not enabled ASoC: hdmi-codec: use unsigned type to structure members with bit-field ASoC: topology: kfree kcontrol->private_value before freeing kcontrol ASoC: rsnd: don't double free kctrl ASoC: dwc: Fix PIO mode initialization
2017-01-12ipmr: improve hash scalabilityNikolay Aleksandrov
Recently we started using ipmr with thousands of entries and easily hit soft lockups on smaller devices. The reason is that the hash function uses the high order bits from the src and dst, but those don't change in many common cases, also the hash table is only 64 elements so with thousands it doesn't scale at all. This patch migrates the hash table to rhashtable, and in particular the rhl interface which allows for duplicate elements to be chained because of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple duplicate entries to be added with different interfaces (IMO wrong, but it's been in for a long time). And here are some results from tests I've run in a VM: mr_table size (default, allocated for all namespaces): Before After 49304 bytes 2400 bytes Add 65000 routes (the diff is much larger on smaller devices): Before After 1m42s 58s Forwarding 256 byte packets with 65000 routes (test done in a VM): Before After 3 Mbps / ~1465 pps 122 Mbps / ~59000 pps As a bonus we no longer see the soft lockups on smaller devices which showed up even with 2000 entries before. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12sunrpc: don't call sleeping functions from the notifier block callbacksScott Mayhew
The inet6addr_chain is an atomic notifier chain, so we can't call anything that might sleep (like lock_sock)... instead of closing the socket from svc_age_temp_xprts_now (which is called by the notifier function), just have the rpc service threads do it instead. Cc: stable@vger.kernel.org Fixes: c3d4879e01be "sunrpc: Add a function to close..." Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12drm/probe-helpers: Drop locking from poll_enableDaniel Vetter
It was only needed to protect the connector_list walking, see commit 8c4ccc4ab6f64e859d4ff8d7c02c2ed2e956e07f Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Jul 9 23:44:26 2015 +0200 drm/probe-helper: Grab mode_config.mutex in poll_init/enable Unfortunately the commit message of that patch fails to mention that the new locking check was for the connector_list. But that requirement disappeared in commit c36a3254f7857f1ad9badbe3578ccc92be541a8e Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Dec 15 16:58:43 2016 +0100 drm: Convert all helpers to drm_connector_list_iter and so we can drop this again. This fixes a locking inversion on nouveau, where the rpm code needs to re-enable. But in other places the rpm_get() calls are nested within the big modeset locks. While at it, also improve the kerneldoc for these two functions a notch. v2: Update the kerneldoc even more to explain that these functions can't be called concurrently, or bad things happen (Chris). Cc: Dave Airlie <airlied@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Lyude <lyude@redhat.com> Reviewed-by: Lyude <lyude@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170111090117.5134-1-daniel.vetter@ffwll.ch
2017-01-12i2c: do not enable fall back to Host Notify by defaultDmitry Torokhov
Falling back unconditionally to HostNotify as primary client's interrupt breaks some drivers which alter their functionality depending on whether interrupt is present or not, so let's introduce a board flag telling I2C core explicitly if we want wired interrupt or HostNotify-based one: I2C_CLIENT_HOST_NOTIFY. For DT-based systems we introduce "host-notify" property that we convert to I2C_CLIENT_HOST_NOTIFY board flag. Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-01-12Merge tag 'rproc-v4.10-fixes' of git://github.com/andersson/remoteprocLinus Torvalds
Pull remoteproc fixes from Bjorn Andersson: "This fixes two regressions that have been reported to be introduced in v4.10-rc1. - correct an incorrect usage of the kref api - revert the change to make the resource table read-only. As the space each vdev resource is used as virtio device config space it must be shared with the remote" * tag 'rproc-v4.10-fixes' of git://github.com/andersson/remoteproc: Revert "remoteproc: Merge table_ptr and cached_table pointers" remoteproc: fix vdev reference management
2017-01-12Merge branch 'scsi-target-for-v4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux Pull scsi target fixes from Bart Van Assche: - a series of bug fixes for the XCOPY implementation from David Disseldorp - one bug fix for the ibmvscsis driver, a driver that is used for communication between partitions on IBM POWER systems. * 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux: ibmvscsis: Fix srp_transfer_data fail return code target: support XCOPY requests without parameters target: check for XCOPY parameter truncation target: use XCOPY segment descriptor CSCD IDs target: check XCOPY segment descriptor CSCD IDs target: simplify XCOPY wwn->se_dev lookup helper target: return UNSUPPORTED TARGET/SEGMENT DESC TYPE CODE sense target: bounds check XCOPY total descriptor list length target: bounds check XCOPY segment descriptor list target: use XCOPY TOO MANY TARGET DESCRIPTORS sense target: add XCOPY target/segment desc sense codes
2017-01-12regmap: Fixup the kernel-doc comments on functions/structuresCharles Keepax
Most of the kernel-doc comments in regmap don't actually generate correctly. This patch fixes up a few common issues, corrects some typos and adds some missing argument descriptions. The most common issues being using a : after the function name which causes the short description to not render correctly and not separating the long and short descriptions of the function. There are quite a few instances of arguments not being described or given the wrong name as well. This patch doesn't fixup functions/structures that are currently missing descriptions. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-01-12security,selinux,smack: kill security_task_wait hookStephen Smalley
As reported by yangshukui, a permission denial from security_task_wait() can lead to a soft lockup in zap_pid_ns_processes() since it only expects sys_wait4() to return 0 or -ECHILD. Further, security_task_wait() can in general lead to zombies; in the absence of some way to automatically reparent a child process upon a denial, the hook is not useful. Remove the security hook and its implementations in SELinux and Smack. Smack already removed its check from its hook. Reported-by: yangshukui <yangshukui@huawei.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>