summaryrefslogtreecommitdiff
path: root/kernel/locking/lockdep.c
AgeCommit message (Collapse)Author
2018-05-14locking/lockdep: Move sanity check to inside lockdep_print_held_locks()Tetsuo Handa
Calling lockdep_print_held_locks() on a running thread is considered unsafe. Since all callers should follow that rule and the sanity check is not heavy, this patch moves the sanity check to inside lockdep_print_held_locks(). As a side effect of this patch, the number of locks held by running threads will be printed as well. This change will be preferable when we want to know which threads might be relevant to a problem but are unable to print any clues because that thread is running. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1523011279-8206-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-14locking/lockdep: Use for_each_process_thread() for debug_show_all_locks()Tetsuo Handa
debug_show_all_locks() tries to grab the tasklist_lock for two seconds, but calling while_each_thread() without tasklist_lock held is not safe. See the following commit for more information: 4449a51a7c281602 ("vm_is_stack: use for_each_thread() rather then buggy while_each_thread()") Change debug_show_all_locks() from "do_each_thread()/while_each_thread() with possibility of missing tasklist_lock" to "for_each_process_thread() with RCU", and add a call to touch_all_softlockup_watchdogs() like show_state_filter() does. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1523011279-8206-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-29lockdep: Make the lock debug output more usefulTetsuo Handa
The lock debug output in print_lock() has a few shortcomings: - It prints the hlock->acquire_ip field in %px and %pS format. That's redundant information. - It lacks information about the lock object itself. The lock class is not helpful to identify a particular instance of a lock. Change the output so it prints: - hlock->instance to allow identification of a particular lock instance. - only the %pS format of hlock->ip_acquire which is sufficient to decode the actual code line with faddr2line. The resulting output is: 3 locks held by a.out/31106: #0: 00000000b0f753ba (&mm->mmap_sem){++++}, at: copy_process.part.41+0x10d5/0x1fe0 #1: 00000000ef64d539 (&mm->mmap_sem/1){+.+.}, at: copy_process.part.41+0x10fe/0x1fe0 #2: 00000000b41a282e (&mapping->i_mmap_rwsem){++++}, at: copy_process.part.41+0x12f2/0x1fe0 [ tglx: Massaged changelog ] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Cc: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/201803271941.GBE57310.tVSOJLQOFFOHFM@I-love.SAKURA.ne.jp
2018-02-28locking/lockdep: Show unadorned pointersBorislav Petkov
Show unadorned pointers in lockdep reports - lockdep is a debugging facility and hashing pointers there doesn't make a whole lotta sense. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20180226134926.23069-1-bp@alien8.de
2018-01-30Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes relate to making lock_is_held() et al (and external wrappers of them) work on const data types - this requires const propagation through the depths of lockdep. This removes a number of ugly type hacks the external helpers used" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Convert some users to const lockdep: Make lockdep checking constant lockdep: Assign lock keys on registration
2018-01-24locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()Tejun Heo
debug_show_all_locks() iterates all tasks and print held locks whole holding tasklist_lock. This can take a while on a slow console device and may end up triggering NMI hardlockup detector if someone else ends up waiting for tasklist_lock. Touch the NMI watchdog while printing the held locks to avoid spuriously triggering the hardlockup detector. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/20180122220055.GB1771050@devbig577.frc2.facebook.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-18lockdep: Make lockdep checking constantMatthew Wilcox
There are several places in the kernel which would like to pass a const pointer to lockdep_is_held(). Constify the entire path so nobody has to trick the compiler. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Link: https://lkml.kernel.org/r/20180117151414.23686-3-willy@infradead.org
2018-01-18lockdep: Assign lock keys on registrationMatthew Wilcox
Lockdep is assigning lock keys when a lock was looked up. This is unnecessary; if the lock has never been registered then it is known that it is not locked. It also complicates the calling convention. Switch to assigning the lock key in register_lock_class(). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Link: https://lkml.kernel.org/r/20180117151414.23686-2-willy@infradead.org
2017-12-12locking/lockdep: Remove the cross-release locking checksIngo Molnar
This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y), while it found a number of old bugs initially, was also causing too many false positives that caused people to disable lockdep - which is arguably a worse overall outcome. If we disable cross-release by default but keep the code upstream then in practice the most likely outcome is that we'll allow the situation to degrade gradually, by allowing entropy to introduce more and more false positives, until it overwhelms maintenance capacity. Another bad side effect was that people were trying to work around the false positives by uglifying/complicating unrelated code. There's a marked difference between annotating locking operations and uglifying good code just due to bad lock debugging code ... This gradual decrease in quality happened to a number of debugging facilities in the kernel, and lockdep is pretty complex already, so we cannot risk this outcome. Either cross-release checking can be done right with no false positives, or it should not be included in the upstream kernel. ( Note that it might make sense to maintain it out of tree and go through the false positives every now and then and see whether new bugs were introduced. ) Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06locking/lockdep: Fix possible NULL derefPeter Zijlstra
We can't invalidate xhlocks when we've not yet allocated any. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: f52be5708076 ("locking/lockdep: Untangle xhlock history save/restore from task independence") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-15kmemcheck: remove annotationsLevin, Alexander (Sasha Levin)
Patch series "kmemcheck: kill kmemcheck", v2. As discussed at LSF/MM, kill kmemcheck. KASan is a replacement that is able to work without the limitation of kmemcheck (single CPU, slow). KASan is already upstream. We are also not aware of any users of kmemcheck (or users who don't consider KASan as a suitable replacement). The only objection was that since KASAN wasn't supported by all GCC versions provided by distros at that time we should hold off for 2 years, and try again. Now that 2 years have passed, and all distros provide gcc that supports KASAN, kill kmemcheck again for the very same reasons. This patch (of 4): Remove kmemcheck annotations, and calls to kmemcheck from the kernel. [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs] Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-25locking/lockdep: Introduce CONFIG_BOOTPARAM_LOCKDEP_CROSSRELEASE_FULLSTACK=yByungchul Park
Add a Kconfig knob that enables the lockdep "crossrelease_fullstack" boot parameter. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: darrick.wong@oracle.com Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johan@kernel.org Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-7-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25locking/lockdep: Add a boot parameter allowing unwind in cross-release and ↵Byungchul Park
disable it by default Johan Hovold reported a heavy performance regression caused by lockdep cross-release: > Boot time (from "Linux version" to login prompt) had in fact doubled > since 4.13 where it took 17 seconds (with my current config) compared to > the 35 seconds I now see with 4.14-rc4. > > I quick bisect pointed to lockdep and specifically the following commit: > > 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition > of a crosslock") > > which I've verified is the commit which doubled the boot time (compared > to 28a903f63ec0^) (added by lockdep crossrelease series [1]). Currently cross-release performs unwind on every acquisition, but that is very expensive. This patch makes unwind optional and disables it by default and only records acquire_ip. Full stack traces are sometimes required for full analysis, in which case a boot paramter, crossrelease_fullstack, can be specified. On my qemu Ubuntu machine (x86_64, 4 cores, 512M), the regression was fixed. We measure boot times with 'perf stat --null --repeat 10 $QEMU', where $QEMU launches a kernel with init=/bin/true: 1. No lockdep enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.756558155 seconds time elapsed ( +- 0.09% ) 2. Lockdep enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.968710420 seconds time elapsed ( +- 0.12% ) 3. Lockdep enabled + cross-release enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 3.153839636 seconds time elapsed ( +- 0.31% ) 4. Lockdep enabled + cross-release enabled + this patch applied: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.963669551 seconds time elapsed ( +- 0.11% ) I.e. lockdep cross-release performance is now indistinguishable from vanilla lockdep. Bisected-by: Johan Hovold <johan@kernel.org> Analyzed-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Johan Hovold <johan@kernel.org> Signed-off-by: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: darrick.wong@oracle.com Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-5-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10locking/lockdep: Fix stacktrace messPeter Zijlstra
There is some complication between check_prevs_add() and check_prev_add() wrt. saving stack traces. The problem is that we want to be frugal with saving stack traces, since it consumes static resources. We'll only know in check_prev_add() if we need the trace, but we can call into it multiple times. So we want to do on-demand and re-use. A further complication is that check_prev_add() can drop graph_lock and mess with our static resources. In any case, the current state; after commit: ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace") is that we'll assume the trace contains valid data once check_prev_add() returns '2'. However, as noted by Josh, this is false, check_prev_add() can return '2' before having saved a trace, this then result in the possibility of using uninitialized data. Testing, as reported by Wu, shows a NULL deref. So simplify. Since the graph_lock() thing is a debug path that hasn't really been used in a long while, take it out back and avoid the head-ache. Further initialize the stack_trace to a known 'empty' state; as long as nr_entries == 0, nothing should deref entries. We can then use the 'entries == NULL' test for a valid trace / on-demand saving. Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29locking/lockdep: Untangle xhlock history save/restore from task independencePeter Zijlstra
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to ensure the temporal IRQ events don't interact with task state, the XHLOCK_PROC is a fundament different beast that just happens to share the interface. The purpose of XHLOCK_PROC is to annotate independent execution inside one task. For example workqueues, each work should appear to run in its own 'pristine' 'task'. Remove XHLOCK_PROC in favour of its own interface to avoid confusion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: kernel-team@lge.com Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25locking/lockdep: Fix workqueue crossrelease annotationPeter Zijlstra
The new completion/crossrelease annotations interact unfavourable with the extant flush_work()/flush_workqueue() annotations. The problem is that when a single work class does: wait_for_completion(&C) and complete(&C) in different executions, we'll build dependencies like: lock_map_acquire(W) complete_acquire(C) and lock_map_acquire(W) complete_release(C) which results in the dependency chain: W->C->W, which lockdep thinks spells deadlock, even though there is no deadlock potential since works are ran concurrently. One possibility would be to change the work 'lock' to recursive-read, but that would mean hitting a lockdep limitation on recursive locks. Also, unconditinoally switching to recursive-read here would fail to detect the actual deadlock on single-threaded workqueues, which do have a problem with this. For now, forcefully disregard these locks for crossrelease. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: byungchul.park@lge.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: oleg@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-14locking/lockdep: Fix the rollback and overwrite detection logic in crossreleaseByungchul Park
As Boqun Feng pointed out, current->hist_id should be aligned with the latest valid xhlock->hist_id so that hist_id_save[] storing current->hist_id can be comparable with xhlock->hist_id. Fix it. Additionally, the condition for overwrite-detection should be the opposite. Fix the code and the comments as well. <- direction to visit hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh (h: history) ^^ ^ || start from here |previous entry current entry Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: linux-mm@kvack.org Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502694052-16085-3-git-send-email-byungchul.park@lge.com [ Improve the comments some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-14locking/lockdep: Add a comment about crossrelease_hist_end() in ↵Byungchul Park
lockdep_sys_exit() In lockdep_sys_exit(), crossrelease_hist_end() is called unconditionally even when getting here without having started e.g. just after forking. But it's no problem since it would roll back to an invalid entry anyway. Add a comment to explain this. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: linux-mm@kvack.org Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502694052-16085-2-git-send-email-byungchul.park@lge.com [ Improved the description and the comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Make print_circular_bug() aware of crossreleaseByungchul Park
print_circular_bug() reporting circular bug assumes that target hlock is owned by the current. However, in crossrelease, target hlock can be owned by other than the current. So the report format needs to be changed to reflect the change. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-9-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Handle non(or multi)-acquisition of a crosslockByungchul Park
No acquisition might be in progress on commit of a crosslock. Completion operations enabling crossrelease are the case like: CONTEXT X CONTEXT Y --------- --------- trigger completion context complete AX commit AX wait_for_complete AX acquire AX wait where AX is a crosslock. When no acquisition is in progress, we should not perform commit because the lock does not exist, which might cause incorrect memory access. So we have to track the number of acquisitions of a crosslock to handle it. Moreover, in case that more than one acquisition of a crosslock are overlapped like: CONTEXT W CONTEXT X CONTEXT Y CONTEXT Z --------- --------- --------- --------- acquire AX (gen_id: 1) acquire A acquire AX (gen_id: 10) acquire B commit AX acquire C commit AX where A, B and C are typical locks and AX is a crosslock. Current crossrelease code performs commits in Y and Z with gen_id = 10. However, we can use gen_id = 1 to do it, since not only 'acquire AX in X' but 'acquire AX in W' also depends on each acquisition in Y and Z until their commits. So make it use gen_id = 1 instead of 10 on their commits, which adds an additional dependency 'AX -> A' in the example above. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-8-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Detect and handle hist_lock ring buffer overwriteByungchul Park
The ring buffer can be overwritten by hardirq/softirq/work contexts. That cases must be considered on rollback or commit. For example, |<------ hist_lock ring buffer size ----->| ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii wrapped > iiiiiiiiiiiiiiiiiiiiiii.................... where 'p' represents an acquisition in process context, 'i' represents an acquisition in irq context. On irq exit, crossrelease tries to rollback idx to original position, but it should not because the entry already has been invalid by overwriting 'i'. Avoid rollback or commit for entries overwritten. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-7-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Implement the 'crossrelease' featureByungchul Park
Lockdep is a runtime locking correctness validator that detects and reports a deadlock or its possibility by checking dependencies between locks. It's useful since it does not report just an actual deadlock but also the possibility of a deadlock that has not actually happened yet. That enables problems to be fixed before they affect real systems. However, this facility is only applicable to typical locks, such as spinlocks and mutexes, which are normally released within the context in which they were acquired. However, synchronization primitives like page locks or completions, which are allowed to be released in any context, also create dependencies and can cause a deadlock. So lockdep should track these locks to do a better job. The 'crossrelease' implementation makes these primitives also be tracked. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Make check_prev_add() able to handle external stack_traceByungchul Park
Currently, a space for stack_trace is pinned in check_prev_add(), that makes us not able to use external stack_trace. The simplest way to achieve it is to pass an external stack_trace as an argument. A more suitable solution is to pass a callback additionally along with a stack_trace so that callers can decide the way to save or whether to save. Actually crossrelease needs to do other than saving a stack_trace. So pass a stack_trace and callback to handle it, to check_prev_add(). Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-5-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Change the meaning of check_prev_add()'s return valueByungchul Park
Firstly, return 1 instead of 2 when 'prev -> next' dependency already exists. Since the value 2 is not referenced anywhere, just return 1 indicating success in this case. Secondly, return 2 instead of 1 when successfully added a lock_list entry with saving stack_trace. With that, a caller can decide whether to avoid redundant save_trace() on the caller site. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-4-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Add a function building a chain between two classesByungchul Park
Crossrelease needs to build a chain between two classes regardless of their contexts. However, add_chain_cache() cannot be used for that purpose since it assumes that it's called in the acquisition context of the hlock. So this patch introduces a new function doing it. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-3-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Refactor lookup_chain_cache()Byungchul Park
Currently, lookup_chain_cache() provides both 'lookup' and 'add' functionalities in a function. However, each is useful. So this patch makes lookup_chain_cache() only do 'lookup' functionality and makes add_chain_cahce() only do 'add' functionality. And it's more readable than before. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/1502089981-21272-2-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Avoid creating redundant linksPeter Zijlstra
Two boots + a make defconfig, the first didn't have the redundant bit in, the second did: lock-classes: 1168 1169 [max: 8191] direct dependencies: 7688 5812 [max: 32768] indirect dependencies: 25492 25937 all direct dependencies: 220113 217512 dependency chains: 9005 9008 [max: 65536] dependency chain hlocks: 34450 34366 [max: 327680] in-hardirq chains: 55 51 in-softirq chains: 371 378 in-process chains: 8579 8579 stack-trace entries: 108073 88474 [max: 524288] combined max dependencies: 178738560 169094640 max locking depth: 15 15 max bfs queue depth: 320 329 cyclic checks: 9123 9190 redundant checks: 5046 redundant links: 1828 find-mask forwards checks: 2564 2599 find-mask backwards checks: 39521 39789 So it saves nearly 2k links and a fair chunk of stack-trace entries, but as expected, makes no real difference on the indirect dependencies. At the same time, you see the max BFS depth increase, which is also expected, although it could easily be boot variance -- these numbers are not entirely stable between boots. The down side is that the cycles in the graph become larger and thus the reports harder to read. XXX: do we want this as a CONFIG variable, implied by LOCKDEP_SMALL? Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: iamjoonsoo.kim@lge.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Link: http://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking/lockdep: Rework FS_RECLAIM annotationPeter Zijlstra
A while ago someone, and I cannot find the email just now, asked if we could not implement the RECLAIM_FS inversion stuff with a 'fake' lock like we use for other things like workqueues etc. I think this should be possible which allows reducing the 'irq' states and will reduce the amount of __bfs() lookups we do. Removing the 1 IRQ state results in 4 less __bfs() walks per dependency, improving lockdep performance. And by moving this annotation out of the lockdep code it becomes easier for the mm people to extend. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: boqun.feng@gmail.com Cc: iamjoonsoo.kim@lge.com Cc: kernel-team@lge.com Cc: kirill@shutemov.name Cc: npiggin@gmail.com Cc: walken@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-08rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig optionPaul E. McKenney
The PROVE_RCU_REPEATEDLY Kconfig option was initially added due to the volume of messages from PROVE_RCU: Doing just one per boot would have required excessive numbers of boots to locate them all. However, PROVE_RCU messages are now relatively rare, so there is no longer any reason to need more than one such message per boot. This commit therefore removes the PROVE_RCU_REPEATEDLY Kconfig option. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org>
2017-06-08lockdep: Use consistent printing primitivesPaul E. McKenney
Commit a5dd63efda3d ("lockdep: Use "WARNING" tag on lockdep splats") substituted pr_warn() for printk() in places called out by Dmitry Vyukov. However, this resulted in an ugly mix of pr_warn() and printk(). This commit therefore changes printk() to pr_warn() or pr_cont(), depending on the absence or presence of KERN_CONT. This is done in all functions that had printk() changed to pr_warn() by the aforementioned commit. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-05-10Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes are: - Debloat RCU headers - Parallelize SRCU callback handling (plus overlapping patches) - Improve the performance of Tree SRCU on a CPU-hotplug stress test - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Open-code the rcu_cblist_n_lazy_cbs() function rcu: Open-code the rcu_cblist_n_cbs() function rcu: Open-code the rcu_cblist_empty() function rcu: Separately compile large rcu_segcblist functions srcu: Debloat the <linux/rcu_segcblist.h> header srcu: Adjust default auto-expediting holdoff srcu: Specify auto-expedite holdoff time srcu: Expedite first synchronize_srcu() when idle srcu: Expedited grace periods with reduced memory contention srcu: Make rcutorture writer stalls print SRCU GP state srcu: Exact tracking of srcu_data structures containing callbacks srcu: Make SRCU be built by default srcu: Fix Kconfig botch when SRCU not selected rcu: Make non-preemptive schedule be Tasks RCU quiescent state srcu: Expedite srcu_schedule_cbs_snp() callback invocation srcu: Parallelize callback handling kvm: Move srcu_struct fields to end of struct kvm rcu: Fix typo in PER_RCU_NODE_PERIOD header comment rcu: Use true/false in assignment to bool rcu: Use bool value directly ...
2017-05-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things - most of MM - KASAN updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) kasan: separate report parts by empty lines kasan: improve double-free report format kasan: print page description after stacks kasan: improve slab object description kasan: change report header kasan: simplify address description logic kasan: change allocation and freeing stack traces headers kasan: unify report headers kasan: introduce helper functions for determining bug type mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page mm: hwpoison: call shake_page() unconditionally mm/swapfile.c: fix swap space leak in error path of swap_free_entries() mm/gup.c: fix access_ok() argument type mm/truncate: avoid pointless cleancache_invalidate_inode() calls. mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty fs/block_dev: always invalidate cleancache in invalidate_bdev() fs: fix data invalidation in the cleancache during direct IO zram: reduce load operation in page_same_filled zram: use zram_free_page instead of open-coded zram: introduce zram data accessor ...
2017-05-03mm: introduce memalloc_nofs_{save,restore} APIMichal Hocko
GFP_NOFS context is used for the following 5 reasons currently: - to prevent from deadlocks when the lock held by the allocation context would be needed during the memory reclaim - to prevent from stack overflows during the reclaim because the allocation is performed from a deep context already - to prevent lockups when the allocation context depends on other reclaimers to make a forward progress indirectly - just in case because this would be safe from the fs POV - silence lockdep false positives Unfortunately overuse of this allocation context brings some problems to the MM. Memory reclaim is much weaker (especially during heavy FS metadata workloads), OOM killer cannot be invoked because the MM layer doesn't have enough information about how much memory is freeable by the FS layer. In many cases it is far from clear why the weaker context is even used and so it might be used unnecessarily. We would like to get rid of those as much as possible. One way to do that is to use the flag in scopes rather than isolated cases. Such a scope is declared when really necessary, tracked per task and all the allocation requests from within the context will simply inherit the GFP_NOFS semantic. Not only this is easier to understand and maintain because there are much less problematic contexts than specific allocation requests, this also helps code paths where FS layer interacts with other layers (e.g. crypto, security modules, MM etc...) and there is no easy way to convey the allocation context between the layers. Introduce memalloc_nofs_{save,restore} API to control the scope of GFP_NOFS allocation context. This is basically copying memalloc_noio_{save,restore} API we have for other restricted allocation context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is just an alias for PF_FSTRANS which has been xfs specific until recently. There are no more PF_FSTRANS users anymore so let's just drop it. PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags is renamed to current_gfp_context because it now cares about both PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve their semantic. kmem_flags_convert() doesn't need to evaluate the flag anymore. This patch shouldn't introduce any functional changes. Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS) usage as much as possible and only use a properly documented memalloc_nofs_{save,restore} checkpoints where they are appropriate. [akpm@linux-foundation.org: fix comment typo, reflow comment] Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03lockdep: allow to disable reclaim lockup detectionMichal Hocko
The current implementation of the reclaim lockup detection can lead to false positives and those even happen and usually lead to tweak the code to silence the lockdep by using GFP_NOFS even though the context can use __GFP_FS just fine. See http://lkml.kernel.org/r/20160512080321.GA18496@dastard as an example. ================================= [ INFO: inconsistent lock state ] 4.5.0-rc2+ #4 Tainted: G O --------------------------------- inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage. kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes: (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs] {RECLAIM_FS-ON-R} state was registered at: mark_held_locks+0x79/0xa0 lockdep_trace_alloc+0xb3/0x100 kmem_cache_alloc+0x33/0x230 kmem_zone_alloc+0x81/0x120 [xfs] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs] __xfs_refcount_find_shared+0x75/0x580 [xfs] xfs_refcount_find_shared+0x84/0xb0 [xfs] xfs_getbmap+0x608/0x8c0 [xfs] xfs_vn_fiemap+0xab/0xc0 [xfs] do_vfs_ioctl+0x498/0x670 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x12/0x6f CPU0 ---- lock(&xfs_nondir_ilock_class); <Interrupt> lock(&xfs_nondir_ilock_class); *** DEADLOCK *** 3 locks held by kswapd0/543: stack backtrace: CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4 Call Trace: lock_acquire+0xd8/0x1e0 down_write_nested+0x5e/0xc0 xfs_ilock+0x177/0x200 [xfs] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs] xfs_fs_evict_inode+0xdc/0x1e0 [xfs] evict+0xc5/0x190 dispose_list+0x39/0x60 prune_icache_sb+0x4b/0x60 super_cache_scan+0x14f/0x1a0 shrink_slab.part.63.constprop.79+0x1e9/0x4e0 shrink_zone+0x15e/0x170 kswapd+0x4f1/0xa80 kthread+0xf2/0x110 ret_from_fork+0x3f/0x70 To quote Dave: "Ignoring whether reflink should be doing anything or not, that's a "xfs_refcountbt_init_cursor() gets called both outside and inside transactions" lockdep false positive case. The problem here is lockdep has seen this allocation from within a transaction, hence a GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context. Also note that we have an active reference to this inode. So, because the reclaim annotations overload the interrupt level detections and it's seen the inode ilock been taken in reclaim ("interrupt") context, this triggers a reclaim context warning where it thinks it is unsafe to do this allocation in GFP_KERNEL context holding the inode ilock..." This sounds like a fundamental problem of the reclaim lock detection. It is really impossible to annotate such a special usecase IMHO unless the reclaim lockup detection is reworked completely. Until then it is much better to provide a way to add "I know what I am doing flag" and mark problematic places. This would prevent from abusing GFP_NOFS flag which has a runtime effect even on configurations which have lockdep disabled. Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to skip the current allocation request. While we are at it also make sure that the radix tree doesn't accidentaly override tags stored in the upper part of the gfp_mask. Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03lockdep: teach lockdep about memalloc_noio_saveNikolay Borisov
Patch series "scope GFP_NOFS api", v5. This patch (of 7): Commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation") added the memalloc_noio_(save|restore) functions to enable people to modify the MM behavior by disabling I/O during memory allocation. This was further extended in commit 934f3072c17c ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent allocation paths recursing back into the filesystem without explicitly changing the flags for every allocation site. However, lockdep hasn't been keeping up with the changes and it entirely misses handling the memalloc_noio adjustments. Instead, it is left to the callers of __lockdep_trace_alloc to call the function after they have shaven the respective GFP flags which can lead to false positives: ================================= [ INFO: inconsistent lock state ] 4.10.0-nbor #134 Not tainted --------------------------------- inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. fsstress/3365 [HC0[0]:SC0[0]:HE1:SE1] takes: (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230 {IN-RECLAIM_FS-W} state was registered at: __lock_acquire+0x62a/0x17c0 lock_acquire+0xc5/0x220 down_write_nested+0x4f/0x90 xfs_ilock+0x141/0x230 xfs_reclaim_inode+0x12a/0x320 xfs_reclaim_inodes_ag+0x2c8/0x4e0 xfs_reclaim_inodes_nr+0x33/0x40 xfs_fs_free_cached_objects+0x19/0x20 super_cache_scan+0x191/0x1a0 shrink_slab+0x26f/0x5f0 shrink_node+0xf9/0x2f0 kswapd+0x356/0x920 kthread+0x10c/0x140 ret_from_fork+0x31/0x40 irq event stamp: 173777 hardirqs last enabled at (173777): __local_bh_enable_ip+0x70/0xc0 hardirqs last disabled at (173775): __local_bh_enable_ip+0x37/0xc0 softirqs last enabled at (173776): _xfs_buf_find+0x67a/0xb70 softirqs last disabled at (173774): _xfs_buf_find+0x5db/0xb70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_nondir_ilock_class); <Interrupt> lock(&xfs_nondir_ilock_class); *** DEADLOCK *** 4 locks held by fsstress/3365: #0: (sb_writers#10){++++++}, at: mnt_want_write+0x24/0x50 #1: (&sb->s_type->i_mutex_key#12){++++++}, at: vfs_setxattr+0x6f/0xb0 #2: (sb_internal#2){++++++}, at: xfs_trans_alloc+0xfc/0x140 #3: (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230 stack backtrace: CPU: 0 PID: 3365 Comm: fsstress Not tainted 4.10.0-nbor #134 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: kmem_cache_alloc_node_trace+0x3a/0x2c0 vm_map_ram+0x2a1/0x510 _xfs_buf_map_pages+0x77/0x140 xfs_buf_get_map+0x185/0x2a0 xfs_attr_rmtval_set+0x233/0x430 xfs_attr_leaf_addname+0x2d2/0x500 xfs_attr_set+0x214/0x420 xfs_xattr_set+0x59/0xb0 __vfs_setxattr+0x76/0xa0 __vfs_setxattr_noperm+0x5e/0xf0 vfs_setxattr+0xae/0xb0 setxattr+0x15e/0x1a0 path_setxattr+0x8f/0xc0 SyS_lsetxattr+0x11/0x20 entry_SYSCALL_64_fastpath+0x23/0xc6 Let's fix this by making lockdep explicitly do the shaving of respective GFP flags. Fixes: 934f3072c17c ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set") Link: http://lkml.kernel.org/r/20170306131408.9828-2-mhocko@kernel.org Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03Merge tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm u pdates from Dave Airlie: "This is the main drm pull request for v4.12. Apart from two fixes pulls, everything should have been in drm-next for at least 2 weeks. The biggest thing in here is AMD released the public headers for their upcoming VEGA GPUs. These as always are quite a sizeable chunk of header files. They've also added initial non-display support for those GPUs, though they aren't available in production yet. Otherwise it's pretty much normal. New bridge drivers: - megachips-stdpxxxx-ge-b850v3-fw LVDS->DP++ - generic LVDS bridge support. Core: - Displayport link train failure reporting to userspace - debugfs interface cleaned up - subsystem TODO in kerneldoc now - Extended fbdev support (flipping and vblank wait) - drm_platform removed - EDP CRC support in helper - HF-VSDB SCDC support in EDID parser - Lots of code cleanups and header extraction - Thunderbolt external GPU awareness - Atomic helper improvements - Documentation improvements panel: - Sitronix and Samsung new panel support amdgpu: - Preliminary vega10 support - Multi-level page table support - GPU sensor support for userspace - PRT support for sparse buffers - SR-IOV improvements - Non-contig VRAM CPU mapping i915: - Atomic modesetting enabled by default on Gen5+ - LSPCON improvements - Atomic state handling for cdclk - GPU reset improvements - In-kernel unit tests - Geminilake improvements and color manager support - Designware i2c fixes - vblank evasion improvements - Hotplug safe connector iterators - GVT scheduler QoS support - GVT Kabylake support nouveau: - Acceleration support for Pascal (GP10x). - Rearchitecture of code handling proprietary signed firmware - Fix GTX 970 with odd MMU configuration - GP10B support - GP107 acceleration support vmwgfx: - Atomic modesetting support for vmwgfx omapdrm: - Support for render nodes - Refactor omapdss code - Fix some probe ordering issues - Fix too dark RGB565 rendering sunxi: - prelim rework for multiple pipes. mali-dp: - Color management support - Plane scaling - Power management improvements imx-drm: - Prefetch Resolve Engine/Gasket on i.MX6QP - Deferred plane disabling - Separate alpha support mediatek: - Mediatek SoC MT2701 support rcar-du: - Gen3 HDMI support msm: - 4k support for newer chips - OPP bindings for gpu - prep work for per-process pagetables vc4: - HDMI audio support - fixes qxl: - minor fixes. dw-hdmi: - PHY improvements - CSC fixes - Amlogic GX SoC support" * tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux: (1778 commits) drm/nouveau/fb/gf100-: Fix 32 bit wraparound in new ram detection drm/nouveau/secboot/gm20b: fix the error return code in gm20b_secboot_tegra_read_wpr() drm/nouveau/kms: Increase max retries in scanout position queries. drm/nouveau/bios/bitP: check that table is long enough for optional pointers drm/nouveau/fifo/nv40: no ctxsw for pre-nv44 mpeg engine drm: mali-dp: use div_u64 for expensive 64-bit divisions drm/i915: Confirm the request is still active before adding it to the await drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio drm/i915/selftests: Allocate inode/file dynamically drm/i915: Fix system hang with EI UP masked on Haswell drm/i915: checking for NULL instead of IS_ERR() in mock selftests drm/i915: Perform link quality check unconditionally during long pulse drm/i915: Fix use after free in lpe_audio_platdev_destroy() drm/i915: Use the right mapping_gfp_mask for final shmem allocation drm/i915: Make legacy cursor updates more unsynced drm/i915: Apply a cond_resched() to the saturated signaler drm/i915: Park the signaler before sleeping drm: mali-dp: Check the mclk rate and allow up/down scaling drm: mali-dp: Enable image enhancement when scaling drm: mali-dp: Add plane upscaling support ...
2017-04-19lockdep: Use "WARNING" tag on lockdep splatsPaul E. McKenney
This commit changes lockdep splats to begin lines with "WARNING" and to use pr_warn() instead of printk(). This change eases scripted analysis of kernel console output. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-03-23BackMerge tag 'v4.11-rc3' into drm-nextDave Airlie
Linux 4.11-rc3 as requested by Daniel
2017-03-16locking/lockdep: Handle statically initialized PER_CPU locks properlyThomas Gleixner
If a PER_CPU struct which contains a spin_lock is statically initialized via: DEFINE_PER_CPU(struct foo, bla) = { .lock = __SPIN_LOCK_UNLOCKED(bla.lock) }; then lockdep assigns a seperate key to each lock because the logic for assigning a key to statically initialized locks is to use the address as the key. With per CPU locks the address is obvioulsy different on each CPU. That's wrong, because all locks should have the same key. To solve this the following modifications are required: 1) Extend the is_kernel/module_percpu_addr() functions to hand back the canonical address of the per CPU address, i.e. the per CPU address minus the per CPU offset. 2) Check the lock address with these functions and if the per CPU check matches use the returned canonical address as the lock key, so all per CPU locks have the same key. 3) Move the static_obj(key) check into look_up_lock_class() so this check can be avoided for statically initialized per CPU locks. That's required because the canonical address fails the static_obj(key) check for obvious reasons. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Merged Dan's fixups for !MODULES and !SMP into this patch. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Murphy <dmurphy@ti.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16locking/lockdep: Add new check to lock_downgrade()J. R. Okajima
Commit: f8319483f57f ("locking/lockdep: Provide a type check for lock_is_held") didn't fully cover rwsems as downgrade_write() was left out. Introduce lock_downgrade() and use it to add new checks. See-also: http://marc.info/?l=linux-kernel&m=148581164003149&w=2 Originally-written-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486053497-9948-3-git-send-email-hooanon05g@gmail.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16locking/lockdep: Factor out the validate_held_lock() helper functionJ. R. Okajima
Behaviour should not change. Signed-off-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486053497-9948-2-git-send-email-hooanon05g@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16locking/lockdep: Factor out the find_held_lock() helper functionJ. R. Okajima
A simple consolidataion to factor out repeated patterns. The behaviour should not change. Signed-off-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486053497-9948-1-git-send-email-hooanon05g@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-14drm/i915: annote drop_caches debugfs interface with lockdepDaniel Vetter
The trouble we have is that we can't really test all the shrinker recursion stuff exhaustively in BAT because any kind of thrashing stress test just takes too long. But that leaves a really big gap open, since shrinker recursions are one of the most annoying bugs. Now lockdep already has support for checking allocation deadlocks: - Direct reclaim paths are marked up with lockdep_set_current_reclaim_state() and lockdep_clear_current_reclaim_state(). - Any allocation paths are marked with lockdep_trace_alloc(). If we simply mark up our debugfs with the reclaim annotations, any code and locks taken in there will automatically complete the picture with any allocation paths we already have, as long as we have a simple testcase in BAT which throws out a few objects using this interface. Not stress test or thrashing needed at all. v2: Need to EXPORT_SYMBOL_GPL to make it compile as a module. v3: Fixup rebase fail (spotted by Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://patchwork.freedesktop.org/patch/msgid/20170312205340.16202-1-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-03-07Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Change the new refcount_t warnings from WARN() to WARN_ONCE() - two ww_mutex fixes - plus a new lockdep self-consistency check for a bug that triggered in practice * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/ww_mutex: Adjust the lock number for stress test locking/lockdep: Add nest_lock integrity test locking/ww_mutex: Replace cpu_relax() with cond_resched() for tests locking/refcounts: Change WARN() to WARN_ONCE()
2017-03-02locking/lockdep: Add nest_lock integrity testPeter Zijlstra
Boqun reported that hlock->references can overflow. Add a debug test for that to generate a clear error when this happens. Without this, lockdep is likely to report a mysterious failure on unlock. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/task.h> We are going to split <linux/sched/task.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/clock.h> We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-20Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Implement wraparound-safe refcount_t and kref_t types based on generic atomic primitives (Peter Zijlstra) - Improve and fix the ww_mutex code (Nicolai Hähnle) - Add self-tests to the ww_mutex code (Chris Wilson) - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr Bueso) - Micro-optimize the current-task logic all around the core kernel (Davidlohr Bueso) - Tidy up after recent optimizations: remove stale code and APIs, clean up the code (Waiman Long) - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) fork: Fix task_struct alignment locking/spinlock/debug: Remove spinlock lockup detection code lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS lkdtm: Convert to refcount_t testing kref: Implement 'struct kref' using refcount_t refcount_t: Introduce a special purpose refcount type sched/wake_q: Clarify queue reinit comment sched/wait, rcuwait: Fix typo in comment locking/mutex: Fix lockdep_assert_held() fail locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock() locking/rwsem: Reinit wake_q after use locking/rwsem: Remove unnecessary atomic_long_t casts jump_labels: Move header guard #endif down where it belongs locking/atomic, kref: Implement kref_put_lock() locking/ww_mutex: Turn off __must_check for now locking/atomic, kref: Avoid more abuse locking/atomic, kref: Use kref_get_unless_zero() more locking/atomic, kref: Kill kref_sub() locking/atomic, kref: Add kref_read() locking/atomic, kref: Add KREF_INIT() ...
2017-02-10lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKSByungchul Park
Bug messages and stack dump for MAX_LOCKDEP_CHAIN_HLOCKS should only be printed once. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1484275324-28192-1-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-23lockdep: Make RCU suspicious-access splats use pr_errPaul E. McKenney
This commit switches RCU suspicious-access splats use pr_err() instead of the current INFO printk()s. This change makes it easier to automatically classify splats. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>