summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-01-12x86, mce: Fix mce_start_timer semanticsBorislav Petkov
So mce_start_timer() has a 'cpu' argument which is supposed to mean to start a timer on that cpu. However, the code currently starts a timer on the *current* cpu the function runs on and causes the sanity-check in mce_timer_fn to fire: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn because it is running on the wrong cpu. This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining all the cpus in succession. Then, we were fiddling with the CMCI storm settings when starting the timer whereas there's no need for that - if there's storm happening on this newly restarted cpu, we're going to be in normal CMCI mode initially and then when the CMCI interrupt starts firing, we're going to go to the polling mode with the timer real soon. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
2014-01-12ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disablingTaras Kondratiuk
Kexec disables outer cache before jumping to reboot code, but it doesn't flush it explicitly. Flush is done implicitly inside of l2x0_disable(). But some SoC's override default .disable handler and don't flush cache. This may lead to a corrupted memory during Kexec reboot on these platforms. This patch adds cache flush inside of OMAP4 and Highbank outer_cache.disable() handlers to make it consistent with default l2x0_disable(). Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-01-12Merge branch 'fortglx/3.14/time' of ↵Ingo Molnar
git://git.linaro.org/people/john.stultz/linux into timers/core Pull timekeeping updates from John Stultz. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12Merge branch 'linus' into timers/coreIngo Molnar
Pick up the latest fixes and refresh the branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqsPrarit Bhargava
During heavy CPU-hotplug operations the following spurious kernel warnings can trigger: do_IRQ: No ... irq handler for vector (irq -1) [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ] When downing a cpu it is possible that there are unhandled irqs left in the APIC IRR register. The following code path shows how the problem can occur: 1. CPU 5 is to go down. 2. cpu_disable() on CPU 5 executes with interrupt flag cleared by local_irq_save() via stop_machine(). 3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because interrupt flag is cleared (CPU unabled to handle the irq) 4. IRQs are migrated off of CPU 5, and the vectors' irqs are set to -1. 5. stop_machine() finishes cpu_disable() 6. cpu_die() for CPU 5 executes in normal context. 7. CPU 5 attempts to handle IRQ 12 because the IRR is set for IRQ 12. The code attempts to find the vector's IRQ and cannot because it has been set to -1. 8. do_IRQ() warning displays warning about CPU 5 IRQ 12. I added a debug printk to output which CPU & vector was retriggered and discovered that that we are getting bogus events. I see a 100% correlation between this debug printk in fixup_irqs() and the do_IRQ() warning. This patchset resolves this by adding definitions for VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying the code to use them. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Rui Wang <rui.y.wang@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Yang Zhang <yang.z.zhang@Intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: janet.morgan@Intel.com Cc: tony.luck@Intel.com Cc: ruiv.wang@gmail.com Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com [ Cleaned up the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12Linux 3.13-rc8v3.13-rc8Linus Torvalds
2014-01-12SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()Steven Rostedt
While running stress tests on adding and deleting ftrace instances I hit this bug: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: selinux_inode_permission+0x85/0x160 PGD 63681067 PUD 7ddbe067 PMD 0 Oops: 0000 [#1] PREEMPT CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000 RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160 RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840 RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000 RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54 R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000 R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000 FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0 Call Trace: security_inode_permission+0x1c/0x30 __inode_permission+0x41/0xa0 inode_permission+0x18/0x50 link_path_walk+0x66/0x920 path_openat+0xa6/0x6c0 do_filp_open+0x43/0xa0 do_sys_open+0x146/0x240 SyS_open+0x1e/0x20 system_call_fastpath+0x16/0x1b Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff RIP selinux_inode_permission+0x85/0x160 CR2: 0000000000000020 Investigating, I found that the inode->i_security was NULL, and the dereference of it caused the oops. in selinux_inode_permission(): isec = inode->i_security; rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd); Note, the crash came from stressing the deletion and reading of debugfs files. I was not able to recreate this via normal files. But I'm not sure they are safe. It may just be that the race window is much harder to hit. What seems to have happened (and what I have traced), is the file is being opened at the same time the file or directory is being deleted. As the dentry and inode locks are not held during the path walk, nor is the inodes ref counts being incremented, there is nothing saving these structures from being discarded except for an rcu_read_lock(). The rcu_read_lock() protects against freeing of the inode, but it does not protect freeing of the inode_security_struct. Now if the freeing of the i_security happens with a call_rcu(), and the i_security field of the inode is not changed (it gets freed as the inode gets freed) then there will be no issue here. (Linus Torvalds suggested not setting the field to NULL such that we do not need to check if it is NULL in the permission check). Note, this is a hack, but it fixes the problem at hand. A real fix is to restructure the destroy_inode() to call all the destructor handlers from the RCU callback. But that is a major job to do, and requires a lot of work. For now, we just band-aid this bug with this fix (it works), and work on a more maintainable solution in the future. Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-12thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once onlyHugh Dickins
We see General Protection Fault on RSI in copy_page_rep: that RSI is what you get from a NULL struct page pointer. RIP: 0010:[<ffffffff81154955>] [<ffffffff81154955>] copy_page_rep+0x5/0x10 RSP: 0000:ffff880136e15c00 EFLAGS: 00010286 RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200 RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000 RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001 R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0 Call Trace: copy_user_huge_page+0x93/0xab do_huge_pmd_wp_page+0x710/0x815 handle_mm_fault+0x15d8/0x1d70 __do_page_fault+0x14d/0x840 do_page_fault+0x2f/0x90 page_fault+0x22/0x30 do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but since shrink_huge_zero_page() can free the huge_zero_page, and we have no hold of our own on it here (except where the fourth test holds page_table_lock and has checked pmd_same), it's possible for it to answer yes the first time, but no to the second or third test. Change all those last three to tests for NULL page. (Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin in https://lkml.org/lkml/2013/3/29/103. I believe that one is due to the source page being split, and a tail page freed, while copy is in progress; and not a problem without DEBUG_PAGEALLOC, since the pmd_same check will prevent a miscopy from being made visible.) Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # v3.10 v3.11 v3.12 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-12arch: Introduce smp_load_acquire(), smp_store_release()Peter Zijlstra
A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them. This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specifed store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures. In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced. [Changelog by PaulMck] Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12arch: Clean up asm/barrier.h implementations using asm-generic/barrier.hPeter Zijlstra
We're going to be adding a few new barrier primitives, and in order to avoid endless duplication make more agressive use of asm-generic/barrier.h. Change the asm-generic/barrier.h such that it allows partial barrier definitions and fills out the rest with defaults. There are a few architectures (m32r, m68k) that could probably do away with their barrier.h file entirely but are kept for now due to their unconventional nop() implementation. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131213150640.846368594@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.hPeter Zijlstra
Move the barriers functions that depend on the atomic implementation into the atomic implementation. Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Vineet Gupta <vgupta@synopsys.com> [for arch/arc bits] Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20131213150640.786183683@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASEPeter Zijlstra
The LOCK and UNLOCK barriers as described in our barrier document are generally known as ACQUIRE and RELEASE barriers in other literature. Since we plan to introduce the acquire and release nomenclature in generic kernel primitives we should amend the document to avoid confusion as to what an acquire/release means. Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131217092435.GC21999@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12block: null_blk: fix queue leak inside removing deviceMing Lei
When queue_mode is NULL_Q_MQ and null_blk is being removed, blk_cleanup_queue() isn't called to cleanup queue, so the queue allocated won't be freed. This patch calls blk_cleanup_queue() for MQ to drain all pending requests first and release the reference counter of queue kobject, then blk_mq_free_queue() will be called in queue kobject's release handler when queue kobject's reference counter drops to zero. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-12perf: Introduce a flag to enable close-on-exec in perf_event_open()Yann Droneaud
Unlike recent modern userspace API such as: epoll_create1 (EPOLL_CLOEXEC), eventfd (EFD_CLOEXEC), fanotify_init (FAN_CLOEXEC), inotify_init1 (IN_CLOEXEC), signalfd (SFD_CLOEXEC), timerfd_create (TFD_CLOEXEC), or the venerable general purpose open (O_CLOEXEC), perf_event_open() syscall lack a flag to atomically set FD_CLOEXEC (eg. close-on-exec) flag on file descriptor it returns to userspace. The present patch adds a PERF_FLAG_FD_CLOEXEC flag to allow perf_event_open() syscall to atomically set close-on-exec. Having this flag will enable userspace to remove the file descriptor from the list of file descriptors being inherited across exec, without the need to call fcntl(fd, F_SETFD, FD_CLOEXEC) and the associated race condition between the current thread and another thread calling fork(2) then execve(2). Links: - Secure File Descriptor Handling (Ulrich Drepper, 2008) http://udrepper.livejournal.com/20407.html - Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) http://danwalsh.livejournal.com/53603.html - Notes in DMA buffer sharing: leak and security hole http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/dma-buf-sharing.txt?id=v3.13-rc3#n428 Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/8c03f54e1598b1727c19706f3af03f98685d9fe6.1388952061.git.ydroneaud@opteya.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12perf/x86/intel: Add Intel RAPL PP1 energy counter supportStephane Eranian
This patch adds support for the Intel RAPL energy counter PP1 (Power Plane 1). On client processors, it usually corresponds to the energy consumption of the builtin graphic card. That is why the sysfs event is called energy-gpu. New event: - name: power/energy-gpu/ - code: event=0x4 - unit: 2^-32 Joules On processors without graphics, this should count 0. The patch only enables this event on client processors. Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12perf/x86: Fix active_entry initializationStephane Eranian
This patch fixes a problem with the initialization of the struct perf_event active_entry field. It is defined inside an anonymous union and was initialized in perf_event_alloc() using INIT_LIST_HEAD(). However at that time, we do not know whether the event is going to use active_entry or hlist_entry (SW). Or at last, we don't want to make that determination there. The problem is that hlist and list_head are not initialized the same way. One is okay with NULL (from kzmalloc), the other needs to pointers to point to self. This patch resolves this problem by dropping the union. This will avoid problems later on, if someone starts using active_entry or hlist_entry without verifying that they actually overlap. This also solves the initialization problem. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Cc: maria.n.dimakopoulou@gmail.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12sched_clock: Disable seqlock lockdep usage in sched_clock()John Stultz
Unfortunately the seqlock lockdep enablement can't be used in sched_clock(), since the lockdep infrastructure eventually calls into sched_clock(), which causes a deadlock. Thus, this patch changes all generic sched_clock() usage to use the raw_* methods. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Krzysztof Hałasa <khalasa@piap.pl> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12seqlock: Use raw_ prefix instead of _no_lockdepJohn Stultz
Linus disliked the _no_lockdep() naming, so instead use the more-consistent raw_* prefix to the non-lockdep enabled seqcount methods. This also adds raw_ methods for the write operations as well, which will be utilized in a following patch. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Krzysztof Hałasa <khalasa@piap.pl> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Willy Tarreau <w@1wt.eu> Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12sched: Calculate effective load even if local weight is 0Rik van Riel
Thomas Hellstrom bisected a regression where erratic 3D performance is experienced on virtual machines as measured by glxgears. It identified commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA node") as the problem which had modified the behaviour of effective_load. Effective load calculates the difference to the system-wide load if a scheduling entity was moved to another CPU. The task group is not heavier as a result of the move but overall system load can increase/decrease as a result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA node") changed effective_load to make it suitable for calculating if a particular NUMA node was compute overloaded. To reduce the cost of the function, it assumed that a current sched entity weight of 0 was uninteresting but that is not the case. wake_affine() uses a weight of 0 for sync wakeups on the grounds that it is assuming the waking task will sleep and not contribute to load in the near future. In this case, we still want to calculate the effective load of the sched entity hierarchy. As effective_load is no longer used by task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide search to find swap/migration candidates), this patch simply restores the historical behaviour. Reported-and-tested-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Rik van Riel <riel@redhat.com> [ Wrote changelog] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-11x86, fpu, amd: Clear exceptions in AMD FXSAVE workaroundLinus Torvalds
Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog <me@halfdog.net> Tested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-11kernfs: remove unnecessary NULL check in __kernfs_remove()Tejun Heo
895a068a524e ("kernfs: make kernfs_get_active() block if the node is deactivated but not removed") added "struct kernfs_root *root = kernfs_root(kn);" at the head of the function; however, the parameter @kn is checked for later implying that the function may be called with NULL. This means that we may end up invoking kernfs_root() with NULL which will oops. None of the existing users invokes removal with NULL @kn, so this bug doesn't actually trigger. We can relocate kernfs_root() invocation after NULL check; however, allowing NULL param tends to cause more confusion than actually helping anything. As there's no existing user, let's remove the spurious NULL check. This bug was detected by smatch. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-11ARM: 7939/1: traps: fix opcode endianness when read from user memoryTaras Kondratiuk
Currently code has an inverted logic: opcode from user memory is swapped to a proper endianness only in case of read error. While normally opcode should be swapped only if it was read correctly from user memory. Reviewed-by: Victor Kamensky <victor.kamensky@linaro.org> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-01-11ARM: 7937/1: perf_event: Silence sparse warningStephen Boyd
arch/arm/kernel/perf_event_cpu.c:274:25: warning: incorrect type in assignment (different modifiers) arch/arm/kernel/perf_event_cpu.c:274:25: expected int ( *init_fn )( ... ) arch/arm/kernel/perf_event_cpu.c:274:25: got void const *const data Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-01-11ARM: 7934/1: DT/kernel: fix arch_match_cpu_phys_id to avoid erroneous matchSudeep Holla
The MPIDR contains specific bitfields(MPIDR.Aff{2..0}) which uniquely identify a CPU, in addition to some non-identifying information and reserved bits. The ARM cpu binding defines the 'reg' property to only contain the affinity bits, and any cpu nodes with other bits set in their 'reg' entry are skipped. As such it is not necessary to mask the phys_id with MPIDR_HWID_BITMASK, and doing so could lead to matching erroneous CPU nodes in the device tree. This patch removes the masking of the physical identifier. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-01-10drivers/base: provide an infrastructure for componentised subsystemsRussell King
Subsystems such as ALSA, DRM and others require a single card-level device structure to represent a subsystem. However, firmware tends to describe the individual devices and the connections between them. Therefore, we need a way to gather up the individual component devices together, and indicate when we have all the component devices. We do this in DT by providing a "superdevice" node which specifies the components, eg: imx-drm { compatible = "fsl,drm"; crtcs = <&ipu1>; connectors = <&hdmi>; }; The superdevice is declared into the component support, along with the subcomponents. The superdevice receives callbacks to locate the subcomponents, and identify when all components are present. At this point, we bind the superdevice, which causes the appropriate subsystem to be initialised in the conventional way. When any of the components or superdevice are removed from the system, we unbind the superdevice, thereby taking the subsystem down. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()Tejun Heo
All device_schedule_callback_owner() users are converted to use device_remove_file_self(). Remove now unused {sysfs|device}_schedule_callback_owner(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Famouse last words: "final pull request" :-) I'm sending this because Jason Wang's fixes are pretty important 1) Add missing per-cpu stats initialization to ip6_vti. Otherwise lockdep spits out a call trace. From Li RongQing. 2) Fix NULL oops in wireless hwsim, from Javier Lopez 3) TIPC deferred packet queue unlink must NULL out skb->next to avoid crashes. From Erik Hugne 4) Fix access to uninitialized buffer in nf_nat netfilter code, from Daniel Borkmann 5) Fix lifetime of ipv6 loopback and SIT tunnel addresses, otherwise they basically timeout immediately. From Hannes Frederic Sowa 6) Fix DMA unmapping of TSO packets in bnx2x driver, from Michal Schmidt 7) Do not allow L2 forwarding offload via macvtap device, the way things are now it will not end up being forwaded at all. From Jason Wang 8) Fix transmit queue selection via ndo_dfwd_start_xmit(), fixing things like applying NETIF_F_LLTX to the wrong device (!!) and eliding the proper transmit watchdog handling 9) qlcnic driver was not updating tx statistics at all, from Manish Chopra" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: qlcnic: Fix ethtool statistics length calculation qlcnic: Fix bug in TX statistics net: core: explicitly select a txq before doing l2 forwarding macvlan: forbid L2 fowarding offload for macvtap bnx2x: fix DMA unmapping of TSO split BDs ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIME bnx2x: prevent WARN during driver unload tipc: correctly unlink packets from deferred packet queue ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.c netfilter: only warn once on wrong seqadj usage netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper NFC: Fix target mode p2p link establishment iwlwifi: add new devices for 7265 series mac80211: move "bufferable MMPDU" check to fix AP mode scan mac80211_hwsim: Fix NULL pointer dereference
2014-01-11Merge tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs bugfixes from Ben Myers: "Here we have a bugfix for an off-by-one in the remote attribute verifier that results in a forced shutdown which you can hit with v5 superblock by creating a 64k xattr, and a fix for a missing destroy_work_on_stack() in the allocation worker. It's a bit late, but they are both fairly straightforward" * tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs: xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() xfs: fix off-by-one error in xfs_attr3_rmt_verify
2014-01-11Merge branch 'leds-fixes-for-3.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds Pull LED fix from Bryan Wu: "Pali Rohár and Pavel Machek reported the LED of Nokia N900 doesn't work with our latest 3.13-rc6 kernel. Milo fixed the regression here" * 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: leds: lp5521/5523: Remove duplicate mutex
2014-01-11Merge tag 'pm+acpi-3.13-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: - Recent commits modifying the lists of C-states in the intel_idle driver introduced bugs leading to crashes on some systems. Two fixes from Jiang Liu. - The ACPI AC driver should receive all types of notifications, but recent change made it ignore some of them. Fix from Alexander Mezin. - intel_pstate's validity checks for MSRs it depends on are not sufficient to catch the lack of support in nested KVM setups, so they are extended to cover that case. From Dirk Brandewie. - NEC LZ750/LS has a botched up _BIX method in its ACPI tables, so our ACPI battery driver needs a quirk for it. From Lan Tianyu. - The tpm_ppi driver sometimes leaks memory allocated by acpi_get_name(). Fix from Jiang Liu. * tag 'pm+acpi-3.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_idle: close avn_cstates array with correct marker Revert "intel_idle: mark states tables with __initdata tag" ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters. ACPI / TPM: fix memory leak when walking ACPI namespace ACPI / AC: change notification handler type to ACPI_ALL_NOTIFY
2014-01-11Merge tag 'mfd-fixes-3.13-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes Pull MFD fix from Samuel Ortiz: "This is the 2nd MFD pull request for 3.13 It only contains one fix for the rtsx_pcr driver. Without it we see a kernel panic on some machines, when resuming from suspend to RAM" * tag 'mfd-fixes-3.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes: mfd: rtsx_pcr: Disable interrupts before cancelling delayed works
2014-01-10leds: lp5521/5523: Remove duplicate mutexMilo Kim
It can be a problem when a pattern is loaded via the firmware interface. LP55xx common driver has already locked the mutex in 'lp55xx_firmware_loaded()'. So it should be deleted. On the other hand, locks are required in store_engine_load() on updating program memory. Reported-by: Pali Rohár <pali.rohar@gmail.com> Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Milo Kim <milo.kim@ti.com> Signed-off-by: Bryan Wu <cooloney@gmail.com> Cc: <stable@vger.kernel.org>
2014-01-10s390: use device_remove_file_self() instead of device_schedule_callback()Tejun Heo
driver-core now supports synchrnous self-deletion of attributes and the asynchrnous removal mechanism is scheduled for removal. Use it instead of device_schedule_callback(). * Conversions in arch/s390/pci/pci_sysfs.c and drivers/s390/block/dcssblk.c are straightforward. * drivers/s390/cio/ccwgroup.c is a bit more tricky because ccwgroup_notifier() was (ab)using device_schedule_callback() to purely obtain a process context to kick off ungroup operation which may block from a notifier callback. Rename ccwgroup_ungroup_callback() to ccwgroup_ungroup() and make it take ccwgroup_device * instead. The new function is now called directly from ccwgroup_ungroup_store(). ccwgroup_notifier() chain is updated to explicitly bounce through ccwgroup_device->ungroup_work. This also removes possible failure from memory pressure. Only compile-tested. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: linux-s390@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10scsi: use device_remove_file_self() instead of device_schedule_callback()Tejun Heo
driver-core now supports synchrnous self-deletion of attributes and the asynchrnous removal mechanism is scheduled for removal. Use it instead of device_schedule_callback(). This makes "delete" behave synchronously. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10pci: use device_remove_file_self() instead of device_schedule_callback()Tejun Heo
driver-core now supports synchrnous self-deletion of attributes and the asynchrnous removal mechanism is scheduled for removal. Use it instead of device_schedule_callback(). This makes "remove" behave synchronously. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappersTejun Heo
Sometimes it's necessary to implement a node which wants to delete nodes including itself. This isn't straightforward because of kernfs active reference. While a file operation is in progress, an active reference is held and kernfs_remove() waits for all such references to drain before completing. For a self-deleting node, this is a deadlock as kernfs_remove() ends up waiting for an active reference that itself is sitting on top of. This currently is worked around in the sysfs layer using sysfs_schedule_callback() which makes such removals asynchronous. While it works, it's rather cumbersome and inherently breaks synchronicity of the operation - the file operation which triggered the operation may complete before the removal is finished (or even started) and the removal may fail asynchronously. If a removal operation is immmediately followed by another operation which expects the specific name to be available (e.g. removal followed by rename onto the same name), there's no way to make the latter operation reliable. The thing is there's no inherent reason for this to be asynchrnous. All that's necessary to do this synchronous is a dedicated operation which drops its own active ref and deactivates self. This patch implements kernfs_remove_self() and its wrappers in sysfs and driver core. kernfs_remove_self() is to be called from one of the file operations, drops the active ref and deactivates using __kernfs_deactivate_self(), removes the self node, and restores active ref to the dead node using __kernfs_reactivate_self() so that the ref is balanced afterwards. __kernfs_remove() is updated so that it takes an early exit if the target node is already fully removed so that the active ref restored by kernfs_remove_self() after removal doesn't confuse the deactivation path. This makes implementing self-deleting nodes very easy. The normal removal path doesn't even need to be changed to use kernfs_remove_self() for the self-deleting node. The method can invoke kernfs_remove_self() on itself before proceeding the normal removal path. kernfs_remove() invoked on the node by the normal deletion path will simply be ignored. This will replace sysfs_schedule_callback(). A subtle feature of sysfs_schedule_callback() is that it collapses multiple invocations - even if multiple removals are triggered, the removal callback is run only once. An equivalent effect can be achieved by testing the return value of kernfs_remove_self() - only the one which gets %true return value should proceed with actual deletion. All other instances of kernfs_remove_self() will wait till the enclosing kernfs operation which invoked the winning instance of kernfs_remove_self() finishes and then return %false. This trivially makes all users of kernfs_remove_self() automatically show correct synchronous behavior even when there are multiple concurrent operations - all "echo 1 > delete" instances will finish only after the whole operation is completed by one of the instances. v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing and sysfs_remove_file_self() had incorrect return type. Fix it. Reported by kbuild test bot. v3: Updated to use __kernfs_{de|re}activate_self(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: implement kernfs_{de|re}activate[_self]()Tejun Heo
This patch implements four functions to manipulate deactivation state - deactivate, reactivate and the _self suffixed pair. A new fields kernfs_node->deact_depth is added so that concurrent and nested deactivations are handled properly. kernfs_node->hash is moved so that it's paired with the new field so that it doesn't increase the size of kernfs_node. A kernfs user's lock would normally nest inside active ref but during removal the user may want to perform kernfs_remove() while holding the said lock, which would introduce a reverse locking dependency. This function can be used to break such reverse dependency by allowing deactivation step to performed separately outside user's critical section. This will also be used implement kernfs_remove_self(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: make kernfs_get_active() block if the node is deactivated but not ↵Tejun Heo
removed Currently, kernfs_get_active() fails if the target node is deactivated. This is fine as a node always gets removed after deactivation; however, we're gonna add reactivation so the assumption won't hold. It'd be incorrect for kernfs_get_active() to fail for a node which was deactivated only temporarily. This patch makes kernfs_get_active() block if the node is deactivated but not removed. If the node gets reactivated (not yet implemented), it will be retried and succeed. If the node gets removed, it will be woken up and fail. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: remove kernfs_addrm_cxtTejun Heo
kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were added because there were operations which should be performed outside kernfs_mutex after adding and removing kernfs_nodes. The necessary operations were recorded in kernfs_addrm_cxt and performed by kernfs_addrm_finish(); however, after the recent changes which relocated deactivation and unmapping so that they're performed directly during removal, the only operation kernfs_addrm_finish() performs is kernfs_put(), which can be moved inside the removal path too. This patch moves the kernfs_put() of the base ref to __kernfs_remove() and remove kernfs_addrm_cxt and kernfs_addrm_start/finish(). * kernfs_add_one() is updated to grab and release the parent's active ref and kernfs_mutex itself. kernfs_get/put_active() and kernfs_addrm_start/finish() invocations around it are removed from all users. * __kernfs_remove() puts an unlinked node directly instead of chaining it to kernfs_addrm_cxt. Its callers are updated to grab and release kernfs_mutex instead of calling kernfs_addrm_start/finish() around it. v2: Updated to fit the v2 restructuring of removal path. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()Tejun Heo
kernfs_unmap_bin_file() is supposed to unmap all memory mappings of the target file before kernfs_remove() finishes; however, it currently is being called from kernfs_addrm_finish() and has the same race problem as the original implementation of deactivation when there are multiple removers - only the remover which snatches the node to its addrm_cxt->removed list is guaranteed to wait for its completion before returning. It can be fixed by moving kernfs_unmap_bin_file() invocation from kernfs_addrm_finish() to __kernfs_remove(). The function may be called multiple times but that shouldn't do any harm. We end up dropping kernfs_mutex in the removal loop and the node may be removed inbetween by someone else. kernfs_unlink_sibling() is updated to test whether the node has already been removed and return accordingly. __kernfs_remove() in turn performs post-unlinking cleanup only if it actually unlinked the node. KERNFS_HAS_MMAP test is moved out of the unmap function into __kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily. While at it, drop the now meaningless "bin" qualifier from the function name. v2: Rewritten to fit the v2 restructuring of removal path. HAS_MMAP test relocated. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: restructure removal path to fix possible premature returnTejun Heo
The recursive nature of kernfs_remove() means that, even if kernfs_remove() is not allowed to be called multiple times on the same node, there may be race conditions between removal of parent and its descendants. While we can claim that kernfs_remove() shouldn't be called on one of the descendants while the removal of an ancestor is in progress, such rule is unnecessarily restrictive and very difficult to enforce. It's better to simply allow invoking kernfs_remove() as the caller sees fit as long as the caller ensures that the node is accessible. The current behavior in such situations is broken. Whoever enters removal path first takes the node off the hierarchy and then deactivates. Following removers either return as soon as it notices that it's not the first one or can't even find the target node as it has already been removed from the hierarchy. In both cases, the following removers may finish prematurely while the nodes which should be removed and drained are still being processed by the first one. This patch restructures so that multiple removers, whether through recursion or direction invocation, always follow the following rules. * When there are multiple concurrent removers, only one puts the base ref. * Regardless of which one puts the base ref, all removers are blocked until the target node is fully deactivated and removed. To achieve the above, removal path now first deactivates the subtree, drains it and then unlinks one-by-one. __kernfs_deactivate() is called directly from __kernfs_removal() and drops and regrabs kernfs_mutex for each descendant to drain active refs. As this means that multiple removers can enter __kernfs_deactivate() for the same node, the function is updated so that it can handle multiple deactivators of the same node - only one actually deactivates but all wait till drain completion. The restructured removal path guarantees that a removed node gets unlinked only after the node is deactivated and drained. Combined with proper multiple deactivator handling, this guarantees that any invocation of kernfs_remove() returns only after the node itself and all its descendants are deactivated, drained and removed. v2: Draining separated into a separate loop (used to be in the same loop as unlink) and done from __kernfs_deactivate(). This is to allow exposing deactivation as a separate interface later. Root node removal was broken in v1 patch. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: remove KERNFS_REMOVEDTejun Heo
KERNFS_REMOVED is used to mark half-initialized and dying nodes so that they don't show up in lookups and deny adding new nodes under or renaming it; however, its role overlaps those of deactivation and removal from rbtree. It's necessary to deny addition of new children while removal is in progress; however, this role considerably intersects with deactivation - KERNFS_REMOVED prevents new children while deactivation prevents new file operations. There's no reason to have them separate making things more complex than necessary. KERNFS_REMOVED is also used to decide whether a node is still visible to vfs layer, which is rather redundant as equivalent determination can be made by testing whether the node is on its parent's children rbtree or not. This patch removes KERNFS_REMOVED. * Instead of KERNFS_REMOVED, each node now starts its life deactivated. This means that we now use both atomic_add() and atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler generates an overflow warnings when negating INT_MIN as the negation can't be represented as a positive number. Nothing is actually broken but let's bump BIAS by one to avoid the warnings for archs which negates the subtrahend.. * KERNFS_REMOVED tests in add and rename paths are replaced with kernfs_get/put_active() of the target nodes. Due to the way the add path is structured now, active ref handling is done in the callers of kernfs_add_one(). This will be consolidated up later. * kernfs_remove_one() is updated to deactivate instead of setting KERNFS_REMOVED. This removes deactivation from kernfs_deactivate(), which is now renamed to kernfs_drain(). * kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is dropped. A node which is removed from the children rbtree is not included in the iteration in the first place. This means that a node may be visible through vfs a bit longer - it's now also visible after deactivation until the actual removal. This slightly enlarged window difference doesn't make any difference to the userland. * Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with checks on the active ref. * Some comment style updates in the affected area. v2: Reordered before removal path restructuring. kernfs_active() dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE() used in the lookup paths. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()Tejun Heo
There currently are two mechanisms gating active ref lockdep annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask. The former disables lockdep annotations in kernfs_get/put_active() while the latter disables all of kernfs_deactivate(). While KERNFS_ACTIVE_REF also behaves as an optimization to skip the deactivation step for non-file nodes, the benefit is marginal and it needlessly diverges code paths. Let's drop KERNFS_ACTIVE_REF and use KERNFS_LOCKDEP in kernfs_deactivate() too. While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP flag so that it's more convenient and the related code can be compiled out when not enabled. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitqTejun Heo
kernfs_node->u.completion is used to notify deactivation completion from kernfs_put_active() to kernfs_deactivate(). We now allow multiple racing removals of the same node and the current removal scheme is no longer correct - kernfs_remove() invocation may return before the node is properly deactivated if it races against another removal. The removal path will be restructured to address the issue. To help such restructure which requires supporting multiple waiters, this patch replaces kernfs_node->u.completion with kernfs_root->deactivate_waitq. This makes deactivation event notifications share a per-root waitqueue_head; however, the wait path is quite cold and this will also allow shaving one pointer off kernfs_node. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10kernfs: fix get_active failure handling in kernfs_seq_*()Tejun Heo
When kernfs_seq_start() fails to obtain an active reference, it returns ERR_PTR(-ENODEV). kernfs_seq_stop() is then invoked with the error pointer value; however, it still proceeds to invoke kernfs_put_active() on the node leading to unbalanced put. If kernfs_seq_stop() is called even after active ref failure, it should skip invocation of @ops->seq_stop() and put_active. Unfortunately, this is a bit complicated because active ref failure isn't the only thing which may fail with ERR_PTR(-ENODEV). @ops->seq_start/next() may also fail with the error value and kernfs_seq_stop() doesn't have a way to tell apart those failures. Work it around by factoring out the active part of kernfs_seq_stop() into kernfs_seq_stop_active() and invoking it directly if @ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating kernfs_seq_stop() to skip kernfs_seq_stop_active() on ERR_PTR(-ENODEV). This is a bit nasty but ensures that the active put is skipped iff get_active failed in kernfs_seq_start(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10drm/i915/bdw: make sure south port interrupts are enabled properly v2Jesse Barnes
We were apparently relying on the defaults on BDW, which resulted in no hotplug or AUX interrupts. So be sure to call the ibx_irq_preinstall to enable all interrupts. v2: use preinstall instead of redundant SDIER write References: https://bugs.freedesktop.org/show_bug.cgi?id=72834 References: https://bugs.freedesktop.org/show_bug.cgi?id=72833 Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-01-10xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()Chuansheng Liu
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to call destroy_work_on_stack() which frees the debug object to pair with INIT_WORK_ONSTACK(). Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 6f96b3063cdd473c68664a190524ed966ac0cd92)
2014-01-10xfs: fix off-by-one error in xfs_attr3_rmt_verifyJie Liu
With CRC check is enabled, if trying to set an attributes value just equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote attr write verification procedure failure, which would yield the back trace like below: <snip> XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c <snip> Call Trace: [<ffffffff816f0042>] dump_stack+0x45/0x56 [<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs] [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs] [<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs] [<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs] [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs] [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs] [<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs] [<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460 [<ffffffff81097230>] ? wake_up_state+0x20/0x20 [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs] [<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs] [<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs] [<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs] [<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs] [<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs] [<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs] [<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs] [<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs] [<ffffffff811df9b2>] generic_setxattr+0x62/0x80 [<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0 [<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10 [<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0 [<ffffffff811e054e>] setxattr+0x12e/0x1c0 [<ffffffff811c6e82>] ? final_putname+0x22/0x50 [<ffffffff811c708b>] ? putname+0x2b/0x40 [<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90 [<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0 [<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0 [<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0 [<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f Tests: setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile This patch fix it to check the remote EA size is greater than the XATTR_SIZE_MAX rather than more than or equal to it, because it's valid if the specified EA value size is equal to the limitation as per VFS setxattr interface. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 85dd0707f0cad26d60f2dc574d17a5ab948d10f7)
2014-01-10qlcnic: Fix ethtool statistics length calculationShahed Shaikh
o Consider number of Tx queues while calculating the length of Tx statistics as part of ethtool stats. o Calculate statistics lenght properly for 82xx and 83xx adapter Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10qlcnic: Fix bug in TX statisticsManish Chopra
o Driver was not updating TX stats so it was not populating statistics in `ifconfig` command output. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>