summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-11-02x86/entry/64: Remove all remaining direct thread_struct::sp0 readsAndy Lutomirski
The only remaining readers in context switch code or vm86(), and they all just want to update TSS.sp0 to match the current task. Replace them all with a new helper update_sp0(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2d231687f4ff288c9d9e98d7861b7df374246ac3.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Stop initializing TSS.sp0 at bootAndy Lutomirski
In my quest to get rid of thread_struct::sp0, I want to clean up or remove all of its readers. Two of them are in cpu_init() (32-bit and 64-bit), and they aren't needed. This is because we never enter userspace at all on the threads that CPUs are initialized in. Poison the initial TSS.sp0 and stop initializing it on CPU init. The comment text mostly comes from Dave Hansen. Thanks! Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ee4a00540ad28c6cff475fbcc7769a4460acc861.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/xen/64, x86/entry/64: Clean up SP code in cpu_initialize_context()Andy Lutomirski
I'm removing thread_struct::sp0, and Xen's usage of it is slightly dubious and unnecessary. Use appropriate helpers instead. While we're at at, reorder the code slightly to make it more obvious what's going on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d5b9a3da2b47c68325bd2bbe8f82d9554dee0d0f.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry: Add task_top_of_stack() to find the top of a task's stackAndy Lutomirski
This will let us get rid of a few places that hardcode accesses to thread.sp0. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/b49b3f95a8ff858c40c9b0f5b32be0355324327d.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Pass SP0 directly to load_sp0()Andy Lutomirski
load_sp0() had an odd signature: void load_sp0(struct tss_struct *tss, struct thread_struct *thread); Simplify it to: void load_sp0(unsigned long sp0); Also simplify a few get_cpu()/put_cpu() sequences to preempt_disable()/preempt_enable(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2655d8b42ed940aa384fe18ee1129bbbcf730a08.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/32: Pull the MSR_IA32_SYSENTER_CS update code out of native_load_sp0()Andy Lutomirski
This causes the MSR_IA32_SYSENTER_CS write to move out of the paravirt callback. This shouldn't affect Xen PV: Xen already ignores MSR_IA32_SYSENTER_ESP writes. In any event, Xen doesn't support vm86() in a useful way. Note to any potential backporters: This patch won't break lguest, as lguest didn't have any SYSENTER support at all. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/75cf09fe03ae778532d0ca6c65aa58e66bc2f90c.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: De-Xen-ify our NMI codeAndy Lutomirski
Xen PV is fundamentally incompatible with our fancy NMI code: it doesn't use IST at all, and Xen entries clobber two stack slots below the hardware frame. Drop Xen PV support from our NMI code entirely. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bfbe711b5ae03f672f8848999a8eb2711efc7f98.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02xen, x86/entry/64: Add xen NMI trap entryJuergen Gross
Instead of trying to execute any NMI via the bare metal's NMI trap handler use a Xen specific one for PV domains, like we do for e.g. debug traps. As in a PV domain the NMI is handled via the normal kernel stack this is the correct thing to do. This will enable us to get rid of the very fragile and questionable dependencies between the bare metal NMI handler and Xen assumptions believed to be broken anyway. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5baf5c0528d58402441550c5770b98e7961e7680.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Remove the RESTORE_..._REGS infrastructureAndy Lutomirski
All users of RESTORE_EXTRA_REGS, RESTORE_C_REGS and such, and REMOVE_PT_GPREGS_FROM_STACK are gone. Delete the macros. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c32672f6e47c561893316d48e06c7656b1039a36.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Use POP instead of MOV to restore regs on NMI returnAndy Lutomirski
This gets rid of the last user of the old RESTORE_..._REGS infrastructure. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/652a260f17a160789bc6a41d997f98249b73e2ab.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Merge the fast and slow SYSRET pathsAndy Lutomirski
They did almost the same thing. Remove a bunch of pointless instructions (mostly hidden in macros) and reduce cognitive load by merging them. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1204e20233fcab9130a1ba80b3b1879b5db3fc1f.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Use pop instead of movq in syscall_return_via_sysretAndy Lutomirski
Saves 64 bytes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6609b7f74ab31c36604ad746e019ea8495aec76c.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Shrink paranoid_exit_restore and make labels localAndy Lutomirski
paranoid_exit_restore was a copy of restore_regs_and_return_to_kernel. Merge them and make the paranoid_exit internal labels local. Keeping .Lparanoid_exit makes the code a bit shorter because it allows a 2-byte jnz instead of a 5-byte jnz. Saves 96 bytes of text. ( This is still a bit suboptimal in a non-CONFIG_TRACE_IRQFLAGS kernel, but fixing that would make the code rather messy. ) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/510d66a1895cda9473c84b1086f0bb974f22de6a.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Simplify reg restore code in the standard IRET pathsAndy Lutomirski
The old code restored all the registers with movq instead of pop. In theory, this was done because some CPUs have higher movq throughput, but any gain there would be tiny and is almost certainly outweighed by the higher text size. This saves 96 bytes of text. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ad82520a207ccd851b04ba613f4f752b33ac05f7.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Move SWAPGS into the common IRET-to-usermode pathAndy Lutomirski
All of the code paths that ended up doing IRET to usermode did SWAPGS immediately beforehand. Move the SWAPGS into the common code. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/27fd6f45b7cd640de38fb9066fd0349bcd11f8e1.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Split the IRET-to-user and IRET-to-kernel pathsAndy Lutomirski
These code paths will diverge soon. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/dccf8c7b3750199b4b30383c812d4e2931811509.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/entry/64: Remove the restore_c_regs_and_iret labelAndy Lutomirski
The only user was the 64-bit opportunistic SYSRET failure path, and that path didn't really need it. This change makes the opportunistic SYSRET code a bit more straightforward and gets rid of the label. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/be3006a7ad3326e3458cf1cc55d416252cbe1986.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02Merge branch 'x86/fpu' into x86/asmIngo Molnar
We are about to commit complex rework of various x86 entry code details - create a unified base tree (with FPU commits included) before doing that. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02KEYS: fix out-of-bounds read during ASN.1 parsingEric Biggers
syzkaller with KASAN reported an out-of-bounds read in asn1_ber_decoder(). It can be reproduced by the following command, assuming CONFIG_X509_CERTIFICATE_PARSER=y and CONFIG_KASAN=y: keyctl add asymmetric desc $'\x30\x30' @s The bug is that the length of an ASN.1 data value isn't validated in the case where it is encoded using the short form, causing the decoder to read past the end of the input buffer. Fix it by validating the length. The bug report was: BUG: KASAN: slab-out-of-bounds in asn1_ber_decoder+0x10cb/0x1730 lib/asn1_decoder.c:233 Read of size 1 at addr ffff88003cccfa02 by task syz-executor0/6818 CPU: 1 PID: 6818 Comm: syz-executor0 Not tainted 4.14.0-rc7-00008-g5f479447d983 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0xb3/0x10b lib/dump_stack.c:52 print_address_description+0x79/0x2a0 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x236/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 asn1_ber_decoder+0x10cb/0x1730 lib/asn1_decoder.c:233 x509_cert_parse+0x1db/0x650 crypto/asymmetric_keys/x509_cert_parser.c:89 x509_key_preparse+0x64/0x7a0 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xcb/0x1a0 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x347/0xb20 security/keys/key.c:855 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0x1cd/0x340 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x447c89 RSP: 002b:00007fca7a5d3bd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000f8 RAX: ffffffffffffffda RBX: 00007fca7a5d46cc RCX: 0000000000447c89 RDX: 0000000020006f4a RSI: 0000000020006000 RDI: 0000000020001ff5 RBP: 0000000000000046 R08: fffffffffffffffd R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007fca7a5d49c0 R15: 00007fca7a5d4700 Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder") Cc: <stable@vger.kernel.org> # v3.7+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-11-02KEYS: trusted: fix writing past end of buffer in trusted_read()Eric Biggers
When calling keyctl_read() on a key of type "trusted", if the user-supplied buffer was too small, the kernel ignored the buffer length and just wrote past the end of the buffer, potentially corrupting userspace memory. Fix it by instead returning the size required, as per the documentation for keyctl_read(). We also don't even fill the buffer at all in this case, as this is slightly easier to implement than doing a short read, and either behavior appears to be permitted. It also makes it match the behavior of the "encrypted" key type. Fixes: d00a1c72f7f4 ("keys: add new trusted key-type") Reported-by: Ben Hutchings <ben@decadent.org.uk> Cc: <stable@vger.kernel.org> # v2.6.38+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-11-02KEYS: return full count in keyring_read() if buffer is too smallEric Biggers
Commit e645016abc80 ("KEYS: fix writing past end of user-supplied buffer in keyring_read()") made keyring_read() stop corrupting userspace memory when the user-supplied buffer is too small. However it also made the return value in that case be the short buffer size rather than the size required, yet keyctl_read() is actually documented to return the size required. Therefore, switch it over to the documented behavior. Note that for now we continue to have it fill the short buffer, since it did that before (pre-v3.13) and dump_key_tree_aux() in keyutils arguably relies on it. Fixes: e645016abc80 ("KEYS: fix writing past end of user-supplied buffer in keyring_read()") Reported-by: Ben Hutchings <ben@decadent.org.uk> Cc: <stable@vger.kernel.org> # v3.13+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-11-02Merge branch 'x86/mpx/prep' into x86/asmIngo Molnar
Pick up some of the MPX commits that modify the syscall entry code, to have a common base and to reduce conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02x86/insn-eval: Extend get_seg_base_addr() to also obtain segment limitRicardo Neri
In protected mode, it is common to want to obtain the limit of a segment along with its base address. This is useful, for instance, to verify that an effective address lies within a segment before computing a linear address. Up to this point, this library only computes linear addresses in long mode. Subsequent patches will include support for protected mode. Support to verify the segment limit will be needed. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qiaowei Ren <qiaowei.ren@intel.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509148310-30862-2-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02ALSA: usb-audio: support new Amanero Combo384 firmware versionJussi Laako
Support DSD_U32_BE sample format on new Amanero Combo384 firmware version on older VID/PID. Fixes: 3eff682d765b ("ALSA: usb-audio: Support both DSD LE/BE Amanero firmware versions") Signed-off-by: Jussi Laako <jussi@sonarnerd.net> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-02MAINTAINERS: Step down from a co-maintaner of DW DMAC driverAndy Shevchenko
As discussed at ELCE 2017 there is little to anticipate from me in the future with regard to the driver, and since I have many things to keep an eye on, I would like to step down to simple designated reviewer. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2017-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains two one-liner fixes for your net tree, they are: 1) Disable fast hash operations for 2-bytes length keys which is leading to incorrect lookups in nf_tables, from Anatole Denis. 2) Reload pointer ipv4 header after ip_route_me_harder() given this may result in use-after-free due to skbuff header reallocation, patch from Tejaswi Tanikella. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02net: vrf: correct FRA_L3MDEV encode typeJeff Barnhill
FRA_L3MDEV is defined as U8, but is being added as a U32 attribute. On big endian architecture, this results in the l3mdev entry not being added to the FIB rules. Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02tcp_nv: fix division by zero in tcpnv_acked()Konstantin Khlebnikov
Average RTT could become zero. This happened in real life at least twice. This patch treats zero as 1us. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Lawrence Brakmo <Brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02s390: pass endianness info to sparseLuc Van Oostenryck
s390 is big-endian only but sparse assumes the same endianness as the building machine. This is problematic for code which expect __BYTE_ORDER__ being correctly predefined by the compiler which sparse can then pre-process differently from what gcc would, depending on the building machine endianness. Fix this by letting sparse know about the architecture endianness. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-02Merge branch 'drm-fixes-4.14' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Just two small patches for stable to fix the driver failing to load on polaris cards with harvested VCE or UVD blocks. * 'drm-fixes-4.14' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: allow harvesting check for Polaris VCE drm/amdgpu: return -ENOENT from uvd 6.0 early init for harvesting
2017-11-01md: don't check MD_SB_CHANGE_CLEAN in md_allow_writeArtur Paszkiewicz
Only MD_SB_CHANGE_PENDING should be used to wait for transition from clean to dirty. Checking also MD_SB_CHANGE_CLEAN is unnecessary and can race with e.g. md_do_sync(). This sporadically causes a hang when changing consistency policy during resync: INFO: task mdadm:6183 blocked for more than 30 seconds. Not tainted 4.14.0-rc3+ #391 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mdadm D12752 6183 6022 0x00000000 Call Trace: __schedule+0x93f/0x990 schedule+0x6b/0x90 md_allow_write+0x100/0x130 [md_mod] ? do_wait_intr_irq+0x90/0x90 resize_stripes+0x3a/0x5b0 [raid456] ? kernfs_fop_write+0xbe/0x180 raid5_change_consistency_policy+0xa6/0x200 [raid456] consistency_policy_store+0x2e/0x70 [md_mod] md_attr_store+0x90/0xc0 [md_mod] sysfs_kf_write+0x42/0x50 kernfs_fop_write+0x119/0x180 __vfs_write+0x28/0x110 ? rcu_sync_lockdep_assert+0x12/0x60 ? __sb_start_write+0x15a/0x1c0 ? vfs_write+0xa3/0x1a0 vfs_write+0xb4/0x1a0 SyS_write+0x49/0xa0 entry_SYSCALL_64_fastpath+0x18/0xad Fixes: 2214c260c72b ("md: don't return -EAGAIN in md_allow_write for external metadata arrays") Cc: <stable@vger.kernel.org> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md-cluster: update document for raid10Guoqing Jiang
Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: remove redundant variable qColin Ian King
The pointer q is assigned but never read; it is redundant and can be removed. Cleans up clang warning: drivers/md/md-multipath.c:260:4: warning: Value stored to 'q' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01raid1: remove obsolete code in raid1_write_requestGuoqing Jiang
There are some lines could be removed due to recent change for raid1 such as commit 3956df15d634 ("md: move suspend_hi/lo handling into core md code"). Also, seems some comments are put to wrong place, move them before wait_barrier. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md-cluster: Use a small window for raid10 resyncGuoqing Jiang
Suspending the entire device for resync could take too long. Resync in small chunks. cluster's resync window is maintained in r10conf as cluster_sync_low and cluster_sync_high, and processed in raid10's sync_request(). If the current resync is outside the cluster resync window: 1. Set the cluster_sync_low to curr_resync_completed. 2. Set cluster_sync_high to cluster_sync_low + stripe size. 3. Send a message to all nodes so they may add it in their suspension list. Note: We only support "near" raid10 so far, resync a far or offset raid10 array could have trouble. So raid10_run checks the layout of clustered raid10, it will refuse to run if the layout is not correct. With the "near" layout we process one stripe at a time progressing monotonically through the address space. So we can have a sliding window of whole-stripes which moves through the array suspending IO on other nodes, and both resync which uses array addresses and recovery which uses device addresses can stay within this window. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md-cluster: Suspend writes in RAID10 if within rangeGuoqing Jiang
If there is a resync going on, all nodes must suspend writes to the range. This is recorded in suspend_info and suspend_list. If there is an I/O within the ranges of any of the suspend_info, area_resyncing will return 1. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md-cluster/raid10: set "do_balance = 0" if area is resyncingGuoqing Jiang
Just like clustered raid1, it is impossible for cluster raid10 to choose the best device for read balance when the area of array is resyncing. Because we cannot trust the data to be the same on all devices at that time, so we choose just the first one to use, so set do_balance to 0. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: use lockdep_assert_heldShaohua Li
lockdep_assert_held is a better way to assert lock held, and it works for UP. Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01raid1: prevent freeze_array/wait_all_barriers deadlockNate Dailey
If freeze_array is attempted in the middle of close_sync/ wait_all_barriers, deadlock can occur. freeze_array will wait for nr_pending and nr_queued to line up. wait_all_barriers increments nr_pending for each barrier bucket, one at a time, but doesn't actually issue IO that could be counted in nr_queued. So freeze_array is blocked until wait_all_barriers completes and allow_all_barriers runs. At the same time, when _wait_barrier sees array_frozen == 1, it stops and waits for freeze_array to complete. Prevent the deadlock by making close_sync call _wait_barrier and _allow_barrier for one bucket at a time, instead of deferring the _allow_barrier calls until after all _wait_barriers are complete. Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Fix: fd76863e37fe(RAID1: a new I/O barrier implementation to remove resync window) Reviewed-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org (v4.11) Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: use TASK_IDLE instead of blocking signalsMikulas Patocka
Hi - I submit this patch for the next merge window: Some times ago, I made a patch f9c79bc05a2a that blocks signals around the schedule() calls in MD. The MD subsystem needs to do an uninterruptible sleep that is not accounted in load average - so we block signals and use interruptible sleep. The kernel has a special TASK_IDLE state for this purpose, so we can use it instead of blocking signals. This patch doesn't fix any bug, it just makes the code simpler. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: remove special meaning of ->quiesce(.., 2)NeilBrown
The '2' argument means "wake up anything that is waiting". This is an inelegant part of the design and was added to help support management of suspend_lo/suspend_hi setting. Now that suspend_lo/hi is managed in mddev_suspend/resume, that need is gone. These is still a couple of places where we call 'quiesce' with an argument of '2', but they can safely be changed to call ->quiesce(.., 1); ->quiesce(.., 0) which achieve the same result at the small cost of pausing IO briefly. This removes a small "optimization" from suspend_{hi,lo}_store, but it isn't clear that optimization served a useful purpose. The code now is a lot clearer. Suggested-by: Shaohua Li <shli@kernel.org> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: allow metadata update while suspending.NeilBrown
There are various deadlocks that can occur when a thread holds reconfig_mutex and calls ->quiesce(mddev, 1). As some write request block waiting for metadata to be updated (e.g. to record device failure), and as the md thread updates the metadata while the reconfig mutex is held, holding the mutex can stop write requests completing, and this prevents ->quiesce(mddev, 1) from completing. ->quiesce() is now usually called from mddev_suspend(), and it is always called with reconfig_mutex held. So at this time it is safe for the thread to update metadata without explicitly taking the lock. So add 2 new flags, one which says the unlocked updates is allowed, and one which ways it is happening. Then allow it while the quiesce completes, and then wait for it to finish. Reported-and-tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: use mddev_suspend/resume instead of ->quiesce()NeilBrown
mddev_suspend() is a more general interface than calling ->quiesce() and is so more extensible. A future patch will make use of this. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: move suspend_hi/lo handling into core md codeNeilBrown
responding to ->suspend_lo and ->suspend_hi is similar to responding to ->suspended. It is best to wait in the common core code without incrementing ->active_io. This allows mddev_suspend()/mddev_resume() to work while requests are waiting for suspend_lo/hi to change. This is will be important after a subsequent patch which uses mddev_suspend() to synchronize updating for suspend_lo/hi. So move the code for testing suspend_lo/hi out of raid1.c and raid5.c, and place it in md.c Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: don't call bitmap_create() while array is quiesced.NeilBrown
bitmap_create() allocates memory with GFP_KERNEL and so can wait for IO. If called while the array is quiesced, it could wait indefinitely for write out to the array - deadlock. So call bitmap_create() before quiescing the array. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: always hold reconfig_mutex when calling mddev_suspend()NeilBrown
Most often mddev_suspend() is called with reconfig_mutex held. Make this a requirement in preparation a subsequent patch. Also require reconfig_mutex to be held for mddev_resume(), partly for symmetry and partly to guarantee no races with incr/decr of mddev->suspend. Taking the mutex in r5c_disable_writeback_async() is a little tricky as this is called from a work queue via log->disable_writeback_work, and flush_work() is called on that while holding ->reconfig_mutex. If the work item hasn't run before flush_work() is called, the work function will not be able to get the mutex. So we use mddev_trylock() inside the wait_event() call, and have that abort when conf->log is set to NULL, which happens before flush_work() is called. We wait in mddev->sb_wait and ensure this is woken when any of the conditions change. This requires waking mddev->sb_wait in mddev_unlock(). This is only like to trigger extra wake_ups of threads that needn't be woken when metadata is being written, and that doesn't happen often enough that the cost would be noticeable. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01md: forbid a RAID5 from having both a bitmap and a journal.NeilBrown
Having both a bitmap and a journal is pointless. Attempting to do so can corrupt the bitmap if the journal replay happens before the bitmap is initialized. Rather than try to avoid this corruption, simply refuse to allow arrays with both a bitmap and a journal. So: - if raid5_run sees both are present, fail. - if adding a bitmap finds a journal is present, fail - if adding a journal finds a bitmap is present, fail. Cc: stable@vger.kernel.org (4.10+) Signed-off-by: NeilBrown <neilb@suse.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Acked-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Shaohua Li <shli@fb.com>
2017-11-01drm/amdgpu: allow harvesting check for Polaris VCELeo Liu
Fixes init failures on Polaris cards with harvested VCE blocks. Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2017-11-01drm/amdgpu: return -ENOENT from uvd 6.0 early init for harvestingLeo Liu
Fixes init failures on polaris cards with harvested UVD. Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2017-11-02Merge tag 'drm-intel-fixes-2017-11-01' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Fixes for Stable: - Fix KBL Blank Screen (Jani) - Fix FIFO Underrun on SNB (Maarten) Other fixes: - Fix GPU Hang on i915gm (Chris) - Fix gem_tiled_pread_pwrite IGT case (Chris) - Cancel modeset retry work during modeset clean-up (Manasi) * tag 'drm-intel-fixes-2017-11-01' of git://anongit.freedesktop.org/drm/drm-intel: drm/i915: Check incoming alignment for unfenced buffers (on i915gm) drm/i915: Hold rcu_read_lock when iterating over the radixtree (vma idr) drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects) drm/i915/edp: read edp display control registers unconditionally drm/i915: Do not rely on wm preservation for ILK watermarks drm/i915: Cancel the modeset retry work during modeset cleanup