path: root/arch/x86
Age | Commit message | Author
2016-10-28 | x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE | Dmitry Safonov
Userspace knows nothing about the kernel config, so #ifdefs around ABI prctl constants make them invisible to userspace. Keep it clean and simple: remove the #ifdefs. If the kernel has CONFIG_CHECKPOINT_RESTORE disabled, sys_prctl() will return -EINVAL for those prctls. Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: oleg@redhat.com Fixes: 2eefd8789698 ("x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*") Link: http://lkml.kernel.org/r/20161027141516.28447-2-dsafonov@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-27 | x86/unwind: Detect bad stack return address | Josh Poimboeuf
If __kernel_text_address() doesn't recognize a return address on the stack, it probably means that it's some generated code which __kernel_text_address() doesn't know about yet. Otherwise there's probably some stack corruption. Either way, warn about it. Use printk_deferred_once() because the unwinder can be called with the console lock by lockdep via save_stack_trace(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2d897898f324e275943b590d160b55e482bba65f.1477496147.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-27 | x86/dumpstack: Warn on stack recursion | Josh Poimboeuf
Print a warning if stack recursion is detected. Use printk_deferred_once() because the unwinder can be called with the console lock by lockdep via save_stack_trace(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/def18247aafaab480844484398e793f552b79bda.1477496147.git.jpoimboe@redhat.com [ Unbroke the lines. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-27 | x86/unwind: Warn on bad frame pointer | Josh Poimboeuf
Detect situations in the unwinder where the frame pointer refers to a bad address, and print an appropriate warning. Use printk_deferred_once() because the unwinder can be called with the console lock by lockdep via save_stack_trace(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/03c888f6f7414d54fa56b393ea25482be6899b5f.1477496147.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-26 | x86/decoder: Use stderr if insn sanity test fails | Paul Bolle
If the instruction sanity test fails, it prints a "Failure" message to stdout. Make this program behave like the rest of the build and print that message to stderr. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1477428965-20548-3-git-send-email-pebolle@tiscali.nl Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-26 | x86/decoder: Use stdout if insn decoder test is successful | Paul Bolle
If the instruction decoder test runs successfully, it prints a message like this to stderr: Succeed: decoded and checked 1767380 instructions But, as described in "console mode programming user interface guidelines version 101" which doesn't exist, programs should use stderr for errors or warnings. We're told about a successful run here, so the instruction decoder test should use stdout. Let's fix the typo too, while we're at it. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1477428965-20548-2-git-send-email-pebolle@tiscali.nl Signed-off-by: Ingo Molnar <mingo@kernel.org>
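The stream convention that the two x86/decoder commits above apply can be sketched in a few lines of C. This is only an illustration, not the decoder test's actual code: the helper names result_stream() and report_result() and the exact message strings are assumptions made for the example.

```c
#include <stdio.h>

/* Pick the output stream for a test result:
 * stdout for a successful run, stderr for a failure. */
static FILE *result_stream(int failed)
{
    return failed ? stderr : stdout;
}

/* Print the outcome on the matching stream and return the exit status
 * (0 on success, 1 on failure). The message wording is illustrative. */
static int report_result(int failed, unsigned long checked)
{
    FILE *out = result_stream(failed);

    if (failed)
        fprintf(out, "Failure: check failed after %lu instructions\n", checked);
    else
        fprintf(out, "Success: decoded and checked %lu instructions\n", checked);

    return failed ? 1 : 0;
}
```

With this split, a build log that captures only stderr stays empty on a clean run, and the failure message is still visible when stdout is redirected.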
2016-10-25 | x86/dumpstack: Remove raw stack dump | Josh Poimboeuf
For mostly historical reasons, the x86 oops dump shows the raw stack values: ... [registers] Stack: ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0 ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321 0000000000000002 0000000000000000 0000000000000000 0000000000000000 Call Trace: ... This seems to be an artifact from long ago, and probably isn't needed anymore. It generally just adds noise to the dump, and it can be actively harmful because it leaks kernel addresses. Linus says: "The stack dump actually goes back to forever, and it used to be useful back in 1992 or so. But it used to be useful mainly because stacks were simpler and we didn't have very good call traces anyway. I definitely remember having used them - I just do not remember having used them in the last ten+ years. Of course, it's still true that if you can trigger an oops, you've likely already lost the security game, but since the stack dump is so useless, let's aim to just remove it and make games like the above harder." This also removes the related 'kstack=' cmdline option and the 'kstack_depth_to_print' sysctl. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 | x86/dumpstack: Remove kernel text addresses from stack dump | Josh Poimboeuf
Printing kernel text addresses in stack dumps is of questionable value, especially now that address randomization is becoming common. It can be a security issue because it leaks kernel addresses. It also affects the usefulness of the stack dump. Linus says: "I actually spend time cleaning up commit messages in logs, because useless data that isn't actually information (random hex numbers) is actively detrimental. It makes commit logs less legible. It also makes it harder to parse dumps. It's not useful. That makes it actively bad. I probably look at more oops reports than most people. I have not found the hex numbers useful for the last five years, because they are just randomized crap. The stack content thing just makes code scroll off the screen etc, for example." The only real downside to removing these addresses is that they can be used to disambiguate duplicate symbol names. However such cases are rare, and the context of the stack dump should be enough to be able to figure it out. There's now a 'faddr2line' script which can be used to convert a function address to a file name and line: $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60 write_sysrq_trigger+0x51/0x60: write_sysrq_trigger at drivers/tty/sysrq.c:1098 Or gdb can be used: $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in" (gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378). (But note that when there are duplicate symbol names, gdb will only show the first symbol it finds. faddr2line is recommended over gdb because it handles duplicates and it also does function size checking.) Here's an example of what a stack dump looks like after this change: BUG: unable to handle kernel NULL pointer dereference at (null) IP: sysrq_handle_crash+0x45/0x80 PGD 36bfa067 [ 29.650644] PUD 7aca3067 Oops: 0002 [#1] PREEMPT SMP Modules linked in: ... 
CPU: 1 PID: 786 Comm: bash Tainted: G E 4.9.0-rc1+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 task: ffff880078582a40 task.stack: ffffc90000ba8000 RIP: 0010:sysrq_handle_crash+0x45/0x80 RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296 RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292 RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000 FS: 00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0 Stack: ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002 0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20 ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40 Call Trace: __handle_sysrq+0x138/0x220 ? __handle_sysrq+0x5/0x220 write_sysrq_trigger+0x51/0x60 proc_reg_write+0x42/0x70 __vfs_write+0x37/0x140 ? preempt_count_sub+0xa1/0x100 ? __sb_start_write+0xf5/0x210 ? 
vfs_write+0x183/0x1a0 vfs_write+0xb8/0x1a0 SyS_write+0x58/0xc0 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x7ffb42f55940 RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940 RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001 RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700 R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10 R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010 Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7 RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8 CR2: 0000000000000000 Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
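The symbol+offset/size notation that remains in the trace once raw addresses are gone (for example write_sysrq_trigger+0x51/0x60) is easy to split mechanically. The sketch below is not how scripts/faddr2line is implemented. struct sym_ref, parse_sym_ref() and sym_ref_is_sane() are hypothetical names, and the sanity check only mirrors the idea of function size checking mentioned in the message.

```c
#include <stdio.h>

/* One "symbol+offset/size" reference as printed in a call trace. */
struct sym_ref {
    char name[128];
    unsigned long offset;   /* offset of the address within the function */
    unsigned long size;     /* total size of the function */
};

/* Parse e.g. "write_sysrq_trigger+0x51/0x60".
 * Returns 0 on success, -1 if the string does not match the pattern. */
static int parse_sym_ref(const char *s, struct sym_ref *out)
{
    /* %lx accepts an optional 0x prefix, like strtoul() with base 16. */
    int n = sscanf(s, "%127[^+]+%lx/%lx", out->name, &out->offset, &out->size);

    return n == 3 ? 0 : -1;
}

/* An offset at or past the function size cannot lie inside the function. */
static int sym_ref_is_sane(const struct sym_ref *r)
{
    return r->offset < r->size;
}
```

A caller would parse each trace line first and reject entries that fail the sanity check before looking the name up in the symbol table.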
2016-10-25 | x86/entry64: Remove unused audit related macros | Alexander Kuleshov
These macros were added in the following commit: 86a1c34a929f ("x86_64 syscall audit fast-path") They were used in two-phase syscall entry tracing, but this functionality was then moved to the arch/x86/entry/common.c:syscall_trace_enter() function, in the following commit: 1f484aa69046 ("x86/entry: Move C entry and exit code to arch/x86/entry/common.c") syscall_trace_enter() now uses the defines from <linux/audit.h>, so these defines in entry_64.S are no longer used anywhere. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161023135646.4453-1-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 | x86/dumpstack: Print orig_ax in __show_regs() | Josh Poimboeuf
The value of regs->orig_ax contains potentially useful debugging data: For syscalls it contains the syscall number. For interrupts it contains the (negated) vector number. To reduce noise, print it only if it has a useful value (i.e., something other than -1). Here's what it looks like for a write syscall: RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940 RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940 RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001 ... Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/93f0fe0307a4af884d3fca00edabcc8cff236002.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
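The "print it only if it has a useful value" rule can be sketched as a small formatting helper. This is a userspace illustration under stated assumptions, not the __show_regs() code: format_orig_ax() is a hypothetical name, and the field layout simply copies the ORIG_RAX column from the example output above.

```c
#include <stdio.h>
#include <stddef.h>

/* Append-style formatter for the ORIG_RAX field.
 * Writes an empty string and returns 0 when orig_ax is -1 (no useful
 * value); otherwise writes " ORIG_RAX: <16 hex digits>" and returns
 * the number of characters snprintf() reports. */
static int format_orig_ax(char *buf, size_t len, long long orig_ax)
{
    if (orig_ax == -1) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }

    return snprintf(buf, len, " ORIG_RAX: %016llx",
                    (unsigned long long)orig_ax);
}
```

For a write syscall (number 1 on x86-64) the helper yields the same field as in the register dump above, while an entry with orig_ax set to -1 adds nothing to the line.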
2016-10-21 | x86/dumpstack: Fix duplicate RIP address display in __show_regs() | Josh Poimboeuf
The RIP address is shown twice in __show_regs(). Before: RIP: 0010:[<ffffffff81070446>] [<ffffffff81070446>] native_write_msr+0x6/0x30 After: RIP: 0010:[<ffffffff81070446>] native_write_msr+0x6/0x30 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/b3fda66f36761759b000883b059cdd9a7649dcc1.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 | x86/dumpstack: Print any pt_regs found on the stack | Josh Poimboeuf
Now that we can find pt_regs registers on the stack, print them. Here's an example of what it looks like: Call Trace: <IRQ> [<ffffffff8144b793>] dump_stack+0x86/0xc3 [<ffffffff81142c73>] hrtimer_interrupt+0xb3/0x1c0 [<ffffffff8105eb86>] local_apic_timer_interrupt+0x36/0x60 [<ffffffff818b27cd>] smp_apic_timer_interrupt+0x3d/0x50 [<ffffffff818b06ee>] apic_timer_interrupt+0x9e/0xb0 RIP: 0010:[<ffffffff818aef43>] [<ffffffff818aef43>] _raw_spin_unlock_irq+0x33/0x60 RSP: 0018:ffff880079c4f760 EFLAGS: 00000202 RAX: ffff880078738000 RBX: ffff88007d3da0c0 RCX: 0000000000000007 RDX: 0000000000006d78 RSI: ffff8800787388f0 RDI: ffff880078738000 RBP: ffff880079c4f768 R08: 0000002199088f38 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81e0d540 R13: ffff8800369fb700 R14: 0000000000000000 R15: ffff880078738000 <EOI> [<ffffffff810e1f14>] finish_task_switch+0xb4/0x250 [<ffffffff810e1ed6>] ? finish_task_switch+0x76/0x250 [<ffffffff818a7b61>] __schedule+0x3e1/0xb20 ... [<ffffffff810759c8>] trace_do_page_fault+0x58/0x2c0 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 [<ffffffff818b1dd8>] async_page_fault+0x28/0x30 RIP: 0010:[<ffffffff8145b062>] [<ffffffff8145b062>] __clear_user+0x42/0x70 RSP: 0018:ffff880079c4fd38 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138 RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640 RBP: ffff880079c4fd48 R08: 0000002198feefd7 R09: ffffffff82a40928 R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640 R13: 0000000000000000 R14: ffff880079c50000 R15: ffff8800791d7400 [<ffffffff8145b043>] ? __clear_user+0x23/0x70 [<ffffffff8145b0fb>] clear_user+0x2b/0x40 [<ffffffff812fbda2>] load_elf_binary+0x1472/0x1750 [<ffffffff8129a591>] search_binary_handler+0xa1/0x200 [<ffffffff8129b69b>] do_execveat_common.isra.36+0x6cb/0x9f0 [<ffffffff8129b5f3>] ? 
do_execveat_common.isra.36+0x623/0x9f0 [<ffffffff8129bcaa>] SyS_execve+0x3a/0x50 [<ffffffff81003f5c>] do_syscall_64+0x6c/0x1e0 [<ffffffff818afa3f>] entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:[<00007fd2e2f2e537>] [<00007fd2e2f2e537>] 0x7fd2e2f2e537 RSP: 002b:00007ffc449c5fc8 EFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007ffc449c8860 RCX: 00007fd2e2f2e537 RDX: 000000000127cc40 RSI: 00007ffc449c8860 RDI: 00007ffc449c6029 RBP: 00007ffc449c60b0 R08: 65726f632d667265 R09: 00007ffc449c5e20 R10: 00000000000005a7 R11: 0000000000000246 R12: 000000000127cc40 R13: 000000000127ce05 R14: 00007ffc449c6029 R15: 000000000127ce01 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5cc2c512ec82cfba00dd22467644d4ed751a48c0.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 | x86/dumpstack: Print stack identifier on its own line | Josh Poimboeuf
show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a newline so that any stack address printed after it will appear on the same line. That causes the first stack address to be vertically misaligned with the rest, making it visually cluttered and slightly confusing: Call Trace: <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3 [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160 [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0 ... <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60 [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 It will look worse once we start printing pt_regs registers found in the middle of the stack: <IRQ> RIP: 0010:[<ffffffff8189c6c3>] [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60 RSP: 0018:ffff88007876f720 EFLAGS: 00000206 RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007 ... Improve readability by adding a newline to the stack name: Call Trace: <IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3 [<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160 [<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0 ... <EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60 [<ffffffff810e1c84>] finish_task_switch+0xb4/0x250 [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0 Now that "continued" lines are no longer needed, we can also remove the hack of using the empty string (aka KERN_CONT) and replace it with KERN_DEFAULT. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 | x86/unwind: Create stack frames for saved syscall registers | Josh Poimboeuf
The entry code doesn't encode the pt_regs pointer for syscalls. But the pt_regs are always at the same location, so we can add a manual check for them. A later patch prints them as part of the oops stack dump. They could be useful, for example, to determine the arguments to a system call. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e176aa9272930cd3f51fda0b94e2eae356677da4.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-21 | x86/entry/unwind: Create stack frames for saved interrupt registers | Josh Poimboeuf
With frame pointers, when a task is interrupted, its stack is no longer completely reliable because the function could have been interrupted before it had a chance to save the previous frame pointer on the stack. So the caller of the interrupted function could get skipped by a stack trace. This is problematic for live patching, which needs to know whether a stack trace of a sleeping task can be relied upon. There's currently no way to detect if a sleeping task was interrupted by a page fault exception or preemption before it went to sleep. Another issue is that when dumping the stack of an interrupted task, the unwinder has no way of knowing where the saved pt_regs registers are, so it can't print them. This solves those issues by encoding the pt_regs pointer in the frame pointer on entry from an interrupt or an exception. This patch also updates the unwinder to be able to decode it, because otherwise the unwinder would be broken by this change. Note that this causes a change in the behavior of the unwinder: each instance of a pt_regs on the stack is now considered a "frame". So callers of unwind_get_return_address() will now get an occasional 'regs->ip' address that would have previously been skipped over. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
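The message does not spell out how the pt_regs pointer is encoded in the frame pointer. One way such an encoding can work, assumed here purely for illustration, is to tag the low bit of an aligned pointer so the unwinder can tell a regs pointer from an ordinary frame pointer. The struct and function names below are hypothetical, and the tagging scheme should be checked against the actual entry code before relying on it.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the saved register block; only alignment matters here. */
struct pt_regs_model {
    unsigned long ip;
    unsigned long sp;
};

/* Assumed tag: real frame pointers are at least 2-byte aligned,
 * so bit 0 is free to mark "this value points at saved registers". */
#define FP_REGS_TAG ((uintptr_t)1)

/* Entry side: store a tagged regs pointer where the frame pointer goes. */
static uintptr_t encode_frame_pointer(struct pt_regs_model *regs)
{
    return (uintptr_t)regs | FP_REGS_TAG;
}

/* Unwinder side: return the regs pointer if the value is tagged,
 * or NULL if it is an ordinary frame pointer. */
static struct pt_regs_model *decode_frame_pointer(uintptr_t bp)
{
    if (!(bp & FP_REGS_TAG))
        return NULL;

    return (struct pt_regs_model *)(bp & ~FP_REGS_TAG);
}
```

An unwinder loop would call the decode step on every saved frame pointer: a non-NULL result means "treat this as a pt_regs frame and report regs->ip", which matches the behavior change described in the message.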
2016-10-21 | entry/64: Remove unused ZERO_EXTRA_REGS macro | Alexander Kuleshov
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20161020120704.24042-1-kuleshovmail@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/boot: Move the _stext marker to before the boot code | Josh Poimboeuf
When core_kernel_text() is used to determine whether an address on a task's stack trace is a kernel text address, it incorrectly returns false for early text addresses for the head code between the _text and _stext markers. Among other things, this can cause the unwinder to behave incorrectly when unwinding to x86 head code. Head code is text code too, so mark it as such. This seems to match the intent of other users of the _stext symbol, and it also seems consistent with what other architectures are already doing. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/789cf978866420e72fa89df44aa2849426ac378d.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
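The effect of moving the marker can be shown with a simplified range check. This is a model, not the kernel's core_kernel_text(): struct text_layout, is_core_text() and the addresses used in the usage note are made up for the example, and the real function checks more than one range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the linker-provided section markers. */
struct text_layout {
    uintptr_t text;    /* _text:  start of the image, head code begins here */
    uintptr_t stext;   /* _stext: start of what counts as kernel text */
    uintptr_t etext;   /* _etext: end of kernel text */
};

/* An address counts as core text only if it lies in [stext, etext). */
static bool is_core_text(const struct text_layout *l, uintptr_t addr)
{
    return addr >= l->stext && addr < l->etext;
}
```

With a layout of { 0x1000, 0x1200, 0x9000 }, a head-code address such as 0x1100 falls between _text and _stext and is rejected. Once _stext is moved down to 0x1000, the same address is accepted as text.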
2016-10-20 | x86/boot: Fix the end of the stack for idle tasks | Josh Poimboeuf
Thanks to all the recent x86 entry code refactoring, most tasks' kernel stacks start at the same offset right below their saved pt_regs, regardless of which syscall was used to enter the kernel. That creates a nice convention which makes it straightforward to identify the end of the stack, which can be useful for the unwinder to verify the stack is sane. However, the boot CPU's idle "swapper" task doesn't follow that convention. Fix that by starting its stack at a sizeof(pt_regs) offset from the end of the stack page. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/81aee3beb6ed88e44f1bea6986bb7b65c368f77a.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/boot/64: Put a real return address on the idle task stack | Josh Poimboeuf
The frame at the end of each idle task stack has a zeroed return address. This is inconsistent with real task stacks, which have a real return address at that spot. This inconsistency can be confusing for stack unwinders. It also hides useful information about what asm code was involved in calling into C. Make it a real address by using the side effect of a call instruction to push the instruction pointer on the stack. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f59593ae7b15d5126f872b0a23143173d28aa32d.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/boot/64: Use a common function for starting CPUs | Josh Poimboeuf
There are two different pieces of code for starting a CPU: start_cpu0() and the end of secondary_startup_64(). They're identical except for the stack setup. Combine the common parts into a shared start_cpu() function. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1d692ffa62fcb3cc835a5b254e953f2d9bab3549.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/boot/smp/32: Fix initial idle stack location on 32-bit kernels | Josh Poimboeuf
On 32-bit kernels, the initial idle stack calculation doesn't take into account the TOP_OF_KERNEL_STACK_PADDING, making the stack end address inconsistent with other tasks on 32-bit. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6cf569410bfa84cf923902fc4d628444cace94be.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/boot/32: Fix the end of the stack for idle tasks | Josh Poimboeuf
The frame at the end of each idle task stack is inconsistent with real task stacks, which have a stack frame header and a real return address before the pt_regs area. This inconsistency can be confusing for stack unwinders. It also hides useful information about what asm code was involved in calling into C. Fix that by changing the initial code jumps to calls. Also add infinite loops after the calls to make it clear that the calls don't return, and to hang if they do. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2588f34b6fbac4ae6f6f9ead2a78d7f8d58a6341.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/entry/32: Fix the end of the stack for newly forked tasks | Josh Poimboeuf
Thanks to all the recent x86 entry code refactoring, most tasks' kernel stacks start at the same offset right below their saved pt_regs, regardless of which syscall was used to enter the kernel. That creates a nice convention which makes it straightforward to identify the end of the stack, which can be useful for the unwinder to verify the stack is sane. Calling schedule_tail() directly breaks that convention because it's an asmlinkage function so its argument has to be pushed on the stack. Add a wrapper which creates a proper "end of stack" frame header before the call. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ecafcd882676bf48ceaf50483782552bb98476e5.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/entry/32: Rename 'error_code' to 'common_exception' | Josh Poimboeuf
The 'error_code' label is awkwardly named, especially when it shows up in a stack trace. Move it to its own local function and rename it to 'common_exception', analogous to the existing 'common_interrupt'. This also makes related stack traces more sensible. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cca1734a93e52799556d946281b32468f9b93950.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 | x86/entry/32, x86/boot/32: Use local labels | Josh Poimboeuf
Add the local label prefix to all non-function named labels in head_32.S and entry_32.S. In addition to decluttering the symbol table, it also will help stack traces to be more sensible. For example, the last reported function in the idle task stack trace will be startup_32_smp() instead of is486(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/14f9f7afd478b23a762f40734da1a57c0c273f6e.1474480779.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20x86/entry/64: Remove unused 'addskip' parameter of the ↵Alexander Kuleshov
ALLOC_PT_GPREGS_ON_STACK macro Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161019191108.2230-1-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-19Merge branch 'gup_flag-cleanups'Linus Torvalds
Merge the gup_flags cleanups from Lorenzo Stoakes: "This patch series adjusts functions in the get_user_pages* family such that desired FOLL_* flags are passed as an argument rather than implied by flags. The purpose of this change is to make the use of FOLL_FORCE explicit so it is easier to grep for and clearer to callers that this flag is being used. The use of FOLL_FORCE is an issue as it overrides missing VM_READ/VM_WRITE flags for the VMA whose pages we are reading from/writing to, which can result in surprising behaviour. The patch series came out of the discussion around commit 38e088546522 ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"), which addressed a BUG_ON() being triggered when a page was faulted in with PROT_NONE set but having been overridden by FOLL_FORCE. do_numa_page() was run on the assumption the page _must_ be one marked for NUMA node migration as an actual PROT_NONE page would have been dealt with prior to this code path, however FOLL_FORCE introduced a situation where this assumption did not hold. See https://marc.info/?l=linux-mm&m=147585445805166 for the patch proposal" Additionally, there's a fix for an ancient bug related to FOLL_FORCE and FOLL_WRITE by me. 
[ This branch was rebased recently to add a few more acked-by's and reviewed-by's ] * gup_flag-cleanups: mm: replace access_process_vm() write parameter with gup_flags mm: replace access_remote_vm() write parameter with gup_flags mm: replace __access_remote_vm() write parameter with gup_flags mm: replace get_user_pages_remote() write/force parameters with gup_flags mm: replace get_user_pages() write/force parameters with gup_flags mm: replace get_vaddr_frames() write/force parameters with gup_flags mm: replace get_user_pages_locked() write/force parameters with gup_flags mm: replace get_user_pages_unlocked() write/force parameters with gup_flags mm: remove write/force parameters from __get_user_pages_unlocked() mm: remove write/force parameters from __get_user_pages_locked() mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
2016-10-19mm: replace access_process_vm() write parameter with gup_flagsLorenzo Stoakes
This removes the 'write' argument from access_process_vm() and replaces it with 'gup_flags' as use of this function previously silently implied FOLL_FORCE, whereas after this patch callers explicitly pass this flag. We make this explicit as use of FOLL_FORCE can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19mm: replace get_user_pages() write/force parameters with gup_flagsLorenzo Stoakes
This removes the 'write' and 'force' from get_user_pages() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
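The mapping from the old boolean pair to an explicit flags mask can be sketched in plain C. This is a minimal userspace illustration rather than kernel code: legacy_to_gup_flags() is a hypothetical helper that does not exist in the tree, and the FOLL_WRITE and FOLL_FORCE values are defined locally on the assumption that they mirror the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative values, assumed to mirror the kernel's FOLL_* bits. */
#define FOLL_WRITE 0x01u
#define FOLL_FORCE 0x10u

/*
 * Hypothetical helper: translate the old (write, force) boolean pair
 * into the explicit gup_flags mask that callers now build by hand.
 */
unsigned int legacy_to_gup_flags(bool write, bool force)
{
	unsigned int gup_flags = 0;

	if (write)
		gup_flags |= FOLL_WRITE;
	if (force)
		gup_flags |= FOLL_FORCE;
	return gup_flags;
}
```

With the flags spelled out at each call site, a search for FOLL_FORCE finds every user, which is the stated purpose of the series.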
2016-10-18mm: replace get_user_pages_unlocked() write/force parameters with gup_flagsLorenzo Stoakes
This removes the 'write' and 'force' use from get_user_pages_unlocked() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes, plus hw-enablement changes: - fix persistent RAM handling - remove pkeys warning - remove duplicate macro - fix debug warning in irq handler - add new 'Knights Mill' CPU related constants and enable the perf bits" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add Knights Mill CPUID perf/x86/intel/rapl: Add Knights Mill CPUID perf/x86/intel: Add Knights Mill CPUID x86/cpu/intel: Add Knights Mill to Intel family x86/e820: Don't merge consecutive E820_PRAM ranges pkeys: Remove easily triggered WARN x86: Remove duplicate rtit status MSR macro x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt()
2016-10-18Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Four tooling fixes, two kprobes KASAN related fixes and an x86 PMU driver fix/cleanup" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf jit: Fix build issue on Ubuntu perf jevents: Handle events including .c and .o perf/x86/intel: Remove an inconsistent NULL check kprobes: Unpoison stack in jprobe_return() for KASAN kprobes: Avoid false KASAN reports during stack copy perf header: Set nr_numa_nodes only when we parsed all the data perf top: Fix refreshing hierarchy entries on TUI
2016-10-18Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two fixes: - a file locks fix (missing critical section, bug introduced in this merge window) - an x86 down_write() stack frame annotation" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking, fs/locks: Add missing file_sem locks locking/rwsem/x86: Add stack frame dependency for ____down_write()
2016-10-18locking/rwsem/x86: Add stack frame dependency for ____down_write()Josh Poimboeuf
Arnd reported the following objtool warning: kernel/locking/rwsem.o: warning: objtool: down_write_killable()+0x16: call without frame pointer save/setup The warning means gcc placed the ____down_write() inline asm (and its call instruction) before the frame pointer setup in down_write_killable(), which breaks frame pointer convention and can result in incorrect stack traces. Force the stack frame to be created before the call instruction by listing the stack pointer as an output operand in the inline asm statement. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1188b7015f04baf361e59de499ee2d7272c59dce.1476393828.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-17x86, pkeys: remove cruft from never-merged syscallsDave Hansen
pkey_set() and pkey_get() were syscalls present in older versions of the protection keys patches. The syscall number definitions were inadvertently left in place. This patch removes them. I did a git grep and verified that these are the last places in the tree that these appear, save for the protection_keys.c tests and Documentation. Those spots talk about functions called pkey_get/set() which are wrappers for the direct PKRU instructions, not the syscalls. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: mgorman@techsingularity.net Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Fixes: f9afc6197e9bb ("x86: Wire up protection keys system calls") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-17perf/x86/intel/uncore: Add Knights Mill CPUIDPiotr Luc
Add Knights Mill (KNM) to the list of CPUIDs supported by PMU. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161012182758.2925-1-piotr.luc@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-17perf/x86/intel/rapl: Add Knights Mill CPUIDPiotr Luc
Add Knights Mill (KNM) to the list of CPUIDs supported by rapl. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161012182725.2701-1-piotr.luc@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-17perf/x86/intel: Add Knights Mill CPUIDPiotr Luc
Add Knights Mill (KNM) to the list of CPUIDs supported by PMU. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161012182634.2462-1-piotr.luc@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-17x86/cpu/intel: Add Knights Mill to Intel familyPiotr Luc
Add CPUID of Knights Mill (KNM) processor to Intel family list. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161012180520.30976-1-piotr.luc@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16perf/x86/intel: Remove an inconsistent NULL checkDan Carpenter
Smatch complains that we don't check "event->ctx" consistently. It's never NULL so we can just remove the check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kernel-janitors@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16Merge tag 'v4.9-rc1' into x86/urgent, to pick up updatesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-16x86/e820: Don't merge consecutive E820_PRAM rangesDan Williams
Commit: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") ... fixed up the broken manipulations of max_pfn in the presence of E820_PRAM ranges. However, it also broke the sanitize_e820_map() support for not merging E820_PRAM ranges. Re-introduce the enabling to keep resource boundaries between consecutive defined ranges. Otherwise, for example, an environment that boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0 device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhang Yi <yizhan@redhat.com> Cc: linux-nvdimm@lists.01.org Fixes: 917db484dc6a ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation") Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
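The merging rule described above can be illustrated with a small userspace sketch. It is not the kernel's sanitize_e820_map(): struct range, the RANGE_* type codes and merge_ranges() are hypothetical names invented for this example, and the input is assumed to be sorted by start address and free of overlaps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes for this sketch; values are arbitrary. */
enum range_type { RANGE_RAM = 1, RANGE_PRAM = 2 };

struct range {
	uint64_t start;
	uint64_t size;
	int type;
};

/*
 * Merge adjacent ranges of the same type in place, but never merge
 * RANGE_PRAM entries, so that each one keeps its own boundaries.
 * Input must be sorted by start and non-overlapping.
 * Returns the number of entries left in the array.
 */
size_t merge_ranges(struct range *r, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    r[out - 1].type == r[i].type &&
		    r[i].type != RANGE_PRAM &&
		    r[out - 1].start + r[out - 1].size == r[i].start) {
			/* Contiguous and mergeable: extend the previous entry. */
			r[out - 1].size += r[i].size;
			continue;
		}
		r[out++] = r[i];
	}
	return out;
}
```

Fed the two ranges from memmap=2G!8G,2G!10G as RANGE_PRAM entries, this sketch keeps both, which corresponds to separate /dev/pmem0 and /dev/pmem1 devices; the same two ranges typed as RANGE_RAM collapse into one 4G entry.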
2016-10-16kprobes: Unpoison stack in jprobe_return() for KASANDmitry Vyukov
I observed false KASAN positives in the sctp code, when sctp uses jprobe_return() in jsctp_sf_eat_sack(). The stray 0xf4 bytes in shadow memory are stack redzones: [ ] ================================================================== [ ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480 [ ] Read of size 1 by task syz-executor/18535 [ ] page:ffffea00017923c0 count:0 mapcount:0 mapping: (null) index:0x0 [ ] flags: 0x1fffc0000000000() [ ] page dumped because: kasan: bad access detected [ ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28 [ ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ ] ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8 [ ] ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000 [ ] ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370 [ ] Call Trace: [ ] [<ffffffff82d2b849>] dump_stack+0x12e/0x185 [ ] [<ffffffff817d3169>] kasan_report+0x489/0x4b0 [ ] [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20 [ ] [<ffffffff82d49529>] memcmp+0xe9/0x150 [ ] [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0 [ ] [<ffffffff817d2031>] save_stack+0xb1/0xd0 [ ] [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0 [ ] [<ffffffff817d05b8>] kfree+0xc8/0x2a0 [ ] [<ffffffff85b03f19>] skb_free_head+0x79/0xb0 [ ] [<ffffffff85b0900a>] skb_release_data+0x37a/0x420 [ ] [<ffffffff85b090ff>] skb_release_all+0x4f/0x60 [ ] [<ffffffff85b11348>] consume_skb+0x138/0x370 [ ] [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180 [ ] [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70 [ ] [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0 [ ] [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0 [ ] [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190 [ ] [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20 [ ... 
] [ ] Memory state around the buggy address: [ ] ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ^ [ ] ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ ] ================================================================== KASAN stack instrumentation poisons stack redzones on function entry and unpoisons them on function exit. If a function exits abnormally (e.g. with a longjmp like jprobe_return()), stack redzones are left poisoned. Later this leads to random KASAN false reports. Unpoison stack redzones in the frames we are going to jump over before doing actual longjmp in jprobe_return(). Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: kasan-dev@googlegroups.com Cc: surovegin@google.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
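The poison/unpoison behaviour described above can be modelled with a toy shadow map. This is a simplified userspace sketch, not KASAN itself: the shadow array, poison(), unpoison_range() and is_poisoned() are hypothetical names for this example, and offsets are assumed to be multiples of 8 so that each shadow byte covers exactly one 8-byte granule.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_SHIFT 3     /* one shadow byte per 8 bytes of stack */
#define STACK_SIZE   256   /* size of the toy stack, in bytes */
#define REDZONE      0xf4  /* marker value seen in the report above */

/* Toy shadow memory for the toy stack; zero means "accessible". */
static uint8_t shadow[STACK_SIZE >> SHADOW_SHIFT];

/* Mark [off, off + len) as a redzone, as done on function entry. */
void poison(size_t off, size_t len)
{
	memset(&shadow[off >> SHADOW_SHIFT], REDZONE, len >> SHADOW_SHIFT);
}

/*
 * Clear the shadow for [from, to). A non-local jump must do this for
 * the frames it skips, because their normal exit paths never run.
 */
void unpoison_range(size_t from, size_t to)
{
	memset(&shadow[from >> SHADOW_SHIFT], 0, (to - from) >> SHADOW_SHIFT);
}

/* Report whether the granule containing off is still poisoned. */
int is_poisoned(size_t off)
{
	return shadow[off >> SHADOW_SHIFT] != 0;
}
```

If a frame poisons a redzone and is then abandoned by a jump, the marker stays in the shadow map until something like unpoison_range() clears the skipped region; a later, unrelated access to that stack area would otherwise be reported as out of bounds.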
2016-10-16kprobes: Avoid false KASAN reports during stack copyDmitry Vyukov
Kprobes save and restore raw stack chunks with memcpy(). With KASAN, these chunks can contain poisoned stack redzones; as a result, the memcpy() interceptor produces false stack out-of-bounds reports. Use __memcpy() instead of memcpy() for stack copying. __memcpy() is not instrumented by KASAN and does not produce these false reports. Currently there is a spew of KASAN reports during boot if CONFIG_KPROBES_SANITY_TEST is enabled: [ ] Kprobe smoke test: started [ ] ================================================================== [ ] BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0x17c/0x280 at addr ffff88085259fba8 [ ] Read of size 64 by task swapper/0/1 [ ] page:ffffea00214967c0 count:0 mapcount:0 mapping: (null) index:0x0 [ ] flags: 0x2fffff80000000() [ ] page dumped because: kasan: bad access detected [...] Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kasan-dev@googlegroups.com [ Improved various details. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-14Merge branch 'kbuild' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - EXPORT_SYMBOL for asm source by Al Viro. This does bring a regression, because genksyms no longer generates checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is working on a patch to fix this. Plus, we are talking about functions like strcpy(), which rarely change prototypes. - Fixes for PPC fallout of the above by Stephen Rothwell and Nick Piggin - fixdep speedup by Alexey Dobriyan. - preparatory work by Nick Piggin to allow architectures to build with -ffunction-sections, -fdata-sections and --gc-sections - CONFIG_THIN_ARCHIVES support by Stephen Rothwell - fix for filenames with colons in the initramfs source by me. * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits) initramfs: Escape colons in depfile ppc: there is no clear_pages to export powerpc/64: whitelist unresolved modversions CRCs kbuild: -ffunction-sections fix for archs with conflicting sections kbuild: add arch specific post-link Makefile kbuild: allow archs to select link dead code/data elimination kbuild: allow architectures to use thin archives instead of ld -r kbuild: Regenerate genksyms lexer kbuild: genksyms fix for typeof handling fixdep: faster CONFIG_ search ia64: move exports to definitions sparc32: debride memcpy.S a bit [sparc] unify 32bit and 64bit string.h sparc: move exports to definitions ppc: move exports to definitions arm: move exports to definitions s390: move exports to definitions m68k: move exports to definitions alpha: move exports to actual definitions x86: move exports to actual definitions ...
2016-10-14Merge branch 'for-4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Nick improved generic implementations of percpu operations which modify the variable and return so that they calculate the physical address only once. - percpu_ref percpu <-> atomic mode switching improvements. The patchset was originally posted about a year ago but fell through the cracks. - misc non-critical fixes. * 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk() mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk() percpu: eliminate two sparse warnings percpu: improve generic percpu modify-return implementation percpu-refcount: init ->confirm_switch member properly percpu_ref: allow operation mode switching operations to be called concurrently percpu_ref: restructure operation mode switching percpu_ref: unify staggered atomic switching wait behavior percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic() percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation
2016-10-14x86: Remove duplicate rtit status MSR macroLongpeng(Mike)
The MSR_IA32_RTIT_STATUS is defined twice, so remove one. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: len.brown@intel.com Cc: peterz@infradead.org Cc: rafael.j.wysocki@intel.com Cc: alexander.shishkin@linux.intel.com Cc: ray.huang@amd.com Cc: Aravind.Gopalakrishnan@amd.com Cc: wu.wubin@huawei.com Cc: srinivas.pandruvada@linux.intel.com Cc: zhaoshenglong@huawei.com Cc: vladimir_zapolskiy@mentor.com Link: http://lkml.kernel.org/r/1476405740-80816-1-git-send-email-longpeng2@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-14x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt()Wanpeng Li
=============================== [ INFO: suspicious RCU usage. ] 4.8.0+ #24 Not tainted ------------------------------- ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! no locks held by swapper/1/0. [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140 [<ffffffff9d06f860>] native_write_msr+0x20/0x30 [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30 [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30 [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0 The reschedule interrupt may be raised while the CPU is idle, which causes the lockdep warning above. Add irq_enter/exit() in smp_reschedule_interrupt(): irq_enter() tells the RCU subsystem to end the extended quiescent state, so the following trace call in ack_APIC_irq() works correctly. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/1476409733-5133-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-11Merge branch 'work.uaccess2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess.h prepwork from Al Viro: "Preparations to tree-wide switch to use of linux/uaccess.h (which, obviously, will allow to start unifying stuff for real). The last step there, ie PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ `git grep -l "$PATT"|grep -v ^include/linux/uaccess.h` is not taken here - I would prefer to do it once just before or just after -rc1. However, everything should be ready for it" * 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove a stray reference to asm/uaccess.h in docs sparc64: separate extable_64.h, switch elf_64.h to it score: separate extable.h, switch module.h to it mips: separate extable.h, switch module.h to it x86: separate extable.h, switch sections.h to it remove stray include of asm/uaccess.h from cacheflush.h mn10300: remove a bogus processor.h->uaccess.h include xtensa: split uaccess.h into C and asm sides bonding: quit messing with IOCTL kill __kernel_ds_p off mn10300: finish verify_area() off frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h exceptions: detritus removal
2016-10-11Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "Core: - Fence destaging work - DRIVER_LEGACY to split off legacy drm drivers - drm_mm refactoring - Splitting drm_crtc.c into chunks and documenting better - Display info fixes - rbtree support for prime buffer lookup - Simple VGA DAC driver Panel: - Add Nexus 7 panel - More simple panels i915: - Refactoring GEM naming - Refactored vma/active tracking - Lockless request lookups - Better stolen memory support - FBC fixes - SKL watermark fixes - VGPU improvements - dma-buf fencing support - Better DP dongle support amdgpu: - Powerplay for Iceland asics - Improved GPU reset support - UVD/VEC powergating support for CZ/ST - Preinitialised VRAM buffer support - Virtual display support - Initial SI support - GTT rework - PCI shutdown callback support - HPD IRQ storm fixes amdkfd: - bugfixes tilcdc: - Atomic modesetting support mediatek: - AAL + GAMMA engine support - Hook up gamma LUT - Temporal dithering support imx: - Pixel clock from devicetree - drm bridge support for LVDS bridges - active plane reconfiguration - VDIC deinterlacer support - Frame synchronisation unit support - Color space conversion support analogix: - PSR support - Better panel on/off support rockchip: - rk3399 vop/crtc support - PSR support vc4: - Interlaced vblank timing - 3D rendering CPU overhead reduction - HDMI output fixes tda998x: - HDMI audio ASoC support sunxi: - Allwinner A33 support - better TCON support msm: - DT binding cleanups - Explicit fence-fd support sti: - remove sti415/416 support etnaviv: - MMUv2 refactoring - GC3000 support exynos: - Refactoring HDMI DCC/PHY - G2D pm regression fix - Page fault issues with wait for vblank There is no nouveau work in this tree, as Ben didn't get a pull request in, and he was fighting moving to atomic and adding mst support, so maybe best it waits for a cycle" * tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits) drm/crtc: constify drm_crtc_index parameter drm/i915: Fix conflict 
resolution from backmerge of v4.8-rc8 to drm-next drm/i915/guc: Unwind GuC workqueue reservation if request construction fails drm/i915: Reset the breadcrumbs IRQ more carefully drm/i915: Force relocations via cpu if we run out of idle aperture drm/i915: Distinguish last emitted request from last submitted request drm/i915: Allow DP to work w/o EDID drm/i915: Move long hpd handling into the hotplug work drm/i915/execlists: Reinitialise context image after GPU hang drm/i915: Use correct index for backtracking HUNG semaphores drm/i915: Unalias obj->phys_handle and obj->userptr drm/i915: Just clear the mmiodebug before a register access drm/i915/gen9: only add the planes actually affected by ddb changes drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED drm/i915/bxt: Fix HDMI DPLL configuration drm/i915/gen9: fix the watermark res_blocks value drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations drm/i915/gen9: minimum scanlines for Y tile is not always 4 drm/i915/gen9: fix the WaWmMemoryReadLatency implementation drm/i915/kbl: KBL also needs to run the SAGV code ...