summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2014-04-24kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotationMasami Hiramatsu
Use NOKPROBE_SYMBOL macro for protecting functions from kprobes instead of __kprobes annotation under arch/x86. This applies nokprobe_inline annotation for some cases, because NOKPROBE_SYMBOL() will inhibit inlining by referring the symbol address. This just folds a bunch of previous NOKPROBE_SYMBOL() cleanup patches for x86 to one patch. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes, x86: Allow kprobes on text_poke/hw_breakpointMasami Hiramatsu
Allow kprobes on text_poke/hw_breakpoint because those are not related to the critical int3-debug recursive path of kprobes at this moment. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/20140417081807.26341.73219.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes/x86: Allow probe on some kprobe preparation functionsMasami Hiramatsu
There is no need to prohibit probing on the functions used in preparation phase. Those are safely probed because those are not invoked from breakpoint/fault/debug handlers, there is no chance to cause recursive exceptions. Following functions are now removed from the kprobes blacklist: can_boost can_probe can_optimize is_IF_modifier __copy_instruction copy_optimized_instructions arch_copy_kprobe arch_prepare_kprobe arch_arm_kprobe arch_disarm_kprobe arch_remove_kprobe arch_trampoline_kprobe arch_prepare_kprobe_ftrace arch_prepare_optimized_kprobe arch_check_optimized_kprobe arch_within_optimized_kprobe __arch_remove_optimized_kprobe arch_remove_optimized_kprobe arch_optimize_kprobes arch_unoptimize_kprobe I tested those functions by putting kprobes on all instructions in the functions with the bash script I sent to LKML. See: https://lkml.org/lkml/2014/3/27/33 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Link: http://lkml.kernel.org/r/20140417081747.26341.36065.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes, x86: Call exception_enter after kprobes handledMasami Hiramatsu
Move exception_enter() call after kprobes handler is done. Since the exception_enter() involves many other functions (like printk), it can cause recursive int3/break loop when kprobes probe such functions. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081740.26341.10894.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes/x86: Call exception handlers directly from do_int3/do_debugMasami Hiramatsu
To avoid a kernel crash by probing on lockdep code, call kprobe_int3_handler() and kprobe_debug_handler()(which was formerly called post_kprobe_handler()) directly from do_int3 and do_debug. Currently kprobes uses notify_die() to hook the int3/debug exceptoins. Since there is a locking code in notify_die, the lockdep code can be invoked. And because the lockdep involves printk() related things, theoretically, we need to prohibit probing on such code, which means much longer blacklist we'll have. Instead, hooking the int3/debug for kprobes before notify_die() can avoid this problem. Anyway, most of the int3 handlers in the kernel are already called from do_int3 directly, e.g. ftrace_int3_handler, poke_int3_handler, kgdb_ll_trap. Actually only kprobe_exceptions_notify is on the notifier_call_chain. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081733.26341.24423.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes, x86: Prohibit probing on thunk functions and restoreMasami Hiramatsu
thunk/restore functions are also used for tracing irqoff etc. and those are involved in kprobe's exception handling. Prohibit probing on them to avoid kernel crash. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140417081726.26341.3872.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes, x86: Prohibit probing on native_set_debugreg()/load_idt()Masami Hiramatsu
Since the kprobes uses do_debug for single stepping, functions called from do_debug() before notify_die() must not be probed. And also native_load_idt() is called from paranoid_exit when returning int3, this also must not be probed. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20140417081719.26341.65542.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes, x86: Prohibit probing on debug_stack_*()Masami Hiramatsu
Prohibit probing on debug_stack_reset and debug_stack_set_zero. Since the both functions are called from TRACE_IRQS_ON/OFF_DEBUG macros which run in int3 ist entry, probing it may cause a soft lockup. This happens when the kernel built with CONFIG_DYNAMIC_FTRACE=y and CONFIG_TRACE_IRQFLAGS=y. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Beulich <JBeulich@suse.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081712.26341.32994.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklistMasami Hiramatsu
Introduce NOKPROBE_SYMBOL() macro which builds a kprobes blacklist at kernel build time. The usage of this macro is similar to EXPORT_SYMBOL(), placed after the function definition: NOKPROBE_SYMBOL(function); Since this macro will inhibit inlining of static/inline functions, this patch also introduces a nokprobe_inline macro for static/inline functions. In this case, we must use NOKPROBE_SYMBOL() for the inline function caller. When CONFIG_KPROBES=y, the macro stores the given function address in the "_kprobe_blacklist" section. Since the data structures are not fully initialized by the macro (because there is no "size" information), those are re-initialized at boot time by using kallsyms. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081705.26341.96719.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Alok Kataria <akataria@vmware.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christopher Li <sparse@chrisli.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: David S. Miller <davem@davemloft.net> Cc: Jan-Simon Möller <dl9pf@gmx.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-sparse@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes: Prohibit probing on .entry.text codeMasami Hiramatsu
.entry.text is a code area which is used for interrupt/syscall entries, which includes many sensitive code. Thus, it is better to prohibit probing on all of such code instead of a part of that. Since some symbols are already registered on kprobe blacklist, this also removes them from the blacklist. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081658.26341.57354.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24kprobes/x86: Allow to handle reentered kprobe on single-steppingMasami Hiramatsu
Since the NMI handlers(e.g. perf) can interrupt in the single stepping (or preparing the single stepping, do_debug etc.), we should consider a kprobe is hit in the NMI handler. Even in that case, the kprobe is allowed to be reentered as same as the kprobes hit in kprobe handlers (KPROBE_HIT_ACTIVE or KPROBE_HIT_SSDONE). The real issue will happen when a kprobe hit while another reentered kprobe is processing (KPROBE_REENTER), because we already consumed a saved-area for the previous kprobe. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Link: http://lkml.kernel.org/r/20140417081651.26341.10593.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-22Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso fix from Peter Anvin: "This is a single build fix for building with gold as opposed to GNU ld. It got queued up separately and was expected to be pushed during the merge window, but it got left behind" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, vdso: Make the vdso linker script compatible with Gold
2014-04-19Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "This fixes the preemption-count imbalance crash reported by Owen Kibel" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Fix CMCI preemption bugs
2014-04-19Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two kernel side fixes: - an Intel uncore PMU driver potential crash fix - a kprobes/perf-call-graph interaction fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU kprobes/x86: Fix page-fault handling logic
2014-04-18perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMUVenkatesh Srinivas
CPUs which should support the RAPL counters according to Family/Model/Stepping may still issue #GP when attempting to access the RAPL MSRs. This may happen when Linux is running under KVM and we are passing-through host F/M/S data, for example. Use rdmsrl_safe to first access the RAPL_POWER_UNIT MSR; if this fails, do not attempt to use this PMU. Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com Cc: zheng.z.yan@intel.com Cc: eranian@google.com Cc: ak@linux.intel.com Cc: linux-kernel@vger.kernel.org [ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17Merge tag 'stable/for-linus-3.15-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from David Vrabel: "Xen regression and bug fixes for 3.15-rc1: - fix completely broken 32-bit PV guests caused by x86 refactoring 32-bit thread_info. - only enable ticketlock slow path on Xen (not bare metal) - fix two bugs with PV guests not shutting down when requested - fix a minor memory leak in xen-pciback error path" * tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/manage: Poweroff forcefully if user-space is not yet up. xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart. xen/spinlock: Don't enable them unconditionally. xen-pciback: silence an unwanted debug printk xen: fix memory leak in __xen_pcibk_add_pci_dev() x86/xen: Fix 32-bit PV guests's usage of kernel_stack
2014-04-17kprobes/x86: Fix page-fault handling logicMasami Hiramatsu
Current kprobes in-kernel page fault handler doesn't expect that its single-stepping can be interrupted by an NMI handler which may cause a page fault(e.g. perf with callback tracing). In that case, the page-fault handled by kprobes and it misunderstands the page-fault has been caused by the single-stepping code and tries to recover IP address to probed address. But the truth is the page-fault has been caused by the NMI handler, and do_page_fault failes to handle real page fault because the IP address is modified and causes Kernel BUGs like below. ---- [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40 To handle this correctly, I fixed the kprobes fault handler to ensure the faulted ip address is its own single-step buffer instead of checking current kprobe state. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: fche@redhat.com Cc: systemtap@sourceware.org Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17x86/mce: Fix CMCI preemption bugsIngo Molnar
The following commit: 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms") Added two preemption bugs: - machine_check_poll() does a get_cpu_var() without a matching put_cpu_var(), which causes preemption imbalance and crashes upon bootup. - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. To fix these bugs fix the imbalance and change cmci_discover_lock to a regular spinlock. Reported-by: Owen Kibel <qmewlo@gmail.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chen, Gong <gong.chen@linux.intel.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Todorov <atodorov@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> -- arch/x86/kernel/cpu/mcheck/mce.c | 4 +--- arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++--------- 2 files changed, 10 insertions(+), 12 deletions(-)
2014-04-16Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Various fixes: - reboot regression fix - build message spam fix - GPU quirk fix - 'make kvmconfig' fix plus the wire-up of the renameat2() system call on i386" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove the PCI reboot method from the default chain x86/build: Supress "Nothing to be done for ..." messages x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks x86/platform: Fix "make O=dir kvmconfig" i386: Wire up the renameat2() syscall
2014-04-16Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Tooling fixes, plus a simple hardware-enablement patch for the Intel RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is still fine at this stage" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Instead of redirecting flex output, use -o perf tools: Fix double free in perf test 21 (code-reading.c) perf stat: Initialize statistics correctly perf bench: Set more defaults in the 'numa' suite perf bench: Fix segfault at the end of an 'all' execution perf bench: Update manpage to mention numa and futex perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi() perf probe: Fix to handle errors in line_range searching perf probe: Fix --line option behavior perf tools: Pick up libdw without explicit LIBDW_DIR MAINTAINERS: Change e-mail to kernel.org one perf callchains: Disable unwind libraries when libelf isn't found tools lib traceevent: Do not call warning() directly tools lib traceevent: Print event name when show warning if possible perf top: Fix documentation of invalid -s option perf/x86: Enable DRAM RAPL support on Intel Haswell
2014-04-16x86: Remove the PCI reboot method from the default chainIngo Molnar
Steve reported a reboot hang and bisected it back to this commit: a4f1987e4c54 x86, reboot: Add EFI and CF9 reboot methods into the default list He heroically tested all reboot methods and found the following: reboot=t # triple fault ok reboot=k # keyboard ctrl FAIL reboot=b # BIOS ok reboot=a # ACPI FAIL reboot=e # EFI FAIL [system has no EFI] reboot=p # PCI 0xcf9 FAIL And I think it's pretty obvious that we should only try PCI 0xcf9 as a last resort - if at all. The other observation is that (on this box) we should never try the PCI reboot method, but close with either the 'triple fault' or the 'BIOS' (terminal!) reboot methods. Thirdly, CF9_COND is a total misnomer - it should be something like CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ... So this patch fixes the worst problems: - it orders the actual reboot logic to follow the reboot ordering pattern - it was in a pretty random order before for no good reason. - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and BOOT_CF9_SAFE flags to make the code more obvious. - it tries the BIOS reboot method before the PCI reboot method. (Since 'BIOS' is a terminal reboot method resulting in a hang if it does not work, this is essentially equivalent to removing the PCI reboot method from the default reboot chain.) - just for the miraculous possibility of terminal (resulting in hang) reboot methods of triple fault or BIOS returning without having done their job, there's an ordering between them as well. Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Li Aubrey <aubrey.li@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-15xen/spinlock: Don't enable them unconditionally.Konrad Rzeszutek Wilk
The git commit a945928ea2709bc0e8e8165d33aed855a0110279 ('xen: Do not enable spinlocks before jump_label_init() has executed') was added to deal with the jump machinery. Earlier the code that turned on the jump label was only called by Xen specific functions. But now that it had been moved to the initcall machinery it gets called on Xen, KVM, and baremetal - ouch!. And the detection machinery to only call it on Xen wasn't remembered in the heat of merge window excitement. This means that the slowpath is enabled on baremetal while it should not be. Reported-by: Waiman Long <waiman.long@hp.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> CC: stable@vger.kernel.org CC: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-15x86/xen: Fix 32-bit PV guests's usage of kernel_stackBoris Ostrovsky
Commit 198d208df4371734ac4728f69cb585c284d20a15 ("x86: Keep thread_info on thread stack in x86_32") made 32-bit kernels use kernel_stack to point to thread_info. That change missed a couple of updates needed by Xen's 32-bit PV guests: 1. kernel_stack needs to be initialized for secondary CPUs 2. GET_THREAD_INFO() now uses %fs register which may not be the kernel's version when executing xen_iret(). With respect to the second issue, we don't need GET_THREAD_INFO() anymore: we used it as an intermediate step to get to per_cpu xen_vcpu and avoid referencing %fs. Now that we are going to use %fs anyway we may as well go directly to xen_vcpu. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-04-14Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Marcelo Tosatti: - Fix for guest triggerable BUG_ON (CVE-2014-0155) - CR4.SMAP support - Spurious WARN_ON() fix * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: remove WARN_ON from get_kernel_ns() KVM: Rename variable smep to cr4_smep KVM: expose SMAP feature to guest KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode KVM: Add SMAP support when setting CR4 KVM: Remove SMAP bit from CR4_RESERVED_BITS KVM: ioapic: try to recover if pending_eoi goes out of range KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14KVM: x86: remove WARN_ON from get_kernel_ns()Marcelo Tosatti
Function and callers can be preempted. https://bugzilla.kernel.org/show_bug.cgi?id=73721 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-14KVM: Rename variable smep to cr4_smepFeng Wu
Rename variable smep to cr4_smep, which can better reflect the meaning of the variable. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14KVM: expose SMAP feature to guestFeng Wu
This patch exposes SMAP feature to guest Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14KVM: Disable SMAP for guests in EPT realmode and EPT unpaging modeFeng Wu
SMAP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMAP needs to be manually disabled when guest switches to non-paging mode. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14KVM: Add SMAP support when setting CR4Feng Wu
This patch adds SMAP handling logic when setting CR4 for guests Thanks a lot to Paolo Bonzini for his suggestion to use the branchless way to detect SMAP violation. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14KVM: Remove SMAP bit from CR4_RESERVED_BITSFeng Wu
This patch removes SMAP bit from CR4_RESERVED_BITS. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-14Merge tag 'v3.15-rc1' into perf/urgentIngo Molnar
Pick up the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14x86/build: Supress "Nothing to be done for ..." messagesMasahiro Yamada
When we build an already built kernel again, arch/x86/syscalls/Makefile and arch/x86/tools/Makefile emits "Nothing to be done for ..." messages. Here is the command log: $ make defconfig [ snip ] $ make [ snip ] $ make make[1]: Nothing to be done for `all'. <----- make[1]: Nothing to be done for `relocs'. <----- CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h Besides not emitting those, "all" and "relocs" should be added to PHONY as well. Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com> Acked-by: Peter Foley <pefoley2@pefoley.com> Acked-by: Michal Marek <mmarek@suse.cz> Link: http://lkml.kernel.org/r/1397093742-11144-1-git-send-email-yamada.m@jp.panasonic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirksVille Syrjälä
Have the KB(),MB(),GB() macros produce unsigned longs to avoid unintended sign extension issues with the gen2 memory size detection. What happens is first the uint8_t returned by read_pci_config_byte() gets promoted to an int which gets multiplied by another int from the MB() macro, and finally the result gets sign extended to size_t. Although this shouldn't be a problem in practice as all affected gen2 platforms are 32bit AFAIK, so size_t will be 32 bits. Reported-by: Bjorn Helgaas <bhelgaas@google.com> Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1397382303-17525-1-git-send-email-ville.syrjala@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14x86/platform: Fix "make O=dir kvmconfig"Antonio Borneo
Running: make O=dir x86_64_defconfig make O=dir kvmconfig the second command dirties the source tree with file ".config", symlink "source" and objects in folder "scripts". Fixed by using properly prefixed paths in the arch Makefile. Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1397377568-8375-1-git-send-email-borneo.antonio@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-12Merge tag 'llvmlinux-for-v3.15' of ↵Linus Torvalds
git://git.linuxfoundation.org/llvmlinux/kernel Pull llvm patches from Behan Webster: "These are some initial updates to support compiling the kernel with clang. These patches have been through the proper reviews to the best of my ability, and have been soaking in linux-next for a few weeks. These patches by themselves still do not completely allow clang to be used with the kernel code, but lay the foundation for other patches which are still under review. Several other of the LLVMLinux patches have been already added via maintainer trees" * tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel: x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id" x86 kbuild: LLVMLinux: More cc-options added for clang x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h LLVMLinux: Remove warning about returning an uninitialized variable kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang Documentation: LLVMLinux: Update Documentation/dontdiff kbuild: LLVMLinux: Adapt warnings for compilation with clang kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
2014-04-12Merge git://git.infradead.org/users/eparis/auditLinus Torvalds
Pull audit updates from Eric Paris. * git://git.infradead.org/users/eparis/audit: (28 commits) AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range audit: do not cast audit_rule_data pointers pointlesly AUDIT: Allow login in non-init namespaces audit: define audit_is_compat in kernel internal header kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c sched: declare pid_alive as inline audit: use uapi/linux/audit.h for AUDIT_ARCH declarations syscall_get_arch: remove useless function arguments audit: remove stray newline from audit_log_execve_info() audit_panic() call audit: remove stray newlines from audit_log_lost messages audit: include subject in login records audit: remove superfluous new- prefix in AUDIT_LOGIN messages audit: allow user processes to log from another PID namespace audit: anchor all pid references in the initial pid namespace audit: convert PPIDs to the inital PID namespace. pid: get pid_t ppid of task in init_pid_ns audit: rename the misleading audit_get_context() to audit_take_context() audit: Add generic compat syscall support audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL ...
2014-04-11i386: Wire up the renameat2() syscallMiklos Szeredi
The renameat2() system call was only wired up for x86-64. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Link: http://lkml.kernel.org/r/1397211951-20549-2-git-send-email-miklos@szeredi.hu Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-11Merge tag 'pm+acpi-3.15-rc1-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management fixes and updates from Rafael Wysocki: "This is PM and ACPI material that has emerged over the last two weeks and one fix for a CPU hotplug regression introduced by the recent CPU hotplug notifiers registration series. Included are intel_idle and turbostat updates from Len Brown (these have been in linux-next for quite some time), a new cpufreq driver for powernv (that might spend some more time in linux-next, but BenH was asking me so nicely to push it for 3.15 that I couldn't resist), some cpufreq fixes and cleanups (including fixes for some silly breakage in a couple of cpufreq drivers introduced during the 3.14 cycle), assorted ACPI cleanups, wakeup framework documentation fixes, a new sysfs attribute for cpuidle and a new command line argument for power domains diagnostics. Specifics: - Fix for a recently introduced CPU hotplug regression in ARM KVM from Ming Lei. - Fixes for breakage in the at32ap, loongson2_cpufreq, and unicore32 cpufreq drivers introduced during the 3.14 cycle (-stable material) from Chen Gang and Viresh Kumar. - New powernv cpufreq driver from Vaidyanathan Srinivasan, with bits from Gautham R Shenoy and Srivatsa S Bhat. - Exynos cpufreq driver fix preventing it from being included into multiplatform builds that aren't supported by it from Sachin Kamat. - cpufreq cleanups related to the usage of the driver_data field in struct cpufreq_frequency_table from Viresh Kumar. - cpufreq ppc driver cleanup from Sachin Kamat. - Intel BayTrail support for intel_idle and ACPI idle from Len Brown. - Intel CPU model 54 (Atom N2000 series) support for intel_idle from Jan Kiszka. - intel_idle fix for Intel Ivy Town residency targets from Len Brown. - turbostat updates (Intel Broadwell support and output cleanups) from Len Brown. - New cpuidle sysfs attribute for exporting C-states' target residency information to user space from Daniel Lezcano. - New kernel command line argument to prevent power domains enabled by the bootloader from being turned off even if they are not in use (for diagnostics purposes) from Tushar Behera. - Fixes for wakeup sysfs attributes documentation from Geert Uytterhoeven. - New ACPI video blacklist entry for ThinkPad Helix from Stephen Chandler Paul. - Assorted ACPI cleanups and a Kconfig help update from Jonghwan Choi, Zhihui Zhang, Hanjun Guo" * tag 'pm+acpi-3.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits) ACPI: Update the ACPI spec information in Kconfig arm, kvm: fix double lock on cpu_add_remove_lock cpuidle: sysfs: Export target residency information cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h cpufreq: create another field .flags in cpufreq_frequency_table cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table cpufreq: don't print value of .driver_data from core cpufreq: ia64: don't set .driver_data to index cpufreq: powernv: Select CPUFreq related Kconfig options for powernv cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids cpufreq: powernv: cpufreq driver for powernv platform cpufreq: at32ap: don't declare local variable as static cpufreq: loongson2_cpufreq: don't declare local variable as static cpufreq: unicore32: fix typo issue for 'clk' cpufreq: exynos: Disable on multiplatform build PM / wakeup: Correct presence vs. emptiness of wakeup_* attributes PM / domains: Add pd_ignore_unused to keep power domains enabled ACPI / dock: Drop dock_device_ids[] table ACPI / video: Favor native backlight interface for ThinkPad Helix ACPI / thermal: Fix wrong variable usage in debug statement ...
2014-04-11Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pullx86 core platform updates from Peter Anvin: "This is the x86/platform branch with the objectionable IOSF patches removed. What is left is proper memory handling for Intel GPUs, and a change to the Calgary IOMMU code which will be required to make kexec work sanely on those platforms after some upcoming kexec changes" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, calgary: Use 8M TCE table size by default x86/gpu: Print the Intel graphics stolen memory range x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms x86/gpu: Add vfunc for Intel graphics stolen memory base address
2014-04-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a collection of minor fixes for x86, plus the IRET information leak fix (forbid the use of 16-bit segments in 64-bit mode)" NOTE! We may have to relax the "forbid the use of 16-bit segments in 64-bit mode" part, since there may be people who still run and depend on 16-bit Windows binaries under Wine. But I'm taking this in the current unconditional form for now to see who (if anybody) screams bloody murder. Maybe nobody cares. And maybe we'll have to update it with some kind of runtime enablement (like our vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already need to tweak). * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels efi: Pass correct file handle to efi_file_{read,close} x86/efi: Correct EFI boot stub use of code32_start x86/efi: Fix boot failure with EFI stub x86/platform/hyperv: Handle VMBUS driver being a module x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround x86, CMCI: Add proper detection of end of CMCI storms
2014-04-11x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernelsH. Peter Anvin
The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 32-bit mode. Since 16-bit support is somewhat crippled anyway on a 64-bit kernel (no V86 mode), and most (if not quite all) 64-bit processors support virtualization for the users who really need it, simply reject attempts at creating a 16-bit segment when running on top of a 64-bit kernel. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org Cc: <stable@vger.kernel.org>
2014-04-11Merge tag 'efi-urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: "* Fix EFI boot regression introduced during the merge window where the firmware was reading random values from the stack because we were passing a pointer to the wrong object type. * Kernel corruption has been reported when booting with the EFI boot stub which was tracked down to setting a bogus value for bp->hdr.code32_start, resulting in corruption during relocation. * Olivier Martin reported that the wrong file handles were being passed to efi_file_(read|close), which works for x86 by luck due to the way that the FAT driver is implemented, but doesn't work on ARM." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-10x86, calgary: Use 8M TCE table size by defaultWANG Chao
New kexec-tools wants to pass kdump kernel needed memmap via E820 directly, instead of memmap=exactmap. This makes saved_max_pfn not be passed down to 2nd kernel. To keep 1st kernel and 2nd kernel using the same TCE table size, Muli suggest to hard code the size to max (8M). We can't get rid of saved_max_pfn this time, for backward compatibility with old first kernel and new second kernel. However new first kernel and old second kernel can not work unfortunately. v2->v1: - retain saved_max_pfn so new 2nd kernel can work with old 1st kernel from Vivek Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Muli Ben-Yehuda <mulix@mulix.org> Acked-by: Jon Mason <jdmason@kudzu.us> Link: http://lkml.kernel.org/r/1394463120-26999-1-git-send-email-chaowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-10efi: Pass correct file handle to efi_file_{read,close}Matt Fleming
We're currently passing the file handle for the root file system to efi_file_read() and efi_file_close(), instead of the file handle for the file we wish to read/close. While this has worked up until now, it seems that it has only been by pure luck. Olivier explains, "The issue is the UEFI Fat driver might return the same function for 'fh->read()' and 'h->read()'. While in our case it does not work with a different implementation of EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. In our case, we return a different pointer when reading a directory and reading a file." Fixing this actually clears up the two functions because we can drop one of the arguments, and instead only pass a file 'handle' argument. Reported-by: Olivier Martin <olivier.martin@arm.com> Reviewed-by: Olivier Martin <olivier.martin@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-10x86/efi: Correct EFI boot stub use of code32_startMatt Fleming
code32_start should point at the start of the protected mode code, and *not* at the beginning of the bzImage. This is much easier to do in assembly so document that callers of make_boot_params() need to fill out code32_start. The fallout from this bug is that we would end up relocating the image but copying the image at some offset, resulting in what appeared to be memory corruption. Reported-by: Thomas Bächler <thomas@archlinux.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-10x86/efi: Fix boot failure with EFI stubMatt Fleming
commit 54b52d872680 ("x86/efi: Build our own EFI services pointer table") introduced a regression because the 64-bit file_size() implementation passed a pointer to a 32-bit data object, instead of a pointer to a 64-bit object. Because the firmware treats the object as 64-bits regardless it was reading random values from the stack for the upper 32-bits. This resulted in people being unable to boot their machines, after seeing the following error messages, Failed to get file info size Failed to alloc highmem for files Reported-by: Dzmitry Sledneu <dzmitry.sledneu@gmail.com> Reported-by: Koen Kooi <koen@dominion.thruhere.net> Tested-by: Koen Kooi <koen@dominion.thruhere.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-04-09x86 kbuild: LLVMLinux: More cc-options added for clangJan-Simon Möller
Protect more options for x86 with cc-option so that we don't get errors when using clang instead of gcc. Add more or different options when using clang as well. Also need to enforce that SSE is off for clang and the stack is 8-byte aligned. Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Mark Charlebois <charlebm@gmail.com>
2014-04-08arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()David Rientjes
Kmemcheck should use the preferred interface for parsing command line arguments, kstrto*(), rather than sscanf() itself. Use it appropriately. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08Merge branch 'pm-cpuidle'Rafael J. Wysocki
* pm-cpuidle: cpuidle: sysfs: Export target residency information intel_idle: fine-tune IVT residency targets tools/power turbostat: Run on Broadwell tools/power turbostat: simplify output, add Avg_MHz intel_idle: Add CPU model 54 (Atom N2000 series) intel_idle: support Bay Trail intel_idle: allow sparse sub-state numbering, for Bay Trail ACPI idle: permit sparse C-state sub-state numbers
2014-04-08Merge branch 'release' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-cpuidle Pull intel_idle and turbostat material for v3.15-rc1 from Len Brown. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: intel_idle: fine-tune IVT residency targets tools/power turbostat: Run on Broadwell tools/power turbostat: simplify output, add Avg_MHz intel_idle: Add CPU model 54 (Atom N2000 series) intel_idle: support Bay Trail intel_idle: allow sparse sub-state numbering, for Bay Trail ACPI idle: permit sparse C-state sub-state numbers