path: root/arch/x86/kernel
Age  Commit message  Author
2014-05-04  x86, espfix: Make espfix64 a Kconfig option, fix UML  (H. Peter Anvin)
Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
2014-05-03  Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull irq fixes from Thomas Gleixner: "This update delivers:
 - A fix for dynamic interrupt allocation on x86, which is required to exclude the GSI interrupts from the dynamically allocatable range. This was detected with the newfangled tablet SoCs, which have GPIOs and therefore allocate a range of interrupts. The MSI allocations already excluded the GSI range, so we never noticed before.
 - The last missing set_irq_affinity() repair, which was delayed due to testing issues
 - A few bug fixes for the armada SoC interrupt controller
 - A memory allocation fix for the TI crossbar interrupt controller
 - A trivial kernel-doc warning fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: irq-crossbar: Not allocating enough memory
  irqchip: armanda: Sanitize set_irq_affinity()
  genirq: x86: Ensure that dynamic irq allocation does not conflict
  linux/interrupt.h: fix new kernel-doc warnings
  irqchip: armada-370-xp: Fix releasing of MSIs
  irqchip: armada-370-xp: implement the ->check_device() msi_chip operation
  irqchip: armada-370-xp: fix invalid cast of signed value into unsigned variable
2014-05-02  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 fixes from Peter Anvin: "Two very small changes: one fix for the vSMP Foundation platform, and one to help LLVM not choke on options it doesn't understand (although it probably should)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vsmp: Fix irq routing
  x86: LLVMLinux: Wrap -mno-80387 with cc-option
2014-05-01  x86, espfix: Move espfix definitions into a separate header file  (H. Peter Anvin)
Sparse warns that the percpu variables aren't declared before they are defined. Rather than hacking around it, move espfix definitions into a proper header file. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-04-30  x86-32, espfix: Remove filter for espfix32 due to race  (H. Peter Anvin)
It is not safe to use LAR to filter when to go down the espfix path, because the LDT is per-process (rather than per-thread) and another thread might change the descriptors behind our back. Fortunately it is always *safe* (if a bit slow) to go down the espfix path, and a 32-bit LDT stack segment is extremely rare. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-30  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack  (H. Peter Anvin)
The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode.
In checkin b3b42ac2cbae ("x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels") we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work.
This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF, which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.)
Special thanks to:
 - Andy Lutomirski, for the suggestion of using very small stack slots and copying (as opposed to mapping) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF.
 - Konrad Wilk for paravirt fixup and testing.
 - Borislav Petkov for testing help and useful comments.
Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # consider after upstream merge
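A simplified sketch of the aliasing arithmetic described above (the constants and helper name are illustrative, not the actual espfix_64.c layout):

    /*
     * Sketch: each CPU owns a tiny IRET "ministack"; the backing page is
     * aliased so that bits 31:16 of every ministack address are constant,
     * so the 16 bits that IRET fails to restore reveal no kernel state.
     */
    #define MINISTACK_SIZE	64UL			/* room for the IRET frame (illustrative) */
    #define ESPFIX_BASE_ADDR	0xffffff0000000000UL	/* illustrative base */

    static unsigned long espfix_stack_top(unsigned int cpu)
    {
    	unsigned long off = (unsigned long)cpu * MINISTACK_SIZE;

    	/* spread bits 16+ of the offset up to bits 32+, keeping 31:16 fixed */
    	off = (off & 0xffffUL) | ((off & ~0xffffUL) << 16);
    	return ESPFIX_BASE_ADDR + off + MINISTACK_SIZE;
    }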
2014-04-30  uprobes/x86: Simplify riprel_{pre,post}_xol() and make them similar  (Oleg Nesterov)
Ignoring the "correction" logic, riprel_pre_xol() and riprel_post_xol() are very similar but look quite different.
 1. Add the "UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX" check at the start of riprel_pre_xol(), like the same check in riprel_post_xol().
 2. Add the trivial scratch_reg() helper which returns the address of the scratch register pre_xol/post_xol need to change.
 3. Change these functions to use the new helper and avoid copy-and-paste under if/else branches.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30  uprobes/x86: Kill the "autask" arg of riprel_pre_xol()  (Oleg Nesterov)
default_pre_xol_op() passes &current->utask->autask to riprel_pre_xol() and this is just ugly because it still needs to load current->utask to read ->vaddr. Remove this argument, change riprel_pre_xol() to use current->utask. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30  uprobes/x86: Rename *riprel* helpers to make the naming consistent  (Oleg Nesterov)
handle_riprel_insn(), pre_xol_rip_insn() and handle_riprel_post_xol() look confusing and inconsistent. Rename them into riprel_analyze(), riprel_pre_xol(), and riprel_post_xol() respectively. No changes in compiled code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30  uprobes/x86: Cleanup the usage of UPROBE_FIX_IP/UPROBE_FIX_CALL  (Oleg Nesterov)
Now that UPROBE_FIX_IP/UPROBE_FIX_CALL are mutually exclusive we can use a single "fix_ip_or_call" enum instead of 2 fix_* booleans. This way the logic looks more understandable and clean to me. While at it, join "case 0xea" with other "ip is correct" ret/lret cases. Also change default_post_xol_op() to use "else if" for the same reason. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30  uprobes/x86: Kill adjust_ret_addr(), simplify UPROBE_FIX_CALL logic  (Oleg Nesterov)
The only insn which could have both UPROBE_FIX_IP and UPROBE_FIX_CALL was 0xe8 "call relative", and now it is handled by branch_xol_ops. So we can change default_post_xol_op(UPROBE_FIX_CALL) to simply push the address of the next insn == utask->vaddr + insn.length; we just need to record insn.length into the new auprobe->def.ilen member. Note: if/when we teach branch_xol_ops to support jcxz/loopz we can remove the "correction" logic, UPROBE_FIX_IP can use the same address. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
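A hedged sketch of the resulting fixup (field names follow the message; the error handling shown is illustrative):

    /* sketch of the UPROBE_FIX_CALL path in default_post_xol_op() */
    if (auprobe->def.fixups & UPROBE_FIX_CALL) {
    	/* ret-addr = address of the insn following the probed call */
    	unsigned long ret = utask->vaddr + auprobe->def.ilen;

    	if (push_ret_address(regs, ret))
    		return -ERESTART;	/* let the probed insn be restarted */
    }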
2014-04-30  uprobes/x86: Introduce push_ret_address()  (Oleg Nesterov)
Extract the "push return address" code from branch_emulate_op() into the new simple helper, push_ret_address(). It will have more users. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
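A minimal sketch of such a helper, modeled on the description above (details may differ from the merged code):

    static int push_ret_address(struct pt_regs *regs, unsigned long ret)
    {
    	unsigned long new_sp = regs->sp - sizeof_long();

    	/* write the return address to the user stack before committing %sp */
    	if (copy_to_user((void __user *)new_sp, &ret, sizeof_long()))
    		return -EFAULT;

    	regs->sp = new_sp;
    	return 0;
    }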
2014-04-30  uprobes/x86: Cleanup the usage of arch_uprobe->def.fixups, make it u8  (Oleg Nesterov)
handle_riprel_insn() assumes that nobody else could have modified ->fixups before it runs. This is correct but fragile; change it to use "|=". Also make ->fixups u8, since we are going to add new members into the union. It is not clear why UPROBE_FIX_RIP_.X lived in the upper byte; redefine them so that they can fit into u8. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30  uprobes/x86: Move default_xol_ops's data into arch_uprobe->def  (Oleg Nesterov)
Finally we can move arch_uprobe->fixups/rip_rela_target_address into the new "def" struct and place this struct in the union; they are only used by the default_xol_ops paths. The patch also renames rip_rela_target_address to riprel_target just to make this name shorter. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30  uprobes/x86: Move UPROBE_FIX_SETF logic from arch_uprobe_post_xol() to default_post_xol_op()  (Oleg Nesterov)
UPROBE_FIX_SETF is only needed to handle "popf" correctly, but it is processed by the generic arch_uprobe_post_xol() code. This doesn't allow us to make ->fixups private for default_xol_ops.
 1. Change default_post_xol_op(UPROBE_FIX_SETF) to set ->saved_tf = T. "popf" always reads the flags from the stack; it doesn't matter if TF was set or not before the single-step. Ignoring the naming, this is even more logical: "saved_tf" means "owned by application", and we do not own this flag after "popf".
 2. Change arch_uprobe_post_xol() to save ->saved_tf into the local "bool send_sigtrap" before ->post_xol().
 3. Change arch_uprobe_post_xol() to ignore UPROBE_FIX_SETF and just check ->saved_tf after ->post_xol().
With this patch ->fixups and ->rip_rela_target_address are only used by the default_xol_ops hooks; we are ready to remove them from the common part of arch_uprobe. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30  uprobes/x86: Don't use arch_uprobe_abort_xol() in arch_uprobe_post_xol()  (Oleg Nesterov)
014940bad8e4 ("uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails") changed arch_uprobe_post_xol() to use arch_uprobe_abort_xol() if ->post_xol fails. This was correct and helped to avoid additional complications; we need to clear X86_EFLAGS_TF in this case. However, now that we have the uprobe_xol_ops->abort() hook it would be better to avoid arch_uprobe_abort_xol() here. ->post_xol() should likely do what ->abort() does anyway; we should not do the same work twice. Currently only handle_riprel_post_xol() can be called twice, which is unnecessary but safe. Still, this is not clean and can lead to problems in the future. Change arch_uprobe_post_xol() to clear X86_EFLAGS_TF and restore ->ip by hand and avoid arch_uprobe_abort_xol(). This temporarily uglifies the usage of autask.saved_tf; we will clean this up later. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30  uprobes/x86: Introduce uprobe_xol_ops->abort() and default_abort_op()  (Oleg Nesterov)
arch_uprobe_abort_xol() calls handle_riprel_post_xol() even if auprobe->ops != default_xol_ops. This is fine correctness-wise: only default_pre_xol_op() can set UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX, and otherwise handle_riprel_post_xol() is a nop. But this doesn't look clean, and it doesn't allow us to move ->fixups into the union in arch_uprobe. Move this handle_riprel_post_xol() call into the new default_abort_op() hook and change arch_uprobe_abort_xol() accordingly. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30  uprobes/x86: Don't change the task's state if ->pre_xol() fails  (Oleg Nesterov)
Currently this doesn't matter, the only ->pre_xol() hook can't fail, but we need to fix arch_uprobe_pre_xol() anyway. If ->pre_xol() fails we should not change regs->ip/flags, we should just return the error to make restart actually possible. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-30  uprobes/x86: Fix is_64bit_mm() with CONFIG_X86_X32  (Oleg Nesterov)
is_64bit_mm() assumes that mm->context.ia32_compat means the 32-bit instruction set; this is not true if the task is TIF_X32. Change set_personality_ia32() to initialize mm->context.ia32_compat with TIF_X32 or TIF_IA32 instead of 1. This allows us to fix is_64bit_mm() without affecting other users; they all treat ia32_compat as "bool". TIF_ in ->ia32_compat looks a bit strange, but this is grep-friendly and avoids adding new defines. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
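A hedged sketch of the fixed check, inferred from the description above (treat as illustrative rather than the exact merged code):

    static inline bool is_64bit_mm(struct mm_struct *mm)
    {
    	/* ia32_compat now holds TIF_IA32, TIF_X32, or 0 */
    	return !config_enabled(CONFIG_IA32_EMULATION) ||
    	       !(mm->context.ia32_compat == TIF_IA32);
    }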
2014-04-30  uprobes/x86: Make good_insns_* depend on CONFIG_X86_*  (Oleg Nesterov)
Add the suitable ifdefs around the good_insns_* arrays. We do not want to add the ugly ifdefs into their only user, uprobe_init_insn(), so the "#else" branch simply defines them as NULL. This doesn't generate extra code; gcc is smart enough, although the code would be fine even if it could not detect that (without CONFIG_IA32_EMULATION) is_64bit_mm() is __builtin_constant_p(). The patch looks more complicated because it also moves good_insns_64 up close to good_insns_32. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30  uprobes/x86: Shift "insn_complete" from branch_setup_xol_ops() to uprobe_init_insn()  (Oleg Nesterov)
Change uprobe_init_insn() to make insn_complete() == T; this makes other insn_get_*() calls unnecessary. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30  uprobes/x86: Add is_64bit_mm(), kill validate_insn_bits()  (Oleg Nesterov)
1. Extract the ->ia32_compat check from the 64-bit validate_insn_bits() into the new helper, is_64bit_mm(); it will have more users.
   TODO: this check is actually wrong if the mm owner is an X32 task; we need another fix which changes set_personality_ia32().
   TODO: even worse, the whole 64-or-32-bit logic is very broken and the fix is not simple; we need nontrivial changes in the core uprobes code.
2. Kill validate_insn_bits() and change its single caller to use uprobe_init_insn(is_64bit_mm(mm)).
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30  uprobes/x86: Add uprobe_init_insn(), kill validate_insn_{32,64}bits()  (Oleg Nesterov)
validate_insn_32bits() and validate_insn_64bits() are very similar, turn them into the single uprobe_init_insn() which has the additional "bool x86_64" argument which can be passed to insn_init() and used to choose between good_insns_64/good_insns_32. Also kill UPROBE_FIX_NONE, it has no users. Note: the current code doesn't use ifdef's consistently, good_insns_64 depends on CONFIG_X86_64 but good_insns_32 is unconditional. This patch removes ifdef around good_insns_64, we will add it back later along with the similar one for good_insns_32. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30  uprobes/x86: Refuse to attach uprobe to "word-sized" branch insns  (Denys Vlasenko)
All branch insns on x86 can be prefixed with the operand-size override prefix, 0x66. It was only ever useful for performing jumps to 32-bit offsets in 16-bit code segments. In 32-bit code, such instructions are useless since they cause IP truncation to 16 bits, and in the case of call insns, they save only 16 bits of return address and misalign the stack pointer as a "bonus". In 64-bit code, such instructions are treated differently by Intel and AMD CPUs: Intel ignores the prefix altogether, AMD treats them the same as in 32-bit mode. Before this patch, the emulation code would execute the instructions as if they had no 0x66 prefix. With this patch, we refuse to attach uprobes to such insns. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
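A hedged sketch of such a refusal check (the helper name is illustrative; the prefix bytes come from lib/insn.c's struct insn):

    /* refuse branch insns that carry the 0x66 operand-size override */
    static bool insn_has_operand_size_prefix(struct insn *insn)
    {
    	int i;

    	for (i = 0; i < insn->prefixes.nbytes; i++) {
    		if (insn->prefixes.bytes[i] == 0x66)
    			return true;
    	}
    	return false;
    }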
2014-04-30  x86: use FDT accessors for FDT blob header data  (Rob Herring)
Remove the direct accesses to FDT header data, using accessor functions instead. This makes the code more readable and makes the FDT blob structure more opaque to the arch code. It also prepares for removing struct boot_param_header completely. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Tested-by: Grant Likely <grant.likely@linaro.org>
2014-04-28  genirq: x86: Ensure that dynamic irq allocation does not conflict  (Thomas Gleixner)
On x86 the allocation of irq descriptors may allocate interrupts which are in the range of the GSI interrupts. That's wrong, as those interrupts are hardwired and we don't have the irq domain translation like PPC. So one of these interrupts can be hooked up later to one of the devices which are hardwired to it, and the io_apic init code for that particular interrupt line happily reuses that descriptor with a completely different configuration, so hell breaks loose. Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs, except for a few usage sites which have not yet blown up in our face for whatever reason. But for drivers which need an irq range, like the GPIO drivers, we have no limit in place and we don't want to expose such a detail to a driver. To cure this, introduce a function which an architecture can implement to impose a lower bound on dynamic interrupt allocations. Implement it for x86 and set the lower bound to nr_gsi_irqs, which is the end of the hardwired interrupt space, so all dynamic allocations happen above. That not only allows the GPIO driver to work sanely, it also protects the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and htirq code. They need to be cleaned up as well, but that's a separate issue. Reported-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Krogerus Heikki <heikki.krogerus@intel.com> Cc: Linus Walleij <linus.walleij@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
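A minimal sketch of the new lower-bound hook (assuming the GSI bound lives in a variable such as nr_gsi_irqs, as the message says; the exact identifier may differ):

    /*
     * Generic code can provide a weak pass-through that simply returns
     * "from"; the x86 override keeps dynamic allocations above the
     * hardwired GSI interrupt space.
     */
    unsigned int arch_dynirq_lower_bound(unsigned int from)
    {
    	return from < nr_gsi_irqs ? nr_gsi_irqs : from;
    }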
2014-04-28  x86/vsmp: Fix irq routing  (Oren Twaig)
Correct IRQ routing in case a vSMP box is detected but the Interrupt Routing Comply (IRC) value is set to "comply", which leads to incorrect IRQ routing.
Before the patch: When a vSMP box was detected and IRC was set to "comply", users (and the kernel) couldn't effectively set the destination of the IRQs. This is because the hook inside vsmp_64.c always set up all CPUs as the IRQ destination, using cpumask_setall() as the return value for the IRQ allocation mask. Later, this "overridden" mask caused the kernel to set the IRQ destination to the lowest online CPU in the mask (usually CPU0).
After the patch: When the IRC is set to "comply", users (and the kernel) can control the destination of the IRQs, as we will not be changing the default "apic->vector_allocation_domain".
Signed-off-by: Oren Twaig <oren@scalemp.com> Acked-by: Shai Fultheim <shai@scalemp.com> Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com [ Minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-25  Merge branch 'perf/urgent' into perf/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation  (Masami Hiramatsu)
Use the NOKPROBE_SYMBOL() macro for protecting functions from kprobes instead of the __kprobes annotation under arch/x86. This applies the nokprobe_inline annotation in some cases, because NOKPROBE_SYMBOL() would inhibit inlining by referring to the symbol address. This just folds a bunch of previous NOKPROBE_SYMBOL() cleanup patches for x86 into one patch. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes, x86: Allow kprobes on text_poke/hw_breakpoint  (Masami Hiramatsu)
Allow kprobes on text_poke/hw_breakpoint because those are not related to the critical int3-debug recursive path of kprobes at this moment. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/20140417081807.26341.73219.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes/x86: Allow probe on some kprobe preparation functions  (Masami Hiramatsu)
There is no need to prohibit probing on the functions used in the preparation phase. They can be probed safely because they are not invoked from the breakpoint/fault/debug handlers, so there is no chance of causing recursive exceptions. The following functions are now removed from the kprobes blacklist:
 can_boost
 can_probe
 can_optimize
 is_IF_modifier
 __copy_instruction
 copy_optimized_instructions
 arch_copy_kprobe
 arch_prepare_kprobe
 arch_arm_kprobe
 arch_disarm_kprobe
 arch_remove_kprobe
 arch_trampoline_kprobe
 arch_prepare_kprobe_ftrace
 arch_prepare_optimized_kprobe
 arch_check_optimized_kprobe
 arch_within_optimized_kprobe
 __arch_remove_optimized_kprobe
 arch_remove_optimized_kprobe
 arch_optimize_kprobes
 arch_unoptimize_kprobe
I tested those functions by putting kprobes on all instructions in the functions with the bash script I sent to LKML. See: https://lkml.org/lkml/2014/3/27/33 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Link: http://lkml.kernel.org/r/20140417081747.26341.36065.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes, x86: Call exception_enter after kprobes handled  (Masami Hiramatsu)
Move the exception_enter() call to after the kprobes handler is done. Since exception_enter() involves many other functions (like printk), it can cause a recursive int3/break loop when kprobes probe such functions. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081740.26341.10894.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes/x86: Call exception handlers directly from do_int3/do_debug  (Masami Hiramatsu)
To avoid a kernel crash caused by probing on lockdep code, call kprobe_int3_handler() and kprobe_debug_handler() (which was formerly called post_kprobe_handler()) directly from do_int3 and do_debug. Currently kprobes uses notify_die() to hook the int3/debug exceptions. Since there is locking code in notify_die, the lockdep code can be invoked. And because lockdep involves printk() related things, theoretically we would need to prohibit probing on such code, which means a much longer blacklist. Instead, hooking int3/debug for kprobes before notify_die() avoids this problem. Anyway, most of the int3 handlers in the kernel are already called from do_int3 directly, e.g. ftrace_int3_handler, poke_int3_handler, kgdb_ll_trap. Actually only kprobe_exceptions_notify is on the notifier_call_chain. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081733.26341.24423.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes, x86: Prohibit probing on native_set_debugreg()/load_idt()  (Masami Hiramatsu)
Since kprobes uses do_debug for single-stepping, functions called from do_debug() before notify_die() must not be probed. Also, native_load_idt() is called from paranoid_exit when returning from int3, so it must not be probed either. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20140417081719.26341.65542.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes, x86: Prohibit probing on debug_stack_*()  (Masami Hiramatsu)
Prohibit probing on debug_stack_reset and debug_stack_set_zero. Since both functions are called from the TRACE_IRQS_ON/OFF_DEBUG macros, which run in the int3 ist entry, probing them may cause a soft lockup. This happens when the kernel is built with CONFIG_DYNAMIC_FTRACE=y and CONFIG_TRACE_IRQFLAGS=y. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Beulich <JBeulich@suse.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081712.26341.32994.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklist  (Masami Hiramatsu)
Introduce the NOKPROBE_SYMBOL() macro, which builds a kprobes blacklist at kernel build time. The usage of this macro is similar to EXPORT_SYMBOL(): it is placed after the function definition: NOKPROBE_SYMBOL(function); Since this macro will inhibit inlining of static/inline functions, this patch also introduces a nokprobe_inline macro for static/inline functions. In this case, we must use NOKPROBE_SYMBOL() for the inline function's caller. When CONFIG_KPROBES=y, the macro stores the given function address in the "_kprobe_blacklist" section. Since the data structures are not fully initialized by the macro (because there is no "size" information), they are re-initialized at boot time by using kallsyms. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20140417081705.26341.96719.stgit@ltc230.yrl.intra.hitachi.co.jp Cc: Alok Kataria <akataria@vmware.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christopher Li <sparse@chrisli.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: David S. Miller <davem@davemloft.net> Cc: Jan-Simon Möller <dl9pf@gmx.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-sparse@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
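A hedged sketch of what such a macro can look like (the generated symbol naming is illustrative, not necessarily the merged definition):

    #ifdef CONFIG_KPROBES
    /* record the function address in the _kprobe_blacklist section at build time */
    # define NOKPROBE_SYMBOL(fname)					\
    	static unsigned long __used				\
    		__attribute__((section("_kprobe_blacklist")))	\
    		_kbl_addr_##fname = (unsigned long)fname;
    #else
    # define NOKPROBE_SYMBOL(fname)
    #endif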
2014-04-24  kprobes: Prohibit probing on .entry.text code  (Masami Hiramatsu)
.entry.text is a code area used for interrupt/syscall entries, and it includes a lot of sensitive code. Thus, it is better to prohibit probing on all of such code instead of only part of it. Since some symbols are already registered on the kprobe blacklist, this also removes them from the blacklist. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081658.26341.57354.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
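A hedged sketch of an address-range check for this (assuming the usual __entry_text_start/__entry_text_end linker symbols; illustrative):

    bool arch_within_kprobe_blacklist(unsigned long addr)
    {
    	/* the traditional .kprobes.text area ... */
    	return (addr >= (unsigned long)__kprobes_text_start &&
    		addr < (unsigned long)__kprobes_text_end) ||
    	/* ... plus the whole .entry.text area */
    	       (addr >= (unsigned long)__entry_text_start &&
    		addr < (unsigned long)__entry_text_end);
    }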
2014-04-24  kprobes/x86: Allow to handle reentered kprobe on single-stepping  (Masami Hiramatsu)
Since NMI handlers (e.g. perf) can interrupt single-stepping (or the preparation for single-stepping, do_debug etc.), we should consider that a kprobe may be hit in an NMI handler. Even in that case, the kprobe is allowed to be reentered, the same as for kprobes hit in kprobe handlers (KPROBE_HIT_ACTIVE or KPROBE_HIT_SSDONE). The real issue happens when a kprobe is hit while another reentered kprobe is being processed (KPROBE_REENTER), because we have already consumed the save area for the previous kprobe. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Lebon <jlebon@redhat.com> Link: http://lkml.kernel.org/r/20140417081651.26341.10593.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24  perf/x86: Fix RAPL rdmsrl_safe() usage  (Stephane Eranian)
This patch fixes a bug introduced by: 24223657806a ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU") The rdmsrl_safe() function returns 0 on success. The current code was failing to detect the RAPL PMU on real hardware (missing /sys/devices/power) because the return value of rdmsrl_safe() was misinterpreted. Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Venkatesh Srinivas <venkateshs@google.com> Cc: peterz@infradead.org Cc: zheng.z.yan@intel.com Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>
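A hedged sketch of the corrected probe (the MSR constant is real; the surrounding code is illustrative):

    u64 msr_rapl_power_unit_bits;

    /* rdmsrl_safe() returns 0 on success, non-zero if the read faults */
    if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &msr_rapl_power_unit_bits))
    	return -1;	/* no RAPL support -- do not register the PMU */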
2014-04-21  ftrace/x86: Fix order of warning messages when ftrace modifies code  (Petr Mladek)
The colon at the end of the printk message suggests that it should get printed before the details printed by ftrace_bug(). When touching the line, let's use the preferred pr_warn() macro as suggested by checkpatch.pl. Link: http://lkml.kernel.org/r/1392650573-3390-5-git-send-email-pmladek@suse.cz Signed-off-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-19  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 fix from Ingo Molnar: "This fixes the preemption-count imbalance crash reported by Owen Kibel"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix CMCI preemption bugs
2014-04-18  arch,x86: Convert smp_mb__*()  (Peter Zijlstra)
x86 is strongly ordered and all its atomic ops imply a full barrier. Implement the two new primitives as the old ones were. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-knswsr5mldkr0w1lrdxvc81w@git.kernel.org Cc: Dave Jones <davej@redhat.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
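A hedged sketch of what this means on x86 (since the atomics already imply a full barrier, only a compiler barrier remains; illustrative):

    /* x86: atomic ops are full barriers, so these degrade to compiler barriers */
    #define smp_mb__before_atomic()	barrier()
    #define smp_mb__after_atomic()	barrier()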
2014-04-18  perf/x86: Export perf_assign_events()  (Yan, Zheng)
Export perf_assign_events() to allow building the perf Intel uncore driver as a module. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1395133004-23205-3-git-send-email-zheng.z.yan@intel.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: eranian@google.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18  Merge branch 'perf/urgent' into perf/core, to pick up PMU driver fixes.  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18  perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU  (Venkatesh Srinivas)
CPUs which should support the RAPL counters according to Family/Model/Stepping may still issue #GP when attempting to access the RAPL MSRs. This may happen when Linux is running under KVM and we are passing-through host F/M/S data, for example. Use rdmsrl_safe to first access the RAPL_POWER_UNIT MSR; if this fails, do not attempt to use this PMU. Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com Cc: zheng.z.yan@intel.com Cc: eranian@google.com Cc: ak@linux.intel.com Cc: linux-kernel@vger.kernel.org [ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-17  uprobes/x86: Emulate relative conditional "near" jmp's  (Oleg Nesterov)
Change branch_setup_xol_ops() to simply use opc1 = OPCODE2(insn) - 0x10 if OPCODE1() == 0x0f; this matches the "short" jmp which checks the same condition. Thanks to lib/insn.c, it does the rest correctly. branch->ilen/offs are correct no matter if this jmp is "near" or "short". Reported-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
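A hedged sketch of that opcode mapping (macro spellings as in the message; the surrounding context is illustrative):

    /* sketch: inside branch_setup_xol_ops() */
    u8 opc1 = OPCODE1(insn);

    if (opc1 == 0x0f) {
    	if (insn->opcode.nbytes != 2)
    		return -ENOSYS;
    	/* near jcc (0x0f 0x80..0x8f) tests the same condition
    	   as the corresponding short jcc (0x70..0x7f) */
    	opc1 = OPCODE2(insn) - 0x10;
    }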
2014-04-17  uprobes/x86: Emulate relative conditional "short" jmp's  (Oleg Nesterov)
Teach branch_emulate_op() to emulate the conditional "short" jmp's which check regs->flags. Note: this doesn't support jcxz/jecxz, loope/loopz, and loopne/loopnz. They all are rel8 and thus they can't trigger the problem, but perhaps we will add the support in the future just for completeness. Reported-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17  uprobes/x86: Emulate relative call's  (Oleg Nesterov)
See the previous "Emulate unconditional relative jmp's" which explains why we can not execute "jmp" out-of-line; the same applies to "call". Emulating the ip-relative call is trivial, we only need to additionally push the ret-address. If this fails, we execute this instruction out of line and this should trigger the trap; the probed application should die, or the same insn will be restarted if a signal handler expands the stack. We do not even need ->post_xol() for this case. But there is a corner (and almost theoretical) case: another thread can expand the stack right before we execute this insn out of line. In this case it hits the same problem we are trying to solve. So we simply turn the probed insn into "call 1f; 1:" and add ->post_xol() which restores ->sp and restarts. Many thanks to Jonathan who finally found the standalone reproducer; otherwise I would never have resolved the "random SIGSEGV's under systemtap" bug-report. Now that the problem is clear we can write the simplified test-case:

    void probe_func(void), callee(void);

    int failed = 1;

    asm (
    	".text\n"
    	".align 4096\n"
    	".globl probe_func\n"
    	"probe_func:\n"
    	"call callee\n"
    	"ret"
    );

    /*
     * This assumes that:
     *
     *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
     *
     *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
     *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
     *
     * so we can target the non-canonical address from xol_vma using
     * the simple math below, 100 * 4096 is just the random offset
     */
    asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n");

    void callee(void)
    {
    	failed = 0;
    }

    int main(void)
    {
    	probe_func();
    	return failed;
    }

It SIGSEGV's if you probe "probe_func" (although this is not very reliable; randomize_va_space/etc can change the placement of the xol area). Note: as Denys Vlasenko pointed out, amd and intel treat "callw" (0x66 0xe8) differently. This patch relies on lib/insn.c and thus implements the intel's behaviour: 0x66 is simply ignored. Fortunately nothing sane should ever use this insn, so we postpone the fix until we decide what should be done; emulate or not, support or not, etc. Reported-by: Jonathan Lebon <jlebon@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17  uprobes/x86: Emulate nop's using ops->emulate()  (Oleg Nesterov)
Finally we can kill the ugly (and very limited) code in __skip_sstep(). Just change branch_setup_xol_ops() to treat "nop" as a jmp to the next insn. Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes "(rep;)+ nop;" at least, and (afaics) much more. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17  uprobes/x86: Emulate unconditional relative jmp's  (Oleg Nesterov)
Currently we always execute all insns out-of-line, including relative jmp's and call's. This assumes that even if regs->ip points to nowhere after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will update it correctly. However, this doesn't work if this regs->ip == xol_vaddr + insn_offset is not canonical. In this case CPU generates #GP and general_protection() kills the task which tries to execute this insn out-of-line. Now that we have uprobe_xol_ops we can teach uprobes to emulate these insns and solve the problem. This patch adds branch_xol_ops which has a single branch_emulate_op() hook, so far it can only handle rel8/32 relative jmp's. TODO: move ->fixup into the union along with rip_rela_target_address. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jonathan Lebon <jlebon@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
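A hedged sketch of the single hook (field names follow the branch_xol_ops description above; illustrative):

    static bool branch_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
    {
    	/* rel8/32 jmp: new ip = next-insn address + relative offset */
    	regs->ip += auprobe->branch.ilen + auprobe->branch.offs;
    	return true;	/* emulated in place; no out-of-line step needed */
    }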