summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2017-10-17x86/fpu: Parse clearcpuid= as early XSAVE argumentAndi Kleen
With a followon patch we want to make clearcpuid affect the XSAVE configuration. But xsave is currently initialized before arguments are parsed. Move the clearcpuid= parsing into the special early xsave argument parsing code. Since clearcpuid= contains a = we need to keep the old __setup around as a dummy, otherwise it would end up as a environment variable in init's environment. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171013215645.23166-4-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-17x86/cpuid: Add generic table for CPUID dependenciesAndi Kleen
Some CPUID features depend on other features. Currently it's possible to to clear dependent features, but not clear the base features, which can cause various interesting problems. This patch implements a generic table to describe dependencies between CPUID features, to be used by all code that clears CPUID. Some subsystems (like XSAVE) had an own implementation of this, but it's better to do it all in a single place for everyone. Then clear_cpu_cap and setup_clear_cpu_cap always look up this table and clear all dependencies too. This is intended to be a practical table: only for features that make sense to clear. If someone for example clears FPU, or other features that are essentially part of the required base feature set, not much is going to work. Handling that is right now out of scope. We're only handling features which can be usefully cleared. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan McDowell <noodles@earth.li> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171013215645.23166-3-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-17x86/vector: Use correct per cpu variable in free_moved_vector()Thomas Gleixner
free_moved_vector() accesses the per cpu vector array with this_cpu_write() to clear the vector. The function has two call sites: 1) The vector cleanup IPI 2) The force_complete_move() code path For #1 this_cpu_write() is correct as it runs on the CPU on which the vector needs to be freed. For #2 this_cpu_write() is wrong because the function is called from an outgoing CPU which is not necessarily the CPU on which the previous vector needs to be freed. As a result it sets the vector on the outgoing CPU to NULL, which is pointless as that CPU does not handle interrupts anymore. What's worse is that it leaves the vector on the previous target CPU in place which later on triggers the BUG_ON(vector) in the vector allocation code when the vector gets reused. That's possible because the bitmap allocator entry of that CPU is freed correctly. Always use the CPU to which the vector was associated and clear the vector entry on that CPU. Fixup the tracepoint as well so it tracks on which CPU the vector gets removed. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Petri Latvala <petri.latvala@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Yu Chen <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710161614430.1973@nanos
2017-10-16x86/platform/UV: Add check of TSC state set by UV BIOSmike.travis@hpe.com
Insert a check early in UV system startup that checks whether BIOS was able to obtain satisfactory TSC Sync stability. If not, it usually is caused by an error in the external TSC clock generation source. In this case the best fallback is to use the builtin hardware RTC as the kernel will not be able to set an accurate TSC sync either. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Andrew Banman <andrew.abanman@hpe.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.406294490@stormcage.americas.sgi.com
2017-10-16x86/tsc: Provide a means to disable TSC ARTmike.travis@hpe.com
On systems where multiple chassis are reset asynchronously, and thus the TSC counters are started asynchronously, the offset needed to convert to TSC to ART would be different. Disable ART in that case and rely on the TSC counters to supply the accurate time. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.289397994@stormcage.americas.sgi.com
2017-10-16x86/tsc: Drastically reduce the number of firmware bug warningsmike.travis@hpe.com
Prior to the TSC ADJUST MSR being available, the method to set TSC's in sync with each other naturally caused a small skew between cpu threads. This was NOT a firmware bug at the time so introducing a whole avalanche of alarming warning messages might cause unnecessary concern and customer complaints. (Example: >3000 msgs in a 32 socket Skylake system.) Simply report the warning condition, if possible do the necessary fixes, and move on. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.175062400@stormcage.americas.sgi.com
2017-10-16x86/tsc: Skip TSC test and error messages if already unstablemike.travis@hpe.com
If the TSC has already been determined to be unstable, then checking TSC ADJUST values is a waste of time and generates unnecessary error messages. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163202.060777495@stormcage.americas.sgi.com
2017-10-16x86/tsc: Add option that TSC on Socket 0 being non-zero is validmike.travis@hpe.com
Add a flag to indicate and process that TSC counters are on chassis that reset at different times during system startup. Therefore which TSC ADJUST values should be zero is not predictable. Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Reviewed-by: Russ Anderson <russ.anderson@hpe.com> Reviewed-by: Andrew Banman <andrew.abanman@hpe.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Bin Gao <bin.gao@linux.intel.com> Link: https://lkml.kernel.org/r/20171012163201.944370012@stormcage.americas.sgi.com
2017-10-16x86/idt: Initialize early IDT before cr4_init_shadow()Thomas Gleixner
Moving the early IDT setup out of assembly code breaks the boot on first generation 486 systems. The reason is that the call of idt_setup_early_handler, which sets up the early handlers was added after the call to cr4_init_shadow(). cr4_init_shadow() tries to read CR4 which is not available on those systems. The accessor function uses a extable fixup to handle the resulting fault. As the IDT is not set up yet, the cr4 read exception causes an instantaneous reboot for obvious reasons. Call idt_setup_early_handler() before cr4_init_shadow() so IDT is set up before the first exception hits. Fixes: 87e81786b13b ("x86/idt: Move early IDT setup out of 32-bit asm") Reported-and-tested-by: Matthew Whitehead <whiteheadm@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710161210290.1973@nanos
2017-10-16x86/cpu/intel_cacheinfo: Remove redundant assignment to 'this_leaf'Colin Ian King
The 'this_leaf' variable is assigned a value that is never read and it is updated a little later with a newer value, hence we can remove the redundant assignment. Cleans up the following Clang warning: Value stored to 'this_leaf' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20171015160203.12332-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A landry list of fixes: - fix reboot breakage on some PCID-enabled system - fix crashes/hangs on some PCID-enabled systems - fix microcode loading on certain older CPUs - various unwinder fixes - extend an APIC quirk to more hardware systems and disable APIC related warning on virtualized systems - various Hyper-V fixes - a macro definition robustness fix - remove jprobes IRQ disabling - various mem-encryption fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Do the family check first x86/mm: Flush more aggressively in lazy TLB mode x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors x86/mm: Disable various instrumentations of mm/mem_encrypt.c and mm/tlb.c x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs x86/unwind: Disable unwinder warnings on 32-bit x86/unwind: Align stack pointer in unwinder dump x86/unwind: Use MSB for frame pointer encoding on 32-bit x86/unwind: Fix dereference of untrusted pointer x86/alternatives: Fix alt_max_short macro to really be a max() x86/mm/64: Fix reboot interaction with CR4.PCIDE kprobes/x86: Remove IRQ disabling from jprobe handlers kprobes/x86: Set up frame pointer in kprobe trampoline
2017-10-14Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Ingo Molnar: "A boot parameter fix, plus a header export fix" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Hide mca_cfg RAS/CEC: Use the right length for "cec_disable"
2017-10-14x86/microcode: Do the family check firstBorislav Petkov
On CPUs like AMD's Geode, for example, we shouldn't even try to load microcode because they do not support the modern microcode loading interface. However, we do the family check *after* the other checks whether the loader has been disabled on the command line or whether we're running in a guest. So move the family checks first in order to exit early if we're being loaded on an unsupported family. Reported-and-tested-by: Sven Glodowski <glodi1@arcor.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.11.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396 Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*'Josh Poimboeuf
Rename the unwinder config options from: CONFIG_ORC_UNWINDER CONFIG_FRAME_POINTER_UNWINDER CONFIG_GUESS_UNWINDER to: CONFIG_UNWINDER_ORC CONFIG_UNWINDER_FRAME_POINTER CONFIG_UNWINDER_GUESS ... in order to give them a more logical config namespace. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-12x86/apic: Update TSC_DEADLINE quirk with additional SKX steppingLen Brown
SKX stepping-3 fixed the TSC_DEADLINE issue in a different ucode version number than stepping-4. Linux needs to know this stepping-3 specific version number to also enable the TSC_DEADLINE on stepping-3. The steppings and ucode versions are documented in the SKX BIOS update: https://downloadmirror.intel.com/26978/eng/ReleaseNotes_R00.01.0004.txt Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/60f2bbf7cf617e212b522e663f84225bfebc50e5.1507756305.git.len.brown@intel.com
2017-10-12x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisorsPaolo Bonzini
Commit 594a30fb1242 ("x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature", 2017-08-30) was also about silencing the warning on VirtualBox; however, KVM does expose the TSC deadline timer, and it's virtualized so that it is immune from CPU errata. Therefore, booting 4.13 with "-cpu Haswell" shows this in the logs: [ 0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0xb2 (or later) Even if you had a hypervisor that does _not_ virtualize the TSC deadline and rather exposes the hardware one, it should be the hypervisors task to update microcode and possibly hide the flag from CPUID. So just hide the message when running on _any_ hypervisor, not just those that do not support the TSC deadline timer. The older check still makes sense, so keep it. Fixes: bd9240a18e ("x86/apic: Add TSC_DEADLINE quirk due to errata") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1507630377-54471-1-git-send-email-pbonzini@redhat.com
2017-10-12x86/apic/vector: Ignore set_affinity call for inactive interruptsThomas Gleixner
The core interrupt code can call the affinity setter for inactive interrupts under certain circumstances. For inactive intererupts which use managed or reservation mode this is a pointless exercise as the activation will assign a vector which fits the destination mask. Check for this and return w/o going through the vector assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-10-12Merge branch 'irq/urgent' into x86/apicThomas Gleixner
Pick up core changes which affect the vector rework.
2017-10-10x86/unwind: Disable unwinder warnings on 32-bitJosh Poimboeuf
x86-32 doesn't have stack validation, so in most cases it doesn't make sense to warn about bad frame pointers. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a69658760800bf281e6353248c23e0fa0acf5230.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10x86/unwind: Align stack pointer in unwinder dumpJosh Poimboeuf
When printing the unwinder dump, the stack pointer could be unaligned, for one of two reasons: - stack corruption; or - GCC created an unaligned stack. There's no way for the unwinder to tell the difference between the two, so we have to assume one or the other. GCC unaligned stacks are very rare, and have only been spotted before GCC 5. Presumably, if we're doing an unwinder stack dump, stack corruption is more likely than a GCC unaligned stack. So always align the stack before starting the dump. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2f540c515946ab09ed267e1a1d6421202a0cce08.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10x86/unwind: Use MSB for frame pointer encoding on 32-bitJosh Poimboeuf
On x86-32, Tetsuo Handa and Fengguang Wu reported unwinder warnings like: WARNING: kernel stack regs at f60bb9c8 in swapper:1 has bad 'bp' value 0ba00000 And also there were some stack dumps with a bunch of unreliable '?' symbols after an apic_timer_interrupt symbol, meaning the unwinder got confused when it tried to read the regs. The cause of those issues is that, with GCC 4.8 (and possibly older), there are cases where GCC misaligns the stack pointer in a leaf function for no apparent reason: c124a388 <acpi_rs_move_data>: c124a388: 55 push %ebp c124a389: 89 e5 mov %esp,%ebp c124a38b: 57 push %edi c124a38c: 56 push %esi c124a38d: 89 d6 mov %edx,%esi c124a38f: 53 push %ebx c124a390: 31 db xor %ebx,%ebx c124a392: 83 ec 03 sub $0x3,%esp ... c124a3e3: 83 c4 03 add $0x3,%esp c124a3e6: 5b pop %ebx c124a3e7: 5e pop %esi c124a3e8: 5f pop %edi c124a3e9: 5d pop %ebp c124a3ea: c3 ret If an interrupt occurs in such a function, the regs on the stack will be unaligned, which breaks the frame pointer encoding assumption. So on 32-bit, use the MSB instead of the LSB to encode the regs. This isn't an issue on 64-bit, because interrupts align the stack before writing to it. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/279a26996a482ca716605c7dbc7f2db9d8d91e81.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10x86/unwind: Fix dereference of untrusted pointerJosh Poimboeuf
Tetsuo Handa and Fengguang Wu reported a panic in the unwinder: BUG: unable to handle kernel NULL pointer dereference at 000001f2 IP: update_stack_state+0xd4/0x340 *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be67 #592 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014 task: bb0b53c0 task.stack: bb3ac000 EIP: update_stack_state+0xd4/0x340 EFLAGS: 00010002 CPU: 0 EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570 ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: fffe0ff0 DR7: 00000400 Call Trace: ? unwind_next_frame+0xea/0x400 ? __unwind_start+0xf5/0x180 ? __save_stack_trace+0x81/0x160 ? save_stack_trace+0x20/0x30 ? __lock_acquire+0xfa5/0x12f0 ? lock_acquire+0x1c2/0x230 ? tick_periodic+0x3a/0xf0 ? _raw_spin_lock+0x42/0x50 ? tick_periodic+0x3a/0xf0 ? tick_periodic+0x3a/0xf0 ? debug_smp_processor_id+0x12/0x20 ? tick_handle_periodic+0x23/0xc0 ? local_apic_timer_interrupt+0x63/0x70 ? smp_trace_apic_timer_interrupt+0x235/0x6a0 ? trace_apic_timer_interrupt+0x37/0x3c ? strrchr+0x23/0x50 Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83 EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f CR2: 00000000000001f2 ---[ end trace 0d147fd4aba8ff50 ]--- Kernel panic - not syncing: Fatal exception in interrupt On x86-32, after decoding a frame pointer to get a regs address, regs_size() dereferences the regs pointer when it checks regs->cs to see if the regs are user mode. This is dangerous because it's possible that what looks like a decoded frame pointer is actually a corrupt value, and we don't want the unwinder to make things worse. Instead of calling regs_size() on an unsafe pointer, just assume they're kernel regs to start with. Later, once it's safe to access the regs, we can do the user mode check and corresponding safety check for the remaining two regs. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 5ed8d8bb38c5 ("x86/unwind: Move common code into update_stack_state()") Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10locking/paravirt: Use new static key for controlling call of virt_spin_lock()Juergen Gross
There are cases where a guest tries to switch spinlocks to bare metal behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this has the downside of falling back to unfair test and set scheme for qspinlocks due to virt_spin_lock() detecting the virtualized environment. Add a static key controlling whether virt_spin_lock() should be called or not. When running on bare metal set the new key to false. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: chrisw@sous-sol.org Cc: hpa@zytor.com Cc: jeremy@goop.org Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170906173625.18158-2-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-09x86/mm/64: Fix reboot interaction with CR4.PCIDEAndy Lutomirski
Trying to reboot via real mode fails with PCID on: long mode cannot be exited while CR4.PCIDE is set. (No, I have no idea why, but the SDM and actual CPUs are in agreement here.) The result is a GPF and a hang instead of a reboot. I didn't catch this in testing because neither my computer nor my VM reboots this way. I can trigger it with reboot=bios, though. Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems") Reported-and-tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/f1e7d965998018450a7a70c2823873686a8b21c0.1507524746.git.luto@kernel.org
2017-10-05x86/mce: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Adjust sanity-check WARN to make sure the triggering timer matches the current CPU timer. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20171005005425.GA23950@beast
2017-10-05x86/mce: Hide mca_cfgBorislav Petkov
Now that lguest is gone, put it in the internal header which should be used only by MCA/RAS code. Add missing header guards while at it. No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20171002092836.22971-3-bp@alien8.de
2017-10-05x86/intel_rdt: Remove redundant assignmentJithu Joseph
The assignment to the 'files' variable is immediately overwritten in the following line. Remove the older assignment, which was meant specifially for creating control groups files. Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Reported-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Cc: tony.luck@intel.com Cc: vikas.shivappa@intel.com Link: https://lkml.kernel.org/r/1507157337-18118-1-git-send-email-jithu.joseph@intel.com
2017-10-05x86/intel_rdt/cqm: Make integer rmid_limbo_count staticColin Ian King
rmid_limbo_count is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'rmid_limbo_count' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20171002145931.27479-1-colin.king@canonical.com
2017-10-04kvm/x86: Avoid async PF preempting the kernel incorrectlyBoqun Feng
Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call schedule() to reschedule in some cases. This could result in accidentally ending the current RCU read-side critical section early, causing random memory corruption in the guest, or otherwise preempting the currently running task inside between preempt_disable and preempt_enable. The difficulty to handle this well is because we don't know whether an async PF delivered in a preemptible section or RCU read-side critical section for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock() are both no-ops in that case. To cure this, we treat any async PF interrupting a kernel context as one that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing the schedule() path in that case. To do so, a second parameter for kvm_async_pf_task_wait() is introduced, so that we know whether it's called from a context interrupting the kernel, and the parameter is set properly in all the callsites. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-03kprobes/x86: Remove IRQ disabling from jprobe handlersMasami Hiramatsu
Jprobes actually don't need to disable IRQs while calling handlers, because of how we specify the kernel interface in Documentation/kprobes.txt: ----- Probe handlers are run with preemption disabled. Depending on the architecture and optimization state, handlers may also run with interrupts disabled (e.g., kretprobe handlers and optimized kprobe handlers run without interrupt disabled on x86/x86-64). ----- So let's remove IRQ disabling from jprobes too. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150701508194.32266.14458959863314097305.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-03kprobes/x86: Set up frame pointer in kprobe trampolineJosh Poimboeuf
Richard Weinberger saw an unwinder warning when running bcc's opensnoop: WARNING: kernel stack frame pointer at ffff99ef4076bea0 in opensnoop:2008 has bad value 0000000000000008 unwind stack type:0 next_sp: (null) mask:0x2 graph_idx:0 ... ffff99ef4076be88: ffff99ef4076bea0 (0xffff99ef4076bea0) ffff99ef4076be90: ffffffffac442721 (optimized_callback +0x81/0x90) ... A lockdep stack trace was initiated from inside a kprobe handler, when the unwinder noticed a bad frame pointer on the stack. The bad frame pointer is related to the fact that the kprobe optprobe trampoline doesn't save the frame pointer before calling into optimized_callback(). Reported-and-tested-by: Richard Weinberger <richard@sigma-star.at> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/7aef2f8ecd75c2f505ef9b80490412262cf4a44c.1507038547.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-03x86/boot: Spell out "boot CPU" for BPJean Delvare
It's not obvious to everybody that BP stands for boot processor. At least it was not for me. And BP is also a CPU register on x86, so it is ambiguous. Spell out "boot CPU" everywhere instead. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-01Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This contains the following fixes and improvements: - Avoid dereferencing an unprotected VMA pointer in the fault signal generation code - Fix inline asm call constraints for GCC 4.4 - Use existing register variable to retrieve the stack pointer instead of forcing the compiler to create another indirect access which results in excessive extra 'mov %rsp, %<dst>' instructions - Disable branch profiling for the memory encryption code to prevent an early boot crash - Fix a sparse warning caused by casting the __user annotation in __get_user_asm_u64() away - Fix an off by one error in the loop termination of the error patch in the x86 sysfs init code - Add missing CPU IDs to various Intel specific drivers to enable the functionality on recent hardware - More (init) constification in the numachip code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Use register variable to get stack pointer value x86/mm: Disable branch profiling in mem_encrypt.c x86/asm: Fix inline asm call constraints for GCC 4.4 perf/x86/intel/uncore: Correct num_boxes for IIO and IRP perf/x86/intel/rapl: Add missing CPU IDs perf/x86/msr: Add missing CPU IDs perf/x86/intel/cstate: Add missing CPU IDs x86: Don't cast away the __user in __get_user_asm_u64() x86/sysfs: Fix off-by-one error in loop termination x86/mm: Fix fault error path using unsafe vma pointer x86/numachip: Add const and __initconst to numachip2_clockevent
2017-09-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Mixed bugfixes. Perhaps the most interesting one is a latent bug that was finally triggered by PCID support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm/x86: Handle async PF in RCU read-side critical sections KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume KVM: VMX: use cmpxchg64 KVM: VMX: simplify and fix vmx_vcpu_pi_load KVM: VMX: avoid double list add with VT-d posted interrupts KVM: VMX: extract __pi_post_block KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
2017-09-29x86/stacktrace: Avoid recording save_stack_trace() wrappersVlastimil Babka
The save_stack_trace() and save_stack_trace_tsk() wrappers of __save_stack_trace() add themselves to the call stack, and thus appear in the recorded stacktraces. This is redundant and wasteful when we have limited space to record the useful part of the backtrace with e.g. page_owner functionality. Fix this by making sure __save_stack_trace() is noinline (which matches the current gcc decision) and bumping the skip in the wrappers (save_stack_trace_tsk() only when called for the current task). This is similar to what was done for arm in 3683f44c42e9 ("ARM: stacktrace: avoid listing stacktrace functions in stacktrace") and is pending for arm64. Also make sure that __save_stack_trace_reliable() doesn't get this problem in the future by marking it __always_inline (which matches current gcc decision), per Josh Poimboeuf. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929092335.2744-1-vbabka@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29x86/asm: Use register variable to get stack pointer valueAndrey Ryabinin
Currently we use current_stack_pointer() function to get the value of the stack pointer register. Since commit: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") ... we have a stack register variable declared. It can be used instead of current_stack_pointer() function which allows to optimize away some excessive "mov %rsp, %<dst>" instructions: -mov %rsp,%rdx -sub %rdx,%rax -cmp $0x3fff,%rax -ja ffffffff810722fd <ist_begin_non_atomic+0x2d> +sub %rsp,%rax +cmp $0x3fff,%rax +ja ffffffff810722fa <ist_begin_non_atomic+0x2a> Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer and use it instead of the removed function. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29kvm/x86: Handle async PF in RCU read-side critical sectionsBoqun Feng
Sasha Levin reported a WARNING: | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329 | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline] | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329 | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458 ... | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS | 1.10.1-1ubuntu1 04/01/2014 | Call Trace: ... | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline] | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458 | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002 | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000 | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410 | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001 | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08 | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040 | __schedule+0x201/0x2240 kernel/sched/core.c:3292 | schedule+0x113/0x460 kernel/sched/core.c:3421 | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158 | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271 | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069 | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996 | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283 | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670 | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140 | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000 | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8 | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000 | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136 | sprintf+0xbe/0xf0 lib/vsprintf.c:2386 | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23 | get_link fs/namei.c:1047 [inline] | link_path_walk+0x1041/0x1490 fs/namei.c:2127 ... This happened when the host hit a page fault, and delivered it as in an async page fault, while the guest was in an RCU read-side critical section. The guest then tries to reschedule in kvm_async_pf_task_wait(), but rcu_preempt_note_context_switch() would treat the reschedule as a sleep in RCU read-side critical section, which is not allowed (even in preemptible RCU). Thus the WARN. To cure this, make kvm_async_pf_task_wait() go to the halt path if the PF happens in a RCU read-side critical section. Reported-by: Sasha Levin <levinsasha928@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-28x86/apic: Fix spelling mistake: "symmectic" -> "symmetric"Colin Ian King
Trivial fix to spelling mistakes in pr_info messages Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/20170927102223.31920-1-colin.king@canonical.com
2017-09-28x86/head: Add unwind hint annotationsJosh Poimboeuf
Jiri Slaby reported an ORC issue when unwinding from an idle task. The stack was: ffffffff811083c2 do_idle+0x142/0x1e0 ffffffff8110861d cpu_startup_entry+0x5d/0x60 ffffffff82715f58 start_kernel+0x3ff/0x407 ffffffff827153e8 x86_64_start_kernel+0x14e/0x15d ffffffff810001bf secondary_startup_64+0x9f/0xa0 The ORC unwinder errored out at secondary_startup_64 because the head code isn't annotated yet so there wasn't a corresponding ORC entry. Fix that and any other head-related unwinding issues by adding unwind hints to the head code. Reported-by: Jiri Slaby <jslaby@suse.cz> Tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/78ef000a2f68f545d6eef44ee912edceaad82ccf.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28x86/boot: Annotate verify_cpu() as a callable functionJosh Poimboeuf
verify_cpu() is a callable function. Annotate it as such. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/293024b8a080832075312f38c07ccc970fc70292.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28x86/head: Fix head ELF function annotationsJosh Poimboeuf
These functions aren't callable C-type functions, so don't annotate them as such. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/36eb182738c28514f8bf95e403d89b6413a88883.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28x86/head: Remove unused 'bad_address' codeJosh Poimboeuf
It's no longer possible for this code to be executed, so remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/32a46fe92d2083700599b36872b26e7dfd7b7965.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28x86/head: Remove confusing commentJosh Poimboeuf
This comment is actively wrong and confusing. It refers to the registers' stack offsets after the pt_regs has been constructed on the stack, but this code is *before* that. At this point the stack just has the standard iret frame, for which no comment should be needed. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a3c267b770fc56c9b86df9c11c552848248aace2.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobesMasami Hiramatsu
Kkprobes don't need to disable IRQs if they are called from the ftrace/jump trampoline code, because Documentation/kprobes.txt says: ----- Probe handlers are run with preemption disabled. Depending on the architecture and optimization state, handlers may also run with interrupts disabled (e.g., kretprobe handlers and optimized kprobe handlers run without interrupt disabled on x86/x86-64). ----- So let's remove IRQ disabling from those handlers. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150581534039.32348.11331736206004264553.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28kprobes/x86: Disable preemption in ftrace-based jprobesMasami Hiramatsu
Disable preemption in ftrace-based jprobe handlers as described in Documentation/kprobes.txt: "Probe handlers are run with preemption disabled." This will fix jprobes behavior when CONFIG_PREEMPT=y. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150581530024.32348.9863783558598926771.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28kprobes/x86: Disable preemption in optprobeMasami Hiramatsu
Disable preemption in optprobe handler as described in Documentation/kprobes.txt, which says: "Probe handlers are run with preemption disabled." Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150581525942.32348.6359217983269060829.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28kprobes/x86: Move the get_kprobe_ctlblk() into irq-disabled blockMasami Hiramatsu
Since get_kprobe_ctlblk() accesses per-cpu variables which calls smp_processor_id(), it must be called under preempt-disabled or irq-disabled. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150581517952.32348.2655896843219158446.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28kprobes/x86: Remove addressof() operatorsMasami Hiramatsu
The following commit: 54a7d50b9205 ("x86: mark kprobe templates as character arrays, not single characters") changed optprobe_template_* to arrays, so we can remove the addressof() operators from those symbols. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150304469798.17009.15886717935027472863.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-28kprobes/x86: Make insn buffer always ROX and use text_poke()Masami Hiramatsu
Make insn buffer always ROX and use text_poke() to write the copied instructions instead of set_memory_*(). This makes instruction buffer stronger against other kernel subsystems because there is no window time to modify the buffer. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150304463032.17009.14195368040691676813.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-27x86/intel_rdt: Add diagnostics when making directoriesTony Luck
Mostly this is about running out of RMIDs or CLOSIDs. Other errors are various internal errors. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vikas Shivappa <vikas.shivappa@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/027cf1ffb3a3695f2d54525813a1d644887353cf.1506382469.git.tony.luck@intel.com