summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2017-08-31Merge branch 'x86/mm' into x86/platform, to pick up TLB flush dependencyIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)Vitaly Kuznetsov
There's a subtle bug in how some of the paravirt guest code handles page table freeing on x86: On x86 software page table walkers depend on the fact that remote TLB flush does an IPI: walk is performed lockless but with interrupts disabled and in case the page table is freed the freeing CPU will get blocked as remote TLB flush is required. On other architectures which don't require an IPI to do remote TLB flush we have an RCU-based mechanism (see include/asm-generic/tlb.h for more details). In virtualized environments we may want to override the ->flush_tlb_others callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has been doing this for years and the upcoming remote TLB flush for Hyper-V will do it too. This is not safe, as software page table walkers may step on an already freed page. Fix the bug by enabling the RCU-based page table freeing mechanism, CONFIG_HAVE_RCU_TABLE_FREE=y. Testing with kernbench and mmap/munmap microbenchmarks, and neither showed any noticeable performance impact. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com [ Rewrote/fixed/clarified the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31x86/idt: Remove the tracing IDT leftoversThomas Gleixner
Stephen reported a merge conflict with the XEN tree. That also shows that the IDT cleanup forgot to remove the now unused trace_{trap} defines. Remove them. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com>
2017-08-29Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Hide set_intr_gate()Thomas Gleixner
set_intr_gate() is an internal function of the IDT code. The only user left is the KVM code which replaces the pagefault handler eventually. Provide an explicit update_intr_gate() function and make set_intr_gate() static. While at it replace the magic number 14 in the KVM code with the proper trap define. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.663008004@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Deinline setup functionsThomas Gleixner
None of this is performance sensitive in any way - so debloat the kernel. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.502052875@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Remove unused functions/inlinesThomas Gleixner
The IDT related inlines are not longer used. Remove them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.422083717@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Move APIC gate initialization to tablesThomas Gleixner
Replace the APIC/SMP vector gate initialization with the table based mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.260177013@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Move regular trap init to tablesThomas Gleixner
Initialize the regular traps with a table. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.182128165@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Move IST stack based traps to table initThomas Gleixner
Initialize the IST based traps via a table. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.091328949@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Move debug stack init to table basedThomas Gleixner
Add the debug_idt init table and make use of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.006502252@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Move early IDT setup out of 32-bit asmThomas Gleixner
The early IDT setup can be done in C code like it's done on 64-bit kernels. Reuse the 64-bit version. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.757980775@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Move early IDT handler setup to IDT codeThomas Gleixner
The early IDT handler setup is done in C entry code on 64-bit kernels and in ASM entry code on 32-bit kernels. Move the 64-bit variant to the IDT code so it can be shared with 32-bit in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.679561404@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Consolidate IDT invalidationThomas Gleixner
kexec and reboot have both code to invalidate IDT. Create a common function and use it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.600953282@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Remove unused set_trap_gate()Thomas Gleixner
This inline is not used at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.522053134@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/ldttss: Clean up 32-bit descriptorsThomas Gleixner
Like the IDT descriptors, the LDT/TSS descriptors are pointlessly different on 32 and 64 bit kernels. Unify them and get rid of the duplicated code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.289634692@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/gdt: Use bitfields for initializationThomas Gleixner
The GDT entry related code uses two ways to access entries via union fields: - bitfields - macros which initialize the two 16-bit parts of the entry by magic shift and mask operations. Clean it up and only use the bitfields to initialize and access entries. ( The old access patterns were partly done due to GCC optimizing bitfield accesses in a horrible way - that's mostly fixed these days and clarity of code in such low level accessors is very important. ) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.197673367@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/asm: Replace access to desc_struct:a/b fieldsThomas Gleixner
The union inside of desc_struct which allows access to the raw u32 parts of the descriptors. This raw access part is about to go away. Replace the few code parts which access those fields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.120214366@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Unify gate_struct handling for 32/64-bit kernelsThomas Gleixner
The first 32 bits of gate struct are the same for 32 and 64 bit kernels. The 32-bit version uses desc_struct and no designated data structure, so we need different accessors for 32 and 64 bit kernels. Aside of that the macros which are necessary to build the 32-bit gate descriptor are horrible to read. Unify the gate structs and switch all code fiddling with it over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/tracing: Build tracepoints only when they are usedThomas Gleixner
The tracepoint macro magic emits code for all tracepoints in a event header file. That code stays around even if the tracepoint is not used at all. The linker does not discard it. Build the various irq_vector tracepoints dependent on the appropriate CONFIG switches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.770651777@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/irq_work: Make it depend on APICThomas Gleixner
The irq work interrupt vector is only installed when CONFIG_X86_LOCAL_APIC is enabled, but the interrupt handler is compiled in unconditionally. Compile the cruft out when the APIC is disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.691909010@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/ipi: Make platform IPI depend on APICThomas Gleixner
The platform IPI vector is only installed when the local APIC is enabled. All users of it depend on the local APIC anyway. Make the related code conditional on CONFIG_X86_LOCAL_APIC=y. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.615286163@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/tracing: Disentangle pagefault and resched IPI tracing keyThomas Gleixner
The pagefault and the resched IPI handler are the only ones where it is worth to optimize the code further in case tracepoints are disabled. But it makes no sense to have a single static key for both. Seperate the static keys so the facilities are handled seperately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.536699116@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Clean up the i386 low level entry macrosThomas Gleixner
Some of the entry function defines for i386 were explictely using the BUILD_INTERRUPT3() macro to prevent that the extra trace entry got added via BUILD_INTERRUPT(). No that the trace cruft is gone, the file can be cleaned up and converted to use BUILD_INTERRUPT() which avoids the ugly line breaks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.456815006@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Remove the tracing IDT completelyThomas Gleixner
No more users of the tracing IDT. All exception tracepoints have been moved into the regular handlers. Get rid of the mess which shouldn't have been created in the first place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.378851687@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/smp: Use static key for reschedule interrupt tracingThomas Gleixner
It's worth to avoid the extra irq_enter()/irq_exit() pair in the case that the reschedule interrupt tracepoints are disabled. Use the static key which indicates that exception tracing is enabled. For now this key is global. It will be optimized in a later step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064957.299808677@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/smp: Remove pointless duplicated interrupt codeThomas Gleixner
Two NOP5s are really a good tradeoff vs. the unholy IDT switching mess, which duplicates code all over the place. The rescheduling interrupt gets optimized in a later step. Make the ordering of function call and statistics increment the same as in other places. Calculate stats first, then do the function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.222101344@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/mce: Remove duplicated tracing interrupt codeThomas Gleixner
Machine checks are not really high frequency events. The extra two NOP5s for the disabled tracepoints are noise vs. the heavy lifting which needs to be done in the MCE handler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.144301907@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/irqwork: Get rid of duplicated tracing interrupt codeThomas Gleixner
Two NOP5s are a reasonable tradeoff to avoid duplicated code and the requirement to switch the IDT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.064746737@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/apic: Remove the duplicated tracing versions of interruptsThomas Gleixner
The error and the spurious interrupt are really rare events and not at all performance sensitive: two NOP5s can be tolerated when tracing is disabled. Remove the complication. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064956.986009402@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/irq: Get rid of duplicated trace_x86_platform_ipi() codeThomas Gleixner
Two NOP5s are really a good tradeoff vs. the unholy IDT switching mess, which duplicates code all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064956.907209383@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/apic: Remove the duplicated tracing version of local_timer_interrupt()Thomas Gleixner
The two NOP5s are noise in the rest of the work which is done by the timer interrupt and modern CPUs are pretty good in optimizing NOPs anyway. Get rid of the interrupt handler duplication and move the tracepoints into the regular handler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170828064956.751247330@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/traps: Simplify pagefault tracing logicThomas Gleixner
Make use of the new irqvector tracing static key and remove the duplicated trace_do_pagefault() implementation. If irq vector tracing is disabled, then the overhead of this is a single NOP5, which is a reasonable tradeoff to avoid duplicated code and the unholy macro mess. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.672965407@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/tracing: Introduce a static key for exception tracingThomas Gleixner
Switching the IDT just for avoiding tracepoints creates a completely impenetrable macro/inline/ifdef mess. There is no point in avoiding tracepoints for most of the traps/exceptions. For the more expensive tracepoints, like pagefaults, this can be handled with an explicit static key. Preparatory patch to remove the tracing IDT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.593094539@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/irq: Remove duplicated used_vectors definitionThomas Gleixner
Also remove the unparseable comment in the other place while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.436711634@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/irq: Get rid of the 'first_system_vector' indirection bogosityThomas Gleixner
This variable is beyond pointless. Nothing allocates a vector via alloc_gate() below FIRST_SYSTEM_VECTOR. So nothing can change first_system_vector. If there is a need for a gate below FIRST_SYSTEM_VECTOR then it can be added to the vector defines and FIRST_SYSTEM_VECTOR can be adjusted accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.357109735@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/irq: Remove vector_used_by_percpu_irq()Thomas Gleixner
Last user (lguest) is gone. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064956.201432430@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29Merge branch 'linus' into x86/apic, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-28Merge 4.13-rc7 into char-misc-nextGreg Kroah-Hartman
We want the binder fix in here as well for testing and merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes: one for an ldt_struct handling bug and a cherry-picked objtool fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix use-after-free of ldt_struct objtool: Fix '-mtune=atom' decoding support in objtool 2.0
2017-08-26Merge branch 'linus' into x86/mm to pick up fixes and to fix conflictsIngo Molnar
Conflicts: arch/x86/kernel/head64.c arch/x86/mm/mmap.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25futex: Remove duplicated code and fix undefined behaviourJiri Slaby
There is code duplicated over all architecture's headers for futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr, and comparison of the result. Remove this duplication and leave up to the arches only the needed assembly which is now in arch_futex_atomic_op_inuser. This effectively distributes the Will Deacon's arm64 fix for undefined behaviour reported by UBSAN to all architectures. The fix was done in commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump. And as suggested by Thomas, check for negative oparg too, because it was also reported to cause undefined behaviour report. Note that s390 removed access_ok check in d12a29703 ("s390/uaccess: remove pointless access_ok() checks") as access_ok there returns true. We introduce it back to the helper for the sake of simplicity (it gets optimized away anyway). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64] Cc: linux-mips@linux-mips.org Cc: Rich Felker <dalias@libc.org> Cc: linux-ia64@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: peterz@infradead.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sparclinux@vger.kernel.org Cc: Jonas Bonn <jonas@southpole.se> Cc: linux-s390@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: linux-hexagon@vger.kernel.org Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: linux-snps-arc@lists.infradead.org Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-xtensa@linux-xtensa.org Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: openrisc@lists.librecores.org Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Stafford Horne <shorne@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Richard Henderson <rth@twiddle.net> Cc: Chris Zankel <chris@zankel.net> Cc: Michal Simek <monstr@monstr.eu> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-parisc@vger.kernel.org Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: linux-alpha@vger.kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
2017-08-25Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25x86/mm: Fix use-after-free of ldt_structEric Biggers
The following commit: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void *fork_thread(void *_arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73a2ebe ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> [v4.6+] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.statePaolo Bonzini
The host pkru is restored right after vcpu exit (commit 1be0e61), so KVM_GET_XSAVE will return the host PKRU value instead. Fix this by using the guest PKRU explicitly in fill_xsave and load_xsave. This part is based on a patch by Junkang Fu. The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state, because it could have been changed by userspace since the last time it was saved, so skip loading it in kvm_load_guest_fpu. Reported-by: Junkang Fu <junkang.fjk@alibaba-inc.com> Cc: Yang Zhang <zy107165@alibaba-inc.com> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-25KVM: x86: simplify handling of PKRUPaolo Bonzini
Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a simple comparison against the host value of the register. The write of PKRU in addition can be skipped if the guest has not enabled the feature. Once we do this, we need not test OSPKE in the host anymore, because guest_CR4.PKE=1 implies host_CR4.PKE=1. The static PKU test is kept to elide the code on older CPUs. Suggested-by: Yang Zhang <zy107165@alibaba-inc.com> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-25Merge branch 'x86/asm' into x86/apicThomas Gleixner
Pick up dependent changes to avoid merge conflicts
2017-08-24KVM: X86: Fix loss of exception which has not yet been injectedWanpeng Li
vmx_complete_interrupts() assumes that the exception is always injected, so it can be dropped by kvm_clear_exception_queue(). However, an exception cannot be injected immediately if it is: 1) originally destined to a nested guest; 2) trapped to cause a vmexit; 3) happening right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true. This patch applies to exceptions the same algorithm that is used for NMIs, replacing exception.reinject with "exception.injected" (equivalent to nmi_injected). exception.pending now represents an exception that is queued and whose side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet. If exception.pending is true, the exception might result in a nested vmexit instead, too (in which case the side effects must not be applied). exception.injected instead represents an exception that is going to be injected into the guest at the next vmentry. Reported-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24KVM: MMU: Expose the LA57 feature to VM.Yu Zhang
This patch exposes 5 level page table feature to the VM. At the same time, the canonical virtual address checking is extended to support both 48-bits and 57-bits address width. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>