summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-02-21extable: Make init_kernel_text() globalJosh Poimboeuf
Convert init_kernel_text() to a global function and use it in a few places instead of manually comparing _sinittext and _einittext. Note that kallsyms.h has a very similar function called is_kernel_inittext(), but its end check is inclusive. I'm not sure whether that's intentional behavior, so I didn't touch it. Suggested-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4335d02be8d45ca7d265d2f174251d0b7ee6c5fd.1519051220.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Open-code switch_to_thread_stack()Dominik Brodowski
Open-code the two instances which called switch_to_thread_stack(). This allows us to remove the wrapper around DO_SWITCH_TO_THREAD_STACK. While at it, update the UNWIND hint to reflect where the IRET frame is, and update the commentary to reflect what we are actually doing here. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-7-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move ASM_CLAC to interrupt_entry()Dominik Brodowski
Moving ASM_CLAC to interrupt_entry means two instructions (addq / pushq and call interrupt_entry) are not covered by it. However, it offers a noticeable size reduction (-.2k): text data bss dec hex filename 16882 0 0 16882 41f2 entry_64.o-orig 16623 0 0 16623 40ef entry_64.o Suggested-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-6-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Remove 'interrupt' macroDominik Brodowski
It is now trivial to call interrupt_entry() and then the actual worker. Therefore, remove the interrupt macro and open code it all. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-5-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()Dominik Brodowski
We can also move the CLD, SWAPGS, and the switch_to_thread_stack() call to the interrupt_entry() helper function. As we do not want call depths of two, convert switch_to_thread_stack() to a macro. However, switch_to_thread_stack() has another user in entry_64_compat.S, which currently expects it to be a function. To keep the code changes in this patch minimal, create a wrapper function. The switch to a macro means that there is some binary code duplication if CONFIG_IA32_EMULATION=y is enabled. Therefore, the size reduction differs whether CONFIG_IA32_EMULATION is enabled or not: CONFIG_IA32_EMULATION=y (-0.13k): text data bss dec hex filename 17158 0 0 17158 4306 entry_64.o-orig 17028 0 0 17028 4284 entry_64.o CONFIG_IA32_EMULATION=n (-0.27k): text data bss dec hex filename 17158 0 0 17158 4306 entry_64.o-orig 16882 0 0 16882 41f2 entry_64.o Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-4-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entryDominik Brodowski
Moving the switch to IRQ stack from the interrupt macro to the helper function requires some trickery: All ENTER_IRQ_STACK really cares about is where the "original" stack -- meaning the GP registers etc. -- is stored. Therefore, we need to offset the stored RSP value by 8 whenever ENTER_IRQ_STACK is called from within a function. In such cases, and after switching to the IRQ stack, we need to push the "original" return address (i.e. the return address from the call to the interrupt entry function) to the IRQ stack. This trickery allows us to carve another .85k from the text size (it would be more except for the additional unwind hints): text data bss dec hex filename 18006 0 0 18006 4656 entry_64.o-orig 17158 0 0 17158 4306 entry_64.o Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper functionDominik Brodowski
The PUSH_AND_CLEAR_REGS macro is able to insert the GP registers "above" the original return address. This allows us to move a sizeable part of the interrupt entry macro to an interrupt entry helper function: text data bss dec hex filename 21088 0 0 21088 5260 entry_64.o-orig 18006 0 0 18006 4656 entry_64.o Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180220210113.6725-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPPIngo Molnar
firmware_restrict_branch_speculation_*() recently started using preempt_enable()/disable(), but those are relatively high level primitives and cause build failures on some 32-bit builds. Since we want to keep <asm/nospec-branch.h> low level, convert them to macros to avoid header hell... Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arjan.van.de.ven@intel.com Cc: bp@alien8.de Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: karahmed@amazon.de Cc: kvm@vger.kernel.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/mm: Optimize boot-time paging mode switching costKirill A. Shutemov
By this point we have functioning boot-time switching between 4- and 5-level paging mode. But naive approach comes with cost. Numbers below are for kernel build, allmodconfig, 5 times. CONFIG_X86_5LEVEL=n: Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs): 17308719.892691 task-clock:u (msec) # 26.772 CPUs utilized ( +- 0.11% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 331,993,164 page-faults:u # 0.019 M/sec ( +- 0.01% ) 43,614,978,867,455 cycles:u # 2.520 GHz ( +- 0.01% ) 39,371,534,575,126 stalled-cycles-frontend:u # 90.27% frontend cycles idle ( +- 0.09% ) 28,363,350,152,428 instructions:u # 0.65 insn per cycle # 1.39 stalled cycles per insn ( +- 0.00% ) 6,316,784,066,413 branches:u # 364.948 M/sec ( +- 0.00% ) 250,808,144,781 branch-misses:u # 3.97% of all branches ( +- 0.01% ) 646.531974142 seconds time elapsed ( +- 1.15% ) CONFIG_X86_5LEVEL=y: Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs): 17411536.780625 task-clock:u (msec) # 26.426 CPUs utilized ( +- 0.10% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 331,868,663 page-faults:u # 0.019 M/sec ( +- 0.01% ) 43,865,909,056,301 cycles:u # 2.519 GHz ( +- 0.01% ) 39,740,130,365,581 stalled-cycles-frontend:u # 90.59% frontend cycles idle ( +- 0.05% ) 28,363,358,997,959 instructions:u # 0.65 insn per cycle # 1.40 stalled cycles per insn ( +- 0.00% ) 6,316,784,937,460 branches:u # 362.793 M/sec ( +- 0.00% ) 251,531,919,485 branch-misses:u # 3.98% of all branches ( +- 0.00% ) 658.886307752 seconds time elapsed ( +- 0.92% ) The patch tries to fix the performance regression by using cpu_feature_enabled(X86_FEATURE_LA57) instead of pgtable_l5_enabled in all hot code paths. These will statically patch the target code for additional performance. CONFIG_X86_5LEVEL=y + the patch: Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs): 17381990.268506 task-clock:u (msec) # 26.907 CPUs utilized ( +- 0.19% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 331,862,625 page-faults:u # 0.019 M/sec ( +- 0.01% ) 43,697,726,320,051 cycles:u # 2.514 GHz ( +- 0.03% ) 39,480,408,690,401 stalled-cycles-frontend:u # 90.35% frontend cycles idle ( +- 0.05% ) 28,363,394,221,388 instructions:u # 0.65 insn per cycle # 1.39 stalled cycles per insn ( +- 0.00% ) 6,316,794,985,573 branches:u # 363.410 M/sec ( +- 0.00% ) 251,013,232,547 branch-misses:u # 3.97% of all branches ( +- 0.01% ) 645.991174661 seconds time elapsed ( +- 1.19% ) Unfortunately, this approach doesn't help with text size: vmlinux.before .text size: 8190319 vmlinux.after .text size: 8200623 The .text section is increased by about 4k. Not sure if we can do anything about this. Signed-off-by: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180216114948.68868-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/mm: Redefine some of page table helpers as macrosKirill A. Shutemov
This is preparation for the next patch, which would change pgtable_l5_enabled to be cpu_feature_enabled(X86_FEATURE_LA57). The change makes few helpers in paravirt.h dependent on cpu_feature_enabled() definition from cpufeature.h. And cpufeature.h is dependent on paravirt.h. Let's re-define some of helpers as macros to break this dependency loop. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180216114948.68868-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/xen: Allow XEN_PV and XEN_PVH to be enabled with X86_5LEVELKirill A. Shutemov
With boot-time switching between paging modes, XEN_PV and XEN_PVH can be boot into 4-level paging mode. Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180216114948.68868-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/oprofile: Fix bogus GCC-8 warning in nmi_setup()Arnd Bergmann
GCC-8 shows a warning for the x86 oprofile code that copies per-CPU data from CPU 0 to all other CPUs, which when building a non-SMP kernel turns into a memcpy() with identical source and destination pointers: arch/x86/oprofile/nmi_int.c: In function 'mux_clone': arch/x86/oprofile/nmi_int.c:285:2: error: 'memcpy' source argument is the same as destination [-Werror=restrict] memcpy(per_cpu(cpu_msrs, cpu).multiplex, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ per_cpu(cpu_msrs, 0).multiplex, ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ sizeof(struct op_msr) * model->num_virt_counters); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/x86/oprofile/nmi_int.c: In function 'nmi_setup': arch/x86/oprofile/nmi_int.c:466:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] arch/x86/oprofile/nmi_int.c:470:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] I have analyzed a number of such warnings now: some are valid and the GCC warning is welcome. Others turned out to be false-positives, and GCC was changed to not warn about those any more. This is a corner case that is a false-positive but the GCC developers feel it's better to keep warning about it. In this case, it seems best to work around it by telling GCC a little more clearly that this code path is never hit with an IS_ENABLED() configuration check. Cc:stable as we also want old kernels to build cleanly with GCC-8. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jessica Yu <jeyu@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Sebor <msebor@gcc.gnu.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: oprofile-list@lists.sf.net Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180220205826.2008875-1-arnd@arndb.de Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84095 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()Peter Zijlstra
This is boot code and thus Spectre-safe: we run this _way_ before userspace comes along to have a chance to poison our branch predictor. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/boot, objtool: Annotate indirect jump in secondary_startup_64()Peter Zijlstra
The objtool retpoline validation found this indirect jump. Seeing how it's on CPU bringup before we run userspace it should be safe, annotate it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/paravirt, objtool: Annotate indirect callsPeter Zijlstra
Paravirt emits indirect calls which get flagged by objtool retpoline checks, annotate it away because all these indirect calls will be patched out before we start userspace. This patching happens through alternative_instructions() -> apply_paravirt() -> pv_init_ops.patch() which will eventually end up in paravirt_patch_default(). This function _will_ write direct alternatives. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21x86/speculation, objtool: Annotate indirect calls/jumps for objtoolPeter Zijlstra
Annotate the indirect calls/jumps in the CALL_NOSPEC/JUMP_NOSPEC alternatives. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/retpoline: Support retpoline builds with ClangDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arjan.van.de.ven@intel.com Cc: bp@alien8.de Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: karahmed@amazon.de Cc: kvm@vger.kernel.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Link: http://lkml.kernel.org/r/1519037457-7643-5-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/speculation: Use IBRS if available before calling into firmwareDavid Woodhouse
Retpoline means the kernel is safe because it has no indirect branches. But firmware isn't, so use IBRS for firmware calls if it's available. Block preemption while IBRS is set, although in practice the call sites already had to be doing that. Ignore hpwdt.c for now. It's taking spinlocks and calling into firmware code, from an NMI handler. I don't want to touch that with a bargepole. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arjan.van.de.ven@intel.com Cc: bp@alien8.de Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: karahmed@amazon.de Cc: kvm@vger.kernel.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Link: http://lkml.kernel.org/r/1519037457-7643-2-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20Revert "x86/retpoline: Simplify vmexit_fill_RSB()"David Woodhouse
This reverts commit 1dde7415e99933bb7293d6b2843752cbdb43ec11. By putting the RSB filling out of line and calling it, we waste one RSB slot for returning from the function itself, which means one fewer actual function call we can make if we're doing the Skylake abomination of call-depth counting. It also changed the number of RSB stuffings we do on vmexit from 32, which was correct, to 16. Let's just stop with the bikeshedding; it didn't actually *fix* anything anyway. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arjan.van.de.ven@intel.com Cc: bp@alien8.de Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: karahmed@amazon.de Cc: kvm@vger.kernel.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Link: http://lkml.kernel.org/r/1519037457-7643-4-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86-64/realmode: Add instruction suffixJan Beulich
Omitting suffixes from instructions in AT&T mode is bad practice when operand size cannot be determined by the assembler from register operands, and is likely going to be warned about by upstream GAS in the future (mine does already). Add the single missing suffix here. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5A8AF5F602000078001A9230@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/LDT: Avoid warning in 32-bit builds with older gccJan Beulich
BUG() doesn't always imply "no return", and hence should be followed by a return statement even if that's obviously (to a human) unreachable. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5A8AF2AA02000078001A91E9@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/IO-APIC: Avoid warning in 32-bit buildsJan Beulich
Constants wider than 32 bits should be tagged with ULL. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5A8AF23F02000078001A91E5@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/asm: Improve how GEN_*_SUFFIXED_RMWcc() specify clobbersJan Beulich
Commit: df3405245a ("x86/asm: Add suffix macro for GEN_*_RMWcc()") ... introduced "suffix" RMWcc operations, adding bogus clobber specifiers: For one, on x86 there's no point explicitly clobbering "cc". In fact, with GCC properly fixed, this results in an overlap being detected by the compiler between outputs and clobbers. Furthermore it seems bad practice to me to have clobber specification and use of the clobbered register(s) disconnected - it should rather be at the invocation place of that GEN_{UN,BIN}ARY_SUFFIXED_RMWcc() macros that the clobber is specified which this particular invocation needs. Drop the "cc" clobber altogether and move the "cx" one to refcount.h. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5A8AF1F802000078001A91E1@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/mm: Remove stale comment about KMEMCHECKJann Horn
This comment referred to a conditional call to kmemcheck_hide() that was here until commit 4950276672fc ("kmemcheck: remove annotations"). Now that kmemcheck has been removed, it doesn't make sense anymore. Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180219175039.253089-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/mm: Fix {pmd,pud}_{set,clear}_flags()Jan Beulich
Just like pte_{set,clear}_flags() their PMD and PUD counterparts should not do any address translation. This was outright wrong under Xen (causing a dead boot with no useful output on "suitable" systems), and produced needlessly more complicated code (even if just slightly) when paravirt was enabled. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/5A8AF1BB02000078001A91C3@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>KarimAllah Ahmed
... since u64 has a hidden header dependency that was not there before using it (i.e. it breaks our VMM build). Also, __u64 is the right way to expose data types through UAPI. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: devel@linuxdriverproject.org Fixes: 93286261 ("x86/hyperv: Reenlightenment notifications support") Link: http://lkml.kernel.org/r/1519112391-23773-1-git-send-email-karahmed@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-19um: Use POSIX ucontext_t instead of struct ucontextKrzysztof Mazur
glibc 2.26 removed the 'struct ucontext' to "improve" POSIX compliance and break programs, including User Mode Linux. Fix User Mode Linux by using POSIX ucontext_t. This fixes: arch/um/os-Linux/signal.c: In function 'hard_handler': arch/um/os-Linux/signal.c:163:22: error: dereferencing pointer to incomplete type 'struct ucontext' mcontext_t *mc = &uc->uc_mcontext; arch/x86/um/stub_segv.c: In function 'stub_segv_handler': arch/x86/um/stub_segv.c:16:13: error: dereferencing pointer to incomplete type 'struct ucontext' &uc->uc_mcontext); Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-02-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 Kconfig fixes from Thomas Gleixner: "Three patchlets to correct HIGHMEM64G and CMPXCHG64 dependencies in Kconfig when CPU selections are explicitely set to M586 or M686" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig x86/Kconfig: Exclude i586-class CPUs lacking PAE support from the HIGHMEM64G Kconfig group x86/Kconfig: Add missing i586-class CPUs to the X86_CMPXCHG64 Kconfig group
2018-02-17x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specifiedBaoquan He
Currently the kdump kernel becomes very slow if 'noapic' is specified. Normal kernel doesn't have this bug. Kernel parameter 'noapic' is used to disable IO-APIC in system for testing or special purpose. Here the root cause is that in kdump kernel LAPIC is disabled since commit: 522e664644 ("x86/apic: Disable I/O APIC before shutdown of the local APIC") In this case we need set up through-local-APIC on boot CPU in setup_local_APIC(). In normal kernel the legacy irq mode is enabled by the BIOS. If it is virtual wire mode, the local-APIC has been enabled and set as through-local-APIC. Though we fixed the regression introduced by commit 522e664644, to further improve robustness set up the through-local-APIC mode explicitly, do not rely on the default boot IRQ mode. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: joro@8bytes.org Cc: prarit@redhat.com Cc: uobergfe@redhat.com Link: http://lkml.kernel.org/r/20180214054656.3780-7-bhe@redhat.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/apic: Rename variables and functions related to x86_io_apic_opsBaoquan He
The names of x86_io_apic_ops and its two member variables are misleading: The ->read() member is to read IO_APIC reg, while ->disable() which is called by native_disable_io_apic()/irq_remapping_disable_io_apic() is actually used to restore boot IRQ mode, not to disable the IO-APIC. So rename x86_io_apic_ops to 'x86_apic_ops' since it doesn't only handle the IO-APIC, but also the local APIC. Also rename its member variables and the related callbacks. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: joro@8bytes.org Cc: prarit@redhat.com Cc: uobergfe@redhat.com Link: http://lkml.kernel.org/r/20180214054656.3780-6-bhe@redhat.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/apic: Remove the (now) unused disable_IO_APIC() functionBaoquan He
No one uses it anymore. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: joro@8bytes.org Cc: prarit@redhat.com Cc: uobergfe@redhat.com Link: http://lkml.kernel.org/r/20180214054656.3780-5-bhe@redhat.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdumpBaoquan He
This is a regression fix. Before, to fix erratum AVR31, the following commit: 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC") ... moved the lapic_shutdown() call to after disable_IO_APIC() in the reboot and kexec/kdump code paths. This introduced the following regression: disable_IO_APIC() not only clears the IO-APIC, but it also restores boot IRQ mode by setting the LAPIC/APIC/IMCR, calling lapic_shutdown() after disable_IO_APIC() will disable LAPIC and ruin the possible virtual wire mode setting which the code has been trying to do all along. The consequence is that a KVM guest kernel always prints the warning below during kexec/kdump as the kernel boots up: [ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330 [ ........] [ 0.001000] Call Trace: [ 0.001000] apic_bsp_setup+0x56/0x74 [ 0.001000] x86_late_time_init+0x11/0x16 [ 0.001000] start_kernel+0x3c9/0x486 [ 0.001000] secondary_startup_64+0xa5/0xb0 [ ........] [ 0.001000] masked ExtINT on CPU#0 To fix this, just call clear_IO_APIC() to stop the IO-APIC where disable_IO_APIC() was called, and call restore_boot_irq_mode() to restore boot IRQ mode before a reboot or a kexec/kdump jump. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: joro@8bytes.org Cc: prarit@redhat.com Cc: stable@vger.kernel.org Cc: uobergfe@redhat.com Fixes: commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC") Link: http://lkml.kernel.org/r/20180214054656.3780-4-bhe@redhat.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=yBaoquan He
Split following patches disable_IO_APIC() will be broken up into clear_IO_APIC() and restore_boot_irq_mode(). These two functions will be called separately where they are needed to fix a regression introduced by: 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC"). While the CONFIG_KEXEC_JUMP=y code doesn't call lapic_shutdown() before jump like kexec/kdump, so it's not impacted by commit 522e66464467. Hence here change clear_IO_APIC() as public, and replace disable_IO_APIC() with clear_IO_APIC() and restore_boot_irq_mode() to keep CONFIG_KEXEC_JUMP=y code unchanged in essence. No functional change. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: joro@8bytes.org Cc: prarit@redhat.com Cc: uobergfe@redhat.com Link: http://lkml.kernel.org/r/20180214054656.3780-3-bhe@redhat.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC()Baoquan He
This is a preparation patch. Split out the code which restores boot irq mode from disable_IO_APIC() into the new restore_boot_irq_mode() function. No functional changes. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: douly.fnst@cn.fujitsu.com Cc: joro@8bytes.org Cc: prarit@redhat.com Cc: uobergfe@redhat.com Link: http://lkml.kernel.org/r/20180214054656.3780-2-bhe@redhat.com [ Build fix for !CONFIG_IO_APIC and rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/entry/64: Use 'xorl' for faster register clearingDominik Brodowski
On some x86 CPU microarchitectures using 'xorq' to clear general-purpose registers is slower than 'xorl'. As 'xorl' is sufficient to clear all 64 bits of these registers due to zero-extension [*], switch the x86 64-bit entry code to use 'xorl'. No change in functionality and no change in code size. [*] According to Intel 64 and IA-32 Architecture Software Developer's Manual, section 3.4.1.1, the result of 32-bit operands are "zero- extended to a 64-bit result in the destination general-purpose register." The AMD64 Architecture Programmer’s Manual Volume 3, Appendix B.1, describes the same behaviour. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180214175924.23065-3-linux@dominikbrodowski.net [ Improved on the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/entry: Reduce the code footprint of the 'idtentry' macroDominik Brodowski
Play a little trick in the generic PUSH_AND_CLEAR_REGS macro to insert the GP registers "above" the original return address. This allows us to (re-)insert the macro in error_entry() and paranoid_entry() and to remove it from the idtentry macro. This reduces the static footprint significantly: text data bss dec hex filename 24307 0 0 24307 5ef3 entry_64.o-orig 20987 0 0 20987 51fb entry_64.o Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180214175924.23065-2-linux@dominikbrodowski.net [ Small tweaks to comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/xen: Calculate __max_logical_packages on PV domainsPrarit Bhargava
The kernel panics on PV domains because native_smp_cpus_done() is only called for HVM domains. Calculate __max_logical_packages for PV domains. Fixes: b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages estimate") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Tested-and-reported-by: Simon Gaiser <simon@invisiblethingslab.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: xen-devel@lists.xenproject.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-02-17x86/CPU: Check CPU feature bits after microcode upgradeBorislav Petkov
With some microcode upgrades, new CPUID features can become visible on the CPU. Check what the kernel has mirrored now and issue a warning hinting at possible things the user/admin can do to make use of the newly visible features. Originally-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180216112640.11554-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/CPU: Add a microcode loader callbackBorislav Petkov
Add a callback function which the microcode loader calls when microcode has been updated to a newer revision. Do the callback only when no error was encountered during loading. Tested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180216112640.11554-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-17x86/microcode: Propagate return value from updating functionsBorislav Petkov
... so that callers can know when microcode was updated and act accordingly. Tested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180216112640.11554-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=yKirill A. Shutemov
All pieces of the puzzle are in place and we can now allow to boot with CONFIG_X86_5LEVEL=y on a machine without LA57 support. Kernel will detect that LA57 is missing and fold p4d at runtime. Update the documentation and the Kconfig option description to reflect the change. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Replace compile-time checks for 5-level paging with runtime-time checksKirill A. Shutemov
This patch converts the of CONFIG_X86_5LEVEL check to runtime checks for p4d folding. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Fold p4d page table layer at runtimeKirill A. Shutemov
Change page table helpers to fold p4d at runtime. The logic is the same as in <asm-generic/pgtable-nop4d.h>. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Support boot-time switching of paging modes in the early boot codeKirill A. Shutemov
Early boot code should be able to initialize page tables for both 4- and 5-level paging modes. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Initialize vmemmap_base at boot-timeKirill A. Shutemov
vmemmap area has different placement depending on paging mode. Let's adjust it during early boot accodring to machine capability. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Adjust vmalloc base and size at boot-timeKirill A. Shutemov
vmalloc area has different placement and size depending on paging mode. Let's adjust it during early boot accodring to machine capability. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Initialize 'page_offset_base' at boot-timeKirill A. Shutemov
For 4- and 5-level paging we have different 'page_offset_base'. Let's initialize it at boot-time accordingly to machine capability. We also have to split __PAGE_OFFSET_BASE into two constants -- for 4- and 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Initialize 'pgdir_shift' and 'ptrs_per_p4d' at boot-timeKirill A. Shutemov
Switching between paging modes requires the folding of the p4d page table level when we only have 4 paging levels, which means we need to adjust 'pgdir_shift' and 'ptrs_per_p4d' during early boot according to paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/mm: Initialize 'pgtable_l5_enabled' at boot-timeKirill A. Shutemov
'pgtable_l5_enabled' indicates which paging mode we are using. We need to initialize it at boot-time according to machine capability. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16x86/apic: Make setup_local_APIC() staticDou Liyang
This function isn't used outside of apic.c, so let's mark it static. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bhe@redhat.com Cc: ebiederm@xmission.com Link: http://lkml.kernel.org/r/20180214062554.21020-1-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>