summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2016-07-15x86/smp: Remove unnecessary initialization of thread_info::cpuAndy Lutomirski
It's statically initialized to zero -- no need to dynamically initialize it to zero as well. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6cf6314dce3051371a913ee19d1b88e29c68c560.1468527351.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15x86/smp: Remove stack_smp_processor_id()Andy Lutomirski
It serves no purpose -- raw_smp_processor_id() works fine. This change will be needed to move thread_info off the stack. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15x86/uaccess: Move thread_info::addr_limit to thread_structAndy Lutomirski
struct thread_info is a legacy mess. To prepare for its partial removal, move thread_info::addr_limit out. As an added benefit, this way is simpler. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15x86/dumpstack: When OOPSing, rewind the stack before do_exit()Andy Lutomirski
If we call do_exit() with a clean stack, we greatly reduce the risk of recursive oopses due to stack overflow in do_exit, and we allow do_exit to work even if we OOPS from an IST stack. The latter gives us a much better chance of surviving long enough after we detect a stack overflow to write out our logs. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/32f73ceb372ec61889598da5e5b145889b9f2e19.1468527351.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15x86/dumpstack/64: Handle faults when printing the "Stack: " part of an OOPSAndy Lutomirski
If we overflow the stack into a guard page, we'll recursively fault when trying to dump the contents of the guard page. Use probe_kernel_address() so we can recover if this happens. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e626d47a55d7b04dcb1b4d33faa95e8505b217c8.1468527351.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15x86/dumpstack: Try harder to get a call trace on stack overflowAndy Lutomirski
If we overflow the stack, print_context_stack() will abort. Detect this case and rewind back into the valid part of the stack so that we can trace it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/ee1690eb2715ccc5dc187fde94effa4ca0ccbbcd.1468527351.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14x86/reboot: Add Dell Optiplex 7450 AIO reboot quirkAlex Hung
Dell Optiplex 7450 AIO works with BOOT_ACPI; however, the quirk for "OptiPlex 745" changes its boot method to BOOT_BIOS and causes 7450 AIO hangs when rebooting; as a result, 7450 AIO is appended to overwrite BOOT_BIOS by BOOT_ACPI in order not to break the original 745 series Signed-off-by: Alex Hung <alex.hung@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14x86: Audit and remove any remaining unnecessary uses of module.hPaul Gortmaker
Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. In the case of some of these which are modular, we can extend that to also include files that are building basic support functionality but not related to loading or registering the final module; such files also have no need whatsoever for module.h The advantage in removing such instances is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h was the source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each instance for the presence of either and replace as needed. In the case of crypto/glue_helper.c we delete a redundant instance of MODULE_LICENSE in order to delete module.h -- the license info is already present at the top of the file. The uncore change warrants a mention too; it is uncore.c that uses module.h and not uncore.h; hence the relocation done there. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160714001901.31603-9-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14x86/kernel: Audit and remove any unnecessary uses of module.hPaul Gortmaker
Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. The advantage in doing so is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h was the source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance for the presence of either and replace as needed. Build testing revealed some implicit header usage that was fixed up accordingly. Note that some bool/obj-y instances remain since module.h is the header for some exception table entry stuff, and for things like __init_or_module (code that is tossed when MODULES=n). Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14x86: Don't use module.h just for AUTHOR / LICENSE tagsPaul Gortmaker
The Kconfig controlling compilation of these files are: arch/x86/Kconfig.debug:config DEBUG_RODATA_TEST arch/x86/Kconfig.debug: bool "Testcase for the marking rodata read-only" arch/x86/Kconfig.debug:config X86_PTDUMP_CORE arch/x86/Kconfig.debug: def_bool n ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modular infrastructure use, so that when reading the driver there is no doubt it is builtin-only. We delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160714001901.31603-2-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14Merge branch 'x86/platform' into x86/headers, to apply dependent patchesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14x86/hpet: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153335.279718463@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-13Merge branch 'timers/core' into smp/hotplug to pick up dependenciesThomas Gleixner
2016-07-11x86/tsc: Enumerate BXT tsc_khz via CPUIDLen Brown
Hard code the BXT crystal clock (aka ART - Always Running Timer) to 19.200 MHz, and use CPUID leaf 0x15 to determine the BXT TSC frequency. Use tsc_khz to sanity check BXT cpu_khz, which can be erroneous in some configurations. (I simplified the original patch from Bin Gao.) Original-From: Bin Gao <bin.gao@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/bf4e7c175acd6d09719c47c319b10ff1f0627ff8.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUIDLen Brown
Skylake CPU base-frequency and TSC frequency may differ by up to 2%. Enumerate CPU and TSC frequencies separately, allowing cpu_khz and tsc_khz to differ. The existing CPU frequency calibration mechanism is unchanged. However, CPUID extensions are preferred, when available. CPUID.0x16 is preferred over MSR and timer calibration for CPU frequency discovery. CPUID.0x15 takes precedence over CPU-frequency for TSC frequency discovery. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/b27ec289fd005833b27d694d9c2dbb716c5cdff7.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11x86/tsc_msr: Remove irqoff around MSR-based TSC enumerationLen Brown
Remove the irqoff/irqon around MSR-based TSC enumeration, as it is not necessary. Also rename: try_msr_calibrate_tsc() to cpu_khz_from_msr(), as that better describes what the routine does. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/a6b5c3ecd3b068175d2309599ab28163fc34215e.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11x86/fpu/xstate: Re-enable XSAVESYu-cheng Yu
We did not handle XSAVES instructions correctly. There were issues in converting between standard and compacted format when interfacing with user-space. These issues have been corrected. Add a WARN_ONCE() to make it clear that XSAVES supervisor states are not yet implemented. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <h.peter.anvin@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1468253937-40008-5-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11x86/fpu/xstate: Fix fpstate_init() for XRSTORSYu-cheng Yu
In XSAVES mode if fpstate_init() is used to initialize a task's extended state area, xsave.header.xcomp_bv[63] must be set. Otherwise, when the task is scheduled, a warning is triggered from copy_kernel_to_xregs(). One such test case is: setting an invalid extended state through PTRACE. When xstateregs_set() rejects the syscall and re-initializes the task's extended state area. This triggers the warning mentioned above. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <h.peter.anvin@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1468253937-40008-4-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11x86/fpu/xstate: Return NULL for disabled xstate component addressYu-cheng Yu
It is an error to request a disabled XSAVE/XSAVES component address. For that case, make __raw_xsave_addr() return a NULL and issue a warning. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <h.peter.anvin@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1468253937-40008-3-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-11x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVESYu-cheng Yu
When the kernel is using XSAVES compacted format, we cannot do __copy_from_user() from a signal frame, which has standard-format data. Fix it by using copyin_to_xsaves(), which converts between formats and filters out all supervisor states that we do not allow userspace to write. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <h.peter.anvin@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1468253937-40008-2-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86"Ingo Molnar
This reverts commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a. Stephane reported the following regression: > Since Andi added: > > commit 2c95afc1e83d93fac3be6923465e1753c2c53b0a > Author: Andi Kleen <ak@linux.intel.com> > Date: Thu Jun 9 06:14:38 2016 -0700 > > perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86 > > $ perf stat -e ref-cycles ls > <not counted> .... > > fails systematically because the ref-cycles is now used by the > watchdog and given this is a system-wide pinned event, it monopolizes > the fixed counter 2 which is the only counter able to measure this event. Since the next merge window is near, fix the regression for now by reverting the commit. Reported-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/quirks: Add early quirk to reset Apple AirPort cardLukas Wunner
The EFI firmware on Macs contains a full-fledged network stack for downloading OS X images from osrecovery.apple.com. Unfortunately on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 wireless card on every boot and leaves it enabled even after ExitBootServices has been called. The card continues to assert its IRQ line, causing spurious interrupts if the IRQ is shared. It also corrupts memory by DMAing received packets, allowing for remote code execution over the air. This only stops when a driver is loaded for the wireless card, which may be never if the driver is not installed or blacklisted. The issue seems to be constrained to the Broadcom 4331. Chris Milsted has verified that the newer Broadcom 4360 built into the MacBookPro11,3 (2013/2014) does not exhibit this behaviour. The chances that Apple will ever supply a firmware fix for the older machines appear to be zero. The solution is to reset the card on boot by writing to a reset bit in its mmio space. This must be done as an early quirk and not as a plain vanilla PCI quirk to successfully combat memory corruption by DMAed packets: Matthew Garrett found out in 2012 that the packets are written to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html). This type of memory is made available to the page allocator by efi_free_boot_services(). Plain vanilla PCI quirks run much later, in subsys initcall level. In-between a time window would be open for memory corruption. Random crashes occurring in this time window and attributed to DMAed packets have indeed been observed in the wild by Chris Bainbridge. When Matthew Garrett analyzed the memory corruption issue in 2012, he sought to fix it with a grub quirk which transitions the card to D3hot: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56 This approach does not help users with other bootloaders and while it may prevent DMAed packets, it does not cure the spurious interrupts emanating from the card. Unfortunately the card's mmio space is inaccessible in D3hot, so to reset it, we have to undo the effect of Matthew's grub patch and transition the card back to D0. Note that the quirk takes a few shortcuts to reduce the amount of code: The size of BAR 0 and the location of the PM capability is identical on all affected machines and therefore hardcoded. Only the address of BAR 0 differs between models. Also, it is assumed that the BCMA core currently mapped is the 802.11 core. The EFI driver seems to always take care of this. Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback towards finding the best solution to this problem. The following should be a comprehensive list of affected models: iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16] iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16] Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12] Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12] MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12] MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] For posterity, spurious interrupts caused by the Broadcom 4331 wireless card resulted in splats like this (stacktrace omitted): irq 17: nobody cared (try booting with the "irqpoll" option) handlers: [<ffffffff81374370>] pcie_isr [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci] [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec] Disabling IRQ #17 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732 Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1] Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1] Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2] Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1] Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2] Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris Milsted <cmilsted@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Michael Buesch <m@bues.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: b43-dev@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de [ Did minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/quirks: Reintroduce scanning of secondary busesLukas Wunner
We used to scan secondary buses until the following commit that was applied in 2009: 8659c406ade3 ("x86: only scan the root bus in early PCI quirks") which commit constrained early quirks to the root bus only. Its motivation was to prevent application of the nvidia_bugs quirk on secondary buses. We're about to add a quirk to reset the Broadcom 4331 wireless card on 2011/2012 Macs, which is located on a secondary bus behind a PCIe root port. To facilitate that, reintroduce scanning of secondary buses. The commit message of 8659c406ade3 notes that scanning only the root bus "saves quite some unnecessary scanning work". The algorithm used prior to 8659c406ade3 was particularly time consuming because it scanned buses 0 to 31 brute force. To avoid lengthening boot time, employ a recursive strategy which only scans buses that are actually reachable from the root bus. Yinghai Lu pointed out that the secondary bus number read from a bridge's config space may be invalid, in particular a value of 0 would cause an infinite loop. The PCI core goes beyond that and recurses to a child bus only if its bus number is greater than the parent bus number (see pci_scan_bridge()). Since the root bus is numbered 0, this implies that secondary buses may not be 0. Do the same on early scanning. If this algorithm is found to significantly impact boot time or cause infinite loops on broken hardware, it would be possible to limit its recursion depth: The Broadcom 4331 quirk applies at depth 1, all others at depth 0, so the bus need not be scanned deeper than that for now. An alternative approach would be to revert to scanning only the root bus, and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12 and 8086:1e16. Apple always positioned the card behind either of these three ports. The quirk would then check presence of the card in slot 0 below the root port and do its deed. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/quirks: Apply nvidia_bugs quirk only on root busLukas Wunner
Since the following commit: 8659c406ade3 ("x86: only scan the root bus in early PCI quirks") ... early quirks are only applied to devices on the root bus. The motivation was to prevent application of the nvidia_bugs quirk on secondary buses. We're about to reintroduce scanning of secondary buses for a quirk to reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent regressions, open code the requirement to apply nvidia_bugs only on the root bus. Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/fpu/xstate: Fix xstate_offsets, xstate_sizes for non-extended xstatesYu-cheng Yu
The arrays xstate_offsets[] and xstate_sizes[] record XSAVE standard- format offsets and sizes. Values for non-extended state components fpu and xmm's were not initialized or used. Ptrace format conversion needs them. Fix it. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cf3ea36cf30e2a99e37da6483e65446d018ff0a7.1466179491.git.yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/fpu/xstate: Fix XSTATE component offset print outYu-cheng Yu
Component offset print out was incorrect for XSAVES. Correct it and move to a separate function. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/86602a8ac400626c6eca7125c3e15934866fc38e.1466179491.git.yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/fpu/xstate: Fix PTRACE frames for XSAVESYu-cheng Yu
XSAVES uses compacted format and is a kernel instruction. The kernel should use standard-format, non-supervisor state data for PTRACE. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> [ Edited away artificial linebreaks. ] Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/de3d80949001305fe389799973b675cab055c457.1466179491.git.yu-cheng.yu@intel.com [ Made various readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/fpu/xstate: Fix supervisor xstate component offsetYu-cheng Yu
CPUID function 0x0d, sub function (i, i > 1) returns in ebx the offset of xstate component i. Zero is returned for a supervisor state. A supervisor state can only be saved by XSAVES and XSAVES uses a compacted format. There is no fixed offset for a supervisor state. This patch checks and makes sure a supervisor state offset is not recorded or mis-used. This has no effect in practice as we currently use no supervisor states, but it would be good to fix. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/81b29e40d35d4cec9f2511a856fe769f34935a3f.1466179491.git.yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/fpu/xstate: Align xstate components according to CPUIDYu-cheng Yu
CPUID function 0x0d, sub function (i, i > 1) returns in ecx[1] the alignment requirement of component 'i' when the compacted format is used. If ecx[1] is 0, component 'i' is located immediately following the preceding component. If ecx[1] is 1, component 'i' is located on the next 64-byte boundary following the preceding component. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/331e2bef1a0a7a584f06adde095b6bbfbe166472.1466179491.git.yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10Merge branch 'linus' into x86/fpu, to pick up fixes before applying new changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/tsc_msr: Add Airmont reference clock valuesLen Brown
per the Intel 64 and IA-32 Architecture Software Developer's Manual... Add the reference clock for Intel Atom Processors Based on the Airmont Microarchitecture. Reported-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/abc6a0f4b18281410da1a3f26e2819d8e03e144f.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/tsc_msr: Correct Silvermont reference clock valuesLen Brown
Atom processors use a 19.2 MHz crystal oscillator. Early processors generate 100 MHz via 19.2 MHz * 26 / 5 = 99.84 MHz. Later preocessor generate 100 MHz via 19.2 MHz * 125 / 24 = 100 MHz. Update the Silvermont-based tables accordingly, matching the Software Developers Manual. Also, correct a 166 MHz entry that should have been 116 MHz, and add a missing 80 MHz entry. Reported-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5d7561655dfb066ff10801b423405bae4d1cfbe2.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/tsc_msr: Update comments, expand definitionsLen Brown
Syntax only, no functional change. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/8653a2dba21fef122fc7b29eafb750e2004d3976.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/tsc_msr: Remove debugging messagesLen Brown
Debugging messages are not necessary after all of the possible hardware failures that never occur. Instead, this code can simply return 0. This code also doesn't need to print in the success case. tsc_init() already prints the TSC frequency, and apic=debug is available if anybody really is interested in printing the LAPIC frequency. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/cf03279a125b95dfa9b8d3d5b4a66de09cd04050.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10x86/tsc_msr: Identify Intel-specific codeLen Brown
try_msr_calibrate_tsc() is currently Intel-specific, and should not execute on any other vendor's parts. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1fe23c052826bdcfeb3d45045aa02246078cb5a7.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-10Revert "x86/tsc: Add missing Cherrytrail frequency to the table"Len Brown
This reverts commit: e2724e9d9692 ("x86/tsc: Add missing Cherrytrail frequency to the table") ... as it is incomplete, and is replaced by a more complete patch later in this series. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/2199d0e959f7f71a18827268b5d060f8d3831639.1466138954.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-09Merge branch 'linus' into x86/asm, to pick up fixes before merging new changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three fixes: - A boot crash fix with certain configs - a MAINTAINERS entry update - Documentation typo fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Documentation: Fix various typos in Documentation/x86/ files x86/amd_nb: Fix boot crash on non-AMD systems MAINTAINERS: Update the Calgary IOMMU entry
2016-07-08x86/mm: Enable KASLR for physical mapping memory regionsThomas Garnier
Add the physical mapping in the list of randomized memory regions. The physical memory mapping holds most allocations from boot and heap allocators. Knowing the base address and physical memory size, an attacker can deduce the PDE virtual address for the vDSO memory page. This attack was demonstrated at CanSecWest 2016, in the following presentation: "Getting Physical: Extreme Abuse of Intel Based Paged Systems": https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf (See second part of the presentation). The exploits used against Linux worked successfully against 4.6+ but fail with KASLR memory enabled: https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits Similar research was done at Google leading to this patch proposal. Variants exists to overwrite /proc or /sys objects ACLs leading to elevation of privileges. These variants were tested against 4.6+. The page offset used by the compressed kernel retains the static value since it is not yet randomized during this boot stage. Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08x86/mm: Implement ASLR for kernel memory regionsThomas Garnier
Randomizes the virtual address space of kernel memory regions for x86_64. This first patch adds the infrastructure and does not randomize any region. The following patches will randomize the physical memory mapping, vmalloc and vmemmap regions. This security feature mitigates exploits relying on predictable kernel addresses. These addresses can be used to disclose the kernel modules base addresses or corrupt specific structures to elevate privileges bypassing the current implementation of KASLR. This feature can be enabled with the CONFIG_RANDOMIZE_MEMORY option. The order of each memory region is not changed. The feature looks at the available space for the regions based on different configuration options and randomizes the base and space between each. The size of the physical memory mapping is the available physical memory. No performance impact was detected while testing the feature. Entropy is generated using the KASLR early boot functions now shared in the lib directory (originally written by Kees Cook). Randomization is done on PGD & PUD page table levels to increase possible addresses. The physical memory mapping code was adapted to support PUD level virtual addresses. This implementation on the best configuration provides 30,000 possible virtual addresses in average for each memory region. An additional low memory page is used to ensure each CPU can start with a PGD aligned virtual address (for realmode). x86/dump_pagetable was updated to correctly display each region. Updated documentation on x86_64 memory layout accordingly. Performance data, after all patches in the series: Kernbench shows almost no difference (-+ less than 1%): Before: Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.63 (1.2695) User Time 1034.89 (1.18115) System Time 87.056 (0.456416) Percent CPU 1092.9 (13.892) Context Switches 199805 (3455.33) Sleeps 97907.8 (900.636) After: Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.489 (1.10636) User Time 1034.86 (1.36053) System Time 87.764 (0.49345) Percent CPU 1095 (12.7715) Context Switches 199036 (4298.1) Sleeps 97681.6 (1031.11) Hackbench shows 0% difference on average (hackbench 90 repeated 10 times): attemp,before,after 1,0.076,0.069 2,0.072,0.069 3,0.066,0.066 4,0.066,0.068 5,0.066,0.067 6,0.066,0.069 7,0.067,0.066 8,0.063,0.067 9,0.067,0.065 10,0.068,0.071 average,0.0677,0.0677 Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08Merge branch 'x86/mm' into x86/boot, to pick up dependenciesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08x86/dumpstack: Add show_stack_regs() and use itBorislav Petkov
Add a helper to dump supplied pt_regs and use it in the MSR exception handling code to have precise stack traces pointing to the actual function causing the MSR access exception and not the stack frame of the exception handler itself. The new output looks like this: unchecked MSR access error: RDMSR from 0xdeadbeef at rIP: 0xffffffff8102ddb6 (early_init_intel+0x16/0x3a0) 00000000756e6547 ffffffff81c03f68 ffffffff81dd0940 ffffffff81c03f10 ffffffff81d42e65 0000000001000000 ffffffff81c03f58 ffffffff81d3e5a3 0000800000000000 ffffffff81800080 ffffffffffffffff 0000000000000000 Call Trace: [<ffffffff81d42e65>] early_cpu_init+0xe7/0x136 [<ffffffff81d3e5a3>] setup_arch+0xa5/0x9df [<ffffffff81d38bb9>] start_kernel+0x9f/0x43a [<ffffffff81d38294>] x86_64_start_reservations+0x2f/0x31 [<ffffffff81d383fe>] x86_64_start_kernel+0x168/0x176 Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1467671487-10344-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08x86/dumpstack: Honor supplied @regs argAndy Lutomirski
The comment suggests that show_stack(NULL, NULL) should backtrace the current context, but the code doesn't match the comment. If regs are given, start the "Stack:" hexdump at regs->sp. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1467671487-10344-2-git-send-email-bp@alien8.de Link: http://lkml.kernel.org/r/efcd79bf4106d61f1cd258c2caa87f3a0618eeac.1466036668.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08x86/microcode: Remove unused symbol exportsBorislav Petkov
It is not a module anymore and those can be retracted. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160704170551.GC7261@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08Merge tag 'v4.7-rc6' into x86/microcode, to pick up fixes before merging new ↵Ingo Molnar
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08x86/mce: Fix mce_rdmsrl() warning messageBorislav Petkov
The MSR address we're dumping in there should be in hex, otherwise we get funsies like: [ 0.016000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:428 mce_rdmsrl+0xd9/0xe0 [ 0.016000] mce: Unable to read msr -1073733631! ^^^^^^^^^^^ Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1467968983-4874-5-git-send-email-bp@alien8.de [ Fixed capitalization of 'MSR'. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08x86/mce/AMD: Increase size of the bank_map typeAravind Gopalakrishnan
Change bank_map type from 'char' to 'int' since we now have more than eight banks in a system. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1467968983-4874-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08Merge tag 'v4.7-rc6' into ras/core, to pick up fixes before merging new changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07timers, x86/mce: Initialize MCE restart timer as pinnedThomas Gleixner
Pinned timers must carry the pinned attribute in the timer structure itself, so convert the code to the new API. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094341.215783439@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07timers, x86/apic/uv: Initialize the UV heartbeat timer as pinnedThomas Gleixner
Pinned timers must carry the pinned attribute in the timer structure itself, so convert the code to the new API. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160704094341.133837204@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>