summaryrefslogtreecommitdiff
path: root/arch/x86/boot/compressed
AgeCommit message (Collapse)Author
2017-09-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "PCID support, 5-level paging support, Secure Memory Encryption support The main changes in this cycle are support for three new, complex hardware features of x86 CPUs: - Add 5-level paging support, which is a new hardware feature on upcoming Intel CPUs allowing up to 128 PB of virtual address space and 4 PB of physical RAM space - a 512-fold increase over the old limits. (Supercomputers of the future forecasting hurricanes on an ever warming planet can certainly make good use of more RAM.) Many of the necessary changes went upstream in previous cycles, v4.14 is the first kernel that can enable 5-level paging. This feature is activated via CONFIG_X86_5LEVEL=y - disabled by default. (By Kirill A. Shutemov) - Add 'encrypted memory' support, which is a new hardware feature on upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system RAM to be encrypted and decrypted (mostly) transparently by the CPU, with a little help from the kernel to transition to/from encrypted RAM. Such RAM should be more secure against various attacks like RAM access via the memory bus and should make the radio signature of memory bus traffic harder to intercept (and decrypt) as well. This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled by default. (By Tom Lendacky) - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a hardware feature that attaches an address space tag to TLB entries and thus allows to skip TLB flushing in many cases, even if we switch mm's. (By Andy Lutomirski) All three of these features were in the works for a long time, and it's coincidence of the three independent development paths that they are all enabled in v4.14 at once" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits) x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y) x86/mm: Use pr_cont() in dump_pagetable() x86/mm: Fix SME encryption stack ptr handling kvm/x86: Avoid clearing the C-bit in rsvd_bits() x86/CPU: Align CR3 defines x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages acpi, x86/mm: Remove encryption mask from ACPI page protection type x86/mm, kexec: Fix memory corruption with SME on successive kexecs x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y x86/mm: Allow userspace have mappings above 47-bit x86/mm: Prepare to expose larger address space to userspace x86/mpx: Do not allow MPX if we have mappings above 47-bit x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit() x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD x86/mm/dump_pagetables: Fix printout of p4d level x86/mm/dump_pagetables: Generalize address normalization x86/boot: Fix memremap() related build failure ...
2017-09-04Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The main changes are KASL related fixes and cleanups: in particular we now exclude certain physical memory ranges as KASLR randomization targets that have proven to be unreliable (early-)RAM on some firmware versions" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor x86/boot/KASLR: Rename process_e820_entry() into process_mem_region() x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry() x86/boot/KASLR: Wrap e820 entries walking code into new function process_e820_entries()
2017-09-04Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: - Introduce the ORC unwinder, which can be enabled via CONFIG_ORC_UNWINDER=y. The ORC unwinder is a lightweight, Linux kernel specific debuginfo implementation, which aims to be DWARF done right for unwinding. Objtool is used to generate the ORC unwinder tables during build, so the data format is flexible and kernel internal: there's no dependency on debuginfo created by an external toolchain. The ORC unwinder is almost two orders of magnitude faster than the (out of tree) DWARF unwinder - which is important for perf call graph profiling. It is also significantly simpler and is coded defensively: there has not been a single ORC related kernel crash so far, even with early versions. (knock on wood!) But the main advantage is that enabling the ORC unwinder allows CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel measurably: With frame pointers disabled, GCC does not have to add frame pointer instrumentation code to every function in the kernel. The kernel's .text size decreases by about 3.2%, resulting in better cache utilization and fewer instructions executed, resulting in a broad kernel-wide speedup. Average speedup of system calls should be roughly in the 1-3% range - measurements by Mel Gorman [1] have shown a speedup of 5-10% for some function execution intense workloads. The main cost of the unwinder is that the unwinder data has to be stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel config - which is a modest cost on modern x86 systems. Given how young the ORC unwinder code is it's not enabled by default - but given the performance advantages the plan is to eventually make it the default unwinder on x86. See Documentation/x86/orc-unwinder.txt for more details. - Remove lguest support: its intended role was that of a temporary proof of concept for virtualization, plus its removal will enable the reduction (removal) of the paravirt API as well, so Rusty agreed to its removal. (Juergen Gross) - Clean up and fix FSGS related functionality (Andy Lutomirski) - Clean up IO access APIs (Andy Shevchenko) - Enhance the symbol namespace (Jiri Slaby) * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) objtool: Handle GCC stack pointer adjustment bug x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone() x86/fpu/math-emu: Add ENDPROC to functions x86/boot/64: Extract efi_pe_entry() from startup_64() x86/boot/32: Extract efi_pe_entry() from startup_32() x86/lguest: Remove lguest support x86/paravirt/xen: Remove xen_patch() objtool: Fix objtool fallthrough detection with function padding x86/xen/64: Fix the reported SS and CS in SYSCALL objtool: Track DRAP separately from callee-saved registers objtool: Fix validate_branch() return codes x86: Clarify/fix no-op barriers for text_poke_bp() x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs selftests/x86/fsgsbase: Test selectors 1, 2, and 3 x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common x86/asm: Fix UNWIND_HINT_REGS macro for older binutils x86/asm/32: Fix regs_get_register() on segment registers x86/xen/64: Rearrange the SYSCALL entries x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads ...
2017-08-31x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* ↵Naoya Horiguchi
and EFI_LOADER_* from KASLR's choice There's a potential bug in how we select the KASLR kernel address n the early boot code. The KASLR boot code currently chooses the kernel image's physical memory location from E820_TYPE_RAM regions by walking over all e820 entries. E820_TYPE_RAM includes EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA as well, so those regions can end up hosting the kernel image. According to the UEFI spec, all memory regions marked as EfiBootServicesCode and EfiBootServicesData are available as free memory after the first call to ExitBootServices(). I.e. so such regions should be usable for the kernel, per spec. In real life however, we have workarounds for broken x86 firmware, where we keep such regions reserved until SetVirtualAddressMap() is done. See the following code in should_map_region(): static bool should_map_region(efi_memory_desc_t *md) { ... /* * Map boot services regions as a workaround for buggy * firmware that accesses them even when they shouldn't. * * See efi_{reserve,free}_boot_services(). */ if (md->type =3D=3D EFI_BOOT_SERVICES_CODE || md->type =3D=3D EFI_BOOT_SERVICES_DATA) return false; This workaround suppressed a boot crash, but potential issues still remain because no one prevents the regions from overlapping with kernel image by KASLR. So let's make sure that EFI_BOOT_SERVICES_{CODE|DATA} regions are never chosen as kernel memory for the workaround to work fine. Furthermore, EFI_LOADER_{CODE|DATA} regions are also excluded because they can be used after ExitBootServices() as defined in EFI spec. As a result, we choose kernel address only from EFI_CONVENTIONAL_MEMORY which is the only memory type we know to be safely free. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: Junichi Nomura <j-nomura@ce.jp.nec.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Link: http://lkml.kernel.org/r/20170828074444.GC23181@hori1.linux.bs1.fc.nec.co.jp [ Rewrote/fixed/clarified the changelog and the in code comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot: Prevent faulty bootparams.screeninfo from causing harmJan H. Schönherr
If a zero for the number of lines manages to slip through, scroll() may underflow some offset calculations, causing accesses outside the video memory. Make the check in __putstr() more pessimistic to prevent that. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot/64: Extract efi_pe_entry() from startup_64()Jiri Slaby
Similarly to the 32-bit code, efi_pe_entry body() is somehow squashed into startup_64(). In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry() to start at 0x210. But this requirement was removed long time ago, in: 99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_64() into a separate function, we do so. We also annotate the function appropriatelly by ENTRY+ENDPROC. ABI offsets are preserved: 0000000000000000 T startup_32 0000000000000200 T startup_64 0000000000000390 T efi64_stub_entry On the top-level, it looked like: .org 0x200 ENTRY(startup_64) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr GLOBAL(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq preferred_addr(%rax), %rax jmp *%rax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_64) ENDPROC(startup_64) And it is now converted into: .org 0x200 ENTRY(startup_64) ... ; a lot of assembly (startup_64) ENDPROC(startup_64) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq startup_64(%rax), %rax jmp *%rax ENDPROC(efi_pe_entry) #endif Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot/32: Extract efi_pe_entry() from startup_32()Jiri Slaby
The efi_pe_entry() body is somehow squashed into startup_32(). In the old days, we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start at 0x10. But this requirement was removed long time ago, in: 99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_32() into a separate function, we do so and we separate it to two functions as they are marked already: efi_pe_entry() + efi32_stub_entry(). We also annotate the functions appropriatelly by ENTRY+ENDPROC. ABI offset is preserved: 0000 128 FUNC GLOBAL DEFAULT 6 startup_32 0080 60 FUNC GLOBAL DEFAULT 6 efi_pe_entry 00bc 68 FUNC GLOBAL DEFAULT 6 efi32_stub_entry On the top-level, it looked like this: ENTRY(startup_32) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal preferred_addr(%eax), %eax jmp *%eax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_32) ENDPROC(startup_32) And it is now converted into: ENTRY(startup_32) ... ; a lot of assembly (startup_32) ENDPROC(startup_32) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENDPROC(efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal startup_32(%eax), %eax jmp *%eax ENDPROC(efi32_stub_entry) #endif Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-26Merge branch 'linus' into x86/mm to pick up fixes and to fix conflictsIngo Molnar
Conflicts: arch/x86/kernel/head64.c arch/x86/mm/mmap.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical addressBaoquan He
Currently KASLR will parse all e820 entries of RAM type and add all candidate positions into the slots array. After that we choose one slot randomly as the new position which the kernel will be decompressed into and run at. On systems with EFI enabled, e820 memory regions are coming from EFI memory regions by combining adjacent regions. These EFI memory regions have various attributes, and the "mirrored" attribute is one of them. The physical memory region whose descriptors in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are mirrored. The address range mirroring feature of the kernel arranges such mirrored regions into normal zones and other regions into movable zones. With the mirroring feature enabled, the code and data of the kernel can only be located in the more reliable mirrored regions. However, the current KASLR code doesn't check EFI memory entries, and could choose a new kernel position in non-mirrored regions. This will break the intended functionality of the address range mirroring feature. To fix this, if EFI is detected, iterate EFI memory map and pick the mirrored region to process for adding candidate of randomization slot. If EFI is disabled or no mirrored region found, still process the e820 memory map. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Cc: keescook@chromium.org Cc: linux-efi@vger.kernel.org Cc: matt@codeblueprint.co.uk Cc: n-horiguchi@ah.jp.nec.com Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-bhe@redhat.com [ Rewrote most of the text. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptorBaoquan He
The existing map iteration helper for_each_efi_memory_desc_in_map can only be used after the kernel initializes the EFI subsystem to set up struct efi_memory_map. Before that we also need iterate map descriptors which are stored in several intermediate structures, like struct efi_boot_memmap for arch independent usage and struct efi_info for x86 arch only. Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and replace several places where that primitive is open coded. Signed-off-by: Baoquan He <bhe@redhat.com> [ Various improvements to the text. ] Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Cc: keescook@chromium.org Cc: linux-efi@vger.kernel.org Cc: n-horiguchi@ah.jp.nec.com Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17Merge branch 'linus' into x86/boot, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-28x86/boot: Disable the address-of-packed-member compiler warningMatthias Kaehlcke
The clang warning 'address-of-packed-member' is disabled for the general kernel code, also disable it for the x86 boot code. This suppresses a bunch of warnings like this when building with clang: ./arch/x86/include/asm/processor.h:535:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value [-Waddress-of-packed-member] return this_cpu_read_stable(cpu_tss.x86_tss.sp0); ^~~~~~~~~~~~~~~~~~~ ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro 'this_cpu_read_stable' #define this_cpu_read_stable(var) percpu_stable_op("mov", var) ^~~ ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro 'percpu_stable_op' : "p" (&(var))); ^~~ Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: Doug Anderson <dianders@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18x86/mm: Provide general kernel support for memory encryptionTom Lendacky
Changes to the existing page table macros will allow the SME support to be enabled in a simple fashion with minimal changes to files that use these macros. Since the memory encryption mask will now be part of the regular pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization without the encryption mask before SME becomes active. Two new pgprot() macros are defined to allow setting or clearing the page encryption mask. The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does not support encryption for MMIO areas so this define removes the encryption mask from the page attribute. Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow creating a physical address with the encryption mask. These are used when working with the cr3 register so that the PGD can be encrypted. The current __va() macro is updated so that the virtual address is generated based off of the physical address without the encryption mask thus allowing the same virtual address to be generated regardless of whether encryption is enabled for that physical location or not. Also, an early initialization function is added for SME. If SME is active, this function: - Updates the early_pmd_flags so that early page faults create mappings with the encryption mask. - Updates the __supported_pte_mask to include the encryption mask. - Updates the protection_map entries to include the encryption mask so that user-space allocations will automatically have the encryption mask applied. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18x86/boot/KASLR: Rename process_e820_entry() into process_mem_region()Baoquan He
Now process_e820_entry() is not limited to e820 entry processing, rename it to process_mem_region(). And adjust the code comment accordingly. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Cc: matt@codeblueprint.co.uk Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/1499603862-11516-4-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry()Baoquan He
This makes process_e820_entry() be able to process any kind of memory region. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Cc: matt@codeblueprint.co.uk Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/1499603862-11516-3-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18x86/boot/KASLR: Wrap e820 entries walking code into new function ↵Baoquan He
process_e820_entries() The original function process_e820_entry() only takes care of each e820 entry passed. And move the E820_TYPE_RAM checking logic into process_e820_entries(). And remove the redundent local variable 'addr' definition in find_random_phys_addr(). Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Cc: matt@codeblueprint.co.uk Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/1499603862-11516-2-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-12include/linux/string.h: add the option of fortified string.h functionsDaniel Micay
This adds support for compiling with a rough equivalent to the glibc _FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer overflow checks for string.h functions when the compiler determines the size of the source or destination buffer at compile-time. Unlike glibc, it covers buffer reads in addition to writes. GNU C __builtin_*_chk intrinsics are avoided because they would force a much more complex implementation. They aren't designed to detect read overflows and offer no real benefit when using an implementation based on inline checks. Inline checks don't add up to much code size and allow full use of the regular string intrinsics while avoiding the need for a bunch of _chk functions and per-arch assembly to avoid wrapper overhead. This detects various overflows at compile-time in various drivers and some non-x86 core kernel code. There will likely be issues caught in regular use at runtime too. Future improvements left out of initial implementation for simplicity, as it's all quite optional and can be done incrementally: * Some of the fortified string functions (strncpy, strcat), don't yet place a limit on reads from the source based on __builtin_object_size of the source buffer. * Extending coverage to more string functions like strlcat. * It should be possible to optionally use __builtin_object_size(x, 1) for some functions (C strings) to detect intra-object overflows (like glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative approach to avoid likely compatibility issues. * The compile-time checks should be made available via a separate config option which can be enabled by default (or always enabled) once enough time has passed to get the issues it catches fixed. Kees said: "This is great to have. While it was out-of-tree code, it would have blocked at least CVE-2016-3858 from being exploitable (improper size argument to strlcpy()). I've sent a number of fixes for out-of-bounds-reads that this detected upstream already" [arnd@arndb.de: x86: fix fortified memcpy] Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de [keescook@chromium.org: avoid panic() in favor of BUG()] Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast [keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help] Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org Signed-off-by: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-03Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - Continued work to add support for 5-level paging provided by future Intel CPUs. In particular we switch the x86 GUP code to the generic implementation. (Kirill A. Shutemov) - Continued work to add PCID CPU support to native kernels as well. In this round most of the focus is on reworking/refreshing the TLB flush infrastructure for the upcoming PCID changes. (Andy Lutomirski)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Delete a big outdated comment about TLB flushing x86/mm: Don't reenter flush_tlb_func_common() x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging x86/ftrace: Exclude functions in head64.c from function-tracing x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap x86/mm: Remove reset_lazy_tlbstate() x86/ldt: Simplify the LDT switching logic x86/boot/64: Put __startup_64() into .head.text x86/mm: Add support for 5-level paging for KASLR x86/mm: Make kernel_physical_mapping_init() support 5-level paging x86/mm: Add sync_global_pgds() for configuration with 5-level paging x86/boot/64: Add support of additional page table level during early boot x86/boot/64: Rename init_level4_pgt and early_level4_pgt x86/boot/64: Rewrite startup_64() in C x86/boot/compressed: Enable 5-level paging during decompression stage x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations x86/boot/efi: Cleanup initialization of GDT entries x86/asm: Fix comment in return_from_SYSCALL_64() x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation ...
2017-07-03Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The main changes in this cycle were KASLR improvements for rare environments with special boot options, by Baoquan He. Also misc smaller changes/cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/debug: Extend the lower bound of crash kernel low reservations x86/boot: Remove unused copy_*_gs() functions x86/KASLR: Use the right memcpy() implementation Documentation/kernel-parameters.txt: Update 'memmap=' boot option description x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options x86/KASLR: Parse all 'memmap=' boot option entries
2017-06-30x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level pagingKirill A. Shutemov
KASLR uses hack to detect whether we booted via startup_32() or startup_64(): it checks what is loaded into cr3 and compares it to _pgtables. _pgtables is the array of page tables where early code allocates page table from. KASLR expects cr3 to point to _pgtables if we booted via startup_32(), but that's not true if we booted with 5-level paging enabled. In this case top level page table is allocated separately and only the first p4d page table is allocated from the array. Let's modify the check to cover both 4- and 5-level paging cases. The patch also renames 'level4p' to 'top_level_pgt' as it now can hold page table for 4th or 5th level, depending on configuration. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170628121730.43079-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bugBaoquan He
Kernel text KASLR is separated into physical address and virtual address randomization. And for virtual address randomization, we only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE. So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, but not the original kernel loading address 'output'. The bug will cause kernel boot failure if kernel is loaded at a different position than the address, 16M, which is decided at compiled time. Kexec/kdump is such practical case. To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial value. Tested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately") Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30x86/boot/KASLR: Add checking for the offset of kernel virtual address ↵Baoquan He
randomization For kernel text KASLR, the virtual address is confined to area of 1G, [0xffffffff80000000, 0xffffffffc0000000). For the implemenataion of virtual address randomization, we only randomize to get an offset between 16M and 1G, then add this offset to the starting address, 0xffffffff80000000. Here 16M is the offset which is decided at linking stage. So the amount of the local variable 'virt_addr' which respresents the offset plus the kernel output size can not exceed KERNEL_IMAGE_SIZE. Add a debug check for the offset. If out of bounds, print error message and hang there. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/boot/compressed: Enable 5-level paging during decompression stageKirill A. Shutemov
We need to cover two basic cases: when bootloader left us in 32-bit mode and when bootloader enabled long mode. The patch implements unified codepath to enabled 5-level paging for both cases. It means case when we start in 32-bit mode, we first enable long mode with 4-level and then switch over to 5-level paging. Switching from 4-level to 5-level paging is not trivial. We cannot do it directly. Setting LA57 in long mode would trigger #GP. So we need to switch off long mode first and the then re-enable with 5-level paging. NOTE: The need of switching off long mode means we are in trouble if bootloader put us above 4G boundary. If bootloader wants to boot 5-level paging kernel, it has to put kernel below 4G or enable 5-level paging on it's own, so we could avoid the step. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurationsKirill A. Shutemov
We would need to switch temporarily to compatibility mode during booting with 5-level paging enabled. It would require 32-bit code segment descriptor. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurationsKirill A. Shutemov
Define __KERNEL_CS GDT entry as long mode (.L=1, .D=0) on 64-bit configurations. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/boot/efi: Cleanup initialization of GDT entriesKirill A. Shutemov
This is preparation for following patches without changing semantics of the code. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()Andy Lutomirski
The kernel has several code paths that read CR3. Most of them assume that CR3 contains the PGD's physical address, whereas some of them awkwardly use PHYSICAL_PAGE_MASK to mask off low bits. Add explicit mask macros for CR3 and convert all of the CR3 readers. This will keep them from breaking when PCID is enabled. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-31x86/KASLR: Use the right memcpy() implementationArnd Bergmann
The decompressor has its own implementation of the string functions, but has to include the right header to get those, while implicitly including linux/string.h may result in a link error: arch/x86/boot/compressed/kaslr.o: In function `choose_random_location': kaslr.c:(.text+0xf51): undefined reference to `_mmx_memcpy' This has appeared now as KASLR started using memcpy(), via: d52e7d5a952c ("x86/KASLR: Parse all 'memmap=' boot option entries") Other files in the decompressor already do the same thing. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170530091446.1000183-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' ↵Baoquan He
boot options The 'mem=' boot option limits the max address a system can use - any memory region above the limit will be removed. Furthermore, the 'memmap=nn[KMG]' variant (with no offset specified) has the same behaviour as 'mem='. KASLR needs to consider this when choosing the random position for decompressing the kernel. Do it. Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: douly.fnst@cn.fujitsu.com Cc: dyoung@redhat.com Link: http://lkml.kernel.org/r/1494654390-23861-3-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24x86/KASLR: Parse all 'memmap=' boot option entriesBaoquan He
In commit: f28442497b5c ("x86/boot: Fix KASLR and memmap= collision") ... the memmap= option is parsed so that KASLR can avoid those reserved regions. It uses cmdline_find_option() to get the value if memmap= is specified, however the problem is that cmdline_find_option() can only find the last entry if multiple memmap entries are provided. This is not correct. Address this by checking each command line token for a "memmap=" match and parse each instance instead of using cmdline_find_option(). Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: douly.fnst@cn.fujitsu.com Cc: dyoung@redhat.com Cc: m.mizuma@jp.fujitsu.com Link: http://lkml.kernel.org/r/1494654390-23861-2-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-21x86/boot: Use CROSS_COMPILE prefix for readelfRob Landley
The boot code Makefile contains a straight 'readelf' invocation. This causes build warnings in cross compile environments, when there is no unprefixed readelf accessible via $PATH. Add the missing $(CROSS_COMPILE) prefix. [ tglx: Rewrote changelog ] Fixes: 98f78525371b ("x86/boot: Refuse to build with data relocations") Signed-off-by: Rob Landley <rob@landley.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: "H.J. Lu" <hjl.tools@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-08x86/mm: Add support for gbpages to kernel_ident_mapping_init()Xunlei Pang
Kernel identity mappings on x86-64 kernels are created in two ways: by the early x86 boot code, or by kernel_ident_mapping_init(). Native kernels (which is the dominant usecase) use the former, but the kexec and the hibernation code uses kernel_ident_mapping_init(). There's a subtle difference between these two ways of how identity mappings are created, the current kernel_ident_mapping_init() code creates identity mappings always using 2MB page(PMD level) - while the native kernel boot path also utilizes gbpages where available. This difference is suboptimal both for performance and for memory usage: kernel_ident_mapping_init() needs to allocate pages for the page tables when creating the new identity mappings. This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init() to address these concerns. The primary advantage would be better TLB coverage/performance, because we'd utilize 1GB TLBs instead of 2MB ones. It is also useful for machines with large number of memory to save paging structure allocations(around 4MB/TB using 2MB page) when setting identity mappings for all the memory, after using 1GB page it will consume only 8KB/TB. ( Note that this change alone does not activate gbpages in kexec, we are doing that in a separate patch. ) Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-07x86/boot: Declare error() as noreturnKees Cook
The compressed boot function error() is used to halt execution, but it wasn't marked with "noreturn". This fixes that in preparation for supporting kernel FORTIFY_SOURCE, which uses the noreturn annotation on panic, and calls error(). GCC would warn about a noreturn function calling a non-noreturn function: arch/x86/boot/compressed/misc.c: In function ‘fortify_panic’: arch/x86/boot/compressed/misc.c:416:1: warning: ‘noreturn’ function does return } ^ Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/20170506045116.GA2879@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-28x86/KASLR: Fix kexec kernel boot crash when KASLR randomization failsBaoquan He
Dave found that a kdump kernel with KASLR enabled will reset to the BIOS immediately if physical randomization failed to find a new position for the kernel. A kernel with the 'nokaslr' option works in this case. The reason is that KASLR will install a new page table for the identity mapping, while it missed building it for the original kernel location if KASLR physical randomization fails. This only happens in the kexec/kdump kernel, because the identity mapping has been built for kexec/kdump in the 1st kernel for the whole memory by calling init_pgtable(). Here if physical randomizaiton fails, it won't build the identity mapping for the original area of the kernel but change to a new page table '_pgtable'. Then the kernel will triple fault immediately caused by no identity mappings. The normal kernel won't see this bug, because it comes here via startup_32() and CR3 will be set to _pgtable already. In startup_32() the identity mapping is built for the 0~4G area. In KASLR we just append to the existing area instead of entirely overwriting it for on-demand identity mapping building. So the identity mapping for the original area of kernel is still there. To fix it we just switch to the new identity mapping page table when physical KASLR succeeds. Otherwise we keep the old page table unchanged just like "nokaslr" does. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1493278940-5885-1-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11Merge branch 'WIP.x86/boot' into x86/boot, to pick up ready branchIngo Molnar
The E820 rework in WIP.x86/boot has gone through a couple of weeks of exposure in -tip, merge it in a wider fashion. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-31x86/boot: Fix Sparse warning by including required header fileZhengyi Shen
Include declarations for various symbols defined in the error.h header file to fix the following Sparse warnings: arch/x86/boot/compressed/error.c:8:6: warning: symbol 'warn' was not declared. Should it be static? arch/x86/boot/compressed/error.c:15:6: warning: symbol 'error' was not declared. Should it be static? Signed-off-by: Zhengyi Shen <shenzhengyi@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1490770820-24472-1-git-send-email-shenzhengyi@gmail.com [ Fixed/enhanced the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01Merge branch 'linus' into WIP.x86/boot, to fix up conflicts and to pick up ↵Ingo Molnar
updates Conflicts: arch/x86/xen/setup.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-20Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Misc updates: - fix e820 error handling - convert page table setup code from assembly to C - fix kexec environment bug - ... plus small cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kconfig: Remove misleading note regarding hibernation and KASLR x86/boot: Fix KASLR and memmap= collision x86/e820/32: Fix e820_search_gap() error handling on x86-32 x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C x86/e820: Make e820_search_gap() static and remove unused variables
2017-02-07efi: Get and store the secure boot statusDavid Howells
Get the firmware's secure-boot status in the kernel boot wrapper and stash it somewhere that the main kernel image can find. The efi_get_secureboot() function is extracted from the ARM stub and (a) generalised so that it can be called from x86 and (b) made to use efi_call_runtime() so that it can be run in mixed-mode. For x86, it is stored in boot_params and can be overridden by the boot loader or kexec. This allows secure-boot mode to be passed on to a new kernel. Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1486380166-31868-5-git-send-email-ard.biesheuvel@linaro.org [ Small readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07x86/efi: Allow invocation of arbitrary runtime servicesDavid Howells
Provide the ability to perform mixed-mode runtime service calls for x86 in the same way the following commit provided the ability to invoke for boot services: 0a637ee61247bd ("x86/efi: Allow invocation of arbitrary boot services") Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1486380166-31868-2-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01x86/efi: Deduplicate efi_char16_printk()Lukas Wunner
Eliminate the separate 32-bit and 64x- bit code paths by way of the shiny new efi_call_proto() macro. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1485868902-20401-3-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01efi: Deduplicate efi_file_size() / _read() / _close()Lukas Wunner
There's one ARM, one x86_32 and one x86_64 version which can be folded into a single shared version by masking their differences with the shiny new efi_call_proto() macro. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1485868902-20401-2-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-29x86/boot/e820: Separate the E820 ABI structures from the in-kernel structuresIngo Molnar
Linus pointed out that relying on the compiler to pack structures with enums is fragile not just for the kernel, but for external tooling as well which might rely on our UAPI headers. So separate the two from each other: introduce 'struct boot_e820_entry', which is the boot protocol entry format. This actually simplifies the code, as e820__update_table() is now never called directly with boot protocol table entries - we can rely on append_e820_table() and do a e820__update_table() call afterwards. ( This will allow further simplifications of __e820__update_table(), but that will be done in a separate patch. ) This change also has the side effect of not modifying the bootparams structure anymore - which might be useful for debugging. In theory we could even constify the boot_params structure - at least from the E820 code's point of view. Remove the uapi/asm/e820/types.h file, as it's not used anymore - all kernel side E820 types are defined in asm/e820/types.h. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Prefix the E820_* type names with "E820_TYPE_"Ingo Molnar
So there's a number of constants that start with "E820" but which are not types - these create a confusing mixture when seen together with 'enum e820_type' values: E820MAP E820NR E820_X_MAX E820MAX To better differentiate the 'enum e820_type' values prefix them with E820_TYPE_. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename everything to e820_tableIngo Molnar
No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename 'e820_map' variables to 'e820_array'Ingo Molnar
In line with the rename to 'struct e820_array', harmonize the naming of common e820 table variable names as well: e820 => e820_array e820_saved => e820_array_saved e820_map => e820_array initial_e820 => e820_array_init This makes the variable names more consistent and easier to grep for. No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and ↵Ingo Molnar
'struct e820_array' The 'e820entry' and 'e820map' names have various annoyances: - the missing underscore departs from the usual kernel style and makes the code look weird, - in the past I kept confusing the 'map' with the 'entry', because a 'map' is ambiguous in that regard, - it's not really clear from the 'e820map' that this is a regular C array. Rename them to 'struct e820_entry' and 'struct e820_array' accordingly. ( Leave the legacy UAPI header alone but do the rename in the bootparam.h and e820/types.h file - outside tools relying on these defines should either adjust their code, or should use the legacy header, or should create their private copies for the definitions. ) No change in functionality. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28x86/boot/e820: Remove spurious asm/e820/api.h inclusionsIngo Molnar
A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h spuriously, without making direct use of it. Removing it is not simple: over the years various .c code learned to rely on this indirect inclusion. Remove the unnecessary include - this should speed up the kernel build a bit, as a large header is not included anymore in totally unrelated code. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25x86/boot: Fix KASLR and memmap= collisionDave Jiang
CONFIG_RANDOMIZE_BASE=y relocates the kernel to a random base address. However it does not take into account the memmap= parameter passed in from the kernel command line. This results in the kernel sometimes being put in the middle of memmap. Teach KASLR to not insert the kernel in memmap defined regions. We support up to 4 memmap regions: any additional regions will cause KASLR to disable. The mem_avoid set has been augmented to add up to 4 unusable regions of memmaps provided by the user to exclude those regions from the set of valid address range to insert the uncompressed kernel image. The nn@ss ranges will be skipped by the mem_avoid set since it indicates that memory is useable. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dan.j.williams@intel.com Cc: david@fromorbit.com Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/148417664156.131935.2248592164852799738.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: NTB: correct ntb_spad_count comment typo misc: ibmasm: fix typo in error message Remove references to dead make variable LINUX_INCLUDE Remove last traces of ikconfig.h treewide: Fix printk() message errors Documentation/device-mapper: s/getsize/getsz/