summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2016-04-29x86/asm/64: Rename thread_struct's fs and gs to fsbase and gsbaseAndy Lutomirski
Unlike ds and es, these are base addresses, not selectors. Rename them so their meaning is more obvious. On x86_32, the field is still called fs. Fixing that could make sense as a future cleanup. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/69a18a51c4cba0ce29a241e570fc618ad721d908.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/arch_prctl/64: Remove FSBASE/GSBASE < 4G optimizationAndy Lutomirski
As far as I know, the optimization doesn't work on any modern distro because modern distros use high addresses for ASLR. Remove it. The ptrace code was either wrong or very strange, but the behavior with this patch should be essentially identical to the behavior without this patch unless user code goes out of its way to mislead ptrace. On newer CPUs, once the FSGSBASE instructions are enabled, we won't want to use the optimized variant anyway. This isn't actually much of a performance regression, it has no effect on normal dynamically linked programs, and it's a considerably simplification. It also removes some nasty special cases from code that is already way too full of special cases for comfort. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/dd1599b08866961dba9d2458faa6bbd7fba471d7.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/segments/64: When load_gs_index fails, clear the baseAndy Lutomirski
On AMD CPUs, a failed load_gs_base currently may not clear the FS base. Fix it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1a6c4d3a8a4e7be79ba448b42685e0321d50c14c.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/segments/64: When loadsegment(fs, ...) fails, clear the baseAndy Lutomirski
On AMD CPUs, a failed loadsegment currently may not clear the FS base. Fix it. While we're at it, prevent loadsegment(gs, xyz) from even compiling on 64-bit kernels. It shouldn't be used. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a084c1b93b7b1408b58d3fd0b5d6e47da8e7d7cf.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/asm: Make asm/alternative.h safe from assemblyAndy Lutomirski
asm/alternative.h isn't directly useful from assembly, but it shouldn't break the build. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e5b693fcef99fe6e80341c9e97a002fb23871e91.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/asm: Stop depending on ptrace.h in alternative.hAndy Lutomirski
alternative.h pulls in ptrace.h, which means that alternatives can't be used in anything referenced from ptrace.h, which is a mess. Break the dependency by pulling text patching helpers into their own header. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/99b93b13f2c9eb671f5c98bba4c2cbdc061293a2.1461698311.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29Merge branch 'x86/urgent' into x86/asm, to refresh the treeIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/boot: Correctly bounds-check relocationsYinghai Lu
Relocation handling performs bounds checking on the resulting calculated addresses. The existing code uses output_len (VO size plus relocs size) as the max address. This is not right since the max_addr check should stop at the end of VO and exclude bss, brk, etc, which follows. The valid range should be VO [_text, __bss_start] in the loaded physical address space. This patch adds an export for __bss_start in voffset.h and uses it to set the correct limit for max_addr. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Rewrote the changelog. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/KASLR: Clean up unused code from old 'run_size' and rename it to ↵Yinghai Lu
'kernel_total_size' Since 'run_size' is now calculated in misc.c, the old script and associated argument passing is no longer needed. This patch removes them, and renames 'run_size' to the more descriptive 'kernel_total_size'. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/boot: Fix "run_size" calculationYinghai Lu
Currently, the "run_size" variable holds the total kernel size (size of code plus brk and bss) and is calculated via the shell script arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size of the .bss and .brk sections from the vmlinux, and adds them as follows: run_size = $(( $offsetA + $sizeA + $sizeB )) However, this is not correct (it is too large). To illustrate, here's a walk-through of the script's calculation, compared to the correct way to find it. First, offsetA is found as the starting address of the first .bss or .brk section seen in the ELF file. The sizeA and sizeB values are the respective section sizes. [bhe@x1 linux]$ objdump -h vmlinux vmlinux: file format elf64-x86-64 Sections: Idx Name Size VMA LMA File off Algn 27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12 ALLOC 28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0 ALLOC Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at 0x00027000. The resulting run_size is 0x145f000: 0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000 However, if we instead examine the ELF LOAD program headers, we see a different picture. [bhe@x1 linux]$ readelf -l vmlinux Elf file type is EXEC (Executable file) Entry point 0x1000000 There are 5 program headers, starting at offset 64 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000 0x0000000000b5e000 0x0000000000b5e000 R E 200000 LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000 0x0000000000145000 0x0000000000145000 RW 200000 LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000 0x0000000000018158 0x0000000000018158 RW 200000 LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000 0x000000000016a000 0x0000000000301000 RWE 200000 NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac 0x00000000000001bc 0x00000000000001bc 4 Section to Segment mapping: Segment Sections... 00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param __modver 01 .data .vvar 02 .data..percpu 03 .init.text .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .bss .brk 04 .notes As mentioned, run_size needs to be the size of the running kernel including .bss and .brk. We can see from the Section/Segment mapping above that .bss and .brk are included in segment 03 (which corresponds to the final LOAD program header). To find the run_size, we calculate the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000) and its MemSiz (0x0000000000301000), minus the physical load address of the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The resulting run_size is 0x105f000: 0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000 So, from this we can see that the existing run_size calculation is 0x400000 too high. And, as it turns out, the correct run_size is actually equal to VO_end - VO_text, which is certainly easier to calculate. _end: 0xffffffff8205f000 _text:0xffffffff81000000 0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000 As a result, run_size is a simple constant, so we don't need to pass it around; we already have voffset.h for such things. We can share voffset.h between misc.c and header.S instead of getting run_size in other ways. This patch moves voffset.h creation code to boot/compressed/Makefile, and switches misc.c to use the VO_end - VO_text calculation for run_size. Dependence before: boot/header.S ==> boot/voffset.h ==> vmlinux boot/header.S ==> compressed/vmlinux ==> compressed/misc.c Dependence after: boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote the changelog. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Fixes: e6023367d779 ("x86, kaslr: Prevent .bss from overlaping initrd") Link: http://lkml.kernel.org/r/1461888548-32439-5-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/boot: Calculate decompression size during boot not buildYinghai Lu
Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c. This doesn't work well because mkpiggy.c doesn't know the details of the decompressor in use. As a result, it can only make an estimation, which has risks: - output + output_len (VO) could be much bigger than input + input_len (ZO). In this case, the decompressed kernel plus relocs could overwrite the decompression code while it is running. - The head code of ZO could be bigger than z_extract_offset. In this case an overwrite could happen when the head code is running to move ZO to the end of buffer. Though currently the size of the head code is very small it's still a potential risk. Since there is no rule to limit the size of the head code of ZO, it runs the risk of suddenly becoming a (hard to find) bug. Instead, this moves the z_extract_offset calculation into header.S, and makes adjustments to be sure that the above two cases can never happen, and further corrects the comments describing the calculations. Since we have (in the previous patch) made ZO always be located against the end of decompression buffer, z_extract_offset is only used here to calculate an appropriate buffer size (INIT_SIZE), and is not longer used elsewhere. As such, it can be removed from voffset.h. Additionally clean up #if/#else #define to improve readability. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote the changelog and comments. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/boot: Move compressed kernel to the end of the decompression bufferYinghai Lu
This change makes later calculations about where the kernel is located easier to reason about. To better understand this change, we must first clarify what 'VO' and 'ZO' are. These values were introduced in commits by hpa: 77d1a4999502 ("x86, boot: make symbols from the main vmlinux available") 37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields") Specifically: All names prefixed with 'VO_': - relate to the uncompressed kernel image - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define) All names prefixed with 'ZO_': - relate to the bootable compressed kernel image (boot/compressed/vmlinux), which is composed of the following memory areas: - head text - compressed kernel (VO image and relocs table) - decompressor code - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below) The 'INIT_SIZE' value is used to find the larger of the two image sizes: #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset) #define VO_INIT_SIZE (VO__end - VO__text) #if ZO_INIT_SIZE > VO_INIT_SIZE # define INIT_SIZE ZO_INIT_SIZE #else # define INIT_SIZE VO_INIT_SIZE #endif The current code uses extract_offset to decide where to position the copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE currently includes the extract_offset.) Why does z_extract_offset exist? It's needed because we are trying to minimize the amount of RAM used for the whole act of creating an uncompressed, executable, properly relocation-linked kernel image in system memory. We do this so that kernels can be booted on even very small systems. To achieve the goal of minimal memory consumption we have implemented an in-place decompression strategy: instead of cleanly separating the VO and ZO images and also allocating some memory for the decompression code's runtime needs, we instead create this elaborate layout of memory buffers where the output (decompressed) stream, as it progresses, overlaps with and destroys the input (compressed) stream. This can only be done safely if the ZO image is placed to the end of the VO range, plus a certain amount of safety distance to make sure that when the last bytes of the VO range are decompressed, the compressed stream pointer is safely beyond the end of the VO range. z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during the build process, at a point when we know the exact compressed and uncompressed size of the kernel images and can calculate this safe minimum offset value. (Note that the mkpiggy.c calculation is not perfect, because we don't know the decompressor used at that stage, so the z_extract_offset calculation is necessarily imprecise and is mostly based on gzip internals - we'll improve that in the next patch.) When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible), the copied ZO occupies the memory from extract_offset to the end of decompression buffer. It overlaps with the soon-to-be-uncompressed kernel like this: |-----compressed kernel image------| V V 0 extract_offset +INIT_SIZE |-----------|---------------|-------------------------|--------| | | | | VO__text startup_32 of ZO VO__end ZO__end ^ ^ |-------uncompressed kernel image---------| When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space left from end of ZO to the end of decompressing buffer, like below. |-compressed kernel image-| V V 0 extract_offset +INIT_SIZE |-----------|---------------|-------------------------|--------| | | | | VO__text startup_32 of ZO ZO__end VO__end ^ ^ |------------uncompressed kernel image-------------| To simplify calculations and avoid special cases, it is cleaner to always place the compressed kernel image in memory so that ZO__end is at the end of the decompression buffer, instead of placing t at the start of extract_offset as is currently done. This patch adds BP_init_size (which is the INIT_SIZE as passed in from the boot_params) into asm-offsets.c to make it visible to the assembly code. Then when moving the ZO, it calculates the starting position of the copied ZO (via BP_init_size and the ZO run size) so that the VO__end will be at the end of the decompression buffer. To make the position calculation safe, the end of ZO is page aligned (and a comment is added to the existing VO alignment for good measure). Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Rewrote changelog and comments. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org [ Rewrote the changelog some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29x86/KASLR: Handle kernel relocations above 2G correctlyBaoquan He
When processing the relocation table, the offset used to calculate the relocation is an 'int'. This is sufficient for calculating the physical address of the relocs entry on 32-bit systems and on 64-bit systems when the relocation is under 2G. To handle relocations above 2G (seen in situations like kexec, netboot, etc), this offset needs to be calculated using a 'long' to avoid wrapping and miscalculating the relocation. Signed-off-by: Baoquan He <bhe@redhat.com> [ Rewrote the changelog. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two boot crash fixes and an IRQ handling crash fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Handle zero vector gracefully in clear_vector_irq() Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging" xen/qspinlock: Don't kick CPU if IRQ is not initialized
2016-04-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "x86 PMU driver fixes plus a core code race fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix incorrect lbr_sel_mask value perf/x86/intel/pt: Don't die on VMXON perf/core: Fix perf_event_open() vs. execve() race perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation perf/x86/intel/rapl: Add missing Haswell model perf/x86/intel: Add model number for Skylake Server to perf
2016-04-28x86/mm, sched/core: Turn off IRQs in switch_mm()Andy Lutomirski
Potential races between switch_mm() and TLB-flush or LDT-flush IPIs could be very messy. AFAICT the code is currently okay, whether by accident or by careful design, but enabling PCID will make it considerably more complicated and will no longer be obviously safe. Fix it with a big hammer: run switch_mm() with IRQs off. To avoid a performance hit in the scheduler, we take advantage of our knowledge that the scheduler already has IRQs disabled when it calls switch_mm(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/mm, sched/core: Uninline switch_mm()Andy Lutomirski
It's fairly large and it has quite a few callers. This may also help untangle some headers down the road. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/mm: Build arch/x86/mm/tlb.c even on !SMPAndy Lutomirski
Currently all of the functions that live in tlb.c are inlined on !SMP builds. One can debate whether this is a good idea (in many respects the code in tlb.c is better than the inlined UP code). Regardless, I want to add code that needs to be built on UP and SMP kernels and relates to tlb flushing, so arrange for tlb.c to be compiled unconditionally. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/efi: Enable runtime call flag checkingMark Rutland
Define ARCH_EFI_IRQ_FLAGS_MASK for x86, which will enable the generic runtime wrapper code to detect when firmware erroneously modifies flags over a runtime services function call. For x86 (both 32-bit and 64-bit), we only need check the interrupt flag. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Darren Hart <dvhart@infradead.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Harald Hoyer harald@redhat.com Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Kweh Hock Leong <hock.leong.kweh@intel.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raphael Hertzog <hertzog@debian.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-40-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/efi: Move to generic {__,}efi_call_virt()Mark Rutland
Now there's a common template for {__,}efi_call_virt(), remove the duplicate logic from the x86 EFI code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-35-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/efi: Force EFI reboot to process pending capsulesMatt Fleming
If an EFI capsule has been sent to the firmware we must match the type of EFI reset against that required by the capsule to ensure it is processed correctly. Force an EFI reboot if a capsule is pending for the next reset. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Kweh Hock Leong <hock.leong.kweh@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: joeyli <jlee@suse.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-29-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/efi/efifb: Move DMI based quirks handling out of generic codeArd Biesheuvel
The efifb quirks handling based on DMI identification of the platform is specific to x86, so move it to x86 arch code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Peter Jones <pjones@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-19-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28efi/libstub: Move Graphics Output Protocol handling to generic codeArd Biesheuvel
The Graphics Output Protocol code executes in the stub, so create a generic version based on the x86 version in libstub so that we can move other archs to it in subsequent patches. The new source file gop.c is added to the libstub build for all architectures, but only wired up for x86. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-18-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/efi: Prepare GOP handling code for reuse as generic codeArd Biesheuvel
In preparation of moving this code to drivers/firmware/efi and reusing it on ARM and arm64, apply any changes that will be required to make this code build for other architectures. This should make it easier to track down problems that this move may cause to its operation on x86. Note that the generic version uses slightly different ways of casting the protocol methods and some other variables to the correct types, since such method calls are not loosely typed on ARM and arm64 as they are on x86. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-17-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/efi: Remove the always true EFI_DEBUG symbolMatt Fleming
This symbol is always set which makes it useless. Additionally we have a kernel command-line switch, efi=debug, which actually controls the printing of the memory map. Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Borislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-16-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28efi: Check EFI_MEMORY_DESCRIPTOR version explicitlyArd Biesheuvel
Our efi_memory_desc_t type is based on EFI_MEMORY_DESCRIPTOR version 1 in the UEFI spec. No version updates are expected, but since we are about to introduce support for new firmware tables that use the same descriptor type, it makes sense to at least warn if we encounter other versions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-9-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28efi: Remove global 'memmap' EFI memory mapMatt Fleming
Abolish the poorly named EFI memory map, 'memmap'. It is shadowed by a bunch of local definitions in various files and having two ways to access the EFI memory map ('efi.memmap' vs. 'memmap') is rather confusing. Furthermore, IA64 doesn't even provide this global object, which has caused issues when trying to write generic EFI memmap code. Replace all occurrences with efi.memmap, and convert the remaining iterator code to use for_each_efi_mem_desc(). Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Luck, Tony <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-8-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28efi: Iterate over efi.memmap in for_each_efi_memory_desc()Matt Fleming
Most of the users of for_each_efi_memory_desc() are equally happy iterating over the EFI memory map in efi.memmap instead of 'memmap', since the former is usually a pointer to the latter. For those users that want to specify an EFI memory map other than efi.memmap, that can be done using for_each_efi_memory_desc_in_map(). One such example is in the libstub code where the firmware is queried directly for the memory map, it gets iterated over, and then freed. This change goes part of the way toward deleting the global 'memmap' variable, which is not universally available on all architectures (notably IA64) and is rather poorly named. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Mark Salter <msalter@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-7-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/mm/pat: Document the (currently) EFI-only code pathMatt Fleming
It's not at all obvious that populate_pgd() and friends are only executed when mapping EFI virtual memory regions or that no other pageattr callers pass a ->pgd value. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-4-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28efi: Get rid of the EFI_SYSTEM_TABLES status bitArd Biesheuvel
The EFI_SYSTEM_TABLES status bit is set by all EFI supporting architectures upon discovery of the EFI system table, but the bit is never tested in any code we have in the tree. So remove it. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Luck, Tony <tony.luck@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1461614832-17633-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/boot: Rename overlapping memcpy() to memmove()Kees Cook
Instead of having non-standard memcpy() behavior, explicitly call the new function memmove(), make it available to the decompressors, and switch the two overlap cases (screen scrolling and ELF parsing) to use memmove(). Additionally documents the purpose of compressed/string.c. Suggested-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20160426214606.GA5758@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28Merge branch 'perf/urgent' into perf/core, to resolve conflictIngo Molnar
Conflicts: arch/x86/events/intel/pt.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/x86/intel: Fix incorrect lbr_sel_mask valueKan Liang
This patch fixes a bug which was introduced by: b16a5b52eb90 ("perf/x86: Add option to disable reading branch flags/cycles") In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be set in lbr_select. This patch corrects the LBR_SEL_MASK by including all valid bits in LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in LBR_SELECT. It does not operate in suppress mode, so it needs to be specially handled in intel_pmu_setup_hw_lbr_filter. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/x86/intel/pt: Don't die on VMXONAlexander Shishkin
Some versions of Intel PT do not support tracing across VMXON, more specifically, VMXON will clear TraceEn control bit and any attempt to set it before VMXOFF will throw a #GP, which in the current state of things will crash the kernel. Namely: $ perf record -e intel_pt// kvm -nographic on such a machine will kill it. To avoid this, notify the intel_pt driver before VMXON and after VMXOFF so that it knows when not to enable itself. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAXAdam Borowski
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is referenced by filter_events() which expects undefined events to have a value of 0. Found via KASAN: UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30 index 9 is out of range for type 'u64 [9]' UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9 load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64' Signed-off-by: Adam Borowski <kilobyte@angband.pl> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28x86/apic: Handle zero vector gracefully in clear_vector_irq()Keith Busch
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup the already allocated vectors. This subsequently calls clear_vector_irq(). The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in clear_vector_irq(). We cannot suppress the call to x86_vector_free_irqs() for the failed interrupt, because the other data related to this irq must be cleaned up as well. So calling clear_vector_irq() with vector == 0 is legitimate. Remove the BUG_ON and return if vector is zero, [ tglx: Massaged changelog ] Fixes: b5dc8e6c21e7 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors" Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-27perf core: Allow setting up max frame stack depth via sysctlArnaldo Carvalho de Melo
The default remains 127, which is good for most cases, and not even hit most of the time, but then for some cases, as reported by Brendan, 1024+ deep frames are appearing on the radar for things like groovy, ruby. And in some workloads putting a _lower_ cap on this may make sense. One that is per event still needs to be put in place tho. The new file is: # cat /proc/sys/kernel/perf_event_max_stack 127 Chaging it: # echo 256 > /proc/sys/kernel/perf_event_max_stack # cat /proc/sys/kernel/perf_event_max_stack 256 But as soon as there is some event using callchains we get: # echo 512 > /proc/sys/kernel/perf_event_max_stack -bash: echo: write error: Device or resource busy # Because we only allocate the callchain percpu data structures when there is a user, which allows for changing the max easily, its just a matter of having no callchain users at that point. Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-26Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"Andy Lutomirski
This reverts commit 320d25b6a05f8b73c23fc21025d2906ecdd2d4fc. This change was problematic for a couple of reasons: 1. It missed a some entry points (Xen things and 64-bit native). 2. The entry it changed can be executed more than once. This isn't really a problem, but it conflated per-cpu state setup and global state setup. 3. It broke 64-bit non-NX. 64-bit non-NX worked the other way around from 32-bit -- __supported_pte_mask had NX set initially and was *cleared* in x86_configure_nx. With the patch applied, it never got cleared. Reported-and-tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/59bd15f7f4b56b633a611b7f70876c6d2ad01a98.1461685884.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: two EDAC driver fixes, a Xen crash fix, a HyperV log spam fix and a documentation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86 EDAC, sb_edac.c: Take account of channel hashing when needed x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address x86/mm/xen: Suppress hugetlbfs in PV guests x86/doc: Correct limits in Documentation/x86/x86_64/mm.txt x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances
2016-04-23Merge tag 'v4.6-rc4' into sched/core, to refresh the treeIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23x86/perf/rapl: Add missing Broadwell modelPeter Zijlstra
With the array aligned as per events/intel/core.c it was fairly obvious we missed one, add it in. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23x86/perf/rapl: Reorder model numbersPeter Zijlstra
Re-order the model array to match the order in events/intel/core.c, to easier spot gaps and such. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel/rapl: Support Skylake RAPL domainsSrinivas Pandruvada
Add Skylake client support for RAPL domains. In addition to RAPL domains in Broadwell clients, it has support for platform domain (aka PSys). The PSys domain controls the entire SoC instead of just a CPU package. Unlike package domain, PSys support requires more than just processor level implementation. The other parts in the system need additional HW level signaling, which OEMs need to support. When not supported, the energy counter register in PSys domain returns 0. Also corrected error in comment for GPU counter, which previously was DRAM counter. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com [ Cnverted to model_match stuff. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jacob.jun.pan@linux.intel.com Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1460930581-29748-2-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel: Add LBR filter support for Silvermont and Airmont CPUsKan Liang
LBR filtering is also supported on the Silvermont and Airmont microarchitectures. The layout of MSR_LBR_SELECT is the same as Nehalem. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1460706825-46163-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel: Add Goldmont CPU supportKan Liang
Add perf core PMU support for Intel Goldmont CPU cores: - The init code is based on Silvermont. - There is a new cache event list, based on the Silvermont cache event list. - Goldmont has 32 LBR entries. It also uses new LBRv6 format, which report the cycle information using upper 16-bit of the LBR_TO. - It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS for precise cycles. For details, please refer to the latest SDM058: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1460706167-45320-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23Merge branch 'perf/urgent' into perf/core, to resolve conflictIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel/rapl: Add missing Haswell modelSrinivas Pandruvada
Added one missing Haswell model. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1460907809-11897-1-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23perf/x86/intel: Add model number for Skylake Server to perfAndi Kleen
Everything the same as base Skylake, just a new model number. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1460751933-2264-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23xen/qspinlock: Don't kick CPU if IRQ is not initializedRoss Lagerwall
The following commit: 1fb3a8b2cfb2 ("xen/spinlock: Fix locking path engaging too soon under PVHVM.") ... moved the initalization of the kicker interrupt until after native_cpu_up() is called. However, when using qspinlocks, a CPU may try to kick another CPU that is spinning (because it has not yet initialized its kicker interrupt), resulting in the following crash during boot: kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210! invalid opcode: 0000 [#1] SMP ... RIP: 0010:[<ffffffff814c97c9>] [<ffffffff814c97c9>] xen_send_IPI_one+0x59/0x60 ... Call Trace: [<ffffffff8102be9e>] xen_qlock_kick+0xe/0x10 [<ffffffff810cabc2>] __pv_queued_spin_unlock+0xb2/0xf0 [<ffffffff810ca6d1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [<ffffffff81052936>] ? check_tsc_warp+0x76/0x150 [<ffffffff81052aa6>] check_tsc_sync_source+0x96/0x160 [<ffffffff81051e28>] native_cpu_up+0x3d8/0x9f0 [<ffffffff8102b315>] xen_hvm_cpu_up+0x35/0x80 [<ffffffff8108198c>] _cpu_up+0x13c/0x180 [<ffffffff81081a4a>] cpu_up+0x7a/0xa0 [<ffffffff81f80dfc>] smp_init+0x7f/0x81 [<ffffffff81f5a121>] kernel_init_freeable+0xef/0x212 [<ffffffff81817f30>] ? rest_init+0x80/0x80 [<ffffffff81817f3e>] kernel_init+0xe/0xe0 [<ffffffff8182488f>] ret_from_fork+0x3f/0x70 [<ffffffff81817f30>] ? rest_init+0x80/0x80 To fix this, only send the kick if the target CPU's interrupt has been initialized. This check isn't racy, because the target is waiting for the spinlock, so it won't have initialized the interrupt in the meantime. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22x86/init: Disable pnpbios and rtc for X86_SUBARCH_CE4100Luis R. Rodriguez
As per hpa CE4100 platforms can also disable pnpbios: http://lkml.kernel.org/r/5702B5C2.7070101@zytor.com Then Sebastian also recently noted that CE4100 also disables RTC probe, to do that Sebastian had long ago added the RTC of_have_populated_dt() check, he noted that it was meant to skip the RTC probe on all OF platforms but as of now, CE4100 was the only x86 DT using this. We can just fold this requirement into the platform quirk then. This now means that all of these match platform quirks for pnpbios and RTC preferences: * X86_SUBARCH_XEN * X86_SUBARCH_LGUEST * X86_SUBARCH_INTEL_MID * X86_SUBARCH_CE4100 Also see: http://lkml.kernel.org/r/570B52EA.60300@linutronix.de Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jgross@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-17-git-send-email-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>