summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2016-06-28crypto: sha512-mb - SHA512 multibuffer job manager and glue codeMegha Dey
This patch introduces the multi-buffer job manager which is responsible for submitting scatter-gather buffers from several SHA512 jobs to the multi-buffer algorithm. It also contains the flush routine that's called by the crypto daemon to complete the job when no new jobs arrive before the deadline of maximum latency of a SHA512 crypto job. The SHA512 multi-buffer crypto algorithm is defined and initialized in this patch. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.Quentin Casasnovas
I couldn't get Xen to boot a L2 HVM when it was nested under KVM - it was getting a GP(0) on a rather unspecial vmread from Xen: (XEN) ----[ Xen-4.7.0-rc x86_64 debug=n Not tainted ]---- (XEN) CPU: 1 (XEN) RIP: e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450 (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d1v0) (XEN) rax: ffff82d0801e6288 rbx: ffff83003ffbfb7c rcx: fffffffffffab928 (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: ffff83000bdd0000 (XEN) rbp: ffff83000bdd0000 rsp: ffff83003ffbfab0 r8: ffff830038813910 (XEN) r9: ffff83003faf3958 r10: 0000000a3b9f7640 r11: ffff83003f82d418 (XEN) r12: 0000000000000000 r13: ffff83003ffbffff r14: 0000000000004802 (XEN) r15: 0000000000000008 cr0: 0000000080050033 cr4: 00000000001526e0 (XEN) cr3: 000000003fc79000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450): (XEN) 00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00 (XEN) Xen stack trace from rsp=ffff83003ffbfab0: ... (XEN) Xen call trace: (XEN) [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450 (XEN) [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300 (XEN) [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60 (XEN) [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70 (XEN) [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0 (XEN) [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0 (XEN) [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0 (XEN) [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270 (XEN) [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0 (XEN) [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90 (XEN) [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140 (XEN) [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140 (XEN) [<ffff82d080164aeb>] context_switch+0x13b/0xe40 (XEN) [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570 (XEN) [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90 (XEN) [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 1: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=0000] (XEN) **************************************** Tracing my host KVM showed it was the one injecting the GP(0) when emulating the VMREAD and checking the destination segment permissions in get_vmx_mem_address(): 3) | vmx_handle_exit() { 3) | handle_vmread() { 3) | nested_vmx_check_permission() { 3) | vmx_get_segment() { 3) 0.074 us | vmx_read_guest_seg_base(); 3) 0.065 us | vmx_read_guest_seg_selector(); 3) 0.066 us | vmx_read_guest_seg_ar(); 3) 1.636 us | } 3) 0.058 us | vmx_get_rflags(); 3) 0.062 us | vmx_read_guest_seg_ar(); 3) 3.469 us | } 3) | vmx_get_cs_db_l_bits() { 3) 0.058 us | vmx_read_guest_seg_ar(); 3) 0.662 us | } 3) | get_vmx_mem_address() { 3) 0.068 us | vmx_cache_reg(); 3) | vmx_get_segment() { 3) 0.074 us | vmx_read_guest_seg_base(); 3) 0.068 us | vmx_read_guest_seg_selector(); 3) 0.071 us | vmx_read_guest_seg_ar(); 3) 1.756 us | } 3) | kvm_queue_exception_e() { 3) 0.066 us | kvm_multiple_exception(); 3) 0.684 us | } 3) 4.085 us | } 3) 9.833 us | } 3) + 10.366 us | } Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software Developper Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine Control Structure", I found that we're enforcing that the destination operand is NOT located in a read-only data segment or any code segment when the L1 is in long mode - BUT that check should only happen when it is in protected mode. Shuffling the code a bit to make our emulation follow the specification allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests without problems. Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions") Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Eugene Korenevsky <ekorenevsky@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27KVM: LAPIC: cap __delay at lapic_timer_advance_nsMarcelo Tosatti
The host timer which emulates the guest LAPIC TSC deadline timer has its expiration diminished by lapic_timer_advance_ns nanoseconds. Therefore if, at wait_lapic_expire, a difference larger than lapic_timer_advance_ns is encountered, delay at most lapic_timer_advance_ns. This fixes a problem where the guest can cause the host to delay for large amounts of time. Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27KVM: x86: move nsec_to_cycles from x86.c to x86.hMarcelo Tosatti
Move the inline function nsec_to_cycles from x86.c to x86.h, as the next patch uses it from lapic.c. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27pvclock: Get rid of __pvclock_read_cycles in function pvclock_read_flagsMinfei Huang
There is a generic function __pvclock_read_cycles to be used to get both flags and cycles. For function pvclock_read_flags, it's useless to get cycles value. To make this function be more effective, get this variable flags directly in function. Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27pvclock: Cleanup to remove function pvclock_get_nsec_offsetMinfei Huang
Function __pvclock_read_cycles is short enough, so there is no need to have another function pvclock_get_nsec_offset to calculate tsc delta. It's better to combine it into function __pvclock_read_cycles. Remove useless variables in function __pvclock_read_cycles. Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27pvclock: Add CPU barriers to get correct version valueMinfei Huang
Protocol for the "version" fields is: hypervisor raises it (making it uneven) before it starts updating the fields and raises it again (making it even) when it is done. Thus the guest can make sure the time values it got are consistent by checking the version before and after reading them. Add CPU barries after getting version value just like what function vread_pvclock does, because all of callees in this function is inline. Fixes: 502dfeff239e8313bfbe906ca0a1a6827ac8481b Cc: stable@vger.kernel.org Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27x86/efi: Remove the unused efi_get_time() functionArnd Bergmann
Nothing calls the efi_get_time() function on x86, but it does suffer from the 32-bit time_t overflow in 2038. This removes the function, we can always put it back in case we need it later. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-8-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macrosAlex Thorlton
Currently, the efi_thunk macro has some semi-duplicated code in it that can be replaced with the arch_efi_call_virt_setup/teardown macros. This commit simply replaces the duplicated code with those macros. Suggested-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roy Franz <roy.franz@linaro.org> Cc: Russ Anderson <rja@sgi.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-7-git-send-email-matt@codeblueprint.co.uk [ Renamed variables to the standard __ prefix. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()Alex Thorlton
Now that the efi_call_virt() macro has been generalized to be able to use EFI system tables besides efi.systab, we are able to convert our uv_bios_call() wrapper to use this standard EFI callback mechanism. This simple change is part of a much larger effort to recover from some issues with the way we were mapping in some of our MMRs, and the way that we were doing our BIOS callbacks, which were uncovered by commit 67a9108ed431 ("x86/efi: Build our own page table structures"). The first issue that this uncovered was that we were relying on the EFI memory mapping mechanism to map in our MMR space for us, which, while reliable, was technically a bug, as it relied on "undefined" behavior in the mapping code. The reason we were able to piggyback on the EFI memory mapping code to map in our MMRs was because, previously, EFI code used the trampoline_pgd, which shares a few entries with the main kernel pgd. It just so happened, that the memory range containing our MMRs was inside one of those shared regions, which kept our code working without issue for quite a while. Anyways, once we discovered this problem, we brought back our original code to map in the MMRs with commit: 08914f436bdd ("x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init") This got our systems a little further along, but we were still running into trouble with our EFI callbacks, which prevented us from booting all the way up. Our first step towards fixing the BIOS callbacks was to get our uv_bios_call() wrapper updated to use efi_call_virt() instead of the plain efi_call(). The previous patch took care of the effort needed to make that possible. Along the way, we hit a major issue with some confusion about how to properly pull arguments higher than number 6 off the stack in the efi_call() code, which resulted in the following commit from Linus: 683ad8092cd2 ("x86/efi: Fix 7-parameter efi_call()s") Now that all of those issues are out of the way, we're able to make this simple change to use the new efi_call_virt_pointer() in uv_bios_call() which gets our machines booting, running properly, and able to execute our callbacks with 6+ arguments. Note that, since we are now using the EFI page table when we make our function call, we are no longer able to make the call using the __va() of our function pointer, since the memory range containing that address isn't mapped into the EFI page table. For now, we will use the physical address of the function directly, since that is mapped into the EFI page table. In the near future, we're going to get some code added in to properly update our function pointer to its virtual address during SetVirtualAddressMap. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roy Franz <roy.franz@linaro.org> Cc: Russ Anderson <rja@sgi.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-6-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27efi: Convert efi_call_virt() to efi_call_virt_pointer()Alex Thorlton
This commit makes a few slight modifications to the efi_call_virt() macro to get it to work with function pointers that are stored in locations other than efi.systab->runtime, and renames the macro to efi_call_virt_pointer(). The majority of the changes here are to pull these macros up into header files so that they can be accessed from outside of drivers/firmware/efi/runtime-wrappers.c. The most significant change not directly related to the code move is to add an extra "p" argument into the appropriate efi_call macros, and use that new argument in place of the, formerly hard-coded, efi.systab->runtime pointer. The last piece of the puzzle was to add an efi_call_virt() macro back into drivers/firmware/efi/runtime-wrappers.c to wrap around the new efi_call_virt_pointer() macro - this was mainly to keep the code from looking too cluttered by adding a bunch of extra references to efi.systab->runtime everywhere. Note that I also broke up the code in the efi_call_virt_pointer() macro a bit in the process of moving it. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roy Franz <roy.franz@linaro.org> Cc: Russ Anderson <rja@sgi.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-5-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27x86/efi: Remove unused variable 'efi'Colin Ian King
Remove unused variable 'efi', it is never used. This fixes the following clang build warning: arch/x86/boot/compressed/eboot.c:803:2: warning: Value stored to 'efi' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-4-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27x86/boot/64: Add forgotten end of function markerBorislav Petkov
Add secondary_startup_64()'s ENDPROC() marker. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160625112457.16930-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27Merge branch 'sched/urgent' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27perf/x86/intel: Add {rd,wr}lbr_{to,from} wrappersPeter Zijlstra
The whole rdmsr()/wrmsr() for lbr_from got a little unweildy with the sign extension quirk, provide a few simple wrappers to clean things up. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27perf/x86/intel: Add MSR_LAST_BRANCH_FROM_x quirk for ctx switchDavid Carrillo-Cisneros
Add quirk for context switch to save/restore the value of MSR_LAST_BRANCH_FROM_x when LBR is enabled and there is potential for kernel addresses to be in the lbr_from register. To test this patch, use a perf tool and kernel with the patch next in this series. That patch removes the work around that masked the hw bug: $ ./lbr_perf record --call-graph lbr -e cycles:k sleep 1 where lbr_perf is the patched perf tool, that allows to specify :k on lbr mode. The above command will trigger a #GPF : WARNING: CPU: 28 PID: 14096 at arch/x86/mm/extable.c:65 ex_handler_wrmsr_unsafe+0x70/0x80 unchecked MSR access error: WRMSR to 0x681 (tried to write 0x1fffffff81010794) ... Call Trace: [<ffffffff8167af49>] dump_stack+0x4d/0x63 [<ffffffff810b9b15>] __warn+0xe5/0x100 [<ffffffff810b9be9>] warn_slowpath_fmt+0x49/0x50 [<ffffffff810abb40>] ex_handler_wrmsr_unsafe+0x70/0x80 [<ffffffff810abc42>] fixup_exception+0x42/0x50 [<ffffffff81079d1a>] do_general_protection+0x8a/0x160 [<ffffffff81684ec2>] general_protection+0x22/0x30 [<ffffffff810101b9>] ? intel_pmu_lbr_sched_task+0xc9/0x380 [<ffffffff81009d7c>] intel_pmu_sched_task+0x3c/0x60 [<ffffffff81003a2b>] x86_pmu_sched_task+0x1b/0x20 [<ffffffff81192a5b>] perf_pmu_sched_task+0x6b/0xb0 [<ffffffff8119746d>] __perf_event_task_sched_in+0x7d/0x150 [<ffffffff810dd9dc>] finish_task_switch+0x15c/0x200 [<ffffffff8167f894>] __schedule+0x274/0x6cc [<ffffffff8167fdd9>] schedule+0x39/0x90 [<ffffffff81675398>] exit_to_usermode_loop+0x39/0x89 [<ffffffff810028ce>] prepare_exit_to_usermode+0x2e/0x30 [<ffffffff81683c1b>] retint_user+0x8/0x10 Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1466533874-52003-5-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27perf/x86/intel: Fix trivial formatting and style bugDavid Carrillo-Cisneros
Replace spaces by tabs in LBR_FROM_* constants to align with newly defined constant. Use BIT_ULL. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1466533874-52003-4-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27perf/x86/intel: Fix MSR_LAST_BRANCH_FROM_x bug when no TSXDavid Carrillo-Cisneros
Intel's SDM states that bits 61:62 in MSR_LAST_BRANCH_FROM_x are the TSX flags for formats with LBR_TSX flags (i.e. LBR_FORMAT_EIP_EFLAGS2). However, when the CPU has TSX support deactivated, bits 61:62 actually behave as follows: - For wrmsr(), bits 61:62 are considered part of the sign extension. - When capturing branches, the LBR hw will always clear bits 61:62. regardless of the sign extension. Therefore, if: 1) LBR has TSX format. 2) CPU has no TSX support enabled. ... then any value passed to wrmsr() must be sign extended to 63 bits and any value from rdmsr() must be converted to have a sign extension of 61 bits, ignoring the values at TSX flags. This bug was masked by the work-around to the Intel's CPU bug: BJ94. "LBR May Contain Incorrect Information When Using FREEZE_LBRS_ON_PMI" in Document Number: 324643-037US. The aforementioned work-around uses hw flags to filter out all kernel branches, limiting LBR callstack to user level execution only. Since user addresses are not sign extended, they do not trigger the wrmsr() bug in MSR_LAST_BRANCH_FROM_x when saved/restored at context switch. To verify the hw bug: $ perf record -b -e cycles sleep 1 $ rdmsr -p 0 0x680 0x1fffffffb0b9b0cc $ wrmsr -p 0 0x680 0x1fffffffb0b9b0cc write(): Input/output error The quirk for LBR_FROM_ MSRs is required before calls to wrmsrl() and after rdmsrl(). This patch introduces it for wrmsrl()'s done for testing LBR support. Future patch in series adds the quirk for context switch, that would be required if LBR callstack is to be enabled for ring 0. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1466533874-52003-3-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27perf/x86/intel: Print LBR support statement after validationDavid Carrillo-Cisneros
The following commit: 338b522ca43c ("perf/x86/intel: Protect LBR and extra_regs against KVM lying") added an additional test to LBR support detection that is performed after printing the LBR support statement to dmesg. Move the LBR support output after the very last test, to make sure we print the true status of LBR support. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1466533874-52003-2-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27Merge tag 'v4.7-rc5' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27crypto: sha1-mb - rename sha-mb to sha1-mbMegha Dey
Until now, there was only support for the SHA1 multibuffer algorithm. Hence, there was just one sha-mb folder. Now, with the introduction of the SHA256 multi-buffer algorithm , it is logical to name the existing folder as sha1-mb. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27crypto: sha256-mb - Crypto computation (x8 AVX2)Megha Dey
This patch introduces the assembly routines to do SHA256 computation on buffers belonging to several jobs at once. The assembly routines are optimized with AVX2 instructions that have 8 data lanes and using AVX2 registers. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27crypto: sha256-mb - Algorithm data structuresMegha Dey
This patch introduces the data structures and prototypes of functions needed for computing SHA256 hash using multi-buffer. Included are the structures of the multi-buffer SHA256 job, job scheduler in C and x86 assembly. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27crypto: sha256-mb - submit/flush routines for AVX2Megha Dey
This patch introduces the routines used to submit and flush buffers belonging to SHA256 crypto jobs to the SHA256 multibuffer algorithm. It is implemented mostly in assembly optimized with AVX2 instructions. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-27crypto: sha256-mb - SHA256 multibuffer job manager and glue codeMegha Dey
This patch introduces the multi-buffer job manager which is responsible for submitting scatter-gather buffers from several SHA256 jobs to the multi-buffer algorithm. It also contains the flush routine to that's called by the crypto daemon to complete the job when no new jobs arrive before the deadline of maximum latency of a SHA256 crypto job. The SHA256 multi-buffer crypto algorithm is defined and initialized in this patch. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-26x86/KASLR: Allow randomization below the load addressYinghai Lu
Currently the kernel image physical address randomization's lower boundary is the original kernel load address. For bootloaders that load kernels into very high memory (e.g. kexec), this means randomization takes place in a very small window at the top of memory, ignoring the large region of physical memory below the load address. Since mem_avoid[] is already correctly tracking the regions that must be avoided, this patch changes the minimum address to whatever is less: 512M (to conservatively avoid unknown things in lower memory) or the load address. Now, for example, if the kernel is loaded at 8G, [512M, 8G) will be added to the list of possible physical memory positions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Rewrote the changelog, refactored the code to use min(). ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org [ Edited the changelog some more, plus the code comment as well. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26x86/KASLR: Extend kernel image physical address randomization to addresses ↵Kees Cook
larger than 4G We want the physical address to be randomized anywhere between 16MB and the top of physical memory (up to 64TB). This patch exchanges the prior slots[] array for the new slot_areas[] array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical address offset for 64-bit. As before, process_e820_entry() walks memory and populates slot_areas[], splitting on any detected mem_avoid collisions. Finally, since the slots[] array and its associated functions are not needed any more, so they are removed. Based on earlier patches by Baoquan He. Originally-from: Baoquan He <bhe@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26x86/KASLR: Randomize virtual address separatelyBaoquan He
The current KASLR implementation randomizes the physical and virtual addresses of the kernel together (both are offset by the same amount). It calculates the delta of the physical address where vmlinux was linked to load and where it is finally loaded. If the delta is not equal to 0 (i.e. the kernel was relocated), relocation handling needs be done. On 64-bit, this patch randomizes both the physical address where kernel is decompressed and the virtual address where kernel text is mapped and will execute from. We now have two values being chosen, so the function arguments are reorganized to pass by pointer so they can be directly updated. Since relocation handling only depends on the virtual address, we must check the virtual delta, not the physical delta for processing kernel relocations. This also populates the page table for the new virtual address range. 32-bit does not support a separate virtual address, so it continues to use the physical offset for its virtual offset. Additionally updates the sanity checks done on the resulting kernel addresses since they are potentially separate now. [kees: rewrote changelog, limited virtual split to 64-bit only, update checks] [kees: fix CONFIG_RANDOMIZE_BASE=n boot failure] Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26x86/KASLR: Clarify identity map interfaceKees Cook
This extracts the call to prepare_level4() into a top-level function that the user of the pagetable.c interface must call to initialize the new page tables. For clarity and to match the "finalize" function, it has been renamed to initialize_identity_maps(). This function also gains the initialization of mapping_info so we don't have to do it each time in add_identity_map(). Additionally add copyright notice to the top, to make it clear that the bulk of the pagetable.c code was written by Yinghai, and that I just added bugs later. :) Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26x86/boot: Refuse to build with data relocationsKees Cook
The compressed kernel is built with -fPIC/-fPIE so that it can run in any location a bootloader happens to put it. However, since ELF relocation processing is not happening (and all the relocation information has already been stripped at link time), none of the code can use data relocations (e.g. static assignments of pointers). This is already noted in a warning comment at the top of misc.c, but this adds an explicit check for the condition during the linking stage to block any such bugs from appearing. If this was in place with the earlier bug in pagetable.c, the build would fail like this: ... CC arch/x86/boot/compressed/pagetable.o DATAREL arch/x86/boot/compressed/vmlinux error: arch/x86/boot/compressed/pagetable.o has data relocations! make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1 ... A clean build shows: ... CC arch/x86/boot/compressed/pagetable.o DATAREL arch/x86/boot/compressed/vmlinux LD arch/x86/boot/compressed/vmlinux ... Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26x86/KASLR, x86/power: Remove x86 hibernation restrictionsKees Cook
With the following fix: 70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel") ... there is no longer a problem with hibernation resuming a KASLR-booted kernel image, so remove the restriction. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux PM list <linux-pm@vger.kernel.org> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-26rtc: move mc146818 helper functions out-of-lineArnd Bergmann
The mc146818_get_time/mc146818_set_time functions are rather large inline functions in a global header file and are used in several drivers and in x86 specific code. Here we move them into a separate .c file that is compiled whenever any of the users require it. This also lets us remove the linux/acpi.h header inclusion from mc146818rtc.h, which in turn avoids some warnings about duplicate definition of the TRUE/FALSE macros. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-06-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kprobe fix from Thomas Gleixner: "A single fix clearing the TF bit when a fault is single stepped" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Clear TF bit in fault on single-stepping
2016-06-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Two weeks worth of fixes here" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits) init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64 autofs: don't get stuck in a loop if vfs_write() returns an error mm/page_owner: avoid null pointer dereference tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences" fs/nilfs2: fix potential underflow in call to crc32_le oom, suspend: fix oom_reaper vs. oom_killer_disable race ocfs2: disable BUG assertions in reading blocks mm, compaction: abort free scanner if split fails mm: prevent KASAN false positives in kmemleak mm/hugetlb: clear compound_mapcount when freeing gigantic pages mm/swap.c: flush lru pvecs on compound page arrival memcg: css_alloc should return an ERR_PTR value on error memcg: mem_cgroup_migrate() may be called with irq disabled hugetlb: fix nr_pmds accounting with shared page tables Revert "mm: disable fault around on emulated access bit architecture" Revert "mm: make faultaround produce old ptes" mailmap: add Boris Brezillon's email mailmap: add Antoine Tenart's email mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask mm: mempool: kasan: don't poot mempool objects in quarantine ...
2016-06-24Merge tag 'for-linus-4.7b-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - fix x86 PV dom0 crash during early boot on some hardware - fix two pciback bugs affects certain devices - fix potential overflow when clearing page tables in x86 PV * tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: return proper values during BAR sizing x86/xen: avoid m2p lookup when setting early page table entries xen/pciback: Fix conf_space read/write overlap check. x86/xen: fix upper bound of pmd loop in xen_cleanhighmap() xen/balloon: Fix declared-but-not-defined warning
2016-06-24x86/efi: get rid of superfluous __GFP_REPEATMichal Hocko
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. efi_alloc_page_tables uses __GFP_REPEAT but it allocates an order-0 page. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/1464599699-30131-4-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24x86: get rid of superfluous __GFP_REPEATMichal Hocko
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this flag is for more than order-0. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/1464599699-30131-3-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24tree wide: get rid of __GFP_REPEAT for order-0 allocations part IMichal Hocko
This is the third version of the patchset previously sent [1]. I have basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get rid of superfluous gfp flags" which went through dm tree. I am sending it now because it is tree wide and chances for conflicts are reduced considerably when we want to target rc2. I plan to send the next step and rename the flag and move to a better semantic later during this release cycle so we will have a new semantic ready for 4.8 merge window hopefully. Motivation: While working on something unrelated I've checked the current usage of __GFP_REPEAT in the tree. It seems that a majority of the usage is and always has been bogus because __GFP_REPEAT has always been about costly high order allocations while we are using it for order-0 or very small orders very often. It seems that a big pile of them is just a copy&paste when a code has been adopted from one arch to another. I think it makes some sense to get rid of them because they are just making the semantic more unclear. Please note that GFP_REPEAT is documented as * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt * _might_ fail. This depends upon the particular VM implementation. while !costly requests have basically nofail semantic. So one could reasonably expect that order-0 request with __GFP_REPEAT will not loop for ever. This is not implemented right now though. I would like to move on with __GFP_REPEAT and define a better semantic for it. $ git grep __GFP_REPEAT origin/master | wc -l 111 $ git grep __GFP_REPEAT | wc -l 36 So we are down to the third after this patch series. The remaining places really seem to be relying on __GFP_REPEAT due to large allocation requests. This still needs some double checking which I will do later after all the simple ones are sorted out. I am touching a lot of arch specific code here and I hope I got it right but as a matter of fact I even didn't compile test for some archs as I do not have cross compiler for them. Patches should be quite trivial to review for stupid compile mistakes though. The tricky parts are usually hidden by macro definitions and thats where I would appreciate help from arch maintainers. [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org This patch (of 19): __GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. Yet we have the full kernel tree with its usage for apparently order-0 allocations. This is really confusing because __GFP_REPEAT is explicitly documented to allow allocation failures which is a weaker semantic than the current order-0 has (basically nofail). Let's simply drop __GFP_REPEAT from those places. This would allow to identify place which really need allocator to retry harder and formulate a more specific semantic for what the flag is supposed to do actually. Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: John Crispin <blogic@openwrt.org> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24x86: fix up a few misc stack pointer vs thread_info confusionsLinus Torvalds
As the actual pointer value is the same for the thread stack allocation and the thread_info, code that confused the two worked fine, but will break when the thread info is moved away from the stack allocation. It also looks very confusing. For example, the kprobe code wanted to know the current top of stack. To do that, it used this: (unsigned long)current_thread_info() + THREAD_SIZE which did indeed give the correct value. But it's not only a fairly nonsensical expression, it's also rather complex, especially since we actually have this: static inline unsigned long current_top_of_stack(void) which not only gives us the value we are interested in, but happens to be how "current_thread_info()" is currently defined as: (struct thread_info *)(current_top_of_stack() - THREAD_SIZE); so using current_thread_info() to figure out the top of the stack really is a very round-about thing to do. The other cases are just simpler confusion about task_thread_info() vs task_stack_page(), which currently return the same pointer - but if you want the stack page, you really should be using the latter one. And there was one entirely unused assignment of the current stack to a thread_info pointer. All cleaned up to make more sense today, and make it easier to move the thread_info away from the stack in the future. No semantic changes. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23x86: avoid avoid passing around 'thread_info' in stack dumping codeLinus Torvalds
None of the code actually wants a thread_info, it all wants a task_struct, and it's just converting to a thread_info pointer much too early. No semantic change. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23kvm: x86: use getboottime64Arnd Bergmann
KVM reads the current boottime value as a struct timespec in order to calculate the guest wallclock time, resulting in an overflow in 2038 on 32-bit systems. The data then gets passed as an unsigned 32-bit number to the guest, and that in turn overflows in 2106. We cannot do much about the second overflow, which affects both 32-bit and 64-bit hosts, but we can ensure that they both behave the same way and don't overflow until 2106, by using getboottime64() to read a timespec64 value. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23KVM: VMX: enable guest access to LMCE related MSRsAshok Raj
On Intel platforms, this patch adds LMCE to KVM MCE supported capabilities and handles guest access to LMCE related MSRs. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported Only enable LMCE on Intel platform Check MSR_IA32_FEATURE_CONTROL when handling guest access to MSR_IA32_MCG_EXT_CTL] Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROLHaozhong Zhang
KVM currently does not check the value written to guest MSR_IA32_FEATURE_CONTROL, though bits corresponding to disabled features may be set. This patch makes KVM to validate individual bits written to guest MSR_IA32_FEATURE_CONTROL according to enabled features. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23KVM: VMX: move msr_ia32_feature_control to vcpu_vmxHaozhong Zhang
msr_ia32_feature_control will be used for LMCE and not depend only on nested anymore, so move it from struct nested_vmx to struct vcpu_vmx. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-23x86/xen: avoid m2p lookup when setting early page table entriesDavid Vrabel
When page tables entries are set using xen_set_pte_init() during early boot there is no page fault handler that could handle a fault when performing an M2P lookup. In 64 bit guests (usually dom0) early_ioremap() would fault in xen_set_pte_init() because an M2P lookup faults because the MFN is in MMIO space and not mapped in the M2P. This lookup is done to see if the PFN in in the range used for the initial page table pages, so that the PTE may be set as read-only. The M2P lookup can be avoided by moving the check (and clear of RW) earlier when the PFN is still available. Reported-by: Kevin Moraga <kmoragas@riseup.net> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
2016-06-23x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()Juergen Gross
xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper bound of the loop setting non-kernel-image entries to zero should not exceed the size of level2_kernel_pgt. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-23crypto: sha1-mb - async implementation for sha1-mbMegha Dey
Herbert wants the sha1-mb algorithm to have an async implementation: https://lkml.org/lkml/2016/4/5/286. Currently, sha1-mb uses an async interface for the outer algorithm and a sync interface for the inner algorithm. This patch introduces a async interface for even the inner algorithm. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-23crypto: ghash-clmulni - Fix cryptd reorderingHerbert Xu
This patch fixes an old bug where requests can be reordered because some are processed by cryptd while others are processed directly in softirq context. The fix is to always postpone to cryptd if there are currently requests outstanding from the same tfm. This patch also removes the redundant use of cryptd in the async init function as init never touches the FPU. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-23crypto: aesni - Fix cryptd reordering problem on gcmHerbert Xu
This patch fixes an old bug where gcm requests can be reordered because some are processed by cryptd while others are processed directly in softirq context. The fix is to always postpone to cryptd if there are currently requests outstanding from the same tfm. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-23crypto: chacha20-simd - Use generic code for small requestsHerbert Xu
On 16-byte requests the optimised version is actually slower than the generic code, so we should simply use that instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers,