summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2016-09-13Merge tag 'efi-next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/core Pull EFI updates from Matt Fleming: "* Refactor the EFI memory map code into architecture neutral files and allow drivers to permanently reserve EFI boot services regions on x86, as well as ARM/arm64 - Matt Fleming * Add ARM support for the EFI esrt driver - Ard Biesheuvel * Make the EFI runtime services and efivar API interruptible by swapping spinlocks for semaphores - Sylvain Chouleur * Provide the EFI identity mapping for kexec which allows kexec to work on SGI/UV platforms with requiring the "noefi" kernel command line parameter - Alex Thorlton * Add debugfs node to dump EFI page tables on arm64 - Ard Biesheuvel * Merge the EFI test driver being carried out of tree until now in the FWTS project - Ivan Hu * Expand the list of flags for classifying EFI regions as "RAM" on arm64 so we align with the UEFI spec - Ard Biesheuvel * Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32) or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot services function table for direct calls, alleviating us from having to maintain the custom function table - Lukas Wunner * Miscellaneous cleanups and fixes" Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-09x86/efi: Allow invocation of arbitrary boot servicesLukas Wunner
We currently allow invocation of 8 boot services with efi_call_early(). Not included are LocateHandleBuffer and LocateProtocol in particular. For graphics output or to retrieve PCI ROMs and Apple device properties, we're thus forced to use the LocateHandle + AllocatePool + LocateHandle combo, which is cumbersome and needs more code. The ARM folks allow invocation of the full set of boot services but are restricted to our 8 boot services in functions shared across arches. Thus, rather than adding just LocateHandleBuffer and LocateProtocol to struct efi_config, let's rework efi_call_early() to allow invocation of arbitrary boot services by selecting the 64 bit vs 32 bit code path in the macro itself. When compiling for 32 bit or for 64 bit without mixed mode, the unused code path is optimized away and the binary code is the same as before. But on 64 bit with mixed mode enabled, this commit adds one compare instruction to each invocation of a boot service and, depending on the code path selected, two jump instructions. (Most of the time gcc arranges the jumps in the 32 bit code path.) The result is a minuscule performance penalty and the binary code becomes slightly larger and more difficult to read when disassembled. This isn't a hot path, so these drawbacks are arguably outweighed by the attainable simplification of the C code. We have some overhead anyway for thunking or conversion between calling conventions. The 8 boot services can consequently be removed from struct efi_config. No functional change intended (for now). Example -- invocation of free_pool before (64 bit code path): 0x2d4 movq %ds:efi_early, %rdx ; efi_early 0x2db movq %ss:arg_0-0x20(%rsp), %rsi 0x2e0 xorl %eax, %eax 0x2e2 movq %ds:0x28(%rdx), %rdi ; efi_early->free_pool 0x2e6 callq *%ds:0x58(%rdx) ; efi_early->call() Example -- invocation of free_pool after (64 / 32 bit mixed code path): 0x0dc movq %ds:efi_early, %rax ; efi_early 0x0e3 cmpb $0, %ds:0x28(%rax) ; !efi_early->is64 ? 0x0e7 movq %ds:0x20(%rax), %rdx ; efi_early->call() 0x0eb movq %ds:0x10(%rax), %rax ; efi_early->boot_services 0x0ef je $0x150 0x0f1 movq %ds:0x48(%rax), %rdi ; free_pool (64 bit) 0x0f5 xorl %eax, %eax 0x0f7 callq *%rdx ... 0x150 movl %ds:0x30(%rax), %edi ; free_pool (32 bit) 0x153 jmp $0x0f5 Size of eboot.o text section: CONFIG_X86_32: 6464 before, 6318 after CONFIG_X86_64 && !CONFIG_EFI_MIXED: 7670 before, 7573 after CONFIG_X86_64 && CONFIG_EFI_MIXED: 7670 before, 8319 after Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Optimize away setup_gop32/64 if unusedLukas Wunner
Commit 2c23b73c2d02 ("x86/efi: Prepare GOP handling code for reuse as generic code") introduced an efi_is_64bit() macro to x86 which previously only existed for arm arches. The macro is used to choose between the 64 bit or 32 bit code path in gop.c at runtime. However the code path that's going to be taken is known at compile time when compiling for x86_32 or for x86_64 with mixed mode disabled. Amend the macro to eliminate the unused code path in those cases. Size of gop.o text section: CONFIG_X86_32: 1758 before, 1299 after CONFIG_X86_64 && !CONFIG_EFI_MIXED: 2201 before, 1406 after CONFIG_X86_64 && CONFIG_EFI_MIXED: 2201 before and after Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Use kmalloc_array() in efi_call_phys_prolog()Markus Elfring
* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus reuse the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Defer efi_esrt_init until after memblock_x86_fillRicardo Neri
Commit 7b02d53e7852 ("efi: Allow drivers to reserve boot services forever") introduced a new efi_mem_reserve to reserve the boot services memory regions forever. This reservation involves allocating a new EFI memory range descriptor. However, allocation can only succeed if there is memory available for the allocation. Otherwise, error such as the following may occur: esrt: Reserving ESRT space from 0x000000003dd6a000 to 0x000000003dd6a010. Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0 bytes below \ 0x0. CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc5+ #503 0000000000000000 ffffffff81e03ce0 ffffffff8131dae8 ffffffff81bb6c50 ffffffff81e03d70 ffffffff81e03d60 ffffffff8111f4df 0000000000000018 ffffffff81e03d70 ffffffff81e03d08 00000000000009f0 00000000000009f0 Call Trace: [<ffffffff8131dae8>] dump_stack+0x4d/0x65 [<ffffffff8111f4df>] panic+0xc5/0x206 [<ffffffff81f7c6d3>] memblock_alloc_base+0x29/0x2e [<ffffffff81f7c6e3>] memblock_alloc+0xb/0xd [<ffffffff81f6c86d>] efi_arch_mem_reserve+0xbc/0x134 [<ffffffff81fa3280>] efi_mem_reserve+0x2c/0x31 [<ffffffff81fa3280>] ? efi_mem_reserve+0x2c/0x31 [<ffffffff81fa40d3>] efi_esrt_init+0x19e/0x1b4 [<ffffffff81f6d2dd>] efi_init+0x398/0x44a [<ffffffff81f5c782>] setup_arch+0x415/0xc30 [<ffffffff81f55af1>] start_kernel+0x5b/0x3ef [<ffffffff81f55434>] x86_64_start_reservations+0x2f/0x31 [<ffffffff81f55520>] x86_64_start_kernel+0xea/0xed ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0 bytes below 0x0. An inspection of the memblock configuration reveals that there is no memory available for the allocation: MEMBLOCK configuration: memory size = 0x0 reserved size = 0x4f339c0 memory.cnt = 0x1 memory[0x0] [0x00000000000000-0xffffffffffffffff], 0x0 bytes on node 0\ flags: 0x0 reserved.cnt = 0x4 reserved[0x0] [0x0000000008c000-0x0000000008c9bf], 0x9c0 bytes flags: 0x0 reserved[0x1] [0x0000000009f000-0x000000000fffff], 0x61000 bytes\ flags: 0x0 reserved[0x2] [0x00000002800000-0x0000000394bfff], 0x114c000 bytes\ flags: 0x0 reserved[0x3] [0x000000304e4000-0x00000034269fff], 0x3d86000 bytes\ flags: 0x0 This situation can be avoided if we call efi_esrt_init after memblock has memory regions for the allocation. Also, the EFI ESRT driver makes use of early_memremap'pings. Therfore, we do not want to defer efi_esrt_init for too long. We must call such function while calls to early_memremap are still valid. A good place to meet the two aforementioned conditions is right after memblock_x86_fill, grouped with other EFI-related functions. Reported-by: Scott Lawson <scott.lawson@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Jones <pjones@redhat.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Remove unused find_bits() functionLukas Wunner
Left behind by commit fc37206427ce ("efi/libstub: Move Graphics Output Protocol handling to generic code"). Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Map in physical addresses in efi_map_region_fixedAlex Thorlton
This is a simple change to add in the physical mappings as well as the virtual mappings in efi_map_region_fixed. The motivation here is to get access to EFI runtime code that is only available via the 1:1 mappings on a kexec'd kernel. The added call is essentially the kexec analog of the first __map_region that Boris put in efi_map_region in commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). Signed-off-by: Alex Thorlton <athorlton@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Mike Travis <travis@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Young <dyoung@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Initialize status to ensure garbage is not returned on small sizeColin Ian King
Although very unlikey, if size is too small or zero, then we end up with status not being set and returning garbage. Instead, initializing status to EFI_INVALID_PARAMETER to indicate that size is invalid in the calls to setup_uga32 and setup_uga64. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image dataMatt Fleming
efi_mem_reserve() allows us to permanently mark EFI boot services regions as reserved, which means we no longer need to copy the image data out and into a separate buffer. Leaving the data in the original boot services region has the added benefit that BGRT images can now be passed across kexec reboot. Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Peter Jones <pjones@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Môshe van der Sterre <me@moshe.nl> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09efi/runtime-map: Use efi.memmap directly instead of a copyMatt Fleming
Now that efi.memmap is available all of the time there's no need to allocate and build a separate copy of the EFI memory map. Furthermore, efi.memmap contains boot services regions but only those regions that have been reserved via efi_mem_reserve(). Using efi.memmap allows us to pass boot services across kexec reboot so that the ESRT and BGRT drivers will now work. Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Peter Jones <pjones@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09efi: Allow drivers to reserve boot services foreverMatt Fleming
Today, it is not possible for drivers to reserve EFI boot services for access after efi_free_boot_services() has been called on x86. For ARM/arm64 it can be done simply by calling memblock_reserve(). Having this ability for all three architectures is desirable for a couple of reasons, 1) It saves drivers copying data out of those regions 2) kexec reboot can now make use of things like ESRT Instead of using the standard memblock_reserve() which is insufficient to reserve the region on x86 (see efi_reserve_boot_services()), a new API is introduced in this patch; efi_mem_reserve(). efi.memmap now always represents which EFI memory regions are available. On x86 the EFI boot services regions that have not been reserved via efi_mem_reserve() will be removed from efi.memmap during efi_free_boot_services(). This has implications for kexec, since it is not possible for a newly kexec'd kernel to access the same boot services regions that the initial boot kernel had access to unless they are reserved by every kexec kernel in the chain. Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Peter Jones <pjones@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09efi: Add efi_memmap_init_late() for permanent EFI memmapMatt Fleming
Drivers need a way to access the EFI memory map at runtime. ARM and arm64 currently provide this by remapping the EFI memory map into the vmalloc space before setting up the EFI virtual mappings. x86 does not provide this functionality which has resulted in the code in efi_mem_desc_lookup() where it will manually map individual EFI memmap entries if the memmap has already been torn down on x86, /* * If a driver calls this after efi_free_boot_services, * ->map will be NULL, and the target may also not be mapped. * So just always get our own virtual map on the CPU. * */ md = early_memremap(p, sizeof (*md)); There isn't a good reason for not providing a permanent EFI memory map for runtime queries, especially since the EFI regions are not mapped into the standard kernel page tables. Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Peter Jones <pjones@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09efi: Refactor efi_memmap_init_early() into arch-neutral codeMatt Fleming
Every EFI architecture apart from ia64 needs to setup the EFI memory map at efi.memmap, and the code for doing that is essentially the same across all implementations. Therefore, it makes sense to factor this out into the common code under drivers/firmware/efi/. The only slight variation is the data structure out of which we pull the initial memory map information, such as physical address, memory descriptor size and version, etc. We can address this by passing a generic data structure (struct efi_memory_map_data) as the argument to efi_memmap_init_early() which contains the minimum info required for initialising the memory map. In the process, this patch also fixes a few undesirable implementation differences: - ARM and arm64 were failing to clear the EFI_MEMMAP bit when unmapping the early EFI memory map. EFI_MEMMAP indicates whether the EFI memory map is mapped (not the regions contained within) and can be traversed. It's more correct to set the bit as soon as we memremap() the passed in EFI memmap. - Rename efi_unmmap_memmap() to efi_memmap_unmap() to adhere to the regular naming scheme. This patch also uses a read-write mapping for the memory map instead of the read-only mapping currently used on ARM and arm64. x86 needs the ability to update the memory map in-place when assigning virtual addresses to regions (efi_map_region()) and tagging regions when reserving boot services (efi_reserve_boot_services()). There's no way for the generic fake_mem code to know which mapping to use without introducing some arch-specific constant/hook, so just use read-write since read-only is of dubious value for the EFI memory map. Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Peter Jones <pjones@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Consolidate region mapping logicMatt Fleming
EFI regions are currently mapped in two separate places. The bulk of the work is done in efi_map_regions() but when CONFIG_EFI_MIXED is enabled the additional regions that are required when operating in mixed mode are mapping in efi_setup_page_tables(). Pull everything into efi_map_regions() and refactor the test for which regions should be mapped into a should_map_region() function. Generously sprinkle comments to clarify the different cases. Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-09x86/efi: Test for EFI_MEMMAP functionality when iterating EFI memmapMatt Fleming
Both efi_find_mirror() and efi_fake_memmap() really want to know whether the EFI memory map is available, not just whether the machine was booted using EFI. efi_fake_memmap() even has a check for EFI_MEMMAP at the start of the function. Since we've already got other code that has this dependency, merge everything under one if() conditional, and remove the now superfluous check from efi_fake_memmap(). Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump] Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm] Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05x86/efi: Use efi_exit_boot_services()Jeffrey Hugo
The eboot code directly calls ExitBootServices. This is inadvisable as the UEFI spec details a complex set of errors, race conditions, and API interactions that the caller of ExitBootServices must get correct. The eboot code attempts allocations after calling ExitBootSerives which is not permitted per the spec. Call the efi_exit_boot_services() helper intead, which handles the allocation scenario properly. Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-05efi/libstub: Allocate headspace in efi_get_memory_map()Jeffrey Hugo
efi_get_memory_map() allocates a buffer to store the memory map that it retrieves. This buffer may need to be reused by the client after ExitBootServices() is called, at which point allocations are not longer permitted. To support this usecase, provide the allocated buffer size back to the client, and allocate some additional headroom to account for any reasonable growth in the map that is likely to happen between the call to efi_get_memory_map() and the client reusing the buffer. Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-09-04Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix for an AMD erratum so machines without a BIOS fix work" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/AMD: Apply erratum 665 on machines without a BIOS fix
2016-09-02x86/AMD: Apply erratum 665 on machines without a BIOS fixEmanuel Czirai
AMD F12h machines have an erratum which can cause DIV/IDIV to behave unpredictably. The workaround is to set MSRC001_1029[31] but sometimes there is no BIOS update containing that workaround so let's do it ourselves unconditionally. It is simple enough. [ Borislav: Wrote commit message. ] Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yaowu Xu <yaowu@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-02x86/paravirt: Do not trace _paravirt_ident_*() functionsSteven Rostedt
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up after enabling function tracer. I asked him to bisect the functions within available_filter_functions, which he did and it came down to three: _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64() It was found that this is only an issue when noreplace-paravirt is added to the kernel command line. This means that those functions are most likely called within critical sections of the funtion tracer, and must not be traced. In newer kenels _paravirt_nop() is defined within gcc asm(), and is no longer an issue. But both _paravirt_ident_{32,64}() causes the following splat when they are traced: mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054) mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070) mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054) mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054) NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469] Modules linked in: e1000e CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000 RIP: 0010:[<ffffffff81134148>] [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0 RSP: 0018:ffff8800d4aefb90 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0 FS: 00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0 Call Trace: _raw_spin_lock+0x27/0x30 handle_pte_fault+0x13db/0x16b0 handle_mm_fault+0x312/0x670 __do_page_fault+0x1b1/0x4e0 do_page_fault+0x22/0x30 page_fault+0x28/0x30 __vfs_read+0x28/0xe0 vfs_read+0x86/0x130 SyS_read+0x46/0xa0 entry_SYSCALL_64_fastpath+0x1e/0xa8 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-01kconfig: tinyconfig: provide whole choice blocks to avoid warningsArnd Bergmann
Using "make tinyconfig" produces a couple of annoying warnings that show up for build test machines all the time: .config:966:warning: override: NOHIGHMEM changes choice state .config:965:warning: override: SLOB changes choice state .config:963:warning: override: KERNEL_XZ changes choice state .config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state .config:933:warning: override: SLOB changes choice state .config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state .config:870:warning: override: SLOB changes choice state .config:868:warning: override: KERNEL_XZ changes choice state .config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state I've made a previous attempt at fixing them and we discussed a number of alternatives. I tried changing the Makefile to use "merge_config.sh -n $(fragment-list)" but couldn't get that to work properly. This is yet another approach, based on the observation that we do want to see a warning for conflicting 'choice' options, and that we can simply make them non-conflicting by listing all other options as disabled. This is a trivial patch that we can apply independent of plans for other changes. Link: http://lkml.kernel.org/r/20160829214952.1334674-2-arnd@arndb.de Link: https://storage.kernelci.org/mainline/v4.7-rc6/x86-tinyconfig/build.log https://patchwork.kernel.org/patch/9212749/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-30mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKSJosh Poimboeuf
There are three usercopy warnings which are currently being silenced for gcc 4.6 and newer: 1) "copy_from_user() buffer size is too small" compile warning/error This is a static warning which happens when object size and copy size are both const, and copy size > object size. I didn't see any false positives for this one. So the function warning attribute seems to be working fine here. Note this scenario is always a bug and so I think it should be changed to *always* be an error, regardless of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS. 2) "copy_from_user() buffer size is not provably correct" compile warning This is another static warning which happens when I enable __compiletime_object_size() for new compilers (and CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size is const, but copy size is *not*. In this case there's no way to compare the two at build time, so it gives the warning. (Note the warning is a byproduct of the fact that gcc has no way of knowing whether the overflow function will be called, so the call isn't dead code and the warning attribute is activated.) So this warning seems to only indicate "this is an unusual pattern, maybe you should check it out" rather than "this is a bug". I get 102(!) of these warnings with allyesconfig and the __compiletime_object_size() gcc check removed. I don't know if there are any real bugs hiding in there, but from looking at a small sample, I didn't see any. According to Kees, it does sometimes find real bugs. But the false positive rate seems high. 3) "Buffer overflow detected" runtime warning This is a runtime warning where object size is const, and copy size > object size. All three warnings (both static and runtime) were completely disabled for gcc 4.6 with the following commit: 2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+") That commit mistakenly assumed that the false positives were caused by a gcc bug in __compiletime_object_size(). But in fact, __compiletime_object_size() seems to be working fine. The false positives were instead triggered by #2 above. (Though I don't have an explanation for why the warnings supposedly only started showing up in gcc 4.6.) So remove warning #2 to get rid of all the false positives, and re-enable warnings #1 and #3 by reverting the above commit. Furthermore, since #1 is a real bug which is detected at compile time, upgrade it to always be an error. Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer needed. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Nilay Vaish <nilayvaish@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single bugfix to prevent irq remapping when the ioapic is disabled" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Do not init irq remapping if ioapic is disabled
2016-08-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "ARM: - fixes for ITS init issues, error handling, IRQ leakage, race conditions - an erratum workaround for timers - some removal of misleading use of errors and comments - a fix for GICv3 on 32-bit guests MIPS: - fix for where the guest could wrongly map the first page of physical memory x86: - nested virtualization fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: MIPS: KVM: Check for pfn noslot case kvm: nVMX: fix nested tsc scaling KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC arm64: KVM: report configured SRE value to 32-bit world arm64: KVM: remove misleading comment on pmu status KVM: arm/arm64: timer: Workaround misconfigured timer interrupt arm64: Document workaround for Cortex-A72 erratum #853709 KVM: arm/arm64: Change misleading use of is_error_pfn KVM: arm64: ITS: avoid re-mapping LPIs KVM: arm64: check for ITS device on MSI injection KVM: arm64: ITS: move ITS registration into first VCPU run KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic KVM: arm64: vgic-its: Plug race in vgic_put_irq KVM: arm64: vgic-its: Handle errors from vgic_add_lpi KVM: arm64: ITS: return 1 on successful MSI injection
2016-08-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: silently skip readahead for DAX inodes dax: fix device-dax region base fs/seq_file: fix out-of-bounds read mm: memcontrol: avoid unused function warning mm: clarify COMPACTION Kconfig text treewide: replace config_enabled() with IS_ENABLED() (2nd round) printk: fix parsing of "brl=" option soft_dirty: fix soft_dirty during THP split sysctl: handle error writing UINT_MAX to u32 fields get_maintainer: quiet noisy implicit -f vcs_file_exists checking byteswap: don't use __builtin_bswap*() with sparse
2016-08-26Merge tag 'pci-v4.8-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Resource management: - Update "pci=resource_alignment" documentation (Mathias Koehrer) MSI: - Use positive flags in pci_alloc_irq_vectors() (Christoph Hellwig) - Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() (Christoph Hellwig) Intel VMD host bridge driver: - Fix infinite loop executing irq's (Keith Busch)" * tag 'pci-v4.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/PCI: VMD: Fix infinite loop executing irq's PCI: Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() PCI: Use positive flags in pci_alloc_irq_vectors() PCI: Update "pci=resource_alignment" documentation
2016-08-26treewide: replace config_enabled() with IS_ENABLED() (2nd round)Masahiro Yamada
Commit 97f2645f358b ("tree-wide: replace config_enabled() with IS_ENABLED()") mostly killed config_enabled(), but some new users have appeared for v4.8-rc1. They are all used for a boolean option, so can be replaced with IS_ENABLED() safely. Link: http://lkml.kernel.org/r/1471970749-24867-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-24Merge tag 'for-linus-4.8b-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen regression fix from David Vrabel: "Fix a regression in the xenbus device preventing userspace tools from working" * tag 'for-linus-4.8b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: change the type of xen_vcpu_id to uint32_t xenbus: don't look up transaction IDs for ordinary writes
2016-08-24xen: change the type of xen_vcpu_id to uint32_tVitaly Kuznetsov
We pass xen_vcpu_id mapping information to hypercalls which require uint32_t type so it would be cleaner to have it as uint32_t. The initializer to -1 can be dropped as we always do the mapping before using it and we never check the 'not set' value anyway. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-08-24x86/apic: Do not init irq remapping if ioapic is disabledWanpeng Li
native_smp_prepare_cpus -> default_setup_apic_routing -> enable_IR_x2apic -> irq_remapping_prepare -> intel_prepare_irq_remapping -> intel_setup_irq_remapping So IR table is setup even if "noapic" boot parameter is added. As a result we crash later when the interrupt affinity is set due to a half initialized remapping infrastructure. Prevent remap initialization when IOAPIC is disabled. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-23x86/PCI: VMD: Fix infinite loop executing irq'sKeith Busch
We can't initialize the list head on deletion as this causes the node to point to itself, which causes an infinite loop if vmd_irq() happens to be servicing that node. The list initialization was trying to fix a bug from multiple calls to disable the same IRQ. Fix this instead by having the VMD driver track if the interrupt is enabled. [bhelgaas: changelog, add "Fixes"] Fixes: 97e923063575 ("x86/PCI: VMD: Initialize list item in IRQ disable") Reported-by: Grzegorz Koczot <grzegorz.koczot@intel.com> Tested-by: Miroslaw Drost <miroslaw.drost@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by Jon Derrick: <jonathan.derrick@intel.com>
2016-08-23Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a number of memory corruption bugs in the newly added sha256-mb/sha256-mb code" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: sha512-mb - fix ctx pointer crypto: sha256-mb - fix ctx pointer and digest copy
2016-08-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "An initrd microcode loading fix, and an SMP bootup topology setup fix to resolve crashes on SGI/UV systems if the BIOS is configured in a certain way" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/smp: Fix __max_logical_packages value setup x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
2016-08-18kvm: nVMX: fix nested tsc scalingPeter Feiner
When the host supported TSC scaling, L2 would use a TSC multiplier of 0, which causes a VM entry failure. Now L2's TSC uses the same multiplier as L1. Signed-off-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-18KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE writeRadim Krčmář
If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the write with vmcs02 as the current VMCS. This will incorrectly apply modifications intended for vmcs01 to vmcs02 and L2 can use it to gain access to L0's x2APIC registers by disabling virtualized x2APIC while using msr bitmap that assumes enabled. Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the current VMCS. An alternative solution would temporarily make vmcs01 the current VMCS, but it requires more care. Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support") Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-18KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APICRadim Krčmář
msr bitmap can be used to avoid a VM exit (interception) on guest MSR accesses. In some configurations of VMX controls, the guest can even directly access host's x2APIC MSRs. See SDM 29.5 VIRTUALIZING MSR-BASED APIC ACCESSES. L2 could read all L0's x2APIC MSRs and write TPR, EOI, and SELF_IPI. To do so, L1 would first trick KVM to disable all possible interceptions by enabling APICv features and then would turn those features off; nested_vmx_merge_msr_bitmap() only disabled interceptions, so VMX would not intercept previously enabled MSRs even though they were not safe with the new configuration. Correctly re-enabling interceptions is not enough as a second bug would still allow L1+L2 to access host's MSRs: msr bitmap was shared for all VMCSs, so L1 could trigger a race to get the desired combination of msr bitmap and VMX controls. This fix allocates a msr bitmap for every L1 VCPU, allows only safe x2APIC MSRs from L1's msr bitmap, and disables msr bitmaps if they would have to intercept everything anyway. Fixes: 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap") Reported-by: Jim Mattson <jmattson@google.com> Suggested-by: Wincy Van <fanwenyi0529@gmail.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-18x86/smp: Fix __max_logical_packages value setupJiri Olsa
Frank reported kernel panic when he disabled several cores in BIOS via following option: Core Disable Bitmap(Hex) [0] with number 0xFFE, which leaves 16 CPUs in system (out of 48). The kernel panic below goes along with following messages: smpboot: Max logical packages: 2^M smpboot: APIC(0) Converting physical 0 to logical package 0^M smpboot: APIC(20) Converting physical 1 to logical package 1^M smpboot: APIC(40) Package 2 exceeds logical package map^M smpboot: CPU 8 APICId 40 disabled^M smpboot: APIC(60) Package 3 exceeds logical package map^M smpboot: CPU 12 APICId 60 disabled^M ... general protection fault: 0000 [#1] SMP^M Modules linked in:^M CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1^M Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016^M task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000^M RIP: 0010:[<ffffffff81014d54>] [<ffffffff81014d54>] uncore_change_context+0xd4/0x180^M ... [<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70^M [<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd^M [<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17^M [<ffffffff81002190>] do_one_initcall+0x50/0x190^M [<ffffffff810ab193>] ? parse_args+0x293/0x480^M [<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249^M [<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12^M [<ffffffff816dc19e>] kernel_init+0xe/0x110^M [<ffffffff816e93bf>] ret_from_fork+0x1f/0x40^M [<ffffffff816dc190>] ? rest_init+0x80/0x80^M The reason for the panic is wrong value of __max_logical_packages, which lets logical_package_map uninitialized and the uncore code relying on this map being properly initialized (maybe we should add some safety checks there as well). The __max_logical_packages is computed as: DIV_ROUND_UP(total_cpus, ncpus); - ncpus being number of cores With above BIOS setup we get total_cpus == 16 which set __max_logical_packages to 2 (ncpus is 12). Once topology_update_package_map processes CPU with logical pkg over 2 we display above messages and fail to initialize the physical_to_logical_pkg map, which makes the uncore code crash. The fix is to remove logical_package_map bitmap completely and keep and update the logical_packages number instead. After we enumerate all the present CPUs, we check if the enumerated logical packages count is within its computed maximum from BIOS data. If it's not the case, we set this maximum to the new enumerated value and freeze any new addition of logical packages. The freeze is because lot of init code like uncore/rapl/cqm depends on having maximum logical package value set to allocate their data, so we can't change it later on. Prarit Bhargava tested the patch and confirms that it solves the problem: From dmidecode: Core Count: 24 Core Enabled: 24 Thread Count: 48 Orig kernel boot log: [ 0.464981] smpboot: Max logical packages: 19 [ 0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0 [ 0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1 [ 0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2 [ 0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3 1. nr_cpus=8, should stop enumerating in package 0: [ 0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0 [ 0.539596] smpboot: Max logical packages: 19 2. max_cpus=8, should still enumerate all packages: [ 0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0 [ 0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1 [ 0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2 [ 0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3 [ 0.550524] smpboot: Max logical packages: 19 3. nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in package 2: [ 0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0 [ 0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1 [ 0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2 [ 0.539368] smpboot: Max logical packages: 19 4. maxcpus=49, should still enumerate all packages: [ 0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0 [ 0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1 [ 0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2 [ 0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3 [ 0.549624] smpboot: Max logical packages: 19 5. kdump (nr_cpus=1) works as well. Reported-by: Frank Ramsay <framsay@redhat.com> Tested-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=yBorislav Petkov
Similar to: efaad554b4ff ("x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y") ... fix microcode loading from the initrd on AMD by adding the randomization offset to the microcode patch container within the initrd. Reported-and-tested-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-tip-commits@vger.kernel.org Link: http://lkml.kernel.org/r/20160817113314.GA19221@nazgul.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18Merge branch 'pm-sleep'Rafael J. Wysocki
* pm-sleep: PM / hibernate: Fix rtree_next_node() to avoid walking off list ends x86/power/64: Use __pa() for physical address computation PM / sleep: Update some system sleep documentation
2016-08-16crypto: sha512-mb - fix ctx pointerXiaodong Liu
1. fix ctx pointer Use req_ctx which is the ctx for the next job that have been completed in the lanes instead of the first completed job rctx, whose completion could have been called and released. Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-08-16crypto: sha256-mb - fix ctx pointer and digest copyXiaodong Liu
1. fix ctx pointer Use req_ctx which is the ctx for the next job that have been completed in the lanes instead of the first completed job rctx, whose completion could have been called and released. 2. fix digest copy Use XMM register to copy another 16 bytes sha256 digest instead of a regular register. Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-08-16x86/power/64: Use __pa() for physical address computationRafael J. Wysocki
The value of temp_level4_pgt is the physical address of the top-level page directory, so use __pa() to compute it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2016-08-12Merge tag 'pm-4.8-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Two hibernation fixes allowing it to work with the recently added randomization of the kernel identity mapping base on x86-64 and one cpufreq driver regression fix. Specifics: - Fix the x86 identity mapping creation helpers to avoid the assumption that the base address of the mapping will always be aligned at the PGD level, as it may be aligned at the PUD level if address space randomization is enabled (Rafael Wysocki). - Fix the hibernation core to avoid executing tracing functions before restoring the processor state completely during resume (Thomas Garnier). - Fix a recently introduced regression in the powernv cpufreq driver that causes it to crash due to an out-of-bounds array access (Akshay Adiga)" * tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / hibernate: Restore processor state before using per-CPU variables x86/power/64: Always create temporary identity mapping correctly cpufreq: powernv: Fix crash in gpstate_timer_handler()
2016-08-12Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This is bigger than usual - the reason is partly a pent-up stream of fixes after the merge window and partly accidental. The fixes are: - five patches to fix a boot failure on Andy Lutomirsky's laptop - four SGI UV platform fixes - KASAN fix - warning fix - documentation update - swap entry definition fix - pkeys fix - irq stats fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe() x86/efi: Allocate a trampoline if needed in efi_free_boot_services() x86/boot: Rework reserve_real_mode() to allow multiple tries x86/boot: Defer setup_real_mode() to early_initcall time x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly x86/boot: Run reserve_bios_regions() after we initialize the memory map x86/irq: Do not substract irq_tlb_count from irq_call_count x86/mm: Fix swap entry comment and macro x86/mm/kaslr: Fix -Wformat-security warning x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET x86/mm: Disable preemption during CR3 read+write x86/mm/KASLR: Increase BRK pages for KASLR memory randomization x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
2016-08-12Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Misc fixes: a /dev/rtc regression fix, two APIC timer period calibration fixes, an ARM clocksource driver fix and a NOHZ power use regression fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup x86/timers/apic: Inform TSC deadline clockevent device about recalibration x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error timers: Fix get_next_timer_interrupt() computation clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
2016-08-12Merge branches 'pm-sleep' and 'pm-cpufreq'Rafael J. Wysocki
* pm-sleep: PM / hibernate: Restore processor state before using per-CPU variables x86/power/64: Always create temporary identity mapping correctly * pm-cpufreq: cpufreq: powernv: Fix crash in gpstate_timer_handler()
2016-08-12Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a perf-cgroups fix and an AUX events fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add enable_box for client MSR uncore perf/x86/intel/uncore: Fix uncore num_counters uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions perf/core: Set cgroup in CPU contexts for new cgroup events perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash perf probe ppc64le: Fix probe location when using DWARF perf probe: Add function to post process kernel trace events tools: Sync cpufeatures headers with the kernel toops: Sync tools/include/uapi/linux/bpf.h with the kernel tools: Sync cpufeatures.h and vmx.h with the kernel perf probe: Support signedness casting perf stat: Avoid skew when reading events perf probe: Fix module name matching perf probe: Adjust map->reloc offset when finding kernel symbol from map perf hists: Trim libtraceevent trace_seq buffers perf script: Add 'bpf-output' field to usage message
2016-08-12perf/x86/intel/uncore: Add enable_box for client MSR uncoreKan Liang
There are bug reports about miscounting uncore counters on some client machines like Sandybridge, Broadwell and Skylake. It is very likely to be observed on idle systems. This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be cleared after Package C7, and nothing will be count. The related errata (HSD 158) could be found in: www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL in ->enable_box(). The workaround does not cover all cases. It helps for new events after returning from C7. But it cannot prevent C7, it will still miscount if a counter is already active. There is no drawback in leaving it enabled, so it does not need disable_box() here. Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12perf/x86/intel/uncore: Fix uncore num_countersKan Liang
Some uncore boxes' num_counters value for Haswell server and Broadwell server are not correct (too large, off by one). This issue was found by comparing the code with the document. Although there is no bug report from users yet, accessing non-existent counters is dangerous and the behavior is undefined: it may cause miscounting or even crashes. This patch makes them consistent with the uncore document. Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructionsDenys Vlasenko
Since instruction decoder now supports EVEX-encoded instructions, two fixes are needed to correctly handle them in uprobes. Extended bits for MODRM.rm field need to be sanitized just like we do it for VEX3, to avoid encoding wrong register for register-relative access. EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be ignored by the CPU (since GPRs go only up to 15, not 31), but let's be paranoid here: proper encoding for register-relative access should have EVEX.x = 1. Secondly, we should fetch vex.vvvv for EVEX too. This is now super easy because instruction decoder populates vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> # v4.1+ Fixes: 8a764a875fe3 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX") Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>