path: root/arch/x86
Age | Commit message | Author
2019-02-08 | crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP | Eric Biggers
The x86 MORUS implementations all fail the improved AEAD tests because they produce the wrong result with some data layouts. The issue is that they assume that if the skcipher_walk API gives 'nbytes' not aligned to the walksize (a.k.a. walk.stride), then it is the end of the data. In fact, this can happen before the end. Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can incorrectly sleep in the skcipher_walk_*() functions while preemption has been disabled by kernel_fpu_begin(). Fix these bugs. Fixes: 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for MORUS") Cc: <stable@vger.kernel.org> # v4.18+ Cc: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-08 | crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP | Eric Biggers
The x86 AEGIS implementations all fail the improved AEAD tests because they produce the wrong result with some data layouts. The issue is that they assume that if the skcipher_walk API gives 'nbytes' not aligned to the walksize (a.k.a. walk.stride), then it is the end of the data. In fact, this can happen before the end. Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can incorrectly sleep in the skcipher_walk_*() functions while preemption has been disabled by kernel_fpu_begin(). Fix these bugs. Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations") Cc: <stable@vger.kernel.org> # v4.18+ Cc: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
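The chunking bug shared by the MORUS and AEGIS entries can be modeled in user space. This is a toy, not kernel code: `toy_walk`, `toy_walk_next()` and `toy_walk_done()` are hypothetical stand-ins for the skcipher_walk API, and `BLOCK` plays the role of walk.stride. The sketch contrasts treating an unaligned step as the end of the data with handing the remainder back to the walker; the MAY_SLEEP half of the fix is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16 /* stands in for walk.stride */

/* Hypothetical toy walker, NOT the kernel's skcipher_walk API. It hands
 * out the input in chunks of caller-chosen sizes; bytes handed back via
 * toy_walk_done() are offered again at the start of the next step. Like
 * the real walker, it yields less than one block only at the end. */
struct toy_walk {
    const size_t *chunk;
    size_t n, i;    /* number of chunks, index of the next chunk */
    size_t carry;   /* bytes handed back by the previous step */
    size_t nbytes;  /* bytes available in the current step */
};

struct result { size_t blocks, tails; };

static void toy_walk_next(struct toy_walk *w)
{
    w->nbytes = w->carry;
    w->carry = 0;
    while (w->i < w->n && w->nbytes < BLOCK)
        w->nbytes += w->chunk[w->i++];
}

static void toy_walk_done(struct toy_walk *w, size_t leftover)
{
    w->carry = leftover;
    toy_walk_next(w);
}

/* Faulty assumption: an unaligned step is the end of the data, so its
 * remainder is finalized as a tail, possibly in the middle of the input. */
static struct result crypt_buggy(struct toy_walk *w)
{
    struct result r = { 0, 0 };

    toy_walk_next(w);
    while (w->nbytes) {
        r.blocks += w->nbytes / BLOCK;
        if (w->nbytes % BLOCK)
            r.tails++;
        toy_walk_done(w, 0);
    }
    return r;
}

/* Fixed pattern: consume whole blocks only and hand the remainder back;
 * the tail is processed once, when no whole block is left. */
static struct result crypt_fixed(struct toy_walk *w)
{
    struct result r = { 0, 0 };

    toy_walk_next(w);
    while (w->nbytes >= BLOCK) {
        r.blocks += w->nbytes / BLOCK;
        toy_walk_done(w, w->nbytes % BLOCK);
    }
    if (w->nbytes) {
        r.tails++;
        toy_walk_done(w, 0);
    }
    return r;
}
```

With 70 bytes delivered as 20 + 30 + 20, the faulty loop finalizes three separate tails (and only 3 whole blocks), whereas the fixed loop sees 4 whole blocks and a single 6-byte tail, which is the only layout a tail-padding AEAD can accept.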
2019-02-08 | crypto: x86/crct10dif-pcl - cleanup and optimizations | Eric Biggers
The x86, arm, and arm64 asm implementations of crct10dif are very difficult to understand partly because many of the comments, labels, and macros are named incorrectly: the lengths mentioned are usually off by a factor of two from the actual code. Many other things are unnecessarily convoluted as well, e.g. there are many more fold constants than actually needed and some aren't fully reduced. This series therefore cleans up all these implementations to be much more maintainable. I also made some small optimizations where I saw opportunities, resulting in slightly better performance. This patch cleans up the x86 version. As part of this, I removed support for len < 16 from the x86 assembly; now the glue code falls back to the generic table-based implementation in this case. Due to the overhead of kernel_fpu_begin(), this actually significantly improves performance on these lengths. (And even if kernel_fpu_begin() were free, the generic code is still faster for about len < 11.) This removal also eliminates error-prone special cases and makes the x86, arm32, and arm64 ports of the code match more closely. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
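The fallback described above amounts to a length check in the glue code. A minimal sketch, assuming the standard CRC-T10DIF parameters (polynomial 0x8BB7, initial value 0, no bit reflection, no final XOR): `crc_t10dif_simd()` is a hypothetical placeholder for the PCLMULQDQ routine and simply reuses the table code so the example runs anywhere. The 16-byte threshold comes from the text; in the kernel the dispatch also depends on whether the FPU may be used in the current context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7

/* Reference: one bit at a time, MSB first. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ T10DIF_POLY)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

static uint16_t table[256];

static void crc_t10dif_init_table(void)
{
    for (int n = 0; n < 256; n++) {
        uint16_t c = (uint16_t)(n << 8);
        for (int i = 0; i < 8; i++)
            c = (c & 0x8000) ? (uint16_t)((c << 1) ^ T10DIF_POLY)
                             : (uint16_t)(c << 1);
        table[n] = c;
    }
}

/* Generic table-based path: one lookup per input byte. */
static uint16_t crc_t10dif_generic(uint16_t crc, const uint8_t *p, size_t len)
{
    while (len--)
        crc = (uint16_t)((crc << 8) ^ table[((crc >> 8) ^ *p++) & 0xff]);
    return crc;
}

/* Hypothetical placeholder for the SIMD (PCLMULQDQ) implementation. */
static uint16_t crc_t10dif_simd(uint16_t crc, const uint8_t *p, size_t len)
{
    return crc_t10dif_generic(crc, p, len);
}

/* Glue: short inputs never reach the SIMD code (nor, in the kernel,
 * pay for kernel_fpu_begin()). */
static uint16_t crc_t10dif_update(uint16_t crc, const uint8_t *p, size_t len)
{
    if (len >= 16)
        return crc_t10dif_simd(crc, p, len);
    return crc_t10dif_generic(crc, p, len);
}
```

Call `crc_t10dif_init_table()` once before using the table path; for the input "123456789" all three routines should produce the catalogued CRC-16/T10-DIF check value 0xD0DB.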
2019-02-07 | KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) | Peter Shier
Bugzilla: 1671904 There are multiple code paths where an hrtimer may have been started to emulate an L1 VMX preemption timer that can result in a call to free_nested without an intervening L2 exit where the hrtimer is normally cancelled. Unconditionally cancel in free_nested to cover all cases. Embargoed until Feb 7th 2019. Signed-off-by: Peter Shier <pshier@google.com> Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@kernel.org Message-Id: <20181011184646.154065-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-07 | KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) | Paolo Bonzini
Bugzilla: 1671930 Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with memory operand, INVEPT, INVVPID) can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Embargoed until Feb 7th 2019. Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-07 | x86/PCI: Fixup RTIT_BAR of Intel Denverton Trace Hub | Alexander Shishkin
On Denverton's integration of the Intel(R) Trace Hub (for a reference and overview see Documentation/trace/intel_th.rst) the reported size of one of its resources (RTIT_BAR) doesn't match its actual size, which leads to overlaps with other devices' resources. In practice, it overlaps with XHCI MMIO space, which results in the xhci driver bailing out after seeing its registers as 0xffffffff, and perceived disappearance of all USB devices:
  intel_th_pci 0000:00:1f.7: enabling device (0004 -> 0006)
  xhci_hcd 0000:00:15.0: xHCI host controller not responding, assume dead
  xhci_hcd 0000:00:15.0: xHC not responding in xhci_irq, assume controller is dead
  xhci_hcd 0000:00:15.0: HC died; cleaning up
  usb 1-1: USB disconnect, device number 2
For this reason, we need to resize the RTIT_BAR on Denverton to its actual size, which in this case is 4MB. The corresponding erratum is DNV36 at the link below:
  DNV36. Processor Host Root Complex May Incorrectly Route Memory Accesses to Intel® Trace Hub
  Problem: The Intel® Trace Hub RTIT_BAR (B0:D31:F7 offset 20h) is reported as a 2KB memory range. Due to this erratum, the processor Host Root Complex will forward addresses from RTIT_BAR to RTIT_BAR + 4MB - 1 to Intel® Trace Hub.
  Implication: Devices assigned within the RTIT_BAR to RTIT_BAR + 4MB - 1 space may not function correctly.
  Workaround: A BIOS code change has been identified and may be implemented as a workaround for this erratum.
  Status: No Fix.
Note that 5118ccd34780 ("intel_th: pci: Add Denverton SOC support") updates the Trace Hub driver so it claims the Denverton device, but the resource overlap exists regardless of whether that driver is loaded or that commit is included.
Link: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c3000-family-spec-update.pdf Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> [bhelgaas: include erratum text, clarify relationship with 5118ccd34780] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org
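Why the reported BAR size matters can be seen with plain interval arithmetic. The addresses below are hypothetical (the entry gives none): the 2KB range as reported clears the xHCI window, but the 4MB range that the erratum says the Root Complex actually forwards covers it. `struct range` and `rtit_bar_actual()` are illustrative names; this is not the quirk code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2K 0x800ULL
#define SZ_4M 0x400000ULL

/* Minimal stand-in for struct resource: an inclusive [start, end] range. */
struct range { uint64_t start, end; };

static bool ranges_overlap(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Span the Host Root Complex forwards to the Trace Hub per erratum
 * DNV36: RTIT_BAR .. RTIT_BAR + 4MB - 1, whatever size was reported. */
static struct range rtit_bar_actual(struct range reported)
{
    struct range r = { reported.start, reported.start + SZ_4M - 1 };
    return r;
}
```

For example, with RTIT_BAR reported at 0xfe000000 (2KB) and an xHCI window at 0xfe200000, the reported ranges do not overlap, while the real 4MB span ends at 0xfe3fffff and swallows the xHCI registers. Only once the resource carries its true size can the PCI core notice the conflict.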
2019-02-07 | y2038: add 64-bit time_t syscalls to all 32-bit architectures | Arnd Bergmann
This adds 21 new system calls on each ABI that has 32-bit time_t today. All of these have the exact same semantics as their existing counterparts, and the new ones all have macro names that end in 'time64' for clarification. This gets us to the point of being able to safely use a C library that has 64-bit time_t in user space. There are still a couple of loose ends to tie up in various areas of the code, but this is the big one, and should be entirely uncontroversial at this point. In particular, there are four system calls (getitimer, setitimer, waitid, and getrusage) that don't have a 64-bit counterpart yet, but these can all be safely implemented in the C library by wrapping around the existing system calls because the 32-bit time_t they pass only counts elapsed time, not time since the epoch. They will be dealt with later. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
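The arithmetic behind the y2038 limit: a signed 32-bit time_t tops out at 2^31 - 1 = 2147483647 seconds after the epoch, which is 2038-01-19 03:14:07 UTC. A small sketch, assuming a POSIX host (for `gmtime_r()`); `fits_in_time32()` and `last_time32_utc()` are illustrative helpers, not kernel or libc interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* True if t (seconds since the epoch) fits a signed 32-bit time_t. */
static bool fits_in_time32(int64_t t)
{
    return t >= INT32_MIN && t <= INT32_MAX;
}

/* UTC breakdown of the last second a signed 32-bit time_t can hold,
 * computed with the host's own time_t. */
static struct tm last_time32_utc(void)
{
    time_t t = (time_t)INT32_MAX;
    struct tm tm = { 0 };

    gmtime_r(&t, &tm);
    return tm;
}
```

The same bound explains why the four calls left without time64 variants are tolerable: they carry elapsed durations, and a 32-bit second count only overflows for intervals longer than about 68 years.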
2019-02-07 | y2038: rename old time and utime syscalls | Arnd Bergmann
The time, stime, utime, utimes, and futimesat system calls are only used on older architectures, and we do not provide y2038 safe variants of them, as they are replaced by clock_gettime64, clock_settime64, and utimensat_time64. However, for consistency it seems better to have the 32-bit architectures that still use them call the "time32" entry points (leaving the traditional handlers for the 64-bit architectures), like we do for system calls that now require two versions. Note: We used to always define __ARCH_WANT_SYS_TIME and __ARCH_WANT_SYS_UTIME and only set __ARCH_WANT_COMPAT_SYS_TIME and __ARCH_WANT_SYS_UTIME32 for compat mode on 64-bit kernels. Now this is reversed: only 64-bit architectures set __ARCH_WANT_SYS_TIME/UTIME, while we need __ARCH_WANT_SYS_TIME32/UTIME32 for 32-bit architectures and compat mode. The resulting asm/unistd.h changes look a bit counterintuitive. This is only a cleanup patch and it should not change any behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-02-07 | y2038: use time32 syscall names on 32-bit | Arnd Bergmann
This is the big flip, where all 32-bit architectures set COMPAT_32BIT_TIME and use the _time32 system calls from the former compat layer instead of the system calls that take __kernel_timespec and similar arguments. The temporary redirects for __kernel_timespec, __kernel_itimerspec and __kernel_timex can get removed with this. It would be easy to split this commit by architecture, but with the new generated system call tables, it's easy enough to do it all at once, which makes it a little easier to check that the changes are the same in each table. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-07 | y2038: syscalls: rename y2038 compat syscalls | Arnd Bergmann
A lot of system calls that pass a time_t somewhere have an implementation using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have been reworked so that this implementation can now be used on 32-bit architectures as well. The missing step is to redefine them using the regular SYSCALL_DEFINEx() to get them out of the compat namespace and make it possible to build them on 32-bit architectures. Any system call that ends in 'time' gets a '32' suffix on its name for that version, while the others get a '_time32' suffix, to distinguish them from the normal version, which will take a 64-bit time argument in the future. In this step, only 64-bit architectures are changed; doing this rename first lets us avoid touching the 32-bit architectures twice. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
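The suffix rule for the renamed compat system calls is mechanical enough to state as code. This is only an illustration of the rule; `time32_name()` is a hypothetical helper and nothing in the kernel build runs it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Names ending in "time" get a "32" suffix; all others get "_time32". */
static void time32_name(const char *name, char *out, size_t outlen)
{
    size_t n = strlen(name);
    int ends_in_time = n >= 4 && strcmp(name + n - 4, "time") == 0;

    snprintf(out, outlen, "%s%s", name, ends_in_time ? "32" : "_time32");
}
```

So "clock_gettime" and "utime" become "clock_gettime32" and "utime32", while "nanosleep" becomes "nanosleep_time32".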
2019-02-07 | x86/x32: use time64 versions of sigtimedwait and recvmmsg | Arnd Bergmann
x32 has always followed the time64 calling conventions of these syscalls, which required a special hack in compat_get_timespec aka get_old_timespec32 to continue working. Since we now have the time64 syscalls, use those explicitly. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-06 | x86/boot/compressed/64: Explain paging_prepare()'s return value | Kirill A. Shutemov
paging_prepare() returns a two-quadword structure which lands in RDX:RAX:
 - The address of the trampoline is returned in RAX.
 - Non-zero RDX means the trampoline needs to enable 5-level paging.
Document that explicitly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: dave.hansen@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kyle D Pelton <kyle.d.pelton@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Huang <wei@redhat.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190206154756.matwldebbxkmlnae@black.fi.intel.com
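The register placement follows from the x86-64 System V calling convention: a 16-byte struct made of two integer fields is returned in registers, the first field in RAX and the second in RDX. A sketch modeled on the description above; the field names and `toy_paging_prepare()` are this example's assumptions, not the kernel's code.

```c
#include <assert.h>

/* Two-quadword return value (field names assumed for illustration). */
struct paging_config {
    unsigned long trampoline_start; /* returned in RAX */
    unsigned long l5_required;      /* returned in RDX; non-zero: enable 5-level paging */
};

_Static_assert(sizeof(struct paging_config) == 2 * sizeof(unsigned long),
               "no padding between the two fields");

/* Hypothetical stand-in for paging_prepare(): only fills in the result. */
static struct paging_config toy_paging_prepare(unsigned long trampoline, int want_l5)
{
    struct paging_config cfg = { trampoline, want_l5 ? 1UL : 0UL };
    return cfg;
}
```

An assembly caller therefore has to keep RDX (EDX in the 32-bit trampoline) intact until it has acted on the second field, which appears to be exactly what the neighbouring EDX-corruption fix restores.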
2019-02-06 | x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting | Kirill A. Shutemov
RDMSR in the trampoline code overwrites EDX but that register is used to indicate whether 5-level paging has to be enabled and if clobbered, leads to failure to boot on a 5-level paging machine. Preserve EDX on the stack while we are dealing with EFER. Fixes: b677dfae5aa1 ("x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode") Reported-by: Kyle D Pelton <kyle.d.pelton@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: dave.hansen@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Huang <wei@redhat.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190206115253.1907-1-kirill.shutemov@linux.intel.com
2019-02-06 | regulator: fixed/gpio: Pull inversion/OD into gpiolib | Linus Walleij
This pushes the handling of inversion semantics and open drain settings to the GPIO descriptor and gpiolib. All affected board files are also augmented. This is especially nice since we don't have to have any confusing flags passed around to the left and right littering the fixed and GPIO regulator drivers and the regulator core. It is all just very straight-forward: the core asks the GPIO line to be asserted or deasserted and gpiolib deals with the rest depending on how the platform is configured: if the line is active low, it deals with that, if the line is open drain, it deals with that too. Cc: Alexander Shiyan <shc_work@mail.ru> # i.MX boards user Cc: Haojian Zhuang <haojian.zhuang@gmail.com> # MMP2 maintainer Cc: Aaro Koskinen <aaro.koskinen@iki.fi> # OMAP1 maintainer Cc: Tony Lindgren <tony@atomide.com> # OMAP1,2,3 maintainer Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> # EM-X270 maintainer Cc: Robert Jarzmik <robert.jarzmik@free.fr> # EZX maintainer Cc: Philipp Zabel <philipp.zabel@gmail.com> # Magician maintainer Cc: Petr Cvek <petr.cvek@tul.cz> # Magician Cc: Robert Jarzmik <robert.jarzmik@free.fr> # PXA Cc: Paul Parsons <lost.distance@yahoo.com> # hx4700 Cc: Daniel Mack <zonque@gmail.com> # Raumfeld maintainer Cc: Marc Zyngier <marc.zyngier@arm.com> # Zeus maintainer Cc: Geert Uytterhoeven <geert+renesas@glider.be> # SuperH pinctrl/GPIO maintainer Cc: Russell King <rmk+kernel@armlinux.org.uk> # SA1100 Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> #OMAP1 Amstrad Delta Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-02-06 | x86/kexec: Fill in acpi_rsdp_addr from the first kernel | Kairui Song
When efi=noruntime or efi=oldmap is used on the kernel command line, EFI services won't be available in the second kernel, therefore the second kernel will not be able to get the ACPI RSDP address from firmware by calling EFI services and so it won't boot. Commit e6e094e053af ("x86/acpi, x86/boot: Take RSDP address from boot params if available") added an acpi_rsdp_addr field to boot_params which stores the RSDP address for other kernel users. Recently, after 3a63f70bf4c3 ("x86/boot: Early parse RSDP and save it in boot_params") the acpi_rsdp_addr will always be filled with a valid RSDP address. So fill in that value into the second kernel's boot_params thus ensuring that the second kernel receives the RSDP value from the first kernel. [ bp: massage commit message. ] Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kexec@lists.infradead.org Cc: Philipp Rudo <prudo@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yannik Sembritzki <yannik@sembritzki.me> Link: https://lkml.kernel.org/r/20190204173852.4863-1-kasong@redhat.com
2019-02-06 | perf/aux: Make perf_event accessible to setup_aux() | Mathieu Poirier
When pmu::setup_aux() is called, the coresight PMU needs to know which sink to use for the session by looking up the information in the event's attr::config2 field. As such, simply replace the cpu information by the complete perf_event structure and update all affected callers. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 | x86/boot: Fix randconfig build error due to MEMORY_HOTREMOVE | Borislav Petkov
When building randconfigs, one of the failures is:
  ld: arch/x86/boot/compressed/kaslr.o: in function `choose_random_location':
  kaslr.c:(.text+0xbf7): undefined reference to `count_immovable_mem_regions'
  ld: kaslr.c:(.text+0xcbe): undefined reference to `immovable_mem'
  make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
because CONFIG_ACPI is not enabled in this particular .config but CONFIG_MEMORY_HOTREMOVE is, and count_immovable_mem_regions() is unresolvable because it is defined in compressed/acpi.c, which is the compilation unit that depends on CONFIG_ACPI. Add CONFIG_ACPI to the explicit dependencies for MEMORY_HOTREMOVE. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190205131033.9564-1-bp@alien8.de
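The usual way to keep such a caller linkable is to make the prototype conditional on every option the defining file needs and to supply an inline stub otherwise. A sketch of that pattern, not the literal patch: the configuration below is hard-coded to mirror the failing randconfig, and the exact condition used upstream is not shown in this entry.

```c
#include <assert.h>

/* Illustrative configuration matching the failing randconfig:
 * KASLR and memory hot-remove enabled, ACPI disabled. */
#define CONFIG_RANDOMIZE_BASE 1
#define CONFIG_MEMORY_HOTREMOVE 1
/* CONFIG_ACPI intentionally left undefined */

#if defined(CONFIG_RANDOMIZE_BASE) && defined(CONFIG_MEMORY_HOTREMOVE) && \
    defined(CONFIG_ACPI)
/* The real definition lives in the ACPI-only compilation unit. */
int count_immovable_mem_regions(void);
#else
/* Without ACPI, callers get a stub instead of an unresolved symbol. */
static inline int count_immovable_mem_regions(void)
{
    return 0;
}
#endif
```

Dropping `defined(CONFIG_ACPI)` from the condition would reproduce the link error above, since the prototype would be visible but nothing would define it.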
2019-02-06 | x86/boot: Fix cmdline_find_option() prototype visibility | Borislav Petkov
ac09c5f43cf6 ("x86/boot: Build the command line parsing code unconditionally") enabled building the command line parsing code unconditionally but it forgot to remove the respective ifdeffery around the prototypes in the misc.h header, leading to
  arch/x86/boot/compressed/acpi.c: In function ‘get_acpi_rsdp’:
  arch/x86/boot/compressed/acpi.c:37:8: warning: implicit declaration of function ‘cmdline_find_option’ [-Wimplicit-function-declaration]
    ret = cmdline_find_option("acpi_rsdp", val, MAX_ADDR_LEN);
          ^~~~~~~~~~~~~~~~~~~
for configs where neither CONFIG_EARLY_PRINTK nor CONFIG_RANDOMIZE_BASE was defined. Drop the ifdeffery in the header too. Fixes: ac09c5f43cf6 ("x86/boot: Build the command line parsing code unconditionally") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/5c51daf0.83pQEkvDZILqoSYW%lkp@intel.com Link: https://lkml.kernel.org/r/20190205131352.GA27396@zn.tnic
2019-02-05 | x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition | Reinette Chatre
The definition of MSR_MISC_FEATURE_CONTROL was first introduced in 98af74599ea0 ("x86 msr_index.h: Define MSR_MISC_FEATURE_CONTROL") and present in Linux since v4.11. The Cache Pseudo-Locking code added this duplicate definition in more recent f2a177292bd0 ("x86/intel_rdt: Discover supported platforms via prefetch disable bits"), available since v4.19. Remove the duplicate definition from the resctrl subsystem and let that code obtain the needed definition from the core architecture msr-index.h instead. Fixes: f2a177292bd0 ("x86/intel_rdt: Discover supported platforms via prefetch disable bits") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: gavin.hindman@intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: jithu.joseph@intel.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/ff6b95d9b6ef6f4ac96267f130719ba1af09614b.1549312475.git.reinette.chatre@intel.com
2019-02-04 | kexec, KEYS: Make use of platform keyring for signature verify | Kairui Song
This patch allows the kexec_file_load syscall to verify the PE signed kernel image signature based on the preboot keys stored in the .platform keyring, as a fallback, if the signature verification failed due to not finding the public key in the secondary or builtin keyrings. This commit adds a VERIFY_USE_PLATFORM_KEYRING similar to the previous VERIFY_USE_SECONDARY_KEYRING indicating that verify_pkcs7_signature should verify the signature using the platform keyring. Also, decrease the error message log level when verification failed with -ENOKEY, so that if the caller tries multiple times with different keyrings, it won't generate extra noise. Signed-off-by: Kairui Song <kasong@redhat.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> (for kexec_file_load part) [zohar@linux.ibm.com: tweaked the first paragraph of the patch description, and fixed checkpatch warning.] Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2019-02-04 | refcount_t: Add ACQUIRE ordering on success for dec(sub)_and_test() variants | Elena Reshetova
This adds an smp_acquire__after_ctrl_dep() barrier on successful decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test variants and therefore gives stronger memory ordering guarantees than prior versions of these functions. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: dvyukov@google.com Cc: keescook@chromium.org Cc: stern@rowland.harvard.edu Link: https://lkml.kernel.org/r/1548847131-27854-2-git-send-email-elena.reshetova@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
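The resulting ordering can be approximated with C11 atomics. A sketch, not the kernel implementation: `obj_put()` is hypothetical, uses release ordering on every decrement, and issues an acquire fence only on the 1 -> 0 transition, which is the role smp_acquire__after_ctrl_dep() plays in the patch. The kernel's refcount_t additionally saturates and warns on overflow and underflow, which this sketch omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
    atomic_int refs;
    int payload;
};

/* Returns true when the caller dropped the last reference. */
static bool obj_put(struct obj *o)
{
    /* Release: this thread's earlier accesses to *o are ordered before
     * the decrement. */
    if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release) != 1)
        return false;
    /* Count hit zero. The acquire fence pairs with the release
     * decrements of all other holders, so their accesses to *o are
     * ordered before whatever the caller does next (typically free). */
    atomic_thread_fence(memory_order_acquire);
    return true;
}
```

Paying for the acquire only on success keeps the common, non-final put as cheap as a release decrement.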
2019-02-04 | Merge branch 'perf/urgent' into perf/core, to pick up fixes | Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 | perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() | Peter Zijlstra
intel_pmu_cpu_prepare() allocated memory for ->shared_regs among other members of struct cpu_hw_events. This memory is released in intel_pmu_cpu_dying() which is wrong. The counterpart of the intel_pmu_cpu_prepare() callback is x86_pmu_dead_cpu(). Otherwise if the CPU fails on the UP path between CPUHP_PERF_X86_PREPARE and CPUHP_AP_PERF_X86_STARTING then it won't release the memory but allocate new memory on the next attempt to online the CPU (leaking the old memory). Also, if the CPU down path fails between CPUHP_AP_PERF_X86_STARTING and CPUHP_PERF_X86_PREPARE then the CPU will go back online but never allocate the memory that was released in x86_pmu_dying_cpu(). Make the memory allocation/free symmetrical in regard to the CPU hotplug notifier by moving the deallocation to intel_pmu_cpu_dead(). This started in commit: a7e3ed1e47011 ("perf: Add support for supplementary event registers"). In principle the bug was introduced in v2.6.39 (!), but it will almost certainly not backport cleanly across the big CPU hotplug rewrite between v4.7-v4.15... [ bigeasy: Added patch description. ] [ mingo: Added backporting guidance. ] Reported-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With developer hat on Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With maintainer hat on Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: bp@alien8.de Cc: hpa@zytor.com Cc: jolsa@kernel.org Cc: kan.liang@linux.intel.com Cc: namhyung@kernel.org Cc: <stable@vger.kernel.org> Fixes: a7e3ed1e47011 ("perf: Add support for supplementary event registers"). 
Link: https://lkml.kernel.org/r/20181219165350.6s3jvyxbibpvlhtq@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
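The asymmetry can be modeled with a toy hotplug sequence. Everything here is hypothetical (`toy_cpu`, `toy_cpu_prepare()`, `toy_cpu_dying()`, `toy_cpu_dead()`, and the `free_in_dead` switch); it only mirrors the rule stated above: when bring-up fails between PREPARE and STARTING, the hotplug core rolls back the completed PREPARE state, so only its teardown ("dead") runs, never "dying".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-CPU state standing in for struct cpu_hw_events. */
struct toy_cpu { void *shared_regs; };

static int live_allocs;          /* allocations not yet freed */
static bool free_in_dead = true; /* false models the old free in "dying" */

static void toy_release(struct toy_cpu *c)
{
    if (c->shared_regs) {
        free(c->shared_regs);
        c->shared_regs = NULL;
        live_allocs--;
    }
}

/* PREPARE state: setup and its teardown run on a control CPU. */
static int toy_cpu_prepare(struct toy_cpu *c)
{
    c->shared_regs = malloc(64);
    if (!c->shared_regs)
        return -1;
    live_allocs++;
    return 0;
}

static void toy_cpu_dead(struct toy_cpu *c)
{
    if (free_in_dead)
        toy_release(c);
}

/* STARTING state teardown: runs on the outgoing CPU. */
static void toy_cpu_dying(struct toy_cpu *c)
{
    if (!free_in_dead)
        toy_release(c);
}

/* Bring-up fails after PREPARE, before STARTING: only PREPARE is
 * rolled back. */
static int toy_online_fails_early(struct toy_cpu *c)
{
    if (toy_cpu_prepare(c))
        return -1;
    toy_cpu_dead(c);
    return -1;
}

/* Successful online followed by a normal offline. */
static void toy_online_then_offline(struct toy_cpu *c)
{
    if (toy_cpu_prepare(c))
        return;
    toy_cpu_dying(c);
    toy_cpu_dead(c);
}
```

A normal online/offline cycle frees the memory in either placement. After a failed bring-up, however, only the symmetric version (free in "dead") leaves no allocation behind; with the free in "dying", the next prepare would overwrite the still-live pointer.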
2019-02-04 | perf/x86/intel/uncore: Add Node ID mask | Kan Liang
Some PCI uncore PMUs cannot be registered on an 8-socket system (HPE Superdome Flex). To understand which socket the PCI uncore PMUs belong to, perf retrieves the local Node ID of the uncore device from CPUNODEID(0xC0) of the PCI configuration space, and the mapping between Socket ID and Node ID from GIDNIDMAP(0xD4). The Socket ID can be calculated accordingly. The local Node ID is only available in bits 2:0, but the current code doesn't mask it. If a BIOS doesn't clear the rest of the bits, an incorrect Node ID will be fetched. Filter the Node ID by adding a mask. Reported-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v3.7+ Fixes: 7c94ee2e0917 ("perf/x86: Add Intel Nehalem and Sandy Bridge-EP uncore support") Link: https://lkml.kernel.org/r/1548600794-33162-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
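A sketch of the lookup with the mask in place. The GIDNIDMAP layout used here (eight 3-bit fields, field i holding the Node ID of socket i) is an assumption of this example, as is the helper name `uncore_socket_id()`; the 0xC0/0xD4 offsets and the bits 2:0 come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define NODE_ID_MASK 0x7 /* the local Node ID lives in bits 2:0 */

/* Map a device to its socket from two config-space values: CPUNODEID
 * (offset 0xC0) and GIDNIDMAP (offset 0xD4). Returns -1 if no field
 * of the map matches. */
static int uncore_socket_id(uint32_t cpunodeid, uint32_t gidnidmap)
{
    uint32_t nodeid = cpunodeid & NODE_ID_MASK; /* drop stale upper bits */

    for (int i = 0; i < 8; i++)
        if (nodeid == ((gidnidmap >> (3 * i)) & 0x7))
            return i;
    return -1;
}
```

Octal literals make the map readable, one digit per 3-bit field with field 0 rightmost: with the identity map 076543210, a raw CPUNODEID of 0xF9 resolves to Node 1 and hence socket 1, whereas the unmasked value 0xF9 could never equal any 3-bit field and the device would be left without a socket.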
2019-02-04 | efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation | Ard Biesheuvel
Move the x86 EFI earlyprintk implementation to a shared location under drivers/firmware and tweak it slightly so we can expose it as an earlycon implementation (which is generic) rather than earlyprintk (which is only implemented for a few architectures). This also involves switching to write-combine mappings by default (which is required on ARM since device mappings lack memory semantics, and so memcpy/memset may not be used on them), and adding support for shared memory framebuffers on cache coherent non-x86 systems (which do not tolerate mismatched attributes). Note that 32-bit ARM does not populate its struct screen_info early enough for earlycon=efifb to work, so it is disabled there. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Alexander Graf <agraf@suse.de> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20190202094119.13230-10-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-04 | x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol | Ard Biesheuvel
Turn ARCH_USE_MEMREMAP_PROT into a generic Kconfig symbol, and fix the dependency expression to reflect that AMD_MEM_ENCRYPT depends on it, instead of the other way around. This will permit ARCH_USE_MEMREMAP_PROT to be selected by other architectures. Note that the encryption-related early memremap routines in arch/x86/mm/ioremap.c cannot be built for 32-bit x86 without triggering the following warning:
  arch/x86//mm/ioremap.c: In function 'early_memremap_encrypted':
  >> arch/x86/include/asm/pgtable_types.h:193:27: warning: conversion from 'long long unsigned int' to 'long unsigned int' changes value from '9223372036854776163' to '355' [-Woverflow]
   #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/x86//mm/ioremap.c:713:46: note: in expansion of macro '__PAGE_KERNEL_ENC'
    return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
which essentially means they are 64-bit only anyway. However, we cannot make them dependent on CONFIG_ARCH_HAS_MEM_ENCRYPT, since that is always defined, even for i386 (and changing that results in a slew of build errors). So instead, build those routines only if CONFIG_AMD_MEM_ENCRYPT is defined. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Alexander Graf <agraf@suse.de> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20190202094119.13230-9-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
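The numbers in the -Woverflow warning decompose neatly: 9223372036854776163 = 2^63 + 355, so the 64-bit value is the kernel page flags (355 = 0x163) plus a single bit at position 63, which the numbers imply is where _PAGE_ENC sits in that configuration. Converting to a 32-bit unsigned long keeps only the low 32 bits, dropping the encryption bit. A small sketch reproducing the arithmetic; `truncate_to_32()` is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned conversion is well defined: the value is reduced modulo
 * 2^32, i.e. everything above bit 31 (including bit 63) is lost. */
static uint32_t truncate_to_32(uint64_t v)
{
    return (uint32_t)v;
}
```

With `enc = (UINT64_C(1) << 63) | 0x163`, `enc` equals 9223372036854776163 and `truncate_to_32(enc)` yields 355, matching the values in the warning.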
2019-02-04 | x86/efi: Mark can_free_region() as an __init function | Sai Praneeth Prakhya
can_free_region() is called only once during boot, by efi_reserve_boot_services(). Hence, mark it as an __init function. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Alexander Graf <agraf@suse.de> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20190202094119.13230-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-03 | arch: Use asm-generic/socket.h when possible | Deepa Dinamani
Many architectures maintain an arch specific copy of the file even though there are no differences with the asm-generic one. Allow these architectures to use the generic one instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: chris@zankel.net Cc: fenghua.yu@intel.com Cc: tglx@linutronix.de Cc: schwidefsky@de.ibm.com Cc: linux-ia64@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-s390@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 fixes from Thomas Gleixner:
 "A few updates for x86:
  - Fix an unintended sign extension issue in the fault handling code
  - Rename the new resource control config switch so it's less confusing
  - Avoid setting up EFI info in kexec when the EFI runtime is disabled.
  - Fix the microcode version check in the AMD microcode loader so it only loads higher version numbers and never downgrades
  - Set EFER.LME in the 32bit trampoline before returning to long mode to handle older AMD/KVM behaviour properly.
  - Add Darren and Andy as x86/platform reviewers"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Avoid confusion over the new X86_RESCTRL config
  x86/kexec: Don't setup EFI info if EFI runtime is not enabled
  x86/microcode/amd: Don't falsely trick the late loading mechanism
  MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers
  x86/fault: Fix sign-extend unintended sign extension
  x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
  x86/cpu: Add Atom Tremont (Jacobsville)
2019-02-03Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu hotplug fixes from Thomas Gleixner: "Two fixes for the cpu hotplug machinery: - Replace the overly clever 'SMT disabled by BIOS' detection logic as it breaks KVM scenarios and prevents speculation control updates when the Hyperthreads are brought online late after boot. - Remove a redundant invocation of the speculation control update function" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM x86/speculation: Remove redundant arch_smt_update() invocation
2019-02-03x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()Tony Luck
Internal injection testing crashed with a console log that said: mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134 This caused a lot of head scratching because the MCACOD (bits 15:0) of that status is a signature from an L1 data cache error. But Linux says that it found it in "Bank 0", which on this model CPU only reports L1 instruction cache errors. The answer was that Linux doesn't initialize "m->bank" in the case that it finds a fatal error in the mce_no_way_out() pre-scan of banks. If this was a local machine check, then this partially initialized struct mce is being passed to mce_panic(). Fix is simple: just initialize m->bank in the case of a fatal error. Fixes: 40c36e2741d7 ("x86/mce: Fix incorrect "Machine check from unknown source" message") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Cc: stable@vger.kernel.org # v4.18 Note pre-v5.0 arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c Link: https://lkml.kernel.org/r/20190201003341.10638-1-tony.luck@intel.com
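The essence of the fix can be modeled in user-space C. This is a simplified, hypothetical sketch: `struct mce_sketch`, `severity_is_fatal()` and `no_way_out_sketch()` are invented for illustration and are not the kernel's `struct mce`, `mce_severity()` or `mce_no_way_out()`. The severity rule (treat the PCC bit, bit 57 of MCi_STATUS, as fatal) is a deliberate simplification. The point is only that the pre-scan records *which* bank held the fatal status before the record reaches the panic path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct mce. */
struct mce_sketch {
	uint64_t status;
	int bank;
};

/* Simplified severity rule: PCC (bit 57) set means fatal. */
static bool severity_is_fatal(uint64_t status)
{
	return (status >> 57) & 1;
}

/*
 * Pre-scan all banks. On a fatal error, fill in both the status and the
 * bank number, so a later panic message does not report a stale or
 * zero-initialized bank (the "Bank 0" confusion described above).
 */
static bool no_way_out_sketch(const uint64_t *bank_status, int nbanks,
			      struct mce_sketch *m)
{
	for (int i = 0; i < nbanks; i++) {
		if (severity_is_fatal(bank_status[i])) {
			m->status = bank_status[i];
			m->bank = i; /* the previously missing initialization */
			return true;
		}
	}
	return false;
}
```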
2019-02-03x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank typesYazen Ghannam
Some SMCA bank types on future systems will report new error types even though the bank type is not treated as a new version. These new error types will be reported by bits that were reserved in past systems. Add the new error descriptions to the lists in edac_mce_amd. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com
2019-02-03x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU unitsYazen Ghannam
The existing CS, PSP, and SMU SMCA bank types will see new versions (as indicated by their McaTypes) in future SMCA systems. Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the same names as the older versions, since they are logically the same to the user. SMCA systems won't mix and match IP blocks with different McaType versions in the same system, so there isn't a need to distinguish them. The MCA_IPID register is saved when logging an MCA error, and that can be used to triage the error. Also, add the new error descriptions to edac_mce_amd. Some error types (positions in the list) are overloaded compared to the previous McaTypes. Therefore, just create new lists of the error descriptions to keep things simple even if some of the error descriptions are the same between versions. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
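A minimal sketch of the lookup idea, assuming a table keyed by the packed (HWID, MCATYPE) pair in which both the old and the new McaType versions map to the same logical bank type. The macro `SKETCH_HWID_MCATYPE`, the enum, and all HWID/McaType numbers below are invented for illustration and are not the kernel's actual table values.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a (HWID, MCATYPE) pair into one 32-bit lookup key. */
#define SKETCH_HWID_MCATYPE(hwid, mcatype) \
	(((uint32_t)(hwid) << 16) | (uint32_t)(mcatype))

enum bank_type_sketch { BANK_CS, BANK_PSP, BANK_SMU, BANK_UNKNOWN };

struct hwid_map_sketch {
	uint32_t key;
	enum bank_type_sketch type;
};

/* Invented values: version 0x0 and version 0x1 share one bank type. */
static const struct hwid_map_sketch hwid_map[] = {
	{ SKETCH_HWID_MCATYPE(0x10, 0x0), BANK_CS },
	{ SKETCH_HWID_MCATYPE(0x10, 0x1), BANK_CS },
	{ SKETCH_HWID_MCATYPE(0x20, 0x0), BANK_PSP },
	{ SKETCH_HWID_MCATYPE(0x20, 0x1), BANK_PSP },
	{ SKETCH_HWID_MCATYPE(0x30, 0x0), BANK_SMU },
	{ SKETCH_HWID_MCATYPE(0x30, 0x1), BANK_SMU },
};

/* Linear search; unknown pairs fall back to BANK_UNKNOWN. */
static enum bank_type_sketch lookup_bank_type(uint16_t hwid, uint16_t mcatype)
{
	uint32_t key = SKETCH_HWID_MCATYPE(hwid, mcatype);

	for (size_t i = 0; i < sizeof(hwid_map) / sizeof(hwid_map[0]); i++)
		if (hwid_map[i].key == key)
			return hwid_map[i].type;
	return BANK_UNKNOWN;
}
```

Since the user only sees the bank name, the version distinction survives in the saved MCA_IPID value, which is why reusing the name is harmless.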
2019-02-03x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank typesYazen Ghannam
Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and PCIE SMCA bank types. Also, add their respective error descriptions to the MCE decoding module edac_mce_amd. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com
2019-02-02x86/resctrl: Avoid confusion over the new X86_RESCTRL configJohannes Weiner
"Resource Control" is a very broad term for this CPU feature, and a term that is also associated with containers, cgroups etc. This can easily cause confusion. Make the user prompt more specific. Match the config symbol name. [ bp: In the future, the corresponding ARM arch-specific code will be under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out under the CPU_RESCTRL umbrella symbol. ] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Babu Moger <Babu.Moger@amd.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: linux-doc@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
2019-02-01x86_64: increase stack size for KASAN_EXTRAQian Cai
If the kernel is configured with KASAN_EXTRA, stack usage increases significantly because this option sets "-fstack-reuse" to "none" in GCC [1]. As a result, stack overruns occur quite often with a 32k stack size when compiling with GCC 8. For example, this reproducer https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c triggers a "corrupted stack end detected inside scheduler" very reliably with CONFIG_SCHED_STACK_END_CHECK enabled. There are just too many functions that can have a large stack frame with KASAN_EXTRA due to large local variables, and these functions are called over and over again without being able to reuse the stack. Some noticeable ones (frame size in bytes, function) are: 7648 shrink_page_list, 3584 xfs_rmap_convert, 3312 migrate_page_move_mapping, 3312 dev_ethtool, 3200 migrate_misplaced_transhuge_page, 3168 copy_process. There are 49 other functions over 2k in size when compiling the kernel with "-Wframe-larger-than=", even with a related minimal config on this machine. Hence, it is too much work to change Makefiles for each object to compile without "-fsanitize-address-use-after-scope" individually. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23 Although there is a patch in GCC 9 to help the situation, GCC 9 probably won't be released for a few months, and then it will probably take another 6 months to 1 year for all major distros to include it as a default. Hence, the stack usage with KASAN_EXTRA can be revisited in 2020 when GCC 9 is everywhere. Until then, this patch will help users avoid stack overruns. This has already been fixed for arm64 for the same reason via 6e8830674ea ("arm64: kasan: Increase stack size for KASAN_EXTRA"). Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
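A rough sketch of how such a change can be expressed, assuming 4 KiB pages, a base stack order of 2 (16K stacks) on x86-64, and a KASAN-dependent order added on top. The `CONFIG_*` macros are set by hand to mimic Kconfig, and the exact macro layout in the kernel headers may differ from this sketch.

```c
/* Assumed page size: 4 KiB. */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Hand-set stand-ins for Kconfig symbols. */
#define CONFIG_KASAN 1
#define CONFIG_KASAN_EXTRA 1

#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_EXTRA)
#define KASAN_STACK_ORDER 2 /* 64K stacks: no stack slot reuse */
#elif defined(CONFIG_KASAN)
#define KASAN_STACK_ORDER 1 /* 32K stacks with plain KASAN */
#else
#define KASAN_STACK_ORDER 0 /* 16K stacks otherwise */
#endif

/* Each extra order doubles the per-thread stack. */
#define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)
#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
```

Doubling via the order keeps the stack a power-of-two number of pages, which matches how thread stacks are allocated.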
2019-02-01x86/kexec: Don't setup EFI info if EFI runtime is not enabledKairui Song
Kexec-ing a kernel with "efi=noruntime" on the first kernel's command line causes the following null pointer dereference: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 #PF error: [normal kernel read fault] Call Trace: efi_runtime_map_copy+0x28/0x30 bzImage64_load+0x688/0x872 arch_kexec_kernel_image_load+0x6d/0x70 kimage_file_alloc_init+0x13e/0x220 __x64_sys_kexec_file_load+0x144/0x290 do_syscall_64+0x55/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Just skip the EFI info setup if EFI runtime services are not enabled. [ bp: Massage commit message. ] Suggested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: bhe@redhat.com Cc: David Howells <dhowells@redhat.com> Cc: erik.schmauss@intel.com Cc: fanc.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kexec@lists.infradead.org Cc: lenb@kernel.org Cc: linux-acpi@vger.kernel.org Cc: Philipp Rudo <prudo@linux.vnet.ibm.com> Cc: rafael.j.wysocki@intel.com Cc: robert.moore@intel.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yannik Sembritzki <yannik@sembritzki.me> Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com
2019-02-01x86: explicitly align IO accesses in memcpy_{to,from}ioLinus Torvalds
In commit 170d13ca3a2f ("x86: re-introduce non-generic memcpy_{to,from}io") I made our copy from IO space use a separate copy routine rather than rely on the generic memcpy. I did that because our generic memory copy isn't actually well-defined when it comes to internal access ordering or alignment, and will in fact depend on various CPUID flags. In particular, the default memcpy() for a modern Intel CPU will generally be just a "rep movsb", which works reasonably well for medium-sized memory copies of regular RAM, since the CPU will turn it into fairly optimized microcode. However, for non-cached memory and IO, "rep movs" ends up being horrendously slow and will just do the architectural "one byte at a time" accesses implied by the movsb. At the other end of the spectrum, if you _don't_ end up using the "rep movsb" code, you'd likely fall back to the software copy, which does overlapping accesses for the tail, and may copy things backwards. Again, for regular memory that's fine; for IO memory, not so much. The thinking was that clearly nobody really cared (because things worked), but some people had seen horrible performance due to the byte accesses, so let's just revert back to our long ago version that did "rep movsl" for the bulk of the copy, and then fixed up the potentially last few bytes of the tail with "movsw/b". Interestingly (and perhaps not entirely surprisingly), while that was our original memory copy implementation, and had been used before for IO, in the meantime many new users of memcpy_*io() had come about. And while the access patterns for the memory copy weren't well-defined (so arguably _any_ access pattern should work), in practice the "rep movsb" case had been very common for the last several years. In particular, Jarkko Sakkinen reported that the memcpy_*io() change resulted in weird errors from his Geminilake NUC TPM module.
And it turns out that the TPM TCG accesses according to spec require that the accesses be (a) done strictly sequentially and (b) naturally aligned, otherwise the TPM chip will abort the PCI transaction. And, in fact, the tpm_crb.c driver did this: memcpy_fromio(buf, priv->rsp, 6); ... memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6); which really should never have worked in the first place, but back before commit 170d13ca3a2f it *happened* to work, because the memcpy_fromio() would be expanded to a regular memcpy, and (a) gcc would expand the first memcpy in-line, and turn it into a 4-byte and a 2-byte read, and they happened to be in the right order, and the alignment was right. (b) gcc would call "memcpy()" for the second one, and the machines that had this TPM chip also apparently ended up always having ERMS ("Enhanced REP MOVSB/STOSB instructions"), so we'd use the "rep movsb" for that copy. In other words, basically by pure luck, the code happened to use the right access sizes in the (two different!) memcpy() implementations to make it all work. But after commit 170d13ca3a2f, both of the memcpy_fromio() calls resulted in a call to the routine with the consistent memory accesses, and in both cases it started out transferring with 4-byte accesses. Which worked for the first copy, but resulted in the second copy doing a 32-bit read at an address that was only 2-byte aligned. Jarkko is actually fixing the fragile code in the TPM driver, but since this is an excellent example of why we absolutely must not use a generic memcpy for IO accesses, _and_ an IO-specific one really should strive to align the IO accesses, let's do exactly that. Side note: Jarkko also noted that the driver had been used on ARM platforms, and had worked. That was because on 32-bit ARM, memcpy_*io() ends up always doing byte accesses, and on 64-bit ARM it first does byte accesses to align to 8-byte boundaries, and then does 8-byte accesses for the bulk.
So ARM actually worked by design, and the x86 case worked by pure luck. We *might* want to make x86-64 do the 8-byte case too. That should be a pretty straightforward extension, but let's do one thing at a time. And generally MMIO accesses aren't really all that performance-critical, as shown by the fact that for a long time we just did them a byte at a time, and very few people ever noticed. Reported-and-tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jerry Snitselaar <jsnitsel@redhat.com> Cc: David Laight <David.Laight@aculab.com> Fixes: 170d13ca3a2f ("x86: re-introduce non-generic memcpy_{to,from}io") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
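The alignment strategy described above can be modeled on ordinary memory, as a sketch: align the source with at most one 1-byte and one 2-byte read, copy the bulk with naturally aligned 4-byte reads, then finish the tail with 2- and 1-byte reads. Here `memcpy()` stands in for a single IO read of a given width, and `sketch_memcpy_fromio()`, `io_read()` and `struct access_log` are hypothetical names; this is not the kernel's x86 helper itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Records the width of every simulated IO read, for inspection. */
struct access_log {
	size_t widths[64];
	size_t count;
};

/* One simulated read of `width` bytes; memcpy() stands in for readb/w/l. */
static void io_read(unsigned char **to, const unsigned char **from,
		    size_t width, struct access_log *log)
{
	memcpy(*to, *from, width);
	*to += width;
	*from += width;
	if (log->count < sizeof(log->widths) / sizeof(log->widths[0]))
		log->widths[log->count] = width;
	log->count++;
}

static void sketch_memcpy_fromio(void *dst, const void *src, size_t n,
				 struct access_log *log)
{
	unsigned char *to = dst;
	const unsigned char *from = src;

	/* Align the source to 2 bytes, then to 4 bytes. */
	if (n && ((uintptr_t)from & 1)) {
		io_read(&to, &from, 1, log);
		n -= 1;
	}
	if (n > 1 && ((uintptr_t)from & 2)) {
		io_read(&to, &from, 2, log);
		n -= 2;
	}
	/* Bulk: every 4-byte read is now naturally aligned. */
	while (n >= 4) {
		io_read(&to, &from, 4, log);
		n -= 4;
	}
	/* Tail. */
	if (n >= 2) {
		io_read(&to, &from, 2, log);
		n -= 2;
	}
	if (n)
		io_read(&to, &from, 1, log);
}
```

With the tpm_crb pattern, a copy starting at offset 6 of a 4-byte-aligned buffer now begins with a 2-byte read instead of a misaligned 4-byte one.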
2019-02-01x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory onlyChao Fan
KASLR may randomly choose a range which is located in movable memory regions. As a result, this will break memory hotplug and make the movable memory chosen by KASLR immovable. Therefore, limit KASLR to choose memory regions in the immovable range after consulting the SRAT table. [ bp: - Rewrite commit message. - Trim comments. ] Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: caoj.fnst@cn.fujitsu.com Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190123110850.12433-8-fanc.fnst@cn.fujitsu.com
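A sketch of the core operation, assuming the candidate ranges KASLR considers are clipped against each immovable region reported by SRAT. The struct has the same start/size shape as the boot code's `struct mem_vector`; the function `clip_to_immovable()` is a hypothetical helper, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* A memory range: start address and size in bytes. */
struct mem_vector {
	uint64_t start;
	uint64_t size;
};

/*
 * Intersect a candidate region with one immovable region.
 * Returns false when they do not overlap; otherwise stores the
 * overlapping part in *out.
 */
static bool clip_to_immovable(const struct mem_vector *cand,
			      const struct mem_vector *immov,
			      struct mem_vector *out)
{
	uint64_t start = cand->start > immov->start ? cand->start : immov->start;
	uint64_t cand_end = cand->start + cand->size;
	uint64_t immov_end = immov->start + immov->size;
	uint64_t end = cand_end < immov_end ? cand_end : immov_end;

	if (start >= end)
		return false;
	out->start = start;
	out->size = end - start;
	return true;
}
```

Only the clipped ranges would then be offered as slots for the random offset, so a movable region can never be chosen.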
2019-02-01x86/boot: Parse SRAT table and count immovable memory regionsChao Fan
Parse SRAT for the immovable memory regions and use that information to control which offset KASLR selects so that it doesn't overlap with any movable region. [ bp: - Move struct mem_vector where it is visible so that it builds. - Correct comments. - Rewrite commit message. ] Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: <caoj.fnst@cn.fujitsu.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <indou.takao@jp.fujitsu.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: <kasong@redhat.com> Cc: <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <msys.mizuma@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190123110850.12433-7-fanc.fnst@cn.fujitsu.com
2019-02-01x86/boot: Early parse RSDP and save it in boot_paramsChao Fan
The RSDP is needed by KASLR so parse it early and save it in boot_params.acpi_rsdp_addr, before KASLR setup runs. RSDP is needed by other kernel facilities so have the parsing code built-in instead of a long "depends on" line in Kconfig. [ bp: - Trim commit message and comments - Add CONFIG_ACPI dependency in the Makefile - Move ->acpi_rsdp_addr assignment with the rest of boot_params massaging in extract_kernel(). ] Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhe@redhat.com Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190123110850.12433-6-fanc.fnst@cn.fujitsu.com
2019-02-01x86/boot: Search for RSDP in memoryChao Fan
Scan memory (EBDA) for the RSDP and verify RSDP by signature and checksum. [ bp: - Trim commit message. - Simplify bios_get_rsdp_addr() and cleanup mad casting. ] Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhe@redhat.com Cc: caoj.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190123110850.12433-5-fanc.fnst@cn.fujitsu.com
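A minimal sketch of the scan, assuming a memory window already mapped into a buffer. It checks the 8-byte "RSD PTR " signature on 16-byte boundaries and validates the ACPI 1.0 checksum over the first 20 bytes (all bytes summing to zero modulo 256). Real code would also validate the extended checksum for revision 2+ RSDPs and scan the specific BIOS areas; `scan_for_rsdp()` and `checksum8()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG "RSD PTR "  /* 8-byte ACPI RSDP signature */
#define RSDP_CHECKSUM_LEN 20 /* bytes covered by the ACPI 1.0 checksum */

/* Valid checksummed data sums to zero modulo 256. */
static uint8_t checksum8(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

/*
 * Scan a window on 16-byte boundaries. Returns the offset of the first
 * candidate with a matching signature and valid checksum, or -1.
 */
static long scan_for_rsdp(const uint8_t *base, size_t len)
{
	for (size_t off = 0; off + RSDP_CHECKSUM_LEN <= len; off += 16) {
		const uint8_t *p = base + off;

		if (memcmp(p, RSDP_SIG, 8) != 0)
			continue;
		if (checksum8(p, RSDP_CHECKSUM_LEN) == 0)
			return (long)off;
	}
	return -1;
}
```

Checking the checksum as well as the signature guards against stray "RSD PTR " strings that happen to sit on a 16-byte boundary.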
2019-02-01x86/boot: Search for RSDP in the EFI tablesChao Fan
The immovable memory ranges information in the SRAT table is necessary to fix the issue of KASLR not paying attention to movable memory regions when selecting the offset. Therefore, SRAT needs to be parsed. Depending on the boot: KEXEC/EFI/BIOS, the methods to compute RSDP are different. When booting from EFI, the EFI table points to the RSDP. So iterate over the EFI system tables in order to find the RSDP. [ bp: - Heavily massage commit message - Trim comments - Move the CONFIG_ACPI ifdeffery into the Makefile. ] Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhe@redhat.com Cc: caoj.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190123110850.12433-4-fanc.fnst@cn.fujitsu.com
2019-02-01x86/boot: Add "acpi_rsdp=" early parsingChao Fan
KASLR may randomly choose offsets which are located in movable memory regions, resulting in the movable memory becoming immovable. The ACPI SRAT (System/Static Resource Affinity Table) describes memory ranges, including ranges of memory provided by hot-added memory devices. In order to access SRAT, one needs the Root System Description Pointer (RSDP) with which to find the Root/Extended System Description Table (R/XSDT), which then contains the system description tables, SRAT being one of them. If the RSDP address has been passed on the command line (when kexec-ing a second kernel), parse it from there. [ bp: Rewrite the commit message and cleanup the code. ] Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhe@redhat.com Cc: caoj.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190123110850.12433-3-fanc.fnst@cn.fujitsu.com
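A self-contained sketch of parsing such an option, assuming the value may be decimal or 0x-prefixed hex and must appear as a whole word. It uses libc `strstr()`/`strtoull()` to stay runnable in user space; the boot code has its own command-line helpers and a `kstrtoull()` copy instead. `parse_acpi_rsdp()` is a hypothetical name, and the address in the usage example is made up.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "acpi_rsdp=<value>" as a whole word in a kernel command line
 * and parse the value (base auto-detected). Returns 0 when the option
 * is absent or malformed.
 */
static uint64_t parse_acpi_rsdp(const char *cmdline)
{
	static const char opt[] = "acpi_rsdp=";
	const size_t optlen = sizeof(opt) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, opt)) != NULL) {
		/* Reject matches inside another word, e.g. "noacpi_rsdp=". */
		if (p == cmdline || p[-1] == ' ') {
			char *end;
			unsigned long long val = strtoull(p + optlen, &end, 0);

			/* Require at least one digit and a clean terminator. */
			if (end != p + optlen && (*end == '\0' || *end == ' '))
				return (uint64_t)val;
			return 0;
		}
		p += optlen;
	}
	return 0;
}
```

For example, `parse_acpi_rsdp("root=/dev/sda1 acpi_rsdp=0x7ffe0014 quiet")` yields `0x7ffe0014`.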
2019-02-01x86/boot: Copy kstrtoull() to boot/string.cChao Fan
Copy kstrtoull() and the other necessary functions from lib/kstrtox.c to boot/string.c so that code in boot/ can use kstrtoull() and the old simple_strtoull() can gradually be phased out. Using div_u64() from math64.h directly will cause the dividend to be handled as a 64-bit value and cause the infamous __divdi3 linker error due to gcc trying to use its library function for the 64-bit division. Therefore, separate the dividend into an upper and lower part. [ bp: Rewrite commit message. ] Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhe@redhat.com Cc: caoj.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190123110850.12433-2-fanc.fnst@cn.fujitsu.com
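The "split the dividend" idea can be sketched portably: divide the upper 32 bits with an ordinary 32-bit division, then fold in the lower 32 bits starting from that remainder. Because the remainder is smaller than the divisor, the lower quotient fits in 32 bits. This sketch does the lower step with a shift-and-subtract loop rather than a hardware 64-by-32 divide, so it avoids any 64-by-64 division (the operation that makes 32-bit GCC emit a libgcc call). `div_u64_rem_split()` is a hypothetical name, not the boot code's helper.

```c
#include <stdint.h>

static uint64_t div_u64_rem_split(uint64_t dividend, uint32_t divisor,
				  uint32_t *remainder)
{
	uint32_t upper = (uint32_t)(dividend >> 32);
	uint32_t lower = (uint32_t)dividend;
	uint32_t q_hi = upper / divisor; /* 32-bit division only */
	uint64_t r = upper % divisor;    /* invariant: r < divisor */
	uint32_t q_lo = 0;

	/* Long division over the 32 low bits, one bit at a time. */
	for (int i = 31; i >= 0; i--) {
		r = (r << 1) | ((lower >> i) & 1);
		if (r >= divisor) {
			r -= divisor;
			q_lo |= (uint32_t)1 << i;
		}
	}
	*remainder = (uint32_t)r;
	return ((uint64_t)q_hi << 32) | q_lo;
}
```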
2019-02-01x86/boot: Build the command line parsing code unconditionallyBorislav Petkov
Just drop the three-item ifdeffery and build it in unconditionally. Early cmdline parsing is needed more often than not. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhe@redhat.com Cc: hpa@zytor.com Cc: indou.takao@jp.fujitsu.com Cc: kasong@redhat.com Cc: keescook@chromium.org Cc: mingo@redhat.com Cc: msys.mizuma@gmail.com Cc: tglx@linutronix.de Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190130112238.GB18383@zn.tnic
2019-01-31x86/microcode/amd: Don't falsely trick the late loading mechanismThomas Lendacky
The load_microcode_amd() function searches for microcode patches and attempts to apply a microcode patch if it is of a different level than the currently installed level. While the processor won't actually load a level that is less than what is already installed, the logic wrongly returns UCODE_NEW, thus signaling to its caller reload_store() that a late loading should be attempted. If the file-system contains an older microcode revision than what is currently running, such a late microcode reload can result in these misleading messages: x86/CPU: CPU features have changed after loading microcode, but might not take effect. x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update. These messages were issued on a system where SME/SEV are not enabled by the BIOS (MSR C001_0010[23] = 0b) because during boot, early_detect_mem_encrypt() is called and clears the SME and SEV features in this case. However, after the wrong late load attempt, get_cpu_cap() is called and reloads the SME and SEV feature bits, resulting in the messages. Update the microcode level check to not attempt microcode loading if the current level is greater than(!), and not only equal to, the patch level. [ bp: massage commit message. ] Fixes: 2613f36ed965 ("x86/microcode: Attempt late loading only when new microcode is present") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/154894518427.9406.8246222496874202773.stgit@tlendack-t1.amdoffice.net
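A small sketch of the corrected decision, assuming a list of candidate patch revisions: only a patch strictly newer than the running revision can trigger a late load, while equal and older ones are skipped. The function name and the revision numbers in the usage are hypothetical; this is not the loader's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the highest patch revision that is strictly newer
 * than current_rev, or -1 when no late load should be attempted.
 * (The buggy logic effectively tested "different from" instead of
 * "greater than", so an older patch still looked like new microcode.)
 */
static int pick_late_load_patch(uint32_t current_rev,
				const uint32_t *patch_revs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (patch_revs[i] <= current_rev)
			continue; /* never downgrade or reload the same level */
		if (best < 0 || patch_revs[i] > patch_revs[best])
			best = (int)i;
	}
	return best;
}
```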
2019-01-30cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVMJosh Poimboeuf
With the following commit: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") ... the hotplug code attempted to detect when SMT was disabled by BIOS, in which case it reported SMT as permanently disabled. However, that code broke a virt hotplug scenario, where the guest is booted with only primary CPU threads, and a sibling is brought online later. The problem is that there doesn't seem to be a way to reliably distinguish between the HW "SMT disabled by BIOS" case and the virt "sibling not yet brought online" case. So the above-mentioned commit was a bit misguided, as it permanently disabled SMT for both cases, preventing future virt sibling hotplugs. Going back and reviewing the original problems that commit attempted to solve, when SMT was disabled in BIOS: 1) /sys/devices/system/cpu/smt/control showed "on" instead of "notsupported"; and 2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning. I'd propose that we instead consider #1 above to not actually be a problem. Because, at least in the virt case, it's possible that SMT wasn't disabled by BIOS and a sibling thread could be brought online later. So it makes sense to just always default the smt control to "on" to allow for that possibility (assuming cpuid indicates that the CPU supports SMT). The real problem is #2, which has a simple fix: change vmx_vm_init() to query the actual current SMT state -- i.e., whether any siblings are currently online -- instead of looking at the SMT "control" sysfs value. So fix it by: a) reverting the original "fix" and its followup fix: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation") and b) changing vmx_vm_init() to query the actual current SMT state -- instead of the sysfs control value -- to determine whether the L1TF warning is needed. This also requires the 'sched_smt_present' variable to be exported, instead of 'cpu_smt_control'.
Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joe Mario <jmario@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
2019-01-30x86/asm/suspend: Drop ENTRY from local dataJiri Slaby
ENTRY is intended for functions and shall be paired with ENDPROC. ENTRY also aligns symbols which creates unnecessary holes between data. So drop ENTRY from saved_eip in wakeup_32 and many saved_* in wakeup_64, as these symbols are local only. One could've used SYM_DATA_LOCAL for these symbols, but it was discouraged earlier: https://lkml.kernel.org/r/20170427124310.GC23352@amd Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: linux-arch@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190130124711.12463-3-jslaby@suse.cz
2019-01-30x86/hw_breakpoints, kprobes: Remove kprobes ifdefferyBorislav Petkov
Remove the ifdeffery in the breakpoint parsing arch_build_bp_info() by adding a within_kprobe_blacklist() stub for the !CONFIG_KPROBES case. It returns true when kprobes are not enabled, meaning that any address is within the kprobes blacklist on such kernels, which disallows kernel breakpoints on non-kprobes kernels. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190127131237.4557-1-bp@alien8.de
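The stub pattern can be sketched as below, assuming a build without kprobes (leave `CONFIG_KPROBES` undefined). With the stub always returning true, a caller can perform the blacklist check unconditionally, with no `#ifdef` of its own. `check_bp_addr()` is a hypothetical, simplified caller standing in for the relevant check in arch_build_bp_info().

```c
#include <errno.h>
#include <stdbool.h>

/* Leave undefined to mimic a kernel built without kprobes. */
/* #define CONFIG_KPROBES 1 */

#ifdef CONFIG_KPROBES
bool within_kprobe_blacklist(unsigned long addr);
#else
/* Every address counts as blacklisted when kprobes are compiled out. */
static inline bool within_kprobe_blacklist(unsigned long addr)
{
	(void)addr;
	return true;
}
#endif

/* Hypothetical caller: no ifdeffery needed at the call site. */
static int check_bp_addr(unsigned long addr)
{
	if (within_kprobe_blacklist(addr))
		return -EINVAL;
	return 0;
}
```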