summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2019-03-17Merge tag 'kbuild-v5.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - add more Build-Depends to Debian source package - prefix header search paths with $(srctree)/ - make modpost show verbose section mismatch warnings - avoid hard-coded CROSS_COMPILE for h8300 - fix regression for Debian make-kpkg command - add semantic patch to detect missing put_device() - fix some warnings of 'make deb-pkg' - optimize NOSTDINC_FLAGS evaluation - add warnings about redundant generic-y - clean up Makefiles and scripts * tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: remove stale lxdialog/.gitignore kbuild: force all architectures except um to include mandatory-y kbuild: warn redundant generic-y Revert "modsign: Abort modules_install when signing fails" kbuild: Make NOSTDINC_FLAGS a simply expanded variable kbuild: deb-pkg: avoid implicit effects coccinelle: semantic code search for missing put_device() kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb kbuild: deb-pkg: add CONFIG_ prefix to kernel config options kbuild: add workaround for Debian make-kpkg kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG} unicore32: simplify linker script generation for decompressor h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- kbuild: move archive command to scripts/Makefile.lib modpost: always show verbose warning for section mismatch ia64: prefix header search path with $(srctree)/ libfdt: prefix header search paths with $(srctree)/ deb-pkg: generate correct build dependencies
2019-03-17Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Thomas Gleixner: "Two cleanup patches removing dead conditionals and unused code" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove unused __constant_c_x_memset() macro and inlines x86/asm: Remove dead __GNUC__ conditionals
2019-03-17Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Three fixes for the fallout from the TSX errata workaround: - Prevent memory corruption caused by a unchecked out of bound array index. - Two trivial fixes to address compiler warnings" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Make dev_attr_allow_tsx_force_abort static perf/x86: Fixup typo in stub functions perf/x86/intel: Fix memory corruption
2019-03-17perf/x86/intel: Make dev_attr_allow_tsx_force_abort statickbuild test robot
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort") Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: kbuild-all@01.org Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190313184243.GA10820@lkp-sb-ep06
2019-03-17kbuild: force all architectures except um to include mandatory-yMasahiro Yamada
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes the common Kbuild.asm file. Factor out the duplicated include directives to scripts/Makefile.asm-generic so that no architecture would opt out of the mandatory-y mechanism. um is not forced to include mandatory-y since it is a very exceptional case which does not support UAPI. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-03-17kbuild: warn redundant generic-yMasahiro Yamada
The generic-y is redundant under the following condition: - arch has its own implementation - the same header is added to generated-y - the same header is added to mandatory-y If a redundant generic-y is found, the warning like follows is displayed: scripts/Makefile.asm-generic:20: redundant generic-y found in arch/arm/include/asm/Kbuild: timex.h I fixed up arch Kbuild files found by this. Suggested-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-03-16Merge tag 'pidfd-v5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd system call from Christian Brauner: "This introduces the ability to use file descriptors from /proc/<pid>/ as stable handles on struct pid. Even if a pid is recycled the handle will not change. For a start these fds can be used to send signals to the processes they refer to. With the ability to use /proc/<pid> fds as stable handles on struct pid we can fix a long-standing issue where after a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. With this patchset we enable a variety of use cases. One obvious example is that we can now safely delegate an important part of process management - sending signals - to processes other than the parent of a given process by sending file descriptors around via scm rights and not fearing that the given process will have been recycled in the meantime. It also allows for easy testing whether a given process is still alive or not by sending signal 0 to a pidfd which is quite handy. There has been some interest in this feature e.g. from systems management (systemd, glibc) and container managers. I have requested and gotten comments from glibc to make sure that this syscall is suitable for their needs as well. In the future I expect it to take on most other pid-based signal syscalls. But such features are left for the future once they are needed. This has been sitting in linux-next for quite a while and has not caused any issues. It comes with selftests which verify basic functionality and also test that a recycled pid cannot be signaled via a pidfd. Jon has written about a prior version of this patchset. It should cover the basic functionality since not a lot has changed since then: https://lwn.net/Articles/773459/ The commit message for the syscall itself is extensively documenting the syscall, including it's functionality and extensibility" * tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests: add tests for pidfd_send_signal() signal: add pidfd_send_signal() syscall
2019-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - some cleanups - direct physical timer assignment - cache sanitization for 32-bit guests s390: - interrupt cleanup - introduction of the Guest Information Block - preparation for processor subfunctions in cpu models PPC: - bug fixes and improvements, especially related to machine checks and protection keys x86: - many, many cleanups, including removing a bunch of MMU code for unnecessary optimizations - AVIC fixes Generic: - memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits) kvm: vmx: fix formatting of a comment KVM: doc: Document the life cycle of a VM and its resources MAINTAINERS: Add KVM selftests to existing KVM entry Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() KVM: PPC: Fix compilation when KVM is not enabled KVM: Minor cleanups for kvm_main.c KVM: s390: add debug logging for cpu model subfunctions KVM: s390: implement subfunction processor calls arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 KVM: arm/arm64: Remove unused timer variable KVM: PPC: Book3S: Improve KVM reference counting KVM: PPC: Book3S HV: Fix build failure without IOMMU support Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()" x86: kvmguest: use TSC clocksource if invariant TSC is exposed KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes() KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children ...
2019-03-15kvm: vmx: fix formatting of a commentPaolo Bonzini
Eliminate a gratuitous conflict with 5.0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-15Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"Ben Gardon
This reverts commit 71883a62fcd6c70639fa12cda733378b4d997409. The above commit contains an optimization to kvm_zap_gfn_range which uses gfn-limited TLB flushes, if enabled. If using these limited flushes, kvm_zap_gfn_range passes lock_flush_tlb=false to slot_handle_level_range which creates a race when the function unlocks to call cond_resched. See an example of this race below: CPU 0 CPU 1 CPU 3 // zap_direct_gfn_range mmu_lock() // *ptep == pte_1 *ptep = 0 if (lock_flush_tlb) flush_tlbs() mmu_unlock() // In invalidate range // MMU notifier mmu_lock() if (pte != 0) *ptep = 0 flush = true if (flush) flush_remote_tlbs() mmu_unlock() return // Host MM reallocates // page previously // backing guest memory. // Guest accesses // invalid page // through pte_1 // in its TLB!! Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and without this patch. The patch introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-15perf/x86: Fixup typo in stub functionsPeter Zijlstra
Guenter reported a build warning for CONFIG_CPU_SUP_INTEL=n: > With allmodconfig-CONFIG_CPU_SUP_INTEL, this patch results in: > > In file included from arch/x86/events/amd/core.c:8:0: > arch/x86/events/amd/../perf_event.h:1036:45: warning: ‘struct cpu_hw_event’ declared inside parameter list will not be visible outside of this definition or declaration > static inline int intel_cpuc_prepare(struct cpu_hw_event *cpuc, int cpu) While harmless (an unsed pointer is an unused pointer, no matter the type) it needs fixing. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: d01b1f96a82e ("perf/x86/intel: Make cpuc allocations consistent") Link: http://lkml.kernel.org/r/20190315081410.GR5996@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-03-15perf/x86/intel: Fix memory corruptionPeter Zijlstra
Through: validate_event() x86_pmu.get_event_constraints(.idx=-1) tfa_get_event_constraints() dyn_constraint() cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access. In this case, simply skip the TFA constraint code, there is no event constraint with just PMC3, therefore the code will never result in the empty set. Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort") Reported-by: Tony Jones <tonyj@suse.com> Reported-by: "DSouza, Nelson" <nelson.dsouza@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Jones <tonyj@suse.com> Tested-by: "DSouza, Nelson" <nelson.dsouza@intel.com> Cc: eranian@google.com Cc: jolsa@redhat.com Cc: stable@kernel.org Link: https://lkml.kernel.org/r/20190314130705.441549378@infradead.org
2019-03-12Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount infrastructure updates from Al Viro: "The rest of core infrastructure; no new syscalls in that pile, but the old parts are switched to new infrastructure. At that point conversions of individual filesystems can happen independently; some are done here (afs, cgroup, procfs, etc.), there's also a large series outside of that pile dealing with NFS (quite a bit of option-parsing stuff is getting used there - it's one of the most convoluted filesystems in terms of mount-related logics), but NFS bits are the next cycle fodder. It got seriously simplified since the last cycle; documentation is probably the weakest bit at the moment - I considered dropping the commit introducing Documentation/filesystems/mount_api.txt (cutting the size increase by quarter ;-), but decided that it would be better to fix it up after -rc1 instead. That pile allows to do followup work in independent branches, which should make life much easier for the next cycle. fs/super.c size increase is unpleasant; there's a followup series that allows to shrink it considerably, but I decided to leave that until the next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits) afs: Use fs_context to pass parameters over automount afs: Add fs_context support vfs: Add some logging to the core users of the fs_context log vfs: Implement logging through fs_context vfs: Provide documentation for new mount API vfs: Remove kern_mount_data() hugetlbfs: Convert to fs_context cpuset: Use fs_context kernfs, sysfs, cgroup, intel_rdt: Support fs_context cgroup: store a reference to cgroup_ns into cgroup_fs_context cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper cgroup_do_mount(): massage calling conventions cgroup: stash cgroup_root reference into cgroup_fs_context cgroup2: switch to option-by-option parsing cgroup1: switch to option-by-option parsing cgroup: take options parsing into ->parse_monolithic() cgroup: fold cgroup1_mount() into cgroup1_get_tree() cgroup: start switching to fs_context ipc: Convert mqueue fs to fs_context proc: Add fs_context support to procfs ...
2019-03-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things - the rest of MM - remove flex_arrays, replace with new simple radix-tree implementation * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits) Drop flex_arrays sctp: convert to genradix proc: commit to genradix generic radix trees selinux: convert to kvmalloc md: convert to kvmalloc openvswitch: convert to kvmalloc of: fix kmemleak crash caused by imbalance in early memory reservation mm: memblock: update comments and kernel-doc memblock: split checks whether a region should be skipped to a helper function memblock: remove memblock_{set,clear}_region_flags memblock: drop memblock_alloc_*_nopanic() variants memblock: memblock_alloc_try_nid: don't panic treewide: add checks for the return value of memblock_alloc*() swiotlb: add checks for the return value of memblock_alloc*() init/main: add checks for the return value of memblock_alloc*() mm/percpu: add checks for the return value of memblock_alloc*() sparc: add checks for the return value of memblock_alloc*() ia64: add checks for the return value of memblock_alloc*() arch: don't memset(0) memory returned by memblock_alloc() ...
2019-03-12memblock: drop memblock_alloc_*_nopanic() variantsMike Rapoport
As all the memblock allocation functions return NULL in case of error rather than panic(), the duplicates with _nopanic suffix can be removed. Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [printk] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12treewide: add checks for the return value of memblock_alloc*()Mike Rapoport
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [akpm@linux-foundation.org: fix xtensa printk warning] Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky] Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: Juergen Gross <jgross@suse.com> [Xen] Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: drop __memblock_alloc_base()Mike Rapoport
The __memblock_alloc_base() function tries to allocate a memory up to the limit specified by its max_addr parameter. Depending on the value of this parameter, the __memblock_alloc_base() can is replaced with the appropriate memblock_phys_alloc*() variant. Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Rob Herring <robh@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12Merge branch 'x86-tsx-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tsx fixes from Thomas Gleixner: "This update provides kernel side handling for the TSX erratum of Intel Skylake (and later) CPUs. On these CPUs Intel Transactional Synchronization Extensions (TSX) functions can result in unpredictable system behavior under certain circumstances. The issue is mitigated with an microcode update which utilizes Performance Monitoring Counter (PMC) 3 when TSX functions are in use. This mitigation is enabled unconditionally by the updated microcode. As a consequence the usage of TSX functions can cause corrupted performance monitoring results for events which utilize PMC3. The corruption is silent on kernels which have no update for this issue. This update makes the kernel aware of the PMC3 utilization by the microcode: The microcode offers a possibility to enforce TSX abort which prevents the malfunction and frees up PMC3. The enforced TSX abort requires the TSX using application to have a software fallback path implemented; abort handlers which solely retry the transaction will fail over and over. The enforced TSX abort request is issued by the kernel when: - enforced TSX abort is enabled (PMU attribute) - A performance monitoring request needs PMC3 When PMC3 is not longer used by the kernel the TSX force abort request is cleared. The enforced TSX abort mechanism is enabled by default and can be controlled by the administrator via the new PMU attribute 'allow_tsx_force_abort'. This attribute is only visible when updated microcode is detected on affected systems. Writing '0' disables the enforced TSX abort mechanism, '1' enables it. As a result of disabling the enforced TSX abort mechanism, PMC3 is permanentely unavailable for performance monitoring which can cause performance monitoring requests to fail or switch to multiplexing mode" * branch 'x86-tsx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Implement support for TSX Force Abort x86: Add TSX Force Abort CPUID/MSR perf/x86/intel: Generalize dynamic constraint creation perf/x86/intel: Make cpuc allocations consistent
2019-03-11Merge tag 'for-linus-5.1a-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "xen fixes and features: - remove fallback code for very old Xen hypervisors - three patches for fixing Xen dom0 boot regressions - an old patch for Xen PCI passthrough which was never applied for unknown reasons - some more minor fixes and cleanup patches" * tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: fix dom0 boot on huge systems xen, cpu_hotplug: Prevent an out of bounds access xen: remove pre-xen3 fallback handlers xen/ACPI: Switch to bitmap_zalloc() x86/xen: dont add memory above max allowed allocation x86: respect memory size limiting via mem= parameter xen/gntdev: Check and release imported dma-bufs on close xen/gntdev: Do not destroy context while dma-bufs are in use xen/pciback: Don't disable PCI_COMMAND on PCI device reset. xen-scsiback: mark expected switch fall-through xen: mark expected switch fall-through
2019-03-11Merge tag 'trace-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The biggest change for this release is in the histogram code: - Add "onchange(var)" histogram handler that executes a action when $var changes. - Add new "snapshot()" action for histogram handlers, that causes a snapshot of the ring buffer when triggered. ie. onchange(var).snapshot() will trigger a snapshot if var changes. - Add alternative for "trace()" action. Currently, to trigger a synthetic event, the name of that event is used as the handler name, which is inconsistent with the other actions. onchange(var).synthetic(param) where it can now be onchange(var).trace(synthetic, param). The older method will still be allowed, as long as the synthetic events do not overlap with other handler names. - The histogram documentation at testcases were updated for the new changes. Outside of the histogram code, we have: - Added a quicker way to enable set_ftrace_filter files, that will make it much quicker to bisect tracing a function that shouldn't be traced and crashes the kernel. (You can echo in numbers to set_ftrace_filter, and it will select the corresponding function that is in available_filter_functions). - Some better displaying of the tracing data (and more information was added). The rest are small fixes and more clean ups to the code" * tag 'trace-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (37 commits) tracing: Use strncpy instead of memcpy when copying comm in trace.c tracing: Use strncpy instead of memcpy when copying comm for hist triggers tracing: Use strncpy instead of memcpy for string keys in hist triggers tracing: Use str_has_prefix() in synth_event_create() x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace() tracing/perf: Use strndup_user() instead of buggy open-coded version doc: trace: Fix documentation for uprobe_profile tracing: Fix spelling mistake: "analagous" -> "analogous" tracing: Comment why cond_snapshot is checked outside of max_lock protection tracing: Add hist trigger action 'expected fail' test case tracing: Add alternative synthetic event trace action test case tracing: Add hist trigger onchange() handler test case tracing: Add hist trigger snapshot() action test case tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases tracing: Add alternative synthetic event trace action syntax tracing: Add hist trigger onchange() handler Documentation tracing: Add hist trigger onchange() handler tracing: Add hist trigger snapshot() action Documentation tracing: Add hist trigger snapshot() action tracing: Add conditional snapshot ...
2019-03-10Merge tag 'kbuild-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - do not generate unneeded top-level built-in.a - let git ignore O= directory entirely - optimize scripts/kallsyms slightly - exclude DWARF info from *.s regardless of config options - fix GCC toolchain search path for Clang to prepare ld.lld support - do not generate modules.order when CONFIG_MODULES is disabled - simplify single target rules and remove VPATH for external module build - allow to add optional flags to dpkg-buildpackage when building deb-pkg - move some compiler option tests from Makefile to Kconfig - various Makefile cleanups * tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits) kbuild: remove scripts/basic/% build target kbuild: use -Werror=implicit-... instead of -Werror-implicit-... kbuild: clean up scripts/gcc-version.sh kbuild: remove cc-version macro kbuild: update comment block of scripts/clang-version.sh kbuild: remove commented-out INITRD_COMPRESS kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig kbuild: [bin]deb-pkg: add DPKG_FLAGS variable kbuild: move ".config not found!" message from Kconfig to Makefile kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing kbuild: simplify single target rules kbuild: remove empty rules for makefiles kbuild: make -r/-R effective in top Makefile for old Make versions kbuild: move tools_silent to a more relevant place kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig kbuild: refactor cc-cross-prefix implementation kbuild: hardcode genksyms path and remove GENKSYMS variable scripts/gdb: refactor rules for symlink creation kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target scripts/gdb: do not descend into scripts/gdb from scripts ...
2019-03-10Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity updates from James Morris: "Mimi Zohar says: 'Linux 5.0 introduced the platform keyring to allow verifying the IMA kexec kernel image signature using the pre-boot keys. This pull request similarly makes keys on the platform keyring accessible for verifying the PE kernel image signature. Also included in this pull request is a new IMA hook that tags tmp files, in policy, indicating the file hash needs to be calculated. The remaining patches are cleanup'" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: evm: Use defined constant for UUID representation ima: define ima_post_create_tmpfile() hook and add missing call evm: remove set but not used variable 'xattr' encrypted-keys: fix Opt_err/Opt_error = -1 kexec, KEYS: Make use of platform keyring for signature verify integrity, KEYS: add a reference to platform keyring
2019-03-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Perf updates and fixes: Kernel: - Handle events which have the bpf_event attribute set as side band events as they carry information about BPF programs. - Add missing switch-case fall-through comments Libraries: - Fix leaks and double frees in error code paths. - Prevent buffer overflows in libtraceevent Tools: - Improvements in handling Intel BT/PTS - Add BTF ELF markers to perf trace BPF programs to improve output - Support --time, --cpu, --pid and --tid filters for perf diff - Calculate the column width in perf annotate as the hardcoded 6 characters for the instruction are not sufficient - Small fixes all over the place" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) perf/core: Mark expected switch fall-through perf/x86/intel/uncore: Fix client IMC events return huge result perf/ring_buffer: Use high order allocations for AUX buffers optimistically perf data: Force perf_data__open|close zero data->file.path perf session: Fix double free in perf_data__close perf evsel: Probe for precise_ip with simple attr perf tools: Read and store caps/max_precise in perf_pmu perf hist: Fix memory leak of srcline perf hist: Add error path into hist_entry__init perf c2c: Fix c2c report for empty numa node perf script python: Add Python3 support to intel-pt-events.py perf script python: Add Python3 support to event_analyzing_sample.py perf script python: add Python3 support to check-perf-trace.py perf script python: Add Python3 support to futex-contention.py perf script python: Remove mixed indentation perf diff: Support --pid/--tid filter options perf diff: Support --cpu filter option perf diff: Support --time filter option perf thread: Generalize function to copy from thread addr space from intel-bts code perf annotate: Calculate the max instruction name, align column to that ...
2019-03-10Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Make the unwinder more robust when it encounters a NULL pointer call, so the backtrace becomes more useful - Fix the bogus ORC unwind table alignment - Prevent kernel panic during kexec on HyperV caused by a cleared but not disabled hypercall page. - Remove the now pointless stacksize increase for KASAN_EXTRA, as KASAN_EXTRA is gone. - Remove unused variables from the x86 memory management code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Fix kernel panic when kexec on HyperV x86/mm: Remove unused variable 'old_pte' x86/mm: Remove unused variable 'cpu' Revert "x86_64: Increase stack size for KASAN_EXTRA" x86/unwind: Add hardcoded ORC entry for NULL x86/unwind: Handle NULL pointer calls better in frame unwinder x86/unwind/orc: Fix ORC unwind table alignment
2019-03-10Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot fix from Thomas Gleixner: "A trivial fix for the previous x86/boot pull request which did not make it in time" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/KASLR: Always return a value from process_mem_region
2019-03-10Merge tag 'iommu-updates-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - A big cleanup and optimization patch-set for the Tegra GART driver - Documentation updates and fixes for the IOMMU-API - Support for page request in Intel VT-d scalable mode - Intel VT-d dma_[un]map_resource() support - Updates to the ATS enabling code for PCI (acked by Bjorn) and Intel VT-d to align with the latest version of the ATS spec - Relaxed IRQ source checking in the Intel VT-d driver for some aliased devices, needed for future devices which send IRQ messages from more than on request-ID - IRQ remapping driver for Hyper-V - Patches to make generic IOVA and IO-Page-Table code usable outside of the IOMMU code - Various other small fixes and cleanups * tag 'iommu-updates-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits) iommu/vt-d: Get domain ID before clear pasid entry iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm() iommu/vt-d: Set context field after value initialized iommu/vt-d: Disable ATS support on untrusted devices iommu/mediatek: Fix semicolon code style issue MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scope iommu/hyper-v: Add Hyper-V stub IOMMU driver x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available PCI/ATS: Add inline to pci_prg_resp_pasid_required() iommu/vt-d: Check identity map for hot-added devices iommu: Fix IOMMU debugfs fallout iommu: Document iommu_ops.is_attach_deferred() iommu: Document iommu_ops.iotlb_sync_map() iommu/vt-d: Enable ATS only if the device uses page aligned address. PCI/ATS: Add pci_ats_page_aligned() interface iommu/vt-d: Fix PRI/PASID dependency issue. PCI/ATS: Add pci_prg_resp_pasid_required() interface. iommu/vt-d: Allow interrupts from the entire bus for aliased devices iommu/vt-d: Add helper to set an IRTE to verify only the bus number iommu: Fix flush_tlb_all typo ...
2019-03-10Merge tag 'dma-mapping-5.1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull DMA mapping updates from Christoph Hellwig: - add debugfs support for dumping dma-debug information (Corentin Labbe) - Kconfig cleanups (Andy Shevchenko and me) - debugfs cleanups (Greg Kroah-Hartman) - improve dma_map_resource and use it in the media code - arch_setup_dma_ops / arch_teardown_dma_ops cleanups - various small cleanups and improvements for the per-device coherent allocator - make the DMA mask an upper bound and don't fail "too large" dma mask in the remaning two architectures - this will allow big driver cleanups in the following merge windows * tag 'dma-mapping-5.1' of git://git.infradead.org/users/hch/dma-mapping: (21 commits) Documentation/DMA-API-HOWTO: update dma_mask sections sparc64/pci_sun4v: allow large DMA masks sparc64/iommu: allow large DMA masks sparc64: refactor the ali DMA quirk ccio: allow large DMA masks dma-mapping: remove the DMA_MEMORY_EXCLUSIVE flag dma-mapping: remove dma_mark_declared_memory_occupied dma-mapping: move CONFIG_DMA_CMA to kernel/dma/Kconfig dma-mapping: improve selection of dma_declare_coherent availability dma-mapping: remove an incorrect __iommem annotation of: select OF_RESERVED_MEM automatically device.h: dma_mem is only needed for HAVE_GENERIC_DMA_COHERENT mfd/sm501: depend on HAS_DMA dma-mapping: add a kconfig symbol for arch_teardown_dma_ops availability dma-mapping: add a kconfig symbol for arch_setup_dma_ops availability dma-mapping: move debug configuration options to kernel/dma dma-debug: add dumping facility via debugfs dma: debug: no need to check return value of debugfs_create functions videobuf2: replace a layering violation with dma_map_resource dma-mapping: don't BUG when calling dma_map_resource on RAM ...
2019-03-09Merge tag 'pci-v5.1-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - Use match_string() instead of reimplementing it (Andy Shevchenko) - Enable SERR# forwarding for all bridges (Bharat Kumar Gogada) - Use Latency Tolerance Reporting if already enabled by platform (Bjorn Helgaas) - Save/restore LTR info for suspend/resume (Bjorn Helgaas) - Fix DPC use of uninitialized data (Dongdong Liu) - Probe bridge window attributes only once at enumeration-time to fix device accesses during rescan (Bjorn Helgaas) - Return BAR size (not "size -1 ") from pci_size() to simplify code (Du Changbin) - Use config header type (not class code) identify bridges more reliably (Honghui Zhang) - Work around Intel Denverton incorrect Trace Hub BAR size reporting (Alexander Shishkin) - Reorder pciehp cached state/hardware state updates to avoid missed interrupts (Mika Westerberg) - Turn ibmphp semaphores into completions or mutexes (Arnd Bergmann) - Mark expected switch fall-through (Mathieu Malaterre) - Use of_node_name_eq() for node name comparisons (Rob Herring) - Add ACS and pciehp quirks for HXT SD4800 (Shunyong Yang) - Consolidate Rohm Vendor ID definitions (Andy Shevchenko) - Use u32 (not __u32) for things not exposed to userspace (Logan Gunthorpe) - Fix locking semantics of bus and slot reset interfaces (Alex Williamson) - Update PCIEPORTBUS Kconfig help text (Hou Zhiqiang) - Allow portdrv to claim subtractive decode Ports so PCIe services will work for them (Honghui Zhang) - Report PCIe links that become degraded at run-time (Alexandru Gagniuc) - Blacklist Gigabyte X299 Root Port power management to fix Thunderbolt hotplug (Mika Westerberg) - Revert runtime PM suspend/resume callbacks that broke PME on network cable plug (Mika Westerberg) - Disable Data Link State Changed interrupts to prevent wakeup immediately after suspend (Mika Westerberg) - Extend altera to support Stratix 10 (Ley Foon Tan) - Allow building altera driver on ARM64 (Ley Foon Tan) - Replace Douglas with Tom Joseph as Cadence PCI host/endpoint maintainer (Lorenzo Pieralisi) - Add DT support for R-Car RZ/G2E (R8A774C0) (Fabrizio Castro) - Add dra72x/dra74x/dra76x SoC compatible strings (Kishon Vijay Abraham I) - Enable x2 mode support for dra72x/dra74x/dra76x SoC (Kishon Vijay Abraham I) - Configure dra7xx PHY to PCIe mode (Kishon Vijay Abraham I) - Simplify dwc (remove unnecessary header includes, name variables consistently, reduce inverted logic, etc) (Gustavo Pimentel) - Add i.MX8MQ support (Andrey Smirnov) - Add message to help debug dwc MSI-X mask bit errors (Gustavo Pimentel) - Work around imx7d PCIe PLL erratum (Trent Piepho) - Don't assert qcom reset GPIO during probe (Bjorn Andersson) - Skip dwc MSI init if MSIs have been disabled (Lucas Stach) - Use memcpy_fromio()/memcpy_toio() instead of plain memcpy() in PCI endpoint framework (Wen Yang) - Add interface to discover supported endpoint features to replace a bitfield that wasn't flexible enough (Kishon Vijay Abraham I) - Implement the new supported-feature interface for designware-plat, dra7xx, rockchip, cadence (Kishon Vijay Abraham I) - Fix issues with 64-bit BAR in endpoints (Kishon Vijay Abraham I) - Add layerscape endpoint mode support (Xiaowei Bao) - Remove duplicate struct hv_vp_set in favor of struct hv_vpset (Maya Nakamura) - Rework hv_irq_unmask() to use cpumask_to_vpset() instead of open-coded reimplementation (Maya Nakamura) - Align Hyper-V struct retarget_msi_interrupt arguments (Maya Nakamura) - Fix mediatek MMIO size computation to enable full size of available MMIO space (Honghui Zhang) - Fix mediatek DMA window size computation to allow endpoint DMA access to full DRAM address range (Honghui Zhang) - Fix mvebu prefetchable BAR regression caused by common bridge emulation that assumed all bridges had prefetchable windows (Thomas Petazzoni) - Make advk_pci_bridge_emul_ops static (Wei Yongjun) - Configure MPS settings for VMD root ports (Jon Derrick) * tag 'pci-v5.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits) PCI: Update PCIEPORTBUS Kconfig help text PCI: Fix "try" semantics of bus and slot reset PCI/LINK: Report degraded links via link bandwidth notification dt-bindings: PCI: altera: Add altr,pcie-root-port-2.0 PCI: altera: Enable driver on ARM64 PCI: altera: Add Stratix 10 PCIe support PCI/PME: Fix possible use-after-free on remove PCI: aardvark: Make symbol 'advk_pci_bridge_emul_ops' static PCI: dwc: skip MSI init if MSIs have been explicitly disabled PCI: hv: Refactor hv_irq_unmask() to use cpumask_to_vpset() PCI: hv: Replace hv_vp_set with hv_vpset PCI: hv: Add __aligned(8) to struct retarget_msi_interrupt PCI: mediatek: Enlarge PCIe2AHB window size to support 4GB DRAM PCI: mediatek: Fix memory mapped IO range size computation PCI: dwc: Remove superfluous shifting in definitions PCI: dwc: Make use of GENMASK/FIELD_PREP PCI: dwc: Make use of BIT() in constant definitions PCI: dwc: Share code for dw_pcie_rd/wr_other_conf() PCI: dwc: Make use of IS_ALIGNED() PCI: imx6: Add code to request/control "pcie_aux" clock for i.MX8MQ ...
2019-03-09perf/x86/intel/uncore: Fix client IMC events return huge resultKan Liang
The client IMC bandwidth events currently return very large values: $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a 10.000117222 34,788.76 MiB uncore_imc/data_reads/ 10.000117222 8.26 MiB uncore_imc/data_writes/ 20.000374584 34,842.89 MiB uncore_imc/data_reads/ 20.000374584 10.45 MiB uncore_imc/data_writes/ 30.000633299 37,965.29 MiB uncore_imc/data_reads/ 30.000633299 323.62 MiB uncore_imc/data_writes/ 40.000891548 41,012.88 MiB uncore_imc/data_reads/ 40.000891548 6.98 MiB uncore_imc/data_writes/ 50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/ 50.001142480 6.97 MiB uncore_imc/data_writes/ The client IMC events are freerunning counters. They still use the old event encoding format (0x1 for data_read and 0x2 for data write). The counter bit width is calculated by common code, which assume that the standard encoding format is used for the freerunning counters. Error bit width information is calculated. The patch intends to convert the old client IMC event encoding to the standard encoding format. Current common code uses event->attr.config which directly copy from user space. We should not implicitly modify it for a converted event. The event->hw.config is used to replace the event->attr.config in common code. For client IMC events, the event->attr.config is used to calculate a converted event with standard encoding format in the custom event_init(). The converted event is stored in event->hw.config. For other events of freerunning counters, they already use the standard encoding format. The same value as event->attr.config is assigned to event->hw.config in common event_init(). Reported-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@kernel.org # v4.18+ Fixes: 9aae1780e7e8 ("perf/x86/intel/uncore: Clean up client IMC uncore") Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-03-08Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring IO interface from Jens Axboe: "Second attempt at adding the io_uring interface. Since the first one, we've added basic unit testing of the three system calls, that resides in liburing like the other unit tests that we have so far. It'll take a while to get full coverage of it, but we're working towards it. I've also added two basic test programs to tools/io_uring. One uses the raw interface and has support for all the various features that io_uring supports outside of standard IO, like fixed files, fixed IO buffers, and polled IO. The other uses the liburing API, and is a simplified version of cp(1). This adds support for a new IO interface, io_uring. io_uring allows an application to communicate with the kernel through two rings, the submission queue (SQ) and completion queue (CQ) ring. This allows for very efficient handling of IOs, see the v5 posting for some basic numbers: https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/ Outside of just efficiency, the interface is also flexible and extendable, and allows for future use cases like the upcoming NVMe key-value store API, networked IO, and so on. It also supports async buffered IO, something that we've always failed to support in the kernel. Outside of basic IO features, it supports async polled IO as well. This particular feature has already been tested at Facebook months ago for flash storage boxes, with 25-33% improvements. It makes polled IO actually useful for real world use cases, where even basic flash sees a nice win in terms of efficiency, latency, and performance. These boxes were IOPS bound before, now they are not. This series adds three new system calls. One for setting up an io_uring instance (io_uring_setup(2)), one for submitting/completing IO (io_uring_enter(2)), and one for aux functions like registrating file sets, buffers, etc (io_uring_register(2)). Through the help of Arnd, I've coordinated the syscall numbers so merge on that front should be painless. Jon did a writeup of the interface a while back, which (except for minor details that have been tweaked) is still accurate. Find that here: https://lwn.net/Articles/776703/ Huge thanks to Al Viro for helping getting the reference cycle code correct, and to Jann Horn for his extensive reviews focused on both security and bugs in general. There's a userspace library that provides basic functionality for applications that don't need or want to care about how to fiddle with the rings directly. It has helpers to allow applications to easily set up an io_uring instance, and submit/complete IO through it without knowing about the intricacies of the rings. It also includes man pages (thanks to Jeff Moyer), and will continue to grow support helper functions and features as time progresses. Find it here: git://git.kernel.dk/liburing Fio has full support for the raw interface, both in the form of an IO engine (io_uring), but also with a small test application (t/io_uring) that can exercise and benchmark the interface" * tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block: io_uring: add a few test tools io_uring: allow workqueue item to handle multiple buffered requests io_uring: add support for IORING_OP_POLL io_uring: add io_kiocb ref count io_uring: add submission polling io_uring: add file set registration net: split out functions related to registering inflight socket files io_uring: add support for pre-mapped user IO buffers block: implement bio helper to add iter bvec pages to bio io_uring: batch io_kiocb allocation io_uring: use fget/fput_many() for file references fs: add fget_many() and fput_many() io_uring: support for IO polling io_uring: add fsync support Add io_uring IO interface
2019-03-08Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: "This time around we have in store: - Disable MC4_MISC thresholding banks on all AMD family 0x15 models (Shirish S) - AMD MCE error descriptions update and error decode improvements (Yazen Ghannam) - The usual smaller conversions and fixes" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Improve error message when kernel cannot recover, p2 EDAC/mce_amd: Decode MCA_STATUS in bit definition order EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit EDAC, mce_amd: Print ExtErrorCode and description on a single line EDAC, mce_amd: Match error descriptions to latest documentation x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types RAS: Add a MAINTAINERS entry RAS: Use consistent types for UUIDs x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models x86/MCE: Switch to use the new generic UUID API
2019-03-08xen: fix dom0 boot on huge systemsJuergen Gross
Commit f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit PV guests") introduced a regression for booting dom0 on huge systems with lots of RAM (in the TB range). Reason is that on those hosts the p2m list needs to be moved early in the boot process and this requires temporary page tables to be created. Said commit modified xen_set_pte_init() to use a hypercall for writing a PTE, but this requires the page table being in the direct mapped area, which is not the case for the temporary page tables used in xen_relocate_p2m(). As the page tables are completely written before being linked to the actual address space instead of set_pte() a plain write to memory can be used in xen_relocate_p2m(). Fixes: f7c90c2aa40048 ("x86/xen: don't write ptes directly in 32-bit PV guests") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-03-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - some of the rest of MM - various misc things - dynamic-debug updates - checkpatch - some epoll speedups - autofs - rapidio - lib/, lib/lzo/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) samples/mic/mpssd/mpssd.h: remove duplicate header kernel/fork.c: remove duplicated include include/linux/relay.h: fix percpu annotation in struct rchan arch/nios2/mm/fault.c: remove duplicate include unicore32: stop printing the virtual memory layout MAINTAINERS: fix GTA02 entry and mark as orphan mm: create the new vm_fault_t type arm, s390, unicore32: remove oneliner wrappers for memblock_alloc() arch: simplify several early memory allocations openrisc: simplify pte_alloc_one_kernel() sh: prefer memblock APIs returning virtual address microblaze: prefer memblock API returning virtual address powerpc: prefer memblock APIs returning virtual address lib/lzo: separate lzo-rle from lzo lib/lzo: implement run-length encoding lib/lzo: fast 8-byte copy on arm64 lib/lzo: 64-bit CTZ on arm64 lib/lzo: tidy-up ifdefs ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size ipc: annotate implicit fall through ...
2019-03-07mm: create the new vm_fault_t typeSouptick Joarder
Page fault handlers are supposed to return VM_FAULT codes, but some drivers/file systems mistakenly return error numbers. Now that all drivers/file systems have been converted to use the vm_fault_t return type, change the type definition to no longer be compatible with 'int'. By making it an unsigned int, the function prototype becomes incompatible with a function which returns int. Sparse will detect any attempts to return a value which is not a VM_FAULT code. VM_FAULT_SET_HINDEX and VM_FAULT_GET_HINDEX values are changed to avoid conflict with other VM_FAULT codes. [jrdr.linux@gmail.com: fix warnings] Link: http://lkml.kernel.org/r/20190109183742.GA24326@jordon-HP-15-Notebook-PC Link: http://lkml.kernel.org/r/20190108183041.GA12137@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-07configs: get rid of obsolete CONFIG_ENABLE_WARN_DEPRECATEDAlexey Brodkin
This Kconfig option was removed during v4.19 development in commit 771c035372a0 ("deprecate the '__deprecated' attribute warnings entirely and for good") so there's no point to keep it in defconfigs any longer. FWIW defconfigs were patched with: --------------------------->8---------------------- find . -name *_defconfig -exec sed -i '/CONFIG_ENABLE_WARN_DEPRECATED/d' {} \; --------------------------->8---------------------- Link: http://lkml.kernel.org/r/20190128152434.41969-1-abrodkin@synopsys.com Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-07Merge branch 'x86-uv-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 UV updates from Ingo Molnar: "Three UV related cleanups" * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/UV: Use efi_enabled() instead of test_bit() x86/platform/UV: Remove uv_bios_call_reentrant() x86/platform/UV: Remove unnecessary #ifdef CONFIG_EFI
2019-03-07Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform update from Ingo Molnar: "A defconfig update" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/defconfig: Enable EFI stub, mixed mode and BGRT
2019-03-07Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm cleanup from Ingo Molnar: "A single GUP cleanup" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm/gup: Remove the 'write' parameter from gup_fast_permitted()
2019-03-07Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kdump update from Ingo Molnar: "Add the AMD SME mask to the vmcoreinfo, and also document our vmcoreinfo fields" * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kdump: Document kernel data exported in the vmcoreinfo note x86/kdump: Export the SME mask to vmcoreinfo
2019-03-07Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "Three changes: - preparatory patch for AVX state tracking that computing-cluster folks would like to use for user-space batching - but we are not happy about the related ABI yet so this is only the kernel tracking side - a cleanup for CR0 handling in do_device_not_available() - plus we removed a workaround for an ancient binutils version" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Track AVX-512 usage of tasks x86/fpu: Get rid of CONFIG_AS_FXSAVEQ x86/traps: Have read_cr0() only once in the #NM handler
2019-03-07Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Various cleanups and simplifications, none of them really stands out, they are all over the place" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Remove unused __addr_ok() macro x86/smpboot: Remove unused phys_id variable x86/mm/dump_pagetables: Remove the unused prev_pud variable x86/fpu: Move init_xstate_size() to __init section x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section x86/mtrr: Remove unused variable x86/boot/compressed/64: Explain paging_prepare()'s return value x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition x86/asm/suspend: Drop ENTRY from local data x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery x86/boot: Save several bytes in decompressor x86/trap: Remove useless declaration x86/mm/tlb: Remove unused cpu variable x86/events: Mark expected switch-case fall-throughs x86/asm-prototypes: Remove duplicate include <asm/page.h> x86/kernel: Mark expected switch-case fall-throughs x86/insn-eval: Mark expected switch-case fall-through x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls x86/e820: Replace kmalloc() + memcpy() with kmemdup()
2019-03-07Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "Misc cleanups and a retpoline code generation optimization" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, retpolines: Raise limit for generating indirect calls from switch-case x86/build: Use the single-argument OUTPUT_FORMAT() linker script command x86/build: Specify elf_i386 linker emulation explicitly for i386 objects x86/build: Mark per-CPU symbols as absolute explicitly for LLD
2019-03-07Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Most of the changes center around the difficult problem of KASLR pinning down hot-removable memory regions. At the very early stage KASRL is making irreversible kernel address layout decisions we don't have full knowledge about the memory maps yet. So the changes from Chao Fan add this (parsing the RSDP table early), together with fixes from Borislav Petkov" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/compressed/64: Do not read legacy ROM on EFI system x86/boot: Correct RSDP parsing with 32-bit EFI x86/kexec: Fill in acpi_rsdp_addr from the first kernel x86/boot: Fix randconfig build error due to MEMORY_HOTREMOVE x86/boot: Fix cmdline_find_option() prototype visibility x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only x86/boot: Parse SRAT table and count immovable memory regions x86/boot: Early parse RSDP and save it in boot_params x86/boot: Search for RSDP in memory x86/boot: Search for RSDP in the EFI tables x86/boot: Add "acpi_rsdp=" early parsing x86/boot: Copy kstrtoull() to boot/string.c x86/boot: Build the command line parsing code unconditionally
2019-03-06x86/hyperv: Fix kernel panic when kexec on HyperVKairui Song
After commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments"), kexec fails with a kernel panic: kexec_core: Starting new kernel BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018 RIP: 0010:0xffffc9000001d000 Call Trace: ? __send_ipi_mask+0x1c6/0x2d0 ? hv_send_ipi_mask_allbutself+0x6d/0xb0 ? mp_save_irq+0x70/0x70 ? __ioapic_read_entry+0x32/0x50 ? ioapic_read_entry+0x39/0x50 ? clear_IO_APIC_pin+0xb8/0x110 ? native_stop_other_cpus+0x6e/0x170 ? native_machine_shutdown+0x22/0x40 ? kernel_kexec+0x136/0x156 That happens if hypercall based IPIs are used because the hypercall page is reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs, which invokes the hypercall and dereferences the unusable page. To fix his, reset hv_hypercall_pg to NULL before the page is reset to avoid any misuse, IPI sending will fall back to the non hypercall based method. This only happens on kexec / kdump so just setting the pointer to NULL is good enough. Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments") Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: devel@linuxdriverproject.org Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com
2019-03-06x86/mm: Remove unused variable 'old_pte'Qian Cai
The commit 3a19109efbfa ("x86/mm: Fix try_preserve_large_page() to handle large PAT bit") fixed try_preserve_large_page() by using the corresponding pud/pmd prot/pfn interfaces, but left a variable unused because it no longer used pte_pfn(). Later, the commit 8679de0959e6 ("x86/mm/cpa: Split, rename and clean up try_preserve_large_page()") renamed try_preserve_large_page() to __should_split_large_page(), but the unused variable remains. arch/x86/mm/pageattr.c: In function '__should_split_large_page': arch/x86/mm/pageattr.c:741:17: warning: variable 'old_pte' set but not used [-Wunused-but-set-variable] Fixes: 3a19109efbfa ("x86/mm: Fix try_preserve_large_page() to handle large PAT bit") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: toshi.kani@hpe.com Cc: bp@alien8.de Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20190301152924.94762-1-cai@lca.pw
2019-03-06x86/mm: Remove unused variable 'cpu'Qian Cai
The commit a2055abe9c67 ("x86/mm: Pass flush_tlb_info to flush_tlb_others() etc") removed the unnecessary cpu parameter from uv_flush_tlb_others() but left an unused variable. arch/x86/mm/tlb.c: In function 'native_flush_tlb_others': arch/x86/mm/tlb.c:688:16: warning: variable 'cpu' set but not used [-Wunused-but-set-variable] unsigned int cpu; ^~~ Fixes: a2055abe9c67 ("x86/mm: Pass flush_tlb_info to flush_tlb_others() etc") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andyt Lutomirski <luto@kernel.org> Cc: dave.hansen@linux.intel.com Cc: peterz@infradead.org Cc: bp@alien8.de Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20190228220155.88124-1-cai@lca.pw
2019-03-06Revert "x86_64: Increase stack size for KASAN_EXTRA"Qian Cai
This reverts commit a8e911d13540487942d53137c156bd7707f66e5d. KASAN_EXTRA was removed via the commit 7771bdbbfd3d ("kasan: remove use after scope bugs detection."), so this is no longer needed. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: bp@alien8.de Cc: akpm@linux-foundation.org Cc: aryabinin@virtuozzo.com Cc: glider@google.com Cc: dvyukov@google.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20190306213806.46139-1-cai@lca.pw
2019-03-06x86/unwind: Add hardcoded ORC entry for NULLJann Horn
When the ORC unwinder is invoked for an oops caused by IP==0, it currently has no idea what to do because there is no debug information for the stack frame of NULL. But if RIP is NULL, it is very likely that the last successfully executed instruction was an indirect CALL/JMP, and it is possible to unwind out in the same way as for the first instruction of a normal function. Hardcode a corresponding ORC entry. With an artificially-added NULL call in prctl_set_seccomp(), before this patch, the trace is: Call Trace: ? __x64_sys_prctl+0x402/0x680 ? __ia32_sys_prctl+0x6e0/0x6e0 ? __do_page_fault+0x457/0x620 ? do_syscall_64+0x6d/0x160 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 After this patch, the trace looks like this: Call Trace: __x64_sys_prctl+0x402/0x680 ? __ia32_sys_prctl+0x6e0/0x6e0 ? __do_page_fault+0x457/0x620 do_syscall_64+0x6d/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 prctl_set_seccomp() still doesn't show up in the trace because for some reason, tail call optimization is only disabled in builds that use the frame pointer unwinder. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20190301031201.7416-2-jannh@google.com
2019-03-06x86/unwind: Handle NULL pointer calls better in frame unwinderJann Horn
When the frame unwinder is invoked for an oops caused by a call to NULL, it currently skips the parent function because BP still points to the parent's stack frame; the (nonexistent) current function only has the first half of a stack frame, and BP doesn't point to it yet. Add a special case for IP==0 that calculates a fake BP from SP, then uses the real BP for the next frame. Note that this handles first_frame specially: Return information about the parent function as long as the saved IP is >=first_frame, even if the fake BP points below it. With an artificially-added NULL call in prctl_set_seccomp(), before this patch, the trace is: Call Trace: ? prctl_set_seccomp+0x3a/0x50 __x64_sys_prctl+0x457/0x6f0 ? __ia32_sys_prctl+0x750/0x750 do_syscall_64+0x72/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 After this patch, the trace is: Call Trace: prctl_set_seccomp+0x3a/0x50 __x64_sys_prctl+0x457/0x6f0 ? __ia32_sys_prctl+0x750/0x750 do_syscall_64+0x72/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.com
2019-03-06x86/boot/KASLR: Always return a value from process_mem_regionLouis Taylor
When compiling with -Wreturn-type, clang warns: arch/x86/boot/compressed/kaslr.c:704:1: warning: control may reach end of non-void function [-Wreturn-type] This function's return statement should have been placed outside the ifdeffed region. Move it there. Fixes: 690eaa532057 ("x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only") Signed-off-by: Louis Taylor <louis@kragniz.eu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: fanc.fnst@cn.fujitsu.com Cc: bhe@redhat.com Cc: kirill.shutemov@linux.intel.com Cc: jflat@chromium.org Link: https://lkml.kernel.org/r/20190302184929.28971-1-louis@kragniz.eu