summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2019-03-24x86/resctrl: Remove unused variablePeng Hao
Variable "struct rdt_resource *r" is set but not used. So remove it. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1552152584-26087-1-git-send-email-peng.hao2@zte.com.cn
2019-03-23x86/gart: Exclude GART aperture from kcoreKairui Song
On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging existing hook infrastructure in vmcore to let /proc/vmcore return zeroes when attempting to read the aperture region, and so it won't read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix. Just with some minor adjustment, rename some functions for more general usage, and simplify the hook infrastructure a bit as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
2019-03-22x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return ↵Nathan Chancellor
an error When building with -Wsometimes-uninitialized, Clang warns: arch/x86/kernel/hw_breakpoint.c:355:2: warning: variable 'align' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] The default cannot be reached because arch_build_bp_info() initializes hw->len to one of the specified cases. Nevertheless the warning is valid and returning -EINVAL makes sure that this cannot be broken by future modifications. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/392 Link: https://lkml.kernel.org/r/20190307212756.4648-1-natechancellor@gmail.com
2019-03-21x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processorsMatthew Whitehead
There are comments in processor-cyrix.h advising you to _not_ make calls using the deprecated macros in this style: setCx86_old(CX86_CCR4, getCx86_old(CX86_CCR4) | 0x80); This is because it expands the macro into a non-functioning calling sequence. The calling order must be: outb(CX86_CCR2, 0x22); inb(0x23); From the comments: * When using the old macros a line like * setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88); * gets expanded to: * do { * outb((CX86_CCR2), 0x22); * outb((({ * outb((CX86_CCR2), 0x22); * inb(0x23); * }) | 0x88), 0x23); * } while (0); The new macros fix this problem, so use them instead. Tested on an actual Geode processor. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1552596361-8967-2-git-send-email-tedheadster@gmail.com
2019-03-21x86/microcode: Announce reload operation's completionBorislav Petkov
By popular demand, issue a single line to dmesg after the reload operation completes to let the user know that a reload has at least been attempted. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190313110022.8229-1-bp@alien8.de
2019-03-21x86/hpet: Prevent potential NULL pointer dereferenceAditya Pakki
hpet_virt_address may be NULL when ioremap_nocache fail, but the code lacks a check. Add a check to prevent NULL pointer dereference. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kjlu@umn.edu Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Roland Dreier <roland@purestorage.com> Link: https://lkml.kernel.org/r/20190319021958.17275-1-pakki001@umn.edu
2019-03-19x86/mm: Don't leak kernel addressesMatteo Croce
Since commit: ad67b74d2469d9b8 ("printk: hash addresses printed with %p") at boot "____ptrval____" is printed instead of actual addresses: found SMP MP-table at [mem 0x000f5cc0-0x000f5ccf] mapped at [(____ptrval____)] Instead of changing the print to "%px", and leaking a kernel addresses, just remove the print completely, like in: 071929dbdd865f77 ("arm64: Stop printing the virtual memory layout"). Signed-off-by: Matteo Croce <mcroce@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - some cleanups - direct physical timer assignment - cache sanitization for 32-bit guests s390: - interrupt cleanup - introduction of the Guest Information Block - preparation for processor subfunctions in cpu models PPC: - bug fixes and improvements, especially related to machine checks and protection keys x86: - many, many cleanups, including removing a bunch of MMU code for unnecessary optimizations - AVIC fixes Generic: - memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits) kvm: vmx: fix formatting of a comment KVM: doc: Document the life cycle of a VM and its resources MAINTAINERS: Add KVM selftests to existing KVM entry Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() KVM: PPC: Fix compilation when KVM is not enabled KVM: Minor cleanups for kvm_main.c KVM: s390: add debug logging for cpu model subfunctions KVM: s390: implement subfunction processor calls arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 KVM: arm/arm64: Remove unused timer variable KVM: PPC: Book3S: Improve KVM reference counting KVM: PPC: Book3S HV: Fix build failure without IOMMU support Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()" x86: kvmguest: use TSC clocksource if invariant TSC is exposed KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes() KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children ...
2019-03-12Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount infrastructure updates from Al Viro: "The rest of core infrastructure; no new syscalls in that pile, but the old parts are switched to new infrastructure. At that point conversions of individual filesystems can happen independently; some are done here (afs, cgroup, procfs, etc.), there's also a large series outside of that pile dealing with NFS (quite a bit of option-parsing stuff is getting used there - it's one of the most convoluted filesystems in terms of mount-related logics), but NFS bits are the next cycle fodder. It got seriously simplified since the last cycle; documentation is probably the weakest bit at the moment - I considered dropping the commit introducing Documentation/filesystems/mount_api.txt (cutting the size increase by quarter ;-), but decided that it would be better to fix it up after -rc1 instead. That pile allows to do followup work in independent branches, which should make life much easier for the next cycle. fs/super.c size increase is unpleasant; there's a followup series that allows to shrink it considerably, but I decided to leave that until the next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits) afs: Use fs_context to pass parameters over automount afs: Add fs_context support vfs: Add some logging to the core users of the fs_context log vfs: Implement logging through fs_context vfs: Provide documentation for new mount API vfs: Remove kern_mount_data() hugetlbfs: Convert to fs_context cpuset: Use fs_context kernfs, sysfs, cgroup, intel_rdt: Support fs_context cgroup: store a reference to cgroup_ns into cgroup_fs_context cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper cgroup_do_mount(): massage calling conventions cgroup: stash cgroup_root reference into cgroup_fs_context cgroup2: switch to option-by-option parsing cgroup1: switch to option-by-option parsing cgroup: take options parsing into ->parse_monolithic() cgroup: fold cgroup1_mount() into cgroup1_get_tree() cgroup: start switching to fs_context ipc: Convert mqueue fs to fs_context proc: Add fs_context support to procfs ...
2019-03-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things - the rest of MM - remove flex_arrays, replace with new simple radix-tree implementation * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits) Drop flex_arrays sctp: convert to genradix proc: commit to genradix generic radix trees selinux: convert to kvmalloc md: convert to kvmalloc openvswitch: convert to kvmalloc of: fix kmemleak crash caused by imbalance in early memory reservation mm: memblock: update comments and kernel-doc memblock: split checks whether a region should be skipped to a helper function memblock: remove memblock_{set,clear}_region_flags memblock: drop memblock_alloc_*_nopanic() variants memblock: memblock_alloc_try_nid: don't panic treewide: add checks for the return value of memblock_alloc*() swiotlb: add checks for the return value of memblock_alloc*() init/main: add checks for the return value of memblock_alloc*() mm/percpu: add checks for the return value of memblock_alloc*() sparc: add checks for the return value of memblock_alloc*() ia64: add checks for the return value of memblock_alloc*() arch: don't memset(0) memory returned by memblock_alloc() ...
2019-03-12memblock: drop memblock_alloc_*_nopanic() variantsMike Rapoport
As all the memblock allocation functions return NULL in case of error rather than panic(), the duplicates with _nopanic suffix can be removed. Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [printk] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12treewide: add checks for the return value of memblock_alloc*()Mike Rapoport
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [akpm@linux-foundation.org: fix xtensa printk warning] Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky] Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: Juergen Gross <jgross@suse.com> [Xen] Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12memblock: drop __memblock_alloc_base()Mike Rapoport
The __memblock_alloc_base() function tries to allocate a memory up to the limit specified by its max_addr parameter. Depending on the value of this parameter, the __memblock_alloc_base() can is replaced with the appropriate memblock_phys_alloc*() variant. Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Rob Herring <robh@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-11Merge tag 'for-linus-5.1a-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "xen fixes and features: - remove fallback code for very old Xen hypervisors - three patches for fixing Xen dom0 boot regressions - an old patch for Xen PCI passthrough which was never applied for unknown reasons - some more minor fixes and cleanup patches" * tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: fix dom0 boot on huge systems xen, cpu_hotplug: Prevent an out of bounds access xen: remove pre-xen3 fallback handlers xen/ACPI: Switch to bitmap_zalloc() x86/xen: dont add memory above max allowed allocation x86: respect memory size limiting via mem= parameter xen/gntdev: Check and release imported dma-bufs on close xen/gntdev: Do not destroy context while dma-bufs are in use xen/pciback: Don't disable PCI_COMMAND on PCI device reset. xen-scsiback: mark expected switch fall-through xen: mark expected switch fall-through
2019-03-11Merge tag 'trace-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The biggest change for this release is in the histogram code: - Add "onchange(var)" histogram handler that executes a action when $var changes. - Add new "snapshot()" action for histogram handlers, that causes a snapshot of the ring buffer when triggered. ie. onchange(var).snapshot() will trigger a snapshot if var changes. - Add alternative for "trace()" action. Currently, to trigger a synthetic event, the name of that event is used as the handler name, which is inconsistent with the other actions. onchange(var).synthetic(param) where it can now be onchange(var).trace(synthetic, param). The older method will still be allowed, as long as the synthetic events do not overlap with other handler names. - The histogram documentation at testcases were updated for the new changes. Outside of the histogram code, we have: - Added a quicker way to enable set_ftrace_filter files, that will make it much quicker to bisect tracing a function that shouldn't be traced and crashes the kernel. (You can echo in numbers to set_ftrace_filter, and it will select the corresponding function that is in available_filter_functions). - Some better displaying of the tracing data (and more information was added). The rest are small fixes and more clean ups to the code" * tag 'trace-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (37 commits) tracing: Use strncpy instead of memcpy when copying comm in trace.c tracing: Use strncpy instead of memcpy when copying comm for hist triggers tracing: Use strncpy instead of memcpy for string keys in hist triggers tracing: Use str_has_prefix() in synth_event_create() x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace() tracing/perf: Use strndup_user() instead of buggy open-coded version doc: trace: Fix documentation for uprobe_profile tracing: Fix spelling mistake: "analagous" -> "analogous" tracing: Comment why cond_snapshot is checked outside of max_lock protection tracing: Add hist trigger action 'expected fail' test case tracing: Add alternative synthetic event trace action test case tracing: Add hist trigger onchange() handler test case tracing: Add hist trigger snapshot() action test case tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases tracing: Add alternative synthetic event trace action syntax tracing: Add hist trigger onchange() handler Documentation tracing: Add hist trigger onchange() handler tracing: Add hist trigger snapshot() action Documentation tracing: Add hist trigger snapshot() action tracing: Add conditional snapshot ...
2019-03-10Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity updates from James Morris: "Mimi Zohar says: 'Linux 5.0 introduced the platform keyring to allow verifying the IMA kexec kernel image signature using the pre-boot keys. This pull request similarly makes keys on the platform keyring accessible for verifying the PE kernel image signature. Also included in this pull request is a new IMA hook that tags tmp files, in policy, indicating the file hash needs to be calculated. The remaining patches are cleanup'" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: evm: Use defined constant for UUID representation ima: define ima_post_create_tmpfile() hook and add missing call evm: remove set but not used variable 'xattr' encrypted-keys: fix Opt_err/Opt_error = -1 kexec, KEYS: Make use of platform keyring for signature verify integrity, KEYS: add a reference to platform keyring
2019-03-10Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Make the unwinder more robust when it encounters a NULL pointer call, so the backtrace becomes more useful - Fix the bogus ORC unwind table alignment - Prevent kernel panic during kexec on HyperV caused by a cleared but not disabled hypercall page. - Remove the now pointless stacksize increase for KASAN_EXTRA, as KASAN_EXTRA is gone. - Remove unused variables from the x86 memory management code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Fix kernel panic when kexec on HyperV x86/mm: Remove unused variable 'old_pte' x86/mm: Remove unused variable 'cpu' Revert "x86_64: Increase stack size for KASAN_EXTRA" x86/unwind: Add hardcoded ORC entry for NULL x86/unwind: Handle NULL pointer calls better in frame unwinder x86/unwind/orc: Fix ORC unwind table alignment
2019-03-10Merge tag 'iommu-updates-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - A big cleanup and optimization patch-set for the Tegra GART driver - Documentation updates and fixes for the IOMMU-API - Support for page request in Intel VT-d scalable mode - Intel VT-d dma_[un]map_resource() support - Updates to the ATS enabling code for PCI (acked by Bjorn) and Intel VT-d to align with the latest version of the ATS spec - Relaxed IRQ source checking in the Intel VT-d driver for some aliased devices, needed for future devices which send IRQ messages from more than on request-ID - IRQ remapping driver for Hyper-V - Patches to make generic IOVA and IO-Page-Table code usable outside of the IOMMU code - Various other small fixes and cleanups * tag 'iommu-updates-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits) iommu/vt-d: Get domain ID before clear pasid entry iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm() iommu/vt-d: Set context field after value initialized iommu/vt-d: Disable ATS support on untrusted devices iommu/mediatek: Fix semicolon code style issue MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scope iommu/hyper-v: Add Hyper-V stub IOMMU driver x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available PCI/ATS: Add inline to pci_prg_resp_pasid_required() iommu/vt-d: Check identity map for hot-added devices iommu: Fix IOMMU debugfs fallout iommu: Document iommu_ops.is_attach_deferred() iommu: Document iommu_ops.iotlb_sync_map() iommu/vt-d: Enable ATS only if the device uses page aligned address. PCI/ATS: Add pci_ats_page_aligned() interface iommu/vt-d: Fix PRI/PASID dependency issue. PCI/ATS: Add pci_prg_resp_pasid_required() interface. iommu/vt-d: Allow interrupts from the entire bus for aliased devices iommu/vt-d: Add helper to set an IRTE to verify only the bus number iommu: Fix flush_tlb_all typo ...
2019-03-08Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: "This time around we have in store: - Disable MC4_MISC thresholding banks on all AMD family 0x15 models (Shirish S) - AMD MCE error descriptions update and error decode improvements (Yazen Ghannam) - The usual smaller conversions and fixes" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Improve error message when kernel cannot recover, p2 EDAC/mce_amd: Decode MCA_STATUS in bit definition order EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit EDAC, mce_amd: Print ExtErrorCode and description on a single line EDAC, mce_amd: Match error descriptions to latest documentation x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types RAS: Add a MAINTAINERS entry RAS: Use consistent types for UUIDs x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models x86/MCE: Switch to use the new generic UUID API
2019-03-07Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kdump update from Ingo Molnar: "Add the AMD SME mask to the vmcoreinfo, and also document our vmcoreinfo fields" * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kdump: Document kernel data exported in the vmcoreinfo note x86/kdump: Export the SME mask to vmcoreinfo
2019-03-07Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "Three changes: - preparatory patch for AVX state tracking that computing-cluster folks would like to use for user-space batching - but we are not happy about the related ABI yet so this is only the kernel tracking side - a cleanup for CR0 handling in do_device_not_available() - plus we removed a workaround for an ancient binutils version" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Track AVX-512 usage of tasks x86/fpu: Get rid of CONFIG_AS_FXSAVEQ x86/traps: Have read_cr0() only once in the #NM handler
2019-03-07Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Various cleanups and simplifications, none of them really stands out, they are all over the place" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Remove unused __addr_ok() macro x86/smpboot: Remove unused phys_id variable x86/mm/dump_pagetables: Remove the unused prev_pud variable x86/fpu: Move init_xstate_size() to __init section x86/cpu_entry_area: Move percpu_setup_debug_store() to __init section x86/mtrr: Remove unused variable x86/boot/compressed/64: Explain paging_prepare()'s return value x86/resctrl: Remove duplicate MSR_MISC_FEATURE_CONTROL definition x86/asm/suspend: Drop ENTRY from local data x86/hw_breakpoints, kprobes: Remove kprobes ifdeffery x86/boot: Save several bytes in decompressor x86/trap: Remove useless declaration x86/mm/tlb: Remove unused cpu variable x86/events: Mark expected switch-case fall-throughs x86/asm-prototypes: Remove duplicate include <asm/page.h> x86/kernel: Mark expected switch-case fall-throughs x86/insn-eval: Mark expected switch-case fall-through x86/platform/UV: Replace kmalloc() and memset() with k[cz]alloc() calls x86/e820: Replace kmalloc() + memcpy() with kmemdup()
2019-03-07Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "Misc cleanups and a retpoline code generation optimization" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, retpolines: Raise limit for generating indirect calls from switch-case x86/build: Use the single-argument OUTPUT_FORMAT() linker script command x86/build: Specify elf_i386 linker emulation explicitly for i386 objects x86/build: Mark per-CPU symbols as absolute explicitly for LLD
2019-03-07Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Most of the changes center around the difficult problem of KASLR pinning down hot-removable memory regions. At the very early stage KASRL is making irreversible kernel address layout decisions we don't have full knowledge about the memory maps yet. So the changes from Chao Fan add this (parsing the RSDP table early), together with fixes from Borislav Petkov" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/compressed/64: Do not read legacy ROM on EFI system x86/boot: Correct RSDP parsing with 32-bit EFI x86/kexec: Fill in acpi_rsdp_addr from the first kernel x86/boot: Fix randconfig build error due to MEMORY_HOTREMOVE x86/boot: Fix cmdline_find_option() prototype visibility x86/boot/KASLR: Limit KASLR to extract the kernel in immovable memory only x86/boot: Parse SRAT table and count immovable memory regions x86/boot: Early parse RSDP and save it in boot_params x86/boot: Search for RSDP in memory x86/boot: Search for RSDP in the EFI tables x86/boot: Add "acpi_rsdp=" early parsing x86/boot: Copy kstrtoull() to boot/string.c x86/boot: Build the command line parsing code unconditionally
2019-03-06x86/unwind: Add hardcoded ORC entry for NULLJann Horn
When the ORC unwinder is invoked for an oops caused by IP==0, it currently has no idea what to do because there is no debug information for the stack frame of NULL. But if RIP is NULL, it is very likely that the last successfully executed instruction was an indirect CALL/JMP, and it is possible to unwind out in the same way as for the first instruction of a normal function. Hardcode a corresponding ORC entry. With an artificially-added NULL call in prctl_set_seccomp(), before this patch, the trace is: Call Trace: ? __x64_sys_prctl+0x402/0x680 ? __ia32_sys_prctl+0x6e0/0x6e0 ? __do_page_fault+0x457/0x620 ? do_syscall_64+0x6d/0x160 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 After this patch, the trace looks like this: Call Trace: __x64_sys_prctl+0x402/0x680 ? __ia32_sys_prctl+0x6e0/0x6e0 ? __do_page_fault+0x457/0x620 do_syscall_64+0x6d/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 prctl_set_seccomp() still doesn't show up in the trace because for some reason, tail call optimization is only disabled in builds that use the frame pointer unwinder. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20190301031201.7416-2-jannh@google.com
2019-03-06x86/unwind: Handle NULL pointer calls better in frame unwinderJann Horn
When the frame unwinder is invoked for an oops caused by a call to NULL, it currently skips the parent function because BP still points to the parent's stack frame; the (nonexistent) current function only has the first half of a stack frame, and BP doesn't point to it yet. Add a special case for IP==0 that calculates a fake BP from SP, then uses the real BP for the next frame. Note that this handles first_frame specially: Return information about the parent function as long as the saved IP is >=first_frame, even if the fake BP points below it. With an artificially-added NULL call in prctl_set_seccomp(), before this patch, the trace is: Call Trace: ? prctl_set_seccomp+0x3a/0x50 __x64_sys_prctl+0x457/0x6f0 ? __ia32_sys_prctl+0x750/0x750 do_syscall_64+0x72/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 After this patch, the trace is: Call Trace: prctl_set_seccomp+0x3a/0x50 __x64_sys_prctl+0x457/0x6f0 ? __ia32_sys_prctl+0x750/0x750 do_syscall_64+0x72/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20190301031201.7416-1-jannh@google.com
2019-03-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (159 commits) tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include proc: more robust bulk read test proc: test /proc/*/maps, smaps, smaps_rollup, statm proc: use seq_puts() everywhere proc: read kernel cpu stat pointer once proc: remove unused argument in proc_pid_lookup() fs/proc/thread_self.c: code cleanup for proc_setup_thread_self() fs/proc/self.c: code cleanup for proc_setup_self() proc: return exit code 4 for skipped tests mm,mremap: bail out earlier in mremap_to under map pressure mm/sparse: fix a bad comparison mm/memory.c: do_fault: avoid usage of stale vm_area_struct writeback: fix inode cgroup switching comment mm/huge_memory.c: fix "orig_pud" set but not used mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC mm/memcontrol.c: fix bad line in comment mm/cma.c: cma_declare_contiguous: correct err handling mm/page_ext.c: fix an imbalance with kmemleak mm/compaction: pass pgdat to too_many_isolated() instead of zone mm: remove zone_lru_lock() function, access ->lru_lock directly ...
2019-03-06Merge branch 'x86-alternatives-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 alternative instruction updates from Ingo Molnar: "Small RDTSCP opimization, enabled by the newly added ALTERNATIVE_3(), and other small improvements" * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/TSC: Use RDTSCP x86/alternatives: Add an ALTERNATIVE_3() macro x86/alternatives: Print containing function x86/alternatives: Add macro comments
2019-03-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of tooling updates - too many to list, here's a few highlights: - Various subcommand updates to 'perf trace', 'perf report', 'perf record', 'perf annotate', 'perf script', 'perf test', etc. - CPU and NUMA topology and affinity handling improvements, - HW tracing and HW support updates: - Intel PT updates - ARM CoreSight updates - vendor HW event updates - BPF updates - Tons of infrastructure updates, both on the build system and the library support side - Documentation updates. - ... and lots of other changes, see the changelog for details. Kernel side updates: - Tighten up kprobes blacklist handling, reduce the number of places where developers can install a kprobe and hang/crash the system. - Fix/enhance vma address filter handling. - Various PMU driver updates, small fixes and additions. - refcount_t conversions - BPF updates - error code propagation enhancements - misc other changes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits) perf script python: Add Python3 support to syscall-counts-by-pid.py perf script python: Add Python3 support to syscall-counts.py perf script python: Add Python3 support to stat-cpi.py perf script python: Add Python3 support to stackcollapse.py perf script python: Add Python3 support to sctop.py perf script python: Add Python3 support to powerpc-hcalls.py perf script python: Add Python3 support to net_dropmonitor.py perf script python: Add Python3 support to mem-phys-addr.py perf script python: Add Python3 support to failed-syscalls-by-pid.py perf script python: Add Python3 support to netdev-times.py perf tools: Add perf_exe() helper to find perf binary perf script: Handle missing fields with -F +.. perf data: Add perf_data__open_dir_data function perf data: Add perf_data__(create_dir|close_dir) functions perf data: Fail check_backup in case of error perf data: Make check_backup work over directories perf tools: Add rm_rf_perf_data function perf tools: Add pattern name checking to rm_rf perf tools: Add depth checking to rm_rf perf data: Add global path holder ...
2019-03-06Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main EFI changes in this cycle were: - Use 32-bit alignment for efi_guid_t - Allow the SetVirtualAddressMap() call to be omitted - Implement earlycon=efifb based on existing earlyprintk code - Various minor fixes and code cleanups from Sai, Ard and me" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: Fix build error due to enum collision between efi.h and ima.h efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted efi: Replace GPL license boilerplate with SPDX headers efi/fdt: Apply more cleanups efi: Use 32-bit alignment for efi_guid_t efi/memattr: Don't bail on zero VA if it equals the region's PA x86/efi: Mark can_free_region() as an __init function
2019-03-05numa: make "nr_node_ids" unsigned intAlexey Dobriyan
Number of NUMA nodes can't be negative. This saves a few bytes on x86_64: add/remove: 0/0 grow/shrink: 4/21 up/down: 27/-265 (-238) Function old new delta hv_synic_alloc.cold 88 110 +22 prealloc_shrinker 260 262 +2 bootstrap 249 251 +2 sched_init_numa 1566 1567 +1 show_slab_objects 778 777 -1 s_show 1201 1200 -1 kmem_cache_init 346 345 -1 __alloc_workqueue_key 1146 1145 -1 mem_cgroup_css_alloc 1614 1612 -2 __do_sys_swapon 4702 4699 -3 __list_lru_init 655 651 -4 nic_probe 2379 2374 -5 store_user_store 118 111 -7 red_zone_store 106 99 -7 poison_store 106 99 -7 wq_numa_init 348 338 -10 __kmem_cache_empty 75 65 -10 task_numa_free 186 173 -13 merge_across_nodes_store 351 336 -15 irq_create_affinity_masks 1261 1246 -15 do_numa_crng_init 343 321 -22 task_numa_fault 4760 4737 -23 swapfile_init 179 156 -23 hv_synic_alloc 536 492 -44 apply_wqattrs_prepare 746 695 -51 Link: http://lkml.kernel.org/r/20190201223029.GA15820@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/pti update from Thomas Gleixner: "Just a single change from the anti-performance departement: - Add a new PR_SPEC_DISABLE_NOEXEC option which allows to apply the speculation protections on a process without inheriting the state on exec. This remedies a situation where a Java-launcher has speculation protections enabled because that's the default for JVMs which causes the launched regular harmless processes to inherit the protection state which results in unintended performance degradation" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Add PR_SPEC_DISABLE_NOEXEC
2019-03-05x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ↵Steven Rostedt (VMware)
ftrace_call_replace() Arnd reported the following compiler warning: arch/x86/kernel/ftrace.c:669:23: error: 'ftrace_jmp_replace' defined but not used [-Werror=unused-function] The ftrace_jmp_replace() function now only has a single user and should be simply moved by that user. But looking at the code, it shows that ftrace_jmp_replace() is similar to ftrace_call_replace() except that instead of using the opcode of 0xe8 it uses 0xe9. It makes more sense to consolidate that function into one implementation that both ftrace_jmp_replace() and ftrace_call_replace() use by passing in the op code separate. The structure in ftrace_code_union is also modified to replace the "e8" field with the more appropriate name "op". Cc: stable@vger.kernel.org Reported-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/20190304200748.1418790-1-arnd@arndb.de Fixes: d2a68c4effd8 ("x86/ftrace: Do not call function graph from dynamic trampolines") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-28x86/Hyper-V: Set x2apic destination mode to physical when x2apic is availableLan Tianyu
Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic, set x2apic destination mode to physcial mode when x2apic is available and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have 8-bit APIC id. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-02-28kernfs, sysfs, cgroup, intel_rdt: Support fs_contextDavid Howells
Make kernfs support superblock creation/mount/remount with fs_context. This requires that sysfs, cgroup and intel_rdt, which are built on kernfs, be made to support fs_context also. Notes: (1) A kernfs_fs_context struct is created to wrap fs_context and the kernfs mount parameters are moved in here (or are in fs_context). (2) kernfs_mount{,_ns}() are made into kernfs_get_tree(). The extra namespace tag parameter is passed in the context if desired (3) kernfs_free_fs_context() is provided as a destructor for the kernfs_fs_context struct, but for the moment it does nothing except get called in the right places. (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to pass, but possibly this should be done anyway in case someone wants to add a parameter in future. (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and the cgroup v1 and v2 mount parameters are all moved there. (6) cgroup1 parameter parsing error messages are now handled by invalf(), which allows userspace to collect them directly. (7) cgroup1 parameter cleanup is now done in the context destructor rather than in the mount/get_tree and remount functions. Weirdies: (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held, but then uses the resulting pointer after dropping the locks. I'm told this is okay and needs commenting. (*) The cgroup refcount web. This really needs documenting. (*) cgroup2 only has one root? Add a suggestion from Thomas Gleixner in which the RDT enablement code is placed into its own function. [folded a leak fix from Andrey Vagin] Signed-off-by: David Howells <dhowells@redhat.com> cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Tejun Heo <tj@kernel.org> cc: Li Zefan <lizefan@huawei.com> cc: Johannes Weiner <hannes@cmpxchg.org> cc: cgroups@vger.kernel.org cc: fenghua.yu@intel.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-25x86/mce: Improve error message when kernel cannot recover, p2Tony Luck
In c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover") a case was added for a machine check caused by a DATA access to poison memory from the kernel. A case should have been added also for an uncorrectable error during an instruction fetch in the kernel. Add that extra case so the error message now reads: mce: [Hardware Error]: Machine check: Instruction fetch error in kernel Fixes: c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Pu Wen <puwen@hygon.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190225205940.15226-1-tony.luck@intel.com
2019-02-20x86: kvmguest: use TSC clocksource if invariant TSC is exposedMarcelo Tosatti
The invariant TSC bit has the following meaning: "The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." IOW, TSC does not change frequency. In such case, and with TSC scaling hardware available to handle migration, it is possible to use the TSC clocksource directly, whose system calls are faster. Reduce the rating of kvmclock clocksource to allow TSC clocksource to be the default if invariant TSC is exposed. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> v2: Use feature bits and tsc_unstable() check (Sean Christopherson) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20x86/kvmclock: set offset for kvm unstable clockPavel Tatashin
VMs may show incorrect uptime and dmesg printk offsets on hypervisors with unstable clock. The problem is produced when VM is rebooted without exiting from qemu. The fix is to calculate clock offset not only for stable clock but for unstable clock as well, and use kvm_sched_clock_read() which substracts the offset for both clocks. This is safe, because pvclock_clocksource_read() does the right thing and makes sure that clock always goes forward, so once offset is calculated with unstable clock, we won't get new reads that are smaller than offset, and thus won't get negative results. Thank you Jon DeVree for helping to reproduce this issue. Fixes: 857baa87b642 ("sched/clock: Enable sched clock early") Cc: stable@vger.kernel.org Reported-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-18x86/smpboot: Remove unused phys_id variableShaokun Zhang
The 'phys_id' local variable became unused after commit ce4b1b16502b ("x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"). Remove it. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alison Schofield <alison.schofield@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pu Wen <puwen@hygon.cn> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/1550495101-41755-1-git-send-email-zhangshaokun@hisilicon.com
2019-02-18x86: respect memory size limiting via mem= parameterJuergen Gross
When limiting memory size via kernel parameter "mem=" this should be respected even in case of memory made accessible via a PCI card. Today this kind of memory won't be made usable in initial memory setup as the memory won't be visible in E820 map, but it might be added when adding PCI devices due to corresponding ACPI table entries. Not respecting "mem=" can be corrected by adding a global max_mem_size variable set by parse_memopt() which will result in rejecting adding memory areas resulting in a memory size above the allowed limit. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-02-13x86/kprobes: Prohibit probing on IRQ handlers directlyMasami Hiramatsu
Prohibit probing on IRQ handlers in irqentry_text because if it interrupts user mode, at that point we haven't changed to kernel space yet and which eventually leads a double fault. E.g. # echo p apic_timer_interrupt > kprobe_events # echo 1 > events/kprobes/enable PANIC: double fault, error_code: 0x0 CPU: 1 PID: 814 Comm: less Not tainted 4.20.0-rc3+ #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:error_entry+0x12/0xf0 [snip] Call Trace: <ENTRY_TRAMPOLINE> ? native_iret+0x7/0x7 ? async_page_fault+0x8/0x30 ? trace_hardirqs_on_thunk+0x1c/0x1c ? error_entry+0x7c/0xf0 ? async_page_fault+0x8/0x30 ? native_iret+0x7/0x7 ? int3+0xa/0x20 ? trace_hardirqs_on_thunk+0x1c/0x1c ? error_entry+0x7c/0xf0 ? int3+0xa/0x20 ? apic_timer_interrupt+0x1/0x20 </ENTRY_TRAMPOLINE> Kernel panic - not syncing: Machine halted. Kernel Offset: disabled ---[ end Kernel panic - not syncing: Machine halted. ]--- Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998796400.31052.8406236614820687840.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13x86/kprobes: Prohibit probing on functions before kprobe_int3_handler()Masami Hiramatsu
Prohibit probing on the functions called before kprobe_int3_handler() in do_int3(). More specifically, ftrace_int3_handler(), poke_int3_handler(), and ist_enter(). And since rcu_nmi_enter() is called by ist_enter(), it also should be marked as NOKPROBE_SYMBOL. Since those are handled before kprobe_int3_handler(), probing those functions can cause a breakpoint recursion and crash the kernel. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998793571.31052.11301258949601150994.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13x86/kprobes: Move trampoline code into RODATAMasami Hiramatsu
Move optprobe trampoline code into RODATA since it is not executed, but copied and modified to be used on a trampoline buffer. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998790744.31052.3016106262944915510.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13x86/kprobes: Prohibit probing on optprobe template codeMasami Hiramatsu
Prohibit probing on optprobe template code, since it is not a code but a template instruction sequence. If we modify this template, copied template must be broken. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 9326638cbee2 ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation") Link: http://lkml.kernel.org/r/154998787911.31052.15274376330136234452.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11Merge tag 'v5.0-rc6' into x86/fpu, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11Merge branch 'x86/cpu' into perf/core, to pick up dependent commitIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11x86/cpufeature: Add facility to check for min microcode revisionsKan Liang
For bug workarounds or checks, it is useful to check for specific microcode revisions. Add a new generic function to match the CPU with stepping. Add the other function to check the min microcode revisions for the matched CPU. A new table format is introduced to facilitate the quirk to fill the related information. This does not change the existing x86_cpu_id because it's an ABI shared with modules, and also has quite different requirements, as in no wildcards, but everything has to be matched exactly. Originally-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: eranian@google.com Link: https://lkml.kernel.org/r/1549319013-4522-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-10Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A handful of fixes: - Fix an MCE corner case bug/crash found via MCE injection testing - Fix 5-level paging boot crash - Fix MCE recovery cache invalidation bug - Fix regression on Xen guests caused by a recent PMD level mremap speedup optimization" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Make set_pmd_at() paravirt aware x86/mm/cpa: Fix set_mce_nospec() x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()