path: root/arch

2018-04-26  x86/vector: Remove the macro VECTOR_OFFSET_START  (Dou Liyang)

Linux now uses the matrix allocator for vector assignment, and the original assignment code which used VECTOR_OFFSET_START has been removed. So remove the stale macro as well.

Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180425020553.17210-1-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-26x86/cpufeatures: Enumerate cldemote instructionFenghua Yu
cldemote is a new instruction in future x86 processors. It hints to hardware that a specified cache line should be moved ("demoted") from the cache(s) closest to the processor core to a level more distant from the processor core. This instruction is faster than snooping to make the cache line available for other cores. cldemote instruction is indicated by the presence of the CPUID feature flag CLDEMOTE (CPUID.(EAX=0x7, ECX=0):ECX[bit25]). More details on cldemote instruction can be found in the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: "Ashok Raj" <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1524508162-192587-1-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
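
For illustration (not part of the commit itself), a minimal user-space check for the CLDEMOTE feature bit, using the <cpuid.h> helpers shipped with GCC and clang:

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* CPUID leaf 0x7, subleaf 0: structured extended feature flags */
          if (__get_cpuid_count(0x7, 0, &eax, &ebx, &ecx, &edx) &&
              (ecx & (1u << 25)))        /* CLDEMOTE: CPUID.(7,0):ECX[25] */
                  puts("cldemote supported");
          else
                  puts("cldemote not supported");
          return 0;
  }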
2018-04-25perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1Kan Liang
The SMM freeze feature was introduced since PerfMon V2. But the current code unconditionally enables the feature for all platforms. It can generate #GP exception, if the related FREEZE_WHILE_SMM bit is set for the machine with PerfMon V1. To disable the feature for PerfMon V1, perf needs to - Remove the freeze_on_smi sysfs entry by moving intel_pmu_attrs to intel_pmu, which is only applied to PerfMon V2 and later. - Check the PerfMon version before flipping the SMM bit when starting CPU Fixes: 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ak@linux.intel.com Cc: eranian@google.com Cc: acme@redhat.com Link: https://lkml.kernel.org/r/1524682637-63219-1-git-send-email-kan.liang@linux.intel.com
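
The architectural PerfMon version is itself enumerated via CPUID leaf 0xA (EAX[7:0]); a user-space sketch of the version gate described above (the helper names are illustrative):

  #include <cpuid.h>

  /* Architectural PerfMon version: CPUID.0xA:EAX[7:0]. */
  static int perfmon_version(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid(0xa, &eax, &ebx, &ecx, &edx))
                  return 0;
          return eax & 0xff;
  }

  /* Only expose/flip the freeze-on-SMI control on PerfMon V2+. */
  static int can_freeze_on_smi(void)
  {
          return perfmon_version() >= 2;
  }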

2018-04-25  tracing/x86: Update syscall trace events to handle new prefixed syscall func names  (Steven Rostedt (VMware))

Arnaldo noticed that the latest kernel is missing the syscall event system directory in x86. I bisected it down to d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()").

The system call trace events are special, as there is only one trace event for all system calls (the raw_syscalls). But a macro that wraps the system calls creates meta data for them that copies the name to find the system call that maps to the system call table (the number). At boot up, it does a kallsyms lookup of the system call table to find the function that maps to the meta data of the system call. If it does not find a function, then that system call is ignored.

Because the x86 system calls had "__x64_" or "__ia32_" prefixed to the "sys" of the names, they do not match the default compare algorithm. As this was a problem for PowerPC, the algorithm can be overridden by the architecture. The solution is to have x86 supply its own algorithm to do the compare, and this brings back the system call trace events.

Link: http://lkml.kernel.org/r/20180417174128.0f3457f0@gandalf.local.home
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
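
A sketch of the idea behind such an arch-specific compare; the function name and exact prefix handling here are illustrative, not the patch's code:

  #include <string.h>

  static int match_sym_name(const char *sym, const char *name)
  {
          /* Skip the x86 ptregs-stub prefixes before comparing. */
          if (!strncmp(sym, "__x64_", 6))
                  sym += 6;
          else if (!strncmp(sym, "__ia32_", 7))
                  sym += 7;

          /* "name" is the plain "sys_foo" from the syscall metadata. */
          return strcmp(sym, name) == 0;
  }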

2018-04-25  Merge tag 'kvmarm-fixes-for-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm  (Radim Krčmář)

KVM/arm fixes for 4.17, take #1

 - PSCI selection API, a leftover from 4.16
 - Kick vcpu on active interrupt affinity change
 - Plug a VMID allocation race on oversubscribed systems
 - Silence debug messages
 - Update Christoffer's email address
2018-04-25powerpc: Fix smp_send_stop NMI IPI handlingNicholas Piggin
The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count over the handler function call, which causes later smp_send_nmi_ipi() callers to spin until the call is finished. The stop_this_cpu() function never returns, so the busy count is never decremeted, which can cause the system to hang in some cases. For example panic() will call smp_send_stop() early on which calls stop_this_cpu() on other CPUs, then later in the reboot path, pnv_restart() will call smp_send_stop() again, which hangs. Fix this by adding a special case to the stop_this_cpu() handler to decrement the busy count, because it will never return. Now that the NMI/non-NMI versions of stop_this_cpu() are different, split them out into separate functions rather than doing #ifdef tricks to share the body between the two functions. Fixes: 6bed3237624e3 ("powerpc: use NMI IPI for smp_send_stop") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out the functions, tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
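
A sketch of the shape of the fix, with names modeled on the commit text (illustrative, not the actual patch):

  static void nmi_stop_this_cpu(struct pt_regs *regs)
  {
          /*
           * This handler never returns, so the generic NMI IPI code would
           * never see it complete, and later smp_send_nmi_ipi() callers
           * would spin forever: drop the busy count here instead.
           */
          nmi_ipi_lock();
          nmi_ipi_busy_count--;
          nmi_ipi_unlock();

          /* park this CPU for good */
          while (1)
                  cpu_relax();
  }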

2018-04-25  x86/pti: Filter at vma->vm_page_prot population  (Dave Hansen)

commit ce9962bf7e22bb3891655c349faff618922d4a73

0day reported warnings at boot on 32-bit systems without NX support:

  attempted to set unsupported pgprot: 8000000000000025 bits: 8000000000000000 supported: 7fffffffffffffff
  WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/pgtable.h:540 handle_mm_fault+0xfc1/0xfe0:
   check_pgprot at arch/x86/include/asm/pgtable.h:535
   (inlined by) pfn_pte at arch/x86/include/asm/pgtable.h:549
   (inlined by) do_anonymous_page at mm/memory.c:3169
   (inlined by) handle_pte_fault at mm/memory.c:3961
   (inlined by) __handle_mm_fault at mm/memory.c:4087
   (inlined by) handle_mm_fault at mm/memory.c:4124

The problem is that, due to the recent commit which removed auto-massaging of page protections, filtering page permissions at PTE creation time is no longer done, so vma->vm_page_prot is passed unfiltered to PTE creation.

Filter the page protections before they are installed in vma->vm_page_prot.

Fixes: fb43d6cb91 ("x86/mm: Do not auto-massage page protections")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: https://lkml.kernel.org/r/20180420222028.99D72858@viggo.jf.intel.com
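
A sketch of the kind of filtering meant here, with a hypothetical helper name; protection bits the CPU cannot represent (e.g. the NX bit on non-NX hardware) get masked off before they can reach vma->vm_page_prot:

  static pgprot_t filter_pgprot(pgprot_t prot)
  {
          /* drop any protection bits this CPU does not support */
          return __pgprot(pgprot_val(prot) & __supported_pte_mask);
  }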

2018-04-25  x86/pti: Disallow global kernel text with RANDSTRUCT  (Dave Hansen)

commit 26d35ca6c3776784f8156e1d6f80cc60d9a2a915

RANDSTRUCT derives its hardening benefits from the attacker's lack of knowledge about the layout of kernel data structures. Keep the kernel image non-global in cases where RANDSTRUCT is in use to help keep the layout a secret.

Fixes: 8c06c7740 ("x86/pti: Leave kernel text global for !PCID")
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lkml.kernel.org/r/20180420222026.D0B4AAC9@viggo.jf.intel.com

2018-04-25  x86/pti: Reduce amount of kernel text allowed to be Global  (Dave Hansen)

commit abb67605203687c8b7943d760638d0301787f8d9

Kees reported to me that I made too much of the kernel image global. It was far more than just text:

	I think this is too much set global: _end is after data, bss, and
	brk, and all kinds of other stuff that could hold secrets. I think
	this should match what mark_rodata_ro() is doing.

This does exactly that. We use __end_rodata_hpage_align as our marker both because it is huge-page-aligned and because it does not contain any sections we expect to hold secrets. Kees's logic was that r/o data is in the kernel image anyway and, in the case of traditional distributions, can be freely downloaded from the web, so there's no reason to hide it.

Fixes: 8c06c7740 ("x86/pti: Leave kernel text global for !PCID")
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: https://lkml.kernel.org/r/20180420222023.1C8B2B20@viggo.jf.intel.com

2018-04-25  x86/pti: Fix boot warning from Global-bit setting  (Dave Hansen)

commit 231df823c4f04176f607afc4576c989895cff40e

The pageattr.c code attempts to process "faults" when it goes looking for PTEs to change and finds non-present entries. It allows these faults in the linear map, which is "expected to have holes", but WARN()s about them elsewhere, like when called on the kernel image.

However, change_page_attr_clear() is now called on the kernel image in the process of trying to clear the Global bit. This trips the warning in __cpa_process_fault() if a non-present PTE is encountered in the kernel image. The "holes" in the kernel image result from free_init_pages()'s use of set_memory_np(). These holes are totally fine, and result from normal operation, just as they would in the kernel linear map.

Just silence the warning when holes in the kernel image are encountered.

Fixes: 39114b7a7 ("x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image")
Reported-by: Mariusz Ceier <mceier@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Kees Cook <keescook@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: https://lkml.kernel.org/r/20180420222021.1C7D2B3F@viggo.jf.intel.com

2018-04-25  x86/pti: Fix boot problems from Global-bit setting  (Dave Hansen)

commit 16dce603adc9de4237b7bf2ff5c5290f34373e7b

Part of the Global bit _setting_ patches also includes clearing the Global bit when it should not be enabled. That is done with set_memory_nonglobal(), which uses change_page_attr_clear() in pageattr.c under the covers.

The TLB flushing code inside pageattr.c has checks like BUG_ON(irqs_disabled()), looking for interrupt disabling that might cause deadlocks. But these also trip in early boot on certain preempt configurations. Just copy the existing BUG_ON() sequence from cpa_flush_range() to the other two sites and check for early boot.

Fixes: 39114b7a7 ("x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image")
Reported-by: Mariusz Ceier <mceier@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Kees Cook <keescook@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: https://lkml.kernel.org/r/20180420222019.20C4A410@viggo.jf.intel.com
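
The check being copied is of this form (shown in a stub for context; the surrounding function is illustrative):

  static void cpa_flush_example(void)
  {
          /*
           * Complain about flushing with IRQs disabled (a deadlock risk),
           * but not during early boot, when IRQs are legitimately still off.
           */
          BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);

          /* ... perform the TLB flush ... */
  }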

2018-04-25  rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops  (Nicholas Piggin)

The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies; latencies of up to 50 seconds have been observed here when the RTC stops responding (a BMC reboot can cause that). Fix this by converting it to the standard form of OPAL_BUSY loop that sleeps.

Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
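
The standard form referred to above looks roughly like this (a sketch; the delay constant and error mapping are illustrative):

  #define OPAL_BUSY_DELAY_MS 10

  static int opal_rtc_read_example(__be32 *y_m_d, __be64 *h_m_s_ms)
  {
          s64 rc = OPAL_BUSY;

          while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
                  rc = opal_rtc_read(y_m_d, h_m_s_ms);
                  if (rc == OPAL_BUSY_EVENT)
                          opal_poll_events(NULL);
                  else if (rc == OPAL_BUSY)
                          msleep(OPAL_BUSY_DELAY_MS);   /* sleep, don't spin */
          }
          return rc == OPAL_SUCCESS ? 0 : -EIO;
  }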

2018-04-24  arm64: support __int128 with clang  (Jason A. Donenfeld)

Commit fb8722735f50 ("arm64: support __int128 on gcc 5+") added support for arm64 __int128 with gcc, guarded by a version conditional, but neglected to enable this for clang, which in fact appears to support aarch64 __int128. This commit therefore enables it if the compiler is clang, using the same type of makefile conditional used elsewhere in the tree.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

2018-04-24  arm64: only advance singlestep for user instruction traps  (Mark Rutland)

Our arm64_skip_faulting_instruction() helper advances the userspace singlestep state machine, but this is also called by the kernel BRK handler, as used for WARN*(). Thus, if we happen to hit a WARN*() while the user singlestep state machine is in the active-no-pending state, we'll advance to the active-pending state without having executed a user instruction, and will take a step exception earlier than expected when we return to userspace.

Let's fix this by only advancing the state machine when skipping a user instruction.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
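
A sketch along the lines of the fix described above:

  void arm64_skip_faulting_instruction(struct pt_regs *regs, unsigned long size)
  {
          regs->pc += size;

          /*
           * A kernel-mode BRK (e.g. from WARN*()) must not advance the
           * user single-step state machine.
           */
          if (user_mode(regs))
                  user_fastforward_single_step(current);
  }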

2018-04-24  arm64/kernel: rename module_emit_adrp_veneer->module_emit_veneer_for_adrp  (Kim Phillips)

Commit a257e02579e ("arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419") introduced a function whose name ends with "_veneer". This clashes with commit bd8b22d2888e ("Kbuild: kallsyms: ignore veneers emitted by the ARM linker"), which removes symbols ending in "_veneer" from kallsyms.

The problem manifested as 'perf test -vvvvv vmlinux' failing, correctly claiming the symbol 'module_emit_adrp_veneer' was present in vmlinux, but not in kallsyms:

  ...
  ERR : 0xffff00000809aa58: module_emit_adrp_veneer not on kallsyms
  ...
  test child finished with -1
  ---- end ----
  vmlinux symtab matches kallsyms: FAILED!

Fix the problem by renaming module_emit_adrp_veneer to module_emit_veneer_for_adrp. Now the test passes.

Fixes: a257e02579e ("arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

2018-04-24  arm64: ptrace: remove addr_limit manipulation  (Mark Rutland)

We transiently switch to KERNEL_DS in compat_ptrace_gethbpregs() and compat_ptrace_sethbpregs(), but in either case this is pointless as we don't perform any uaccess during this window. Let's rip out the redundant addr_limit manipulation.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

2018-04-24  RISC-V: build vdso-dummy.o with -no-pie  (Aurelien Jarno)

The Debian toolchain defaults to PIE, and that is likely also the case for most distributions. This causes the following build failure:

  AS      arch/riscv/kernel/vdso/getcpu.o
  AS      arch/riscv/kernel/vdso/flush_icache.o
  VDSOLD  arch/riscv/kernel/vdso/vdso.so.dbg
  OBJCOPY arch/riscv/kernel/vdso/vdso.so
  AS      arch/riscv/kernel/vdso/vdso.o
  VDSOLD  arch/riscv/kernel/vdso/vdso-dummy.o
  LD      arch/riscv/kernel/vdso/vdso-syms.o
  riscv64-linux-gnu-ld: attempted static link of dynamic object `arch/riscv/kernel/vdso/vdso-dummy.o'
  make[2]: *** [arch/riscv/kernel/vdso/Makefile:43: arch/riscv/kernel/vdso/vdso-syms.o] Error 1
  make[1]: *** [scripts/Makefile.build:575: arch/riscv/kernel/vdso] Error 2
  make: *** [Makefile:1018: arch/riscv/kernel] Error 2

While the root Makefile correctly passes "-fno-PIE" to build individual object files, the RISC-V kernel also builds vdso-dummy.o as an executable, which is therefore linked as PIE. Fix that by updating this specific link rule to also include "-no-pie".

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>

2018-04-24  riscv: there is no <asm/handle_irq.h>  (Christoph Hellwig)

So don't list it as generic-y.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>

2018-04-24  riscv: select DMA_DIRECT_OPS instead of redefining it  (Christoph Hellwig)

DMA_DIRECT_OPS is defined in lib/Kconfig, so don't duplicate it in arch/riscv/Kconfig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>

2018-04-24  arm64: mm: drop addr parameter from sync icache and dcache  (Shaokun Zhang)

The addr parameter isn't used for anything. Let's simplify and get rid of it, like arm.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

2018-04-24  x86/microcode: Do not exit early from __reload_late()  (Borislav Petkov)

Vitezslav reported a case where the "Timeout during microcode update!" panic would hit. After a deeper look, it turned out that his .config had CONFIG_HOTPLUG_CPU disabled, which practically made save_mc_for_early() a no-op.

When that happened, the discovered microcode patch wasn't saved into the cache and the late loading path wouldn't find any. This, then, led to an early exit from __reload_late() and thus to CPUs waiting until the timeout is reached, leading to the panic.

In hindsight, that function should have been written so it does not return before the post-synchronization. Oh well, I know better now...

Fixes: bb8c13d61a62 ("x86/microcode: Fix CPU synchronization routine")
Reported-by: Vitezslav Samel <vitezslav@samel.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitezslav Samel <vitezslav@samel.cz>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz
Link: https://lkml.kernel.org/r/20180421081930.15741-2-bp@alien8.de

2018-04-24  x86/microcode/intel: Save microcode patch unconditionally  (Borislav Petkov)

save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU, but the generic_load_microcode() path saves the microcode patches it has found into the cache of patches which is used for late loading too, regardless of whether CPU hotplug is used or not.

Make the saving unconditional so that late loading can find the proper patch.

Reported-by: Vitezslav Samel <vitezslav@samel.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitezslav Samel <vitezslav@samel.cz>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz
Link: https://lkml.kernel.org/r/20180421081930.15741-1-bp@alien8.de

2018-04-24  powerpc/mce: Fix a bug where mce loops on memory UE.  (Mahesh Salgaonkar)

The current code extracts the physical address for UE errors and then hooks it up into the memory failure infrastructure. On successful extraction of the physical address it wrongly sets "handled = 1", which means this UE error has been recovered. Since the MCE handler gets the return value handled = 1, it assumes that the error has been recovered and goes back to the same NIP. This causes the MCE interrupt to fire again and again in a loop, leading to a hard lockup.

Also, initialize phys_addr to ULONG_MAX so that we don't end up queuing an undesired page to hwpoison.

Without this patch we see:

  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  ...
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  ...
  Watchdog CPU:38 Hard LOCKUP

After this patch we see:

  Severe Machine check interrupt [Not recovered]
    NIP: [00007fffaae585f4] PID: 7168 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffaafe28ac
      Physical address:  00002017c0bd0000
  find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
  Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered

Fixes: 01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors")
Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2018-04-24  powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range  (Alistair Popple)

The NPU has a limited number of address translation shootdown (ATSD) registers and the GPU has limited bandwidth to process ATSDs. This can result in contention of ATSD registers, leading to soft lockups on some threads, particularly when invalidating a large address range in pnv_npu2_mn_invalidate_range().

At some threshold it becomes more efficient to flush the entire GPU TLB for the given MM context (PID) than to individually flush each address in the range. This patch will result in ranges greater than 2MB being converted from 32+ ATSDs into a single ATSD which will flush the TLB for the given PID on each GPU.

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
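
A sketch of the heuristic, with the threshold taken from the commit text and illustrative helper names:

  #define ATSD_THRESHOLD (2 * 1024 * 1024)        /* 2MB, per the commit */

  static void npu_invalidate_range(struct npu_context *ctx,
                                   unsigned long start, unsigned long end)
  {
          if (end - start > ATSD_THRESHOLD) {
                  /* one ATSD flushing everything for this PID */
                  mmio_invalidate_pid(ctx);
          } else {
                  unsigned long addr;

                  /* one ATSD per page in the range */
                  for (addr = start; addr < end; addr += PAGE_SIZE)
                          mmio_invalidate_va(ctx, addr);
          }
  }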

2018-04-24  powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_context() callback parameters  (Alistair Popple)

There is a single npu context per set of callback parameters. Callers should be prevented from overwriting existing callback values, so instead return an error if different parameters are passed.

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2018-04-24  powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroy  (Alistair Popple)

The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are used to allocate/free contexts to allow address translation and shootdown by the NPU on a particular GPU. Context initialisation is implicitly safe as it is protected by the requirement that mmap_sem be held in write mode; however, pnv_npu2_destroy_context() does not require mmap_sem to be held and it is not safe to call it with a concurrent initialisation for a different GPU.

It was assumed the driver would ensure destruction was not called concurrently with initialisation. However, the driver may be simplified by allowing concurrent initialisation and destruction for different GPUs. As npu context creation/destruction is not a performance-critical path and the critical section is not large, a single spinlock is used for simplicity.

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
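
A sketch of the serialization described (the lock and function names are illustrative):

  static DEFINE_SPINLOCK(npu_context_lock);

  void npu2_destroy_context_example(struct npu_context *ctx)
  {
          /*
           * Init and destroy for *different* GPUs may now run
           * concurrently; serialize both on one coarse lock since
           * this is not a hot path.
           */
          spin_lock(&npu_context_lock);
          /* ... tear down the context, update shared tables ... */
          spin_unlock(&npu_context_lock);
  }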

2018-04-24  powerpc/powernv/memtrace: Let the arch hotunplug code flush cache  (Balbir Singh)

Don't do this via custom code; instead, now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size.

Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
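
For illustration, a dcbf-based flush loop; note the step must be the cache *line* size, not the total cache size (confusing the two was the bug described above):

  static void flush_dcache_range_example(unsigned long start, unsigned long stop)
  {
          unsigned long line = ppc64_caches.l1d.line_size;
          unsigned long addr;

          for (addr = start; addr < stop; addr += line)
                  asm volatile("dcbf 0,%0" : : "r" (addr) : "memory");
          asm volatile("sync" : : : "memory");    /* order the flushes */
  }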

2018-04-24  powerpc/mm: Flush cache on memory hot(un)plug  (Balbir Singh)

This patch adds support for flushing potentially dirty cache lines when memory is hot-plugged/hot-un-plugged. The support is currently limited to 64-bit systems.

The bug was exposed when mappings for a device were actually hot-unplugged and plugged back in later. A similar issue was observed during the development of memtrace, but memtrace does its own flushing of the region via a custom routine.

These patches do a flush on both hotplug and unplug to clear any stale data in the cache w.r.t. mappings; there is a small race window where a clean cache line may be created again just prior to tearing down the mapping.

The patches were tested by disabling the flush routines in memtrace and doing I/O on the trace file. The system immediately checkstops (quite reliably if, prior to the hot-unplug of the memtrace region, we memset the regions we are about to hot-unplug). After these patches no custom flushing is needed in the memtrace code.

Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2018-04-23  arm64: add sentinel to kpti_safe_list  (Mark Rutland)

We're missing a sentinel entry in kpti_safe_list. Thus is_midr_in_range_list() can walk past the end of kpti_safe_list. Depending on the contents of memory, this could erroneously match a CPU's MIDR, cause a data abort, or other bad outcomes.

Add the sentinel entry to avoid this.

Fixes: be5b299830c63ed7 ("arm64: capabilities: Add support for checks based on a list of MIDRs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
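
A sketch of a MIDR range list with the required terminating sentinel (the entries shown are illustrative):

  static const struct midr_range kpti_safe_list[] = {
          MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2),
          MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN),
          { /* sentinel: zeroed entry stops is_midr_in_range_list() */ }
  };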

2018-04-23  x86/jailhouse: Fix incorrect SPDX identifier  (Thomas Gleixner)

GPL2.0 is not a valid SPDX identifier. Replace it with GPL-2.0.

Fixes: 4a362601baa6 ("x86/jailhouse: Add infrastructure for running in non-root cell")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Link: https://lkml.kernel.org/r/20180422220832.815346488@linutronix.de
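
For reference, the corrected tag as it appears at the top of a C source file:

  // SPDX-License-Identifier: GPL-2.0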

2018-04-23  s390: correct module section names for expoline code revert  (Martin Schwidefsky)

The main linker script vmlinux.lds.S for the kernel image merges the expoline code patch tables into two sections, ".nospec_call_table" and ".nospec_return_table". This is *not* done for the modules; there the sections retain their original names as generated by gcc: ".s390_indirect_call", ".s390_return_mem" and ".s390_return_reg".

The module_finalize code has to check for the compiler-generated section names, otherwise no code patching is done. This slows down the module code in case of "spectre_v2=off".

Cc: stable@vger.kernel.org # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

2018-04-23  s390: update sampling tag after task pid change  (Martin Schwidefsky)

In a multi-threaded program any thread can call execve(). If this is not done by the thread group leader, the de_thread() function replaces the pid of the task that calls execve() with the pid of the thread group leader. If the task reaches user space again without going over __switch_to(), the sampling tag is still set to the old pid. Define the arch_setup_new_exec function to verify the task pid and update the tag with LPP if it has changed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

2018-04-23  s390/cpum_cf: rename IBM z13/z14 counter names  (André Wild)

Change the IBM z13/z14 counter names to be in sync with all other models.

Cc: stable@vger.kernel.org # v4.12+
Fixes: 3593eb944c ("s390/cpum_cf: add hardware counter support for IBM z14")
Fixes: 3fc7acebae ("s390/cpum_cf: add IBM z13 counter event names")
Signed-off-by: André Wild <wild@linux.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

2018-04-23  s390/uprobes: implement arch_uretprobe_is_alive()  (Heiko Carstens)

Implement the s390-specific arch_uretprobe_is_alive() to avoid SIGSEGVs observed with uretprobes in combination with setjmp/longjmp. See commit 2dea1d9c38e4 ("powerpc/uprobes: Implement arch_uretprobe_is_alive()") for more details. With this implemented, all test cases referenced in the above commit pass.

Reported-by: Ziqian SUN <zsun@redhat.com>
Cc: <stable@vger.kernel.org> # v4.3+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
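
A sketch of the s390 idea, modeled on the powerpc variant referenced above; treat this as illustrative rather than the exact patch. A return-probe instance is considered alive only while the stack has not been unwound past its recorded stack pointer (gpr 15 is the stack pointer on s390):

  bool arch_uretprobe_is_alive(struct return_instance *ret, struct pt_regs *regs)
  {
          /* after longjmp(), the stack pointer moves above ret->stack */
          return regs->gprs[15] <= ret->stack;
  }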

2018-04-22  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull x86 fixes from Thomas Gleixner:
 "A small set of fixes for x86:

  - Prevent X2APIC ID 0xFFFFFFFF from being treated as valid, which causes the possible CPU count to be wrong.

  - Prevent 32bit truncation in calc_hpet_ref() which causes the TSC calibration to fail.

  - Fix the page table setup for temporary text mappings in the resume code which causes resume failures.

  - Make the page table dump code handle HIGHPTE correctly instead of oopsing.

  - Support for topologies where NUMA nodes share an LLC, to prevent an invalid topology warning and further malfunction on such systems.

  - Remove the now unused pci-nommu code.

  - Remove stale function declarations."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/power/64: Fix page-table setup for temporary text mapping
  x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y
  x86,sched: Allow topologies where NUMA nodes share an LLC
  x86/processor: Remove two unused function declarations
  x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
  x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
  x86: Remove pci-nommu.c

2018-04-22  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull perf fixes from Thomas Gleixner:
 "A larger set of updates for perf.

  Kernel:

  - Handle the SBOX uncore monitoring correctly on Broadwell CPUs which do not have SBOX.

  - Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The percentage of preempting and non-preempting context switches helps understanding the nature of workloads (CPU or IO bound) that are running on a machine. This adds the kernel facility and userspace changes needed to show this information in 'perf script' and 'perf report -D' (Alexey Budankov)

  - Remove a WARN_ON() in the trace/kprobes code which is pointless because the return error code is already telling the caller what's wrong.

  - Revert a fugly workaround for clang BPF targets.

  - Fix the sample_max_stack maximum check and do not proceed when an error has been detected; return proper values to avoid misidentifying errors (Jiri Olsa)

  - Add SPDX identifiers and get rid of GPL boilerplate.

  Tools:

  - Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)

  - Support MAP_FIXED_NOREPLACE, noticed when updating the tools/include/ copies (Arnaldo Carvalho de Melo)

  - Add '\n' at the end of parse-options error messages (Ravi Bangoria)

  - Add s390 support for detailed/verbose PMU event description (Thomas Richter)

  - perf annotate fixes and improvements:

    * Allow showing offsets in more than just jump targets, use the new 'O' hotkey in the TUI, config ~/.perfconfig annotate.offset_level for it and for --stdio2 (Arnaldo Carvalho de Melo)

    * Use the resolved variable names from objdump disassembled lines to make them more compact, just like was already done for some instructions, like "mov"; this eventually will be done more generally, but let's now add some more to the existing mechanism (Arnaldo Carvalho de Melo)

  - perf record fixes:

    * Change warning for missing topology sysfs entry to debug, as not all architectures have those files, s390 being one of those (Thomas Richter)

    * Remove old error messages about things that are unlikely to be the root cause in modern systems (Andi Kleen)

  - perf sched fixes:

    * Fix -g/--call-graph documentation (Takuya Yamamoto)

  - perf stat:

    * Enable 1ms interval for printing event counters values (Alexey Budankov)

  - perf test fixes:

    * Run dwarf unwind on arm32 (Kim Phillips)

    * Remove unused ptrace.h include from LLVM test, sidestepping older clang's lack of support for some asm constructs (Arnaldo Carvalho de Melo)

    * Fixup BPF test using epoll_pwait syscall function probe, to cope with the syscall routine renames performed in this development cycle (Arnaldo Carvalho de Melo)

  - perf version fixes:

    * Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version --build-options' when HAVE_SYSCALL_TABLE_SUPPORT is true, as libaudit won't be used in that case; print info about syscall_table support instead (Jin Yao)

  - Build system fixes:

    * Use HAVE_..._SUPPORT consistently (Jin Yao)

    * Restore READ_ONCE() C++ compatibility in tools/include (Mark Rutland)

    * Give hints about package names needed to build jvmti (Arnaldo Carvalho de Melo)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
  perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
  coresight: Move to SPDX identifier
  perf test BPF: Fixup BPF test using epoll_pwait syscall function probe
  perf tests mmap: Show which tracepoint is failing
  perf tools: Add '\n' at the end of parse-options error messages
  perf record: Remove suggestion to enable APIC
  perf record: Remove misleading error suggestion
  perf hists browser: Clarify top/report browser help
  perf mem: Allow all record/report options
  perf trace: Support MAP_FIXED_NOREPLACE
  perf: Remove superfluous allocation error check
  perf: Fix sample_max_stack maximum check
  perf: Return proper values for user stack errors
  perf list: Add s390 support for detailed/verbose PMU event description
  perf script: Extend misc field decoding with switch out event type
  perf report: Extend raw dump (-D) out with switch out event type
  perf/core: Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]
  tools/headers: Synchronize kernel ABI headers, v4.17-rc1
  trace_kprobe: Remove warning message "Could not insert probe at..."
  ...

2018-04-21  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds)

Pull arm64 fixes from Catalin Marinas:

 - kasan: avoid pfn_to_nid() before the page array is initialised
 - Fix typo causing the "upgrade" of known signals to SIGKILL

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: signal: don't force known signals to SIGKILL
  arm64: kasan: avoid pfn_to_nid() before page array is initialized

2018-04-20  kexec_file: do not add extra alignment to efi memmap  (Dave Young)

Chun-Yi reported the kernel warning message below:

  WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 early_iounmap+0x4f/0x12c()
  early_iounmap(ffffffffff200180, 00000118) [0] size not consistent 00000120

The problem is that x86 kexec_file_load adds extra alignment to the efi memmap, in bzImage64_load():

	efi_map_sz = efi_get_runtime_map_size();
	efi_map_sz = ALIGN(efi_map_sz, 16);

__efi_memmap_init maps with the size including the alignment bytes, but efi_memmap_unmap uses nr_maps * desc_size, which does not include the extra bytes.

The alignment in the kexec code is only needed for the kexec buffer's internal use. Kexec should pass the exact size of the efi memmap to the 2nd kernel.

Link: http://lkml.kernel.org/r/20180417083600.GA1972@dhcp-128-65.nay.redhat.com
Signed-off-by: Dave Young <dyoung@redhat.com>
Reported-by: joeyli <jlee@suse.com>
Tested-by: Randy Wright <rwright@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2018-04-20  proc: fix /proc/loadavg regression  (Alexey Dobriyan)

Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") changed the last field of /proc/loadavg (last pid allocated) to be off by one:

  # unshare -p -f --mount-proc cat /proc/loadavg
  0.00 0.00 0.00 1/60 2    <=== should be 1 after the first fork into a pid namespace

This is formally a regression, but given how useless this field is I don't think anyone is affected. Bug was found by the /proc testsuite!

Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2
Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2018-04-20  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)

Pull vfs fixes from Al Viro:
 "Assorted fixes. Some of that is only a matter with fault injection (broken handling of small allocation failure in various mount-related places), but the last one is a root-triggerable stack overflow, and combined with userns it gets really nasty ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Don't leak MNT_INTERNAL away from internal mounts
  mm,vmscan: Allow preallocating memory for register_shrinker().
  rpc_pipefs: fix double-dput()
  orangefs_kill_sb(): deal with allocation failures
  jffs2_kill_sb(): deal with failed allocations
  hypfs_kill_super(): deal with failed allocations

2018-04-20  arm/arm64: KVM: Add PSCI version selection API  (Marc Zyngier)

Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1 or 1.0 to a guest, defaulting to the latest version of the PSCI implementation that is compatible with the requested version. This is no different from doing a firmware upgrade on KVM.

But in order to give a chance to hypothetical badly implemented guests that would have a fit by discovering something other than PSCI 0.2, let's provide a new API that allows userspace to pick one particular version of the API.

This is implemented as a new class of "firmware" registers, where we expose the PSCI version. This allows the PSCI version to be saved/restored as part of a guest migration, and also set to any supported version if the guest requires it.

Cc: stable@vger.kernel.org # 4.16
Reviewed-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

2018-04-20  Merge tag 'mips_fixes_4.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips  (Linus Torvalds)

Pull MIPS fixes from James Hogan:

 - io: Add barriers to read*() & write*()
 - dts: Fix boston PCI bus DTC warnings (4.17)
 - memset: Several corner case fixes (one 3.10, others longer)

* tag 'mips_fixes_4.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
  MIPS: uaccess: Add micromips clobbers to bzero invocation
  MIPS: memset.S: Fix clobber of v1 in last_fixup
  MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
  MIPS: memset.S: EVA & fault support for small_memset
  MIPS: dts: Boston: Fix PCI bus dtc warnings:
  MIPS: io: Add barrier after register read in readX()
  MIPS: io: Prevent compiler reordering writeX()

2018-04-20  Merge tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)

Pull powerpc fixes from Michael Ellerman:

 - Fix an off-by-one bug in our alternative asm patching which leads to incorrectly patched code. This bug lay dormant for nearly 10 years but we finally hit it due to a recent change.

 - Fix lockups when running KVM guests on Power8 due to a missing check when a thread that's running KVM comes out of idle.

 - Fix an out-of-spec behaviour in the XIVE code (P9 interrupt controller).

 - Fix EEH handling of bridge MMIO windows.

 - Prevent crashes in our RFI fallback flush handler if firmware didn't tell us the size of the L1 cache (only seen on simulators).

Thanks to: Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling.

* tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kvm: Fix lockups when running KVM guests on Power8
  powerpc/eeh: Fix enabling bridge MMIO windows
  powerpc/xive: Fix trying to "push" an already active pool VP
  powerpc/64s: Default l1d_size to 64K in RFI fallback flush
  powerpc/lib: Fix off-by-one in alternate feature patching

2018-04-20  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds)

Pull s390 fixes and kexec-file-load from Martin Schwidefsky:
 "After the common code kexec patches went in via Andrew we can now push the architecture parts to implement the kexec-file-load system call. Plus a few more bug fixes and cleanups, this includes an update to the default configurations"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/signal: cleanup uapi struct sigaction
  s390: rename default_defconfig to debug_defconfig
  s390: remove gcov defconfig
  s390: update defconfig
  s390: add support for IBM z14 Model ZR1
  s390: remove couple of duplicate includes
  s390/boot: remove unused COMPILE_VERSION and ccflags-y
  s390/nospec: include cpu.h
  s390/decompressor: Ignore file vmlinux.bin.full
  s390/kexec_file: add generated files to .gitignore
  s390/Kconfig: Move kexec config options to "Processor type and features"
  s390/kexec_file: Add ELF loader
  s390/kexec_file: Add crash support to image loader
  s390/kexec_file: Add image loader
  s390/kexec_file: Add kexec_file_load system call
  s390/kexec_file: Add purgatory
  s390/kexec_file: Prepare setup.h for kexec_file_load
  s390/smsgiucv: disable SMSG on module unload
  s390/sclp: avoid potential usage of uninitialized value

2018-04-20  perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs  (Oskar Senft)

SBOX on some Broadwell CPUs is broken because it's enabled unconditionally despite the fact that there are no SBOXes available. Check the Power Control Unit CAPID4 register to determine the number of available SBOXes on the particular CPU before trying to enable them. If there are none, nullify the SBOX descriptor so that no attempt is made to initialize it.

Signed-off-by: Oskar Senft <osk@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark van Dijk <mark@voidzero.net>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1521810690-2576-2-git-send-email-kan.liang@linux.intel.com

2018-04-20  perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"  (Stephane Eranian)

This reverts commit 3b94a891667c ("perf/x86/intel/uncore: Remove SBOX support for Broadwell server").

Revert because there now exists a proper workaround for Broadwell-EP servers without SBOX. Note that BDX-DE does not have a SBOX.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: osk@google.com
Cc: mark@voidzero.net
Link: https://lkml.kernel.org/r/1521810690-2576-1-git-send-email-kan.liang@linux.intel.com

2018-04-20  x86/power/64: Fix page-table setup for temporary text mapping  (Joerg Roedel)

On a system with 4-level page-tables there is no p4d level, so the pud should be mapped in the pgd directly. The old code before commit fb43d6cb91ef already did that. The change from the above commit creates an invalid page-table which causes undefined behavior; in one report it caused triple faults.

Fix it by changing the p4d back to pud.

Fixes: fb43d6cb91ef ("x86/mm: Do not auto-massage page protections")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: pavel@ucw.cz
Cc: hpa@zytor.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/1524162360-26179-1-git-send-email-joro@8bytes.org

2018-04-19  Merge branch 'omap-for-v4.17/fixes-ti-sysc' into omap-for-v4.17/fixes  (Tony Lindgren)

2018-04-19  powerpc/kvm: Fix lockups when running KVM guests on Power8  (Michael Ellerman)

When running KVM guests on Power8 we can see a lockup where one CPU stops responding. This often leads to a message such as:

  watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
  Task dump for CPU 72:
  qemu-system-ppc R running task 10560 20917 20908 0x00040004

And then backtraces on other CPUs, such as:

  Task dump for CPU 48:
  ksmd R running task 10032 1519 2 0x00000804
  Call Trace:
  ...
  --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
      LR = smp_call_function_many+0x37c/0x460
  pmdp_invalidate+0x100/0x1b0
  __split_huge_pmd+0x52c/0xdb0
  try_to_unmap_one+0x764/0x8b0
  rmap_walk_anon+0x15c/0x370
  try_to_unmap+0xb4/0x170
  split_huge_page_to_list+0x148/0xa30
  try_to_merge_one_page+0xc8/0x990
  try_to_merge_with_ksm_page+0x74/0xf0
  ksm_scan_thread+0x10ec/0x1ac0
  kthread+0x160/0x1a0
  ret_from_kernel_thread+0x5c/0x78

This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle"), which added a check in pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and test of kvm_hstate.hwthread_req.

The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when entering the guest, so it can then come out to cede with KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after setting hwthread_req to 1, but because hwthread_state is still KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we wake up from idle and won't go to kvm_start_guest. From there the thread will return somewhere garbage and crash.

Fix it by skipping the store of hwthread_state, but not the test of hwthread_req, when coming out of idle. It's OK to skip the sync in that case because hwthread_req will have been set on the same thread, so there is no synchronisation required.

Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

2018-04-19  powerpc/eeh: Fix enabling bridge MMIO windows  (Michael Neuling)

On boot we save the configuration space of PCIe bridges. We do this so that when we get an EEH event and everything gets reset, we can restore them.

Unfortunately we save this state before we've enabled the MMIO space on the bridges. Hence if we have to reset the bridge, when we come back MMIO is not enabled and we end up taking a PE freeze when the driver starts accessing again.

This patch forces the memory/MMIO and bus mastering on when restoring bridges on EEH. Ideally we'd do this correctly by saving the configuration space writes later, but that will have to come later in a larger EEH rewrite. For now we have this simple fix.

The original bug can be triggered on a boston machine by doing:

  echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound

On boston, this PHB has a PCIe switch on it. Without this patch, you'll see two EEH events, one expected and one the failure we are fixing here. The second EEH event causes anything under the PHB to disappear (i.e. the i40e eth). With this patch, only one EEH event occurs and devices properly recover.

Fixes: 652defed4875 ("powerpc/eeh: Check PCIe link after reset")
Cc: stable@vger.kernel.org # v3.11+
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
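
A sketch of the idea using generic PCI config accessors for illustration (EEH itself goes through its own config-space helpers):

  #include <linux/pci.h>

  static void eeh_enable_bridge_mmio_example(struct pci_dev *bridge)
  {
          u16 cmd;

          /* force memory decoding and bus mastering back on after reset */
          pci_read_config_word(bridge, PCI_COMMAND, &cmd);
          cmd |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
          pci_write_config_word(bridge, PCI_COMMAND, cmd);
  }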