summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2019-08-05x86/mce: Don't check for the overflow bit on action optional machine checksTony Luck
We currently do not process SRAO (Software Recoverable Action Optional) machine checks if they are logged with the overflow bit set to 1 in the machine check bank status register. This is overly conservative. There are two cases where we could end up with an SRAO+OVER log based on the SDM volume 3 overwrite rules in "Table 15-8. Overwrite Rules for UC, CE, and UCR Errors" 1) First a corrected error is logged, then the SRAO error overwrites. The second error overwrites the first because uncorrected errors have a higher severity than corrected errors. 2) The SRAO error was logged first, followed by a correcetd error. In this case the first error is retained in the bank. So in either case the machine check bank will contain the address of the SRAO error. So we can process that even if the overflow bit was set. Reported-by: Yongkai Wu <yongkaiwu@tencent.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190718182920.32621-1-tony.luck@intel.com
2019-08-02olpc: x01: convert platform driver to use dev_groupsGreg Kroah-Hartman
Platform drivers now have the option to have the platform core create and remove any needed sysfs attribute files. So take advantage of that and do not register "by hand" a lid sysfs file. Cc: Darren Hart <dvhart@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: platform-driver-x86@vger.kernel.org Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20190731124349.4474-7-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-01bpf: fix x64 JIT code generation for jmp to 1st insnAlexei Starovoitov
Introduction of bounded loops exposed old bug in x64 JIT. JIT maintains the array of offsets to the end of all instructions to compute jmp offsets. addrs[0] - offset of the end of the 1st insn (that includes prologue). addrs[1] - offset of the end of the 2nd insn. JIT didn't keep the offset of the beginning of the 1st insn, since classic BPF didn't have backward jumps and valid extended BPF couldn't have a branch to 1st insn, because it didn't allow loops. With bounded loops it's possible to construct a valid program that jumps backwards to the 1st insn. Fix JIT by computing: addrs[0] - offset of the end of prologue == start of the 1st insn. addrs[1] - offset of the end of 1st insn. v1->v2: - Yonghong noticed a bug in jit linfo. Fix it by passing 'addrs + 1' to bpf_prog_fill_jited_linfo(), since it expects insn_to_jit_off array to be offsets to last byte. Reported-by: syzbot+35101610ff3e83119b1b@syzkaller.appspotmail.com Fixes: 2589726d12a1 ("bpf: introduce bounded loops") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com>
2019-08-01KVM: LAPIC: Mark hrtimer to expire in hard interrupt contextSebastian Andrzej Siewior
On PREEMPT_RT enabled kernels unmarked hrtimers are moved into soft interrupt expiry mode by default. While that's not a functional requirement for the KVM local APIC timer emulation, it's a latency issue which can be avoided by marking the timer so hard interrupt context expiry is enforced. No functional change. [ tglx: Split out from larger combo patch. Add changelog. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.363363474@linutronix.de
2019-07-31x86/kvm: Use CONFIG_PREEMPTIONThomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the conditional for async pagefaults to use CONFIG_PREEMPTION. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.789755413@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31x86/dumpstack: Indicate PREEMPT_RT in dumpsThomas Gleixner
Stack dumps print whether the kernel has preemption enabled or not. Extend it so a PREEMPT_RT enabled kernel can be identified. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.699136351@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31x86: Use CONFIG_PREEMPTIONThomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the entry code, preempt and kprobes conditionals over to CONFIG_PREEMPTION. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.608488448@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31x86/vdso/32: Use 32bit syscall fallbackThomas Gleixner
The generic VDSO implementation uses the Y2038 safe clock_gettime64() and clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks seccomp setups because these syscalls might be not (yet) allowed. Implement the 32bit variants which use the legacy syscalls and select the variant in the core library. The 64bit time variants are not removed because they are required for the time64 based vdso accessors. Fixes: 7ac870747988 ("x86/vdso: Switch to generic vDSO implementation") Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Reported-by: Paul Bolle <pebolle@tiscali.nl> Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20190728131648.879156507@linutronix.de
2019-07-30cpuidle-haltpoll: disable host side polling when kvm virtualizedMarcelo Tosatti
When performing guest side polling, it is not necessary to also perform host side polling. So disable host side polling, via the new MSR interface, when loading cpuidle-haltpoll driver. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-30add cpuidle-haltpoll driverMarcelo Tosatti
Add a cpuidle driver that calls the architecture default_idle routine. To be used in conjunction with the haltpoll governor. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-29Merge drm/drm-next into drm-intel-next-queuedRodrigo Vivi
Catching up with 5.3-rc* Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-07-28Merge branch master from ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Pick up the spectre documentation so the Grand Schemozzle can be added.
2019-07-28x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGSThomas Gleixner
Intel provided the following information: On all current Atom processors, instructions that use a segment register value (e.g. a load or store) will not speculatively execute before the last writer of that segment retires. Thus they will not use a speculatively written segment value. That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS entry paths can be excluded from the extra LFENCE if PTI is disabled. Create a separate bug flag for the through SWAPGS speculation and mark all out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs are excluded from the whole mitigation mess anyway. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-07-28Merge tag 'spdx-5.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX fixes from Greg KH: "Here are some small SPDX fixes for 5.3-rc2 for things that came in during the 5.3-rc1 merge window that we previously missed. Only three small patches here: - two uapi patches to resolve some SPDX tags that were not correct - fix an invalid SPDX tag in the iomap Makefile file All have been properly reviewed on the public mailing lists" * tag 'spdx-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: iomap: fix Invalid License ID treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headers again treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers
2019-07-27Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes and functional updates: - Prevent stale huge I/O TLB mappings on 32bit. A long standing bug which got exposed by KPTI support for 32bit - Prevent bogus access_ok() warnings in arch_stack_walk_user() - Add display quirks for Lenovo devices which have height and width swapped - Add the missing CR2 fixup for 32 bit async pagefaults. Fallout of the CR2 bug fix series. - Unbreak handling of force enabled HPET by moving the 'is HPET counting' check back to the original place. - A more accurate check for running on a hypervisor platform in the MDS mitigation code. Not perfect, but more accurate than the previous one. - Update a stale and confusing comment vs. IRQ stacks" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mds: Apply more accurate check on hypervisor platform x86/hpet: Undo the early counter is counting check x86/entry/32: Pass cr2 to do_async_page_fault() x86/irq/64: Update stale comment x86/sysfb_efi: Add quirks for some devices with swapped width and height x86/stacktrace: Prevent access_ok() warnings in arch_stack_walk_user() mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() x86/mm: Sync also unmappings in vmalloc_sync_all() x86/mm: Check for pfn instead of page in vmalloc_sync_one()
2019-07-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf related fixes: Kernel: - Fix SLOTS PEBS event constraints for Icelake CPUs - Add the missing mask bit to allow counting hardware generated prefetches on L3 for Icelake CPUs - Make the test for hypervisor platforms more accurate (as far as possible) - Handle PMUs correctly which override event->cpu - Yet another missing fallthrough annotation Tools: perf.data: - Fix loading of compressed data split across adjacent records - Fix buffer size setting for processing CPU topology perf.data header. perf stat: - Fix segfault for event group in repeat mode - Always separate "stalled cycles per insn" line, it was being appended to the "instructions" line. perf script: - Fix --max-blocks man page description. - Improve man page description of metrics. - Fix off by one in brstackinsn IPC computation. perf probe: - Avoid calling freeing routine multiple times for same pointer. perf build: - Do not use -Wshadow on gcc < 4.8, avoiding too strict warnings treated as errors, breaking the build" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Mark expected switch fall-throughs perf/core: Fix creating kernel counters for PMUs that override event->cpu perf/x86: Apply more accurate check on hypervisor platform perf/x86/intel: Fix invalid Bit 13 for Icelake MSR_OFFCORE_RSP_x register perf/x86/intel: Fix SLOTS PEBS event constraint perf build: Do not use -Wshadow on gcc < 4.8 perf probe: Avoid calling freeing routine multiple times for same pointer perf probe: Set pev->nargs to zero after freeing pev->args entries perf session: Fix loading of compressed data split across adjacent records perf stat: Always separate stalled cycles per insn perf stat: Fix segfault for event group in repeat mode perf tools: Fix proper buffer size for feature processing perf script: Fix off by one in brstackinsn IPC computation perf script: Improve man page description of metrics perf script: Fix --max-blocks man page description
2019-07-27Merge tag 'Wimplicit-fallthrough-5.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough enablement from Gustavo A. R. Silva: "This marks switch cases where we are expecting to fall through, and globally enables the -Wimplicit-fallthrough option in the main Makefile. Finally, some missing-break fixes that have been tagged for -stable: - drm/amdkfd: Fix missing break in switch statement - drm/amdgpu/gfx10: Fix missing break in switch statement With these changes, we completely get rid of all the fall-through warnings in the kernel" * tag 'Wimplicit-fallthrough-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: Makefile: Globally enable fall-through warning drm/i915: Mark expected switch fall-throughs drm/amd/display: Mark expected switch fall-throughs drm/amdkfd/kfd_mqd_manager_v10: Avoid fall-through warning drm/amdgpu/gfx10: Fix missing break in switch statement drm/amdkfd: Fix missing break in switch statement perf/x86/intel: Mark expected switch fall-throughs mtd: onenand_base: Mark expected switch fall-through afs: fsclient: Mark expected switch fall-throughs afs: yfsclient: Mark expected switch fall-throughs can: mark expected switch fall-throughs firewire: mark expected switch fall-throughs
2019-07-27crypto: ghash - add comment and improve help textEric Biggers
To help avoid confusion, add a comment to ghash-generic.c which explains the convention that the kernel's implementation of GHASH uses. Also update the Kconfig help text and module descriptions to call GHASH a "hash function" rather than a "message digest", since the latter normally means a real cryptographic hash function, which GHASH is not. Cc: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26crypto: aegis128l/aegis256 - remove x86 and generic implementationsArd Biesheuvel
Three variants of AEGIS were proposed for the CAESAR competition, and only one was selected for the final portfolio: AEGIS128. The other variants, AEGIS128L and AEGIS256, are not likely to ever turn up in networking protocols or other places where interoperability between Linux and other systems is a concern, nor are they likely to be subjected to further cryptanalysis. However, uninformed users may think that AEGIS128L (which is faster) is equally fit for use. So let's remove them now, before anyone starts using them and we are forced to support them forever. Note that there are no known flaws in the algorithms or in any of these implementations, but they have simply outlived their usefulness. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26crypto: morus - remove generic and x86 implementationsArd Biesheuvel
MORUS was not selected as a winner in the CAESAR competition, which is not surprising since it is considered to be cryptographically broken [0]. (Note that this is not an implementation defect, but a flaw in the underlying algorithm). Since it is unlikely to be in use currently, let's remove it before we're stuck with it. [0] https://eprint.iacr.org/2019/172.pdf Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26crypto: x86/aes - drop scalar assembler implementationsArd Biesheuvel
The AES assembler code for x86 isn't actually faster than code generated by the compiler from aes_generic.c, and considering the disproportionate maintenance burden of assembler code on x86, it is better just to drop it entirely. Modern x86 systems will use AES-NI anyway, and given that the modules being removed have a dependency on aes_generic already, we can remove them without running the risk of regressions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26crypto: x86/aes-ni - switch to generic for fallback and key routinesArd Biesheuvel
The AES-NI code contains fallbacks for invocations that occur from a context where the SIMD unit is unavailable, which really only occurs when running in softirq context that was entered from a hard IRQ that was taken while running kernel code that was already using the FPU. That means performance is not really a consideration, and we can just use the new library code for this use case, which has a smaller footprint and is believed to be time invariant. This will allow us to drop the non-SIMD asm routines in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26crypto: aes - rename local routines to prevent future clashesArd Biesheuvel
Rename some local AES encrypt/decrypt routines so they don't clash with the names we are about to introduce for the routines exposed by the generic AES library. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-25perf/x86/intel: Mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: arch/x86/events/intel/core.c: In function ‘intel_pmu_init’: arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=] pmem = true; ~~~~~^~~~~~ arch/x86/events/intel/core.c:4960:2: note: here case INTEL_FAM6_SKYLAKE_MOBILE: ^~~~ arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=] pmem = true; ~~~~~^~~~~~ arch/x86/events/intel/core.c:5009:2: note: here case INTEL_FAM6_ICELAKE_MOBILE: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-07-25x86/apic/x2apic: Implement IPI shorthands supportThomas Gleixner
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Implement shorthand support for x2apic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105221.134696837@linutronix.de
2019-07-25x86/apic/flat64: Remove the IPI shorthand decision logicThomas Gleixner
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Remove the now redundant decision logic in the APIC code and the duplicate helper in probe_64.c. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105221.042964120@linutronix.de
2019-07-25x86/apic: Share common IPI helpersThomas Gleixner
The 64bit implementations need the same wrappers around __default_send_IPI_shortcut() as 32bit. Move them out of the 32bit section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.951534451@linutronix.de
2019-07-25x86/apic: Remove the shorthand decision logicThomas Gleixner
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Remove the now redundant decision logic in the 32bit implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.860244707@linutronix.de
2019-07-25x86/smp: Enhance native_send_call_func_ipi()Thomas Gleixner
Nadav noticed that the cpumask allocations in native_send_call_func_ipi() are noticeable in microbenchmarks. Use the new cpumask_or_equal() function to simplify the decision whether the supplied target CPU mask is either equal to cpu_online_mask or equal to cpu_online_mask except for the CPU on which the function is invoked. cpumask_or_equal() or's the target mask and the cpumask of the current CPU together and compares it to cpu_online_mask. If the result is false, use the mask based IPI function, otherwise check whether the current CPU is set in the target mask and invoke either the send_IPI_all() or the send_IPI_allbutselt() APIC callback. Make the shorthand decision also depend on the static key which enables shorthand mode. That allows to remove the extra cpumask comparison with cpu_callout_mask. Reported-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.768238809@linutronix.de
2019-07-25x86/smp: Move smp_function_call implementations into IPI codeThomas Gleixner
Move it where it belongs. That allows to keep all the shorthand logic in one place. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.677835995@linutronix.de
2019-07-25x86/apic: Provide and use helper for send_IPI_allbutself()Thomas Gleixner
To support IPI shorthands wrap invocations of apic->send_IPI_allbutself() in a helper function, so the static key controlling the shorthand mode is only in one place. Fixup all callers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.492691679@linutronix.de
2019-07-25x86/apic: Add static key to Control IPI shorthandsThomas Gleixner
The IPI shorthand functionality delivers IPI/NMI broadcasts to all CPUs in the system. This can have similar side effects as the MCE broadcasting when CPUs are waiting in the BIOS or are offlined. The kernel tracks already the state of offlined CPUs whether they have been brought up at least once so that the CR4 MCE bit is set to make sure that MCE broadcasts can't brick the machine. Utilize that information and compare it to the cpu_present_mask. If all present CPUs have been brought up at least once then the broadcast side effect is mitigated by disabling regular interrupt/IPI delivery in the APIC itself and by the cpu offline check at the begin of the NMI handler. Use a static key to switch between broadcasting via shorthands or sending the IPI/NMI one by one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.386410643@linutronix.de
2019-07-25x86/apic: Move no_ipi_broadcast() out of 32bitThomas Gleixner
For the upcoming shorthand support for all APIC incarnations the command line option needs to be available for 64 bit as well. While at it, rename the control variable, make it static and mark it __ro_after_init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.278327940@linutronix.de
2019-07-25x86/apic: Add NMI_VECTOR wait to IPI shorthandThomas Gleixner
To support NMI shorthand broadcasts add the safe wait for ICR idle for NMI vector delivery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.185838026@linutronix.de
2019-07-25x86/apic: Remove dest argument from __default_send_IPI_shortcut()Thomas Gleixner
The SDM states: "The destination shorthand field of the ICR allows the delivery mode to be by-passed in favor of broadcasting the IPI to all the processors on the system bus and/or back to itself (see Section 10.6.1, Interrupt Command Register (ICR)). Three destination shorthands are supported: self, all excluding self, and all including self. The destination mode is ignored when a destination shorthand is used." So there is no point to supply the destination mode to the shorthand delivery function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.094613426@linutronix.de
2019-07-25x86/hotplug: Silence APIC and NMI when CPU is deadThomas Gleixner
In order to support IPI/NMI broadcasting via the shorthand mechanism side effects of shorthands need to be mitigated: Shorthand IPIs and NMIs hit all CPUs including unplugged CPUs Neither of those can be handled on unplugged CPUs for obvious reasons. It would be trivial to just fully disable the APIC via the enable bit in MSR_APICBASE. But that's not possible because clearing that bit on systems based on the 3 wire APIC bus would require a hardware reset to bring it back as the APIC would lose track of bus arbitration. On systems with FSB delivery APICBASE could be disabled, but it has to be guaranteed that no interrupt is sent to the APIC while in that state and it's not clear from the SDM whether it still responds to INIT/SIPI messages. Therefore stay on the safe side and switch the APIC into soft disabled mode so it won't deliver any regular vector to the CPU. NMIs are still propagated to the 'dead' CPUs. To mitigate that add a check for the CPU being offline on early nmi entry and if so bail. Note, this cannot use the stop/restart_nmi() magic which is used in the alternatives code. A dead CPU cannot invoke nmi_enter() or anything else due to RCU and other reasons. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907241723290.1791@nanos.tec.linutronix.de
2019-07-25x86/cpu: Move arch_smt_update() to a neutral placeThomas Gleixner
arch_smt_update() will be used to control IPI/NMI broadcasting via the shorthand mechanism. Keeping it in the bugs file and calling the apic function from there is possible, but not really intuitive. Move it to a neutral place and invoke the bugs function from there. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.910317273@linutronix.de
2019-07-25x86/apic/uv: Make x2apic_extra_bits staticThomas Gleixner
Not used outside of the UV apic source. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.725264153@linutronix.de
2019-07-25x86/apic: Consolidate the apic local headersThomas Gleixner
Now there are three small local headers. Some contain functions which are only used in one source file. Move all the inlines and declarations into a single local header and the inlines which are only used in one source file into that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.618612624@linutronix.de
2019-07-25x86/apic: Move apic_flat_64 header into apic directoryThomas Gleixner
Only used locally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.526508168@linutronix.de
2019-07-25x86/apic: Move ipi header into apic directoryThomas Gleixner
Only used locally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.434738036@linutronix.de
2019-07-25x86/apic: Cleanup the include mazeThomas Gleixner
All of these APIC files include the world and some more. Remove the unneeded cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.342631201@linutronix.de
2019-07-25x86/apic: Move IPI inlines into ipi.cThomas Gleixner
No point in having them in an header file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.252225936@linutronix.de
2019-07-25x86/apic: Make apic_pending_intr_clear() more robustThomas Gleixner
In course of developing shorthand based IPI support issues with the function which tries to clear eventually pending ISR bits in the local APIC were observed. 1) O-day testing triggered the WARN_ON() in apic_pending_intr_clear(). This warning is emitted when the function fails to clear pending ISR bits or observes pending IRR bits which are not delivered to the CPU after the stale ISR bit(s) are ACK'ed. Unfortunately the function only emits a WARN_ON() and fails to dump the IRR/ISR content. That's useless for debugging. Feng added spot on debug printk's which revealed that the stale IRR bit belonged to the APIC timer interrupt vector, but adding ad hoc debug code does not help with sporadic failures in the field. Rework the loop so the full IRR/ISR contents are saved and on failure dumped. 2) The loop termination logic is interesting at best. If the machine has no TSC or cpu_khz is not known yet it tries 1 million times to ack stale IRR/ISR bits. What? With TSC it uses the TSC to calculate the loop termination. It takes a timestamp at entry and terminates the loop when: (rdtsc() - start_timestamp) >= (cpu_hkz << 10) That's roughly one second. Both methods are problematic. The APIC has 256 vectors, which means that in theory max. 256 IRR/ISR bits can be set. In practice this is impossible and the chance that more than a few bits are set is close to zero. With the pure loop based approach the 1 million retries are complete overkill. With TSC this can terminate too early in a guest which is running on a heavily loaded host even with only a couple of IRR/ISR bits set. The reason is that after acknowledging the highest priority ISR bit, pending IRRs must get serviced first before the next round of acknowledge can take place as the APIC (real and virtualized) does not honour EOI without a preceeding interrupt on the CPU. And every APIC read/write takes a VMEXIT if the APIC is virtualized. While trying to reproduce the issue 0-day reported it was observed that the guest was scheduled out long enough under heavy load that it terminated after 8 iterations. Make the loop terminate after 512 iterations. That's plenty enough in any case and does not take endless time to complete. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.158847694@linutronix.de
2019-07-25x86/apic: Soft disable APIC before initializing itThomas Gleixner
If the APIC was already enabled on entry of setup_local_APIC() then disabling it soft via the SPIV register makes a lot of sense. That masks all LVT entries and brings it into a well defined state. Otherwise previously enabled LVTs which are not touched in the setup function stay unmasked and might surprise the just booting kernel. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.068290579@linutronix.de
2019-07-25x86/apic: Invoke perf_events_lapic_init() after enabling APICThomas Gleixner
If the APIC is soft disabled then unmasking an LVT entry does not work and the write is ignored. perf_events_lapic_init() tries to do so. Move the invocation after the point where the APIC has been enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105218.962517234@linutronix.de
2019-07-25x86/kgbd: Use NMI_VECTOR not APIC_DM_NMIThomas Gleixner
apic->send_IPI_allbutself() takes a vector number as argument. APIC_DM_NMI is clearly not a vector number. It's defined to 0x400 which is outside the vector space. Use NMI_VECTOR instead as that's what it is intended to be. Fixes: 82da3ff89dc2 ("x86: kgdb support") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105218.855189979@linutronix.de
2019-07-25x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI failsGrzegorz Halat
A reboot request sends an IPI via the reboot vector and waits for all other CPUs to stop. If one or more CPUs are in critical regions with interrupts disabled then the IPI is not handled on those CPUs and the shutdown hangs if native_stop_other_cpus() is called with the wait argument set. Such a situation can happen when one CPU was stopped within a lock held section and another CPU is trying to acquire that lock with interrupts disabled. There are other scenarios which can cause such a lockup as well. In theory the shutdown should be attempted by an NMI IPI after the timeout period elapsed. Though the wait loop after sending the reboot vector IPI prevents this. It checks the wait request argument and the timeout. If wait is set, which is true for sys_reboot() then it won't fall through to the NMI shutdown method after the timeout period has finished. This was an oversight when the NMI shutdown mechanism was added to handle the 'reboot IPI is not working' situation. The mechanism was added to deal with stuck panic shutdowns, which do not have the wait request set, so the 'wait request' case was probably not considered. Remove the wait check from the post reboot vector IPI wait loop and enforce that the wait loop in the NMI fallback path is invoked even if NMI IPIs are disabled or the registration of the NMI handler fails. That second wait loop will then hang if not all CPUs shutdown and the wait argument is set. [ tglx: Avoid the hard to parse line break in the NMI fallback path, add comments and massage the changelog ] Fixes: 7d007d21e539 ("x86/reboot: Use NMI to assist in shutting down if IRQ fails") Signed-off-by: Grzegorz Halat <ghalat@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Don Zickus <dzickus@redhat.com> Link: https://lkml.kernel.org/r/20190628122813.15500-1-ghalat@redhat.com
2019-07-25perf/x86/intel: Mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: arch/x86/events/intel/core.c: In function ‘intel_pmu_init’: arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190624161913.GA32270@embeddedor Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25perf/x86: Apply more accurate check on hypervisor platformZhenzhong Duan
check_msr is used to fix a bug report in guest where KVM doesn't support LBR MSR and cause #GP. The msr check is bypassed on real HW to workaround a false failure, see commit d0e1a507bdc7 ("perf/x86/intel: Disable check_msr for real HW") When running a guest with CONFIG_HYPERVISOR_GUEST not set or "nopv" enabled, current check isn't enough and #GP could trigger. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1564022366-18293-1-git-send-email-zhenzhong.duan@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>