summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2016-09-30x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUIDAndy Lutomirski
Otherwise arch_task_struct_size == 0 and we die. While we're at it, set X86_FEATURE_ALWAYS, too. Reported-by: David Saggiorato <david@saggiorato.net> Tested-by: David Saggiorato <david@saggiorato.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86") Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30xen: Remove event channel notification through Xen PCI platform deviceKarimAllah Ahmed
Ever since commit 254d1a3f02eb ("xen/pv-on-hvm kexec: shutdown watches from old kernel") using the INTx interrupt from Xen PCI platform device for event channel notification would just lockup the guest during bootup. postcore_initcall now calls xs_reset_watches which will eventually try to read a value from XenStore and will get stuck on read_reply at XenBus forever since the platform driver is not probed yet and its INTx interrupt handler is not registered yet. That means that the guest can not be notified at this moment of any pending event channels and none of the per-event handlers will ever be invoked (including the XenStore one) and the reply will never be picked up by the kernel. The exact stack where things get stuck during xenbus_init: -xenbus_init -xs_init -xs_reset_watches -xenbus_scanf -xenbus_read -xs_single -xs_single -xs_talkv Vector callbacks have always been the favourite event notification mechanism since their introduction in commit 38e20b07efd5 ("x86/xen: event channels delivery on HVM.") and the vector callback feature has always been advertised for quite some time by Xen that's why INTx was broken for several years now without impacting anyone. Luckily this also means that event channel notification through INTx is basically dead-code which can be safely removed without impacting anybody since it has been effectively disabled for more than 4 years with nobody complaining about it (at least as far as I'm aware of). This commit removes event channel notification through Xen PCI platform device. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Julien Grall <julien.grall@citrix.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: xen-devel@lists.xenproject.org Cc: linux-kernel@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-09-30x86/asm: Get rid of __read_cr4_safe()Andy Lutomirski
We use __read_cr4() vs __read_cr4_safe() inconsistently. On CR4-less CPUs, all CR4 bits are effectively clear, so we can make the code simpler and more robust by making __read_cr4() always fix up faults on 32-bit kernels. This may fix some bugs on old 486-like CPUs, but I don't have any easy way to test that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: david@saggiorato.net Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30Merge branch 'x86/urgent' into x86/asmThomas Gleixner
Get the cr4 fixes so we can apply the final cleanup
2016-09-30x86/vdso: Fix building on big endian hostSegher Boessenkool
We need to call GET_LE to read hdr->e_type. Fixes: 57f90c3dfc75 ("x86/vdso: Error out if the vDSO isn't a valid DSO") Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-next@vger.kernel.org Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30x86/boot: Fix another __read_cr4() case on 486Andy Lutomirski
The condition for reading CR4 was wrong: there are some CPUs with CPUID but not CR4. Rather than trying to make the condition exact, use __read_cr4_safe(). Fixes: 18bc7bd523e0 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly") Reported-by: david@saggiorato.net Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30xen/x86: Convert to hotplug state machineBoris Ostrovsky
Switch to new CPU hotplug infrastructure. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-09-30x86/xen: add missing \n at end of printk warning messageColin Ian King
The message is missing a \n, add it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-09-30x86/cmpxchg, locking/atomics: Remove superfluous definitionsNikolay Borisov
cmpxchg contained definitions for unused (x)add_* operations, dating back to the original ticket spinlock implementation. Nowadays these are unused so remove them. Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1474913478-17757-1-git-send-email-n.borisov.lkml@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30x86, locking/spinlocks: Remove ticket (spin)lock implementationPeter Zijlstra
We've unconditionally used the queued spinlock for many releases now. Its time to remove the old ticket lock code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Waiman.Long@hpe.com Cc: david.vrabel@citrix.com Cc: dhowells@redhat.com Cc: pbonzini@redhat.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20160518184302.GO3193@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30sched/core, x86/topology: Fix NUMA in package topology bugTim Chen
Current code can call set_cpu_sibling_map() and invoke sched_set_topology() more than once (e.g. on CPU hot plug). When this happens after sched_init_smp() has been called, we lose the NUMA topology extension to sched_domain_topology in sched_init_numa(). This results in incorrect topology when the sched domain is rebuilt. This patch fixes the bug and issues warning if we call sched_set_topology() after sched_init_smp(). Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1474485552-141429-2-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-29x86/init: Fix cr4_init_shadow() on CR4-less machinesAndy Lutomirski
cr4_init_shadow() will panic on 486-like machines without CR4. Fix it using __read_cr4_safe(). Reported-by: david@saggiorato.net Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") Link: http://lkml.kernel.org/r/43a20f81fb504013bf613913dc25574b45336a61.1475091074.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-29x86/entry: spell EBX register correctly in documentationNicolas Iooss
As EBS does not mean anything reasonable in the context it is used, it seems like a misspelling for EBX. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-09-28Merge tag 'v4.8-rc8' into drm-nextDave Airlie
Linux 4.8-rc8 There was a lot of fallout in the imx/amdgpu/i915 drivers, so backmerge it now to avoid troubles. * tag 'v4.8-rc8': (1442 commits) Linux 4.8-rc8 fault_in_multipages_readable() throws set-but-unused error mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing radix tree: fix sibling entry handling in radix_tree_descend() radix tree test suite: Test radix_tree_replace_slot() for multiorder entries fix memory leaks in tracing_buffers_splice_read() tracing: Move mutex to protect against resetting of seq data MIPS: Fix delay slot emulation count in debugfs MIPS: SMP: Fix possibility of deadlock when bringing CPUs online mm: delete unnecessary and unsafe init_tlb_ubc() huge tmpfs: fix Committed_AS leak shmem: fix tmpfs to handle the huge= option properly blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx MIPS: Fix pre-r6 emulation FPU initialisation arm64: kgdb: handle read-only text / modules arm64: Call numa_store_cpu_info() earlier. locking/hung_task: Fix typo in CONFIG_DETECT_HUNG_TASK help text nvme-rdma: only clear queue flags after successful connect i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended perf/core: Limit matching exclusive events to one PMU ...
2016-09-27x86: separate extable.h, switch sections.h to itAl Viro
drivers/platform/x86/dell-smo8800.c is touched due to the following obscenity: drivers/platform/x86/dell-smo8800.c -> linux/interrupt.h -> linux/hardirq.h -> asm/hardirq.h -> linux/irq.h -> asm/hw_irq.h -> asm/sections.h -> asm/uaccess.h is the only chain of includes pulling asm/uaccess.h there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27remove stray include of asm/uaccess.h from cacheflush.hAl Viro
It was introduced in "arch, x86: pmem api for ensuring durability of persistent memory updates" in July 2015, along with the code that really used that stuff. Three weeks later all that code got moved to asm/pmem.h by "pmem, x86: move x86 PMEM API to new pmem.h header"; include, however, was left behind. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27exceptions: detritus removalAl Viro
externs and defines for stuff that is never used Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-26x86/apic: Fix silent & fatal merge conflict in __generic_processor_info()Thomas Gleixner
Fix up the silent merge conflict between commit c291b0151585 in x86/urgent and commit f7c28833c2520 in x86/apic which both remove num_processors++ from the original location and then add it at two different locations. As a result num_processors is incremented twice which can cut the number of available cpus in half. Remove the one which is added by commit c291b0151585. In hindsight I should have merged x86/urgent into x86/apic _before_ adding the nodeid bits, but in hindsight we are always smarter. Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: 1e1b37273cf7 ("Merge branch 'x86/urgent' into x86/apic") Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1609261350090.5483@nanos Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-26Merge branch 'x86/urgent' into x86/apicThomas Gleixner
Bring in the upstream modifications so we can fixup the silent merge conflict which is introduced by this merge. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-26x86/RAS/mce_amd_inj: Remove debugfs dir recursively on exitBorislav Petkov
Simplify exit_mce_inject() by using debugfs_remove_recursive() and do away with the noodling over the dentry elements. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160926083152.30848-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-26x86/RAS/mce_amd_inj: Fix signed wrap around when decrementing index 'i'Colin Ian King
Change predecrement compare to post decrement compare to avoid an unsigned integer wrap-around comparisomn when decrementing in the while loop. For example, if the debugfs_create_file() fails when 'i' is zero, the current situation will predecrement 'i' in the while loop, wrapping 'i' to the maximum signed integer and cause multiple out of bounds reads on dfs_fls[i].d as the loop interates to zero. Also, as Borislav Petkov suggested, return -ENODEV rather than -ENOMEM on the error condition. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: http://lkml.kernel.org/r/20160926083152.30848-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-26Merge tag 'v4.8-rc8' into ras/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-24Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Three fixlets for perf: - add a missing NULL pointer check in the intel BTS driver - make BTS an exclusive PMU because BTS can only handle one event at a time - ensure that exclusive events are limited to one PMU so that several exclusive events can be scheduled on different PMU instances" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Limit matching exclusive events to one PMU perf/x86/intel/bts: Make it an exclusive PMU perf/x86/intel/bts: Make sure debug store is valid
2016-09-24x86/platform/mellanox: Fix return value check in mlxplat_init()Wei Yongjun
In case of error, the function platform_device_register_simple() returns ERR_PTR() and never returns NULL. The NULL test in the return value check must therefor be replaced with IS_ERR(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Vadim Pasternak <vadimp@mellanox.com> Cc: platform-driver-x86@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-24x86/alternatives: Add stack frame dependency to alternative_call_2()Josh Poimboeuf
Linus reported the following objtool warning: kernel/signal.o: warning: objtool: .altinstr_replacement+0x54: call without frame pointer save/setup The warning is valid. It's caused by the fact that gcc placed the call instruction in alternative_call_2()'s inline asm before the frame pointer setup, which breaks frame pointer convention and can result in a bad stack trace. Force a stack frame to be created before the call instruction by listing the stack pointer as an output operand in the inline asm statement. Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160923214939.j5o7c67nhepzmh3t@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-23x86/PCI: VMD: Request userspace control of PCIe hotplug indicatorsKeith Busch
Add set_dev_domain_options() to set PCI domain-specific options as devices are added. The first usage is to request exclusive userspace control of PCIe hotplug indicators in VMD domains. Devices in a VMD domain use PCIe hotplug Attention and Power Indicators in a non-standard way; tell pciehp to ignore the indicators so userspace can control them via the sysfs "attention" file. To determine whether a bus is within a VMD domain, add a bool to the pci_sysdata structure that the VMD driver sets during initialization. [bhelgaas: changelog] Requested-by: Kapil Karkra <kapil.karkra@intel.com> Tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-23Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-23KVM: nVMX: Fix the NMI IDT-vectoring handlingWanpeng Li
Run kvm-unit-tests/eventinj.flat in L1: Sending NMI to self After NMI to self FAIL: NMI This test scenario is to test whether VMM can handle NMI IDT-vectoring info correctly. At the beginning, L2 writes LAPIC to send a self NMI, the EPT page tables on both L1 and L0 are empty so: - The L2 accesses memory can generate EPT violation which can be intercepted by L0. The EPT violation vmexit occurred during delivery of this NMI, and the NMI info is recorded in vmcs02's IDT-vectoring info. - L0 walks L1's EPT12 and L0 sees the mapping is invalid, it injects the EPT violation into L1. The vmcs02's IDT-vectoring info is reflected to vmcs12's IDT-vectoring info since it is a nested vmexit. - L1 receives the EPT violation, then fixes its EPT12. - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exits to L0. - L0 emulates VMRESUME which is called from L1, then return to L2. L0 merges the requirement of vmcs12's IDT-vectoring info and injects it to L2 through vmcs02. - The L2 re-executes the fault instruction and cause EPT violation again. - Since the L1's EPT12 is valid, L0 can fix its EPT02 - L0 resume L2 The EPT violation vmexit occurred during delivery of this NMI again, and the NMI info is recorded in vmcs02's IDT-vectoring info. L0 should inject the NMI through vmentry event injection since it is caused by EPT02's EPT violation. However, vmx_inject_nmi() refuses to inject NMI from IDT-vectoring info if vCPU is in guest mode, this patch fix it by permitting to inject NMI from IDT-vectoring if it is the L0's responsibility to inject NMI from IDT-vectoring info to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Bandan Das <bsd@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-23KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactiveWanpeng Li
I observed that kvmvapic(to optimize flexpriority=N or AMD) is used to boost TPR access when testing kvm-unit-test/eventinj.flat tpr case on my haswell desktop (w/ flexpriority, w/o APICv). Commit (8d14695f9542 x86, apicv: add virtual x2apic support) disable virtual x2apic mode completely if w/o APICv, and the author also told me that windows guest can't enter into x2apic mode when he developed the APICv feature several years ago. However, it is not truth currently, Interrupt Remapping and vIOMMU is added to qemu and the developers from Intel test windows 8 can work in x2apic mode w/ Interrupt Remapping enabled recently. This patch enables TPR shadow for virtual x2apic mode to boost windows guest in x2apic mode even if w/o APICv. Can pass the kvm-unit-test. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Suggested-by: Wincy Van <fanwenyi0529@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wincy Van <fanwenyi0529@gmail.com> Cc: Yang Zhang <yang.zhang.wz@gmail.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-23KVM: nVMX: Fix reload apic access page warningWanpeng Li
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80 do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0 CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_fmt+0x4f/0x60 ? prepare_to_swait+0x39/0xa0 ? prepare_to_swait+0x39/0xa0 __might_sleep+0x7e/0x80 __gfn_to_pfn_memslot+0x156/0x480 [kvm] gfn_to_pfn+0x2a/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] =============================== [ INFO: suspicious RCU usage. ] 4.8.0-rc5+ #47 Not tainted ------------------------------- ./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/4230: #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm] stack backtrace: CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47 Call Trace: dump_stack+0x99/0xd0 lockdep_rcu_suspicious+0xe7/0x120 gfn_to_memslot+0x12a/0x140 [kvm] gfn_to_pfn+0x12/0x30 [kvm] gfn_to_page+0xe/0x20 [kvm] kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm] nested_vmx_vmexit+0x765/0xca0 [kvm_intel] ? _raw_spin_unlock_irqrestore+0x36/0x80 vmx_check_nested_events+0x49/0x1f0 [kvm_intel] kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm] kvm_vcpu_check_block+0x12/0x60 [kvm] kvm_vcpu_block+0x94/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm] kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm] ? __fget+0xfd/0x210 ? __lock_is_held+0x54/0x70 do_vfs_ioctl+0x96/0x6a0 ? __fget+0x11c/0x210 ? __fget+0x5/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x81/0x220 entry_SYSCALL64_slow_path+0x25/0x25 These can be triggered by running kvm-unit-test: ./x86-run x86/vmx.flat The nested preemption timer is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. The current logic adds vCPU to a simple waitqueue (TASK_INTERRUPTIBLE) if need to yield pCPU and w/o holding srcu read lock when accesses memslots, both can be in nested preemption timer evaluation path which results in the warning above. This patch fix it by leveraging request bit to async reload APIC access page before vmentry in order to avoid to reload directly during the nested preemption timer evaluation, it is safe since the vmcs01 is loaded and current is nested vmexit. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-23config: move x86 kvm_guest.config to a common locationRob Herring
kvm_guest.config is useful for KVM guests on other arches, and nothing in it appears to be x86 specific, so just move the whole file. Kbuild will find it in either location. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvmarm@lists.cs.columbia.edu Cc: kvm@vger.kernel.org Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-09-22x86/platform/mellanox: Introduce support for Mellanox systems platformVadim Pasternak
Enable system support for the Mellanox Technologies platform, which provides support for the next Mellanox basic systems: "msx6710", "msx6720", "msb7700", "msn2700", "msx1410", "msn2410", "msb7800", "msn2740", "msn2100" and also various number of derivative systems from the above basic types. The Kconfig controlling compilation of this code is: MLX_PLATFORM Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Cc: jiri@resnulli.us Cc: gregkh@linuxfoundation.org Cc: platform-driver-x86@vger.kernel.org Cc: geert@linux-m68k.org Cc: linux@roeck-us.net Cc: akpm@linux-foundation.org Cc: mchehab@kernel.org Cc: davem@davemloft.net Cc: kvalo@codeaurora.org Link: http://lkml.kernel.org/r/1474578822-33805-1-git-send-email-vadimp@mellanox.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-22percpu: eliminate two sparse warningsLance Richardson
Fix two cases where a __percpu pointer cast drops __percpu. Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-22Merge branch 'locking/urgent' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22perf/x86/intel/bts: Make it an exclusive PMUAlexander Shishkin
Just like intel_pt, intel_bts can only handle one event at a time, which is the reason we introduced PERF_PMU_CAP_EXCLUSIVE in the first place. However, at the moment one can have as many intel_bts events within the same context at the same time as one pleases. Only one of them, however, will get scheduled and receive the actual trace data. Fix this by making intel_bts an "exclusive" PMU. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160920154811.3255-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulationDan Williams
In commit: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type") Christoph references the original patch I wrote implementing pmem support. The intent of the 'max_pfn' changes in that commit were to enable persistent memory ranges to be covered by the struct page memmap by default. However, that approach was abandoned when Christoph ported the patches [1], and that functionality has since been replaced by devm_memremap_pages(). In the meantime, this max_pfn manipulation is confusing kdump [2] that assumes that everything covered by the max_pfn is "System RAM". This results in kdump hanging or crashing. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-March/000348.html [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1351098 So fix it. Reported-by: Zhang Yi <yizhan@redhat.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Zhang Yi <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> # v4.1 and later kernels Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-nvdimm@lists.01.org Fixes: ec776ef6bbe1 ("x86/mm: Add support for the non-standard protected e820 type") Link: http://lkml.kernel.org/r/147448744538.34910.11287693517367139607.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Add UV4-specific functionsAndrew Banman
Add the UV4-specific function definitions and define an operations struct to implement them in the BAU driver. Many BAU MMRs, although functionally the same, have new addresses on UV4 due to hardware changes. Each MMR requires new read/write functions, but their implementation in the driver does not change. Thus, it is enough to enumerate them in the operations struct for the changes to take effect. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-11-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Fix payload queue setup on UV4 hardwareAndrew Banman
The BAU on UV4 does not need to maintain the payload queue tail pointer. Do not initialize the tail pointer MMR on UV4. Note that write_payload_tail is not an abstracted BAU function since it is an operation specific to pre-UV4 versions. Then we must switch on the UV version to control its usage, for which we use uvhub_version rather than is_uv*_hub because it is quicker/more concise. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-10-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Disable software timeout on UV4 hardwareAndrew Banman
Software timeouts are not currently supported on BAU for UV4. Instead, the BAU will rely on hardware-level fairness protocols to determine broadcast timeouts. Do not call enable_timeouts or calculate_destination_timeout on UV4. These functions write to pre-UV4 MMRs so they generate error messages on UV4. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-9-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Populate ->uvhub_version with UV4 version informationAndrew Banman
Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-8-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Use generic function pointersAndrew Banman
Convert the use of UV version-specific functions to their abstracted counterparts. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-7-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Add generic function pointersAndrew Banman
Many BAU functions have different implementations depending on the UV version. Rather than switching on the uvhub_version throughout the driver, we can define a set of operations for each version. This is especially beneficial for UV4, which will require many new MMR read/write functions. Currently, the set of abstracted functions are the same for UV1, UV2, and UV3. The functions were chosen because each one will have a different implementation for UV4. Other functions will be added as needed to handle new implementations or to cleanup the existing differences between UV1, UV2, and UV3, i.e. read_status and wait_completion. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-6-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Convert uv_physnodeaddr() use to uv_gpa_to_offset()Andrew Banman
The BAU driver should use the functions provided by uv_hub.h rather than its own implementations. uv_physnodeaddr converts vaddrs to paddrs for BAU MMR fields, but this is done better by uv_gpa_to_offset. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-5-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Clean up pq_init()Andrew Banman
The payload queue first MMR requires the physical memory address and hub GNODE of where the payload queue resides in memory, but the associated variables are named as if the PNODE were used. Rename gnode-related variables and clarify the definitions of the payload queue head, last, and tail pointers. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-4-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Clean up and update printksAndrew Banman
Replace all uses of printk with the appropriate pr_*() function. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-3-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22x86/platform/uv/BAU: Clean up vertical alignmentAndrew Banman
Fix whitespace on blocks of code to be vertically aligned. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-2-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22Merge branch 'linus' into x86/platform, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-21x86/acpi: Set persistent cpuid <-> nodeid mapping when bootingGu Zheng
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that, when node online/offline happens, cache based on cpuid <-> nodeid mapping such as wq_numa_possible_cpumask will not cause any problem. It contains 4 steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. 4. Establish all possible cpuid <-> nodeid mapping. This patch finishes step 4. This patch set the persistent cpuid <-> nodeid mapping for all enabled/disabled processors at boot time via an additional acpi namespace walk for processors. [ tglx: Remove the unneeded exports ] Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-6-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21x86/acpi: Introduce persistent storage for cpuid <-> apicid mappingGu Zheng
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that, when node online/offline happens, cache based on cpuid <-> nodeid mapping such as wq_numa_possible_cpumask will not cause any problem. It contains 4 steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. 4. Establish all possible cpuid <-> nodeid mapping. This patch finishes step 2. In this patch, we introduce a new static array named cpuid_to_apicid[], which is large enough to store info for all possible cpus. And then, we modify the cpuid calculation. In generic_processor_info(), it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid mapping changes with node hotplug. After this patch, we find the next unused cpuid, map it to an apicid, and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid mapping will be persistent. And finally we will use this array to make cpuid <-> nodeid persistent. cpuid <-> apicid mapping is established at local apic registeration time. But non-present or disabled cpus are ignored. In this patch, we establish all possible cpuid <-> apicid mapping when registering local apic. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>