summaryrefslogtreecommitdiff
path: root/virt
AgeCommit message (Collapse)Author
2017-01-17Merge tag 'kvm-arm-for-4.10-rc4' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for 4.10-rc4 - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails
2017-01-13KVM: arm/arm64: vgic: Fix deadlock on error handlingMarc Zyngier
Dmitry Vyukov reported that the syzkaller fuzzer triggered a deadlock in the vgic setup code when an error was detected, as the cleanup code tries to take a lock that is already held by the setup code. The fix is to avoid retaking the lock when cleaning up, by telling the cleanup function that we already hold it. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systemsJintack Lim
Current KVM world switch code is unintentionally setting wrong bits to CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical timer. Bit positions of CNTHCTL_EL2 are changing depending on HCR_EL2.E2H bit. EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is not set, but they are 11th and 10th bits respectively when E2H is set. In fact, on VHE we only need to set those bits once, not for every world switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE == 1, which makes those bits have no effect for the host kernel execution. So we just set those bits once for guests, and that's it. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13KVM: arm/arm64: Fix occasional warning from the timer work functionChristoffer Dall
When a VCPU blocks (WFI) and has programmed the vtimer, we program a soft timer to expire in the future to wake up the vcpu thread when appropriate. Because such as wake up involves a vcpu kick, and the timer expire function can get called from interrupt context, and the kick may sleep, we have to schedule the kick in the work function. The work function currently has a warning that gets raised if it turns out that the timer shouldn't fire when it's run, which was added because the idea was that in that case the work should never have been cancelled. However, it turns out that this whole thing is racy and we can get spurious warnings. The problem is that we clear the armed flag in the work function, which may run in parallel with the kvm_timer_unschedule->timer_disarm() call. This results in a possible situation where the timer_disarm() call does not call cancel_work_sync(), which effectively synchronizes the completion of the work function with running the VCPU. As a result, the VCPU thread proceeds before the work function completees, causing changes to the timer state such that kvm_timer_should_fire(vcpu) returns false in the work function. All we do in the work function is to kick the VCPU, and an occasional rare extra kick never harmed anyone. Since the race above is extremely rare, we don't bother checking if the race happens but simply remove the check and the clearing of the armed flag from the work function. Reported-by: Matthias Brugger <mbrugger@suse.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-12KVM: eventfd: fix NULL deref irqbypass consumerWanpeng Li
Reported syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] PGD 0 Oops: 0002 [#1] SMP CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1 Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm] task: ffff9bbe0dfbb900 task.stack: ffffb61802014000 RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] Call Trace: irqfd_shutdown+0x66/0xa0 [kvm] process_one_work+0x16b/0x480 worker_thread+0x4b/0x500 kthread+0x101/0x140 ? process_one_work+0x480/0x480 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x25/0x30 RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20 CR2: 0000000000000008 The syzkaller folks reported a NULL pointer dereference that due to unregister an consumer which fails registration before. The syzkaller creates two VMs w/ an equal eventfd occasionally. So the second VM fails to register an irqbypass consumer. It will make irqfd as inactive and queue an workqueue work to shutdown irqfd and unregister the irqbypass consumer when eventfd is closed. However, the second consumer has been initialized though it fails registration. So the token(same as the first VM's) is taken to unregister the consumer through the workqueue, the consumer of the first VM is found and unregistered, then NULL deref incurred in the path of deleting consumer from the consumers list. This patch fixes it by making irq_bypass_register/unregister_consumer() looks for the consumer entry based on consumer pointer itself instead of token matching. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t
2016-12-25Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine
2016-12-25clocksource: Use a plain u64 instead of cycle_tThomas Gleixner
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2016-12-25cpu/hotplug: Cleanup state namesThomas Gleixner
When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14mm: unexport __get_user_pages_unlocked()Lorenzo Stoakes
Unexport the low-level __get_user_pages_unlocked() function and replaces invocations with calls to more appropriate higher-level functions. In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with get_user_pages_unlocked() since we can now pass gup_flags. In async_pf_execute() and process_vm_rw_single_vec() we need to pass different tsk, mm arguments so get_user_pages_remote() is the sane replacement in these cases (having added manual acquisition and release of mmap_sem.) Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH flag. However, this flag was originally silently dropped by commit 1e9877902dc7 ("mm/gup: Introduce get_user_pages_remote()"), so this appears to have been unintentional and reintroducing it is therefore not an issue. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "Small release, the most interesting stuff is x86 nested virt improvements. x86: - userspace can now hide nested VMX features from guests - nested VMX can now run Hyper-V in a guest - support for AVX512_4VNNIW and AVX512_FMAPS in KVM - infrastructure support for virtual Intel GPUs. PPC: - support for KVM guests on POWER9 - improved support for interrupt polling - optimizations and cleanups. s390: - two small optimizations, more stuff is in flight and will be in 4.11. ARM: - support for the GICv3 ITS on 32bit platforms" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits) arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest KVM: arm/arm64: timer: Check for properly initialized timer on init KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs KVM: x86: Handle the kthread worker using the new API KVM: nVMX: invvpid handling improvements KVM: nVMX: check host CR3 on vmentry and vmexit KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry KVM: nVMX: propagate errors from prepare_vmcs02 KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation KVM: nVMX: support restore of VMX capability MSRs KVM: nVMX: generate non-true VMX MSRs based on true versions KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. KVM: x86: Add kvm_skip_emulated_instruction and use it. KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 KVM: VMX: Reorder some skip_emulated_instruction calls KVM: x86: Add a return value to kvm_emulate_cpuid KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h ...
2016-12-13Merge tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - VFIO updates for v4.10 primarily include a new Mediated Device interface, which essentially allows software defined devices to be exposed to users through VFIO. The host vendor driver providing this virtual device polices, or mediates user access to the device. These devices often incorporate portions of real devices, for instance the primary initial users of this interface expose vGPUs which allow the user to map mediated devices, or mdevs, to a portion of a physical GPU. QEMU composes these mdevs into PCI representations using the existing VFIO user API. This enables both Intel KVM-GT support, which is also expected to arrive into Linux mainline during the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O devices (aka CCW devices) for s390 virtualization support. (Kirti Wankhede, Neo Jia) - Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin) - Fixes to VFIO capability chain handling (Eric Auger) - Error handling fixes for fallout from mdev (Christophe JAILLET) - Notifiers to expose struct kvm to mdev vendor drivers (Jike Song) - type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia) * tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits) vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP. vfio iommu type1: WARN_ON if notifier block is not unregistered kvm: set/clear kvm to/from vfio_group when group add/delete vfio: support notifier chain in vfio_group vfio: vfio_register_notifier: classify iommu notifier vfio: Fix handling of error returned by 'vfio_group_get_from_dev()' vfio: fix vfio_info_cap_add/shift vfio/pci: Drop unnecessary pcibios_err_to_errno() MAINTAINERS: Add entry VFIO based Mediated device drivers docs: Sample driver to demonstrate how to use Mediated device framework. docs: Sysfs ABI for mediated device framework docs: Add Documentation for Mediated devices vfio: Define device_api strings vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare() vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare() vfio: Introduce vfio_set_irqs_validate_and_prepare() vfio_pci: Update vfio_pci to use vfio_info_add_capability() vfio: Introduce common function to add capabilities vfio iommu: Add blocking notifier to notify DMA_UNMAP ...
2016-12-12Merge tag 'kvm-arm-for-4.10' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.10: - Support for the GICv3 ITS on 32bit platforms - A handful of timer and GIC emulation fixes - A PMU architecture fix
2016-12-11Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-09KVM: arm/arm64: timer: Check for properly initialized timer on initChristoffer Dall
When the arch timer code fails to initialize (for example because the memory mapped timer doesn't work, which is currently seen with the AEM model), then KVM just continues happily with a final result that KVM eventually does a NULL pointer dereference of the uninitialized cycle counter. Check directly for this in the init path and give the user a reasonable error in this case. Cc: Shih-Wei Li <shihwei@cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-09KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUsAndre Przywara
The GICv2 spec says in section 4.3.12 that a "CPU targets field bit that corresponds to an unimplemented CPU interface is RAZ/WI." Currently we allow the guest to write any value in there and it can read that back. Mask the written value with the proper CPU mask to be spec compliant. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-01kvm: set/clear kvm to/from vfio_group when group add/deleteJike Song
Sometimes users need to be aware when a vfio_group attaches to a KVM or detaches from it. KVM already calls get/put method from vfio to manipulate the vfio_group reference, it can notify vfio_group in a similar way. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-01KVM: use after free in kvm_ioctl_create_device()Dan Carpenter
We should move the ops->destroy(dev) after the list_del(&dev->vm_node) so that we don't use "dev" after freeing it. Fixes: a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-01Merge tag 'kvm-arm-for-4.9-rc7' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc7 - Do not call kvm_notify_acked for PPIs
2016-11-28KVM: Export kvm module parameter variablesSuraj Jitindar Singh
The kvm module has the parameters halt_poll_ns, halt_poll_ns_grow, and halt_poll_ns_shrink. Halt polling was recently added to the powerpc kvm-hv module and these parameters were essentially duplicated for that. There is no benefit to this duplication and it can lead to confusion when trying to tune halt polling. Thus move the definition of these variables to kvm_host.h and export them. This will allow the kvm-hv module to use the same module parameters by accessing these variables, which will be implemented in the next patch, meaning that they will no longer be duplicated. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: arm/arm64: vgic: Don't notify EOI for non-SPIsMarc Zyngier
When we inject a level triggerered interrupt (and unless it is backed by the physical distributor - timer style), we request a maintenance interrupt. Part of the processing for that interrupt is to feed to the rest of KVM (and to the eventfd subsystem) the information that the interrupt has been EOIed. But that notification only makes sense for SPIs, and not PPIs (such as the PMU interrupt). Skip over the notification if the interrupt is not an SPI. Cc: stable@vger.kernel.org # 4.7+ Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-22kvm: Introduce kvm_write_guest_offset_cached()Pan Xinhui
It allows us to update some status or field of a structure partially. We can also save a kvm_read_guest_cached() call if we just update one fild of the struct regardless of its current value. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-19KVM: async_pf: avoid recursive flushing of work itemsPaolo Bonzini
This was reported by syzkaller: [ INFO: possible recursive locking detected ] 4.9.0-rc4+ #49 Not tainted --------------------------------------------- kworker/2:1/5658 is trying to acquire lock: ([ 1644.769018] (&work->work) [< inline >] list_empty include/linux/compiler.h:243 [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511 but task is already holding lock: ([ 1644.769018] (&work->work) [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093 stack backtrace: CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events async_pf_execute ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480 0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27 ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0 Call Trace: ... [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846 [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916 [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951 [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126 [< inline >] kvm_free_vcpus arch/x86/kvm/x86.c:7841 [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946 [< inline >] kvm_destroy_vm virt/kvm/kvm_main.c:731 [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752 [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111 [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096 [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230 [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209 [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 The reason is that kvm_put_kvm is causing the destruction of the VM, but the page fault is still on the ->queue list. The ->queue list is owned by the VCPU, not by the work items, so we cannot just add list_del to the work item. Instead, use work->vcpu to note async page faults that have been resolved and will be processed through the done list. There is no need to flush those. Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19Merge tag 'kvm-arm-for-4.9-rc6' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc6 - Fix handling of the 32bit cycle counter - Fix cycle counter filtering
2016-11-18KVM: arm64: Fix the issues when guest PMCCFILTR is configuredWei Huang
KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured. But this function can't deals with PMCCFILTR correctly because the evtCount bits of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this function shouldn't return immediately; instead it needs to check further if select_idx is ARMV8_PMU_CYCLE_IDX. Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER blindly to attr.config. Instead it ought to convert the request to the "cpu cycle" event type (i.e. 0x11). To support this patch and to prevent duplicated definitions, a limited set of ARMv8 perf event types were relocated from perf_event.c to asm/perf_event.h. Cc: stable@vger.kernel.org # 4.6+ Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-16Merge branch 'x86/cpufeature' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into kvm/next Topic branch for AVX512_4VNNIW and AVX512_4FMAPS support in KVM.
2016-11-15arm/arm64: KVM: Clean up useless code in kvm_timer_enableLongpeng(Mike)
1) Since commit:41a54482 changed timer enabled variable to per-vcpu, the correlative comment in kvm_timer_enable is useless now. 2) After the kvm module init successfully, the timecounter is always non-null, so we can remove the checking of timercounter. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-14ARM: KVM: Support vGICv3 ITSVladimir Murzin
This patch allows to build and use vGICv3 ITS in 32-bit mode. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-14KVM: arm64: vgic-its: Fix compatibility with 32-bitVladimir Murzin
Evaluate GITS_BASER_ENTRY_SIZE once as an int data (GITS_BASER<n>'s Entry Size is 5-bit wide only), so when used as divider no reference to __aeabi_uldivmod is generated when build for AArch32. Use unsigned long long for GITS_BASER_PAGE_SIZE_* since they are used in conjunction with 64-bit data. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-11Merge tag 'kvm-arm-for-v4.9-rc4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for v4.9-rc4 - Kick the vcpu when a pending interrupt becomes pending again - Prevent access to invalid interrupt registers - Invalid TLBs when two vcpus from the same VM share a CPU
2016-11-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Check memopp before dereference (CVE-2016-8630) kvm: nVMX: VMCLEAR an active shadow VMCS after last use KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK KVM: x86: fix wbinvd_dirty_mask use-after-free kvm/x86: Show WRMSR data is in hex kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types KVM: document lock orders KVM: fix OOPS on flush_work KVM: s390: Fix STHYI buffer alignment for diag224 KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Make ERET handle ERL before EXL KVM: MIPS: Fix lazy user ASID regenerate for SMP
2016-11-04KVM: arm/arm64: vgic: Kick VCPUs when queueing already pending IRQsShih-Wei Li
In cases like IPI, we could be queueing an interrupt for a VCPU that is already running and is not about to exit, because the VCPU has entered the VM with the interrupt pending and would not trap on EOI'ing that interrupt. This could result to delays in interrupt deliveries or even loss of interrupts. To guarantee prompt interrupt injection, here we have to try to kick the VCPU. Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-04KVM: arm/arm64: vgic: Prevent access to invalid SPIsAndre Przywara
In our VGIC implementation we limit the number of SPIs to a number that the userland application told us. Accordingly we limit the allocation of memory for virtual IRQs to that number. However in our MMIO dispatcher we didn't check if we ever access an IRQ beyond that limit, leading to out-of-bound accesses. Add a test against the number of allocated SPIs in check_region(). Adjust the VGIC_ADDR_TO_INT macro to avoid an actual division, which is not implemented on ARM(32). [maz: cleaned-up original patch] Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-02kvm/stats: Update kvm stats to clear on write to their debugfs entrySuraj Jitindar Singh
Various kvm vm and vcpu stats are provided via debugfs entries. Currently there is no way to reset these stats back to zero. Add the ability to clear (reset back to zero) these stats on a per stat basis by writing to the debugfs files. Only a write value of 0 is accepted. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-10-26KVM: fix OOPS on flush_workPaolo Bonzini
The conversion done by commit 3706feacd007 ("KVM: Remove deprecated create_singlethread_workqueue") is broken. It flushes a single work item &irqfd->shutdown instead of all of them, and even worse if there is no irqfd on the list then you get a NULL pointer dereference. Revert the virt/kvm/eventfd.c part of that patch; to avoid the deprecated function, just allocate our own workqueue---it does not even have to be unbound---with alloc_workqueue. Fixes: 3706feacd007 Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-24mm: unexport __get_user_pages()Lorenzo Stoakes
This patch unexports the low-level __get_user_pages() function. Recent refactoring of the get_user_pages* functions allow flags to be passed through get_user_pages() which eliminates the need for access to this function from its one user, kvm. We can see that the two calls to get_user_pages() which replace __get_user_pages() in kvm_main.c are equivalent by examining their call stacks: get_user_page_nowait(): get_user_pages(start, 1, flags, page, NULL) __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, start, 1, flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL) check_user_page_hwpoison(): get_user_pages(addr, 1, flags, NULL, NULL) __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL, NULL, NULL) Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18mm: remove write/force parameters from __get_user_pages_unlocked()Lorenzo Stoakes
This removes the redundant 'write' and 'force' parameters from __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-29Merge tag 'kvm-arm-for-v4.9' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM Changes for v4.9 - Various cleanups and removal of redundant code - Two important fixes for not using an in-kernel irqchip - A bit of optimizations - Handle SError exceptions and present them to guests if appropriate - Proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - Preparations for GICv3 save/restore, including ABI docs
2016-09-27KVM: arm/arm64: vgic: Don't flush/sync without a working vgicChristoffer Dall
If the vgic hasn't been created and initialized, we shouldn't attempt to look at its data structures or flush/sync anything to the GIC hardware. This fixes an issue reported by Alexander Graf when using a userspace irqchip. Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Cc: stable@vger.kernel.org Reported-by: Alexander Graf <agraf@suse.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-27KVM: arm64: Require in-kernel irqchip for PMU supportChristoffer Dall
If userspace creates a PMU for the VCPU, but doesn't create an in-kernel irqchip, then we end up in a nasty path where we try to take an uninitialized spinlock, which can lead to all sorts of breakages. Luckily, QEMU always creates the VGIC before the PMU, so we can establish this as ABI and check for the VGIC in the PMU init stage. This can be relaxed at a later time if we want to support PMU with a userspace irqchip. Cc: stable@vger.kernel.org Cc: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22ARM: KVM: Support vgic-v3Vladimir Murzin
This patch allows to build and use vgic-v3 in 32-bit mode. Unfortunately, it can not be split in several steps without extra stubs to keep patches independent and bisectable. For instance, virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre to be already defined. It is how support has been done: * handle SGI requests from the guest * report configured SRE on access to GICv3 cpu interface from the guest * required vgic-v3 macros are provided via uapi.h * static keys are used to select GIC backend * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with the static inlines Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22KVM: arm: vgic: Support 64-bit data manipulation on 32-bit host systemsVladimir Murzin
We have couple of 64-bit registers defined in GICv3 architecture, so unsigned long accesses to these registers will only access a single 32-bit part of that regitser. On the other hand these registers can't be accessed as 64-bit with a single instruction like ldrd/strd or ldmia/stmia if we run a 32-bit host because KVM does not support access to MMIO space done by these instructions. It means that a 32-bit guest accesses these registers in 32-bit chunks, so the only thing we need to do is to ensure that extract_bytes() always takes 64-bit data. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22KVM: arm: vgic: Fix compiler warnings when built for 32-bitVladimir Murzin
Well, this patch is looking ahead of time, but we'll get following compiler warnings as soon as we introduce vgic-v3 to 32-bit world CC arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer': arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow] value = (mpidr & GENMASK(23, 0)) << 32; ^ In file included from ./include/linux/kernel.h:10:0, from ./include/asm-generic/bug.h:13, from ./arch/arm/include/asm/bug.h:59, from ./include/linux/bug.h:4, from ./include/linux/io.h:23, from ./arch/arm/include/asm/arch_gicv3.h:23, from ./include/linux/irqchip/arm-gic-v3.h:411, from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14: arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi': ./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow] #define BIT(nr) (1UL << (nr)) ^ arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT' broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT); ^ Let's fix them now. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22KVM: arm64: vgic-its: Introduce config option to guard ITS specific codeVladimir Murzin
By now ITS code guarded with KVM_ARM_VGIC_V3 config option which was introduced to hide everything specific to vgic-v3 from 32-bit world. We are going to support vgic-v3 in 32-bit world and KVM_ARM_VGIC_V3 will gone, but we don't have support for ITS there yet and we need to continue keeping ITS away. Introduce the new config option to prevent ITS code being build in 32-bit mode when support for vgic-v3 is done. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22arm64: KVM: Move vgic-v3 save/restore to virt/kvm/arm/hypVladimir Murzin
So we can reuse the code under arch/arm Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22arm64: KVM: Use static keys for selecting the GIC backendVladimir Murzin
Currently GIC backend is selected via alternative framework and this is fine. We are going to introduce vgic-v3 to 32-bit world and there we don't have patching framework in hand, so we can either check support for GICv3 every time we need to choose which backend to use or try to optimise it by using static keys. The later looks quite promising because we can share logic involved in selecting GIC backend between architectures if both uses static keys. This patch moves arm64 from alternative to static keys framework for selecting GIC backend. For that we embed static key into vgic_global and enable the key during vgic initialisation based on what has already been exposed by the host GIC driver. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-16kvm: create per-vcpu dirs in debugfsLuiz Capitulino
This commit adds the ability for archs to export per-vcpu information via a new per-vcpu dir in the VM's debugfs directory. If kvm_arch_has_vcpu_debugfs() returns true, then KVM will create a vcpu dir for each vCPU in the VM's debugfs directory. Then kvm_arch_create_vcpu_debugfs() is responsible for populating each vcpu directory with arch specific entries. The per-vcpu path in debugfs will look like: /sys/kernel/debug/kvm/29162-10/vcpu0 /sys/kernel/debug/kvm/29162-10/vcpu1 This is all arch specific for now because the only user of this interface (x86) wants to export x86-specific per-vcpu information to user-space. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-16kvm: kvm_destroy_vm_debugfs(): check debugfs_stat_data pointerLuiz Capitulino
This make it possible to call kvm_destroy_vm_debugfs() from kvm_create_vm_debugfs() in error conditions. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Paul Mackerras writes: The highlights are: * Reduced latency for interrupts from PCI pass-through devices, from Suresh Warrier and me. * Halt-polling implementation from Suraj Jitindar Singh. * 64-bit VCPU statistics, also from Suraj. * Various other minor fixes and improvements.