Age  Commit message  Author
2017-03-28KVM: MIPS: Implement VZ supportJames Hogan
Add the main support for the MIPS Virtualization ASE (A.K.A. VZ) to MIPS KVM. The bulk of this work is in vz.c, with various new state and definitions elsewhere. Enough is implemented to be able to run on a minimal VZ core. Further patches will fill out support for guest features which are optional or can be disabled. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-doc@vger.kernel.org
2017-03-28KVM: MIPS: Update exit handler for VZJames Hogan
The general guest exit handler needs a few tweaks for VZ compared to trap & emulate, which for now are made directly depending on CONFIG_KVM_MIPS_VZ:
- There is no need to re-enable the hardware page table walker (HTW), as it can be left enabled during guest mode operation with VZ.
- There is no need to perform a privilege check, as any guest privilege violations should have already been detected by the hardware and triggered the appropriate guest exception.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS/Emulate: Drop CACHE emulation for VZJames Hogan
Ifdef out the trap & emulate CACHE instruction emulation functions for VZ. We will provide separate CACHE instruction emulation in vz.c, and we need to avoid linker errors due to the use of T&E specific MMU helpers. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS/Emulate: Update CP0_Compare emulation for VZJames Hogan
Update emulation of guest writes to CP0_Compare for VZ. There are two main differences compared to trap & emulate:
- Writing to CP0_Compare in the VZ hardware guest context acks any pending timer, clearing CP0_Cause.TI. If we don't want an ack to take place we must carefully restore the TI bit if it was previously set.
- Even with guest timer access disabled in CP0_GuestCtl0.GT, if the guest CP0_Count reaches the guest CP0_Compare the timer interrupt will assert. To prevent this we must set CP0_GTOffset to move the guest CP0_Count out of the way of the new guest CP0_Compare, either before or after depending on whether it is a forwards or backwards change.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS/TLB: Add VZ TLB managementJames Hogan
Add functions for MIPS VZ TLB management to tlb.c. kvm_vz_host_tlb_inv() will be used for invalidating root TLB entries after GPA page tables have been modified due to a KVM page fault. It arranges for a root GPA mapping to be flushed from the TLB, using the gpa_mm ASID or the current GuestID to do the probe. kvm_vz_local_flush_roottlb_all_guests() and kvm_vz_local_flush_guesttlb_all() flush all TLB entries in the corresponding TLB for guest mappings (GPA->RPA for root TLB with GuestID, and all entries for guest TLB). They will be used when starting a new GuestID cycle, when VZ hardware is enabled/disabled, and also when switching to a guest when the guest TLB contents may be stale or belong to a different VM. kvm_vz_guest_tlb_lookup() converts a guest virtual address to a guest physical address using the guest TLB. This will be used to decode guest virtual addresses which are sometimes provided by VZ hardware in CP0_BadVAddr for certain exceptions when the guest physical address is unavailable. kvm_vz_save_guesttlb() and kvm_vz_load_guesttlb() will be used to preserve wired guest VTLB entries while a guest isn't running. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS/Entry: Update entry code to support VZJames Hogan
Update MIPS KVM entry code to support VZ:
- We need to set GuestCtl0.GM while in guest mode.
- For cores supporting GuestID, we need to set the root GuestID to match the main GuestID while in guest mode so that the root TLB refill handler writes the correct GuestID into the TLB.
- For cores without GuestID where the root ASID dealiases RVA/GPA mappings, we need to load that ASID from the gpa_mm rather than the per-VCPU guest_kernel_mm or guest_user_mm, since the root TLB maps guest physical addresses. We also need to restore the normal process ASID on exit.
- The normal linux process pgd needs restoring on exit, as we can't leave the GPA mappings active for kernel code.
- GuestCtl0 needs saving on exit for the GExcCode field, as it may be clobbered if a preemption occurs.
We also need to move the TLB refill handler to the XTLB vector at offset 0x80 on 64-bit VZ kernels, as hardware will use Root.Status.KX to determine whether a TLB refill or XTLB Refill exception is to be taken on a root TLB miss from guest mode, and KX needs to be set for kernel code to be able to access the 64-bit segments.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS: Abstract guest CP0 register access for VZJames Hogan
Abstract the MIPS KVM guest CP0 register access macros into inline functions which are generated by macros. This allows them to be generated differently for VZ, where they will usually need to access the hardware guest CP0 context rather than the saved values in RAM.
Accessors for each individual register are generated using these macros:
- __BUILD_KVM_*_SW() for registers which are not present in the VZ hardware guest context, so kvm_{read,write}_c0_guest_##name() will access the saved value in RAM regardless of whether VZ is enabled.
- __BUILD_KVM_*_HW() for registers which are present in the VZ hardware guest context, so kvm_{read,write}_c0_guest_##name() will access the hardware register when VZ is enabled.
These build the underlying accessors using further macros:
- __BUILD_KVM_*_SAVED() builds e.g. kvm_{read,write}_sw_gc0_##name() functions for accessing the saved versions of the registers in RAM. This is used for implementing the common kvm_{read,write}_c0_guest_##name() accessors with T&E where registers are always stored in RAM, but are also available with VZ HW registers to allow them to be accessed while saved.
- __BUILD_KVM_*_VZ() builds e.g. kvm_{read,write}_vz_gc0_##name() functions for accessing the VZ hardware guest context registers directly. This is used for implementing the common kvm_{read,write}_c0_guest_##name() accessors with VZ.
- __BUILD_KVM_*_WRAP() builds wrappers with different names, which allows the common kvm_{read,write}_c0_guest_##name() functions to be implemented using the VZ accessors while still having the SAVED accessors available too.
- __BUILD_KVM_SAVE_VZ() builds functions for saving and restoring VZ hardware guest context register state to RAM, improving conciseness of VZ context saving and restoring.
Similar macros exist for generating modifiers (set, clear, change), either with a normal unlocked read/modify/write, or using atomic LL/SC sequences.
These changes change the types of 32-bit registers to u32 instead of unsigned long, which requires some changes to printk() functions in MIPS KVM.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
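The macro layering described in the commit above can be illustrated with a much reduced sketch. This is not the kernel code: the macro names (BUILD_SAVED, BUILD_VZ, BUILD_WRAP, BUILD_SW, BUILD_HW), the struct guest_cp0 layout, the SKETCH_VZ switch and the fake_hw_status variable standing in for a hardware guest register are all assumptions made for illustration.

```c
#include <stdint.h>

/* Set to 0 to model trap & emulate, where every register lives in RAM. */
#define SKETCH_VZ 1

/* Hypothetical saved guest CP0 context kept in RAM. */
struct guest_cp0 {
	uint32_t status;
	uint32_t cause;
};

/* Toy stand-in for a register held in the VZ hardware guest context. */
static uint32_t fake_hw_status;

/* Accessors for the saved copy of a register in RAM. */
#define BUILD_SAVED(name)						\
static inline uint32_t read_sw_gc0_##name(const struct guest_cp0 *c)	\
{ return c->name; }							\
static inline void write_sw_gc0_##name(struct guest_cp0 *c, uint32_t v)	\
{ c->name = v; }

/* Accessors for the (simulated) hardware copy of a register. */
#define BUILD_VZ(name)							\
static inline uint32_t read_vz_gc0_##name(const struct guest_cp0 *c)	\
{ (void)c; return fake_hw_##name; }					\
static inline void write_vz_gc0_##name(struct guest_cp0 *c, uint32_t v)	\
{ (void)c; fake_hw_##name = v; }

/* Common-name wrappers that forward to either the sw or the vz flavour. */
#define BUILD_WRAP(name, impl)						\
static inline uint32_t read_c0_guest_##name(const struct guest_cp0 *c)	\
{ return read_##impl##_gc0_##name(c); }					\
static inline void write_c0_guest_##name(struct guest_cp0 *c, uint32_t v) \
{ write_##impl##_gc0_##name(c, v); }

/* Register never present in hardware: common accessors use RAM. */
#define BUILD_SW(name)	BUILD_SAVED(name) BUILD_WRAP(name, sw)

/* Register present in hardware when VZ is enabled. */
#if SKETCH_VZ
#define BUILD_HW(name)	BUILD_SAVED(name) BUILD_VZ(name) BUILD_WRAP(name, vz)
#else
#define BUILD_HW(name)	BUILD_SW(name)
#endif

BUILD_HW(status)	/* common accessors hit the simulated hardware */
BUILD_SW(cause)		/* common accessors hit the saved copy in RAM */
```

With SKETCH_VZ set to 1, write_c0_guest_status() updates the simulated hardware register and leaves the saved copy untouched, while read_sw_gc0_status() still reaches the copy in RAM, which mirrors the reason given above for keeping the SAVED accessors available under VZ.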
2017-03-28KVM: MIPS: Add guest exit exception callbackJames Hogan
Add a callback for MIPS KVM implementations to handle the VZ guest exit exception. Currently the trap & emulate implementation contains a stub which reports an internal error, but the callback will be used properly by the VZ implementation. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS: Add hardware_{enable,disable} callbackJames Hogan
Add an implementation callback for the kvm_arch_hardware_enable() and kvm_arch_hardware_disable() architecture functions, with simple stubs for trap & emulate. This is in preparation for VZ which will make use of them. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS: Add callback to check extensionJames Hogan
Add an implementation callback for checking presence of KVM extensions. This allows implementation specific extensions to be provided without ifdefs in mips.c. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS: Init timer frequency from callbackJames Hogan
Currently the software emulated timer is initialised to a frequency of 100MHz by kvm_mips_init_count(), but this isn't suitable for VZ where the frequency of the guest timer matches that of the host. Add a count_hz argument so the caller can specify the default frequency, and move the call from kvm_arch_vcpu_create() to the implementation specific vcpu_setup() callback, so that VZ can specify a different frequency. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS: Add 64BIT capabilityJames Hogan
Add a new KVM_CAP_MIPS_64BIT capability to indicate that 64-bit MIPS guests are available and supported. In this case it should still be possible to run 32-bit guest code. If not available it won't be possible to run 64-bit guest code and the instructions may not be available, or the kernel may not support full context switching of 64-bit registers. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-doc@vger.kernel.org
2017-03-28KVM: MIPS: Add VZ & TE capabilitiesJames Hogan
Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order to allow MIPS KVM to support VZ without confusing old users (which expect the trap & emulate implementation), define and start checking KVM_CREATE_VM type codes.
The codes available are:
- KVM_VM_MIPS_TE = 0
  This is the current value expected from the user, and will create a VM using trap & emulate in user mode, confined to the user mode address space. This may in future become unavailable if the kernel is only configured to support VZ, in which case the EINVAL error will be returned and KVM_CAP_MIPS_TE won't be available even though KVM_CAP_MIPS_VZ is.
- KVM_VM_MIPS_VZ = 1
  This can be provided when the KVM_CAP_MIPS_VZ capability is available to create a VM using VZ, with a fully virtualized guest virtual address space. If VZ support is unavailable in the kernel, the EINVAL error will be returned (although old kernels without the KVM_CAP_MIPS_VZ capability may well succeed and create a trap & emulate VM).
This is designed to allow the desired implementation (T&E vs VZ) to be potentially chosen at runtime rather than being fixed in the kernel configuration.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-doc@vger.kernel.org
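A userspace VMM might decide which type code to pass to KVM_CREATE_VM roughly as follows. This is only a sketch of the decision logic: the helper name pick_mips_vm_type() and its parameters are hypothetical, the capability flags are assumed to come from KVM_CHECK_EXTENSION queries made elsewhere, and the handling of kernels that report neither capability is an inference from the statement that 0 is the value existing users already pass.

```c
/* Type codes as listed in the commit message above. */
#define KVM_VM_MIPS_TE 0
#define KVM_VM_MIPS_VZ 1

/*
 * Hypothetical helper: choose the KVM_CREATE_VM type argument.
 *
 * has_vz / has_te: non-zero if KVM_CAP_MIPS_VZ / KVM_CAP_MIPS_TE was
 *                  reported by the kernel.
 * want_vz:         non-zero if the caller wants a VZ guest.
 *
 * Returns the type code, or -1 if the request cannot be satisfied.
 */
static int pick_mips_vm_type(int has_vz, int has_te, int want_vz)
{
	if (want_vz)
		return has_vz ? KVM_VM_MIPS_VZ : -1;

	/*
	 * Trap & emulate was requested. If neither capability is reported
	 * we assume an old kernel, where type 0 has always meant T&E.
	 */
	if (has_te || !has_vz)
		return KVM_VM_MIPS_TE;

	/* VZ-only kernel: T&E is not available. */
	return -1;
}
```

Requesting VZ without the capability is refused in the sketch rather than attempted, because the commit notes that an old kernel may ignore the type and silently create a trap & emulate VM.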
2017-03-28KVM: MIPS: Extend counters & events for VZ GExcCodesJames Hogan
Extend MIPS KVM stats counters and kvm_transition trace event codes to cover hypervisor exceptions, which have their own GExcCode field in CP0_GuestCtl0 with up to 32 hypervisor exception cause codes. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
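Since the GExcCode field can hold up to 32 cause codes, it is five bits wide, and per-code statistics fit in a 32-entry table. The sketch below assumes the field occupies bits 6:2 of CP0_GuestCtl0; that position, the constant names and both helper functions are illustrative assumptions and are not taken from the commit.

```c
#include <stdint.h>

/* Assumed field position: 5 bits starting at bit 2 of CP0_GuestCtl0. */
#define SKETCH_GEXC_SHIFT	2
#define SKETCH_GEXC_MASK	(0x1fu << SKETCH_GEXC_SHIFT)
#define SKETCH_NUM_GEXC		32

/* One counter per possible hypervisor exception cause code. */
static unsigned long gexc_counts[SKETCH_NUM_GEXC];

/* Extract the hypervisor exception code from a saved GuestCtl0 value. */
static unsigned int gexc_code(uint32_t guestctl0)
{
	return (guestctl0 & SKETCH_GEXC_MASK) >> SKETCH_GEXC_SHIFT;
}

/* Account one guest exit against its hypervisor exception code. */
static void count_guest_exit(uint32_t guestctl0)
{
	gexc_counts[gexc_code(guestctl0)]++;
}
```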
2017-03-28KVM: MIPS: Update kvm_lose_fpu() for VZJames Hogan
Update the implementation of kvm_lose_fpu() for VZ, where there is no need to enable the FPU/MSA in the root context if the FPU/MSA state is loaded but disabled in the guest context. The trap & emulate implementation needs to disable FPU/MSA in the root context when the guest disables them in order to catch the COP1 unusable or MSA disabled exception when they're used and pass it on to the guest. For VZ however as long as the context is loaded and enabled in the root context, the guest can enable and disable it in the guest context without the hypervisor having to do much, and will take guest exceptions without hypervisor intervention if used without being enabled in the guest context. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS/Emulate: Implement 64-bit MMIO emulationJames Hogan
Implement additional MMIO emulation for MIPS64, including 64-bit loads/stores, and 32-bit unsigned loads. These are only exposed on 64-bit VZ hosts. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
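On MIPS64 a 32-bit load comes in two flavours: LW sign-extends the loaded word into the 64-bit destination register and LWU zero-extends it. A minimal sketch of that difference when completing an MMIO read follows; the function names are hypothetical and this is not the emulation code from the commit.

```c
#include <stdint.h>

/*
 * Value written to a 64-bit GPR for a signed 32-bit load (LW).
 * The uint32_t -> int32_t conversion relies on the usual two's
 * complement behaviour of mainstream compilers.
 */
static uint64_t mmio_complete_lw(uint32_t data)
{
	return (uint64_t)(int64_t)(int32_t)data;
}

/* Value written to a 64-bit GPR for an unsigned 32-bit load (LWU). */
static uint64_t mmio_complete_lwu(uint32_t data)
{
	return (uint64_t)data;
}
```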
2017-03-28KVM: MIPS/Emulate: De-duplicate MMIO emulationJames Hogan
Refactor MIPS KVM MMIO load/store emulation to reduce code duplication. Each duplicate differed slightly anyway, and it will simplify adding 64-bit MMIO support for VZ. kvm_mips_emulate_store() and kvm_mips_emulate_load() can now return EMULATE_DO_MMIO (as possibly originally intended). We therefore stop calling either of these from kvm_mips_emulate_inst(), which is now only used by kvm_trap_emul_handle_cop_unusable() which is picky about return values. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: MIPS: Implement HYPCALL emulationJames Hogan
Emulate the HYPCALL instruction added in the VZ ASE and used by the MIPS paravirtualised guest support that is already merged. The new hypcall.c handles arguments and the return value. No actual hypercalls are yet supported, but this still allows us to safely step over hypercalls and set an error code in the return value for forward compatibility. Non-zero HYPCALL codes are not handled. We also document the hypercall ABI which asm/kvm_para.h uses. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Andreas Herrmann <andreas.herrmann@caviumnetworks.com> Cc: David Daney <david.daney@cavium.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-doc@vger.kernel.org
2017-03-28MIPS: asm/tlb.h: Add UNIQUE_GUEST_ENTRYHI() macroJames Hogan
Add a distinct UNIQUE_GUEST_ENTRYHI() macro for invalidation of guest TLB entries by KVM, using addresses in KSeg1 rather than KSeg0. This avoids conflicts with guest invalidation routines when there is no EHINV bit to mark the whole entry as invalid, avoiding guest machine check exceptions on Cavium Octeon III. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
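The idea of giving each TLB index its own unmapped dummy address can be sketched as below. The constants are assumptions for illustration (32-bit KSeg0 and KSeg1 bases, 4 KiB pages, and one even/odd page pair per TLB entry), and the function names are hypothetical rather than the kernel's macros.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12		/* assumption: 4 KiB pages */
#define SKETCH_KSEG0		0x80000000u	/* unmapped, cached segment */
#define SKETCH_KSEG1		0xa0000000u	/* unmapped, uncached segment */

/* Dummy EntryHi value for invalidating host TLB entry idx. */
static uint32_t unique_entryhi(unsigned int idx)
{
	/* Each TLB entry maps a pair of pages, hence PAGE_SHIFT + 1. */
	return SKETCH_KSEG0 + (idx << (SKETCH_PAGE_SHIFT + 1));
}

/* Dummy EntryHi value for KVM invalidating guest TLB entry idx. */
static uint32_t unique_guest_entryhi(unsigned int idx)
{
	return SKETCH_KSEG1 + (idx << (SKETCH_PAGE_SHIFT + 1));
}
```

Because the two helpers draw from different segments, a value KVM writes into the guest TLB cannot match a KSeg0 value that a guest's own invalidation routine writes at another index, which is the conflict the commit describes.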
2017-03-28MIPS: Add some missing guest CP0 accessors & defsJames Hogan
Add some missing guest accessors and register field definitions for KVM for MIPS VZ to make use of. Guest CP0_LLAddr register accessors and definitions for the LLB field allow KVM to clear the guest LLB to cancel in-progress LL/SC atomics on restore, and to emulate accesses by the guest to the CP0_LLAddr register. Bitwise modifiers and definitions for the guest CP0_Wired and CP0_Config1 registers allow KVM to modify fields within the CP0_Wired and CP0_Config1 registers. Finally a definition for the CP0_Config5.SBRI bit allows KVM to initialise and allow modification of the guest version of the SBRI bit. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28MIPS: Probe guest MVHJames Hogan
Probe for availability of M{T,F}HC0 instructions used with e.g. XPA in the VZ guest context, and make it available via cpu_guest_has_mvh. This will be helpful in properly emulating the MAAR registers in KVM for MIPS VZ. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28MIPS: Probe guest CP0_UserLocalJames Hogan
Probe for presence of guest CP0_UserLocal register and expose via cpu_guest_has_userlocal. This register is optional pre-r6, so this will allow KVM to only save/restore/expose the guest CP0_UserLocal register if it exists. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28MIPS: Separate MAAR V bit into VL and VH for XPAJames Hogan
The MAAR V bit has been renamed VL since another bit called VH is added at the top of the register when it is extended to 64-bits on a 32-bit processor with XPA. Rename the V definition, fix the various users, and add definitions for the VH bit. Also add a definition for the MAARI Index field. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28MIPS: Add defs & probing of UFRJames Hogan
Add definitions and probing of the UFR bit in Config5. This bit allows user mode control of the FR bit (floating point register mode). It is present if the UFRP bit is set in the floating point implementation register. This is a capability KVM may want to expose to guest kernels, even though Linux is unlikely to ever use it due to the implications for multi-threaded programs. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
2017-03-28KVM: x86: cleanup the page tracking SRCU instancePaolo Bonzini
SRCU uses a delayed work item. Skip cleaning it up, and the result is use-after-free in the work item callbacks. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: 0eb05bf290cfe8610d9680b49abef37febd1c38a Reviewed-by: Xiao Guangrong <xiaoguangrong.eric@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-28KVM: nVMX: fix nested EPT detectionLadi Prosek
The nested_ept_enabled flag introduced in commit 7ca29de2136 was not computed correctly. We are interested only in L1's EPT state, not the combined L0+L1 value. In particular, if L0 uses EPT but L1 does not, nested_ept_enabled must be false to make sure that PDPTRs are loaded based on CR3 as usual, because the special case described in 26.3.2.4 Loading Page-Directory-Pointer-Table Entries does not apply. Fixes: 7ca29de21362 ("KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT") Cc: qemu-stable@nongnu.org Reported-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-28KVM: pci-assign: do not map smm memory slot pages in vt-d page tablesHerongguang (Stephen)
or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when destroy VM. This is consistent with current vfio implementation. Signed-off-by: herongguang <herongguang.he@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-27powerpc/mmu: Add real mode support for IOMMU preregistered memoryAlexey Kardashevskiy
This makes mm_iommu_lookup() able to work in realmode by replacing list_for_each_entry_rcu() (which can do debug stuff which can fail in real mode) with list_for_each_entry_lockless(). This adds realmode version of mm_iommu_ua_to_hpa() which adds explicit vmalloc'd-to-linear address conversion. Unlike mm_iommu_ua_to_hpa(), mm_iommu_ua_to_hpa_rm() can fail. This changes mm_iommu_preregistered() to receive @mm as in real mode @current does not always have a correct pointer. This adds realmode version of mm_iommu_lookup() which receives @mm (for the same reason as for mm_iommu_preregistered()) and uses lockless version of list_for_each_entry_rcu(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-24x86/syscalls/32: Ignore arch_prctl for other architecturesArnd Bergmann
sys_arch_prctl is only provided on x86, and there is no reason to add it elsewhere. However, including it on the 32-bit syscall table caused a warning for most configurations on non-x86: :1328:2: warning: #warning syscall arch_prctl not implemented [-Wcpp] This adds an exception to the syscall table checking script. Fixes: 79170fda313e ("x86/syscalls/32: Wire up arch_prctl on x86-32") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Kyle Huey <khuey@kylehuey.com> Link: http://lkml.kernel.org/r/20170323151904.706286-1-arnd@arndb.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-23KVM: kvm_io_bus_unregister_dev() should never failDavid Hildenbrand
No caller currently checks the return value of kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on freeing their device. A stale reference will remain in the io_bus, getting at least used again, when the iobus gets torn down on kvm_destroy_vm() - leading to use after free errors. There is nothing the callers could do, except retrying over and over again. So let's simply remove the bus altogether, print an error and make sure no one can access this broken bus again (returning -ENOMEM on any attempt to access it). Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU") Cc: stable@vger.kernel.org # 3.4+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23KVM: VMX: Fix enable VPID conditionsWanpeng Li
This can be reproduced by running L2 on L1 with VPID disabled on L0. Without commit "KVM: nVMX: Fix nested VPID vmx exec control", L2 crashes as below:
KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306c3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Reference SDM 30.3 INVVPID:
Protected Mode Exceptions
- #UD
  - If not in VMX operation.
  - If the logical processor does not support VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=0).
  - If the logical processor supports VPIDs (IA32_VMX_PROCBASED_CTLS2[37]=1) but does not support the INVVPID instruction (IA32_VMX_EPT_VPID_CAP[32]=0).
So we should check both VPID enable bit in vmx exec control and INVVPID support bit in vmx capability MSRs to enable VPID. This patch adds the guarantee to not enable VPID if either INVVPID or single-context/all-context invalidation is not exposed in vmx capability MSRs.
Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
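The combined check argued for in the commit above might look like the following sketch, operating on raw MSR values. Bits 37 and 32 are the ones quoted from the SDM in the commit message; bits 41 and 42 for single-context and all-context INVVPID types are an assumption from memory of the SDM, and requiring at least one of those two types is one reading of the commit text. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Quoted in the commit message above. */
#define CTLS2_VPID_ALLOWED1		(1ULL << 37) /* IA32_VMX_PROCBASED_CTLS2 */
#define EPT_VPID_CAP_INVVPID		(1ULL << 32) /* IA32_VMX_EPT_VPID_CAP */

/* Assumed bit positions, not stated in the commit message. */
#define EPT_VPID_CAP_INVVPID_SINGLE	(1ULL << 41)
#define EPT_VPID_CAP_INVVPID_ALL	(1ULL << 42)

/*
 * Decide whether VPID may be enabled: the control must be supported,
 * INVVPID must exist, and at least one usable invalidation type must
 * be exposed.
 */
static bool vpid_usable(uint64_t procbased_ctls2, uint64_t ept_vpid_cap)
{
	if (!(procbased_ctls2 & CTLS2_VPID_ALLOWED1))
		return false;
	if (!(ept_vpid_cap & EPT_VPID_CAP_INVVPID))
		return false;
	return (ept_vpid_cap & (EPT_VPID_CAP_INVVPID_SINGLE |
				EPT_VPID_CAP_INVVPID_ALL)) != 0;
}
```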
2017-03-23KVM: nVMX: Fix nested VPID vmx exec controlWanpeng Li
This can be reproduced by running kvm-unit-tests/vmx.flat on L0 w/ vpid disabled.
Test suite: VPID
Unhandled exception 6 #UD at ip 00000000004051a6
error_code=0000 rflags=00010047 cs=00000008
rax=0000000000000000 rcx=0000000000000001 rdx=0000000000000047 rbx=0000000000402f79
rbp=0000000000456240 rsi=0000000000000001 rdi=0000000000000000
r8=000000000000000a r9=00000000000003f8 r10=0000000080010011 r11=0000000000000000
r12=0000000000000003 r13=0000000000000708 r14=0000000000000000 r15=0000000000000000
cr0=0000000080010031 cr2=0000000000000000 cr3=0000000007fff000 cr4=0000000000002020
cr8=0000000000000000
STACK: @4051a6 40523e 400f7f 402059 40028f
We should hide and forbid VPID in L1 if it is disabled on L0. However, the nested VPID enable bit is set unconditionally while setting up the nested vmx exec controls, even though VPID is not exposed through the nested VMX capability. This patch fixes it by not setting the nested VPID enable bit if it is disabled on L0.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Fixes: 5c614b3583e (KVM: nVMX: nested VPID emulation) Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23KVM: x86: correct async page present tracepointWanpeng Li
After async pf is set up successfully, there is a broadcast wakeup w/ special token 0xffffffff which tells the vCPU that it should wake up all processes waiting for APFs, though there is no real process waiting at the moment. The async page present tracepoint prints prematurely and fails to catch the special token setup. This patch fixes it by moving the async page present tracepoint after the special token setup.
Before patch: qemu-system-x86-8499 [006] ...1 5973.473292: kvm_async_pf_ready: token 0x0 gva 0x0
After patch: qemu-system-x86-8499 [006] ...1 5973.473292: kvm_async_pf_ready: token 0xffffffff gva 0x0
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-23kvm: vmx: Flush TLB when the APIC-access address changesJim Mattson
Quoting from the Intel SDM, volume 3, section 28.3.3.4: Guidelines for Use of the INVEPT Instruction: If EPT was in use on a logical processor at one time with EPTP X, it is recommended that software use the INVEPT instruction with the "single-context" INVEPT type and with EPTP X in the INVEPT descriptor before a VM entry on the same logical processor that enables EPT with EPTP X and either (a) the "virtualize APIC accesses" VM-execution control was changed from 0 to 1; or (b) the value of the APIC-access address was changed. In the nested case, the burden falls on L1, unless L0 enables EPT in vmcs02 when L1 doesn't enable EPT in vmcs12. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-23KVM: x86: use pic/ioapic destructor when destroy vmPeter Xu
We have specific destructors for pic/ioapic, we'd better use them when destroying the VM as well. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-23KVM: x86: check existence before destroyPeter Xu
Mostly used for split irqchip mode. In that case, these two things are not inited at all, so no need to release. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-22KVM: arm64: Use common Set/Way sys definitionsMark Rutland
Now that we have common definitions for the encoding of Set/Way cache maintenance operations, make the KVM code use these, simplifying the sys_reg_descs table. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
2017-03-22KVM: arm64: Use common sysreg definitionsMark Rutland
Now that we have common definitions for the remaining register encodings required by KVM, make the KVM code use these, simplifying the sys_reg_descs table and the genericv8_sys_regs table. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
2017-03-22KVM: arm64: use common invariant sysreg definitionsMark Rutland
Now that we have common definitions for the register encodings used by KVM, make the KVM code use these for invariant sysreg definitions. This makes said definitions a reasonable amount shorter, especially as many comments are rendered redundant and can be removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
2017-03-22KVM: arm64: Use common physical timer sysreg definitionsMark Rutland
Now that we have common definitions for the physical timer control registers, make the KVM code use these, simplifying the sys_reg_descs table. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
2017-03-22KVM: arm64: Use common GICv3 sysreg definitionsMark Rutland
Now that we have common definitions for the GICv3 register encodings, make the KVM code use these, simplifying the sys_reg_descs table. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
2017-03-22KVM: arm64: Use common performance monitor sysreg definitionsMark Rutland
Now that we have common definitions for the performance monitor register encodings, make the KVM code use these, simplifying the sys_reg_descs table. The comments for PMUSERENR_EL0 and PMCCFILTR_EL0 are kept, as these describe non-obvious details regarding the registers. However, a slight fixup is applied to bring these into line with the usual comment style. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
2017-03-22KVM: arm64: Use common debug sysreg definitionsMark Rutland
Now that we have common definitions for the debug register encodings, make the KVM code use these, simplifying the sys_reg_descs table. The table previously erroneously referred to MDCCSR_EL0 as MDCCSR_EL1. This is corrected (as is necessary in order to use the common sysreg definition). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
2017-03-22KVM: arm64: add SYS_DESC()Mark Rutland
This patch adds a macro enabling us to initialise sys_reg_desc structures based on common sysreg encoding definitions in <asm/sysreg.h>. Subsequent patches will use this to simplify the KVM code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu
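As a rough illustration of the idea behind SYS_DESC(), the following user-space sketch packs a system register encoding into one integer and then unpacks it into the individual fields of a descriptor. It is a simplified stand-in, not the kernel source: the shift and mask values mirror the AArch64 encoding layout as I understand it, the cut-down struct sys_reg_desc and its name field are assumptions made for the example, and check_example_descs() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of a packed sysreg encoding (illustrative copy, assumed). */
#define OP0_SHIFT 19
#define OP0_MASK  0x3
#define OP1_SHIFT 16
#define OP1_MASK  0x7
#define CRN_SHIFT 12
#define CRN_MASK  0xf
#define CRM_SHIFT 8
#define CRM_MASK  0xf
#define OP2_SHIFT 5
#define OP2_MASK  0x7

/* Pack the five encoding fields into a single integer. */
#define sys_reg(op0, op1, crn, crm, op2)				\
	(((uint32_t)(op0) << OP0_SHIFT) | ((uint32_t)(op1) << OP1_SHIFT) | \
	 ((uint32_t)(crn) << CRN_SHIFT) | ((uint32_t)(crm) << CRM_SHIFT) | \
	 ((uint32_t)(op2) << OP2_SHIFT))

/* Unpack individual fields from a packed encoding. */
#define sys_reg_Op0(id) (((id) >> OP0_SHIFT) & OP0_MASK)
#define sys_reg_Op1(id) (((id) >> OP1_SHIFT) & OP1_MASK)
#define sys_reg_CRn(id) (((id) >> CRN_SHIFT) & CRN_MASK)
#define sys_reg_CRm(id) (((id) >> CRM_SHIFT) & CRM_MASK)
#define sys_reg_Op2(id) (((id) >> OP2_SHIFT) & OP2_MASK)

/* Cut-down stand-in for KVM's descriptor; the real one has more members. */
struct sys_reg_desc {
	uint8_t Op0, Op1, CRn, CRm, Op2;
	const char *name;	/* hypothetical field, for the example only */
};

/* Initialise the encoding fields of a descriptor from one definition. */
#define SYS_DESC(reg)						\
	.Op0 = sys_reg_Op0(reg), .Op1 = sys_reg_Op1(reg),	\
	.CRn = sys_reg_CRn(reg), .CRm = sys_reg_CRm(reg),	\
	.Op2 = sys_reg_Op2(reg)

/* MDSCR_EL1 is encoded as op0=2, op1=0, CRn=0, CRm=2, op2=2. */
#define SYS_MDSCR_EL1 sys_reg(2, 0, 0, 2, 2)

static const struct sys_reg_desc example_descs[] = {
	{ SYS_DESC(SYS_MDSCR_EL1), .name = "MDSCR_EL1" },
};

/* Hypothetical self-check: the unpacked fields match the encoding. */
static void check_example_descs(void)
{
	assert(example_descs[0].Op0 == 2);
	assert(example_descs[0].CRm == 2);
	assert(example_descs[0].Op2 == 2);
}
```

With a macro of this shape, a table entry names the register once instead of spelling out five separate field initialisers, which is what makes accompanying comments redundant.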
2017-03-22KVM: s390: gs support for kvm guestsFan Zhang
This patch adds guarded storage support for KVM guests. We need to set up the necessary control blocks, the kvm_run structure for the new registers, the necessary wrappers for VSIE, as well as the machine check save areas. GS is enabled lazily and the register saving and reloading is done in KVM code. As this feature adds new content for migration, we provide a new capability for enablement (KVM_CAP_S390_GS). Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-03-22Merge remote-tracking branch 's390/guarded-storage' into kvms390/nextChristian Borntraeger
2017-03-22s390: add a system call for guarded storageMartin Schwidefsky
This adds a new system call to enable the use of guarded storage for user space processes. The system call takes two arguments, a command and pointer to a guarded storage control block:

    s390_guarded_storage(int command, struct gs_cb *gs_cb);

The second argument is relevant only for the GS_SET_BC_CB command.

The commands in detail:

0 - GS_ENABLE
    Enable the guarded storage facility for the current task. The initial content of the guarded storage control block will be all zeros. After the enablement the user space code can use load-guarded-storage-controls instruction (LGSC) to load an arbitrary control block. While a task is enabled the kernel will save and restore the current content of the guarded storage registers on context switch.
1 - GS_DISABLE
    Disables the use of the guarded storage facility for the current task. The kernel will cease to save and restore the content of the guarded storage registers, the task specific content of these registers is lost.
2 - GS_SET_BC_CB
    Set a broadcast guarded storage control block. This is called per thread and stores a specific guarded storage control block in the task struct of the current task. This control block will be used for the broadcast event GS_BROADCAST.
3 - GS_CLEAR_BC_CB
    Clears the broadcast guarded storage control block. The guarded-storage control block is removed from the task struct that was established by GS_SET_BC_CB.
4 - GS_BROADCAST
    Sends a broadcast to all thread siblings of the current task. Every sibling that has established a broadcast guarded storage control block will load this control block and will be enabled for guarded storage. The broadcast guarded storage control block is used up, a second broadcast without a refresh of the stored control block with GS_SET_BC_CB will not have any effect.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
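A minimal sketch of how user space might invoke this system call through syscall(2). The command values are the ones listed in the commit message; the wrapper name guarded_storage() is hypothetical, struct gs_cb is left opaque because its layout is not given here, and the code assumes the syscall number is exposed as __NR_s390_guarded_storage. Where that number is not defined (any non-s390 build), the wrapper simply fails with ENOSYS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for the syscall() declaration */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Command values as listed in the commit message. */
enum gs_command {
	GS_ENABLE      = 0,	/* enable the facility for the current task */
	GS_DISABLE     = 1,	/* disable it; register content is lost */
	GS_SET_BC_CB   = 2,	/* store a broadcast control block */
	GS_CLEAR_BC_CB = 3,	/* drop the stored broadcast control block */
	GS_BROADCAST   = 4,	/* make thread siblings load their block */
};

/* Opaque here: the real layout comes from the kernel headers. */
struct gs_cb;

/* Hypothetical wrapper; returns -1 with errno set on failure. */
static long guarded_storage(int command, struct gs_cb *cb)
{
#ifdef __NR_s390_guarded_storage
	return syscall(__NR_s390_guarded_storage, command, cb);
#else
	(void)command;
	(void)cb;
	errno = ENOSYS;		/* not an s390 build: nothing to call */
	return -1;
#endif
}

/* Hypothetical helper: only GS_SET_BC_CB uses the second argument. */
static int gs_command_needs_cb(int command)
{
	assert(command >= GS_ENABLE && command <= GS_BROADCAST);
	return command == GS_SET_BC_CB;
}
```

A caller would typically issue guarded_storage(GS_ENABLE, NULL) once per task and check the return value before relying on the facility.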
2017-03-21mm, swap: Remove WARN_ON_ONCE() in free_swap_slot()Huang Ying
Before commit 452b94b8c8c7 ("mm/swap: don't BUG_ON() due to uninitialized swap slot cache"), the following bug is reported,

  ------------[ cut here ]------------
  kernel BUG at mm/swap_slots.c:270!
  invalid opcode: 0000 [#1] SMP
  CPU: 5 PID: 1745 Comm: (sd-pam) Not tainted 4.11.0-rc1-00243-g24c534bb161b #1
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
  RIP: 0010:free_swap_slot+0xba/0xd0
  Call Trace:
   swap_free+0x36/0x40
   do_swap_page+0x360/0x6d0
   __handle_mm_fault+0x880/0x1080
   handle_mm_fault+0xd0/0x240
   __do_page_fault+0x232/0x4d0
   do_page_fault+0x20/0x70
   page_fault+0x22/0x30
  ---[ end trace aefc9ede53e0ab21 ]---

This is raised by the BUG_ON(!swap_slot_cache_initialized) in free_swap_slot(). This is incorrect, because even if the swap slots cache fails to be initialized, the swap should operate properly without the swap slots cache. And the use_swap_slot_cache check later in the function will protect the uninitialized swap slots cache case.

In commit 452b94b8c8c7, the BUG_ON() is replaced by WARN_ON_ONCE(). In this patch, the WARN_ON_ONCE() is removed too.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
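The reasoning above can be pictured with a small user-space model of the fallback. This is not the kernel function: the names free_swap_slot_model(), free_entries_directly(), struct slots_cache and SLOTS_CACHE_SIZE are hypothetical, and the extra NULL check on the cache storage is an assumption of the sketch. It only shows why no assertion is needed: when the cache is unusable, the entry is freed directly and nothing touches uninitialized state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS_CACHE_SIZE 4	/* hypothetical batch size */

/* Toy per-CPU cache of slots waiting to be returned in a batch. */
struct slots_cache {
	unsigned long *slots_ret;	/* NULL if never initialised */
	size_t n_ret;			/* number of queued entries */
};

static bool use_swap_slot_cache;	/* stays false if init failed */
static unsigned long direct_frees;	/* entries freed without batching */

/* Stand-in for handing entries straight back to the swap allocator. */
static void free_entries_directly(const unsigned long *entries, size_t n)
{
	assert(entries != NULL);
	direct_frees += n;
}

/* Model: queue into the cache when usable, otherwise free directly. */
static void free_swap_slot_model(struct slots_cache *cache,
				 unsigned long entry)
{
	if (use_swap_slot_cache && cache->slots_ret) {
		/* Cache full: flush the whole batch, then start over. */
		if (cache->n_ret == SLOTS_CACHE_SIZE) {
			free_entries_directly(cache->slots_ret, cache->n_ret);
			cache->n_ret = 0;
		}
		cache->slots_ret[cache->n_ret++] = entry;
	} else {
		/* No usable cache: swap still works, just unbatched. */
		free_entries_directly(&entry, 1);
	}
}
```

In this model an uninitialized cache costs only the batching optimisation, which matches the commit's argument that neither BUG_ON() nor WARN_ON_ONCE() is warranted.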
2017-03-21Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Nine small fixes: the biggest is probably finally sorting out Kconfig issues with lpfc nvme. There are some performance fixes for megaraid and hpsa and a static checker fix"

[ Johannes Thumshirn points out that there still seems to be more lpfc vs nvme config issues. Oh well. - Linus ]

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Finalize Kconfig options for nvme
  scsi: ufs: don't check unsigned type for a negative value
  scsi: hpsa: do not timeout reset operations
  scsi: hpsa: limit outstanding rescans
  scsi: hpsa: update check for logical volume status
  scsi: megaraid_sas: Driver version upgrade
  scsi: megaraid_sas: raid6 also require cpuSel check same as raid5
  scsi: megaraid_sas: add correct return type check for ldio hint logic for raid1
  scsi: megaraid_sas: enable intx only if msix request fails
2017-03-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:

 - regression fixes for Wacom devices, from Aaron Armstrong Skomra and Ping Cheng
 - memory leak in hid-sony driver from Roderick Colenbrander
 - new device IDs support from Oscar Campos and Daniel Drake

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: wacom: generic: Wacom mouse is only provided for opaque tablets
  HID: corsair: Add driver Scimitar Pro RGB gaming mouse 1b1c:1b3e support to hid-corsair
  HID: corsair: support for K65-K70 Rapidfire and Scimitar Pro RGB
  HID: wacom: don't manually release resources for the EKR
  HID: wacom: Correct Intuos Pro 2 resolution
  HID: sony: Fix input device leak when connecting a DS4 twice using USB/BT
  HID: chicony: Add support for another ASUS Zen AiO keyboard