summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-07-12KVM: PPC: booke: use shadow_msrScott Wood
Keep the guest MSR and the guest-mode true MSR separate, rather than modifying the guest MSR on each guest entry to produce a true MSR. Any bits which should be modified based on guest MSR must be explicitly propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in kvmppc_set_msr(). While we're modifying the guest entry code, reorder a few instructions to bury some load latencies. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12powerpc/e500: SPE register saving: take arbitrary struct offsetScott Wood
Previously, these macros hardcoded THREAD_EVR0 as the base of the save area, relative to the base register passed. This base offset is now passed as a separate macro parameter, allowing reuse with other SPE save areas, such as used by KVM. Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12powerpc/e500: Save SPEFCSR in flush_spe_to_thread()yu liu
giveup_spe() saves the SPE state which is protected by MSR[SPE]. However, modifying SPEFSCR does not trap when MSR[SPE]=0. And since SPEFSCR is already saved/restored in _switch(), not all the callers want to save SPEFSCR again. Thus, saving SPEFSCR should not belong to giveup_spe(). This patch moves SPEFSCR saving to flush_spe_to_thread(), and cleans up the caller that needs to save SPEFSCR accordingly. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Resolve real-mode handlers through function exportsAlexander Graf
Up until now, Book3S KVM had variables stored in the kernel that a kernel module or the kvm code in the kernel could read from to figure out where some real mode helper functions are located. This is all unnecessary. The high bits of the EA get ignore in real mode, so we can just use the pointer as is. Also, it's a lot easier on relocations when we use the normal way of resolving the address to a function, instead of jumping through hoops. This patch fixes compilation with CONFIG_RELOCATABLE=y. Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: fix partial application of "exit timing in ticks"Stuart Yoder
When http://www.spinics.net/lists/kvm-ppc/msg02664.html was applied to produce commit b51e7aa7ed6d8d134d02df78300ab0f91cfff4d2, the removal of the conversion in add_exit_timing was left out. Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: MMU: make kvm_mmu_reset_context() flush the guest TLBAvi Kivity
kvm_set_cr0() and kvm_set_cr4(), and possible other functions, assume that kvm_mmu_reset_context() flushes the guest TLB. However, it does not. Fix by flushing the tlb (and syncing the new root as well). Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0Avi Kivity
When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Enable ERMS feature support for KVMYang, Wei
This patch exposes ERMS feature to KVM guests. The REP MOVSB/STOSB instruction can enhance fast strings attempts to move as much of the data with larger size load/stores as possible. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Expose RDWRGSFS bit to KVM guestsYang, Wei
This patch exposes RDWRGSFS bit to KVM guests. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Add RDWRGSFS support when setting CR4Yang, Wei
This patch adds RDWRGSFS support when setting CR4. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Remove RDWRGSFS bit from CR4_RESERVED_BITSYang, Wei
This patch removes RDWRGSFS bit from CR4_RESERVED_BITS. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Enable DRNG feature support for KVMYang, Wei Y
This patch exposes DRNG feature to KVM guests. The RDRAND instruction can provide software with sequences of random numbers generated from white noise. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: fix XSAVE bit scanning (now properly)Andre Przywara
commit 123108f1c1aafd51d6a5c79cc04d7999dd88a930 tried to fix KVMs XSAVE valid feature scanning, but it was wrong. It was not considering the sparse nature of this bitfield, instead reading values from uninitialized members of the entries array. This patch now separates subleaf indicies from KVM's array indicies and fills the entry before querying it's value. This fixes AVX support in KVM guests. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Fix KVM_ASSIGN_SET_MSIX_ENTRY documentationJan Kiszka
The documented behavior did not match the implemented one (which also never changed). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Fix off-by-one in overflow check of KVM_ASSIGN_SET_MSIX_NRJan Kiszka
KVM_MAX_MSIX_PER_DEV implies that up to that many MSI-X entries can be requested. But the kernel so far rejected already the upper limit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Add compat ioctl for KVM_SET_SIGNAL_MASKAlexander Graf
KVM has an ioctl to define which signal mask should be used while running inside VCPU_RUN. At least for big endian systems, this mask is different on 32-bit and 64-bit systems (though the size is identical). Add a compat wrapper that converts the mask to whatever the kernel accepts, allowing 32-bit kvm user space to set signal masks. This patch fixes qemu with --enable-io-thread on ppc64 hosts when running 32-bit user land. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentationJan Kiszka
Neither host_irq nor the guest_msi struct are used anymore today. Tag the former, drop the latter to avoid confusion. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Add instruction fetch checking when walking guest page tableYang, Wei Y
This patch adds instruction fetch checking when walking guest page table, to implement SMEP when emulating instead of executing natively. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Mask function7 ebx against host capability word9Yang, Wei Y
This patch masks CPUID leaf 7 ebx against host capability word9. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Add SMEP support when setting CR4Yang, Wei Y
This patch adds SMEP handling when setting CR4. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: Remove SMEP bit from CR4_RESERVED_BITSYang, Wei Y
This patch removes SMEP bit from CR4_RESERVED_BITS. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: nVMX: Fix bug preventing more than two levels of nestingNadav Har'El
The nested VMX feature is supposed to fully emulate VMX for the guest. This (theoretically) not only allows it to run its own guests, but also also to further emulate VMX for its own guests, and allow arbitrarily deep nesting. This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH by L2, which prevented deeper nesting. Deeper nesting now works (I only actually tested L3), but is currently *absurdly* slow, to the point of being unusable. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: Fixup documentation section numberingJan Kiszka
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: fold decode_cache into x86_emulate_ctxtAvi Kivity
This saves a lot of pointless casts x86_emulate_ctxt and decode_cache. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: rename decode_cache::eip to _eipAvi Kivity
The name eip conflicts with a field of the same name in x86_emulate_ctxt, which we plan to fold decode_cache into. The name _eip is unfortunate, but what's really needed is a refactoring here, not a better name. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: VMX: Silence warning on 32-bit hostsJan Kiszka
a is unused now on CONFIG_X86_32. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use opcode::execute for CLI/STI(FA/FB)Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use opcode::execute for LOOP/JCXZTakuya Yoshikawa
LOOP/LOOPcc : E0-E2 JCXZ/JECXZ/JRCXZ : E3 Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Clean up INT n/INTO/INT 3(CC/CD/CE)Takuya Yoshikawa
Call emulate_int() directly to avoid spaghetti goto's. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use opcode::execute for MOV(8C/8E)Takuya Yoshikawa
Different functions for those which take segment register operands. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use opcode::execute for RET(C3)Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use opcode::execute for XCHG(86/87)Takuya Yoshikawa
In addition, replace one "goto xchg" with an em_xchg() call. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use opcode::execute for TEST(84/85, A8/A9)Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use opcode::execute for some instructionsTakuya Yoshikawa
Move the following functions to the opcode tables: RET (Far return) : CB IRET : CF JMP (Jump far) : EA SYSCALL : 0F 05 CLTS : 0F 06 SYSENTER : 0F 34 SYSEXIT : 0F 35 Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Rename emulate_xxx() to em_xxx()Takuya Yoshikawa
The next patch will change these to be called by opcode::execute. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: x86 emulator: Use the pointers ctxt and c consistentlyTakuya Yoshikawa
We should use the local variables ctxt and c when the emulate_ctxt and decode appears many times. At least, we need to be consistent about how we use these in a function. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: Document KVM_IOEVENTFDSasha Levin
Document KVM_IOEVENTFD that can be used to receive notifications of PIO/MMIO events without triggering an exit. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: DocumentationNadav Har'El
This patch includes a brief introduction to the nested vmx feature in the Documentation/kvm directory. The document also includes a copy of the vmcs12 structure, as requested by Avi Kivity. [marcelo: move to Documentation/virtual/kvm] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12ARM: dma: replace ISA_DMA_THRESHOLD with a variableRussell King
ISA_DMA_THRESHOLD has been unused by non-arch code, so lets now get rid of it from ARM by replacing it with arm_dma_zone_mask. Move dma_supported() and dma_set_mask() out of line, and have dma_supported() check this new variable instead. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-12PM / Runtime: Add new helper function: pm_runtime_status_suspended()Kevin Hilman
This boolean function simply returns whether or not the runtime status of the device is 'suspended'. Unlike pm_runtime_suspended(), this function returns the runtime status whether or not runtime PM for the device has been disabled or not. Also add entry to Documentation/power/runtime.txt Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-12arm: lpc32xx: add rtc-deviceWolfram Sang
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Tested-by: Kevin Wells <Kevin.wells@nxp.com>
2011-07-12arm: lpc32xx: add tsc-deviceWolfram Sang
Fix the clock name, too. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Tested-by: Kevin Wells <Kevin.wells@nxp.com>
2011-07-12arm: davinci: Fix low level gpio irq handlers' argumentIdo Yariv
Commit 7416401 ("arm: davinci: Fix fallout from generic irq chip conversion") introduced a bug, causing low level interrupt handlers to get a bogus irq number as an argument. The gpio irq handler falsely assumes that the handler data is the irq base number and that is no longer true. Set the irq handler data to be a pointer to the corresponding gpio controller. The chained irq handler can then use it to extract both the irq base number and the gpio registers structure. Signed-off-by: Ido Yariv <ido@wizery.com> CC: Thomas Gleixner <tglx@linutronix.de> [nsekhar@ti.com: renamed "ctl" to "d", simplified indexing logic for chips and took care of odd bank handling in irq handler] Signed-off-by: Sekhar Nori <nsekhar@ti.com>
2011-07-12KVM: nVMX: Miscellenous small correctionsNadav Har'El
Small corrections of KVM (spelling, etc.) not directly related to nested VMX. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: Add VMX to list of supported cpuid featuresNadav Har'El
If the "nested" module option is enabled, add the "VMX" CPU feature to the list of CPU features KVM advertises with the KVM_GET_SUPPORTED_CPUID ioctl. Qemu uses this ioctl, and intersects KVM's list with its own list of desired cpu features (depending on the -cpu option given to qemu) to determine the final list of features presented to the guest. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: Additional TSC-offset handlingNadav Har'El
In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to set vmcs12.tsc_offset, for this change to survive the next nested entry (see prepare_vmcs02()). Additionally, we also need to modify vmx_adjust_tsc_offset: The semantics of this function is that the TSC of all guests on this vcpu, L1 and possibly several L2s, need to be adjusted. To do this, we need to adjust vmcs01's tsc_offset (this offset will also apply to each L2s we enter). We can't set vmcs01 now, so we have to remember this adjustment and apply it when we later exit to L1. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: Further fixes for lazy FPU loadingNadav Har'El
KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and NM exceptions, even if we have a guest hypervisor (L1) who didn't want these traps. And of course, conversely: If L1 wanted to trap these events, we must let it, even if L0 is not interested in them. This patch fixes some existing KVM code (in update_exception_bitmap(), vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's and L1's needs. Note that handle_cr() was already fixed in the above patch, and that new code in introduced in previous patches already handles CR0 correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()). Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: Handling of CR0 and CR4 modifying instructionsNadav Har'El
When L2 tries to modify CR0 or CR4 (with mov or clts), and modifies a bit which L1 asked to shadow (via CR[04]_GUEST_HOST_MASK), we already do the right thing: we let L1 handle the trap (see nested_vmx_exit_handled_cr() in a previous patch). When L2 modifies bits that L1 doesn't care about, we let it think (via CR[04]_READ_SHADOW) that it did these modifications, while only changing (in GUEST_CR[04]) the bits that L0 doesn't shadow. This is needed for corect handling of CR0.TS for lazy FPU loading: L0 may want to leave TS on, while pretending to allow the guest to change it. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: Correct handling of idt vectoring infoNadav Har'El
This patch adds correct handling of IDT_VECTORING_INFO_FIELD for the nested case. When a guest exits while delivering an interrupt or exception, we get this information in IDT_VECTORING_INFO_FIELD in the VMCS. When L2 exits to L1, there's nothing we need to do, because L1 will see this field in vmcs12, and handle it itself. However, when L2 exits and L0 handles the exit itself and plans to return to L2, L0 must inject this event to L2. In the normal non-nested case, the idt_vectoring_info case is discovered after the exit, and the decision to inject (though not the injection itself) is made at that point. However, in the nested case a decision of whether to return to L2 or L1 also happens during the injection phase (see the previous patches), so in the nested case we can only decide what to do about the idt_vectoring_info right after the injection, i.e., in the beginning of vmx_vcpu_run, which is the first time we know for sure if we're staying in L2. Therefore, when we exit L2 (is_guest_mode(vcpu)), we disable the regular vmx_complete_interrupts() code which queues the idt_vectoring_info for injection on next entry - because such injection would not be appropriate if we will decide to exit to L1. Rather, we just save the idt_vectoring_info and related fields in vmcs12 (which is a convenient place to save these fields). On the next entry in vmx_vcpu_run (*after* the injection phase, potentially exiting to L1 to inject an event requested by user space), if we find ourselves in L1 we don't need to do anything with those values we saved (as explained above). But if we find that we're in L2, or rather *still* at L2 (it's not nested_run_pending, meaning that this is the first round of L2 running after L1 having just launched it), we need to inject the event saved in those fields - by writing the appropriate VMCS fields. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12KVM: nVMX: Correct handling of exception injectionNadav Har'El
Similar to the previous patch, but concerning injection of exceptions rather than external interrupts. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>