summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-19Merge branch 'topic/ppc-kvm' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD Fix conflicts between memslot overhaul and commit 511d25d6b789f ("KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots") from the powerpc tree.
2021-12-14KVM: PPC: Book3S HV P9: Use kvm_arch_vcpu_get_wait() to get rcuwait objectSean Christopherson
Use kvm_arch_vcpu_get_wait() to get a vCPU's rcuwait object instead of using vcpu->wait directly in kvmhv_run_single_vcpu(). Functionally, this is a nop as vcpu->arch.waitp is guaranteed to point at vcpu->wait. But that is not obvious at first glance, and a future change coming in via the KVM tree, commit 510958e99721 ("KVM: Force PPC to define its own rcuwait object"), will hide vcpu->wait from architectures that define __KVM_HAVE_ARCH_WQP to prevent generic KVM from attepting to wake a vCPU with the wrong rcuwait object. Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211213174556.3871157-1-seanjc@google.com
2021-12-09KVM: arm64: Use Makefile.kvm for common filesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211121125451.9489-8-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: powerpc: Use Makefile.kvm for common filesDavid Woodhouse
It's all fairly baroque but in the end, I don't think there's any reason for $(KVM)/irqchip.o to have been handled differently, as they all end up in $(kvm-y) in the end anyway, regardless of whether they get there via $(common-objs-y) and the CPU-specific object lists. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Message-Id: <20211121125451.9489-7-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: RISC-V: Use Makefile.kvm for common filesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211121125451.9489-6-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: mips: Use Makefile.kvm for common filesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211121125451.9489-5-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: s390: Use Makefile.kvm for common filesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20211121125451.9489-4-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: Add Makefile.kvm for common files, use it for x86David Woodhouse
Splitting kvm_main.c out into smaller and better-organized files is slightly non-trivial when it involves editing a bunch of per-arch KVM makefiles. Provide virt/kvm/Makefile.kvm for them to include. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211121125451.9489-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: Introduce CONFIG_HAVE_KVM_DIRTY_RINGDavid Woodhouse
I'd like to make the build include dirty_ring.c based on whether the arch wants it or not. That's a whole lot simpler if there's a config symbol instead of doing it implicitly on KVM_DIRTY_LOG_PAGE_OFFSET being set to something non-zero. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211121125451.9489-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: VMX: Clean up PI pre/post-block WARNsSean Christopherson
Move the WARN sanity checks out of the PI descriptor update loop so as not to spam the kernel log if the condition is violated and the update takes multiple attempts due to another writer. This also eliminates a few extra uops from the retry path. Technically not checking every attempt could mean KVM will now fail to WARN in a scenario that would have failed before, but any such failure would be inherently racy as some other agent (CPU or device) would have to concurrent modify the PI descriptor. Add a helper to handle the actual write and more importantly to document why the write may need to be retried. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211208015236.1616697-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: nVMX: Ensure vCPU honors event request if posting nested IRQ failsSean Christopherson
Add a memory barrier between writing vcpu->requests and reading vcpu->guest_mode to ensure the read is ordered after the write when (potentially) delivering an IRQ to L2 via nested posted interrupt. If the request were to be completed after reading vcpu->mode, it would be possible for the target vCPU to enter the guest without posting the interrupt and without handling the event request. Note, the barrier is only for documentation since atomic operations are serializing on x86. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: 6b6977117f50 ("KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2") Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211208015236.1616697-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-09KVM: x86: add a tracepoint for APICv/AVIC interrupt deliveryMaxim Levitsky
This allows to see how many interrupts were delivered via the APICv/AVIC from the host. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211209115440.394441-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nVMX: Implement Enlightened MSR Bitmap featureVitaly Kuznetsov
Updating MSR bitmap for L2 is not cheap and rearly needed. TLFS for Hyper-V offers 'Enlightened MSR Bitmap' feature which allows L1 hypervisor to inform L0 when it changes MSR bitmap, this eliminates the need to examine L1's MSR bitmap for L2 every time when 'real' MSR bitmap for L2 gets constructed. Use 'vmx->nested.msr_bitmap_changed' flag to implement the feature. Note, KVM already uses 'Enlightened MSR bitmap' feature when it runs as a nested hypervisor on top of Hyper-V. The newly introduced feature is going to be used by Hyper-V guests on KVM. When the feature is enabled for Win10+WSL2, it shaves off around 700 CPU cycles from a nested vmexit cost (tight cpuid loop test). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211129094704.326635-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: nVMX: Track whether changes in L0 require MSR bitmap for L2 to be rebuiltVitaly Kuznetsov
Introduce a flag to keep track of whether MSR bitmap for L2 needs to be rebuilt due to changes in MSR bitmap for L1 or switching to a different L2. This information will be used for Enlightened MSR Bitmap feature for Hyper-V guests. Note, setting msr_bitmap_changed to 'true' from set_current_vmptr() is not really needed for Enlightened MSR Bitmap as the feature can only be used in conjunction with Enlightened VMCS but let's keep tracking information complete, it's cheap and in the future similar PV feature can easily be implemented for KVM on KVM too. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211129094704.326635-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Introduce vmx_msr_bitmap_l01_changed() helperVitaly Kuznetsov
In preparation to enabling 'Enlightened MSR Bitmap' feature for Hyper-V guests move MSR bitmap update tracking to a dedicated helper. Note: vmx_msr_bitmap_l01_changed() is called when MSR bitmap might be updated. KVM doesn't check if the bit we're trying to set is already set (or the bit it's trying to clear is already cleared). Such situations should not be common and a few false positives should not be a problem. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211129094704.326635-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08Merge branch 'kvm-on-hv-msrbm-fix' into HEADPaolo Bonzini
Merge bugfix for enlightened MSR Bitmap, before adding support to KVM for exposing the feature to nested guests.
2021-12-08KVM: x86: Exit to userspace if emulation prepared a completion callbackHou Wenlong
em_rdmsr() and em_wrmsr() return X86EMUL_IO_NEEDED if MSR accesses required an exit to userspace. However, x86_emulate_insn() doesn't return X86EMUL_*, so x86_emulate_instruction() doesn't directly act on X86EMUL_IO_NEEDED; instead, it looks for other signals to differentiate between PIO, MMIO, etc. causing RDMSR/WRMSR emulation not to exit to userspace now. Nevertheless, if the userspace_msr_exit_test testcase in selftests is changed to test RDMSR/WRMSR with a forced emulation prefix, the test passes. What happens is that first userspace exit information is filled but the userspace exit does not happen. Because x86_emulate_instruction() returns 1, the guest retries the instruction---but this time RIP has already been adjusted past the forced emulation prefix, so the guest executes RDMSR/WRMSR and the userspace exit finally happens. Since the X86EMUL_IO_NEEDED path has provided a complete_userspace_io callback, x86_emulate_instruction() can just return 0 if the callback is not NULL. Then RDMSR/WRMSR instruction emulation will exit to userspace directly, without the RDMSR/WRMSR vmexit. Fixes: 1ae099540e8c7 ("KVM: x86: Allow deflecting unknown MSR accesses to user space") Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <56f9df2ee5c05a81155e2be366c9dc1f7adc8817.1635842679.git.houwenlong93@linux.alibaba.com>
2021-12-08KVM: nVMX: Don't use Enlightened MSR Bitmap for L3Vitaly Kuznetsov
When KVM runs as a nested hypervisor on top of Hyper-V it uses Enlightened VMCS and enables Enlightened MSR Bitmap feature for its L1s and L2s (which are actually L2s and L3s from Hyper-V's perspective). When MSR bitmap is updated, KVM has to reset HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP from clean fields to make Hyper-V aware of the change. For KVM's L1s, this is done in vmx_disable_intercept_for_msr()/vmx_enable_intercept_for_msr(). MSR bitmap for L2 is build in nested_vmx_prepare_msr_bitmap() by blending MSR bitmap for L1 and L1's idea of MSR bitmap for L2. KVM, however, doesn't check if the resulting bitmap is different and never cleans HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP in eVMCS02. This is incorrect and may result in Hyper-V missing the update. The issue could've been solved by calling evmcs_touch_msr_bitmap() for eVMCS02 from nested_vmx_prepare_msr_bitmap() unconditionally but doing so would not give any performance benefits (compared to not using Enlightened MSR Bitmap at all). 3-level nesting is also not a very common setup nowadays. Don't enable 'Enlightened MSR Bitmap' feature for KVM's L2s (real L3s) for now. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211129094704.326635-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86: Use different callback if msr access comes from the emulatorHou Wenlong
If msr access triggers an exit to userspace, the complete_userspace_io callback would skip instruction by vendor callback for kvm_skip_emulated_instruction(). However, when msr access comes from the emulator, e.g. if kvm.force_emulation_prefix is enabled and the guest uses rdmsr/wrmsr with kvm prefix, VM_EXIT_INSTRUCTION_LEN in vmcs is invalid and kvm_emulate_instruction() should be used to skip instruction instead. As Sean noted, unlike the previous case, there's no #UD if unrestricted guest is disabled and the guest accesses an MSR in Big RM. So the correct way to fix this is to attach a different callback when the msr access comes from the emulator. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <34208da8f51580a06e45afefac95afea0e3f96e3.1635842679.git.houwenlong93@linux.alibaba.com>
2021-12-08KVM: x86: Add an emulation type to handle completion of user exitsHou Wenlong
The next patch would use kvm_emulate_instruction() with EMULTYPE_SKIP in complete_userspace_io callback to fix a problem in msr access emulation. However, EMULTYPE_SKIP only updates RIP, more things like updating interruptibility state and injecting single-step #DBs would be done in the callback. Since the emulator also does those things after x86_emulate_insn(), add a new emulation type to pair with EMULTYPE_SKIP to do those things for completion of user exits within the emulator. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <8f8c8e268b65f31d55c2881a4b30670946ecfa0d.1635842679.git.houwenlong93@linux.alibaba.com>
2021-12-08KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code segSean Christopherson
Truncate the new EIP to a 32-bit value when handling EMULTYPE_SKIP as the decode phase does not truncate _eip. Wrapping the 32-bit boundary is legal if and only if CS is a flat code segment, but that check is implicitly handled in the form of limit checks in the decode phase. Opportunstically prepare for a future fix by storing the result of any truncation in "eip" instead of "_eip". Fixes: 1957aa63be53 ("KVM: VMX: Handle single-step #DB for EMULTYPE_SKIP on EPT misconfig") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <093eabb1eab2965201c9b018373baf26ff256d85.1635842679.git.houwenlong93@linux.alibaba.com>
2021-12-08KVM: Clear pv eoi pending bit only when it is setLi RongQing
merge pv_eoi_get_pending and pv_eoi_clr_pending into a single function pv_eoi_test_and_clear_pending, which returns and clear the value of the pending bit. This makes it possible to clear the pending bit only if the guest had set it, and otherwise skip the call to pv_eoi_put_user(). This can save up to 300 nsec on AMD EPYC processors. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Message-Id: <1636026974-50555-2-git-send-email-lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86: don't print when fail to read/write pv eoi memoryLi RongQing
If guest gives MSR_KVM_PV_EOI_EN a wrong value, this printk() will be trigged, and kernel log is spammed with the useless message Fixes: 0d88800d5472 ("kvm: x86: ioapic and apic debug macros cleanup") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Cc: stable@kernel.org Message-Id: <1636026974-50555-1-git-send-email-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Remove mmu parameter from load_pdptrs()Lai Jiangshan
It uses vcpu->arch.walk_mmu always; nested EPT does not have PDPTRs, and nested NPT treats them like all other non-leaf page table levels instead of caching them. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-11-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Rename gpte_is_8_bytes to has_4_byte_gpte and invert the directionLai Jiangshan
This bit is very close to mean "role.quadrant is not in use", except that it is false also when the MMU is mapping guest physical addresses directly. In that case, role.quadrant is indeed not in use, but there are no guest PTEs at all. Changing the name and direction of the bit removes the special case, since a guest with paging disabled, or not considering guest paging structures as is the case for two-dimensional paging, does not have to deal with 4-byte guest PTEs. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-10-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Use ept_caps_to_lpage_level() in hardware_setup()Lai Jiangshan
Using ept_caps_to_lpage_level is simpler. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-9-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Add parameter huge_page_level to kvm_init_shadow_ept_mmu()Lai Jiangshan
The level of supported large page on nEPT affects the rsvds_bits_mask. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-8-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Add huge_page_level to __reset_rsvds_bits_mask_ept()Lai Jiangshan
Bit 7 on pte depends on the level of supported large page. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-7-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Remove mmu->translate_gpaLai Jiangshan
Reduce an indirect function call (retpoline) and some intialization code. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-4-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Add parameter struct kvm_mmu *mmu into mmu->gva_to_gpa()Lai Jiangshan
The mmu->gva_to_gpa() has no "struct kvm_mmu *mmu", so an extra FNAME(gva_to_gpa_nested) is needed. Add the parameter can simplify the code. And it makes it explicit that the walk is upon vcpu->arch.walk_mmu for gva and vcpu->arch.mmu for L2 gpa in translate_nested_gpa() via the new parameter. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Calculate quadrant when !role.gpte_is_8_bytesLai Jiangshan
role.quadrant is only valid when gpte size is 4 bytes and only be calculated when gpte size is 4 bytes. Although "vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL" also means gpte size is 4 bytes, but using "!role.gpte_is_8_bytes" is clearer Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-15-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Remove useless code to set role.gpte_is_8_bytes when role.directLai Jiangshan
role.gpte_is_8_bytes is unused when role.direct; there is no point in changing a bit in the role, the value that was set when the MMU is initialized is just fine. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-14-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Remove unused declaration of __kvm_mmu_free_some_pages()Lai Jiangshan
The body of __kvm_mmu_free_some_pages() has been removed. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-13-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Fix comment in __kvm_mmu_create()Lai Jiangshan
The allocation of special roots is moved to mmu_alloc_special_roots(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-12-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Skip allocating pae_root for vcpu->arch.guest_mmu when !tdp_enabledLai Jiangshan
It is never used. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-11-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: SVM: Allocate sd->save_area with __GFP_ZEROLai Jiangshan
And remove clear_page() on it. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-10-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: SVM: Rename get_max_npt_level() to get_npt_level()Lai Jiangshan
It returns the only proper NPT level, so the "max" in the name is not appropriate. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-9-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Change comments about vmx_get_msr()Lai Jiangshan
The variable name is changed in the code. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-8-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Use kvm_set_msr_common() for MSR_IA32_TSC_ADJUST in the default wayLai Jiangshan
MSR_IA32_TSC_ADJUST can be left to the default way which also uese kvm_set_msr_common(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-7-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()Lai Jiangshan
The host CR3 in the vcpu thread can only be changed when scheduling. Moving the code in vmx_prepare_switch_to_guest() makes the code simpler. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-5-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Update msr value after kvm_set_user_return_msr() succeedsLai Jiangshan
Aoid earlier modification. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-4-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Avoid to rdmsrl(MSR_IA32_SYSENTER_ESP)Lai Jiangshan
The value of host MSR_IA32_SYSENTER_ESP is known to be constant for each CPU: (cpu_entry_stack(cpu) + 1) when 32 bit syscall is enabled or NULL is 32 bit syscall is not enabled. So rdmsrl() can be avoided for the first case and both rdmsrl() and vmcs_writel() can be avoided for the second case. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211118110814.2568-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Update mmu->pdptrs only when it is changedLai Jiangshan
It is unchanged in most cases. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211111144527.88852-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Remove kvm_register_clear_available()Lai Jiangshan
It has no user. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211108124407.12187-15-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: vmx, svm: clean up mass updates to regs_avail/regs_dirty bitsPaolo Bonzini
Document the meaning of the three combinations of regs_avail and regs_dirty. Update regs_dirty just after writeback instead of doing it later after vmexit. After vmexit, instead, we clear the regs_avail bits corresponding to lazily-loaded registers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: VMX: Update vmcs.GUEST_CR3 only when the guest CR3 is dirtyLai Jiangshan
When vcpu->arch.cr3 is changed, it is marked dirty, so vmcs.GUEST_CR3 can be updated only when kvm_register_is_dirty(vcpu, VCPU_EXREG_CR3). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211108124407.12187-12-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Mark CR3 dirty when vcpu->arch.cr3 is changedLai Jiangshan
When vcpu->arch.cr3 is changed, it should be marked dirty unless it is being updated to the value of the architecture guest CR3 (i.e. VMX.GUEST_CR3 or vmcb->save.cr3 when tdp is enabled). This patch has no functionality changed because kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3) is superset of kvm_register_mark_available(vcpu, VCPU_EXREG_CR3) with additional change to vcpu->arch.regs_dirty, but no code uses regs_dirty for VCPU_EXREG_CR3. (vmx_load_mmu_pgd() uses vcpu->arch.regs_avail instead to test if VCPU_EXREG_CR3 dirty which means current code (ab)uses regs_avail for VCPU_EXREG_CR3 dirty information.) Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211108124407.12187-11-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: SVM: Remove references to VCPU_EXREG_CR3Lai Jiangshan
VCPU_EXREG_CR3 is never cleared from vcpu->arch.regs_avail or vcpu->arch.regs_dirty in SVM; therefore, marking CR3 as available is merely a NOP, and testing it will likewise always succeed. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211108124407.12187-9-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: SVM: Remove outdated comment in svm_load_mmu_pgd()Lai Jiangshan
The comment had been added in the commit 689f3bf21628 ("KVM: x86: unify callbacks to load paging root") and its related code was removed later, and it has nothing to do with the next line of code. So the comment should be removed too. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211108124407.12187-8-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Move CR0 pdptr_bits into header file as X86_CR0_PDPTR_BITSLai Jiangshan
Not functionality changed. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211108124407.12187-7-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>