summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2020-05-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A new testcase for guest debugging (gdbstub) that exposed a bunch of bugs, mostly for AMD processors. And a few other x86 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c KVM: SVM: Disable AVIC before setting V_IRQ KVM: Introduce kvm_make_all_cpus_request_except() KVM: VMX: pass correct DR6 for GD userspace exit KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on KVM: selftests: Add KVM_SET_GUEST_DEBUG test KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG KVM: x86: fix DR6 delivery for various cases of #DB injection KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
2020-05-15KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mceJim Mattson
Bank_num is a one-based count of banks, not a zero-based index. It overflows the allocated space only when strictly greater than KVM_MAX_MCE_BANKS. Fixes: a9e38c3e01ad ("KVM: x86: Catch potential overrun in MCE setup") Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20200511225616.19557-1-jmattson@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.cBabu Moger
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU resource isn't. It can be read with XSAVE and written with XRSTOR. So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state), the guest can read the host value. In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could potentially use XRSTOR to change the host PKRU value. While at it, move pkru state save/restore to common code and the host_pkru field to kvm_vcpu_arch. This will let SVM support protection keys. Cc: stable@vger.kernel.org Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "14 fixes and one selftest to verify the ipc fixes herein" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: limit boost_watermark on small zones ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST mm/vmscan: remove unnecessary argument description of isolate_lru_pages() epoll: atomically remove wait entry on wake up kselftests: introduce new epoll60 testcase for catching lost wakeups percpu: make pcpu_alloc() aware of current gfp context mm/slub: fix incorrect interpretation of s->offset scripts/gdb: repair rb_first() and rb_last() eventpoll: fix missing wakeup for ovflist in ep_poll_callback arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() scripts/decodecode: fix trapping instruction formatting kernel/kcov.c: fix typos in kcov_remote_start documentation mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() mm, memcg: fix error return value of mem_cgroup_css_alloc() ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
2020-05-08KVM: SVM: Disable AVIC before setting V_IRQSuravee Suthikulpanit
The commit 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") introduced a WARN_ON, which checks if AVIC is enabled when trying to set V_IRQ in the VMCB for enabling irq window. The following warning is triggered because the requesting vcpu (to deactivate AVIC) does not get to process APICv update request for itself until the next #vmexit. WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd] RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd] Call Trace: kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm] ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm] ? _copy_to_user+0x26/0x30 ? kvm_vm_ioctl+0xb3e/0xd90 [kvm] ? set_next_entity+0x78/0xc0 kvm_vcpu_ioctl+0x236/0x610 [kvm] ksys_ioctl+0x8a/0xc0 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x58/0x210 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes by sending APICV update request to all other vcpus, and immediately update APIC for itself. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lkml.org/lkml/2020/5/2/167 Fixes: 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") Message-Id: <1588818939-54264-1-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: Introduce kvm_make_all_cpus_request_except()Suravee Suthikulpanit
This allows making request to all other vcpus except the one specified in the parameter. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: VMX: pass correct DR6 for GD userspace exitPaolo Bonzini
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD), DR6 was incorrectly copied from the value in the VM. Instead, DR6.BD should be set in order to catch this case. On AMD this does not need any special code because the processor triggers a #DB exception that is intercepted. However, the testcase would fail without the previous patch because both DR6.BS and DR6.BD would be set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6Paolo Bonzini
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD. On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use: * the guest DR6 is clobbered * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue). This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest is KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled. Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless. This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6Paolo Bonzini
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()Janakarajan Natarajan
When trying to lock read-only pages, sev_pin_memory() fails because FOLL_WRITE is used as the flag for get_user_pages_fast(). Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") updated the get_user_pages_fast() call sites to use flags, but incorrectly updated the call in sev_pin_memory(). As the original coding of this call was correct, revert the change made by that commit. Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07KVM: nSVM: trap #DB and #BP to userspace if guest debugging is onPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUGPeter Xu
When single-step triggered with KVM_SET_GUEST_DEBUG, we should fill in the pc value with current linear RIP rather than the cached singlestep address. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505205000.188252-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUGPeter Xu
RTM should always been set even with KVM_EXIT_DEBUG on #DB. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505205000.188252-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07KVM: x86: fix DR6 delivery for various cases of #DB injectionPaolo Bonzini
Go through kvm_queue_exception_p so that the payload is correctly delivered through the exit qualification, and add a kvm_update_dr6 call to kvm_deliver_exception_payload that is needed on AMD. Reported-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bitsPaolo Bonzini
Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we don't allow anymore a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU") Reported-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB pathSean Christopherson
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the RSB has always clobbered RFLAGS, its current incarnation just happens clear CF and ZF in the processs. Relying on the macro to clear CF and ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such that the ZF flag is always set. Reported-by: Qian Cai <cai@lca.pw> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Fixes: f2fde6a5bcfcf ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04kvm: ioapic: Restrict lazy EOI update to edge-triggered interruptsPaolo Bonzini
Commit f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") introduces the following infinite loop: BUG: stack guard page was hit at 000000008f595917 \ (stack is 00000000bdefe5a4..00000000ae2b06f5) kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI RIP: 0010:kvm_set_irq+0x51/0x160 [kvm] Call Trace: irqfd_resampler_ack+0x32/0x90 [kvm] kvm_notify_acked_irq+0x62/0xd0 [kvm] kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm] ioapic_set_irq+0x20e/0x240 [kvm] kvm_ioapic_set_irq+0x5c/0x80 [kvm] kvm_set_irq+0xbb/0x160 [kvm] ? kvm_hv_set_sint+0x20/0x20 [kvm] irqfd_resampler_ack+0x32/0x90 [kvm] kvm_notify_acked_irq+0x62/0xd0 [kvm] kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm] ioapic_set_irq+0x20e/0x240 [kvm] kvm_ioapic_set_irq+0x5c/0x80 [kvm] kvm_set_irq+0xbb/0x160 [kvm] ? kvm_hv_set_sint+0x20/0x20 [kvm] .... The re-entrancy happens because the irq state is the OR of the interrupt state and the resamplefd state. That is, we don't want to show the state as 0 until we've had a chance to set the resamplefd. But if the interrupt has _not_ gone low then ioapic_set_irq is invoked again, causing an infinite loop. This can only happen for a level-triggered interrupt, otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high and then low. Fortunately, in the case of level-triggered interrupts the VMEXIT already happens because TMR is set. Thus, fix the bug by restricting the lazy invocation of the ack notifier to edge-triggered interrupts, the only ones that need it. Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reported-by: borisvk@bstnet.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://www.spinics.net/lists/kvm/msg213512.html Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04KVM: SVM: fill in kvm_run->debug.arch.dr[67]Paolo Bonzini
The corresponding code was added for VMX in commit 42dbaa5a057 ("KVM: x86: Virtualize debug registers, 2008-12-15) but never for AMD. Fix this. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warningSean Christopherson
Use BUG() in the impossible-to-hit default case when switching on the scope of INVEPT to squash a warning with clang 11 due to clang treating the BUG_ON() as conditional. >> arch/x86/kvm/vmx/nested.c:5246:3: warning: variable 'roots_to_free' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] BUG_ON(1); Reported-by: kbuild test robot <lkp@intel.com> Fixes: ce8fe7b77bd8 ("KVM: nVMX: Free only the affected contexts when emulating INVEPT") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200504153506.28898-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21Merge tag 'kvm-s390-master-5.7-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fix for 5.7 and maintainer update - Silence false positive lockdep warning - add Claudio as reviewer
2020-04-20kvm: Disable objtool frame pointer checking for vmenter.SJosh Poimboeuf
Frame pointers are completely broken by vmenter.S because it clobbers RBP: arch/x86/kvm/svm/vmenter.o: warning: objtool: __svm_vcpu_run()+0xe4: BP used as a scratch register That's unavoidable, so just skip checking that file when frame pointers are configured in. On the other hand, ORC can handle that code just fine, so leave objtool enabled in the !FRAME_POINTER case. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Message-Id: <01fae42917bacad18be8d2cbc771353da6603473.1587398610.git.jpoimboe@redhat.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Fixes: 199cd1d7b534 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-17kvm: Handle reads of SandyBridge RAPL PMU MSRs rather than injecting #GPVenkatesh Srinivas
Linux 3.14 unconditionally reads the RAPL PMU MSRs on boot, without handling General Protection Faults on reading those MSRs. Rather than injecting a #GP, which prevents boot, handle the MSRs by returning 0 for their data. Zero was checked to be safe by code review of the RAPL PMU driver and in discussion with the original driver author (eranian@google.com). Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200416184254.248374-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-17KVM: Remove CREATE_IRQCHIP/SET_PIT2 raceSteve Rutherford
Fixes a NULL pointer dereference, caused by the PIT firing an interrupt before the interrupt table has been initialized. SET_PIT2 can race with the creation of the IRQchip. In particular, if SET_PIT2 is called with a low PIT timer period (after the creation of the IOAPIC, but before the instantiation of the irq routes), the PIT can fire an interrupt at an uninitialized table. Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200416191152.259434-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15KVM: SVM: Fix __svm_vcpu_run declaration.Uros Bizjak
The function returns no value. Cc: Paolo Bonzini <pbonzini@redhat.com> Fixes: 199cd1d7b534 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200409114926.1407442-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15KVM: SVM: Do not setup frame pointer in __svm_vcpu_runUros Bizjak
__svm_vcpu_run is a leaf function and does not need a frame pointer. %rbp is also destroyed a few instructions later when guest registers are loaded. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200409120440.1427215-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15KVM: SVM: Fix build error due to missing release_pages() includeBorislav Petkov
Fix: arch/x86/kvm/svm/sev.c: In function ‘sev_pin_memory’: arch/x86/kvm/svm/sev.c:360:3: error: implicit declaration of function ‘release_pages’;\ did you mean ‘reclaim_pages’? [-Werror=implicit-function-declaration] 360 | release_pages(pages, npinned); | ^~~~~~~~~~~~~ | reclaim_pages because svm.c includes pagemap.h but the carved out sev.c needs it too. Triggered by a randconfig build. Fixes: eaf78265a4ab ("KVM: SVM: Move SEV code to separate file") Signed-off-by: Borislav Petkov <bp@suse.de> Message-Id: <20200411160927.27954-1-bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARDUros Bizjak
svm_vcpu_run does not change stack or frame pointer anymore. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200414113612.104501-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15kvm: nVMX: match comment with return type for nested_vmx_exit_reflectedOliver Upton
nested_vmx_exit_reflected() returns a bool, not int. As such, refer to the return values as true/false in the comment instead of 1/0. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20200414221241.134103-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-15kvm: nVMX: reflect MTF VM-exits if injected by L1Oliver Upton
According to SDM 26.6.2, it is possible to inject an MTF VM-exit via the VM-entry interruption-information field regardless of the 'monitor trap flag' VM-execution control. KVM appropriately copies the VM-entry interruption-information field from vmcs12 to vmcs02. However, if L1 has not set the 'monitor trap flag' VM-execution control, KVM fails to reflect the subsequent MTF VM-exit into L1. Fix this by consulting the VM-entry interruption-information field of vmcs12 to determine if L1 has injected the MTF VM-exit. If so, reflect the exit, regardless of the 'monitor trap flag' VM-execution control. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200414224746.240324-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-14KVM: VMX: Enable machine check support for 32bit targetsUros Bizjak
There is no reason to limit the use of do_machine_check to 64bit targets. MCE handling works for both target familes. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Fixes: a0861c02a981 ("KVM: Add VT-x machine check support") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200414071414.45636-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-14KVM: SVM: move more vmentry code to assemblyPaolo Bonzini
Manipulate IF around vmload/vmsave to remove the confusing usage of local_irq_enable where interrupts are actually disabled via GIF. And stuff the RSB immediately without waiting for a RET to avoid Spectre-v2 attacks. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-14KVM: SVM: fix compilation with modular PSP and non-modular KVMPaolo Bonzini
Use svm_sev_enabled() in order to cull all calls to PSP code. Otherwise, compilation fails with undefined symbols if the PSP device driver is compiled as a module and KVM is not. Reported-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-11KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guestXiaoyao Li
Two types of #AC can be generated in Intel CPUs: 1. legacy alignment check #AC 2. split lock #AC Reflect #AC back into the guest if the guest has legacy alignment checks enabled or if split lock detection is disabled. If the #AC is not a legacy one and split lock detection is enabled, then invoke handle_guest_split_lock() which will either warn and disable split lock detection for this task or force SIGBUS on it. [ tglx: Switch it to handle_guest_split_lock() and rename the misnamed helper function. ] Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200410115517.176308876@linutronix.de
2020-04-11KVM: x86: Emulate split-lock access as a write in emulatorXiaoyao Li
Emulate split-lock accesses as writes if split lock detection is on to avoid #AC during emulation, which will result in a panic(). This should never occur for a well-behaved guest, but a malicious guest can manipulate the TLB to trigger emulation of a locked instruction[1]. More discussion can be found at [2][3]. [1] https://lkml.kernel.org/r/8c5b11c9-58df-38e7-a514-dc12d687b198@redhat.com [2] https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com [3] https://lkml.kernel.org/r/20200227001117.GX9940@linux.intel.com Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20200410115517.084300242@linutronix.de
2020-04-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more kvm updates from Paolo Bonzini: "s390: - nested virtualization fixes x86: - split svm.c - miscellaneous fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: fix crash cleanup when KVM wasn't used KVM: X86: Filter out the broadcast dest for IPI fastpath KVM: s390: vsie: Fix possible race when shadowing region 3 tables KVM: s390: vsie: Fix delivery of addressing exceptions KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks KVM: nVMX: don't clear mtf_pending when nested events are blocked KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenter KVM: SVM: Split svm_vcpu_run inline assembly to separate file KVM: SVM: Move SEV code to separate file KVM: SVM: Move AVIC code to separate file KVM: SVM: Move Nested SVM Implementation to nested.c kVM SVM: Move SVM related files to own sub-directory
2020-04-08Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - Some bug fixes - The new vdpa subsystem with two first drivers * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-balloon: Revert "virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM" vdpa: move to drivers/vdpa virtio: Intel IFC VF driver for VDPA vdpasim: vDPA device simulator vhost: introduce vDPA-based backend virtio: introduce a vDPA based transport vDPA: introduce vDPA bus vringh: IOTLB support vhost: factor out IOTLB vhost: allow per device message handler vhost: refine vhost and vringh kconfig virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM virtio-net: Introduce hash report feature virtio-net: Introduce RSS receive steering feature virtio-net: Introduce extended RSC feature tools/virtio: option to build an out of tree module
2020-04-07KVM: VMX: fix crash cleanup when KVM wasn't usedVitaly Kuznetsov
If KVM wasn't used at all before we crash the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that loaded_vmcss_on_cpu list is not yet initialized, we initialize it in hardware_enable() but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialized loaded_vmcss_on_cpu list earlier, right before we assign crash_vmclear_loaded_vmcss pointer. blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved altogether for consistency. Fixes: 31603d4fc2bb ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07KVM: X86: Filter out the broadcast dest for IPI fastpathWanpeng Li
Except destination shorthand, a destination value 0xffffffff is used to broadcast interrupts, let's also filter out this for single target IPI fastpath. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1585815626-28370-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07KVM: nVMX: don't clear mtf_pending when nested events are blockedOliver Upton
If nested events are blocked, don't clear the mtf_pending flag to avoid missing later delivery of the MTF VM-exit. Fixes: 5ef8acbdd687c ("KVM: nVMX: Emulate MTF when performing instruction emulation") Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20200406201237.178725-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-07KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenterUros Bizjak
The exception trampoline in .fixup section is not needed, the exception handling code can jump directly to the label in the .text section. Changes since v1: - Fix commit message. Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200406202108.74300-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-03KVM: SVM: Split svm_vcpu_run inline assembly to separate fileUros Bizjak
The compiler (GCC) does not like the situation, where there is inline assembly block that clobbers all available machine registers in the middle of the function. This situation can be found in function svm_vcpu_run in file kvm/svm.c and results in many register spills and fills to/from stack frame. This patch fixes the issue with the same approach as was done for VMX some time ago. The big inline assembly is moved to a separate assembly .S file, taking into account all ABI requirements. There are two main benefits of the above approach: * elimination of several register spills and fills to/from stack frame, and consequently smaller function .text size. The binary size of svm_vcpu_run is lowered from 2019 to 1626 bytes. * more efficient access to a register save array. Currently, register save array is accessed as: 7b00: 48 8b 98 28 02 00 00 mov 0x228(%rax),%rbx 7b07: 48 8b 88 18 02 00 00 mov 0x218(%rax),%rcx 7b0e: 48 8b 90 20 02 00 00 mov 0x220(%rax),%rdx and passing ia pointer to a register array as an argument to a function one gets: 12: 48 8b 48 08 mov 0x8(%rax),%rcx 16: 48 8b 50 10 mov 0x10(%rax),%rdx 1a: 48 8b 58 18 mov 0x18(%rax),%rbx As a result, the total size, considering that the new function size is 229 bytes, gets lowered by 164 bytes. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-03KVM: SVM: Move SEV code to separate fileJoerg Roedel
Move the SEV specific parts of svm.c into the new sev.c file. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-5-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-03KVM: SVM: Move AVIC code to separate fileJoerg Roedel
Move the AVIC related functions from svm.c to the new avic.c file. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-4-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-03KVM: SVM: Move Nested SVM Implementation to nested.cJoerg Roedel
Split out the code for the nested SVM implementation and move it to a separate file. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-3-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-03kVM SVM: Move SVM related files to own sub-directoryJoerg Roedel
Move svm.c and pmu_amd.c into their own arch/x86/kvm/svm/ subdirectory. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-2-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - GICv4.1 support - 32bit host removal PPC: - secure (encrypted) using under the Protected Execution Framework ultravisor s390: - allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support. x86: - New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables. - Initial work on making nested SVM event injection more similar to VMX, and less buggy. - Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd". - A large refactoring of CPUID features, which now use an array that parallels the core x86_features. - Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available. - New Tigerlake CPUID features. - More bugfixes, optimizations and cleanups. Generic: - selftests: cleanups, new MMU notifier stress test, steal-time test - CSV output for kvm_stat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits) x86/kvm: fix a missing-prototypes "vmread_error" KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y KVM: VMX: Add a trampoline to fix VMREAD error handling KVM: SVM: Annotate svm_x86_ops as __initdata KVM: VMX: Annotate vmx_x86_ops as __initdata KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup() KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes KVM: VMX: Configure runtime hooks using vmx_x86_ops KVM: VMX: Move hardware_setup() definition below vmx_x86_ops KVM: x86: Move init-only kvm_x86_ops to separate struct KVM: Pass kvm_init()'s opaque param to additional arch funcs s390/gmap: return proper error code on ksm unsharing KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move() KVM: Fix out of range accesses to memslots KVM: X86: Micro-optimize IPI fastpath delay KVM: X86: Delay read msr data iff writes ICR MSR KVM: PPC: Book3S HV: Add a capability for enabling secure guests KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs ...
2020-04-02x86/kvm: fix a missing-prototypes "vmread_error"Qian Cai
The commit 842f4be95899 ("KVM: VMX: Add a trampoline to fix VMREAD error handling") removed the declaration of vmread_error() causes a W=1 build failure with KVM_WERROR=y. Fix it by adding it back. arch/x86/kvm/vmx/vmx.c:359:17: error: no previous prototype for 'vmread_error' [-Werror=missing-prototypes] asmlinkage void vmread_error(unsigned long field, bool fault) ^~~~~~~~~~~~ Signed-off-by: Qian Cai <cai@lca.pw> Message-Id: <20200402153955.1695-1-cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-01vhost: refine vhost and vringh kconfigJason Wang
Currently, CONFIG_VHOST depends on CONFIG_VIRTUALIZATION. But vhost is not necessarily for VM since it's a generic userspace and kernel communication protocol. Such dependency may prevent archs without virtualization support from using vhost. To solve this, a dedicated vhost menu is created under drivers so CONIFG_VHOST can be decoupled out of CONFIG_VIRTUALIZATION. While at it, also squash Kconfig.vringh into vhost Kconfig file. This avoids the trick of conditional inclusion from VOP or CAIF. Then it will be easier to introduce new vringh users and common dependency for both vringh and vhost. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200326140125.19794-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>