summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/lapic.h
AgeCommit message (Collapse)Author
2018-10-17KVM: hyperv: define VP assist page helpersLadi Prosek
The state related to the VP assist page is still managed by the LAPIC code in the pv_eoi field. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-14kvm: vmx: Introduce lapic_mode enumerationJim Mattson
The local APIC can be in one of three modes: disabled, xAPIC or x2APIC. (A fourth mode, "invalid," is included for completeness.) Using the new enumeration can make some of the APIC mode logic easier to read. In kvm_set_apic_base, for instance, it is clear that one cannot transition directly from x2APIC mode to xAPIC mode or directly from APIC disabled to x2APIC mode. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> [Check invalid bits even if msr_info->host_initiated. Reported by Wanpeng Li. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-28x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGELadi Prosek
The assist page has been used only for the paravirtual EOI so far, hence the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's called "Virtual VP Assist MSR". Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-16KVM: x86: Change __kvm_apic_update_irr() to also return if max IRR updatedLiran Alon
This commit doesn't change semantics. It is done as a preparation for future commits. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07KVM: hyperv: support HV_X64_MSR_TSC_FREQUENCY and HV_X64_MSR_APIC_FREQUENCYLadi Prosek
It has been experimentally confirmed that supporting these two MSRs is one of the necessary conditions for nested Hyper-V to use the TSC page. Modern Windows guests are noticeably slower when they fall back to reading timestamps from the HV_X64_MSR_TIME_REF_COUNT MSR instead of using the TSC page. The newly supported MSRs are advertised with the AccessFrequencyRegs partition privilege flag and CPUID.40000003H:EDX[8] "Support for determining timer frequencies is available" (both outside of the scope of this KVM patch). Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-29KVM: lapic: reorganize restart_apic_timerPaolo Bonzini
Move the code to cancel the hv timer into the caller, just before it starts the hrtimer. Check availability of the hv timer in start_hv_timer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-15KVM: x86: preparatory changes for APICv cleanupsPaolo Bonzini
Add return value to __kvm_apic_update_irr/kvm_apic_update_irr. Move vmx_sync_pir_to_irr around. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-17Merge branch 'x86/cpufeature' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next For AVX512_VPOPCNTDQ.
2017-01-12KVM: x86: flush pending lapic jump label updates on module unloadDavid Matlack
KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled). These are implemented with delayed_work structs which can still be pending when the KVM module is unloaded. We've seen this cause kernel panics when the kvm_intel module is quickly reloaded. Use the new static_key_deferred_flush() API to flush pending updates on module unload. Signed-off-by: David Matlack <dmatlack@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09KVM: vmx: speed up TPR below threshold vmexitsPaolo Bonzini
Since we're already in VCPU context, all we have to do here is recompute the PPR value. That will in turn generate a KVM_REQ_EVENT if necessary. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-01-09KVM: x86: replace kvm_apic_id with kvm_{x,x2}apic_idRadim Krčmář
There were three calls sites: - recalculate_apic_map and kvm_apic_match_physical_addr, where it would only complicate implementation of x2APIC hotplug; - in apic_debug, where it was still somewhat preserved, but keeping the old function just for apic_debug was not worth it Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer supportWanpeng Li
Most windows guests still utilize APIC Timer periodic/oneshot mode instead of tsc-deadline mode, and the APIC Timer periodic/oneshot mode are still emulated by high overhead hrtimer on host. This patch converts the expected expire time of the periodic/oneshot mode to guest deadline tsc in order to leverage VMX preemption timer logic for APIC Timer tsc-deadline mode. After each preemption timer vmexit preemption timer is restarted to emulate LVTT current-count register is automatically reloaded from the initial-count register when the count reaches 0. This patch reduces ~5600 cycles for each APIC Timer periodic mode operation virtualization. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Squashed with my fixes that were reviewed-by Paolo.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-02KVM: LAPIC: introduce kvm_get_lapic_target_expiration_tsc()Wanpeng Li
Introdce kvm_get_lapic_target_expiration_tsc() to get APIC Timer target deadline tsc. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-07-14KVM: x86: use hardware-compatible format for APIC ID registerRadim Krčmář
We currently always shift APIC ID as if APIC was in xAPIC mode. x2APIC mode wants to use more bits and storing a hardware-compabible value is the the sanest option. KVM API to set the lapic expects that bottom 8 bits of APIC ID are in top 8 bits of APIC_ID register, so the register needs to be shifted in x2APIC mode. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14KVM: x86: dynamic kvm_apic_mapRadim Krčmář
x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will probably has fewer VCPUs. Dynamic size saves memory at the cost of turning one constant into a variable. apic_map mutex had to be moved before allocation to avoid races with cpu hotplug. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16KVM: x86: support using the vmx preemption timer for tsc deadline timerYunhong Jiang
The VMX preemption timer can be used to virtualize the TSC deadline timer. The VMX preemption timer is armed when the vCPU is running, and a VMExit will happen if the virtual TSC deadline timer expires. When the vCPU thread is blocked because of HLT, KVM will switch to use an hrtimer, and then go back to the VMX preemption timer when the vCPU thread is unblocked. This solution avoids the complex OS's hrtimer system, and the host timer interrupt handling cost, replacing them with a little math (for guest->host TSC and host TSC->preemption timer conversion) and a cheaper VMexit. This benefits latency for isolated pCPUs. [A word about performance... Yunhong reported a 30% reduction in average latency from cyclictest. I made a similar test with tscdeadline_latency from kvm-unit-tests, and measured - ~20 clock cycles loss (out of ~3200, so less than 1% but still statistically significant) in the worst case where the test halts just after programming the TSC deadline timer - ~800 clock cycles gain (25% reduction in latency) in the best case where the test busy waits. I removed the VMX bits from Yunhong's patch, to concentrate them in the next patch - Paolo] Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-18svm: Add VMEXIT handlers for AVICSuravee Suthikulpanit
This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception() and avic_unaccelerated_access_interception() along with two trace points (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-18KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_regSuravee Suthikulpanit
Rename kvm_apic_get_reg to kvm_lapic_get_reg to be consistent with the existing kvm_lapic_set_reg counterpart. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-18KVM: x86: Misc LAPIC changes to expose helper functionsSuravee Suthikulpanit
Exporting LAPIC utility functions and macros for re-use in SVM code. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03kvm: x86: Convert ioapic->rtc_status.dest_map to a structJoerg Roedel
Currently this is a bitmap which tracks which CPUs we expect an EOI from. Move this bitmap to a struct so that we can track additional information there. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: x86: consolidate different ways to test for in-kernel LAPICPaolo Bonzini
Different pieces of code checked for vcpu->arch.apic being (non-)NULL, or used kvm_vcpu_has_lapic (more optimized) or lapic_in_kernel. Replace everything with lapic_in_kernel's name and kvm_vcpu_has_lapic's implementation. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: x86: Use vector-hashing to deliver lowest-priority interruptsFeng Wu
Use vector-hashing to deliver lowest-priority interrupts, As an example, modern Intel CPUs in server platform use this method to handle lowest-priority interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25kvm/x86: Hyper-V synthetic interrupt controllerAndrey Smetanin
SynIC (synthetic interrupt controller) is a lapic extension, which is controlled via MSRs and maintains for each vCPU - 16 synthetic interrupt "lines" (SINT's); each can be configured to trigger a specific interrupt vector optionally with auto-EOI semantics - a message page in the guest memory with 16 256-byte per-SINT message slots - an event flag page in the guest memory with 16 2048-bit per-SINT event flag areas The host triggers a SINT whenever it delivers a new message to the corresponding slot or flips an event flag bit in the corresponding area. The guest informs the host that it can try delivering a message by explicitly asserting EOI in lapic or writing to End-Of-Message (EOM) MSR. The userspace (qemu) triggers interrupts and receives EOM notifications via irqfd with resampler; for that, a GSI is allocated for each configured SINT, and irq_routing api is extended to support GSI-SINT mapping. Changes v4: * added activation of SynIC by vcpu KVM_ENABLE_CAP * added per SynIC active flag * added deactivation of APICv upon SynIC activation Changes v3: * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into docs Changes v2: * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors * add Hyper-V SynIC vectors into EOI exit bitmap * Hyper-V SyniIC SINT msr write logic simplified Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25kvm/x86: per-vcpu apicv deactivation supportAndrey Smetanin
The decision on whether to use hardware APIC virtualization used to be taken globally, based on the availability of the feature in the CPU and the value of a module parameter. However, under certain circumstances we want to control it on per-vcpu basis. In particular, when the userspace activates HyperV synthetic interrupt controller (SynIC), APICv has to be disabled as it's incompatible with SynIC auto-EOI behavior. To achieve that, introduce 'apicv_active' flag on struct kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv off. The flag is initialized based on the module parameter and CPU capability, and consulted whenever an APICv-specific action is performed. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> CC: qemu-devel@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01KVM: Define a new interface kvm_intr_is_single_vcpu()Feng Wu
This patch defines a new interface kvm_intr_is_single_vcpu(), which can returns whether the interrupt is for single-CPU or not. It is used by VT-d PI, since now we only support single-CPU interrupts, For lowest-priority interrupts, if user configures it via /proc/irq or uses irqbalance to make it single-CPU, we can use PI to deliver the interrupts to it. Full functionality of lowest-priority support will be added later. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01KVM: x86: replace vm_has_apicv hook with cpu_uses_apicvPaolo Bonzini
This will avoid an unnecessary trip to ->kvm and from there to the VPIC. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01KVM: x86: set TMR when the interrupt is acceptedPaolo Bonzini
Do not compute TMR in advance. Instead, set the TMR just before the interrupt is accepted into the IRR. This limits the coupling between IOAPIC and LAPIC. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-23kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c fileAndrey Smetanin
This patch introduce Hyper-V related source code file - hyperv.c and per vm and per vcpu hyperv context structures. All Hyper-V MSR's and hypercall code moved into hyperv.c. All Hyper-V kvm/vcpu fields moved into appropriate hyperv context structures. Copyrights and authors information copied from x86.c to hyperv.c. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-03KVM: x86: keep track of LVT0 changes under APICvRadim Krčmář
Memory-mapped LVT0 register already contains the new value when APICv traps so we can't directly detect a change. Memorize a bit we are interested in to enable legacy NMI watchdog. Suggested-by: Yoshida Nobuo <yoshida.nb@ncos.nec.co.jp> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04KVM: x86: API changes for SMM supportPaolo Bonzini
This patch includes changes to the external API for SMM support. Userspace can predicate the availability of the new fields and ioctls on a new capability, KVM_CAP_X86_SMM, which is added at the end of the patch series. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-04kvm: x86: fix kvm_apic_has_events to check for NULL pointerPaolo Bonzini
Malicious (or egregiously buggy) userspace can trigger it, but it should never happen in normal operation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07kvm: x86: Deliver MSI IRQ to only lowest prio cpu if msi_redir_hint is trueJames Sullivan
An MSI interrupt should only be delivered to the lowest priority CPU when it has RH=1, regardless of the delivery mode. Modified kvm_is_dm_lowest_prio() to check for either irq->delivery_mode == APIC_DM_LOWPRI or irq->msi_redir_hint. Moved kvm_is_dm_lowest_prio() into lapic.h and renamed to kvm_lowest_prio_delivery(). Changed a check in kvm_irq_delivery_to_apic_fast() from irq->delivery_mode == APIC_DM_LOWPRI to kvm_is_dm_lowest_prio(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-07KVM: x86: INIT and reset sequences are differentNadav Amit
x86 architecture defines differences between the reset and INIT sequences. INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU, MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP. References (from Intel SDM): "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions] [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT] "If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed." [9.2: X87 FPU INITIALIZATION] "The state of the local APIC following an INIT reset is the same as it is after a power-up or hardware reset, except that the APIC ID and arbitration ID registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset ("Wait-for-SIPI" State)] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-08KVM: x86: simplify kvm_apic_mapRadim Krčmář
recalculate_apic_map() uses two passes over all VCPUs. This is a relic from time when we selected a global mode in the first pass and set up the optimized table in the second pass (to have a consistent mode). Recent changes made mixed mode unoptimized and we can do it in one pass. Format of logical MDA is a function of the mode, so we encode it in apic_logical_id() and drop obsoleted variables from the struct. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1423766494-26150-5-git-send-email-rkrcmar@redhat.com> [Add lid_bits temporary in apic_logical_id. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-26KVM: move iodev.h from virt/kvm/ to include/kvmAndre Przywara
iodev.h contains definitions for the kvm_io_bus framework. This is needed both by the generic KVM code in virt/kvm as well as by architecture specific code under arch/. Putting the header file in virt/kvm and using local includes in the architecture part seems at least dodgy to me, so let's move the file into include/kvm, so that a more natural "#include <kvm/iodev.h>" can be used by all of the code. This also solves a problem later when using struct kvm_io_device in arm_vgic.h. Fixing up the FSF address in the GPL header and a wrong include path on the way. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-02-03KVM: nVMX: Enable nested posted interrupt processingWincy Van
If vcpu has a interrupt in vmx non-root mode, injecting that interrupt requires a vmexit. With posted interrupt processing, the vmexit is not needed, and interrupts are fully taken care of by hardware. In nested vmx, this feature avoids much more vmexits than non-nested vmx. When L1 asks L0 to deliver L1's posted interrupt vector, and the target VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV to the target vCPU. Using POSTED_INTR_NV avoids unexpected interrupts if a concurrent vmexit happens and L1's vector is different with L0's. The IPI triggers posted interrupt processing in the target physical CPU. In case the target vCPU was not in guest mode, complete the posted interrupt delivery on the next entry to L2. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30KVM: x86: return bool from kvm_apic_match*()Radim Krčmář
And don't export the internal ones while at it. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08KVM: x86: add option to advance tscdeadline hrtimer expirationMarcelo Tosatti
For the hrtimer which emulates the tscdeadline timer in the guest, add an option to advance expiration, and busy spin on VM-entry waiting for the actual expiration time to elapse. This allows achieving low latencies in cyclictest (or any scenario which requires strict timing regarding timer expiration). Reduces average cyclictest latency from 12us to 8us on Core i5 desktop. Note: this option requires tuning to find the appropriate value for a particular hardware/guest combination. One method is to measure the average delay between apic_timer_fn and VM-entry. Another method is to start with 1000ns, and increase the value in say 500ns increments until avg cyclictest numbers stop decreasing. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-04KVM: x86: allow 256 logical x2APICs againRadim Krčmář
While fixing an x2apic bug, 17d68b7 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) we've made only one cluster available. This means that the amount of logically addressible x2APICs was reduced to 16 and VCPUs kept overwriting themselves in that region, so even the first cluster wasn't set up correctly. This patch extends x2APIC support back to the logical_map's limit, and keeps the CVE fixed as messages for non-present APICs are dropped. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03KVM: x86: optimize some accesses to LVTT and SPIVRadim Krčmář
We mirror a subset of these registers in separate variables. Using them directly should be faster. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03KVM: x86: detect LVTT changes under APICvRadim Krčmář
APIC-write VM exits are "trap-like": they save CS:RIP values for the instruction after the write, and more importantly, the handler will already see the new value in the virtual-APIC page. This means that apic_reg_write cannot use kvm_apic_get_reg to omit timer cancelation when mode changes. timer_mode_mask shouldn't be changing as it depends on cpuid. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03KVM: x86: detect SPIV changes under APICvRadim Krčmář
APIC-write VM exits are "trap-like": they save CS:RIP values for the instruction after the write, and more importantly, the handler will already see the new value in the virtual-APIC page. This caused a bug if you used KVM_SET_IRQCHIP to set the SW-enabled bit in the SPIV register. The chain of events is as follows: * When the irqchip is added to the destination VM, the apic_sw_disabled static key is incremented (1) * When the KVM_SET_IRQCHIP ioctl is invoked, it is decremented (0) * When the guest disables the bit in the SPIV register, e.g. as part of shutdown, apic_set_spiv does not notice the change and the static key is _not_ incremented. * When the guest is destroyed, the static key is decremented (-1), resulting in this trace: WARNING: at kernel/jump_label.c:81 __static_key_slow_dec+0xa6/0xb0() jump label: negative count! [<ffffffff816bf898>] dump_stack+0x19/0x1b [<ffffffff8107c6f1>] warn_slowpath_common+0x61/0x80 [<ffffffff8107c76c>] warn_slowpath_fmt+0x5c/0x80 [<ffffffff811931e6>] __static_key_slow_dec+0xa6/0xb0 [<ffffffff81193226>] static_key_slow_dec_deferred+0x16/0x20 [<ffffffffa0637698>] kvm_free_lapic+0x88/0xa0 [kvm] [<ffffffffa061c63e>] kvm_arch_vcpu_uninit+0x2e/0xe0 [kvm] [<ffffffffa05ff301>] kvm_vcpu_uninit+0x21/0x40 [kvm] [<ffffffffa067cec7>] vmx_free_vcpu+0x47/0x70 [kvm_intel] [<ffffffffa061bc50>] kvm_arch_vcpu_free+0x50/0x60 [kvm] [<ffffffffa061ca22>] kvm_arch_destroy_vm+0x102/0x260 [kvm] [<ffffffff810b68fd>] ? synchronize_srcu+0x1d/0x20 [<ffffffffa06030d1>] kvm_put_kvm+0xe1/0x1c0 [kvm] [<ffffffffa06036f8>] kvm_vcpu_release+0x18/0x20 [kvm] [<ffffffff81215c62>] __fput+0x102/0x310 [<ffffffff81215f4e>] ____fput+0xe/0x10 [<ffffffff810ab664>] task_work_run+0xb4/0xe0 [<ffffffff81083944>] do_exit+0x304/0xc60 [<ffffffff816c8dfc>] ? _raw_spin_unlock_irq+0x2c/0x50 [<ffffffff810fd22d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [<ffffffff8108432c>] do_group_exit+0x4c/0xc0 [<ffffffff810843b4>] SyS_exit_group+0x14/0x20 [<ffffffff816d33a9>] system_call_fastpath+0x16/0x1b Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03KVM: x86: some apic broadcast modes does not workNadav Amit
KVM does not deliver x2APIC broadcast messages with physical mode. Intel SDM (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of FFFF_FFFFH is used for broadcast of interrupts in both logical destination and physical destination modes." In addition, the local-apic enables cluster mode broadcast. As Intel SDM 10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all destination bits to one." This patch enables cluster mode broadcast. The fix tries to combine broadcast in different modes through a unified code. One rare case occurs when the source of IPI has its APIC disabled. In such case, the source can still issue IPIs, but since the source is not obliged to have the same LAPIC mode as the enabled ones, we cannot rely on it. Since it is a rare case, it is unoptimized and done on the slow-path. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> [As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from kvm_apic_broadcast. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-27KVM: x86: Validate guest writes to MSR_IA32_APICBASEJan Kiszka
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This address both enabling of the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)Andy Honig
In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patches concerts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: b93463aa59d6 ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-04-16KVM: VMX: Add the deliver posted interrupt algorithmYang Zhang
Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: Set TMR when programming ioapic entryYang Zhang
We already know the trigger mode of a given interrupt when programming the ioapice entry. So it's not necessary to set it in each interrupt delivery. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-15KVM: Add reset/restore rtc_status supportYang Zhang
restore rtc_status from migration or save/restore Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-15KVM: Return destination vcpu on interrupt injectionYang Zhang
Add a new parameter to know vcpus who received the interrupt. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>