summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-02-23KVM: nVMX: Refactor IO bitmap checks into helper functionOliver Upton
Checks against the IO bitmap are useful for both instruction emulation and VM-exit reflection. Refactor the IO bitmap checks into a helper function. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Don't emulate instructions in guest modePaolo Bonzini
vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default. Cc: stable@vger.kernel.org [Made commit, added commit msg - Oliver] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23KVM: fix error handling in svm_hardware_setupLi RongQing
rename svm_hardware_unsetup as svm_hardware_teardown, move it before svm_hardware_setup, and call it to free all memory if fail to setup in svm_hardware_setup, otherwise memory will be leaked remove __exit attribute for it since it is called in __init function Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-22Merge tag 'ras-urgent-2020-02-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixes for the AMD MCE driver: - Populate the per CPU MCA bank descriptor pointer only after it has been completely set up to prevent a use-after-free in case that one of the subsequent initialization step fails - Implement a proper release function for the sysfs entries of MCA threshold controls instead of freeing the memory right in the CPU teardown code, which leads to another use-after-free when the associated sysfs file is opened and accessed" * tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/amd: Fix kobject lifetime x86/mce/amd: Publish the bank pointer only after setup has succeeded
2020-02-22Merge tag 'x86-urgent-2020-02-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for x86: - Remove the __force_oder definiton from the kaslr boot code as it is already defined in the page table code which makes GCC 10 builds fail because it changed the default to -fno-common. - Address the AMD erratum 1054 concerning the IRPERF capability and enable the Instructions Retired fixed counter on machines which are not affected by the erratum" * tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF x86/boot/compressed: Don't declare __force_order in kaslr_64.c
2020-02-22efi/libstub/x86: Avoid overflowing code32_start on PE entryArd Biesheuvel
When using the native PE entry point (as opposed to the EFI handover protocol entry point that is used more widely), we set code32_start, which is a 32-bit wide field, to the effective symbol address of startup_32, which could overflow given that the EFI loader may have located the running image anywhere in memory, and we haven't reached the point yet where we relocate ourselves. Since we relocate ourselves if code32_start != pref_address, this isn't likely to lead to problems in practice, given how unlikely it is that the truncated effective address of startup_32 happens to equal pref_address. But it is better to defer the assignment of code32_start to after the relocation, when it is guaranteed to fit. While at it, move the call to efi_relocate_kernel() to an earlier stage so it is more likely that our preferred offset in memory has not been occupied by other memory allocations done in the mean time. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22efi/libstub/x86: Remove pointless zeroing of apm_bios_infoArd Biesheuvel
We have some code in the EFI stub entry point that takes the address of the apm_bios_info struct in the newly allocated and zeroed out boot_params structure, only to zero it out again. This is pointless so remove it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22efi/x86: Mark setup_graphics staticArvind Sankar
This function is only called from efi_main in the same source file. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200130222004.1932152-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22x86/boot: Micro-optimize GDT loading instructionsArvind Sankar
Rearrange the instructions a bit to use a 32-bit displacement once instead of 2/3 times. This saves 8 bytes of machine code. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200202171353.3736319-8-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22x86/boot: GDT limit value should be size - 1Arvind Sankar
The limit value for the GDTR should be such that adding it to the base address gives the address of the last byte of the GDT, i.e. it should be one less than the size, not the size. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200202171353.3736319-7-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22efi/x86: Remove GDT setup from efi_mainArvind Sankar
The 64-bit kernel will already load a GDT in startup_64, which is the next function to execute after return from efi_main. Add GDT setup code to the 32-bit kernel's startup_32 as well. Doing it in the head code has the advantage that we can avoid potentially corrupting the GDT during copy/decompression. This also removes dependence on having a specific GDT layout setup by the bootloader. Both startup_32 and startup_64 now clear interrupts on entry, so we can remove that from efi_main as well. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200202171353.3736319-6-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22x86/boot: Clear direction and interrupt flags in startup_64Arvind Sankar
startup_32 already clears these flags on entry, do it in startup_64 as well for consistency. The direction flag in particular is not specified to be cleared in the boot protocol documentation, and we currently call into C code (paging_prepare) without explicitly clearing it. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200202171353.3736319-5-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22x86/boot: Reload GDTR after copying to the end of the bufferArvind Sankar
The GDT may get overwritten during the copy or during extract_kernel, which will cause problems if any segment register is touched before the GDTR is reloaded by the decompressed kernel. For safety update the GDTR to point to the GDT within the copied kernel. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200202171353.3736319-4-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22efi/x86: Don't depend on firmware GDT layoutArvind Sankar
When booting in mixed mode, the firmware's GDT is still installed at handover entry in efi32_stub_entry. We save the GDTR for later use in __efi64_thunk but we are assuming that descriptor 2 (__KERNEL_CS) is a valid 32-bit code segment descriptor and that descriptor 3 (__KERNEL_DS/__BOOT_DS) is a valid data segment descriptor. This happens to be true for OVMF (it actually uses descriptor 1 for data segments, but descriptor 3 is also setup as data), but we shouldn't depend on this being the case. Fix this by saving the code and data selectors in addition to the GDTR in efi32_stub_entry, and restoring them in __efi64_thunk before calling the firmware. The UEFI specification guarantees that selectors will be flat, so using the DS selector for all the segment registers should be enough. We also need to install our own GDT before initializing segment registers in startup_32, so move the GDT load up to the beginning of the function. [ardb: mention mixed mode in the commit log] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200202171353.3736319-3-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-22x86/boot: Remove KEEP_SEGMENTS supportArvind Sankar
Commit a24e785111a3 ("i386: paravirt boot sequence") added this flag for use by paravirtualized environments such as Xen. However, Xen never made use of this flag [1], and it was only ever used by lguest [2]. Commit ecda85e70277 ("x86/lguest: Remove lguest support") removed lguest, so KEEP_SEGMENTS has lost its last user. [1] https://lore.kernel.org/lkml/4D4B097C.5050405@goop.org [2] https://www.mail-archive.com/lguest@lists.ozlabs.org/msg00469.html Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200202171353.3736319-2-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-21Merge tag 'for-linus-5.6-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two small fixes for Xen: - a fix to avoid warnings with new gcc - a fix for incorrectly disabled interrupts when calling _cond_resched()" * tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Enable interrupts when calling _cond_resched() x86/xen: Distribute switch variables for initialization
2020-02-21KVM: SVM: Fix potential memory leak in svm_cpu_init()Miaohe Lin
When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: apic: avoid calculating pending eoi from an uninitialized valMiaohe Lin
When pv_eoi_get_user() fails, 'val' may remain uninitialized and the return value of pv_eoi_get_pending() becomes random. Fix the issue by initializing the variable. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when ↵Vitaly Kuznetsov
apicv is globally disabled When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*), nothing happens to VMX MSRs on the already existing vCPUs, however, all new ones are created with PIN_BASED_POSTED_INTR filtered out. This is very confusing and results in the following picture inside the guest: $ rdmsr -ax 0x48d ff00000016 7f00000016 7f00000016 7f00000016 This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three. L1 hypervisor may only check CPU0's controls to find out what features are available and it will be very confused later. Switch to setting PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1Vitaly Kuznetsov
Even when APICv is disabled for L1 it can (and, actually, is) still available for L2, this means we need to always call vmx_deliver_nested_posted_interrupt() when attempting an interrupt delivery. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabledSuravee Suthikulpanit
Launching VM w/ AVIC disabled together with pass-through device results in NULL pointer dereference bug with the following call trace. RIP: 0010:svm_refresh_apicv_exec_ctrl+0x17e/0x1a0 [kvm_amd] Call Trace: kvm_vcpu_update_apicv+0x44/0x60 [kvm] kvm_arch_vcpu_ioctl_run+0x3f4/0x1c80 [kvm] kvm_vcpu_ioctl+0x3d8/0x650 [kvm] do_vfs_ioctl+0xaa/0x660 ? tomoyo_file_ioctl+0x19/0x20 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x57/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Investigation shows that this is due to the uninitialized usage of struct vapu_svm.ir_list in the svm_set_pi_irte_mode(), which is called from svm_refresh_apicv_exec_ctrl(). The ir_list is initialized only if AVIC is enabled. So, fixes by adding a check if AVIC is enabled in the svm_refresh_apicv_exec_ctrl(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Fixes: 8937d762396d ("kvm: x86: svm: Add support to (de)activate posted interrupts.") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSEXiaoyao Li
Commit 159348784ff0 ("x86/vmx: Introduce VMX_FEATURES_*") missed bit 26 (enable user wait and pause) of Secondary Processor-based VM-Execution Controls. Add VMX_FEATURE_USR_WAIT_PAUSE flag so that it shows up in /proc/cpuinfo, and use it to define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE to make them uniform. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadowwanpeng li
For the duration of mapping eVMCS, it derefences ->memslots without holding ->srcu or ->slots_lock when accessing hv assist page. This patch fixes it by moving nested_sync_vmcs12_to_shadow to prepare_guest_switch, where the SRCU is already taken. It can be reproduced by running kvm's evmcs_test selftest. ============================= warning: suspicious rcu usage 5.6.0-rc1+ #53 tainted: g w ioe ----------------------------- ./include/linux/kvm_host.h:623 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by evmcs_test/8507: #0: ffff9ddd156d00d0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680 [kvm] stack backtrace: cpu: 6 pid: 8507 comm: evmcs_test tainted: g w ioe 5.6.0-rc1+ #53 hardware name: dell inc. optiplex 7040/0jctf8, bios 1.4.9 09/12/2016 call trace: dump_stack+0x68/0x9b kvm_read_guest_cached+0x11d/0x150 [kvm] kvm_hv_get_assist_page+0x33/0x40 [kvm] nested_enlightened_vmentry+0x2c/0x60 [kvm_intel] nested_vmx_handle_enlightened_vmptrld.part.52+0x32/0x1c0 [kvm_intel] nested_sync_vmcs12_to_shadow+0x439/0x680 [kvm_intel] vmx_vcpu_run+0x67a/0xe60 [kvm_intel] vcpu_enter_guest+0x35e/0x1bc0 [kvm] kvm_arch_vcpu_ioctl_run+0x40b/0x670 [kvm] kvm_vcpu_ioctl+0x370/0x680 [kvm] ksys_ioctl+0x235/0x850 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x77/0x780 entry_syscall_64_after_hwframe+0x49/0xbe Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOIMiaohe Lin
Commit 13db77347db1 ("KVM: x86: don't notify userspace IOAPIC on edge EOI") said, edge-triggered interrupts don't set a bit in TMR, which means that IOAPIC isn't notified on EOI. And var level indicates level-triggered interrupt. But commit 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") replace var level with irq.level by mistake. Fix it by changing irq.level to irq.trig_mode. Cc: stable@vger.kernel.org Fixes: 3159d36ad799 ("KVM: x86: use generic function for MSI parsing") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA fixes from Mimi Zohar: "Two bug fixes and an associated change for each. The one that adds SM3 to the IMA list of supported hash algorithms is a simple change, but could be considered a new feature" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: add sm3 algorithm to hash algorithm configuration list crypto: rename sm3-256 to sm3 in hash_algo_name efi: Only print errors about failing to get certs if EFI vars are found x86/ima: use correct identifier for SetupMode variable
2020-02-20kvm/emulate: fix a -Werror=cast-function-typeQian Cai
arch/x86/kvm/emulate.c: In function 'x86_emulate_insn': arch/x86/kvm/emulate.c:5686:22: error: cast between incompatible function types from 'int (*)(struct x86_emulate_ctxt *)' to 'void (*)(struct fastop *)' [-Werror=cast-function-type] rc = fastop(ctxt, (fastop_t)ctxt->execute); Fix it by using an unnamed union of a (*execute) function pointer and a (*fastop) function pointer. Fixes: 3009afc6e39e ("KVM: x86: Use a typedef for fastop functions") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20KVM: x86: fix incorrect comparison in trace eventPaolo Bonzini
The "u" field in the event has three states, -1/0/1. Using u8 however means that comparison with -1 will always fail, so change to signed char. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20x86/xen: Distribute switch variables for initializationKees Cook
Variables declared in a switch statement before any case statements cannot be automatically initialized with compiler instrumentation (as they are not part of any execution flow). With GCC's proposed automatic stack variable initialization feature, this triggers a warning (and they don't get initialized). Clang's automatic stack variable initialization (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also doesn't initialize such variables[1]. Note that these warnings (or silent skipping) happen before the dead-store elimination optimization phase, so even when the automatic initializations are later elided in favor of direct initializations, the warnings remain. To avoid these problems, move such variables into the "case" where they're used or lift them up into the main function body. arch/x86/xen/enlighten_pv.c: In function ‘xen_write_msr_safe’: arch/x86/xen/enlighten_pv.c:904:12: warning: statement will never be executed [-Wswitch-unreachable] 904 | unsigned which; | ^~~~~ [1] https://bugs.llvm.org/show_bug.cgi?id=44916 Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200220062318.69299-1-keescook@chromium.org Reviewed-by: Juergen Gross <jgross@suse.com> [boris: made @which an 'unsigned int'] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-02-19x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERFKim Phillips
Commit aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter") added support for access to the free-running counter via 'perf -e msr/irperf/', but when exercised, it always returns a 0 count: BEFORE: $ perf stat -e instructions,msr/irperf/ true Performance counter stats for 'true': 624,833 instructions 0 msr/irperf/ Simply set its enable bit - HWCR bit 30 - to make it start counting. Enablement is restricted to all machines advertising IRPERF capability, except those susceptible to an erratum that makes the IRPERF return bad values. That erratum occurs in Family 17h models 00-1fh [1], but not in F17h models 20h and above [2]. AFTER (on a family 17h model 31h machine): $ perf stat -e instructions,msr/irperf/ true Performance counter stats for 'true': 621,690 instructions 622,490 msr/irperf/ [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors [2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors The revision guides are available from the bugzilla Link below. [ bp: Massage commit message. ] Fixes: aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: http://lkml.kernel.org/r/20200214201805.13830-1-kim.phillips@amd.com
2020-02-19x86/mce: Do not log spurious corrected mce errorsPrarit Bhargava
A user has reported that they are seeing spurious corrected errors on their hardware. Intel Errata HSD131, HSM142, HSW131, and BDM48 report that "spurious corrected errors may be logged in the IA32_MC0_STATUS register with the valid field (bit 63) set, the uncorrected error field (bit 61) not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and an MCA Error Code (bits [15:0]) of 0x0005." The Errata PDFs are linked in the bugzilla below. Block these spurious errors from the console and logs. [ bp: Move the intel_filter_mce() header declarations into the already existing CONFIG_X86_MCE_INTEL ifdeffery. ] Co-developed-by: Alexander Krupp <centos@akr.yagii.de> Signed-off-by: Alexander Krupp <centos@akr.yagii.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206587 Link: https://lkml.kernel.org/r/20200219131611.36816-1-prarit@redhat.com
2020-02-19x86/boot/compressed: Don't declare __force_order in kaslr_64.cH.J. Lu
GCC 10 changed the default to -fno-common, which leads to LD arch/x86/boot/compressed/vmlinux ld: arch/x86/boot/compressed/pgtable_64.o:(.bss+0x0): multiple definition of `__force_order'; \ arch/x86/boot/compressed/kaslr_64.o:(.bss+0x0): first defined here make[2]: *** [arch/x86/boot/compressed/Makefile:119: arch/x86/boot/compressed/vmlinux] Error 1 Since __force_order is already provided in pgtable_64.c, there is no need to declare __force_order in kaslr_64.c. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200124181811.4780-1-hjl.tools@gmail.com
2020-02-17KVM: nVMX: Fix some obsolete comments and grammar errorMiaohe Lin
Fix wrong variable names and grammar error in comment. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Bugfixes and improvements to selftests. On top of this, Mauro converted the KVM documentation to rst format, which was very welcome" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits) docs: virt: guest-halt-polling.txt convert to ReST docs: kvm: review-checklist.txt: rename to ReST docs: kvm: Convert timekeeping.txt to ReST format docs: kvm: Convert s390-diag.txt to ReST format docs: kvm: Convert ppc-pv.txt to ReST format docs: kvm: Convert nested-vmx.txt to ReST format docs: kvm: Convert mmu.txt to ReST format docs: kvm: Convert locking.txt to ReST format docs: kvm: Convert hypercalls.txt to ReST format docs: kvm: arm/psci.txt: convert to ReST docs: kvm: convert arm/hyp-abi.txt to ReST docs: kvm: Convert api.txt to ReST format docs: kvm: convert devices/xive.txt to ReST docs: kvm: convert devices/xics.txt to ReST docs: kvm: convert devices/vm.txt to ReST docs: kvm: convert devices/vfio.txt to ReST docs: kvm: convert devices/vcpu.txt to ReST docs: kvm: convert devices/s390_flic.txt to ReST docs: kvm: convert devices/mpic.txt to ReST docs: kvm: convert devices/arm-vgit.txt to ReST ...
2020-02-14x86/mce/amd: Fix kobject lifetimeThomas Gleixner
Accessing the MCA thresholding controls in sysfs concurrently with CPU hotplug can lead to a couple of KASAN-reported issues: BUG: KASAN: use-after-free in sysfs_file_ops+0x155/0x180 Read of size 8 at addr ffff888367578940 by task grep/4019 and BUG: KASAN: use-after-free in show_error_count+0x15c/0x180 Read of size 2 at addr ffff888368a05514 by task grep/4454 for example. Both result from the fact that the threshold block creation/teardown code frees the descriptor memory itself instead of defining proper ->release function and leaving it to the driver core to take care of that, after all sysfs accesses have completed. Do that and get rid of the custom freeing code, fixing the above UAFs in the process. [ bp: write commit message. ] Fixes: 95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200214082801.13836-1-bp@alien8.de
2020-02-13x86/mce/amd: Publish the bank pointer only after setup has succeededBorislav Petkov
threshold_create_bank() creates a bank descriptor per MCA error thresholding counter which can be controlled over sysfs. It publishes the pointer to that bank in a per-CPU variable and then goes on to create additional thresholding blocks if the bank has such. However, that creation of additional blocks in allocate_threshold_blocks() can fail, leading to a use-after-free through the per-CPU pointer. Therefore, publish that pointer only after all blocks have been setup successfully. Fixes: 019f34fccfd5 ("x86, MCE, AMD: Move shared bank to node descriptor") Reported-by: Saar Amar <Saar.Amar@microsoft.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200128140846.phctkvx5btiexvbx@kili.mountain
2020-02-13x86: platform: iosf_mbi: Call cpu_latency_qos_*() instead of pm_qos_*()Rafael J. Wysocki
Call cpu_latency_qos_add/update/remove_request() instead of pm_qos_add/update/remove_request(), respectively, because the latter are going to be dropped. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-12KVM: x86: enable -WerrorPaolo Bonzini
Avoid more embarrassing mistakes. At least those that the compiler can catch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: fix WARN_ON check of an unsigned less than zeroPaolo Bonzini
The check cpu->hv_clock.system_time < 0 is redundant since system_time is a u64 and hence can never be less than zero. But what was actually meant is to check that the result is positive, since kernel_ns and v->kvm->arch.kvmclock_offset are both s64. Reported-by: Colin King <colin.king@canonical.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Addresses-Coverity: ("Macro compares unsigned to 0") Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86/mmu: Fix struct guest_walker arrays for 5-level pagingSean Christopherson
Define PT_MAX_FULL_LEVELS as PT64_ROOT_MAX_LEVEL, i.e. 5, to fix shadow paging for 5-level guest page tables. PT_MAX_FULL_LEVELS is used to size the arrays that track guest pages table information, i.e. using a "max levels" of 4 causes KVM to access garbage beyond the end of an array when querying state for level 5 entries. E.g. FNAME(gpte_changed) will read garbage and most likely return %true for a level 5 entry, soft-hanging the guest because FNAME(fetch) will restart the guest instead of creating SPTEs because it thinks the guest PTE has changed. Note, KVM doesn't yet support 5-level nested EPT, so PT_MAX_FULL_LEVELS gets to stay "4" for the PTTYPE_EPT case. Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: nVMX: Use correct root level for nested EPT shadow page tablesSean Christopherson
Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU currently also hardcodes the page walk level for nested EPT to be 4 levels. The L2 guest is all but guaranteed to soft hang on its first instruction when L1 is using EPT, as KVM will construct 4-level page tables and then tell hardware to use 5-level page tables. Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: nVMX: Fix some comment typos and coding styleMiaohe Lin
Fix some typos in the comments. Also fix coding style. [Sean Christopherson rewrites the comment of write_fault_to_shadow_pgtable field in struct kvm_vcpu_arch.] Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86/mmu: Avoid retpoline on ->page_fault() with TDPSean Christopherson
Wrap calls to ->page_fault() with a small shim to directly invoke the TDP fault handler when the kernel is using retpolines and TDP is being used. Single out the TDP fault handler and annotate the TDP path as likely to coerce the compiler into preferring it over the indirect function call. Rename tdp_page_fault() to kvm_tdp_page_fault(), as it's exposed outside of mmu.c to allow inlining the shim. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: apic: reuse smp_wmb() in kvm_make_request()Miaohe Lin
kvm_make_request() provides smp_wmb() so pending_events changes are guaranteed to be visible. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: remove duplicated KVM_REQ_EVENT requestMiaohe Lin
The KVM_REQ_EVENT request is already made in kvm_set_rflags(). We should not make it again. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTSOliver Upton
KVM allows the deferral of exception payloads when a vCPU is in guest mode to allow the L1 hypervisor to intercept certain events (#PF, #DB) before register state has been modified. However, this behavior is incompatible with the KVM_{GET,SET}_VCPU_EVENTS ABI, as userspace expects register state to have been immediately modified. Userspace may opt-in for the payload deferral behavior with the KVM_CAP_EXCEPTION_PAYLOAD per-VM capability. As such, kvm_multiple_exception() will immediately manipulate guest registers if the capability hasn't been requested. Since the deferral is only necessary if a userspace ioctl were to be serviced at the same as a payload bearing exception is recognized, this behavior can be relaxed. Instead, opportunistically defer the payload from kvm_multiple_exception() and deliver the payload before completing a KVM_GET_VCPU_EVENTS ioctl. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: nVMX: Handle pending #DB when injecting INIT VM-exitOliver Upton
SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will be populated if a VM-exit caused by an INIT signal takes priority over a debug-trap. Emulate this behavior when synthesizing an INIT signal VM-exit into L1. Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: Mask off reserved bit from #DB exception payloadOliver Upton
KVM defines the #DB payload as compatible with the 'pending debug exceptions' field under VMX, not DR6. Mask off bit 12 when applying the payload to DR6, as it is reserved on DR6 but not the 'pending debug exceptions' field. Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12KVM: x86: do not reset microcode version on INIT or RESETPaolo Bonzini
Do not initialize the microcode version at RESET or INIT, only on vCPU creation. Microcode updates are not lost during INIT, and exact behavior across a warm RESET is not specified by the architecture. Since we do not support a microcode update directly from the hypervisor, but only as a result of userspace setting the microcode version MSR, it's simpler for userspace if we do nothing in KVM and let userspace emulate behavior for RESET as it sees fit. Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in the list of emulated MSRs. Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-11x86/ima: use correct identifier for SetupMode variableArd Biesheuvel
The IMA arch code attempts to inspect the "SetupMode" EFI variable by populating a variable called efi_SetupMode_name with the string "SecureBoot" and passing that to the EFI GetVariable service, which obviously does not yield the expected result. Given that the string is only referenced a single time, let's get rid of the intermediate variable, and pass the correct string as an immediate argument. While at it, do the same for "SecureBoot". Fixes: 399574c64eaf ("x86/ima: retry detecting secure boot mode") Fixes: 980ef4d22a95 ("x86/ima: check EFI SetupMode too") Cc: Matthew Garrett <mjg59@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org # v5.3 Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>