summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-03-08efi/x86: Decompress at start of PE image load addressArvind Sankar
When booted via PE loader, define image_offset to hold the offset of startup_32() from the start of the PE image, and use it as the start of the decompression buffer. [ mingo: Fixed the grammar in the comments. ] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200303221205.4048668-3-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-17-ardb@kernel.org
2020-03-08x86/boot/compressed/32: Save the output address instead of recalculating itArvind Sankar
In preparation for being able to decompress into a buffer starting at a different address than startup_32, save the calculated output address instead of recalculating it later. We now keep track of three addresses: %edx: startup_32 as we were loaded by bootloader %ebx: new location of compressed kernel %ebp: start of decompression buffer Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200303221205.4048668-2-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-16-ardb@kernel.org
2020-03-08x86/boot: Use unsigned comparison for addressesArvind Sankar
The load address is compared with LOAD_PHYSICAL_ADDR using a signed comparison currently (using jge instruction). When loading a 64-bit kernel using the new efi32_pe_entry() point added by: 97aa276579b2 ("efi/x86: Add true mixed mode entry point into .compat section") using Qemu with -m 3072, the firmware actually loads us above 2Gb, resulting in a very early crash. Use the JAE instruction to perform a unsigned comparison instead, as physical addresses should be considered unsigned. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org
2020-03-08efi/x86: Avoid using code32_startArvind Sankar
code32_start is meant for 16-bit real-mode bootloaders to inform the kernel where the 32-bit protected mode code starts. Nothing in the protected mode kernel except the EFI stub uses it. efi_main() currently returns boot_params, with code32_start set inside it to tell efi_stub_entry() where startup_32 is located. Since it was invoked by efi_stub_entry() in the first place, boot_params is already known. Return the address of startup_32 instead. This will allow a 64-bit kernel to live above 4Gb, for example, and it's cleaner as well. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-5-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-13-ardb@kernel.org
2020-03-08efi/x86: Make efi32_pe_entry() more readableArvind Sankar
Set up a proper frame pointer in efi32_pe_entry() so that it's easier to calculate offsets for arguments. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-4-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-12-ardb@kernel.org
2020-03-08efi/x86: Respect 32-bit ABI in efi32_pe_entry()Arvind Sankar
verify_cpu() clobbers BX and DI. In case we have to return error, we need to preserve them to respect the 32-bit calling convention. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-3-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-11-ardb@kernel.org
2020-03-08efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATAArvind Sankar
Use SYM_DATA*() macros to annotate this constant, and explicitly align it to 4-byte boundary. Use lower-case for hexadecimal data. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-2-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-10-ardb@kernel.org
2020-03-08Merge branch 'efi/urgent' into efi/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-08Merge tag 'efi-next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core More EFI updates for v5.7 - Incorporate a stable branch with the EFI pieces of Hans's work on loading device firmware from EFI boot service memory regions Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-03efi: Add embedded peripheral firmware supportHans de Goede
Just like with PCI options ROMs, which we save in the setup_efi_pci* functions from arch/x86/boot/compressed/eboot.c, the EFI code / ROM itself sometimes may contain data which is useful/necessary for peripheral drivers to have access to. Specifically the EFI code may contain an embedded copy of firmware which needs to be (re)loaded into the peripheral. Normally such firmware would be part of linux-firmware, but in some cases this is not feasible, for 2 reasons: 1) The firmware is customized for a specific use-case of the chipset / use with a specific hardware model, so we cannot have a single firmware file for the chipset. E.g. touchscreen controller firmwares are compiled specifically for the hardware model they are used with, as they are calibrated for a specific model digitizer. 2) Despite repeated attempts we have failed to get permission to redistribute the firmware. This is especially a problem with customized firmwares, these get created by the chip vendor for a specific ODM and the copyright may partially belong with the ODM, so the chip vendor cannot give a blanket permission to distribute these. This commit adds support for finding peripheral firmware embedded in the EFI code and makes the found firmware available through the new efi_get_embedded_fw() function. Support for loading these firmwares through the standard firmware loading mechanism is added in a follow-up commit in this patch-series. Note we check the EFI_BOOT_SERVICES_CODE for embedded firmware near the end of start_kernel(), just before calling rest_init(), this is on purpose because the typical EFI_BOOT_SERVICES_CODE memory-segment is too large for early_memremap(), so the check must be done after mm_init(). This relies on EFI_BOOT_SERVICES_CODE not being free-ed until efi_free_boot_services() is called, which means that this will only work on x86 for now. Reported-by: Dave Olsthoorn <dave@bewaar.me> Suggested-by: Peter Jones <pjones@redhat.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200115163554.101315-3-hdegoede@redhat.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-03-03efi: Export boot-services code and data as debugfs-blobsHans de Goede
Sometimes it is useful to be able to dump the efi boot-services code and data. This commit adds these as debugfs-blobs to /sys/kernel/debug/efi, but only if efi=debug is passed on the kernel-commandline as this requires not freeing those memory-regions, which costs 20+ MB of RAM. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20200115163554.101315-2-hdegoede@redhat.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-03-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a pkeys fix for a bug that triggers with weird BIOS settings, and two Xen PV fixes: a paravirt interface fix, and pagetable dumping fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix dump_pagetables with Xen PV x86/ioperm: Add new paravirt function update_io_bitmap() x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
2020-03-02Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Three fixes to EFI mixed boot mode, mostly related to x86-64 vmap stacks activated years ago, bug-fixed recently for EFI, which had knock-on effects of various 1:1 mapping assumptions in mixed mode. There's also a READ_ONCE() fix for reading an mmap-ed EFI firmware data field only once, out of caution" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: READ_ONCE rng seed size before munmap efi/x86: Handle by-ref arguments covering multiple pages in mixed mode efi/x86: Remove support for EFI time and counter services in mixed mode efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
2020-03-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "More bugfixes, including a few remaining "make W=1" issues such as too large frame sizes on some configurations. On the ARM side, the compiler was messing up shadow stacks between EL1 and EL2 code, which is easily fixed with __always_inline" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: check descriptor table exits on instruction emulation kvm: x86: Limit the number of "kvm: disabled by bios" messages KVM: x86: avoid useless copy of cpufreq policy KVM: allow disabling -Werror KVM: x86: allow compiling as non-module with W=1 KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis KVM: Introduce pv check helpers KVM: let declaration of kvm_get_running_vcpus match implementation KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter arm64: Ask the compiler to __always_inline functions used by KVM at HYP KVM: arm64: Define our own swab32() to avoid a uapi static inline KVM: arm64: Ask the compiler to __always_inline functions used at HYP kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe() KVM: arm/arm64: Fix up includes for trace.h
2020-03-01KVM: VMX: check descriptor table exits on instruction emulationOliver Upton
KVM emulates UMIP on hardware that doesn't support it by setting the 'descriptor table exiting' VM-execution control and performing instruction emulation. When running nested, this emulation is broken as KVM refuses to emulate L2 instructions by default. Correct this regression by allowing the emulation of descriptor table instructions if L1 hasn't requested 'descriptor table exiting'. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Reported-by: Jan Kiszka <jan.kiszka@web.de> Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-29x86/mm: Fix dump_pagetables with Xen PVJuergen Gross
Commit 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use walk_page_range") broke Xen PV guests as the hypervisor reserved hole in the memory map was not taken into account. Fix that by starting the kernel range only at GUARD_HOLE_END_ADDR. Fixes: 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use walk_page_range") Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Julien Grall <julien@xen.org> Link: https://lkml.kernel.org/r/20200221103851.7855-1-jgross@suse.com
2020-02-29x86/ioperm: Add new paravirt function update_io_bitmap()Juergen Gross
Commit 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") reworked the iopl syscall to use I/O bitmaps. Unfortunately this broke Xen PV domains using that syscall as there is currently no I/O bitmap support in PV domains. Add I/O bitmap support via a new paravirt function update_io_bitmap which Xen PV domains can use to update their I/O bitmaps via a hypercall. Fixes: 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 5.5 Link: https://lkml.kernel.org/r/20200218154712.25490-1-jgross@suse.com
2020-02-29x86/boot/compressed: Fix reloading of GDTR post-relocationArvind Sankar
The following commit: ef5a7b5eb13e ("efi/x86: Remove GDT setup from efi_main") introduced GDT setup into the 32-bit kernel's startup_32, and reloads the GDTR after relocating the kernel for paranoia's sake. A followup commit: 32d009137a56 ("x86/boot: Reload GDTR after copying to the end of the buffer") introduced a similar GDTR reload in the 64-bit kernel as well. The GDTR is adjusted by (init_size-_end), however this may not be the correct offset to apply if the kernel was loaded at a misaligned address or below LOAD_PHYSICAL_ADDR, as in that case the decompression buffer has an additional offset from the original load address. This should never happen for a conformant bootloader, but we're being paranoid anyway, so just store the new GDT address in there instead of adding any offsets, which is simpler as well. Fixes: ef5a7b5eb13e ("efi/x86: Remove GDT setup from efi_main") Fixes: 32d009137a56 ("x86/boot: Reload GDTR after copying to the end of the buffer") Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-efi@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20200226230031.3011645-2-nivedita@alum.mit.edu
2020-02-29efi/x86: Add RNG seed EFI table to unencrypted mapping checkTom Lendacky
When booting with SME active, EFI tables must be mapped unencrypted since they were built by UEFI in unencrypted memory. Update the list of tables to be checked during early_memremap() processing to account for the EFI RNG seed table. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-efi@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Hildenbrand <david@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Link: https://lore.kernel.org/r/b64385fc13e5d7ad4b459216524f138e7879234f.1582662842.git.thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20200228121408.9075-3-ardb@kernel.org
2020-02-29efi/x86: Add TPM related EFI tables to unencrypted mapping checksTom Lendacky
When booting with SME active, EFI tables must be mapped unencrypted since they were built by UEFI in unencrypted memory. Update the list of tables to be checked during early_memremap() processing to account for the EFI TPM tables. This fixes a bug where an EFI TPM log table has been created by UEFI, but it lives in memory that has been marked as usable rather than reserved. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-efi@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Hildenbrand <david@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/4144cd813f113c20cdfa511cf59500a64e6015be.1582662842.git.thomas.lendacky@amd.com Link: https://lore.kernel.org/r/20200228121408.9075-2-ardb@kernel.org
2020-02-28kvm: x86: Limit the number of "kvm: disabled by bios" messagesErwan Velu
In older version of systemd(219), at boot time, udevadm is called with : /usr/bin/udevadm trigger --type=devices --action=add" This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent, leading to the "kvm: disabled by bios" message in case of your Bios disabled the virtualization extensions. On a modern system running up to 256 CPU threads, this pollutes the Kernel logs. This patch offers to ratelimit this message to avoid any userspace program triggering this uevent printing this message too often. This patch is only a workaround but greatly reduce the pollution without breaking the current behavior of printing a message if some try to instantiate KVM on a system that doesn't support it. Note that recent versions of systemd (>239) do not have trigger this behavior. This patch will be useful at least for some using older systemd with recent Kernels. Signed-off-by: Erwan Velu <e.velu@criteo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: x86: avoid useless copy of cpufreq policyPaolo Bonzini
struct cpufreq_policy is quite big and it is not a good idea to allocate one on the stack. Just use cpufreq_cpu_get and cpufreq_cpu_put which is even simpler. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: allow disabling -WerrorPaolo Bonzini
Restrict -Werror to well-tested configurations and allow disabling it via Kconfig. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: x86: allow compiling as non-module with W=1Valdis Klētnieks
Compile error with CONFIG_KVM_INTEL=y and W=1: CC arch/x86/kvm/vmx/vmx.o arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=] 68 | static const struct x86_cpu_id vmx_cpu_id[] = { | ^~~~~~~~~~ cc1: all warnings being treated as errors When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a reference to the structure (or any code at all). This makes W=1 compiles unhappy. Wrap both in a #ifdef to avoid the issue. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> [Do the same for CONFIG_KVM_AMD. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipisWanpeng Li
Nick Desaulniers Reported: When building with: $ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000 The following warning is observed: arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=] static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector) ^ Debugging with: https://github.com/ClangBuiltLinux/frame-larger-than via: $ python3 frame_larger_than.py arch/x86/kernel/kvm.o \ kvm_send_ipi_mask_allbutself points to the stack allocated `struct cpumask newmask` in `kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as 8192, making a single instance of a `struct cpumask` 1024 B. This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for both pv tlb and pv ipis.. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: Introduce pv check helpersWanpeng Li
Introduce some pv check helpers for consistency. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28KVM: SVM: allocate AVIC data structures based on kvm_amd module parameterPaolo Bonzini
Even if APICv is disabled at startup, the backing page and ir_list need to be initialized in case they are needed later. The only case in which this can be skipped is for userspace irqchip, and that must be done because avic_init_backing_page dereferences vcpu->arch.apic (which is NULL for userspace irqchip). Tested-by: rmuncrief@humanavance.com Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-27x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changesSean Christopherson
Explicitly set X86_FEATURE_OSPKE via set_cpu_cap() instead of calling get_cpu_cap() to pull the feature bit from CPUID after enabling CR4.PKE. Invoking get_cpu_cap() effectively wipes out any {set,clear}_cpu_cap() changes that were made between this_cpu->c_init() and setup_pku(), as all non-synthetic feature words are reinitialized from the CPU's CPUID values. Blasting away capability updates manifests most visibility when running on a VMX capable CPU, but with VMX disabled by BIOS. To indicate that VMX is disabled, init_ia32_feat_ctl() clears X86_FEATURE_VMX, using clear_cpu_cap() instead of setup_clear_cpu_cap() so that KVM can report which CPU is misconfigured (KVM needs to probe every CPU anyways). Restoring X86_FEATURE_VMX from CPUID causes KVM to think VMX is enabled, ultimately leading to an unexpected #GP when KVM attempts to do VMXON. Arguably, init_ia32_feat_ctl() should use setup_clear_cpu_cap() and let KVM figure out a different way to report the misconfigured CPU, but VMX is not the only feature bit that is affected, i.e. there is precedent that tweaking feature bits via {set,clear}_cpu_cap() after ->c_init() is expected to work. Most notably, x86_init_rdrand()'s clearing of X86_FEATURE_RDRAND when RDRAND malfunctions is also overwritten. Fixes: 0697694564c8 ("x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU") Reported-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
2020-02-26Revert "KVM: x86: enable -Werror"Christoph Hellwig
This reverts commit ead68df94d248c80fdbae220ae5425eb5af2e753. Using the -Werror flag breaks the build for me due to mostly harmless KASAN or similar warnings: arch/x86/kvm/x86.c: In function ‘kvm_timer_init’: arch/x86/kvm/x86.c:7209:1: error: the frame size of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Feel free to add a CONFIG_WERROR if you care strong enough, but don't break peoples builds for absolutely no good reason. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-26efi/x86: Handle by-ref arguments covering multiple pages in mixed modeArd Biesheuvel
The mixed mode runtime wrappers are fragile when it comes to how the memory referred to by its pointer arguments are laid out in memory, due to the fact that it translates these addresses to physical addresses that the runtime services can dereference when running in 1:1 mode. Since vmalloc'ed pages (including the vmap'ed stack) are not contiguous in the physical address space, this scheme only works if the referenced memory objects do not cross page boundaries. Currently, the mixed mode runtime service wrappers require that all by-ref arguments that live in the vmalloc space have a size that is a power of 2, and are aligned to that same value. While this is a sensible way to construct an object that is guaranteed not to cross a page boundary, it is overly strict when it comes to checking whether a given object violates this requirement, as we can simply take the physical address of the first and the last byte, and verify that they point into the same physical page. When this check fails, we emit a WARN(), but then simply proceed with the call, which could cause data corruption if the next physical page belongs to a mapping that is entirely unrelated. Given that with vmap'ed stacks, this condition is much more likely to trigger, let's relax the condition a bit, but fail the runtime service call if it does trigger. Fixes: f6697df36bdf0bf7 ("x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-efi@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200221084849.26878-4-ardb@kernel.org
2020-02-26efi/x86: Remove support for EFI time and counter services in mixed modeArd Biesheuvel
Mixed mode calls at runtime are rather tricky with vmap'ed stacks, as we can no longer assume that data passed in by the callers of the EFI runtime wrapper routines is contiguous in physical memory. We need to fix this, but before we do, let's drop the implementations of routines that we know are never used on x86, i.e., the RTC related ones. Given that UEFI rev2.8 permits any runtime service to return EFI_UNSUPPORTED at runtime, let's return that instead. As get_next_high_mono_count() is never used at all, even on other architectures, let's make that return EFI_UNSUPPORTED too. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-efi@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200221084849.26878-3-ardb@kernel.org
2020-02-26efi/x86: Align GUIDs to their size in the mixed mode runtime wrapperArd Biesheuvel
Hans reports that his mixed mode systems running v5.6-rc1 kernels hit the WARN_ON() in virt_to_phys_or_null_size(), caused by the fact that efi_guid_t objects on the vmap'ed stack happen to be misaligned with respect to their sizes. As a quick (i.e., backportable) fix, copy GUID pointer arguments to the local stack into a buffer that is naturally aligned to its size, so that it is guaranteed to cover only one physical page. Note that on x86, we cannot rely on the stack pointer being aligned the way the compiler expects, so we need to allocate an 8-byte aligned buffer of sufficient size, and copy the GUID into that buffer at an offset that is aligned to 16 bytes. Fixes: f6697df36bdf0bf7 ("x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y") Reported-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Hans de Goede <hdegoede@redhat.com> Cc: linux-efi@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200221084849.26878-2-ardb@kernel.org
2020-02-26Merge tag 'efi-next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core Pull EFI updates for v5.7 from Ard Biesheuvel: This time, the set of changes for the EFI subsystem is much larger than usual. The main reasons are: - Get things cleaned up before EFI support for RISC-V arrives, which will increase the size of the validation matrix, and therefore the threshold to making drastic changes, - After years of defunct maintainership, the GRUB project has finally started to consider changes from the distros regarding UEFI boot, some of which are highly specific to the way x86 does UEFI secure boot and measured boot, based on knowledge of both shim internals and the layout of bootparams and the x86 setup header. Having this maintenance burden on other architectures (which don't need shim in the first place) is hard to justify, so instead, we are introducing a generic Linux/UEFI boot protocol. Summary of changes: - Boot time GDT handling changes (Arvind) - Simplify handling of EFI properties table on arm64 - Generic EFI stub cleanups, to improve command line handling, file I/O, memory allocation, etc. - Introduce a generic initrd loading method based on calling back into the firmware, instead of relying on the x86 EFI handover protocol or device tree. - Introduce a mixed mode boot method that does not rely on the x86 EFI handover protocol either, and could potentially be adopted by other architectures (if another one ever surfaces where one execution mode is a superset of another) - Clean up the contents of struct efi, and move out everything that doesn't need to be stored there. - Incorporate support for UEFI spec v2.8A changes that permit firmware implementations to return EFI_UNSUPPORTED from UEFI runtime services at OS runtime, and expose a mask of which ones are supported or unsupported via a configuration table. - Various documentation updates and minor code cleanups (Heinrich) - Partial fix for the lack of by-VA cache maintenance in the decompressor on 32-bit ARM. Note that these patches were deliberately put at the beginning so they can be used as a stable branch that will be shared with a PR containing the complete fix, which I will send to the ARM tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including the fix for CVE-2020-2732 and a few issues found by 'make W=1'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: rstify new ioctls in api.rst KVM: nVMX: Check IO instruction VM-exit conditions KVM: nVMX: Refactor IO bitmap checks into helper function KVM: nVMX: Don't emulate instructions in guest mode KVM: nVMX: Emulate MTF when performing instruction emulation KVM: fix error handling in svm_hardware_setup KVM: SVM: Fix potential memory leak in svm_cpu_init() KVM: apic: avoid calculating pending eoi from an uninitialized val KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI kvm/emulate: fix a -Werror=cast-function-type KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Fix some obsolete comments and grammar error KVM: x86: fix missing prototypes KVM: x86: enable -Werror
2020-02-23efi/libstub: Introduce symbolic constants for the stub major/minor versionArd Biesheuvel
Now that we have added new ways to load the initrd or the mixed mode kernel, we will also need a way to tell the loader about this. Add symbolic constants for the PE/COFF major/minor version numbers (which fortunately have always been 0x0 for all architectures), so that we can bump them later to document the capabilities of the stub. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Use symbolic constants in PE header instead of bare numbersArd Biesheuvel
Replace bare numbers in the PE/COFF header structure with symbolic constants so they become self documenting. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23x86/ima: Use EFI GetVariable only when availableArd Biesheuvel
Replace the EFI runtime services check with one that tells us whether EFI GetVariable() is implemented by the firmware. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Add true mixed mode entry point into .compat sectionArd Biesheuvel
Currently, mixed mode is closely tied to the EFI handover protocol and relies on intimate knowledge of the bootparams structure, setup header etc, all of which are rather byzantine and entirely specific to x86. Even though no other EFI supported architectures are currently known that could support something like mixed mode, it still makes sense to abstract a bit from this, and make it part of a generic Linux on EFI boot protocol. To that end, add a .compat section to the mixed mode binary, and populate it with the PE machine type and entry point address, allowing firmware implementations to match it to their native machine type, and invoke non-native binaries using a secondary entry point. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Implement mixed mode boot without the handover protocolArd Biesheuvel
Add support for booting 64-bit x86 kernels from 32-bit firmware running on 64-bit capable CPUs without requiring the bootloader to implement the EFI handover protocol or allocate the setup block, etc etc, all of which can be done by the stub itself, using code that already exists. Instead, create an ordinary EFI application entrypoint but implemented in 32-bit code [so that it can be invoked by 32-bit firmware], and stash the address of this 32-bit entrypoint in the .compat section where the bootloader can find it. Note that we use the setup block embedded in the binary to go through startup_32(), but it gets reallocated and copied in efi_pe_entry(), using the same code that runs when the x86 kernel is booted in EFI mode from native firmware. This requires the loaded image protocol to be installed on the kernel image's EFI handle, and point to the kernel image itself and not to its loader. This, in turn, requires the bootloader to use the LoadImage() boot service to load the 64-bit image from 32-bit firmware, which is in fact supported by firmware based on EDK2. (Only StartImage() will fail, and instead, the newly added entrypoint needs to be invoked) Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/libstub/x86: Use Exit() boot service to exit the stub on errorsArd Biesheuvel
Currently, we either return with an error [from efi_pe_entry()] or enter a deadloop [in efi_main()] if any fatal errors occur during execution of the EFI stub. Let's switch to calling the Exit() EFI boot service instead in both cases, so that we a) can get rid of the deadloop, and simply return to the boot manager if any errors occur during execution of the stub, including during the call to ExitBootServices(), b) can also return cleanly from efi_pe_entry() or efi_main() in mixed mode, once we introduce support for LoadImage/StartImage based mixed mode in the next patch. Note that on systems running downstream GRUBs [which do not use LoadImage or StartImage to boot the kernel, and instead, pass their own image handle as the loaded image handle], calling Exit() will exit from GRUB rather than from the kernel, but this is a tolerable side effect. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Drop redundant .bss sectionArd Biesheuvel
In commit c7fb93ec51d462ec ("x86/efi: Include a .bss section within the PE/COFF headers") we added a separate .bss section to the PE/COFF header of the compressed kernel describing the static memory footprint of the decompressor, to ensure that it has enough headroom to decompress itself. We can achieve the exact same result by increasing the virtual size of the .text section, without changing the raw size, which, as per the PE/COFF specification, requires the loader to zero initialize the delta. Doing so frees up a slot in the section table, which we will use later to describe the mixed mode entrypoint. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: add headroom to decompressor BSS to account for setup blockArd Biesheuvel
In the bootparams struct, init_size defines the static footprint of the bzImage, counted from the start of the kernel image, i.e., startup_32(). The PE/COFF metadata declares the same size for the entire image, but this time, the image includes the setup block as well, and so the space reserved by UEFI is a bit too small. This usually doesn't matter, since we normally relocate the kernel into a memory allocation of the correct size. But in the unlikely case that the image happens to be loaded at exactly the preferred offset, we skip this relocation, and execute the image in place, stepping on memory beyond the provided allocation, which may be in use for other purposes. Let's fix this by adding the size of the setup block to the image size as declared in the PE/COFF header. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Drop 'systab' member from struct efiArd Biesheuvel
The systab member in struct efi has outlived its usefulness, now that we have better ways to access the only piece of information we are interested in after init, which is the EFI runtime services table address. So instead of instantiating a doctored copy at early boot with lots of mangled values, and switching the pointer when switching into virtual mode, let's grab the values we need directly, and get rid of the systab pointer entirely. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi: Add 'runtime' pointer to struct efiArd Biesheuvel
Instead of going through the EFI system table each time, just copy the runtime services table pointer into struct efi directly. This is the last use of the system table pointer in struct efi, allowing us to drop it in a future patch, along with a fair amount of quirky handling of the translated address. Note that usually, the runtime services pointer changes value during the call to SetVirtualAddressMap(), so grab the updated value as soon as that call returns. (Mixed mode uses a 1:1 mapping, and kexec boot enters with the updated address in the system table, so in those cases, we don't need to do anything here) Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Merge assignments of efi.runtime_versionArd Biesheuvel
efi.runtime_version is always set to the same value on both existing code paths, so just set it earlier from a shared one. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Make fw_vendor, config_table and runtime sysfs nodes x86 specificArd Biesheuvel
There is some code that exposes physical addresses of certain parts of the EFI firmware implementation via sysfs nodes. These nodes are only used on x86, and are of dubious value to begin with, so let's move their handling into the x86 arch code. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Remove runtime table address from kexec EFI setup dataArd Biesheuvel
Since commit 33b85447fa61946b ("efi/x86: Drop two near identical versions of efi_runtime_init()"), we no longer map the EFI runtime services table before calling SetVirtualAddressMap(), which means we don't need the 1:1 mapped physical address of this table, and so there is no point in passing the address via EFI setup data on kexec boot. Note that the kexec tools will still look for this address in sysfs, so we still need to provide it. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi: Clean up config_parse_tables()Ard Biesheuvel
config_parse_tables() is a jumble of pointer arithmetic, due to the fact that on x86, we may be dealing with firmware whose native word size differs from the kernel's. This is not a concern on other architectures, and doesn't quite justify the state of the code, so let's clean it up by adding a non-x86 code path, constifying statically allocated tables and replacing preprocessor conditionals with IS_ENABLED() checks. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi: Make efi_config_init() x86 onlyArd Biesheuvel
The efi_config_init() routine is no longer shared with ia64 so let's move it into the x86 arch code before making further x86 specific changes to it. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi: Merge EFI system table revision and vendor checksArd Biesheuvel
We have three different versions of the code that checks the EFI system table revision and copies the firmware vendor string, and they are mostly equivalent, with the exception of the use of early_memremap_ro vs. __va() and the lowest major revision to warn about. Let's move this into common code and factor out the commonalities. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>