summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-05-27KVM: x86: simplify is_mmio_sptePaolo Bonzini
We can simply look at bits 52-53 to identify MMIO entries in KVM's page tables. Therefore, there is no need to pass a mask to kvm_mmu_set_mmio_spte_mask. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27KVM: x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionallyMaxim Levitsky
This msr is only available when the host supports WAITPKG feature. This breaks a nested guest, if the L1 hypervisor is set to ignore unknown msrs, because the only other safety check that the kernel does is that it attempts to read the msr and rejects it if it gets an exception. Cc: stable@vger.kernel.org Fixes: 6e3ba4abce ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200523161455.3940-3-mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27KVM: VMX: enable X86_FEATURE_WAITPKG in KVM capabilitiesMaxim Levitsky
Even though we might not allow the guest to use WAITPKG's new instructions, we should tell KVM that the feature is supported by the host CPU. Note that vmx_waitpkg_supported checks that WAITPKG _can_ be set in secondary execution controls as specified by VMX capability MSR, rather that we actually enable it for a guest. Cc: stable@vger.kernel.org Fixes: e69e72faa3a0 ("KVM: x86: Add support for user wait instructions") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200523161455.3940-2-mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27KVM: x86/mmu: Set mmio_value to '0' if reserved #PF can't be generatedSean Christopherson
Set the mmio_value to '0' instead of simply clearing the present bit to squash a benign warning in kvm_mmu_set_mmio_spte_mask() that complains about the mmio_value overlapping the lower GFN mask on systems with 52 bits of PA space. Opportunistically clean up the code and comments. Cc: stable@vger.kernel.org Fixes: d43e2675e96fc ("KVM: x86: only do L1TF workaround on affected processors") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200527084909.23492-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27x86: Hide the archdata.iommu field behind generic IOMMU_APIKrzysztof Kozlowski
There is a generic, kernel wide configuration symbol for enabling the IOMMU specific bits: CONFIG_IOMMU_API. Implementations (including INTEL_IOMMU and AMD_IOMMU driver) select it so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes any compile test builds of other IOMMU drivers, when INTEL_IOMMU or AMD_IOMMU are not selected). For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but neither INTEL_IOMMU nor AMD_IOMMU are not. Reported-by: kbuild test robot <lkp@intel.com> Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27x86/apb_timer: Drop unused declaration and macroJohan Hovold
Drop an extern declaration that has never been used and a no longer needed macro. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200513100944.9171-2-johan@kernel.org
2020-05-27x86/apb_timer: Drop unused TSC calibrationJohan Hovold
Drop the APB-timer TSC calibration, which hasn't been used since the removal of Moorestown support by commit 1a8359e411eb ("x86/mid: Remove Intel Moorestown"). Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200513100944.9171-1-johan@kernel.org
2020-05-26x86/io_apic: Remove unused function mp_init_irq_at_boot()YueHaibing
There are no callers in-tree anymore since ef9e56d894ea ("x86/ioapic: Remove obsolete post hotplug update") so remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200508140808.49428-1-yuehaibing@huawei.com
2020-05-26x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"Andy Lutomirski
Revert 45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long") and add a comment to discourage someone else from making the same mistake again. It turns out that some user code fails to compile if __X32_SYSCALL_BIT is unsigned long. See, for example [1] below. [ bp: Massage and do the same thing in the respective tools/ header. ] Fixes: 45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long") Reported-by: Thorsten Glaser <t.glaser@tarent.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@kernel.org Link: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954294 Link: https://lkml.kernel.org/r/92e55442b744a5951fdc9cfee10badd0a5f7f828.1588983892.git.luto@kernel.org
2020-05-26x86/apic: Make TSC deadline timer detection message visibleBorislav Petkov
The commit c84cb3735fd5 ("x86/apic: Move TSC deadline timer debug printk") removed the message which said that the deadline timer was enabled. It added a pr_debug() message which is issued when deadline timer validation succeeds. Well, issued only when CONFIG_DYNAMIC_DEBUG is enabled - otherwise pr_debug() calls get optimized away if DEBUG is not defined in the compilation unit. Therefore, make the above message pr_info() so that it is visible in dmesg. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200525104218.27018-1-bp@alien8.de
2020-05-25x86/reboot/quirks: Add MacBook6,1 reboot quirkHill Ma
On MacBook6,1 reboot would hang unless parameter reboot=pci is added. Make it automatic. Signed-off-by: Hill Ma <maahiuzeon@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200425200641.GA1554@cslab.localdomain
2020-05-25Merge tag 'efi-changes-for-v5.8' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core More EFI changes for v5.8: - Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently - Simplify and unify initrd loading - Parse the builtin command line on x86 (if provided) - Implement printk() support, including support for wide character strings - Some fixes for issues introduced by the first batch of v5.8 changes - Fix a missing prototypes warning - Simplify GDT handling in early mixed mode thunking code - Some other minor fixes and cleanups Conflicts: drivers/firmware/efi/libstub/efistub.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-25Merge tag 'v5.7-rc7' into efi/core, to refresh the branch and pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24Merge tag 'efi-urgent-2020-05-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Thomas Gleixner: "A set of EFI fixes: - Don't return a garbage screen info when EFI framebuffer is not available - Make the early EFI console work properly with wider fonts instead of drawing garbage - Prevent a memory buffer leak in allocate_e820() - Print the firmware error record properly so it can be decoded by users - Fix a symbol clash in the host tool build which only happens with newer compilers. - Add a missing check for the event log version of TPM which caused boot failures on several Dell systems due to an attempt to decode SHA-1 format with the crypto agile algorithm" * tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tpm: check event log version before reading final events efi: Pull up arch-specific prototype efi_systab_show_arch() x86/boot: Mark global variables as static efi: cper: Add support for printing Firmware Error Record Reference efi/libstub/x86: Avoid EFI map buffer alloc in allocate_e820() efi/earlycon: Fix early printk for wider fonts efi/libstub: Avoid returning uninitialized data from setup_graphics()
2020-05-24Merge tag 'x86-urgent-2020-05-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for x86: - Unbreak stack dumps for inactive tasks by interpreting the special first frame left by __switch_to_asm() correctly. The recent change not to skip the first frame so ORC and frame unwinder behave in the same way caused all entries to be unreliable, i.e. prepended with '?'. - Use cpumask_available() instead of an implicit NULL check of a cpumask_var_t in mmio trace to prevent a Clang build warning" * tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks x86/mmiotrace: Use cpumask_available() for cpumask_var_t variables
2020-05-24efi/x86: Drop the special GDT for the EFI thunkArvind Sankar
Instead of using efi_gdt64 to switch back to 64-bit mode and then switching to the real boot-time GDT, just switch to the boot-time GDT directly. The two GDT's are identical other than efi_gdt64 not including the 32-bit code segment. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200523221513.1642948-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-23x86: bitops: fix build regressionNick Desaulniers
This is easily reproducible via CC=clang + CONFIG_STAGING=y + CONFIG_VT6656=m. It turns out that if your config tickles __builtin_constant_p via differences in choices to inline or not, these statements produce invalid assembly: $ cat foo.c long a(long b, long c) { asm("orb %1, %0" : "+q"(c): "r"(b)); return c; } $ gcc foo.c foo.c: Assembler messages: foo.c:2: Error: `%rax' not allowed with `orb' Use the `%b` "x86 Operand Modifier" to instead force register allocation to select a lower-8-bit GPR operand. The "q" constraint only has meaning on -m32 otherwise is treated as "r". Not all GPRs have low-8-bit aliases for -m32. Fixes: 1651e700664b4 ("x86: Fix bitops.h warning with a moved cast") Reported-by: kernelci.org bot <bot@kernelci.org> Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Suggested-by: Brian Gerst <brgerst@gmail.com> Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> [build, clang-11] Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-By: Brian Gerst <brgerst@gmail.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Marco Elver <elver@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Daniel Axtens <dja@axtens.net> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: http://lkml.kernel.org/r/20200508183230.229464-1-ndesaulniers@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/961 Link: https://lore.kernel.org/lkml/20200504193524.GA221287@google.com/ Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-23x86/apic/uv: Remove code for unused distributed GRU modeSteve Wahl
Distributed GRU mode appeared in only one generation of UV hardware, and no version of the BIOS has shipped with this feature enabled, and we have no plans to ever change that. The gru.s3.mode check has always been and will continue to be false. So remove this dead code. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dimitri Sivanich <sivanich@hpe.com> Link: https://lkml.kernel.org/r/20200513221123.GJ3240@raspberrypi
2020-05-23x86/mm: Stop printing BRK addressesArvind Sankar
This currently leaks kernel physical addresses into userspace. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/20200229231120.1147527-1-nivedita@alum.mit.edu
2020-05-22Merge tag 'efi-fixes-for-v5.7-rc6' of ↵Borislav Petkov
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent Pull EFI fixes from Ard Biesheuvel: "- fix EFI framebuffer earlycon for wide fonts - avoid filling screen_info with garbage if the EFI framebuffer is not available - fix a potential host tool build error due to a symbol clash on x86 - work around a EFI firmware bug regarding the binary format of the TPM final events table - fix a missing memory free by reworking the E820 table sizing routine to not do the allocation in the first place - add CPER parsing for firmware errors"
2020-05-22x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasksJosh Poimboeuf
Normally, show_trace_log_lvl() scans the stack, looking for text addresses to print. In parallel, it unwinds the stack with unwind_next_frame(). If the stack address matches the pointer returned by unwind_get_return_address_ptr() for the current frame, the text address is printed normally without a question mark. Otherwise it's considered a breadcrumb (potentially from a previous call path) and it's printed with a question mark to indicate that the address is unreliable and typically can be ignored. Since the following commit: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") ... for inactive tasks, show_trace_log_lvl() prints *only* unreliable addresses (prepended with '?'). That happens because, for the first frame of an inactive task, unwind_get_return_address_ptr() returns the wrong return address pointer: one word *below* the task stack pointer. show_trace_log_lvl() starts scanning at the stack pointer itself, so it never finds the first 'reliable' address, causing only guesses to being printed. The first frame of an inactive task isn't a normal stack frame. It's actually just an instance of 'struct inactive_task_frame' which is left behind by __switch_to_asm(). Now that this inactive frame is actually exposed to callers, fix unwind_get_return_address_ptr() to interpret it properly. Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
2020-05-22x86/amd_nb: Add AMD family 17h model 60h PCI IDsAlexander Monakov
Add PCI IDs for AMD Renoir (4000-series Ryzen CPUs). This is necessary to enable support for temperature sensors via the k10temp module. Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lkml.kernel.org/r/20200510204842.2603-2-amonakov@ispras.ru
2020-05-22x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinuxFangrui Song
With commit ce5e3f909fc0 ("efi/printf: Add 64-bit and 8-bit integer support") arch/x86/boot/compressed/vmlinux may have an undesired .discard.unreachable section coming from drivers/firmware/efi/libstub/vsprintf.stub.o. That section gets generated from unreachable() annotations when CONFIG_STACK_VALIDATION is enabled. .discard.unreachable contains an R_X86_64_PC32 relocation which will be warned about by LLD: a non-SHF_ALLOC section (.discard.unreachable) is not part of the memory image, thus conceptually the distance between a non-SHF_ALLOC and a SHF_ALLOC is not a constant which can be resolved at link time: % ld.lld -m elf_x86_64 -T arch/x86/boot/compressed/vmlinux.lds ... -o arch/x86/boot/compressed/vmlinux ld.lld: warning: vsprintf.c:(.discard.unreachable+0x0): has non-ABS relocation R_X86_64_PC32 against symbol '' Reuse the DISCARDS macro which includes .discard.* to drop .discard.unreachable. [ bp: Massage and complete the commit message. ] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Arvind Sankar <nivedita@alum.mit.edu> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://lkml.kernel.org/r/20200520182010.242489-1-maskray@google.com
2020-05-21x86/tsc: Add tsc_early_khz command line parameterKrzysztof Piecuch
Changing base clock frequency directly impacts TSC Hz but not CPUID.16h value. An overclocked CPU supporting CPUID.16h and with partial CPUID.15h support will set TSC KHZ according to "best guess" given by CPUID.16h relying on tsc_refine_calibration_work to give better numbers later. tsc_refine_calibration_work will refuse to do its work when the outcome is off the early TSC KHZ value by more than 1% which is certain to happen on an overclocked system. Fix this by adding a tsc_early_khz command line parameter that makes the kernel skip early TSC calibration and use the given value instead. This allows the user to provide the expected TSC frequency that is closer to reality than the one reported by the hardware, enabling tsc_refine_calibration_work to do meaningful error checking. [ tglx: Made the variable __initdata as it's only used on init and removed the error checking in the argument parser because kstrto*() only stores to the variable if the string is valid ] Signed-off-by: Krzysztof Piecuch <piecuch@protonmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/O2CpIOrqLZHgNRkfjRpz_LGqnc1ix_seNIiOCvHY4RHoulOVRo6kMXKuLOfBVTi0SMMevg6Go1uZ_cL9fLYtYdTRNH78ChaFaZyG3VAyYz8=@protonmail.com
2020-05-20efi/libstub: Add definitions for console input and eventsArvind Sankar
Add the required typedefs etc for using con_in's simple text input protocol, and for using the boottime event services. Also add the prototype for the "stall" boot service. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200518190716.751506-19-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-20x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent filesMichael Kelley
In preparation for adding ARM64 support, split hyperv-tlfs.h into architecture dependent and architecture independent files, similar to what has been done with mshyperv.h. Move architecture independent definitions into include/asm-generic/hyperv-tlfs.h. The split will avoid duplicating significant lines of code in the ARM64 version of hyperv-tlfs.h. The split has no functional impact. Some of the common definitions have "X64" in the symbol name. Change these to remove the "X64" in the architecture independent version of hyperv-tlfs.h, but add aliases with the "X64" in the x86 version so that x86 code will continue to compile. A later patch set will change all the references and allow removal of the aliases. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200422195737.10223-4-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-20x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #definesMichael Kelley
The HV_PROCESSOR_POWER_STATE_C<n> #defines date back to year 2010, but they are not in the TLFS v6.0 document and are not used anywhere in Linux. Remove them. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200422195737.10223-3-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-20KVM: x86: hyperv: Remove duplicate definitions of Reference TSC PageMichael Kelley
The Hyper-V Reference TSC Page structure is defined twice. struct ms_hyperv_tsc_page has padding out to a full 4 Kbyte page size. But the padding is not needed because the declaration includes a union with HV_HYP_PAGE_SIZE. KVM uses the second definition, which is struct _HV_REFERENCE_TSC_PAGE, because it does not have the padding. Fix the duplication by removing the padding from ms_hyperv_tsc_page. Fix up the KVM code to use it. Remove the no longer used struct _HV_REFERENCE_TSC_PAGE. There is no functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20200422195737.10223-2-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-20Merge tag 'noinstr-x86-kvm-2020-05-16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD
2020-05-19Merge tag 'hyperv-fixes-signed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fix from Wei Liu: "One patch from Vitaly to fix reenlightenment notifications" * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Properly suspend/resume reenlightenment notifications
2020-05-19perf/x86: Replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200511200911.GA13149@embeddedor
2020-05-19perf/x86/intel: Add more available bits for OFFCORE_RESPONSE of Intel TremontKan Liang
The mask in the extra_regs for Intel Tremont need to be extended to allow more defined bits. "Outstanding Requests" (bit 63) is only available on MSR_OFFCORE_RSP0; Fixes: 6daeb8737f8a ("perf/x86/intel: Add Tremont core PMU support") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200501125442.7030-1-kan.liang@linux.intel.com
2020-05-19perf/x86/rapl: Add Ice Lake RAPL supportKan Liang
Enable RAPL support for Intel Ice Lake X and Ice Lake D. For RAPL support, it is identical to Sky Lake X. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1588857258-38213-1-git-send-email-kan.liang@linux.intel.com
2020-05-19x86/mmiotrace: Use cpumask_available() for cpumask_var_t variablesNathan Chancellor
When building with Clang + -Wtautological-compare and CONFIG_CPUMASK_OFFSTACK unset: arch/x86/mm/mmio-mod.c:375:6: warning: comparison of array 'downed_cpus' equal to a null pointer is always false [-Wtautological-pointer-compare] if (downed_cpus == NULL && ^~~~~~~~~~~ ~~~~ arch/x86/mm/mmio-mod.c:405:6: warning: comparison of array 'downed_cpus' equal to a null pointer is always false [-Wtautological-pointer-compare] if (downed_cpus == NULL || cpumask_weight(downed_cpus) == 0) ^~~~~~~~~~~ ~~~~ 2 warnings generated. Commit f7e30f01a9e2 ("cpumask: Add helper cpumask_available()") added cpumask_available() to fix warnings of this nature. Use that here so that clang does not warn regardless of CONFIG_CPUMASK_OFFSTACK's value. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://github.com/ClangBuiltLinux/linux/issues/982 Link: https://lkml.kernel.org/r/20200408205323.44490-1-natechancellor@gmail.com
2020-05-19x86/audit: Fix a -Wmissing-prototypes warning for ia32_classify_syscall()Benjamin Thiel
Lift the prototype of ia32_classify_syscall() into its own header. Signed-off-by: Benjamin Thiel <b.thiel@posteo.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200516123816.2680-1-b.thiel@posteo.de
2020-05-19x86/kvm: Restrict ASYNC_PF to user spaceThomas Gleixner
The async page fault injection into kernel space creates more problems than it solves. The host has absolutely no knowledge about the state of the guest if the fault happens in CPL0. The only restriction for the host is interrupt disabled state. If interrupts are enabled in the guest then the exception can hit arbitrary code. The HALT based wait in non-preemotible code is a hacky replacement for a proper hypercall. For the ongoing work to restrict instrumentation and make the RCU idle interaction well defined the required extra work for supporting async pagefault in CPL0 is just not justified and creates complexity for a dubious benefit. The CPL3 injection is well defined and does not cause any issues as it is more or less the same as a regular page fault from CPL3. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.369802541@linutronix.de
2020-05-19x86/kvm: Sanitize kvm_async_pf_task_wait()Thomas Gleixner
While working on the entry consolidation I stumbled over the KVM async page fault handler and kvm_async_pf_task_wait() in particular. It took me a while to realize that the randomly sprinkled around rcu_irq_enter()/exit() invocations are just cargo cult programming. Several patches "fixed" RCU splats by curing the symptoms without noticing that the code is flawed from a design perspective. The main problem is that this async injection is not based on a proper handshake mechanism and only respects the minimal requirement, i.e. the guest is not in a state where it has interrupts disabled. Aside of that the actual code is a convoluted one fits it all swiss army knife. It is invoked from different places with different RCU constraints: 1) Host side: vcpu_enter_guest() kvm_x86_ops->handle_exit() kvm_handle_page_fault() kvm_async_pf_task_wait() The invocation happens from fully preemptible context. 2) Guest side: The async page fault interrupted: a) user space b) preemptible kernel code which is not in a RCU read side critical section c) non-preemtible kernel code or a RCU read side critical section or kernel code with CONFIG_PREEMPTION=n which allows not to differentiate between #2b and #2c. RCU is watching for: #1 The vCPU exited and current is definitely not the idle task #2a The #PF entry code on the guest went through enter_from_user_mode() which reactivates RCU #2b There is no preemptible, interrupts enabled code in the kernel which can run with RCU looking away. (The idle task is always non preemptible). I.e. all schedulable states (#1, #2a, #2b) do not need any of this RCU voodoo at all. In #2c RCU is eventually not watching, but as that state cannot schedule anyway there is no point to worry about it so it has to invoke rcu_irq_enter() before running that code. This can be optimized, but this will be done as an extra step in course of the entry code consolidation work. So the proper solution for this is to: - Split kvm_async_pf_task_wait() into schedule and halt based waiting interfaces which share the enqueueing code. - Add comments (condensed form of this changelog) to spare others the time waste and pain of reverse engineering all of this with the help of uncomprehensible changelogs and code history. - Invoke kvm_async_pf_task_wait_schedule() from kvm_handle_page_fault(), user mode and schedulable kernel side async page faults (#1, #2a, #2b) - Invoke kvm_async_pf_task_wait_halt() for the non schedulable kernel case (#2c). For this case also remove the rcu_irq_exit()/enter() pair around the halt as it is just a pointless exercise: - vCPUs can VMEXIT at any random point and can be scheduled out for an arbitrary amount of time by the host and this is not any different except that it voluntary triggers the exit via halt. - The interrupted context could have RCU watching already. So the rcu_irq_exit() before the halt is not gaining anything aside of confusing the reader. Claiming that this might prevent RCU stalls is just an illusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.262701431@linutronix.de
2020-05-19x86/kvm: Handle async page faults directly through do_page_fault()Andy Lutomirski
KVM overloads #PF to indicate two types of not-actually-page-fault events. Right now, the KVM guest code intercepts them by modifying the IDT and hooking the #PF vector. This makes the already fragile fault code even harder to understand, and it also pollutes call traces with async_page_fault and do_async_page_fault for normal page faults. Clean it up by moving the logic into do_page_fault() using a static branch. This gets rid of the platform trap_init override mechanism completely. [ tglx: Fixed up 32bit, removed error code from the async functions and massaged coding style ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.169270470@linutronix.de
2020-05-19x86: Replace ist_enter() with nmi_enter()Peter Zijlstra
A few exceptions (like #DB and #BP) can happen at any location in the code, this then means that tracers should treat events from these exceptions as NMI-like. The interrupted context could be holding locks with interrupts disabled for instance. Similarly, #MC is an actual NMI-like exception. All of them use ist_enter() which only concerns itself with RCU, but does not do any of the other setup that NMIs need. This means things like: printk() raw_spin_lock_irq(&logbuf_lock); <#DB/#BP/#MC> printk() raw_spin_lock_irq(&logbuf_lock); are entirely possible (well, not really since printk tries hard to play nice, but the concept stands). So replace ist_enter() with nmi_enter(). Also observe that any nmi_enter() caller must be both notrace and NOKPROBE, or in the noinstr text section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200505134101.525508608@linutronix.de
2020-05-19x86/mce: Send #MC singal from task workPeter Zijlstra
Convert #MC over to using task_work_add(); it will run the same code slightly later, on the return to user path of the same exception. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200505134100.957390899@linutronix.de
2020-05-19x86/entry: Get rid of ist_begin/end_non_atomic()Thomas Gleixner
This is completely overengineered and definitely not an interface which should be made available to anything else than this particular MCE case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.462640294@linutronix.de
2020-05-19x86/boot: Correct relocation destination on old linkersArvind Sankar
For the 32-bit kernel, as described in 6d92bc9d483a ("x86/build: Build compressed x86 kernels as PIE"), pre-2.26 binutils generates R_386_32 relocations in PIE mode. Since the startup code does not perform relocation, any reloc entry with R_386_32 will remain as 0 in the executing code. Commit 974f221c84b0 ("x86/boot: Move compressed kernel to the end of the decompression buffer") added a new symbol _end but did not mark it hidden, which doesn't give the correct offset on older linkers. This causes the compressed kernel to be copied beyond the end of the decompression buffer, rather than flush against it. This region of memory may be reserved or already allocated for other purposes by the bootloader. Mark _end as hidden to fix. This changes the relocation from R_386_32 to R_386_RELATIVE even on the pre-2.26 binutils. For 64-bit, this is not strictly necessary, as the 64-bit kernel is only built as PIE if the linker supports -z noreloc-overflow, which implies binutils-2.27+, but for consistency, mark _end as hidden here too. The below illustrates the before/after impact of the patch using binutils-2.25 and gcc-4.6.4 (locally compiled from source) and QEMU. Disassembly before patch: 48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax 4e: 2d 00 00 00 00 sub $0x0,%eax 4f: R_386_32 _end Disassembly after patch: 48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax 4e: 2d 00 f0 76 00 sub $0x76f000,%eax 4f: R_386_RELATIVE *ABS* Dump from extract_kernel before patch: early console in extract_kernel input_data: 0x0207c098 <--- this is at output + init_size input_len: 0x0074fef1 output: 0x01000000 output_len: 0x00fa63d0 kernel_total_size: 0x0107c000 needed_size: 0x0107c000 Dump from extract_kernel after patch: early console in extract_kernel input_data: 0x0190d098 <--- this is at output + init_size - _end input_len: 0x0074fef1 output: 0x01000000 output_len: 0x00fa63d0 kernel_total_size: 0x0107c000 needed_size: 0x0107c000 Fixes: 974f221c84b0 ("x86/boot: Move compressed kernel to the end of the decompression buffer") Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200207214926.3564079-1-nivedita@alum.mit.edu
2020-05-19KVM: x86: only do L1TF workaround on affected processorsPaolo Bonzini
KVM stores the gfn in MMIO SPTEs as a caching optimization. These are split in two parts, as in "[high 11111 low]", to thwart any attempt to use these bits in an L1TF attack. This works as long as there are 5 free bits between MAXPHYADDR and bit 50 (inclusive), leaving bit 51 free so that the MMIO access triggers a reserved-bit-set page fault. The bit positions however were computed wrongly for AMD processors that have encryption support. In this case, x86_phys_bits is reduced (for example from 48 to 43, to account for the C bit at position 47 and four bits used internally to store the SEV ASID and other stuff) while x86_cache_bits in would remain set to 48, and _all_ bits between the reduced MAXPHYADDR and bit 51 are set. Then low_phys_bits would also cover some of the bits that are set in the shadow_mmio_value, terribly confusing the gfn caching mechanism. To fix this, avoid splitting gfns as long as the processor does not have the L1TF bug (which includes all AMD processors). When there is no splitting, low_phys_bits can be set to the reduced MAXPHYADDR removing the overlap. This fixes "npt=0" operation on EPYC processors. Thanks to Maxim Levitsky for bisecting this bug. Cc: stable@vger.kernel.org Fixes: 52918ed5fcf0 ("KVM: SVM: Override default MMIO mask if memory encryption is enabled") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-18x86/cpu: Use RDRAND and RDSEED mnemonics in archrandom.hUros Bizjak
Current minimum required version of binutils is 2.23, which supports RDRAND and RDSEED instruction mnemonics. Replace the byte-wise specification of RDRAND and RDSEED with these proper mnemonics. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200508105817.207887-1-ubizjak@gmail.com
2020-05-18kgdb: Delay "kgdbwait" to dbg_late_init() by defaultDouglas Anderson
Using kgdb requires at least some level of architecture-level initialization. If nothing else, it relies on the architecture to pass breakpoints / crashes onto kgdb. On some architectures this all works super early, specifically it starts working at some point in time before Linux parses early_params's. On other architectures it doesn't. A survey of a few platforms: a) x86: Presumably it all works early since "ekgdboc" is documented to work here. b) arm64: Catching crashes works; with a simple patch breakpoints can also be made to work. c) arm: Nothing in kgdb works until paging_init() -> devicemaps_init() -> early_trap_init() Let's be conservative and, by default, process "kgdbwait" (which tells the kernel to drop into the debugger ASAP at boot) a bit later at dbg_late_init() time. If an architecture has tested it and wants to re-enable super early debugging, they can select the ARCH_HAS_EARLY_DEBUG KConfig option. We'll do this for x86 to start. It should be noted that dbg_late_init() is still called quite early in the system. Note that this patch doesn't affect when kgdb runs its init. If kgdb is set to initialize early it will still initialize when parsing early_param's. This patch _only_ inhibits the initial breakpoint from "kgdbwait". This means: * Without any extra patches arm64 platforms will at least catch crashes after kgdb inits. * arm platforms will catch crashes (and could handle a hardcoded kgdb_breakpoint()) any time after early_trap_init() runs, even before dbg_late_init(). Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200507130644.v4.4.I3113aea1b08d8ce36dc3720209392ae8b815201b@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-05-18Merge tag 'v5.7-rc6' into objtool/core, to pick up fixes and resolve ↵Ingo Molnar
semantic conflict Resolve structural conflict between: 59566b0b622e: ("x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up") which introduced a new reference to 'ftrace_epilogue', and: 0298739b7983: ("x86,ftrace: Fix ftrace_regs_caller() unwind") Which renamed it to 'ftrace_caller_end'. Rename the new usage site in the merge commit. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-17Merge tag 'objtool-urgent-2020-05-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 stack unwinding fix from Thomas Gleixner: "A single bugfix for the ORC unwinder to ensure that the error flag which tells the unwinding code whether a stack trace can be trusted or not is always set correctly. This was messed up by a couple of changes in the recent past" * tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fix error handling in __unwind_start()
2020-05-17Merge tag 'x86_urgent_for_v5.7-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: "A single fix for early boot crashes of kernels built with gcc10 and stack protector enabled" * tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix early boot crash on gcc-10, third try
2020-05-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A new testcase for guest debugging (gdbstub) that exposed a bunch of bugs, mostly for AMD processors. And a few other x86 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c KVM: SVM: Disable AVIC before setting V_IRQ KVM: Introduce kvm_make_all_cpus_request_except() KVM: VMX: pass correct DR6 for GD userspace exit KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on KVM: selftests: Add KVM_SET_GUEST_DEBUG test KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG KVM: x86: fix DR6 delivery for various cases of #DB injection KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly