path: root/arch/x86
2019-10-09perf/x86/amd: Change/fix NMI latency mitigation to use a timestampTom Lendacky
It turns out that the NMI latency workaround from commit: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") ends up being too conservative and results in the perf NMI handler claiming NMIs too easily on AMD hardware when the NMI watchdog is active. This has an impact, for example, on the hpwdt (HPE watchdog timer) module. This module can produce an NMI that is used to reset the system. It registers an NMI handler for the NMI_UNKNOWN type and relies on the fact that nothing has claimed an NMI so that its handler will be invoked when the watchdog device produces an NMI. After the referenced commit, the hpwdt module is unable to process its generated NMI if the NMI watchdog is active, because the current NMI latency mitigation results in the NMI being claimed by the perf NMI handler. Update the AMD perf NMI latency mitigation workaround to, instead, use a window of time. Whenever a PMC is handled in the perf NMI handler, set a timestamp which will act as a perf NMI window. Any NMIs arriving within that window will be claimed by perf. Anything outside that window will not be claimed by perf. The value for the NMI window is set to 100 msecs. This is a conservative value that easily covers any NMI latency in the hardware. While this still results in a window in which the hpwdt module will not receive its NMI, the window is now much, much smaller. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
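A minimal sketch of the window logic described above, assuming illustrative names (perf_nmi_window, perf_nmi_tstamp); it is not the literal patch:

    /* Open a ~100 ms window whenever the perf NMI handler services a PMC. */
    static u64 perf_nmi_window;                     /* msecs_to_jiffies(100) at init */
    static DEFINE_PER_CPU(unsigned long, perf_nmi_tstamp);

    /* ...after at least one PMC was handled in the perf NMI handler: */
    this_cpu_write(perf_nmi_tstamp, jiffies + perf_nmi_window);

    /* ...when a subsequent NMI arrives and no PMC appears responsible: */
    if (time_after(jiffies, this_cpu_read(perf_nmi_tstamp)))
            return NMI_DONE;        /* outside the window: leave it for others (e.g. hpwdt) */
    return NMI_HANDLED;             /* inside the window: assume it is a latent perf NMI */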
2019-10-08x86/cpu: Add Comet Lake to the Intel CPU models headerKan Liang
Comet Lake is the new 10th Gen Intel processor. Add two new CPU model numbers to the Intel family list. The CPU model numbers are not published in the SDM yet but they come from an authoritative internal source. [ bp: Touch up commit message. ] Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: ak@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1570549810-25049-2-git-send-email-kan.liang@linux.intel.com
2019-10-08x86/cpu/vmware: Use the full form of INL in VMWARE_PORTSami Tolvanen
LLVM's assembler doesn't accept the short form INL instruction: inl (%%dx) but instead insists on the output register to be explicitly specified: <inline asm>:1:7: error: invalid operand for instruction inl (%dx) ^ LLVM ERROR: Error parsing inline asm Use the full form of the instruction to fix the build. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Hellstrom <thellstrom@vmware.com> Cc: clang-built-linux@googlegroups.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: "VMware, Inc." <pv-drivers@vmware.com> Cc: x86-ml <x86@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/734 Link: https://lkml.kernel.org/r/20191007192129.104336-1-samitolvanen@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
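For illustration, the difference between the two forms in a generic port read (this is not the VMWARE_PORT macro itself; 'port' and 'value' are placeholder variables):

    unsigned int value;

    /* Short form, rejected by LLVM's integrated assembler: */
    asm volatile("inl (%%dx)" : "=a" (value) : "d" (port));

    /* Full form, accepted by both GNU as and LLVM: */
    asm volatile("inl (%%dx), %%eax" : "=a" (value) : "d" (port));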
2019-10-08x86/asm: Fix MWAITX C-state hint valueJanakarajan Natarajan
As per "AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions", MWAITX EAX[7:4]+1 specifies the optional hint of the optimized C-state. For C0 state, EAX[7:4] should be set to 0xf. Currently, a value of 0xf is set for EAX[3:0] instead of EAX[7:4]. Fix this by changing MWAITX_DISABLE_CSTATES from 0xf to 0xf0. This hasn't had any implications so far because setting reserved bits in EAX is simply ignored by the CPU. [ bp: Fixup comment in delay_mwaitx() and massage. ] Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20191007190011.4859-1-Janakarajan.Natarajan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-08x86/cpufeatures: Add feature bit RDPRU on AMDBabu Moger
AMD Zen 2 introduces a new RDPRU instruction which is used to give access to some processor registers that are typically only accessible when the privilege level is zero. ECX is used as the implicit register to specify which register to read. RDPRU places the specified register’s value into EDX:EAX. For example, the RDPRU instruction can be used to read MPERF and APERF at CPL > 0. Add the feature bit so it is visible in /proc/cpuinfo. Details are available in the AMD64 Architecture Programmer’s Manual: https://www.amd.com/system/files/TechDocs/24594.pdf Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lewis <aaronlewis@google.com> Cc: ak@linux.intel.com Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: robert.hu@linux.intel.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191007204839.5727.10803.stgit@localhost.localdomain
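A hedged example of how the instruction can be used once the feature bit is present; the helper name is made up, and the .byte encoding (0f 01 fd) follows the APM for assemblers that do not yet know the mnemonic:

    /* ECX selects the register to read: 0 = MPERF, 1 = APERF. Result lands in EDX:EAX. */
    static __always_inline u64 rdpru(u32 regsel)
    {
            u32 lo, hi;

            asm volatile(".byte 0x0f, 0x01, 0xfd"
                         : "=a" (lo), "=d" (hi)
                         : "c" (regsel));
            return ((u64)hi << 32) | lo;
    }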
2019-10-07x86/xen: Return from panic notifierBoris Ostrovsky
Currently execution of panic() continues until Xen's panic notifier (xen_panic_event()) is called at which point we make a hypercall that never returns. This means that any notifier that is supposed to be called later as well as significant part of panic() code (such as pstore writes from kmsg_dump()) is never executed. There is no reason for xen_panic_event() to be this last point in execution since panic()'s emergency_restart() will call into xen_emergency_restart() from where we can perform our hypercall. Nevertheless, we will provide xen_legacy_crash boot option that will preserve original behavior during crash. This option could be used, for example, if running kernel dumper (which happens after panic notifiers) is undesirable. Reported-by: James Dingwall <james@dingwall.me.uk> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
2019-10-07uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to itLinus Torvalds
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") I made filldir() use unsafe_put_user(), which improves code generation on x86 enormously. But because we didn't have a "unsafe_copy_to_user()", the dirent name copy was also done by hand with unsafe_put_user() in a loop, and it turns out that a lot of other architectures didn't like that, because unlike x86, they have various alignment issues. Most non-x86 architectures trap and fix it up, and some (like xtensa) will just fail unaligned put_user() accesses unconditionally. Which makes that "copy using put_user() in a loop" not work for them at all. I could make that code do explicit alignment etc, but the architectures that don't like unaligned accesses also don't really use the fancy "user_access_begin/end()" model, so they might just use the regular old __copy_to_user() interface. So this commit takes that looping implementation, turns it into the x86 version of "unsafe_copy_to_user()", and makes other architectures implement the unsafe copy version as __copy_to_user() (the same way they do for the other unsafe_xyz() accessor functions). Note that it only does this for the copying _to_ user space, and we still don't have a unsafe version of copy_from_user(). That's partly because we have no current users of it, but also partly because the copy_from_user() case is slightly different and cannot efficiently be implemented in terms of a unsafe_get_user() loop (because gcc can't do asm goto with outputs). It would be trivial to do this using "rep movsb", which would work really nicely on newer x86 cores, but really badly on some older ones. Al Viro is looking at cleaning up all our user copy routines to make this all a non-issue, but for now we have this simple-but-stupid version for x86 that works fine for the dirent name copy case because those names are short strings and we simply don't need anything fancier. Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-and-tested-by: Tony Luck <tony.luck@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
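A simplified, byte-at-a-time sketch of the looping x86 flavour; the real helper copies in larger chunks, and 'label' is the fault target used inside a user_access_begin()/user_access_end() section:

    #define unsafe_copy_to_user_sketch(dst, src, len, label)        \
    do {                                                            \
            char __user *_d = (dst);                                \
            const char *_s = (src);                                 \
            size_t _n = (len);                                      \
                                                                    \
            while (_n--)                                            \
                    unsafe_put_user(*_s++, _d++, label);            \
    } while (0)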
2019-10-07efi/x86: Do not clean dummy variable in kexec pathDave Young
kexec reboot fails randomly in a UEFI-based KVM guest. The firmware just resets while calling efi_delete_dummy_variable(). Unfortunately I don't know how to debug the firmware; it may also be a problem on real hardware, although nobody has reproduced it. The intention of efi_delete_dummy_variable() is to trigger garbage collection when entering virtual mode. But SetVirtualAddressMap can only run once for each physical reboot, thus kexec_enter_virtual_mode() is not necessarily a good place to clean a dummy object. Drop the efi_delete_dummy_variable() call so that kexec reboot can work. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Matthew Garrett <mjg59@google.com> Cc: Ben Dooks <ben.dooks@codethink.co.uk> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Lyude Paul <lyude@redhat.com> Cc: Octavian Purdila <octavian.purdila@intel.com> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott Talbert <swt@techie.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: linux-integrity@vger.kernel.org Link: https://lkml.kernel.org/r/20191002165904.8819-8-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07x86/platform/uv: Account for UV Hubless in is_uvX_hub OpsMike Travis
The is_uvX_hub() functions use the hub_info pointer, which will be NULL when the system is hubless. This change avoids that NULL dereference; it is also a small performance optimization (see the sketch below). Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145840.294981941@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
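A rough illustration of the guard; the real is_uvX_hub() macros compare hub_revision ranges, so the names here are simplified and hypothetical:

    /* Hubless systems have no hub info, so check the pointer before using it. */
    static inline int is_uvx_hub_sketch(void)
    {
            return uv_hub_info && uv_hub_info->hub_revision;
    }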
2019-10-07x86/platform/uv: Check EFI Boot to set reboot typeMike Travis
Change to checking for EFI Boot type from previous check on if this is a KDUMP kernel. This allows for KDUMP kernels that can handle EFI reboots. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145840.215091717@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07x86/platform/uv: Decode UVsystab InfoMike Travis
Decode the hubless UVsystab passed from BIOS to the kernel saving pertinent info in a similar manner that hubbed UVsystabs are decoded. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145840.135325066@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07x86/platform/uv: Add UV Hubbed/Hubless Proc FS FilesMike Travis
Indicate to UV user utilities that UV hubless support is available on this system via the existing /proc interface. The current interface is maintained with the addition of new /proc leaves ("hubbed", "hubless", and "oemid") that indicate the specific type of UV architecture of this system. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145840.055590900@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07x86/platform/uv: Setup UV functions for Hubless UV SystemsMike Travis
Add more support for UV systems that do not contain a UV Hub (AKA "hubless"). This update adds support for additional functions required: Use PCH NMI handler instead of a UV Hub NMI handler. Initialize the UV BIOS callback interface used to support specific UV functions. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145839.975787119@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07x86/platform/uv: Add return code to UV BIOS Init functionMike Travis
Add a return code to the UV BIOS init function that indicates the successful initialization of the kernel/BIOS callback interface. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145839.895739629@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07x86/platform/uv: Return UV Hubless System TypeMike Travis
Return the type of UV hubless system for UV specific code that depends on that. Add a function to convert UV system type to bit pattern needed for is_uv_hubless(). Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145839.814880843@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07x86/platform/uv: Save OEM_ID from ACPI MADT probeMike Travis
Save the OEM_ID and OEM_TABLE_ID passed to the apic driver probe function for later use. Also, convert the char list arg passed from the kernel to a true null-terminated string. Signed-off-by: Mike Travis <mike.travis@hpe.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi.berriche@hpe.com> Cc: Justin Ernst <justin.ernst@hpe.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190910145839.732237241@stormcage.eag.rdlabs.hpecorp.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-05x86/asm: Make boot_gdt_descr localJiri Slaby
As far as I can see, it was never used outside of head_32.S. Not even when added in 2004. So make it local. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191003095238.29831-2-jslaby@suse.cz
2019-10-05x86/asm: Reorder early variablesJiri Slaby
Moving early_recursion_flag (4 bytes) after early_top_pgt (4k) and early_dynamic_pgts (256k) saves 4k which are used for alignment of early_top_pgt after early_recursion_flag. The real improvement is merely on the source code side. Previously it was: * __INITDATA + .balign * early_recursion_flag variable * a ton of CPP MACROS * __INITDATA (again) * early_top_pgt and early_dynamic_pgts variables * .data Now, it is a bit simpler: * a ton of CPP MACROS * __INITDATA + .balign * early_top_pgt and early_dynamic_pgts variables * early_recursion_flag variable * .data On the binary level the change looks like this: Before: (sections) 12 .init.data 00042000 0000000000000000 0000000000000000 00008000 2**12 (symbols) 000000 4 OBJECT GLOBAL DEFAULT 22 early_recursion_flag 001000 4096 OBJECT GLOBAL DEFAULT 22 early_top_pgt 002000 0x40000 OBJECT GLOBAL DEFAULT 22 early_dynamic_pgts After: (sections) 12 .init.data 00041004 0000000000000000 0000000000000000 00008000 2**12 (symbols) 000000 4096 OBJECT GLOBAL DEFAULT 22 early_top_pgt 001000 0x40000 OBJECT GLOBAL DEFAULT 22 early_dynamic_pgts 041000 4 OBJECT GLOBAL DEFAULT 22 early_recursion_flag So the resulting vmlinux is smaller by 4k with my toolchain as many other variables can be placed after early_recursion_flag to fill the rest of the page. Note that this is only .init data, so it is freed right after being booted anyway. Savings on-disk are none -- compression of zeros is easy, so the size of bzImage is the same pre and post the change. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191003095238.29831-1-jslaby@suse.cz
2019-10-04bpf, x86: Small optimization in comparing against imm0Daniel Borkmann
Replace 'cmp reg, 0' with 'test reg, reg' for comparisons against zero. Saves 1 byte of instruction encoding per occurrence. The flag results of test 'reg, reg' are identical to 'cmp reg, 0' in all cases except for AF which we don't use/care about. In terms of macro-fusibility in combination with a subsequent conditional jump instruction, both have the same properties for the jumps used in the JIT translation. For example, same JITed Cilium program can shrink a bit from e.g. 12,455 to 12,317 bytes as tests with 0 are used quite frequently. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com>
2019-10-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "ARM and x86 bugfixes of all kinds. The most visible one is that migrating a nested hypervisor has always been busted on Broadwell and newer processors, and that has finally been fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: x86: omit "impossible" pmu MSRs from MSR list KVM: nVMX: Fix consistency check on injected exception error code KVM: x86: omit absent pmu MSRs from MSR list selftests: kvm: Fix libkvm build error kvm: vmx: Limit guest PMCs to those supported on the host kvm: x86, powerpc: do not allow clearing largepages debugfs entry KVM: selftests: x86: clarify what is reported on KVM_GET_MSRS failure KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF selftests: kvm: add test for dirty logging inside nested guests KVM: x86: fix nested guest live migration with PML KVM: x86: assign two bits to track SPTE kinds KVM: x86: Expose XSAVEERPTR to the guest kvm: x86: Enumerate support for CLZERO instruction kvm: x86: Use AMD CPUID semantics for AMD vCPUs kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH KVM: X86: Fix userspace set invalid CR4 kvm: x86: Fix a spurious -E2BIG in __do_cpuid_func KVM: LAPIC: Loosen filter for adaptive tuning of lapic_timer_advance_ns KVM: arm/arm64: vgic: Use the appropriate TRACE_INCLUDE_PATH arm64: KVM: Kill hyp_alternate_select() ...
2019-10-04Merge tag 'for-linus-5.4-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes and cleanups from Juergen Gross: - a fix in the Xen balloon driver avoiding hitting a BUG_ON() in some cases, plus a follow-on cleanup series for that driver - a patch for introducing non-blocking EFI callbacks in Xen's EFI driver, plus a cleanup patch for Xen EFI handling merging the x86 and ARM arch specific initialization into the Xen EFI driver - a fix of the Xen xenbus driver avoiding a self-deadlock when cleaning up after a user process has died - a fix for Xen on ARM after removal of ZONE_DMA - a cleanup patch for avoiding build warnings for Xen on ARM * tag 'for-linus-5.4-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: fix self-deadlock after killing user process xen/efi: have a common runtime setup function arm: xen: mm: use __GPF_DMA32 for arm64 xen/balloon: Clear PG_offline in balloon_retrieve() xen/balloon: Mark pages PG_offline in balloon_append() xen/balloon: Drop __balloon_append() xen/balloon: Set pages PageOffline() in balloon_add_region() ARM: xen: unexport HYPERVISOR_platform_op function xen/efi: Set nonblocking callbacks
2019-10-04KVM: x86: omit "impossible" pmu MSRs from MSR listPaolo Bonzini
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since some machines actually have MSRs past the reserved range, filtering them against x86_pmu.num_counters_gp may have false positives. Cut the list to 18 entries to avoid this. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jim Mattson <jmattson@google.com> Fixes: e2ada66ec418 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-03KVM: nVMX: Fix consistency check on injected exception error codeSean Christopherson
Current versions of Intel's SDM incorrectly state that "bits 31:15 of the VM-Entry exception error-code field" must be zero. In reality, bits 31:16 must be zero, i.e. error codes are 16-bit values. The bogus error code check manifests as an unexpected VM-Entry failure due to an invalid code field (error number 7) in L1, e.g. when injecting a #GP with error_code=0x9f00. Nadav previously reported the bug[*], both to KVM and Intel, and fixed the associated kvm-unit-test. [*] https://patchwork.kernel.org/patch/11124749/ Reported-by: Nadav Amit <namit@vmware.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
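Conceptually, the corrected consistency check is the following sketch (not the exact KVM code):

    /* Exception error codes are 16-bit values: bits 31:16 must be zero. */
    if (has_error_code && (error_code >> 16))
            return -EINVAL;    /* the old check also rejected bit 15, e.g. error_code=0x9f00 */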
2019-10-03KVM: x86: omit absent pmu MSRs from MSR listPaolo Bonzini
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since some machines actually have MSRs past the reserved range, these may survive the filtering of msrs_to_save array and would be rejected by KVM_GET/SET_MSR. To avoid this, cut the list to whatever CPUID reports for the host's architectural PMU. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jim Mattson <jmattson@google.com> Fixes: e2ada66ec418 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-03x86/math-emu: Limit MATH_EMULATION to 486SX compatiblesArnd Bergmann
The FPU emulation code is old and fragile in places; try to limit its use to builds for CPUs that actually use it. As far as I can tell, this is only true for i486sx compatibles, including the Cyrix 486SLC, AMD Am486SX and ÉLAN SC410, UMC U5S and DM&P Vortex86SX, all of which were relatively short-lived and got replaced with i486DX compatible processors soon after introduction, though some of the embedded versions remained available much longer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Bill Metzenthen <billm@melbpc.org.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191001142344.1274185-2-arnd@arndb.de
2019-10-03x86/math-emu: Check __copy_from_user() resultArnd Bergmann
The new __must_check annotation on __copy_from_user() successfully identified some code that has lacked the check since at least linux-2.1.73: arch/x86/math-emu/reg_ld_str.c:88:2: error: ignoring return value of \ function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]         __copy_from_user(sti_ptr, s, 10);         ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~ arch/x86/math-emu/reg_ld_str.c:1129:2: error: ignoring return value of \ function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]         __copy_from_user(register_base + offset, s, other);         ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/x86/math-emu/reg_ld_str.c:1131:3: error: ignoring return value of \ function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]                 __copy_from_user(register_base, s + other, offset);                 ^~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In addition, the get_user()/put_user() helpers do not enforce a return value check, but actually still require one. These have been missing for even longer. Change the internal wrappers around get_user()/put_user() to force a signal and add a corresponding wrapper around __copy_from_user() to check all such cases. [ bp: Break long lines. ] Fixes: 257e458057e5 ("Import 2.1.73") Fixes: 9dd819a15162 ("uaccess: add missing __must_check attributes") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Bill Metzenthen <billm@melbpc.org.au> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191001142344.1274185-1-arnd@arndb.de
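A sketch of the shape of the fix; the identifiers follow existing math-emu conventions (FPU_info, math_abort()), but this is an illustration under those assumptions, not the actual patch:

    static inline void copy_from_user_checked(void *dst, const void __user *src,
                                              unsigned long n)
    {
            /* Abort the emulation with SIGSEGV instead of ignoring the failure. */
            if (copy_from_user(dst, src, n))
                    math_abort(FPU_info, SIGSEGV);
    }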
2019-10-02xen/efi: have a common runtime setup functionJuergen Gross
Today the EFI runtime functions are setup in architecture specific code (x86 and arm), with the functions themselves living in drivers/xen as they are not architecture dependent. As the setup is exactly the same for arm and x86 move the setup to drivers/xen, too. This at once removes the need to make the single functions global visible. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> [boris: "Dropped EXPORT_SYMBOL_GPL(xen_efi_runtime_setup)"] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01x86/realmode: Explicitly set entry point via ENTRY in linker scriptNick Desaulniers
Linking with ld.lld via $ make LD=ld.lld produces the warning: ld.lld: warning: cannot find entry symbol _start; defaulting to 0x1000 Linking with ld.bfd shows the default entry is 0x1000: $ readelf -h arch/x86/realmode/rm/realmode.elf | grep Entry Entry point address: 0x1000 While ld.lld is being pedantic, just set the entry point explicitly, instead of depending on the implicit default. The symbol pa_text_start refers to the start of the .text section, which may not be at 0x1000 if the preceding sections listed in arch/x86/realmode/rm/realmode.lds.S were large enough. This matches behavior in arch/x86/boot/setup.ld. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Suggested-by: Borislav Petkov <bp@alien8.de> Suggested-by: Peter Smith <Peter.Smith@arm.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: clang-built-linux@googlegroups.com Cc: grimar@accesssoftek.com Cc: Ingo Molnar <mingo@redhat.com> Cc: maskray@google.com Cc: ruiu@google.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190925180908.54260-1-ndesaulniers@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/216
2019-10-01x86: Use the correct SPDX License Identifier in headersNishad Kamdar
Correct the SPDX License Identifier format in a couple of headers. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/697848ff866ade29e78e872525d7a3067642fd37.1555427420.git.nishadkamdar@gmail.com
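For reference, the documented format differs by file type (see Documentation/process/license-rules.rst); headers use a C-style comment while C sources use the C++-style one:

    /* In header files (.h): */
    /* SPDX-License-Identifier: GPL-2.0 */

    /* In C source files (.c): */
    // SPDX-License-Identifier: GPL-2.0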
2019-10-01x86/rdrand: Sanity-check RDRAND outputBorislav Petkov
It turned out recently that on certain AMD F15h and F16h machines, due to the BIOS dropping the ball after resume, yet again, RDRAND would not function anymore: c49a0a80137c ("x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h") Add a silly test to the CPU bringup path, to sanity-check the random data RDRAND returns and scream as loudly as possible if that returned random data doesn't change. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Pu Wen <puwen@hygon.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/CAHk-=wjWPDauemCmLTKbdMYFB0UveMszZpcrwoUkJRRWKrqaTw@mail.gmail.com
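Roughly, the sanity check amounts to the following sketch, using the existing rdrand_long() helper (the real code samples more than twice and words the warning differently):

    unsigned long prev, cur;

    /* If back-to-back RDRANDs return the same value, assume firmware broke it. */
    if (rdrand_long(&prev) && rdrand_long(&cur) && prev == cur)
            pr_warn("RDRAND returned identical values; possible firmware bug\n");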
2019-10-01x86/microcode/intel: Issue the revision updated message only on the BSPBorislav Petkov
... in order to not pollute dmesg with a line for each updated microcode engine. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jon Grimm <Jon.Grimm@amd.com> Cc: kanth.ghatraju@oracle.com Cc: konrad.wilk@oracle.com Cc: Mihai Carabas <mihai.carabas@oracle.com> Cc: patrick.colp@oracle.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190824085341.GC16813@zn.tnic
2019-10-01x86/microcode: Update late microcode in parallelAshok Raj
Microcode update was changed to be serialized due to restrictions after Spectre days. Updating serially on a large multi-socket system can be painful since it is being done on one CPU at a time. Cloud customers have expressed discontent as services disappear for a prolonged time. The restriction is that only one core (or only one thread of a core in the case of an SMT system) goes through the update while other cores (or respectively, SMT threads) are quiesced. Do the microcode update only on the first thread of each core while other siblings simply wait for this to complete. [ bp: Simplify, massage, cleanup comments. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jon Grimm <Jon.Grimm@amd.com> Cc: kanth.ghatraju@oracle.com Cc: konrad.wilk@oracle.com Cc: patrick.colp@oracle.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1566506627-16536-2-git-send-email-mihai.carabas@oracle.com
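The core-selection logic boils down to something like this sketch; the helper names are illustrative, and the real implementation also rendezvouses all CPUs before and after the update:

    /* Only the first SMT sibling of each core applies the update;
     * the remaining siblings wait for it to finish. */
    if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
            apply_microcode_on_cpu(cpu);            /* illustrative name */
    else
            wait_for_sibling_update(cpu);           /* illustrative name */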
2019-10-01kvm: vmx: Limit guest PMCs to those supported on the hostJim Mattson
KVM can only virtualize as many PMCs as the host supports. Limit the number of generic counters and fixed counters to the number of corresponding counters supported on the host, rather than to INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively. Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since the existing code relies on a contiguous range of MSR indices for event selectors, it can't possibly work for more than 18 general purpose counters. Fixes: f5132b01386b5a ("KVM: Expose a version 2 architectural PMU to a guests") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-01x86/mce: Add Zhaoxin LMCE supportTony W Wang-oc
Newer Zhaoxin CPUs support LMCE compatible with Intel. Add support for that. [ bp: Export functions and massage. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: CooperYan@zhaoxin.com Cc: DavidWang@zhaoxin.com Cc: HerryYang@zhaoxin.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: QiyuanWang@zhaoxin.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1568787573-1297-5-git-send-email-TonyWWang-oc@zhaoxin.com
2019-10-01x86/mce: Add Zhaoxin CMCI supportTony W Wang-oc
All newer Zhaoxin CPUs support CMCI and are compatible with Intel's Machine-Check Architecture. Add that support for Zhaoxin CPUs. [ bp: Massage comments and export intel_init_cmci(). ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: CooperYan@zhaoxin.com Cc: DavidWang@zhaoxin.com Cc: HerryYang@zhaoxin.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: QiyuanWang@zhaoxin.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1568787573-1297-4-git-send-email-TonyWWang-oc@zhaoxin.com
2019-10-01x86/mce: Add Zhaoxin MCE supportTony W Wang-oc
All newer Zhaoxin CPUs are compatible with Intel's Machine-Check Architecture, so add support for them. [ bp: Reflow comment in vendor_disable_error_reporting() and massage commit message. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: CooperYan@zhaoxin.com Cc: DavidWang@zhaoxin.com Cc: HerryYang@zhaoxin.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: QiyuanWang@zhaoxin.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1568787573-1297-2-git-send-email-TonyWWang-oc@zhaoxin.com
2019-10-01x86/insn: Fix awk regexp warningsAlexander Kapshuk
gawk 5.0.1 generates the following regexp warnings: GEN /home/sasha/torvalds/tools/objtool/arch/x86/lib/inat-tables.c awk: ../arch/x86/tools/gen-insn-attr-x86.awk:260: warning: regexp escape sequence `\:' is not a known regexp operator awk: ../arch/x86/tools/gen-insn-attr-x86.awk:350: (FILENAME=../arch/x86/lib/x86-opcode-map.txt FNR=41) warning: regexp escape sequence `\&' is not a known regexp operator Ealier versions of gawk are not known to generate these warnings. The gawk manual referenced below does not list characters ':' and '&' as needing escaping, so 'unescape' them. See https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html for more info. Running diff on the output generated by the script before and after applying the patch reported no differences. [ bp: Massage commit message. ] [ Caught the respective tools header discrepancy. ] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Alexander Kapshuk <alexander.kapshuk@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190924044659.3785-1-alexander.kapshuk@gmail.com
2019-10-01x86/mce/amd: Make disable_err_thresholding() staticBorislav Petkov
No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190928170539.2729-1-bp@alien8.de
2019-10-01x86/microcode/amd: Fix two -Wunused-but-set-variable warningsBorislav Petkov
The dummy variable is the high part of the microcode revision MSR which is defined as reserved. Mark it unused so that W=1 builds don't trigger the above warning. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190928162559.26294-1-bp@alien8.de
2019-10-01arch/x86/boot: Use prefix map to avoid embedded pathsBruce Ashfield
It was observed that the kernel embeds the absolute build path in the x86 boot image when the __FILE__ macro is expanded. > From https://bugzilla.yoctoproject.org/show_bug.cgi?id=13458: If you turn on the buildpaths QA test, or try a reproducible build, you discover that the kernel image contains build paths. $ strings bzImage-5.0.19-yocto-standard |grep tmp/ out of pgt_buf in /data/poky-tmp/reproducible/tmp/work-shared/qemux86-64/kernel-source/arch/x86/boot/compressed/kaslr_64.c!? But what's this in the top-level Makefile: $ git grep prefix-map Makefile:KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) So the __FILE__ shouldn't be using the full path. However arch/x86/boot/compressed/Makefile has this: KBUILD_CFLAGS := -m$(BITS) -O2 So that clears KBUILD_FLAGS, removing the -fmacro-prefix-map option. Use -fmacro-prefix-map to have relative paths in the boot image too. [ bp: Massage commit message and put the KBUILD_CFLAGS addition in ..boot/Makefile after the KBUILD_AFLAGS assignment because gas doesn't support -fmacro-prefix-map. ] Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com> Signed-off-by: Ross Burton <ross.burton@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: George Rimar <grimar@accesssoftek.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190926093226.8568-1-ross.burton@intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=204333
2019-10-01x86/nmi: Remove stale EDAC include leftoverBorislav Petkov
db47d5f85646 ("x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI") forgot to remove it. Drop it. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190923193807.30896-1-bp@alien8.de
2019-10-01xen/efi: Set nonblocking callbacksRoss Lagerwall
Other parts of the kernel expect these nonblocking EFI callbacks to exist and crash when running under Xen. Since the implementations of xen_efi_set_variable() and xen_efi_query_variable_info() do not take any locks, use them for the nonblocking callbacks too. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-09-30kvm: x86, powerpc: do not allow clearing largepages debugfs entryPaolo Bonzini
The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only. Cc: kvm-ppc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "A kexec fix for the case when GCC_PLUGIN_STACKLEAK=y is enabled" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/purgatory: Disable the stackleak GCC plugin for the purgatory
2019-09-28Merge branch 'next-lockdown' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull kernel lockdown mode from James Morris: "This is the latest iteration of the kernel lockdown patchset, from Matthew Garrett, David Howells and others. From the original description: This patchset introduces an optional kernel lockdown feature, intended to strengthen the boundary between UID 0 and the kernel. When enabled, various pieces of kernel functionality are restricted. Applications that rely on low-level access to either hardware or the kernel may cease working as a result - therefore this should not be enabled without appropriate evaluation beforehand. The majority of mainstream distributions have been carrying variants of this patchset for many years now, so there's value in providing a mainline implementation; it doesn't meet every distribution requirement, but gets us much closer to not requiring external patches. There are two major changes since this was last proposed for mainline: - Separating lockdown from EFI secure boot. Background discussion is covered here: https://lwn.net/Articles/751061/ - Implementation as an LSM, with a default stackable lockdown LSM module. This allows the lockdown feature to be policy-driven, rather than encoding an implicit policy within the mechanism. The new locked_down LSM hook is provided to allow LSMs to make a policy decision around whether kernel functionality that would allow tampering with or examining the runtime state of the kernel should be permitted. The included lockdown LSM provides an implementation with a simple policy intended for general purpose use. This policy provides a coarse level of granularity, controllable via the kernel command line: lockdown={integrity|confidentiality} Enable the kernel lockdown feature. If set to integrity, kernel features that allow userland to modify the running kernel are disabled. If set to confidentiality, kernel features that allow userland to extract confidential information from the kernel are also disabled. This may also be controlled via /sys/kernel/security/lockdown and overridden by kernel configuration. New or existing LSMs may implement finer-grained controls of the lockdown features. Refer to the lockdown_reason documentation in include/linux/security.h for details. The lockdown feature has had significant design feedback and review across many subsystems. This code has been in linux-next for some weeks, with a few fixes applied along the way. Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode") is missing a Signed-off-by from its author. Matthew responded that he is providing this under category (c) of the DCO" * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits) kexec: Fix file verification on S390 security: constify some arrays in lockdown LSM lockdown: Print current->comm in restriction messages efi: Restrict efivar_ssdt_load when the kernel is locked down tracefs: Restrict tracefs when the kernel is locked down debugfs: Restrict debugfs when the kernel is locked down kexec: Allow kexec_file() with appropriate IMA policy when locked down lockdown: Lock down perf when in confidentiality mode bpf: Restrict bpf when kernel lockdown is in confidentiality mode lockdown: Lock down tracing and perf kprobes when in confidentiality mode lockdown: Lock down /proc/kcore x86/mmiotrace: Lock down the testmmiotrace module lockdown: Lock down module params that specify hardware parameters (eg. 
ioport) lockdown: Lock down TIOCSSERIAL lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down acpi: Disable ACPI table override if the kernel is locked down acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down ACPI: Limit access to custom_method when the kernel is locked down x86/msr: Restrict MSR access when the kernel is locked down x86: Lock down IO port access when the kernel is locked down ...
2019-09-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: "x86 KVM changes: - The usual accuracy improvements for nested virtualization - The usual round of code cleanups from Sean - Added back optimizations that were prematurely removed in 5.2 (the bare minimum needed to fix the regression was in 5.3-rc8, here comes the rest) - Support for UMWAIT/UMONITOR/TPAUSE - Direct L2->L0 TLB flushing when L0 is Hyper-V and L1 is KVM - Tell Windows guests if SMT is disabled on the host - More accurate detection of vmexit cost - Revert a pvqspinlock pessimization" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (56 commits) KVM: nVMX: cleanup and fix host 64-bit mode checks KVM: vmx: fix build warnings in hv_enable_direct_tlbflush() on i386 KVM: x86: Don't check kvm_rebooting in __kvm_handle_fault_on_reboot() KVM: x86: Drop ____kvm_handle_fault_on_reboot() KVM: VMX: Add error handling to VMREAD helper KVM: VMX: Optimize VMX instruction error and fault handling KVM: x86: Check kvm_rebooting in kvm_spurious_fault() KVM: selftests: fix ucall on x86 Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" kvm: nvmx: limit atomic switch MSRs kvm: svm: Intercept RDPRU kvm: x86: Add "significant index" flag to a few CPUID leaves KVM: x86/mmu: Skip invalid pages during zapping iff root_count is zero KVM: x86/mmu: Explicitly track only a single invalid mmu generation KVM: x86/mmu: Revert "KVM: x86/mmu: Remove is_obsolete() call" KVM: x86/mmu: Revert "Revert "KVM: MMU: reclaim the zapped-obsolete page first"" KVM: x86/mmu: Revert "Revert "KVM: MMU: collapse TLB flushes when zap all pages"" KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch"" KVM: x86/mmu: Revert "Revert "KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages"" KVM: x86/mmu: Revert "Revert "KVM: MMU: show mmu_valid_gen in shadow page related tracepoints"" ...
2019-09-27KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TFWaiman Long
The l1tf_vmx_mitigation is only set to VMENTER_L1D_FLUSH_NOT_REQUIRED when the ARCH_CAPABILITIES MSR indicates that L1D flush is not required. However, if the CPU is not affected by L1TF, l1tf_vmx_mitigation will still be set to VMENTER_L1D_FLUSH_AUTO. This is certainly not the best option for a !X86_BUG_L1TF CPU. So force l1tf_vmx_mitigation to VMENTER_L1D_FLUSH_NOT_REQUIRED to make it more explicit in case users are checking the vmentry_l1d_flush parameter. Signed-off-by: Waiman Long <longman@redhat.com> [Patch rewritten according to Borislav Petkov's suggestion. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-27KVM: x86: fix nested guest live migration with PMLPaolo Bonzini
Shadow paging is fundamentally incompatible with the page-modification log, because the GPAs in the log come from the wrong memory map. In particular, for the EPT page-modification log, the GPAs in the log come from L2 rather than L1. (If there was a non-EPT page-modification log, we couldn't use it for shadow paging because it would log GVAs rather than GPAs). Therefore, we need to rely on write protection to record dirty pages. This has the side effect of bypassing PML, since writes now result in an EPT violation vmexit. This is relatively easy to add to KVM, because pretty much the only place that needs changing is spte_clear_dirty. The first access to the page already goes through the page fault path and records the correct GPA; it's only subsequent accesses that are wrong. Therefore, we can equip set_spte (where the first access happens) to record that the SPTE will have to be write protected, and then spte_clear_dirty will use this information to do the right thing. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-27KVM: x86: assign two bits to track SPTE kindsPaolo Bonzini
Currently, we are overloading SPTE_SPECIAL_MASK to mean both "A/D bits unavailable" and MMIO, where the difference between the two is determined by mio_mask and mmio_value. However, the next patch will need two bits to distinguish availability of A/D bits from write protection. So, while at it give MMIO its own bit pattern, and move the two bits from bit 62 to bits 52..53 since Intel is allocating EPT page table bits from the top. Reviewed-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26mm: treewide: clarify pgtable_page_{ctor,dtor}() namingMark Rutland
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}(). These changes were generated with the following shell script: ---- git grep -lw 'pgtable_page_.tor' | while read FILE; do sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE; sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE; done ---- ... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned. There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>