summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2017-12-17x86/entry/64/paravirt: Use paravirt-safe macro to access eflagsBoris Ostrovsky
Commit 1d3e53e8624a ("x86/entry/64: Refactor IRQ stacks and make them NMI-safe") added DEBUG_ENTRY_ASSERT_IRQS_OFF macro that acceses eflags using 'pushfq' instruction when testing for IF bit. On PV Xen guests looking at IF flag directly will always see it set, resulting in 'ud2'. Introduce SAVE_FLAGS() macro that will use appropriate save_fl pv op when running paravirt. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20171204150604.899457242@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17locking/barriers: Convert users of lockless_dereference() to READ_ONCE()Will Deacon
[ Note, this is a Git cherry-pick of the following commit: 506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()") ... for easier x86 PTI code testing and back-porting. ] READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMDRudolf Marek
[ Note, this is a Git cherry-pick of the following commit: 2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD") ... for easier x86 PTI code testing and back-porting. ] The latest AMD AMD64 Architecture Programmer's Manual adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]). If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES / FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers, thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs. Signed-Off-By: Rudolf Marek <r.marek@assembler.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17x86/cpufeature: Add User-Mode Instruction Prevention definitionsRicardo Neri
[ Note, this is a Git cherry-pick of the following commit: (limited to the cpufeatures.h file) 3522c2a6a4f3 ("x86/cpufeature: Add User-Mode Instruction Prevention definitions") ... for easier x86 PTI code testing and back-porting. ] User-Mode Instruction Prevention is a security feature present in new Intel processors that, when set, prevents the execution of a subset of instructions if such instructions are executed in user mode (CPL > 0). Attempting to execute such instructions causes a general protection exception. The subset of instructions comprises: * SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register This feature is also added to the list of disabled-features to allow a cleaner handling of build-time configuration. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-7-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17Merge commit 'upstream-x86-virt' into WIP.x86/mmIngo Molnar
Merge a minimal set of virt cleanups, for a base for the MM isolation patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17Merge branch 'upstream-acpi-fixes' into WIP.x86/pti.baseIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17Merge branch 'upstream-x86-selftests' into WIP.x86/pti.baseIngo Molnar
Conflicts: arch/x86/kernel/cpu/Makefile Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17Merge commit 'upstream-x86-entry' into WIP.x86/mmIngo Molnar
Pull in a minimal set of v4.15 entry code changes, for a base for the MM isolation patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - fix the s2ram regression related to confusion around segment register restoration, plus related cleanups that make the code more robust - a guess-unwinder Kconfig dependency fix - an isoimage build target fix for certain tool chain combinations - instruction decoder opcode map fixes+updates, and the syncing of the kernel decoder headers to the objtool headers - a kmmio tracing fix - two 5-level paging related fixes - a topology enumeration fix on certain SMP systems" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version x86/decoder: Fix and update the opcodes map x86/power: Make restore_processor_context() sane x86/power/32: Move SYSENTER MSR restoration to fix_processor_context() x86/power/64: Use struct desc_ptr for the IDT in struct saved_context x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y x86/build: Don't verify mtools configuration file for isoimage x86/mm/kmmio: Fix mmiotrace for page unaligned addresses x86/boot/compressed/64: Print error if 5-level paging is not supported x86/boot/compressed/64: Detect and handle 5-level paging at boot-time x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
2017-12-15x86/power: Make restore_processor_context() saneAndy Lutomirski
My previous attempt to fix a couple of bugs in __restore_processor_context(): 5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()") ... introduced yet another bug, breaking suspend-resume. Rather than trying to come up with a minimal fix, let's try to clean it up for real. This patch fixes quite a few things: - The old code saved a nonsensical subset of segment registers. The only registers that need to be saved are those that contain userspace state or those that can't be trivially restored without percpu access working. (On x86_32, we can restore percpu access by writing __KERNEL_PERCPU to %fs. On x86_64, it's easier to save and restore the kernel's GSBASE.) With this patch, we restore hardcoded values to the kernel state where applicable and explicitly restore the user state after fixing all the descriptor tables. - We used to use an unholy mix of inline asm and C helpers for segment register access. Let's get rid of the inline asm. This fixes the reported s2ram hangs and make the code all around more logical. Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reported-by: Pavel Machek <pavel@ucw.cz> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Zhang Rui <rui.zhang@intel.com> Fixes: 5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()") Link: http://lkml.kernel.org/r/398ee68e5c0f766425a7b746becfc810840770ff.1513286253.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-15x86/power/64: Use struct desc_ptr for the IDT in struct saved_contextAndy Lutomirski
x86_64's saved_context nonsensically used separate idt_limit and idt_base fields and then cast &idt_limit to struct desc_ptr *. This was correct (with -fno-strict-aliasing), but it's confusing, served no purpose, and required #ifdeffery. Simplify this by using struct desc_ptr directly. No change in functionality. Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Zhang Rui <rui.zhang@intel.com> Link: http://lkml.kernel.org/r/967909ce38d341b01d45eff53e278e2728a3a93a.1513286253.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-14KVM: x86: Add emulation of MSR_SMI_COUNTLiran Alon
This MSR returns the number of #SMIs that occurred on CPU since boot. It was seen to be used frequently by ESXi guest. Patch adds a new vcpu-arch specific var called smi_count to save the number of #SMIs which occurred on CPU since boot. It is exposed as a read-only MSR to guest (causing #GP on wrmsr) in RDMSR/WRMSR emulation code. MSR_SMI_COUNT is also added to emulated_msrs[] to make sure user-space can save/restore it for migration purposes. Signed-off-by: Liran Alon <liran.alon@oracle.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-14KVM: x86: add support for emulating UMIPPaolo Bonzini
The User-Mode Instruction Prevention feature present in recent Intel processor prevents a group of instructions (sgdt, sidt, sldt, smsw, and str) from being executed with CPL > 0. Otherwise, a general protection fault is issued. UMIP instructions in general are also able to trigger vmexits, so we can actually emulate UMIP on older processors. This commit sets up the infrastructure so that kvm-intel.ko and kvm-amd.ko can set the UMIP feature bit for CPUID even if the feature is not actually available in hardware. Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-14KVM: x86: add support for UMIPPaolo Bonzini
Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and add UMIP support for instructions that are already emulated by KVM. Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-12bpf: add a bpf_override_function helperJosef Bacik
Error injection is sloppy and very ad-hoc. BPF could fill this niche perfectly with it's kprobe functionality. We could make sure errors are only triggered in specific call chains that we care about with very specific situations. Accomplish this with the bpf_override_funciton helper. This will modify the probe'd callers return value to the specified value and set the PC to an override function that simply returns, bypassing the originally probed function. This gives us a nice clean way to implement systematic error injection for all of our code paths. Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12Merge tag 'v4.15-rc3' into perf/core, to refresh the treeIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-11uprobes/x86: Emulate push insns for uprobe on x86Yonghong Song
Uprobe is a tracing mechanism for userspace programs. Typical uprobe will incur overhead of two traps. First trap is caused by replaced trap insn, and the second trap is to execute the original displaced insn in user space. To reduce the overhead, kernel provides hooks for architectures to emulate the original insn and skip the second trap. In x86, emulation is done for certain branch insns. This patch extends the emulation to "push <reg>" insns. These insns are typical in the beginning of the function. For example, bcc in https://github.com/iovisor/bcc repo provides tools to measure funclantency, detect memleak, etc. The tools will place uprobes in the beginning of function and possibly uretprobes at the end of function. This patch is able to reduce the trap overhead for uprobe from 2 to 1. Without this patch, uretprobe will typically incur three traps. With this patch, if the function starts with "push" insn, the number of traps can be reduced from 3 to 2. An experiment was conducted on two local VMs, fedora 26 64-bit VM and 32-bit VM, both 4 processors and 4GB memory, booted with latest tip repo (and this patch). The host is MacBook with intel i7 processor. The test program looks like: #include <stdio.h> #include <stdlib.h> #include <time.h> #include <sys/time.h> static void test() __attribute__((noinline)); void test() {} int main() { struct timeval start, end; gettimeofday(&start, NULL); for (int i = 0; i < 1000000; i++) { test(); } gettimeofday(&end, NULL); printf("%ld\n", ((end.tv_sec * 1000000 + end.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec))); return 0; } The program is compiled without optimization, and the first insn for function "test" is "push %rbp". The host is relatively idle. Before the test run, the uprobe is inserted as below for uprobe: echo 'p <binary>:<test_func_offset>' > /sys/kernel/debug/tracing/uprobe_events echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable and for uretprobe: echo 'r <binary>:<test_func_offset>' > /sys/kernel/debug/tracing/uprobe_events echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable Unit: microsecond(usec) per loop iteration x86_64 W/ this patch W/O this patch uprobe 1.55 3.1 uretprobe 2.0 3.6 x86_32 W/ this patch W/O this patch uprobe 1.41 3.5 uretprobe 1.75 4.0 You can see that this patch significantly reduced the overhead, 50% for uprobe and 44% for uretprobe on x86_64, and even more on x86_32. Signed-off-by: Yonghong Song <yhs@fb.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/20171201001202.3706564-1-yhs@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - A number of issues in the vgic discovered using SMATCH - A bit one-off calculation in out stage base address mask (32-bit and 64-bit) - Fixes to single-step debugging instructions that trap for other reasons such as MMMIO aborts - Printing unavailable hyp mode as error - Potential spinlock deadlock in the vgic - Avoid calling vgic vcpu free more than once - Broken bit calculation for big endian systems s390: - SPDX tags - Fence storage key accesses from problem state - Make sure that irq_state.flags is not used in the future x86: - Intercept port 0x80 accesses to prevent host instability (CVE) - Use userspace FPU context for guest FPU (mainly an optimization that fixes a double use of kernel FPU) - Do not leak one page per module load - Flush APIC page address cache from MMU invalidation notifiers" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) KVM: x86: fix APIC page invalidation KVM: s390: Fix skey emulation permission check KVM: s390: mark irq_state.flags as non-usable KVM: s390: Remove redundant license text KVM: s390: add SPDX identifiers to the remaining files KVM: VMX: fix page leak in hardware_setup() KVM: VMX: remove I/O port 0x80 bypass on Intel hosts x86,kvm: remove KVM emulator get_fpu / put_fpu x86,kvm: move qemu/guest FPU switching out to vcpu_run KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion KVM: arm/arm64: kvm_arch_destroy_vm cleanups KVM: arm/arm64: Fix spinlock acquisition in vgic_set_owner kvm: arm: don't treat unavailable HYP mode as an error KVM: arm/arm64: Avoid attempting to load timer vgic state without a vgic kvm: arm64: handle single-step of hyp emulated mmio instructions kvm: arm64: handle single-step during SError exceptions kvm: arm64: handle single-step of userspace mmio instructions kvm: arm64: handle single-stepping trapped instructions KVM: arm/arm64: debug: Introduce helper for single-step arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one ...
2017-12-08kmemcheck: rip it out for realMichal Hocko
Commit 4675ff05de2d ("kmemcheck: rip it out") has removed the code but for some reason SPDX header stayed in place. This looks like a rebase mistake in the mmotm tree or the merge mistake. Let's drop those leftovers as well. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...
2017-12-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - make CR4 handling irq-safe, which bug vmware guests ran into - don't crash on early IRQs in Xen guests - don't crash secondary CPU bringup if #UD assisted WARN()ings are triggered - make X86_BUG_FXSAVE_LEAK optional on newer AMD CPUs that have the fix - fix AMD Fam17h microcode loading - fix broadcom_postcore_init() if ACPI is disabled - fix resume regression in __restore_processor_context() - fix Sparse warnings - fix a GCC-8 warning * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Change time() prototype to match __vdso_time() x86: Fix Sparse warnings about non-static functions x86/power: Fix some ordering bugs in __restore_processor_context() x86/PCI: Make broadcom_postcore_init() check acpi_disabled x86/microcode/AMD: Add support for fam17h microcode loading x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD x86/idt: Load idt early in start_secondary x86/xen: Support early interrupts in xen pv guests x86/tlb: Disable interrupts when changing CR4 x86/tlb: Refactor CR4 setting and shadow write
2017-12-06Merge branch 'perf/urgent' into perf/core, to pick up fixes and to refresh ↵Ingo Molnar
to v4.15 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06KVM: x86: fix APIC page invalidationRadim Krčmář
Implementation of the unpinned APIC page didn't update the VMCS address cache when invalidation was done through range mmu notifiers. This became a problem when the page notifier was removed. Re-introduce the arch-specific helper and call it from ...range_start. Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Fixes: 38b9917350cb ("kvm: vmx: Implement set_apic_access_page_addr") Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2") Cc: <stable@vger.kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@hotmail.com> Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-06x86/cpufeatures: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMDRudolf Marek
The latest AMD AMD64 Architecture Programmer's Manual adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]). If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES / FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers, thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs. Signed-off-by: Rudolf Marek <r.marek@assembler.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-05x86,kvm: remove KVM emulator get_fpu / put_fpuRik van Riel
Now that get_fpu and put_fpu do nothing, because the scheduler will automatically load and restore the guest FPU context for us while we are in this code (deep inside the vcpu_run main loop), we can get rid of the get_fpu and put_fpu hooks. Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-05x86,kvm: move qemu/guest FPU switching out to vcpu_runRik van Riel
Currently, every time a VCPU is scheduled out, the host kernel will first save the guest FPU/xstate context, then load the qemu userspace FPU context, only to then immediately save the qemu userspace FPU context back to memory. When scheduling in a VCPU, the same extraneous FPU loads and saves are done. This could be avoided by moving from a model where the guest FPU is loaded and stored with preemption disabled, to a model where the qemu userspace FPU is swapped out for the guest FPU context for the duration of the KVM_RUN ioctl. This is done under the VCPU mutex, which is also taken when other tasks inspect the VCPU FPU context, so the code should already be safe for this change. That should come as no surprise, given that s390 already has this optimization. This can fix a bug where KVM calls get_user_pages while owning the FPU, and the file system ends up requesting the FPU again: [258270.527947] __warn+0xcb/0xf0 [258270.527948] warn_slowpath_null+0x1d/0x20 [258270.527951] kernel_fpu_disable+0x3f/0x50 [258270.527953] __kernel_fpu_begin+0x49/0x100 [258270.527955] kernel_fpu_begin+0xe/0x10 [258270.527958] crc32c_pcl_intel_update+0x84/0xb0 [258270.527961] crypto_shash_update+0x3f/0x110 [258270.527968] crc32c+0x63/0x8a [libcrc32c] [258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data] [258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data] [258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data] [258270.527988] submit_io+0x170/0x1b0 [dm_bufio] [258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio] [258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio] [258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio] [258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio] [258270.528002] shrink_slab.part.40+0x1f5/0x420 [258270.528004] shrink_node+0x22c/0x320 [258270.528006] do_try_to_free_pages+0xf5/0x330 [258270.528008] try_to_free_pages+0xe9/0x190 [258270.528009] __alloc_pages_slowpath+0x40f/0xba0 [258270.528011] __alloc_pages_nodemask+0x209/0x260 [258270.528014] alloc_pages_vma+0x1f1/0x250 [258270.528017] do_huge_pmd_anonymous_page+0x123/0x660 [258270.528021] handle_mm_fault+0xfd3/0x1330 [258270.528025] __get_user_pages+0x113/0x640 [258270.528027] get_user_pages+0x4f/0x60 [258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm] [258270.528108] try_async_pf+0x66/0x230 [kvm] [258270.528135] tdp_page_fault+0x130/0x280 [kvm] [258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm] [258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel] [258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel] No performance changes were detected in quick ping-pong tests on my 4 socket system, which is expected since an FPU+xstate load is on the order of 0.1us, while ping-ponging between CPUs is on the order of 20us, and somewhat noisy. Cc: stable@vger.kernel.org Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu, which happened inside from KVM_CREATE_VCPU ioctl. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-05x86/asm: Allow again using asm.h when building for the 'bpf' clang targetArnaldo Carvalho de Melo
Up to f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") we were able to use x86 headers to build to the 'bpf' clang target, as done by the BPF code in tools/perf/. With that commit, we ended up with following failure for 'perf test LLVM', this is because "clang ... -target bpf ..." fails since 4.0 does not have bpf inline asm support and 6.0 does not recognize the register 'esp', fix it by guarding that part with an #ifndef __BPF__, that is defined by clang when building to the "bpf" target. # perf test -v LLVM 37: LLVM search and compile : 37.1: Basic BPF llvm compile : --- start --- test child forked, pid 25526 Kernel build dir is set to /lib/modules/4.14.0+/build set env: KBUILD_DIR=/lib/modules/4.14.0+/build unset env: KBUILD_OPTS include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h set env: NR_CPUS=4 set env: LINUX_VERSION_CODE=0x40e00 set env: CLANG_EXEC=/usr/local/bin/clang set env: CLANG_OPTIONS=-xc set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h set env: WORKING_DIR=/lib/modules/4.14.0+/build set env: CLANG_SOURCE=- llvm compiling command template: echo '/* * bpf-script-example.c * Test basic LLVM building */ #ifndef LINUX_VERSION_CODE # error Need LINUX_VERSION_CODE # error Example: for 4.2 kernel, put 'clang-opt="-DLINUX_VERSION_CODE=0x40200" into llvm section of ~/.perfconfig' #endif #define BPF_ANY 0 #define BPF_MAP_TYPE_ARRAY 2 #define BPF_FUNC_map_lookup_elem 1 #define BPF_FUNC_map_update_elem 2 static void *(*bpf_map_lookup_elem)(void *map, void *key) = (void *) BPF_FUNC_map_lookup_elem; static void *(*bpf_map_update_elem)(void *map, void *key, void *value, int flags) = (void *) BPF_FUNC_map_update_elem; struct bpf_map_def { unsigned int type; unsigned int key_size; unsigned int value_size; unsigned int max_entries; }; #define SEC(NAME) __attribute__((section(NAME), used)) struct bpf_map_def SEC("maps") flip_table = { .type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(int), .value_size = sizeof(int), .max_entries = 1, }; SEC("func=SyS_epoll_wait") int bpf_func__SyS_epoll_wait(void *ctx) { int ind =0; int *flag = bpf_map_lookup_elem(&flip_table, &ind); int new_flag; if (!flag) return 0; /* flip flag and store back */ new_flag = !*flag; bpf_map_update_elem(&flip_table, &ind, &new_flag, BPF_ANY); return new_flag; } char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; ' | $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o - test child finished with 0 ---- end ---- LLVM search and compile subtest 0: Ok 37.2: kbuild searching : --- start --- test child forked, pid 25950 Kernel build dir is set to /lib/modules/4.14.0+/build set env: KBUILD_DIR=/lib/modules/4.14.0+/build unset env: KBUILD_OPTS include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h set env: NR_CPUS=4 set env: LINUX_VERSION_CODE=0x40e00 set env: CLANG_EXEC=/usr/local/bin/clang set env: CLANG_OPTIONS=-xc set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h set env: WORKING_DIR=/lib/modules/4.14.0+/build set env: CLANG_SOURCE=- llvm compiling command template: echo '/* * bpf-script-test-kbuild.c * Test include from kernel header */ #ifndef LINUX_VERSION_CODE # error Need LINUX_VERSION_CODE # error Example: for 4.2 kernel, put 'clang-opt="-DLINUX_VERSION_CODE=0x40200" into llvm section of ~/.perfconfig' #endif #define SEC(NAME) __attribute__((section(NAME), used)) #include <uapi/linux/fs.h> #include <uapi/asm/ptrace.h> SEC("func=vfs_llseek") int bpf_func__vfs_llseek(void *ctx) { return 0; } char _license[] SEC("license") = "GPL"; int _version SEC("version") = LINUX_VERSION_CODE; ' | $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o - In file included from <stdin>:12: In file included from /home/acme/git/linux/arch/x86/include/uapi/asm/ptrace.h:5: In file included from /home/acme/git/linux/include/linux/compiler.h:242: In file included from /home/acme/git/linux/arch/x86/include/asm/barrier.h:5: In file included from /home/acme/git/linux/arch/x86/include/asm/alternative.h:10: /home/acme/git/linux/arch/x86/include/asm/asm.h:145:50: error: unknown register name 'esp' in asm register unsigned long current_stack_pointer asm(_ASM_SP); ^ /home/acme/git/linux/arch/x86/include/asm/asm.h:44:18: note: expanded from macro '_ASM_SP' #define _ASM_SP __ASM_REG(sp) ^ /home/acme/git/linux/arch/x86/include/asm/asm.h:27:32: note: expanded from macro '__ASM_REG' #define __ASM_REG(reg) __ASM_SEL_RAW(e##reg, r##reg) ^ /home/acme/git/linux/arch/x86/include/asm/asm.h:18:29: note: expanded from macro '__ASM_SEL_RAW' # define __ASM_SEL_RAW(a,b) __ASM_FORM_RAW(a) ^ /home/acme/git/linux/arch/x86/include/asm/asm.h:11:32: note: expanded from macro '__ASM_FORM_RAW' # define __ASM_FORM_RAW(x) #x ^ <scratch space>:4:1: note: expanded from here "esp" ^ 1 error generated. ERROR: unable to compile - Hint: Check error message shown above. Hint: You can also pre-compile it into .o using: clang -target bpf -O2 -c - with proper -I and -D options. Failed to compile test case: 'kbuild searching' test child finished with -1 ---- end ---- LLVM search and compile subtest 1: FAILED! Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lkml.kernel.org/r/20171128175948.GL3298@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-05bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program typeHendrik Brueckner
Commit 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") introduced the bpf_perf_event_data structure which exports the pt_regs structure. This is OK for multiple architectures but fail for s390 and arm64 which do not export pt_regs. Programs using them, for example, the bpf selftest fail to compile on these architectures. For s390, exporting the pt_regs is not an option because s390 wants to allow changes to it. For arm64, there is a user_pt_regs structure that covers parts of the pt_regs structure for use by user space. To solve the broken uapi for s390 and arm64, introduce an abstract type for pt_regs and add an asm/bpf_perf_event.h file that concretes the type. An asm-generic header file covers the architectures that export pt_regs today. The arch-specific enablement for s390 and arm64 follows in separate commits. Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Fixes: 0515e5999a466dfe ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-04KVM: SVM: Pin guest memory when SEV is activeBrijesh Singh
The SEV memory encryption engine uses a tweak such that two identical plaintext pages at different location will have different ciphertext. So swapping or moving ciphertext of two pages will not result in plaintext being swapped. Relocating (or migrating) physical backing pages for a SEV guest will require some additional steps. The current SEV key management spec does not provide commands to swap or migrate (move) ciphertext pages. For now, we pin the guest memory registered through KVM_MEMORY_ENCRYPT_REG_REGION ioctl. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
2017-12-04KVM: SVM: Add support for KVM_SEV_LAUNCH_UPDATE_DATA commandBrijesh Singh
The command is used for encrypting the guest memory region using the VM encryption key (VEK) created during KVM_SEV_LAUNCH_START. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Improvements-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04KVM: SVM: Add support for KVM_SEV_LAUNCH_START commandBrijesh Singh
The KVM_SEV_LAUNCH_START command is used to create a memory encryption context within the SEV firmware. In order to do so, the guest owner should provide the guest's policy, its public Diffie-Hellman (PDH) key and session information. The command implements the LAUNCH_START flow defined in SEV spec Section 6.2. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Improvements-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04KVM: SVM: Add KVM_SEV_INIT commandBrijesh Singh
The command initializes the SEV platform context and allocates a new ASID for this guest from the SEV ASID pool. The firmware must be initialized before we issue any guest launch commands to create a new memory encryption context. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctlBrijesh Singh
If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to register/unregister the guest memory regions which may contain the encrypted data (e.g guest RAM, PCI BAR, SMRAM etc). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Improvements-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctlBrijesh Singh
If the hardware supports memory encryption then the KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform specific memory encryption commands. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04kvm: svm: Add SEV feature definitions to KVMTom Lendacky
Define the SEV enable bit for the VMCB control structure. The hypervisor will use this bit to enable SEV in the guest. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04kvm: svm: prepare for new bit definition in nested_ctlTom Lendacky
Currently the nested_ctl variable in the vmcb_control_area structure is used to indicate nested paging support. The nested paging support field is actually defined as bit 0 of the field. In order to support a new feature flag the usage of the nested_ctl and nested paging support must be converted to operate on a single bit. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU featureTom Lendacky
Update the CPU features to include identifying and reporting on the Secure Encrypted Virtualization (SEV) feature. SEV is identified by CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR). Only show the SEV feature as available if reported by CPUID and enabled by BIOS. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04Merge tag 'drm-intel-next-2017-11-17-1' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next More change sets for 4.16: - Many improvements for selftests and other igt tests (Chris) - Forcewake with PUNIT->PMIC bus fixes and robustness (Hans) - Define an engine class for uABI (Tvrtko) - Context switch fixes and improvements (Chris) - GT powersavings and power gating simplification and fixes (Chris) - Other general driver clean-ups (Chris, Lucas, Ville) - Removing old, useless and/or bad workarounds (Chris, Oscar, Radhakrishna) - IPS, pipe config, etc in preparation for another Fast Boot attempt (Maarten) - OA perf fixes and support to Coffee Lake and Cannonlake (Lionel) - Fixes around GPU fault registers (Michel) - GEM Proxy (Tina) - Refactor of Geminilake and Cannonlake plane color handling (James) - Generalize transcoder loop (Mika Kahola) - New HW Workaround for Cannonlake and Geminilake (Rodrigo) - Resume GuC before using GEM (Chris) - Stolen Memory handling improvements (Ville) - Initialize entry in PPAT for older compilers (Chris) - Other fixes and robustness improvements on execbuf (Chris) - Improve logs of GEM_BUG_ON (Mika Kuoppala) - Rework with massive rename of GuC functions and files (Sagar) - Don't sanitize frame start delay if pipe is off (Ville) - Cannonlake clock fixes (Rodrigo) - Cannonlake HDMI 2.0 support (Rodrigo) - Add a GuC doorbells selftest (Michel) - Add might_sleep() check to our wait_for() (Chris) Many GVT changes for 4.16: - CSB HWSP update support (Weinan) - GVT debug helpers, dyndbg and debugfs (Chuanxiao, Shuo) - full virtualized opregion (Xiaolin) - VM health check for sane fallback (Fred) - workload submission code refactor for future enabling (Zhi) - Updated repo URL in MAINTAINERS (Zhenyu) - other many misc fixes * tag 'drm-intel-next-2017-11-17-1' of git://anongit.freedesktop.org/drm/drm-intel: (260 commits) drm/i915: Update DRIVER_DATE to 20171117 drm/i915: Add a policy note for removing workarounds drm/i915/selftests: Report ENOMEM clearly for an allocation failure Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk" drm/i915: Calculate g4x intermediate watermarks correctly drm/i915: Calculate vlv/chv intermediate watermarks correctly, v3. drm/i915: Pass crtc_state to ips toggle functions, v2 drm/i915: Pass idle crtc_state to intel_dp_sink_crc drm/i915: Enable FIFO underrun reporting after initial fastset, v4. drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM drm/i915: Add might_sleep() check to wait_for() drm/i915/selftests: Add a GuC doorbells selftest drm/i915/cnl: Extend HDMI 2.0 support to CNL. drm/i915/cnl: Simplify dco_fraction calculation. drm/i915/cnl: Don't blindly replace qdiv. drm/i915/cnl: Fix wrpll math for higher freqs. drm/i915/cnl: Fix, simplify and unify wrpll variable sizes. drm/i915/cnl: Remove useless conversion. drm/i915/cnl: Remove spurious central_freq. drm/i915/selftests: exercise_ggtt may have nothing to do ...
2017-11-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: - x86 bugfixes: APIC, nested virtualization, IOAPIC - PPC bugfix: HPT guests on a POWER9 radix host * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: Let KVM_SET_SIGNAL_MASK work as advertised KVM: VMX: Fix vmx->nested freeing when no SMI handler KVM: VMX: Fix rflags cache during vCPU reset KVM: X86: Fix softlockup when get the current kvmclock KVM: lapic: Fixup LDR on load in x2apic KVM: lapic: Split out x2apic ldr calculation KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP KVM: x86: Fix CPUID function for word 6 (80000001_ECX) KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2 KVM: x86: ioapic: Preserve read-only values in the redirection table KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq KVM: x86: ioapic: Don't fire level irq when Remote IRR set KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race KVM: x86: inject exceptions produced by x86_decode_insn KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs KVM: x86: fix em_fxstor() sleeping while in atomic KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry ...
2017-11-29mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITEDan Williams
In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes: did you consider using the other paradigm: In arch include files: #define pud_write pud_write static inline int pud_write(pud_t pud) ..... Then in include/asm-generic/pgtable.h: #ifndef pud_write tatic inline int pud_write(pud_t pud) { .... } #endif If you had, then the powerpc code would have worked ... ;-) and many of the other interfaces in include/asm-generic/pgtable.h are protected that way ... Given that some architecture already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE. Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: fix device-dax pud write-faults triggered by get_user_pages()Dan Williams
Currently only get_user_pages_fast() can safely handle the writable gup case due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted() and to date it has been unimplemented, just calls BUG_ON(). kernel BUG at ./include/linux/hugetlb.h:244! [..] RIP: 0010:follow_devmap_pud+0x482/0x490 [..] Call Trace: follow_page_mask+0x28c/0x6e0 __get_user_pages+0xe4/0x6c0 get_user_pages_unlocked+0x130/0x1b0 get_user_pages_fast+0x89/0xb0 iov_iter_get_pages_alloc+0x114/0x4a0 nfs_direct_read_schedule_iovec+0xd2/0x350 ? nfs_start_io_direct+0x63/0x70 nfs_file_direct_read+0x1e0/0x250 nfs_file_read+0x90/0xc0 For now this just implements a simple check for the _PAGE_RW bit similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it. Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s). [dan.j.williams@intel.com: fix powerpc compile error] Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-28x86/xen: Support early interrupts in xen pv guestsJuergen Gross
Add early interrupt handlers activated by idt_setup_early_handler() to the handlers supported by Xen pv guests. This will allow for early WARN() calls not crashing the guest. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Link: https://lkml.kernel.org/r/20171124084221.30172-1-jgross@suse.com
2017-11-27switch wrapper poll.h instances to generic-yAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27platform/x86: intel_pmc_ipc: Add read64 APIChakravarty, Souvik K
Add intel_pmc_gcr_read64() API for reading from 64-bit GCR registers. This API will be called from intel_telemetry. Update description of intel_pmc_gcr_read(). Signed-off-by: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-11-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - topology enumeration fixes - KASAN fix - two entry fixes (not yet the big series related to KASLR) - remove obsolete code - instruction decoder fix - better /dev/mem sanity checks, hopefully working better this time - pkeys fixes - two ACPI fixes - 5-level paging related fixes - UMIP fixes that should make application visible faults more debuggable - boot fix for weird virtualization environment * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/decoder: Add new TEST instruction pattern x86/PCI: Remove unused HyperTransport interrupt support x86/umip: Fix insn_get_code_seg_params()'s return value x86/boot/KASLR: Remove unused variable x86/entry/64: Add missing irqflags tracing to native_load_gs_index() x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing x86/pkeys/selftests: Fix protection keys write() warning x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey' x86/mpx/selftests: Fix up weird arrays x86/pkeys: Update documentation about availability x86/umip: Print a warning into the syslog if UMIP-protected instructions are used x86/smpboot: Fix __max_logical_packages estimate x86/topology: Avoid wasting 128k for package id array perf/x86/intel/uncore: Cache logical pkg id in uncore driver x86/acpi: Reduce code duplication in mp_override_legacy_irq() x86/acpi: Handle SCI interrupts above legacy space gracefully x86/boot: Fix boot failure when SMP MP-table is based at 0 x86/mm: Limit mmap() of /dev/mem to valid physical addresses x86/selftests: Add test for mapping placement for 5-level paging ...
2017-11-25x86/tlb: Disable interrupts when changing CR4Nadav Amit
CR4 modifications are implemented as RMW operations which update a shadow variable and write the result to CR4. The RMW operation is protected by preemption disable, but there is no enforcement or debugging mechanism. CR4 modifications happen also in interrupt context via __native_flush_tlb_global(). This implementation does not affect a interrupted thread context CR4 operation, because the CR4 toggle restores the original content and does not modify the shadow variable. So the current situation seems to be safe, but a recent patch tried to add an actual RMW operation in interrupt context, which will cause subtle corruptions. To prevent that and make the CR4 handling future proof: - Add a lockdep assertion to __cr4_set() which will catch interrupt enabled invocations - Disable interrupts in the cr4 manipulator inlines - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called from __switch_to_xtra() where interrupts are already disabled and performance matters. All other call sites are not performance critical, so the extra overhead of an additional local_irq_save/restore() pair is not a problem. If new call sites care about performance then the necessary _irqsoff() variants can be added. [ tglx: Condensed the patch by moving the irq protection inside the manipulator functions. Updated changelog ] Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Luck <tony.luck@intel.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: nadav.amit@gmail.com Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
2017-11-25x86/tlb: Refactor CR4 setting and shadow writeNadav Amit
Refactor the write to CR4 and its shadow value. This is done in preparation for the addition of an assertion to check that IRQs are disabled during CR4 update. No functional change. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: nadav.amit@gmail.com Cc: Andy Lutomirski <luto@kernel.org> Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
2017-11-23x86/PCI: Remove unused HyperTransport interrupt supportBjorn Helgaas
There are no in-tree callers of ht_create_irq(), the driver interface for HyperTransport interrupts, left. Remove the unused entry point and all the supporting code. See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt support"). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pci@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-23x86/umip: Fix insn_get_code_seg_params()'s return valueBorislav Petkov
In order to save on redundant structs definitions insn_get_code_seg_params() was made to return two 4-bit values in a char but clang complains: arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char' changes value from 132 to -124 [-Wconstant-conversion] return INSN_CODE_SEG_PARAMS(4, 8); ~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~ ./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS' #define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4)) Those two values do get picked apart afterwards the opposite way of how they were ORed so wrt to the LSByte, the return value is the same. But this function returns -EINVAL in the error case, which is an int. So make it return an int which is the native word size anyway and thus fix the clang warning. Reported-by: Kees Cook <keescook@google.com> Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ricardo.neri-calderon@linux.intel.com Link: https://lkml.kernel.org/r/20171123091951.1462-1-bp@alien8.de
2017-11-17Merge tag 'locks-v4.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking update from Jeff Layton: "A couple of fixes for a patch that went into v4.14, and the bug report just came in a few days ago.. It passes my (minimal) testing, and has been in linux-next for a few days now. I also would like to get my address changed in MAINTAINERS to clear that hurdle" * tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall fcntl: don't leak fd reference when fixup_compat_flock fails MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/