summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2019-08-28x86/intel: Aggregate big core mobile namingPeter Zijlstra
Currently big core mobile chips have either: - _L - _ULT - _MOBILE Make it uniformly: _L. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(MOBILE\|ULT\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(MOBILE\|ULT\)/\1_L/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190827195122.568978530@infradead.org
2019-08-28x86/intel: Aggregate big core client namingPeter Zijlstra
Currently the big core client models either have: - no OPTDIFF - _CORE - _DESKTOP Make it uniformly: 'no OPTDIFF'. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(CORE\|DESKTOP\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_\(CORE\|DESKTOP\)/\1/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190827195122.513945586@infradead.org
2019-08-28x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercallsThomas Hellstrom
Vmware has historically used an INL instruction for this, but recent hardware versions support using VMCALL/VMMCALL instead, so use this method if supported at platform detection time. Explicitly code separate macro versions since the alternatives self-patching has not been performed at platform detection time. Also put tighter constraints on the assembly input parameters. Co-developed-by: Doug Covelli <dcovelli@vmware.com> Signed-off-by: Doug Covelli <dcovelli@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Doug Covelli <dcovelli@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-graphics-maintainer@vmware.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: <pv-drivers@vmware.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190828080353.12658-2-thomas_os@shipmail.org
2019-08-28x86/cpufeature: Explain the macro duplicationCao Jin
Explain the intent behind the duplication of the BUILD_BUG_ON_ZERO(NCAPINTS != n) check in *_MASK_CHECK and its immediate use in the *MASK_BIT_SET macros too. [ bp: Massage. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Cao Jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <namit@vmware.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190828061100.27032-1-caoj.fnst@cn.fujitsu.com
2019-08-27KVM: x86: Don't update RIP or do single-step on faulting emulationSean Christopherson
Don't advance RIP or inject a single-step #DB if emulation signals a fault. This logic applies to all state updates that are conditional on clean retirement of the emulation instruction, e.g. updating RFLAGS was previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update EFLAGS on faulting emulation"). Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with ctxt->_eip until emulation "retires" anyways. Skipping #DB injection fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to invalid state with EFLAGS.TF=1 would loop indefinitely due to emulation overwriting the #UD with #DB and thus restarting the bad SYSCALL over and over. Cc: Nadav Amit <nadav.amit@gmail.com> Cc: stable@vger.kernel.org Reported-by: Andy Lutomirski <luto@kernel.org> Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-08-27KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when ↵Vitaly Kuznetsov
kvm_intel.nested is disabled If kvm_intel is loaded with nested=0 parameter an attempt to perform KVM_GET_SUPPORTED_HV_CPUID results in OOPS as nested_get_evmcs_version hook in kvm_x86_ops is NULL (we assign it in nested_vmx_hardware_setup() and this only happens in case nested is enabled). Check that kvm_x86_ops->nested_get_evmcs_version is not NULL before calling it. With this, we can remove the stub from svm as it is no longer needed. Cc: <stable@vger.kernel.org> Fixes: e2e871ab2f02 ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-08-27x86/boot/compressed/64: Fix missing initialization in ↵Kirill A. Shutemov
find_trampoline_placement() Gustavo noticed that 'new' can be left uninitialized if 'bios_start' happens to be less or equal to 'entry->addr + entry->size'. Initialize the variable at the begin of the iteration to the current value of 'bios_start'. Fixes: 0a46fff2f910 ("x86/boot/compressed/64: Fix boot on machines with broken E820 table") Reported-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190826133326.7cxb4vbmiawffv2r@box
2019-08-26x86/apic: Include the LDR when clearing out APIC registersBandan Das
Although APIC initialization will typically clear out the LDR before setting it, the APIC cleanup code should reset the LDR. This was discovered with a 32-bit KVM guest jumping into a kdump kernel. The stale bits in the LDR triggered a bug in the KVM APIC implementation which caused the destination mapping for VCPUs to be corrupted. Note that this isn't intended to paper over the KVM APIC bug. The kernel has to clear the LDR when resetting the APIC registers except when X2APIC is enabled. This lacks a Fixes tag because missing to clear LDR goes way back into pre git history. [ tglx: Made x2apic_enabled a function call as required ] Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
2019-08-26x86/apic: Do not initialize LDR and DFR for bigsmpBandan Das
Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The bigsmp APIC implementation uses physical destination mode, but it nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with multiple bit being set. This does not cause a functional problem because LDR and DFR are ignored when physical destination mode is active, but it triggered a problem on a 32-bit KVM guest which jumps into a kdump kernel. The multiple bits set unearthed a bug in the KVM APIC implementation. The code which creates the logical destination map for VCPUs ignores the disabled state of the APIC and ends up overwriting an existing valid entry and as a result, APIC calibration hangs in the guest during kdump initialization. Remove the bogus LDR/DFR initialization. This is not intended to work around the KVM APIC bug. The LDR/DFR ininitalization is wrong on its own. The issue goes back into the pre git history. The fixes tag is the commit in the bitkeeper import which introduced bigsmp support in 2003. git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
2019-08-26x86/ftrace: Remove mcount() declarationJisheng Zhang
Commit 562e14f72292 ("ftrace/x86: Remove mcount support") removed the support for mcount, but forgot to remove the mcount() declaration. Clean it up. Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r20190826170150.10f101ba@xhacker.debian
2019-08-26uprobes/x86: Fix detection of 32-bit user modeSebastian Mayr
32-bit processes running on a 64-bit kernel are not always detected correctly, causing the process to crash when uretprobes are installed. The reason for the crash is that in_ia32_syscall() is used to determine the process's mode, which only works correctly when called from a syscall. In the case of uretprobes, however, the function is called from a exception and always returns 'false' on a 64-bit kernel. In consequence this leads to corruption of the process's return address. Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which is correct in any situation. [ tglx: Add a comment and the following historical info ] This should have been detected by the rename which happened in commit abfb9498ee13 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()") which states in the changelog: The is_ia32_task()/is_x32_task() function names are a big misnomer: they suggests that the compat-ness of a system call is a task property, which is not true, the compatness of a system call purely depends on how it was invoked through the system call layer. ..... and then it went and blindly renamed every call site. Sadly enough this was already mentioned here: 8faaed1b9f50 ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()") where the changelog says: TODO: is_ia32_task() is not what we actually want, TS_COMPAT does not necessarily mean 32bit. Fortunately syscall-like insns can't be probed so it actually works, but it would be better to rename and use is_ia32_frame(). and goes all the way back to: 0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions") Oh well. 7+ years until someone actually tried a uretprobe on a 32bit process on a 64bit kernel.... Fixes: 0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions") Signed-off-by: Sebastian Mayr <me@sam.st> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st
2019-08-26x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machinesThomas Gleixner
Rahul Tanwar reported the following bug on DT systems: > 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is > updated to the end of hardware IRQ numbers but this is done only when IOAPIC > configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is > a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration > comes from devicetree. > > See dtb_add_ioapic() in arch/x86/kernel/devicetree.c > > In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base' > remains to zero initialized value. This means that for OF based systems, > virtual IRQ base will get set to zero. Such systems will very likely not even boot. For DT enabled machines ioapic_dynirq_base is irrelevant and not updated, so simply map the IRQ base 1:1 instead. Reported-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Tested-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: alan@linux.intel.com Cc: bp@alien8.de Cc: cheol.yong.kim@intel.com Cc: qi-ming.wu@intel.com Cc: rahul.tanwar@intel.com Cc: rppt@linux.ibm.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/20190821081330.1187-1-rahul.tanwar@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26perf/x86/intel/pt: Get rid of reverse lookup table for ToPAAlexander Shishkin
In order to quickly find a ToPA entry by its page offset in the buffer, we're using a reverse lookup table. The problem with it is that it's a large array of mostly similar pointers, especially so now that we're using high order allocations from the page allocator. Because its size is limited to whatever is the maximum for kmalloc(), it places a limit on the number of ToPA entries per buffer, and therefore, on the total buffer size, which otherwise doesn't have to be there. Replace the reverse lookup table with a simple runtime lookup. With the high order AUX allocations in place, the runtime penalty of such a lookup is much smaller and in cases where all entries in a ToPA table are of the same size, the complexity is O(1). Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-7-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26perf/x86/intel/pt: Free up space in a ToPA descriptorAlexander Shishkin
Currently, we're storing physical address of a ToPA table in its descriptor, which is completely unnecessary. Since the descriptor and the table itself share the same page, reducing the descriptor size leaves more space for the table. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-6-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26perf/x86/intel/pt: Split ToPA metadata and page layoutAlexander Shishkin
PT uses page sized ToPA tables, where the ToPA table resides at the bottom and its driver-specific metadata taking up a few words at the top of the page. The split is currently calculated manually and needs to be redone every time a field is added to or removed from the metadata structure. Also, the 32-bit version can be made smaller. By splitting the table and metadata into separate structures, we are making the compiler figure out the division of the page. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-5-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculationAlexander Shishkin
Currently, pt_buffer_reset_offsets() calculates the current ToPA entry by casting pointers to addresses and performing ungainly subtractions and divisions instead of a simpler pointer arithmetic, which would be perfectly applicable in that case. Fix that. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26perf/x86/intel/pt: Use helpers to obtain ToPA entry sizeAlexander Shishkin
There are a few places in the PT driver that need to obtain the size of a ToPA entry, some of them for the current ToPA entry in the buffer. Use helpers for those, to make the lines shorter and more readable. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26perf/x86/intel/pt: Clean up ToPA allocation pathAlexander Shishkin
Some of the allocation parameters are passed as function arguments, while the CPU number for per-cpu allocation is passed via the buffer object. There's no reason for this. Pass the CPU as a function argument instead. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26Merge tag 'v5.3-rc6' into x86/cpu, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-25Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few fixes for x86: - Fix a boot regression caused by the recent bootparam sanitizing change, which escaped the attention of all people who reviewed that code. - Address a boot problem on machines with broken E820 tables caused by an underflow which ended up placing the trampoline start at physical address 0. - Handle machines which do not advertise a legacy timer of any form, but need calibration of the local APIC timer gracefully by making the calibration routine independent from the tick interrupt. Marked for stable as well as there seems to be quite some new laptops rolled out which expose this. - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are affected by broken firmware which does not initialize RDRAND correctly after resume. Add a command line parameter to override this for machine which either do not use suspend/resume or have a fixed BIOS. Unfortunately there is no way to detect this on boot, so the only safe decision is to turn it off by default. - Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which caused fast KVM instruction emulation to break. - Explain the Intel CPU model naming convention so that the repeating discussions come to an end" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 x86/boot: Fix boot regression caused by bootparam sanitizing x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h x86/boot/compressed/64: Fix boot on machines with broken E820 table x86/apic: Handle missing global clockevent gracefully x86/cpu: Explain Intel model naming convention
2019-08-25Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two small fixes for kprobes and perf: - Prevent a deadlock in kprobe_optimizer() causes by reverse lock ordering - Fix a comment typo" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes: Fix potential deadlock in kprobe_optimizer() perf/x86: Fix typo in comment
2019-08-23x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386Sean Christopherson
Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to avoid clobbering flags. KVM's emulator makes indirect calls into a jump table of sorts, where the destination of the CALL_NOSPEC is a small blob of code that performs fast emulation by executing the target instruction with fixed operands. adcb_al_dl: 0x000339f8 <+0>: adc %dl,%al 0x000339fa <+2>: ret A major motiviation for doing fast emulation is to leverage the CPU to handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is both an input and output to the target of CALL_NOSPEC. Clobbering flags results in all sorts of incorrect emulation, e.g. Jcc instructions often take the wrong path. Sans the nops... asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n" 0x0003595a <+58>: mov 0xc0(%ebx),%eax 0x00035960 <+64>: mov 0x60(%ebx),%edx 0x00035963 <+67>: mov 0x90(%ebx),%ecx 0x00035969 <+73>: push %edi 0x0003596a <+74>: popf 0x0003596b <+75>: call *%esi 0x000359a0 <+128>: pushf 0x000359a1 <+129>: pop %edi 0x000359a2 <+130>: mov %eax,0xc0(%ebx) 0x000359b1 <+145>: mov %edx,0x60(%ebx) ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK); 0x000359a8 <+136>: mov -0x10(%ebp),%eax 0x000359ab <+139>: and $0x8d5,%edi 0x000359b4 <+148>: and $0xfffff72a,%eax 0x000359b9 <+153>: or %eax,%edi 0x000359bd <+157>: mov %edi,0x4(%ebx) For the most part this has gone unnoticed as emulation of guest code that can trigger fast emulation is effectively limited to MMIO when running on modern hardware, and MMIO is rarely, if ever, accessed by instructions that affect or consume flags. Breakage is almost instantaneous when running with unrestricted guest disabled, in which case KVM must emulate all instructions when the guest has invalid state, e.g. when the guest is in Big Real Mode during early BIOS. Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support") Fixes: 1a29b5b7f347a ("KVM: x86: Make indirect calls in emulator speculation safe") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.com
2019-08-23clocksource/drivers/hyperv: Enable TSC page clocksource on 32bitVitaly Kuznetsov
There is no particular reason to not enable TSC page clocksource on 32-bit. mul_u64_u64_shr() is available and despite the increased computational complexity (compared to 64bit) TSC page is still a huge win compared to MSR-based clocksource. In-kernel reads: MSR based clocksource: 3361 cycles TSC page clocksource: 49 cycles Reads from userspace (utilizing vDSO in case of TSC page): MSR based clocksource: 5664 cycles TSC page clocksource: 131 cycles Enabling TSC page on 32bits allows to get rid of CONFIG_HYPERV_TSCPAGE as it is now not any different from CONFIG_HYPERV_TIMER. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lkml.kernel.org/r/20190822083630.17059-1-vkuznets@redhat.com
2019-08-23clocksource/drivers/hyperv: Add Hyper-V specific sched clock functionTianyu Lan
Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock on x86. But native_sched_clock() directly uses the raw TSC value, which can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock() to set the sched clock function appropriately. On x86, this sets pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is scaled and adjusted to be continuous. Also move the Hyper-V reference TSC initialization much earlier in the boot process so no discontinuity is observed when pv_ops.time.sched_clock calculates its offset. [ tglx: Folded build fix ] Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lkml.kernel.org/r/20190814123216.32245-3-Tianyu.Lan@microsoft.com
2019-08-23clocksource/drivers/hyperv: Allocate Hyper-V TSC page staticallyTianyu Lan
Prepare to add Hyper-V sched clock callback and move Hyper-V Reference TSC initialization much earlier in the boot process. Earlier initialization is needed so that it happens while the timestamp value is still 0 and no discontinuity in the timestamp will occur when pv_ops.time.sched_clock calculates its offset. The earlier initialization requires that the Hyper-V TSC page be allocated statically instead of with vmalloc(), so fixup the references to the TSC page and the method of getting its physical address. Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lkml.kernel.org/r/20190814123216.32245-2-Tianyu.Lan@microsoft.com
2019-08-23x86/dma: Get rid of iommu_pass_throughJoerg Roedel
This variable has no users anymore. Remove it and tell the IOMMU code via its new functions about requested DMA modes. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-08-22Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - A few more RCU flavor consolidation cleanups. - Miscellaneous fixes. - Updates to RCU's list-traversal macros improving lockdep usability. - Torture-test updates. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - LKMM updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-22KVM: VMX: Fix and tweak the comments for VM-EnterSean Christopherson
Fix an incorrect/stale comment regarding the vmx_vcpu pointer, as guest registers are now loaded using a direct pointer to the start of the register array. Opportunistically add a comment to document why the vmx_vcpu pointer is needed, its consumption via 'call vmx_update_host_rsp' is rather subtle. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: Assert that struct kvm_vcpu is always as offset zeroSean Christopherson
KVM implementations that wrap struct kvm_vcpu with a vendor specific struct, e.g. struct vcpu_vmx, must place the vcpu member at offset 0, otherwise the usercopy region intended to encompass struct kvm_vcpu_arch will instead overlap random chunks of the vendor specific struct. E.g. padding a large number of bytes before struct kvm_vcpu triggers a usercopy warn when running with CONFIG_HARDENED_USERCOPY=y. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: X86: Add pv tlb shootdown tracepointWanpeng Li
Add pv tlb shootdown tracepoint. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86: Unconditionally call x86 ops that are always implementedSean Christopherson
Remove a few stale checks for non-NULL ops now that the ops in question are implemented by both VMX and SVM. Note, this is **not** stable material, the Fixes tags are there purely to show when a particular op was first supported by both VMX and SVM. Fixes: 74f169090b6f ("kvm/svm: Setup MCG_CAP on AMD properly") Fixes: b31c114b82b2 ("KVM: X86: Provide a capability to disable PAUSE intercepts") Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt") Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86/mmu: Consolidate "is MMIO SPTE" codeSean Christopherson
Replace the open-coded "is MMIO SPTE" checks in the MMU warnings related to software-based access/dirty tracking to make the code slightly more self-documenting. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86/mmu: Add explicit access mask for MMIO SPTEsSean Christopherson
When shadow paging is enabled, KVM tracks the allowed access type for MMIO SPTEs so that it can do a permission check on a MMIO GVA cache hit without having to walk the guest's page tables. The tracking is done by retaining the WRITE and USER bits of the access when inserting the MMIO SPTE (read access is implicitly allowed), which allows the MMIO page fault handler to retrieve and cache the WRITE/USER bits from the SPTE. Unfortunately for EPT, the mask used to retain the WRITE/USER bits is hardcoded using the x86 paging versions of the bits. This funkiness happens to work because KVM uses a completely different mask/value for MMIO SPTEs when EPT is enabled, and the EPT mask/value just happens to overlap exactly with the x86 WRITE/USER bits[*]. Explicitly define the access mask for MMIO SPTEs to accurately reflect that EPT does not want to incorporate any access bits into the SPTE, and so that KVM isn't subtly relying on EPT's WX bits always being set in MMIO SPTEs, e.g. attempting to use other bits for experimentation breaks horribly. Note, vcpu_match_mmio_gva() explicits prevents matching GVA==0, and all TDP flows explicit set mmio_gva to 0, i.e. zeroing vcpu->arch.access for EPT has no (known) functional impact. [*] Using WX to generate EPT misconfigurations (equivalent to reserved bit page fault) ensures KVM can employ its MMIO page fault tricks even platforms without reserved address bits. Fixes: ce88decffd17 ("KVM: MMU: mmio page fault support") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86: Rename access permissions cache member in struct kvm_vcpu_archSean Christopherson
Rename "access" to "mmio_access" to match the other MMIO cache members and to make it more obvious that it's tracking the access permissions for the MMIO cache. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()Vitaly Kuznetsov
Just like we do with other intercepts, in vmrun_interception() we should be doing kvm_skip_emulated_instruction() and not just RIP += 3. Also, it is wrong to increment RIP before nested_svm_vmrun() as it can result in kvm_inject_gp(). We can't call kvm_skip_emulated_instruction() after nested_svm_vmrun() so move it inside. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22x86: KVM: svm: eliminate weird goto from vmrun_interception()Vitaly Kuznetsov
Regardless of whether or not nested_svm_vmrun_msrpm() fails, we return 1 from vmrun_interception() so there's no point in doing goto. Also, nested_svm_vmrun_msrpm() call can be made from nested_svm_vmrun() where other nested launch issues are handled. nested_svm_vmrun() returns a bool, however, its result is ignored in vmrun_interception() as we always return '1'. As a preparatory change to putting kvm_skip_emulated_instruction() inside nested_svm_vmrun() make nested_svm_vmrun() return an int (always '1' for now). Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22x86: KVM: svm: remove hardcoded instruction length from interceptsVitaly Kuznetsov
Various intercepts hard-code the respective instruction lengths to optimize skip_emulated_instruction(): when next_rip is pre-set we skip kvm_emulate_instruction(vcpu, EMULTYPE_SKIP). The optimization is, however, incorrect: different (redundant) prefixes could be used to enlarge the instruction. We can't really avoid decoding. svm->next_rip is not used when CPU supports 'nrips' (X86_FEATURE_NRIPS) feature: next RIP is provided in VMCB. The feature is not really new (Opteron G3s had it already) and the change should have zero affect. Remove manual svm->next_rip setting with hard-coded instruction lengths. The only case where we now use svm->next_rip is EXIT_IOIO: the instruction length is provided to us by hardware. Hardcoded RIP advancement remains in vmrun_interception(), this is going to be taken care of separately. Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22x86: KVM: add xsetbv to the emulatorVitaly Kuznetsov
To avoid hardcoding xsetbv length to '3' we need to support decoding it in the emulator. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22x86: KVM: clear interrupt shadow on EMULTYPE_SKIPVitaly Kuznetsov
When doing x86_emulate_instruction(EMULTYPE_SKIP) interrupt shadow has to be cleared if and only if the skipping is successful. There are two immediate issues: - In SVM skip_emulated_instruction() we are not zapping interrupt shadow in case kvm_emulate_instruction(EMULTYPE_SKIP) is used to advance RIP (!nrpip_save). - In VMX handle_ept_misconfig() when running as a nested hypervisor we (static_cpu_has(X86_FEATURE_HYPERVISOR) case) forget to clear interrupt shadow. Note that we intentionally don't handle the case when the skipped instruction is supposed to prolong the interrupt shadow ("MOV/POP SS") as skip-emulation of those instructions should not happen under normal circumstances. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22x86: kvm: svm: propagate errors from skip_emulated_instruction()Vitaly Kuznetsov
On AMD, kvm_x86_ops->skip_emulated_instruction(vcpu) can, in theory, fail: in !nrips case we call kvm_emulate_instruction(EMULTYPE_SKIP). Currently, we only do printk(KERN_DEBUG) when this happens and this is not ideal. Propagate the error up the stack. On VMX, skip_emulated_instruction() doesn't fail, we have two call sites calling it explicitly: handle_exception_nmi() and handle_task_switch(), we can just ignore the result. On SVM, we also have two explicit call sites: svm_queue_exception() and it seems we don't need to do anything there as we check if RIP was advanced or not. In task_switch_interception(), however, we are better off not proceeding to kvm_task_switch() in case skip_emulated_instruction() failed. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22x86: KVM: svm: don't pretend to advance RIP in case wrmsr_interception() ↵Vitaly Kuznetsov
results in #GP svm->next_rip is only used by skip_emulated_instruction() and in case kvm_set_msr() fails we rightfully don't do that. Move svm->next_rip advancement to 'else' branch to avoid creating false impression that it's always advanced (and make it look like rdmsr_interception()). This is a preparatory change to removing hardcoded RIP advancement from instruction intercepts, no functional change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86: Fix x86_decode_insn() return when fetching insn bytes failsSean Christopherson
Jump to the common error handling in x86_decode_insn() if __do_insn_fetch_bytes() fails so that its error code is converted to the appropriate return type. Although the various helpers used by x86_decode_insn() return X86EMUL_* values, x86_decode_insn() itself returns EMULATION_FAILED or EMULATION_OK. This doesn't cause a functional issue as the sole caller, x86_emulate_instruction(), currently only cares about success vs. failure, and success is indicated by '0' for both types (X86EMUL_CONTINUE and EMULATION_OK). Fixes: 285ca9e948fa ("KVM: emulate: speed up do_insn_fetch") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86: use Intel speculation bugs and features as derived in generic x86 codePaolo Bonzini
Similar to AMD bits, set the Intel bits from the vendor-independent feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care about the vendor and they should be set on AMD processors as well. Suggested-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86: always expose VIRT_SSBD to guestsPaolo Bonzini
Even though it is preferrable to use SPEC_CTRL (represented by X86_FEATURE_AMD_SSBD) instead of VIRT_SPEC, VIRT_SPEC is always supported anyway because otherwise it would be impossible to migrate from old to new CPUs. Make this apparent in the result of KVM_GET_SUPPORTED_CPUID as well. However, we need to hide the bit on Intel processors, so move the setting to svm_set_supported_cpuid. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22KVM: x86: fix reporting of AMD speculation bug CPUID leafPaolo Bonzini
The AMD_* bits have to be set from the vendor-independent feature and bug flags, because KVM_GET_SUPPORTED_CPUID does not care about the vendor and they should be set on Intel processors as well. On top of this, SSBD, STIBP and AMD_SSB_NO bit were not set, and VIRT_SSBD does not have to be added manually because it is a cpufeature that comes directly from the host's CPUID bit. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22crypto: sha256 - Make lib/crypto/sha256.c suitable for generic useHans de Goede
Before this commit lib/crypto/sha256.c has only been used in the s390 and x86 purgatory code, make it suitable for generic use: * Export interesting symbols * Add -D__DISABLE_EXPORTS to CFLAGS_sha256.o for purgatory builds to avoid the exports for the purgatory builds * Add to lib/crypto/Makefile and crypto/Kconfig Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-22crypto: sha256 - Move lib/sha256.c to lib/cryptoHans de Goede
Generic crypto implementations belong under lib/crypto not directly in lib, likewise the header should be in include/crypto, not include/linux. Note that the code in lib/crypto/sha256.c is not yet available for generic use after this commit, it is still only used by the s390 and x86 purgatory code. Making it suitable for generic use is done in further patches in this series. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-22crypto: x86/xts - implement support for ciphertext stealingArd Biesheuvel
Align the x86 code with the generic XTS template, which now supports ciphertext stealing as described by the IEEE XTS-AES spec P1619. Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-22crypto: x86/des - switch to library interfaceArd Biesheuvel
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-22crypto: des - split off DES library from generic DES cipher driverArd Biesheuvel
Another one for the cipher museum: split off DES core processing into a separate module so other drivers (mostly for crypto accelerators) can reuse the code without pulling in the generic DES cipher itself. This will also permit the cipher interface to be made private to the crypto API itself once we move the only user in the kernel (CIFS) to this library interface. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>