summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2016-12-12Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU updates from Ingo Molnar: "The changes in this development cycle were: - AMD CPU topology enhancements that are cleanups on current CPUs but which enable future Fam17 hardware. (Yazen Ghannam) - unify bugs.c and bugs_64.c (Borislav Petkov) - remove the show_msr= boot option (Borislav Petkov) - simplify a boot message (Borislav Petkov)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature x86/cpu: Get rid of the show_msr= boot option x86/cpu: Merge bugs.c and bugs_64.c x86/cpu: Remove the printk format specifier in "CPU0: "
2016-12-12Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Two cleanups in the LDT handling code, by Dan Carpenter and Thomas Gleixner" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Make all size computations unsigned x86/ldt: Make a size argument unsigned
2016-12-12Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "The main changes in this cycle were: - Makefile improvements (Paul Bolle) - KConfig cleanups to better separate 32-bit only, 64-bit only and generic feature enablement sections (Ingo Molnar)" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Remove three unneeded genhdr-y entries x86/build: Don't use $(LINUXINCLUDE) twice x86/kconfig: Sort the 'config X86' selects alphabetically x86/kconfig: Clean up 32-bit compat options x86/kconfig: Clean up IA32_EMULATION select x86/kconfig, x86/pkeys: Move pkeys selects to X86_INTEL_MEMORY_PROTECTION_KEYS x86/kconfig: Move 64-bit only arch Kconfig selects to 'config X86_64' x86/kconfig: Move 32-bit only arch Kconfig selects to 'config X86_32'
2016-12-12Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Misc cleanups/simplifications by Borislav Petkov, Paul Bolle and Wei Yang" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Optimize fixmap page fixup x86/boot: Simplify the GDTR calculation assembly code a bit x86/boot/build: Remove always empty $(USERINCLUDE)
2016-12-12Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this development cycle were: - a large number of call stack dumping/printing improvements: higher robustness, better cross-context dumping, improved output, etc. (Josh Poimboeuf) - vDSO getcpu() performance improvement for future Intel CPUs with the RDPID instruction (Andy Lutomirski) - add two new Intel AVX512 features and the CPUID support infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela, He Chen) - more copy-user unification (Borislav Petkov) - entry code assembly macro simplifications (Alexander Kuleshov) - vDSO C/R support improvements (Dmitry Safonov) - misc fixes and cleanups (Borislav Petkov, Paul Bolle)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) scripts/decode_stacktrace.sh: Fix address line detection on x86 x86/boot/64: Use defines for page size x86/dumpstack: Make stack name tags more comprehensible selftests/x86: Add test_vdso to test getcpu() x86/vdso: Use RDPID in preference to LSL when available x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() x86/cpufeatures: Enable new AVX512 cpu features x86/cpuid: Provide get_scattered_cpuid_leaf() x86/cpuid: Cleanup cpuid_regs definitions x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants x86/unwind: Ensure stack grows down x86/vdso: Set vDSO pointer only after success x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE x86/unwind: Detect bad stack return address x86/dumpstack: Warn on stack recursion x86/unwind: Warn on bad frame pointer x86/decoder: Use stderr if insn sanity test fails x86/decoder: Use stdout if insn decoder test is successful mm/page_alloc: Remove kernel address exposure in free_reserved_area() x86/dumpstack: Remove raw stack dump ...
2016-12-12Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "Misc changes: - optimize (reduce) IRQ handler tracing overhead (Wanpeng Li) - clean up MSR helpers (Borislav Petkov) - fix build warning on some configs (Sebastian Andrzej Siewior)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Cleanup/streamline MSR helpers x86/apic: Prevent tracing on apic_msr_write_eoi() x86/msr: Add wrmsr_notrace() x86/apic: Get rid of "warning: 'acpi_ioapic_lock' defined but not used"
2016-12-12Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Ingo Molnar: "The main changes in this development cycle were: - more AMD northbridge support work, mostly in preparation for Fam17h CPUs (Yazen Ghannam, Borislav Petkov) - cleanups/refactorings and fixes (Borislav Petkov, Tony Luck, Yinghai Lu)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Include the PPIN in MCE records when available x86/mce/AMD: Add system physical address translation for AMD Fam17h x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h x86/amd_nb: Add Fam17h Data Fabric as "Northbridge" x86/amd_nb: Make all exports EXPORT_SYMBOL_GPL x86/amd_nb: Make amd_northbridges internal to amd_nb.c x86/mce/AMD: Reset Threshold Limit after logging error x86/mce/AMD: Fix HWID_MCATYPE calculation by grouping arguments x86/MCE: Correct TSC timestamping of error records x86/RAS: Hide SMCA bank names x86/RAS: Rename smca_bank_names to smca_names x86/RAS: Simplify SMCA HWID descriptor struct x86/RAS: Simplify SMCA bank descriptor struct x86/MCE: Dump MCE to dmesg if no consumers x86/RAS: Add TSC timestamp to the injected MCE x86/MCE: Do not look at panic_on_oops in the severity grading
2016-12-12Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducig a notion of 'better cores', which the scheduler will prefer to schedule single threaded workloads on. (Tim Chen, Srinivas Pandruvada) - enhance the handling of asymmetric capacity CPUs further (Morten Rasmussen) - improve/fix load handling when moving tasks between task groups (Vincent Guittot) - simplify and clean up the cputime code (Stanislaw Gruszka) - improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent Guittot) - make struct kthread kmalloc()ed and related fixes (Oleg Nesterov) - add uaccess atomicity debugging (when using access_ok() in the wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra) - implement various fixes, cleanups and other enhancements (Daniel Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched/core: Use load_avg for selecting idlest group sched/core: Fix find_idlest_group() for fork kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() kthread: Don't use to_live_kthread() in kthread_[un]park() kthread: Don't use to_live_kthread() in kthread_stop() Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function" kthread: Make struct kthread kmalloc'ed x86/uaccess, sched/preempt: Verify access_ok() context sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> cpufreq/intel_pstate: Use CPPC to get max performance acpi/bus: Set _OSC for diverse core support acpi/bus: Enable HWP CPPC objects x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU x86/sysctl: Add sysctl for ITMT scheduling feature x86: Enable Intel Turbo Boost Max Technology 3.0 x86/topology: Define x86's arch_update_cpu_topology sched: Extend scheduler's asym packing sched/fair: Clean up the tunable parameter definitions ...
2016-12-12Merge branches 'acpi-soc', 'acpi-battery', 'acpi-video', 'acpi-cppc' and ↵Rafael J. Wysocki
'acpi-apei' * acpi-soc: ACPI / LPSS: enable hard LLP for DMA ACPI / APD: Add clock frequency for future AMD I2C controller * acpi-battery: ACPI / battery: If _BIX fails, retry with _BIF * acpi-video: ACPI / video: Add force_native quirk for HP Pavilion dv6 ACPI / video: Add force_native quirk for Dell XPS 17 L702X ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h * acpi-cppc: ACPI / CPPC: set an error code on probe error path * acpi-apei: ACPI / APEI / ARM64: APEI initial support for ARM64 ACPI / APEI: Fix NMI notification handling
2016-12-12Merge branches 'pm-sleep' and 'powercap'Rafael J. Wysocki
* pm-sleep: PM / sleep: Print active wakeup sources when blocking on wakeup_count reads x86/suspend: fix false positive KASAN warning on suspend/resume PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag PM / sleep: System sleep state selection interface rework PM / hibernate: Verify the consistent of e820 memory map by md5 digest * powercap: powercap / RAPL: Add Knights Mill CPUID powercap/intel_rapl: fix and tidy up error handling powercap/intel_rapl: Track active CPUs internally powercap/intel_rapl: Cleanup duplicated init code powercap/intel rapl: Convert to hotplug state machine powercap/intel_rapl: Propagate error code when registration fails powercap/intel_rapl: Add missing domain data update on hotplug
2016-12-12Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "This update is pretty big and almost exclusively includes tooling changes, because v4.9's LTS status forced to completion most of the pending kernel side hardware enablement work and because we tried to freeze core perf work a bit to give a time window for the fuzzing efforts. The diff is large mostly due to the JSON hardware event tables added for Intel and Power8 CPUs. This was a popular feature request from people working close to hardware and from the HPC community. Tree size is big because this added the CPU event tables for over a decade of Intel CPUs. Future changes for a CPU vendor alrady support should be much smaller, as events for new models are added. The new events are listed in 'perf list', for the CPU model the tool is running on. If you find an interesting event it can be used as-is: $ perf stat -a -e l2_lines_out.pf_clean sleep 1 Performance counter stats for 'system wide': 7,860,403 l2_lines_out.pf_clean 1.000624918 seconds time elapsed The event lists can be searched the usual 'perf list' fashion for (case insensitive) substrings as well: $ perf list l2_lines_out List of pre-defined events (to be used in -e): cache: l2_lines_out.demand_clean [Clean L2 cache lines evicted by demand] l2_lines_out.demand_dirty [Dirty L2 cache lines evicted by demand] l2_lines_out.dirty_all [Dirty L2 cache lines filling the L2] l2_lines_out.pf_clean [Clean L2 cache lines evicted by L2 prefetch] l2_lines_out.pf_dirty [Dirty L2 cache lines evicted by L2 prefetch] etc. There's a few high level categories as well that can be listed: 'cache', 'floating point', 'frontend', 'memory', 'pipeline', 'virtual memory'. Existing generic events and workflows should work as-is. The only kernel side change is a late breaking fix for an older regression, related to Intel BTS, LBR and PT feature interaction. On the tooling side there are three new tools / major features: - The new 'perf c2c' tool provides means for Shared Data C2C/HITM analysis. This allows you to track down cacheline contention. The tool is based on x86's load latency and precise store facility events provided by Intel CPUs. It was tested by Joe Mario and has proven to be useful, finding some cacheline contentions. Joe also wrote a blog about c2c tool with examples: https://joemario.github.io/blog/2016/09/01/c2c-blog/ excerpt of the content on this site: At a high level, “perf c2c” will show you: * The cachelines where false sharing was detected. * The readers and writers to those cachelines, and the offsets where those accesses occurred. * The pid, tid, instruction addr, function name, binary object name for those readers and writers. * The source file and line number for each reader and writer. * The average load latency for the loads to those cachelines. * Which numa nodes the samples a cacheline came from and which CPUs were involved. Using perf c2c is similar to using the Linux perf tool today. First collect data with “perf c2c record”, then generate a report output with “perf c2c report” There one finds extensive details on using the tool, with tips on reducing the volume of samples while still capturing enough to do its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa) - The new 'perf sched timehist' tool provides tailored analysis of scheduling events. Example usage: perf sched record -- sleep 1 perf sched timehist By default it shows the individual schedule events, including the wait time (time between sched-out and next sched-in events for the task), the task scheduling delay (time between wakeup and actually running) and run time for the task: time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) -------- ------ ---------------- --------- --------- -------- 1.874569 [0011] gcc[31949] 0.014 0.000 1.148 1.874591 [0010] gcc[31951] 0.000 0.000 0.024 1.874603 [0010] migration/10[59] 3.350 0.004 0.011 1.874604 [0011] <idle> 1.148 0.000 0.035 1.874723 [0005] <idle> 0.016 0.000 1.383 1.874746 [0005] gcc[31949] 0.153 0.078 0.022 ... Times are in msec.usec. (David Ahern, Namhyung Kim) - Add CPU vendor hardware event tables: Add JSON files with vendor event naming for Intel and Power8 processors, allowing users of tools like oprofile to keep using the event names they are used to, as well as people reading vendor documentation, where such naming is used. (Andi Kleen, Sukadev Bhattiprolu) You should see all the new events with 'perf list' and you should be able to search them, for example 'perf list miss' will list all the myriads of miss events. Other tooling features added were: - Cross-arch annotation support: o Improve ARM support in the annotation code, affecting 'perf annotate', 'perf report' and live annotation in 'perf top' (Kim Phillips) o Initial support for PowerPC in the annotation code (Ravi Bangoria) o Support AArch64 in the 'annotate' code, native/local and cross-arch/remote (Kim Phillips) - Allow considering just events in a given time interval, via the '--time start.s.ms,end.s.ms' command line, added to 'perf kmem', 'perf report', 'perf sched timehist' and 'perf script' (David Ahern) - Add option to stop printing a callchain at one of a given group of symbol names (David Ahern) - Track memory freed in 'perf kmem stat' (David Ahern) - Allow querying and setting .perfconfig variables (Taeung Song) - Show branch information in callchains (predicted, TSX aborts, loop iteractions, etc) (Jin Yao) - Dynamicly change verbosity level by pressing 'V' in the 'perf top/report' hists TUI browser (Alexis Berlemont) - Implement 'perf trace --delay' in the same fashion as in 'perf record --delay', to skip sampling workload initialization events (Alexis Berlemont) - Make vendor named events case insensitive in 'perf list', i.e. 'perf list LONGEST_LAT' works just the same as 'perf list longest_lat' (Andi Kleen) - Add unwinding support for jitdump (Stefano Sanfilippo) Tooling infrastructure changes: - Support linking perf with clang and LLVM libraries, initially statically, but this limitation will be lifted and shared libraries, when available, will be preferred to the static build, that should, as with other features, be enabled explicitly (Wang Nan) - Add initial support (and perf test entry) for tooling hooks, starting with 'record_start' and 'record_end', that will have as its initial user the eBPF infrastructure, where perf_ prefixed functions will be JITed and run when such hooks are called (Wang Nan) - Implement assorted libbpf improvements (Wang Nan)" ... and lots of other changes, features, cleanups and refactorings I did not list, see the shortlog and the git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (220 commits) perf/x86: Fix exclusion of BTS and LBR for Goldmont perf tools: Explicitly document that --children is enabled by default perf sched timehist: Cleanup idle_max_cpu handling perf sched timehist: Handle zero sample->tid properly perf callchain: Introduce callchain_cursor__copy() perf sched: Cleanup option processing perf sched timehist: Improve error message when analyzing wrong file perf tools: Move perf build related variables under non fixdep leg perf tools: Force fixdep compilation at the start of the build perf tools: Move PERF-VERSION-FILE target into rules area perf build: Check LLVM version in feature check perf annotate: Show raw form for jump instruction with indirect target perf tools: Add non config targets perf tools: Cleanup build directory before each test perf tools: Move python/perf.so target into rules area perf tools: Move install-gtk target into rules area tools build: Move tabs to spaces where suitable tools build: Make the .cmd file more readable perf clang: Compile BPF script using builtin clang support perf clang: Support compile IR to BPF object and add testcase ...
2016-12-12Merge branch 'mm-pat-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mm/PAT cleanup from Ingo Molnar: "A single cleanup for a generic interface that was originally introduced for PAT" * 'mm-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pat, mm: Make track_pfn_insert() return void
2016-12-12Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The tree got pretty big in this development cycle, but the net effect is pretty good: 115 files changed, 673 insertions(+), 1522 deletions(-) The main changes were: - Rework and generalize the mutex code to remove per arch mutex primitives. (Peter Zijlstra) - Add vCPU preemption support: add an interface to query the preemption status of vCPUs and use it in locking primitives - this optimizes paravirt performance. (Pan Xinhui, Juergen Gross, Christian Borntraeger) - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to clean up and improve the s390 lock yielding machinery and its core kernel impact. (Christian Borntraeger) - Micro-optimize mutexes some more. (Waiman Long) - Reluctantly add the to-be-deprecated mutex_trylock_recursive() interface on a temporary basis, to give the DRM code more time to get rid of its locking hacks. Any other users will be NAK-ed on sight. (We turned off the deprecation warning for the time being to not pollute the build log.) (Peter Zijlstra) - Improve the rtmutex code a bit, in light of recent long lived bugs/races. (Thomas Gleixner) - Misc fixes, cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/paravirt: Fix bool return type for PVOP_CALL() x86/paravirt: Fix native_patch() locking/ww_mutex: Use relaxed atomics locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() Documentation/virtual/kvm: Support the vCPU preemption check x86/xen: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check kvm: Introduce kvm_write_guest_offset_cached() locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests locking/spinlocks, s390: Implement vcpu_is_preempted(cpu) locking/core, powerpc: Implement vcpu_is_preempted(cpu) sched/core: Introduce the vcpu_is_preempted(cpu) interface sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q locking/core: Provide common cpu_relax_yield() definition locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily ...
2016-12-12Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this development cycle were: - Implement EFI dev path parser and other changes to fully support thunderbolt devices on Apple Macbooks (Lukas Wunner) - Add RNG seeding via the EFI stub, on ARM/arm64 (Ard Biesheuvel) - Expose EFI framebuffer configuration to user-space, to improve tooling (Peter Jones) - Misc fixes and cleanups (Ivan Hu, Wei Yongjun, Yisheng Xie, Dan Carpenter, Roy Franz)" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit thunderbolt: Compile on x86 only thunderbolt, efi: Fix Kconfig dependencies harder thunderbolt, efi: Fix Kconfig dependencies thunderbolt: Use Device ROM retrieved from EFI x86/efi: Retrieve and assign Apple device properties efi: Allow bitness-agnostic protocol calls efi: Add device path parser efi/arm*/libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table efi/libstub: Add random.c to ARM build efi: Add support for seeding the RNG from a UEFI config table MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem efi/libstub: Fix allocation size calculations efi/efivar_ssdt_load: Don't return success on allocation failure efifb: Show framebuffer layout as device attributes efi/efi_test: Use memdup_user() as a cleanup efi/efi_test: Fix uninitialized variable 'rv' efi/efi_test: Fix uninitialized variable 'datasize' efi/arm*: Fix efi_init() error handling efi: Remove unused include of <linux/version.h>
2016-12-12Merge branch 'core-smp-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP bootup updates from Ingo Molnar: "Three changes to unify/standardize some of the bootup message printing in kernel/smp.c between architectures" * 'core-smp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/smp: Tell the user we're bringing up secondary CPUs kernel/smp: Make the SMP boot message common on all arches kernel/smp: Define pr_fmt() for smp.c
2016-12-11Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11x86/paravirt: Fix bool return type for PVOP_CALL()Peter Zijlstra
Commit: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") introduced a paravirt op with bool return type [*] It turns out that the PVOP_CALL*() macros miscompile when rettype is bool. Code that looked like: 83 ef 01 sub $0x1,%edi ff 15 32 a0 d8 00 callq *0xd8a032(%rip) # ffffffff81e28120 <pv_lock_ops+0x20> 84 c0 test %al,%al ended up looking like so after PVOP_CALL1() was applied: 83 ef 01 sub $0x1,%edi 48 63 ff movslq %edi,%rdi ff 14 25 20 81 e2 81 callq *0xffffffff81e28120 48 85 c0 test %rax,%rax Note how it tests the whole of %rax, even though a typical bool return function only sets %al, like: 0f 95 c0 setne %al c3 retq This is because ____PVOP_CALL() does: __ret = (rettype)__eax; and while regular integer type casts truncate the result, a cast to bool tests for any !0 value. Fix this by explicitly truncating to sizeof(rettype) before casting. [*] The actual bug should've been exposed in commit: 446f3dc8cc0a ("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests") but that didn't properly implement the paravirt call. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.346057680@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11x86/paravirt: Fix native_patch()Peter Zijlstra
While chasing a regression I noticed we potentially patch the wrong code in native_patch(). If we do not select the native code sequence, we must use the default patcher, not fall-through the switch case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel test robot <xiaolong.ye@intel.com> Fixes: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.270616999@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11perf/x86: Fix exclusion of BTS and LBR for GoldmontAndi Kleen
An earlier patch allowed enabling PT and LBR at the same time on Goldmont. However it also allowed enabling BTS and LBR at the same time, which is still not supported. Fix this by bypassing the check only for PT. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alexander.shishkin@intel.com Cc: kan.liang@intel.com Cc: <stable@vger.kernel.org> Fixes: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") Link: http://lkml.kernel.org/r/20161209001417.4713-1-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-12-10x86/ldt: Make all size computations unsignedThomas Gleixner
ldt->size can never be negative. The helper functions take 'unsigned int' arguments which are assigned from ldt->size. The related user space user_desc struct member entry_number is unsigned as well. But ldt->size itself and a few local variables which are related to ldt->size are type 'int' which makes no sense whatsoever and results in typecasts which make the eyes bleed. Clean it up and convert everything which is related to ldt->size to unsigned it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dan Carpenter <dan.carpenter@oracle.com>
2016-12-10x86/ldt: Make a size argument unsignedDan Carpenter
My static checker complains that we put an upper bound on the "size" argument but not a lower bound. The checker is not smart enough to know the possible ranges of "old_mm->context.ldt->size" from init_new_context_ldt() so it thinks maybe it could be negative. Let's make it unsigned to silence the warning and future proof the code a bit. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: kernel-janitors@vger.kernel.org Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161208105602.GA11382@elgon.mountain Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09x86: Remove empty idle.h headerThomas Gleixner
One include less is always a good thing(tm). Good riddance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09x86/amd: Simplify AMD E400 aware idle routineBorislav Petkov
Reorganize the E400 detection now that we have everything in place: switch the CPUs to broadcast mode after the LAPIC has been initialized and remove the facilities that were used previously on the idle path. Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine because alternatives have been applied when the actual detection happens, so the static switching does not take effect and the test will stay false. Use boot_cpu_has_bug() instead which is definitely an improvement over the RDMSR and the cpumask handling. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09x86/amd: Check for the C1E bug post ACPI subsystem initThomas Gleixner
AMD CPUs affected by the E400 erratum suffer from the issue that the local APIC timer stops when the CPU goes into C1E. Unfortunately there is no way to detect the affected CPUs on early boot. It's only possible to determine the range of possibly affected CPUs from the family/model range. The actual decision whether to enter C1E and thus cause the bug is done by the firmware and we need to detect that case late, after ACPI has been initialized. The current solution is to check in the idle routine whether the CPU is affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is affected and the system is switched into forced broadcast mode. This is ineffective and on non-affected CPUs every entry to idle does the extra RDMSR. After doing some research it turns out that the bits are visible on the boot CPU right after the ACPI subsystem is initialized in the early boot process. So instead of polling for the bits in the idle loop, add a detection function after acpi_subsystem_init() and check for the MSR bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it will stop in C1E state as well. The switch to broadcast mode cannot be done at this point because the boot CPU still uses HPET as a clockevent device and the local APIC timer is not yet calibrated and installed. The switch to broadcast mode on the affected CPUs needs to be done when the local APIC timer is actually set up. This allows to cleanup the amd_e400_idle() function in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09x86/bugs: Separate AMD E400 erratum and C1E bugThomas Gleixner
The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E state) is a two step process: - Selection of the E400 aware idle routine - Detection whether the platform is affected The idle routine selection happens for possibly affected CPUs depending on family/model/stepping information. These range of CPUs is not necessarily affected as the decision whether to enable the C1E feature is made by the firmware. Unfortunately there is no way to query this at early boot. The current implementation polls a MSR in the E400 aware idle routine to detect whether the CPU is affected. This is inefficient on non affected CPUs because every idle entry has to do the MSR read. There is a better way to detect this before going idle for the first time which requires to seperate the bug flags: X86_BUG_AMD_E400 - Selects the E400 aware idle routine and enables the detection X86_BUG_AMD_APIC_C1E - Set when the platform is affected by E400 Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400 bug bit to select the idle routine which currently does an unconditional detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches to remove the MSR polling and simplify the handling of this misfeature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09x86/cpufeature: Provide helper to set bugs bitsBorislav Petkov
Will be used in a later patch to set bug bits for bugs which need late detection. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps ↵Steven Rostedt (Red Hat)
to it With new binutils, gcc may get smart with its optimization and change a jmp from a 5 byte jump to a 2 byte one even though it was jumping to a global function. But that global function existed within a 2 byte radius, and gcc was able to optimize it. Unfortunately, that jump was also being modified when function graph tracing begins. Since ftrace expected that jump to be 5 bytes, but it was only two, it overwrote code after the jump, causing a crash. This was fixed for x86_64 with commit 8329e818f149, with the same subject as this commit, but nothing was done for x86_32. Cc: stable@vger.kernel.org Fixes: d61f82d06672 ("ftrace: use dynamic patching for updating mcount calls") Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09tracing: Have the reg function allow to failSteven Rostedt (Red Hat)
Some tracepoints have a registration function that gets enabled when the tracepoint is enabled. There may be cases that the registraction function must fail (for example, can't allocate enough memory). In this case, the tracepoint should also fail to register, otherwise the user would not know why the tracepoint is not working. Cc: David Howells <dhowells@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Anton Blanchard <anton@samba.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-12-09x86/intel_rdt: Implement show_options() for resctrlfsShaohua Li
Implement show_options() callback for intel resource control filesystem to expose the active mount options in /proc/ Signed-off-by: Shaohua Li <shli@fb.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/7dce7c1886ac9289442d254ea18322c92bd968da.1480717072.git.shli@fb.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09xen/x86: Increase xen_e820_map to E820_X_MAX possible entriesAlex Thorlton
On systems with sufficiently large e820 tables, and several IOAPICs, it is possible for the XENMEM_machine_memory_map callback (and its counterpart, XENMEM_memory_map) to attempt to return an e820 table with more than 128 entries. This callback adds entries to the BIOS-provided e820 table to account for IOAPIC registers, which, on sufficiently large systems, can result in an e820 table that is too large to copy back into xen_e820_map. This change simply increases the size of xen_e820_map to E820_X_MAX to ensure that there is enough room to store the entire e820 map returned from this callback. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-12-09x86: Make E820_X_MAX unconditionally larger than E820MAXAlex Thorlton
It's really not necessary to limit E820_X_MAX to 128 in the non-EFI case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that E820_X_MAX is always at least slightly larger than E820MAX. The real motivation behind this is actually to prevent some issues in the Xen kernel, where the XENMEM_machine_memory_map hypercall can produce an e820 map larger than 128 entries, even on systems where the original e820 table was quite a bit smaller than that, depending on how many IOAPICs are installed on the system. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Ingo Molnar <mingo@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-12-08bpf: xdp: Allow head adjustment in XDP progMartin KaFai Lau
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08KVM: x86: Handle the kthread worker using the new APIPetr Mladek
Use the new API to create and destroy the "kvm-pit" kthread worker. The API hides some implementation details. In particular, kthread_create_worker() allocates and initializes struct kthread_worker. It runs the kthread the right way and stores task_struct into the worker structure. kthread_destroy_worker() flushes all pending works, stops the kthread and frees the structure. This patch does not change the existing behavior except for dynamically allocating struct kthread_worker and storing only the pointer of this structure. It is compile tested only because I did not find an easy way how to run the code. Well, it should be pretty safe given the nature of the change. Signed-off-by: Petr Mladek <pmladek@suse.com> Message-Id: <1476877847-11217-1-git-send-email-pmladek@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-08KVM: nVMX: invvpid handling improvementsJan Dakinevich
- Expose all invalidation types to the L1 - Reject invvpid instruction, if L1 passed zero vpid value to single context invalidations Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-08KVM: nVMX: check host CR3 on vmentry and vmexitLadi Prosek
This commit adds missing host CR3 checks. Before entering guest mode, the value of CR3 is checked for reserved bits. After returning, nested_vmx_load_cr3 is called to set the new CR3 value and check and load PDPTRs. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentryLadi Prosek
Loading CR3 as part of emulating vmentry is different from regular CR3 loads, as implemented in kvm_set_cr3, in several ways. * different rules are followed to check CR3 and it is desirable for the caller to distinguish between the possible failures * PDPTRs are not loaded if PAE paging and nested EPT are both enabled * many MMU operations are not necessary This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of nested vmentry and vmexit, and makes use of it on the nested vmentry path. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: propagate errors from prepare_vmcs02Ladi Prosek
It is possible that prepare_vmcs02 fails to load the guest state. This patch adds the proper error handling for such a case. L1 will receive an INVALID_STATE vmexit with the appropriate exit qualification if it happens. A failure to set guest CR3 is the only error propagated from prepare_vmcs02 at the moment. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPTLadi Prosek
KVM does not correctly handle L1 hypervisors that emulate L2 real mode with PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two related issues: 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are overwritten in ept_load_pdptrs because the registers are believed to have been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is running with PAE enabled but PDPTRs have been set up by L1. 2) When L2 is about to enable paging and loads its CR3, we, again, attempt to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees that this will succeed (it's just a CR3 load, paging is not enabled yet) and if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is then lost and L2 crashes right after it enables paging. This patch replaces the kvm_set_cr3 call with a simple register write if PAE and EPT are both on. CR3 is not to be interpreted in this case. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entryDavid Matlack
vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry is more faithful to VMCS12. This patch correctly causes VM-entry to fail when "IA-32e mode guest" is 1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and "IA-32e mode guest" would silently be disabled by KVM. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUIDDavid Matlack
MSR_IA32_CR{0,4}_FIXED1 define which bits in CR0 and CR4 are allowed to be 1 during VMX operation. Since the set of allowed-1 bits is the same in and out of VMX operation, we can generate these MSRs entirely from the guest's CPUID. This lets userspace avoiding having to save/restore these MSRs. This patch also initializes MSR_IA32_CR{0,4}_FIXED1 from the CPU's MSRs by default. This is a saner than the current default of -1ull, which includes bits that the host CPU does not support. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: fix checks on CR{0,4} during virtual VMX operationDavid Matlack
KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning all CR0 and CR4 bits are allowed to be 1 during VMX operation. This does not match real hardware, which disallows the high 32 bits of CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits which are defined in the SDM but missing according to CPUID). A guest can induce a VM-entry failure by setting these bits in GUEST_CR0 and GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are valid. Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing checks on these registers do not verify must-be-0 bits. Fix these checks to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1. This patch should introduce no change in behavior in KVM, since these MSRs are still -1ULL. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: support restore of VMX capability MSRsDavid Matlack
The VMX capability MSRs advertise the set of features the KVM virtual CPU can support. This set of features varies across different host CPUs and KVM versions. This patch aims to addresses both sources of differences, allowing VMs to be migrated across CPUs and KVM versions without guest-visible changes to these MSRs. Note that cross-KVM- version migration is only supported from this point forward. When the VMX capability MSRs are restored, they are audited to check that the set of features advertised are a subset of what KVM and the CPU support. Since the VMX capability MSRs are read-only, they do not need to be on the default MSR save/restore lists. The userspace hypervisor can set the values of these MSRs or read them from KVM at VCPU creation time, and restore the same value after every save/restore. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: nVMX: generate non-true VMX MSRs based on true versionsDavid Matlack
The "non-true" VMX capability MSRs can be generated from their "true" counterparts, by OR-ing the default1 bits. The default1 bits are fixed and defined in the SDM. Since we can generate the non-true VMX MSRs from the true versions, there's no need to store both in struct nested_vmx. This also lets userspace avoid having to restore the non-true MSRs. Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so, we simply need to set all the default1 bits in the true MSRs (such that the true MSRs and the generated non-true MSRs are equal). Signed-off-by: David Matlack <dmatlack@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs.Kyle Huey
The trap flag stays set until software clears it. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: x86: Add kvm_skip_emulated_instruction and use it.Kyle Huey
kvm_skip_emulated_instruction calls both kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep, skipping the emulated instruction and generating a trap if necessary. Replacing skip_emulated_instruction calls with kvm_skip_emulated_instruction is straightforward, except for: - ICEBP, which is already inside a trap, so avoid triggering another trap. - Instructions that can trigger exits to userspace, such as the IO insns, MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a KVM_GUESTDBG_SINGLESTEP exit, and the handling code for IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will take precedence. The singlestep will be triggered again on the next instruction, which is the current behavior. - Task switch instructions which would require additional handling (e.g. the task switch bit) and are instead left alone. - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction, which do not trigger singlestep traps as mentioned previously. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12Kyle Huey
We can't return both the pass/fail boolean for the vmcs and the upcoming continue/exit-to-userspace boolean for skip_emulated_instruction out of nested_vmx_check_vmcs, so move skip_emulated_instruction out of it instead. Additionally, VMENTER/VMRESUME only trigger singlestep exceptions when they advance the IP to the following instruction, not when they a) succeed, b) fail MSR validation or c) throw an exception. Add a separate call to skip_emulated_instruction that will later not be converted to the variant that checks the singlestep flag. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: VMX: Reorder some skip_emulated_instruction callsKyle Huey
The functions being moved ahead of skip_emulated_instruction here don't need updated IPs, and skipping the emulated instruction at the end will make it easier to return its value. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08KVM: x86: Add a return value to kvm_emulate_cpuidKyle Huey
Once skipping the emulated instruction can potentially trigger an exit to userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to propagate a return value. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>