summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2017-06-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Many small x86 bug fixes: SVM segment registers access rights, nested VMX, preempt notifiers, LAPIC virtual wire mode, NMI injection" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix nmi injection failure when vcpu got blocked KVM: SVM: do not zero out segment attributes if segment is unusable or not present KVM: SVM: ignore type when setting segment registers KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging KVM: x86: Fix virtual wire mode KVM: nVMX: Fix handling of lmsw instruction KVM: X86: Fix preempt the preemption timer cancel
2017-06-01KVM: x86: Fix nmi injection failure when vcpu got blockedZhuangYanying
When spin_lock_irqsave() deadlock occurs inside the guest, vcpu threads, other than the lock-holding one, would enter into S state because of pvspinlock. Then inject NMI via libvirt API "inject-nmi", the NMI could not be injected into vm. The reason is: 1 It sets nmi_queued to 1 when calling ioctl KVM_NMI in qemu, and sets cpu->kvm_vcpu_dirty to true in do_inject_external_nmi() meanwhile. 2 It sets nmi_queued to 0 in process_nmi(), before entering guest, because cpu->kvm_vcpu_dirty is true. It's not enough just to check nmi_queued to decide whether to stay in vcpu_block() or not. NMI should be injected immediately at any situation. Add checking nmi_pending, and testing KVM_REQ_NMI replaces nmi_queued in vm_vcpu_has_events(). Do the same change for SMIs. Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-01KVM: SVM: do not zero out segment attributes if segment is unusable or not ↵Roman Pen
present This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt was taken on userspace stack. The root cause lies in the specific AMD CPU behaviour which manifests itself as unusable segment attributes on SYSRET. The corresponding work around for the kernel is the following: 61f01dd941ba ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue") In other turn virtualization side treated unusable segment incorrectly and restored CPL from SS attributes, which were zeroed out few lines above. In current patch it is assured only that P bit is cleared in VMCB.save state and segment attributes are not zeroed out if segment is not presented or is unusable, therefore CPL can be safely restored from DPL field. This is only one part of the fix, since QEMU side should be fixed accordingly not to zero out attributes on its side. Corresponding patch will follow. [1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Mikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-30KVM: SVM: ignore type when setting segment registersGioh Kim
Commit 19bca6ab75d8 ("KVM: SVM: Fix cross vendor migration issue with unusable bit") added checking type when setting unusable. So unusable can be set if present is 0 OR type is 0. According to the AMD processor manual, long mode ignores the type value in segment descriptor. And type can be 0 if it is read-only data segment. Therefore type value is not related to unusable flag. This patch is based on linux-next v4.12.0-rc3. Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-30KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debuggingRadim Krčmář
kvm_skip_emulated_instruction() will return 0 if userspace is single-stepping the guest. kvm_skip_emulated_instruction() uses return status convention of exit handler: 0 means "exit to userspace" and 1 means "continue vm entries". The problem is that nested_vmx_check_vmptr() return status means something else: 0 is ok, 1 is error. This means we would continue executing after a failure. Static checker noticed it because vmptr was not initialized. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6affcbedcac7 ("KVM: x86: Add kvm_skip_emulated_instruction and use it.") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-27Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A series of fixes for X86: - The final fix for the end-of-stack issue in the unwinder - Handle non PAT systems gracefully - Prevent access to uninitiliazed memory - Move early delay calaibration after basic init - Fix Kconfig help text - Fix a cross compile issue - Unbreak older make versions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/timers: Move simple_udelay_calibration past init_hypervisor_platform x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives() x86/PAT: Fix Xorg regression on CPUs that don't support PAT x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation x86/build: Permit building with old make versions x86/unwind: Add end-of-stack check for ftrace handlers Revert "x86/entry: Fix the end of the stack for newly forked tasks" x86/boot: Use CROSS_COMPILE prefix for readelf
2017-05-27Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixlets for RAS: - Export memory_error() so the NFIT module can utilize it - Handle memory errors in NFIT correctly" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi, nfit: Fix the memory error check in nfit_handle_mce() x86/MCE: Export memory_error()
2017-05-26x86/ftrace: Make sure that ftrace trampolines are not RWXThomas Gleixner
ftrace use module_alloc() to allocate trampoline pages. The mapping of module_alloc() is RWX, which makes sense as the memory is written to right after allocation. But nothing makes these pages RO after writing to them. Add proper set_memory_rw/ro() calls to protect the trampolines after modification. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()Steven Rostedt (VMware)
With function tracing starting in early bootup and having its trampoline pages being read only, a bug triggered with the following: kernel BUG at arch/x86/mm/pageattr.c:189! invalid opcode: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc2-test+ #3 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 task: ffffffffb4222500 task.stack: ffffffffb4200000 RIP: 0010:change_page_attr_set_clr+0x269/0x302 RSP: 0000:ffffffffb4203c88 EFLAGS: 00010046 RAX: 0000000000000046 RBX: 0000000000000000 RCX: 00000001b6000000 RDX: ffffffffb4203d40 RSI: 0000000000000000 RDI: ffffffffb4240d60 RBP: ffffffffb4203d18 R08: 00000001b6000000 R09: 0000000000000001 R10: ffffffffb4203aa8 R11: 0000000000000003 R12: ffffffffc029b000 R13: ffffffffb4203d40 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9a639ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff9a636b384000 CR3: 00000001ea21d000 CR4: 00000000000406b0 Call Trace: change_page_attr_clear+0x1f/0x21 set_memory_ro+0x1e/0x20 arch_ftrace_update_trampoline+0x207/0x21c ? ftrace_caller+0x64/0x64 ? 0xffffffffc029b000 ftrace_startup+0xf4/0x198 register_ftrace_function+0x26/0x3c function_trace_init+0x5e/0x73 tracer_init+0x1e/0x23 tracing_set_tracer+0x127/0x15a register_tracer+0x19b/0x1bc init_function_trace+0x90/0x92 early_trace_init+0x236/0x2b3 start_kernel+0x200/0x3f5 x86_64_start_reservations+0x29/0x2b x86_64_start_kernel+0x17c/0x18f secondary_startup_64+0x9f/0x9f ? secondary_startup_64+0x9f/0x9f Interrupts should not be enabled at this early in the boot process. It is also fine to leave interrupts enabled during this time as there's only one CPU running, and on_each_cpu() means to only run on the current CPU. If early_boot_irqs_disabled is set, it is safe to run cpu_flush_range() with interrupts disabled. Don't trigger a BUG_ON() in that case. Link: http://lkml.kernel.org/r/20170526093717.0be3b849@gandalf.local.home Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26kprobes/x86: Fix to set RWX bits correctly before releasing trampolineMasami Hiramatsu
Fix kprobes to set(recover) RWX bits correctly on trampoline buffer before releasing it. Releasing readonly page to module_memfree() crash the kernel. Without this fix, if kprobes user register a bunch of kprobes in function body (since kprobes on function entry usually use ftrace) and unregister it, kernel hits a BUG and crash. Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: d0381c81c2f7 ("kprobes/x86: Set kprobes pages read-only") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-05-26KVM: x86: Fix virtual wire modeJan H. Schönherr
Intel SDM says, that at most one LAPIC should be configured with ExtINT delivery. KVM configures all LAPICs this way. This causes pic_unlock() to kick the first available vCPU from the internal KVM data structures. If this vCPU is not the BSP, but some not-yet-booted AP, the BSP may never realize that there is an interrupt. Fix that by enabling ExtINT delivery only for the BSP. This allows booting a Linux guest without a TSC in the above situation. Otherwise the BSP gets stuck in calibrate_delay_converge(). Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-26KVM: nVMX: Fix handling of lmsw instructionJan H. Schönherr
The decision whether or not to exit from L2 to L1 on an lmsw instruction is based on bogus values: instead of using the information encoded within the exit qualification, it uses the data also used for the mov-to-cr instruction, which boils down to using whatever is in %eax at that point. Use the correct values instead. Without this fix, an L1 may not get notified when a 32-bit Linux L2 switches its secondary CPUs to protected mode; the L1 is only notified on the next modification of CR0. This short time window poses a problem, when there is some other reason to exit to L1 in between. Then, L2 will be resumed in real mode and chaos ensues. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-26KVM: X86: Fix preempt the preemption timer cancelWanpeng Li
Preemption can occur during cancel preemption timer, and there will be inconsistent status in lapic, vmx and vmcs field. CPU0 CPU1 preemption timer vmexit handle_preemption_timer(vCPU0) kvm_lapic_expired_hv_timer vmx_cancel_hv_timer vmx->hv_deadline_tsc = -1 vmcs_clear_bits /* hv_timer_in_use still true */ sched_out sched_in kvm_arch_vcpu_load vmx_set_hv_timer write vmx->hv_deadline_tsc vmcs_set_bits /* back in kvm_lapic_expired_hv_timer */ hv_timer_in_use = false ... vmx_vcpu_run vmx_arm_hv_run write preemption timer deadline spurious preemption timer vmexit handle_preemption_timer(vCPU0) kvm_lapic_expired_hv_timer WARN_ON(!apic->lapic_timer.hv_timer_in_use); This can be reproduced sporadically during boot of L2 on a preemptible L1, causing a splat on L1. WARNING: CPU: 3 PID: 1952 at arch/x86/kvm/lapic.c:1529 kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm] CPU: 3 PID: 1952 Comm: qemu-system-x86 Not tainted 4.12.0-rc1+ #24 RIP: 0010:kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm] Call Trace: handle_preemption_timer+0xe/0x20 [kvm_intel] vmx_handle_exit+0xc9/0x15f0 [kvm_intel] ? lock_acquire+0xdb/0x250 ? lock_acquire+0xdb/0x250 ? kvm_arch_vcpu_ioctl_run+0xdf3/0x1ce0 [kvm] kvm_arch_vcpu_ioctl_run+0xe55/0x1ce0 [kvm] kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? __fget+0xf3/0x210 do_vfs_ioctl+0xa4/0x700 ? __fget+0x114/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x8f/0x750 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 This patch fixes it by disabling preemption while cancelling preemption timer. This way cancel_hv_timer is atomic with respect to kvm_arch_vcpu_load. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-26x86/timers: Move simple_udelay_calibration past init_hypervisor_platformJan Kiszka
This ensures that adjustments to x86_platform done by the hypervisor setup is already respected by this simple calibration. The current user of this, introduced by 1b5aeebf3a92 ("x86/earlyprintk: Add support for earlyprintk via USB3 debug port"), comes much later into play. Fixes: dd759d93f4dd ("x86/timers: Add simple udelay calibration") Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: http://lkml.kernel.org/r/5e89fe60-aab3-2c1c-aba8-32f8ad376189@siemens.com
2017-05-24x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives()Mateusz Jurczyk
In the current form of the code, if a->replacementlen is 0, the reference to *insnbuf for comparison touches potentially garbage memory. While it doesn't affect the execution flow due to the subsequent a->replacementlen comparison, it is (rightly) detected as use of uninitialized memory by a runtime instrumentation currently under my development, and could be detected as such by other tools in the future, too (e.g. KMSAN). Fix the "false-positive" by reordering the conditions to first check the replacement instruction length before referencing specific opcode bytes. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20170524135500.27223-1-mjurczyk@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-24x86/PAT: Fix Xorg regression on CPUs that don't support PATMikulas Patocka
In the file arch/x86/mm/pat.c, there's a '__pat_enabled' variable. The variable is set to 1 by default and the function pat_init() sets __pat_enabled to 0 if the CPU doesn't support PAT. However, on AMD K6-3 CPUs, the processor initialization code never calls pat_init() and so __pat_enabled stays 1 and the function pat_enabled() returns true, even though the K6-3 CPU doesn't support PAT. The result of this bug is that a kernel warning is produced when attempting to start the Xserver and the Xserver doesn't start (fork() returns ENOMEM). Another symptom of this bug is that the framebuffer driver doesn't set the K6-3 MTRR registers: x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3891 at arch/x86/mm/pat.c:1020 untrack_pfn+0x5c/0x9f ... x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining To fix the bug change pat_enabled() so that it returns true only if PAT initialization was actually done. Also, I changed boot_cpu_has(X86_FEATURE_PAT) to this_cpu_has(X86_FEATURE_PAT) in pat_ap_init(), so that we check the PAT feature on the processor that is being initialized. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: stable@vger.kernel.org # v4.2+ Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1704181501450.26399@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog ↵Benjamin Peterson
documentation Signed-off-by: Benjamin Peterson <bp@benjamin.pe> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 9919cba7ff71147803c988521cc1ceb80e7f0f6d ("watchdog: Update documentation") Link: http://lkml.kernel.org/r/20170521002016.13258-1-bp@benjamin.pe Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24x86/build: Permit building with old make versionsJan Kiszka
At least Make 3.82 dislikes the tab in front of the $(warning) function: arch/x86/Makefile:162: *** recipe commences before first target. Stop. Let's be gentle. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1944fcd8-e3df-d1f7-c0e4-60aeb1917a24@siemens.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24x86/unwind: Add end-of-stack check for ftrace handlersJosh Poimboeuf
Dave Jones and Steven Rostedt reported unwinder warnings like the following: WARNING: kernel stack frame pointer at ffff8800bda0ff30 in sshd:1090 has bad value 000055b32abf1fa8 In both cases, the unwinder was attempting to unwind from an ftrace handler into entry code. The callchain was something like: syscall entry code C function ftrace handler save_stack_trace() The problem is that the unwinder's end-of-stack logic gets confused by the way ftrace lays out the stack frame (with fentry enabled). I was able to recreate this warning with: echo call_usermodehelper_exec_async:stacktrace > /sys/kernel/debug/tracing/set_ftrace_filter (exit login session) I considered fixing this by changing the ftrace code to rewrite the stack to make the unwinder happy. But that seemed too intrusive after I implemented it. Instead, just add another check to the unwinder's end-of-stack logic to detect this special case. Side note: We could probably get rid of these end-of-stack checks by encoding the frame pointer for syscall entry just like we do for interrupt entry. That would be simpler, but it would also be a lot more intrusive since it would slightly affect the performance of every syscall. Reported-by: Dave Jones <davej@codemonkey.org.uk> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: live-patching@vger.kernel.org Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer") Link: http://lkml.kernel.org/r/671ba22fbc0156b8f7e0cfa5ab2a795e08bc37e1.1495553739.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-24Revert "x86/entry: Fix the end of the stack for newly forked tasks"Josh Poimboeuf
Petr Mladek reported the following warning when loading the livepatch sample module: WARNING: CPU: 1 PID: 3699 at arch/x86/kernel/stacktrace.c:132 save_stack_trace_tsk_reliable+0x133/0x1a0 ... Call Trace: __schedule+0x273/0x820 schedule+0x36/0x80 kthreadd+0x305/0x310 ? kthread_create_on_cpu+0x80/0x80 ? icmp_echo.part.32+0x50/0x50 ret_from_fork+0x2c/0x40 That warning means the end of the stack is no longer recognized as such for newly forked tasks. The problem was introduced with the following commit: ff3f7e2475bb ("x86/entry: Fix the end of the stack for newly forked tasks") ... which was completely misguided. It only partially fixed the reported issue, and it introduced another bug in the process. None of the other entry code saves the frame pointer before calling into C code, so it doesn't make sense for ret_from_fork to do so either. Contrary to what I originally thought, the original issue wasn't related to newly forked tasks. It was actually related to ftrace. When entry code calls into a function which then calls into an ftrace handler, the stack frame looks different than normal. The original issue will be fixed in the unwinder, in a subsequent patch. Reported-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: live-patching@vger.kernel.org Fixes: ff3f7e2475bb ("x86/entry: Fix the end of the stack for newly forked tasks") Link: http://lkml.kernel.org/r/f350760f7e82f0750c8d1dd093456eb212751caa.1495553739.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-21x86: fix 32-bit case of __get_user_asm_u64()Linus Torvalds
The code to fetch a 64-bit value from user space was entirely buggered, and has been since the code was merged in early 2016 in commit b2f680380ddf ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit kernels"). Happily the buggered routine is almost certainly entirely unused, since the normal way to access user space memory is just with the non-inlined "get_user()", and the inlined version didn't even historically exist. The normal "get_user()" case is handled by external hand-written asm in arch/x86/lib/getuser.S that doesn't have either of these issues. There were two independent bugs in __get_user_asm_u64(): - it still did the STAC/CLAC user space access marking, even though that is now done by the wrapper macros, see commit 11f1a4b9755f ("x86: reorganize SMAP handling in user space accesses"). This didn't result in a semantic error, it just means that the inlined optimized version was hugely less efficient than the allegedly slower standard version, since the CLAC/STAC overhead is quite high on modern Intel CPU's. - the double register %eax/%edx was marked as an output, but the %eax part of it was touched early in the asm, and could thus clobber other inputs to the asm that gcc didn't expect it to touch. In particular, that meant that the generated code could look like this: mov (%eax),%eax mov 0x4(%eax),%edx where the load of %edx obviously was _supposed_ to be from the 32-bit word that followed the source of %eax, but because %eax was overwritten by the first instruction, the source of %edx was basically random garbage. The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark the 64-bit output as early-clobber to let gcc know that no inputs should alias with the output register. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@kernel.org # v4.8+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-21Clean up x86 unsafe_get/put_user() type handlingLinus Torvalds
Al noticed that unsafe_put_user() had type problems, and fixed them in commit a7cc722fff0b ("fix unsafe_put_user()"), which made me look more at those functions. It turns out that unsafe_get_user() had a type issue too: it limited the largest size of the type it could handle to "unsigned long". Which is fine with the current users, but doesn't match our existing normal get_user() semantics, which can also handle "u64" even when that does not fit in a long. While at it, also clean up the type cast in unsafe_put_user(). We actually want to just make it an assignment to the expected type of the pointer, because we actually do want warnings from types that don't convert silently. And it makes the code more readable by not having that one very long and complex line. [ This patch might become stable material if we ever end up back-porting any new users of the unsafe uaccess code, but as things stand now this doesn't matter for any current existing uses. ] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-21x86/MCE: Export memory_error()Borislav Petkov
Export the function which checks whether an MCE is a memory error to other users so that we can reuse the logic. Drop the boot_cpu_data use, while at it, as mce.cpuvendor already has the CPU vendor in there. Integrate a piece from a patch from Vishal Verma <vishal.l.verma@intel.com> to export it for modules (nfit). The main reason we're exporting it is that the nfit handler nfit_handle_mce() needs to detect a memory error properly before doing its recovery actions. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc uaccess fixes from Al Viro: "Fix for unsafe_put_user() (no callers currently in mainline, but anyone starting to use it will step into that) + alpha osf_wait4() infoleak fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: osf_wait4(): fix infoleak fix unsafe_put_user()
2017-05-21fix unsafe_put_user()Al Viro
__put_user_size() relies upon its first argument having the same type as what the second one points to; the only other user makes sure of that and unsafe_put_user() should do the same. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21x86/boot: Use CROSS_COMPILE prefix for readelfRob Landley
The boot code Makefile contains a straight 'readelf' invocation. This causes build warnings in cross compile environments, when there is no unprefixed readelf accessible via $PATH. Add the missing $(CROSS_COMPILE) prefix. [ tglx: Rewrote changelog ] Fixes: 98f78525371b ("x86/boot: Refuse to build with data relocations") Signed-off-by: Rob Landley <rob@landley.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: "H.J. Lu" <hjl.tools@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "ARM: - a fix for a build failure introduced in -rc1 when tracepoints are enabled on 32-bit ARM. - disable use of stack pointer protection in the hyp code which can cause panics. - a handful of VGIC fixes. - a fix to the init of the redistributors on GICv3 systems that prevented boot with kvmtool on GICv3 systems introduced in -rc1. - a number of race conditions fixed in our MMU handling code. - a fix for the guest being able to program the debug extensions for the host on the 32-bit side. PPC: - fixes for build failures with PR KVM configurations. - a fix for a host crash that can occur on POWER9 with radix guests. x86: - fixes for nested PML and nested EPT. - a fix for crashes caused by reserved bits in SSE MXCSR that could have been set by userspace. - an optimization of halt polling that fixes high CPU overhead. - fixes for four reports from Dan Carpenter's static checker. - a protection around code that shouldn't have been preemptible. - a fix for port IO emulation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits) KVM: x86: prevent uninitialized variable warning in check_svme() KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh() KVM: x86: zero base3 of unusable segments KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation KVM: x86: Fix potential preemption when get the current kvmclock timestamp KVM: Silence underflow warning in avic_get_physical_id_entry() KVM: arm/arm64: Hold slots_lock when unregistering kvm io bus devices KVM: arm/arm64: Fix bug when registering redist iodevs KVM: x86: lower default for halt_poll_ns kvm: arm/arm64: Fix use after free of stage2 page table kvm: arm/arm64: Force reading uncached stage2 PGD KVM: nVMX: fix EPT permissions as reported in exit qualification KVM: VMX: Don't enable EPT A/D feature if EPT feature is disabled KVM: x86: Fix load damaged SSEx MXCSR register kvm: nVMX: off by one in vmx_write_pml_buffer() KVM: arm: rename pm_fake handler to trap_raz_wi KVM: arm: plug potential guest hardware debug leakage kvm: arm/arm64: Fix race in resetting stage2 PGD KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2 registers KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt ...
2017-05-19Merge tag 'for-linus-4.12b-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Some fixes for the new Xen 9pfs frontend and some minor cleanups" * tag 'for-linus-4.12b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: make xen_flush_tlb_all() static xen: cleanup pvh leftovers from pv-only sources xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static xen/9pfs: fix return value check in xen_9pfs_front_probe()
2017-05-19KVM: x86: prevent uninitialized variable warning in check_svme()Radim Krčmář
get_msr() of MSR_EFER is currently always going to succeed, but static checker doesn't see that far. Don't complicate stuff and just use 0 for the fallback -- it means that the feature is not present. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-19KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()Radim Krčmář
Static analysis noticed that pmu->nr_arch_gp_counters can be 32 (INTEL_PMC_MAX_GENERIC) and therefore cannot be used to shift 'int'. I didn't add BUILD_BUG_ON for it as we have a better checker. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 25462f7f5295 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-19KVM: x86: zero base3 of unusable segmentsRadim Krčmář
Static checker noticed that base3 could be used uninitialized if the segment was not present (useable). Random stack values probably would not pass VMCS entry checks. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 1aa366163b8b ("KVM: x86 emulator: consolidate segment accessors") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-19KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulationWanpeng Li
Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation. - "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only, a read should be ignored) in guest can get a random number. - "rep insb" instruction to access PIT register port 0x43 can control memcpy() in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes, which will disclose the unimportant kernel memory in host but no crash. The similar test program below can reproduce the read out-of-bounds vulnerability: void hexdump(void *mem, unsigned int len) { unsigned int i, j; for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++) { /* print offset */ if(i % HEXDUMP_COLS == 0) { printf("0x%06x: ", i); } /* print hex data */ if(i < len) { printf("%02x ", 0xFF & ((char*)mem)[i]); } else /* end of block, just aligning for ASCII dump */ { printf(" "); } /* print ASCII dump */ if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1)) { for(j = i - (HEXDUMP_COLS - 1); j <= i; j++) { if(j >= len) /* end of block, not really printing */ { putchar(' '); } else if(isprint(((char*)mem)[j])) /* printable char */ { putchar(0xFF & ((char*)mem)[j]); } else /* other char */ { putchar('.'); } } putchar('\n'); } } } int main(void) { int i; if (iopl(3)) { err(1, "set iopl unsuccessfully\n"); return -1; } static char buf[0x40]; /* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */ memset(buf, 0xab, sizeof(buf)); asm volatile("push %rdi;"); asm volatile("mov %0, %%rdi;"::"q"(buf)); asm volatile ("mov $0x40, %rdx;"); asm volatile ("in %dx,%al;"); asm volatile ("stosb;"); asm volatile ("mov $0x41, %rdx;"); asm volatile ("in %dx,%al;"); asm volatile ("stosb;"); asm volatile ("mov $0x42, %rdx;"); asm volatile ("in %dx,%al;"); asm volatile ("stosb;"); asm volatile ("mov $0x43, %rdx;"); asm volatile ("in %dx,%al;"); asm volatile ("stosb;"); asm volatile ("mov $0x44, %rdx;"); asm volatile ("in %dx,%al;"); asm volatile ("stosb;"); asm volatile ("mov $0x45, %rdx;"); asm volatile ("in %dx,%al;"); asm volatile ("stosb;"); asm volatile ("pop %rdi;"); hexdump(buf, 0x40); printf("\n"); /* ins port 0x40 */ memset(buf, 0xab, sizeof(buf)); asm volatile("push %rdi;"); asm volatile("mov %0, %%rdi;"::"q"(buf)); asm volatile ("mov $0x20, %rcx;"); asm volatile ("mov $0x40, %rdx;"); asm volatile ("rep insb;"); asm volatile ("pop %rdi;"); hexdump(buf, 0x40); printf("\n"); /* ins port 0x43 */ memset(buf, 0xab, sizeof(buf)); asm volatile("push %rdi;"); asm volatile("mov %0, %%rdi;"::"q"(buf)); asm volatile ("mov $0x20, %rcx;"); asm volatile ("mov $0x43, %rdx;"); asm volatile ("rep insb;"); asm volatile ("pop %rdi;"); hexdump(buf, 0x40); printf("\n"); return 0; } The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation w/o clear after using which results in some random datas are left over in the buffer. Guest reads port 0x43 will be ignored since it is write only, however, the function kernel_pio() can't distigush this ignore from successfully reads data from device's ioport. There is no new data fill the buffer from port 0x43, however, emulator_pio_in_emulated() will copy the stale data in the buffer to the guest unconditionally. This patch fixes it by clearing the buffer before in instruction emulation to avoid to grant guest the stale data in the buffer. In addition, string I/O is not supported for in kernel device. So there is no iteration to read ioport %RCX times for string I/O. The function kernel_pio() just reads one round, and then copy the io size * %RCX to the guest unconditionally, actually it copies the one round ioport data w/ other random datas which are left over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by introducing the string I/O support for in kernel device in order to grant the right ioport datas to the guest. Before the patch: 0x000000: fe 38 93 93 ff ff ab ab .8...... 0x000008: ab ab ab ab ab ab ab ab ........ 0x000010: ab ab ab ab ab ab ab ab ........ 0x000018: ab ab ab ab ab ab ab ab ........ 0x000020: ab ab ab ab ab ab ab ab ........ 0x000028: ab ab ab ab ab ab ab ab ........ 0x000030: ab ab ab ab ab ab ab ab ........ 0x000038: ab ab ab ab ab ab ab ab ........ 0x000000: f6 00 00 00 00 00 00 00 ........ 0x000008: 00 00 00 00 00 00 00 00 ........ 0x000010: 00 00 00 00 4d 51 30 30 ....MQ00 0x000018: 30 30 20 33 20 20 20 20 00 3 0x000020: ab ab ab ab ab ab ab ab ........ 0x000028: ab ab ab ab ab ab ab ab ........ 0x000030: ab ab ab ab ab ab ab ab ........ 0x000038: ab ab ab ab ab ab ab ab ........ 0x000000: f6 00 00 00 00 00 00 00 ........ 0x000008: 00 00 00 00 00 00 00 00 ........ 0x000010: 00 00 00 00 4d 51 30 30 ....MQ00 0x000018: 30 30 20 33 20 20 20 20 00 3 0x000020: ab ab ab ab ab ab ab ab ........ 0x000028: ab ab ab ab ab ab ab ab ........ 0x000030: ab ab ab ab ab ab ab ab ........ 0x000038: ab ab ab ab ab ab ab ab ........ After the patch: 0x000000: 1e 02 f8 00 ff ff ab ab ........ 0x000008: ab ab ab ab ab ab ab ab ........ 0x000010: ab ab ab ab ab ab ab ab ........ 0x000018: ab ab ab ab ab ab ab ab ........ 0x000020: ab ab ab ab ab ab ab ab ........ 0x000028: ab ab ab ab ab ab ab ab ........ 0x000030: ab ab ab ab ab ab ab ab ........ 0x000038: ab ab ab ab ab ab ab ab ........ 0x000000: d2 e2 d2 df d2 db d2 d7 ........ 0x000008: d2 d3 d2 cf d2 cb d2 c7 ........ 0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........ 0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........ 0x000020: ab ab ab ab ab ab ab ab ........ 0x000028: ab ab ab ab ab ab ab ab ........ 0x000030: ab ab ab ab ab ab ab ab ........ 0x000038: ab ab ab ab ab ab ab ab ........ 0x000000: 00 00 00 00 00 00 00 00 ........ 0x000008: 00 00 00 00 00 00 00 00 ........ 0x000010: 00 00 00 00 00 00 00 00 ........ 0x000018: 00 00 00 00 00 00 00 00 ........ 0x000020: ab ab ab ab ab ab ab ab ........ 0x000028: ab ab ab ab ab ab ab ab ........ 0x000030: ab ab ab ab ab ab ab ab ........ 0x000038: ab ab ab ab ab ab ab ab ........ Reported-by: Moguofang <moguofang@huawei.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Moguofang <moguofang@huawei.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-19KVM: x86: Fix potential preemption when get the current kvmclock timestampWanpeng Li
BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/2809 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 2 PID: 2809 Comm: qemu-system-x86 Not tainted 4.11.0+ #13 Call Trace: dump_stack+0x99/0xce check_preemption_disabled+0xf5/0x100 __this_cpu_preempt_check+0x13/0x20 get_kvmclock_ns+0x6f/0x110 [kvm] get_time_ref_counter+0x5d/0x80 [kvm] kvm_hv_process_stimers+0x2a1/0x8a0 [kvm] ? kvm_hv_process_stimers+0x2a1/0x8a0 [kvm] ? kvm_arch_vcpu_ioctl_run+0xac9/0x1ce0 [kvm] kvm_arch_vcpu_ioctl_run+0x5bf/0x1ce0 [kvm] kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm] ? __fget+0xf3/0x210 do_vfs_ioctl+0xa4/0x700 ? __fget+0x114/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 RIP: 0033:0x7f9d164ed357 ? __this_cpu_preempt_check+0x13/0x20 This can be reproduced by run kvm-unit-tests/hyperv_stimer.flat w/ CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled. Safe access to per-CPU data requires a couple of constraints, though: the thread working with the data cannot be preempted and it cannot be migrated while it manipulates per-CPU variables. If the thread is preempted, the thread that replaces it could try to work with the same variables; migration to another CPU could also cause confusion. However there is no preemption disable when reads host per-CPU tsc rate to calculate the current kvmclock timestamp. This patch fixes it by utilizing get_cpu/put_cpu pair to guarantee both __this_cpu_read() and rdtsc() are not preempted. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-19xen: make xen_flush_tlb_all() staticJuergen Gross
xen_flush_tlb_all() is used in arch/x86/xen/mmu.c only. Make it static. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-19xen: cleanup pvh leftovers from pv-only sourcesJuergen Gross
There are some leftovers testing for pvh guest mode in pv-only source files. Remove them. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-18KVM: Silence underflow warning in avic_get_physical_id_entry()Dan Carpenter
Smatch complains that we check cap the upper bound of "index" but don't check for negatives. It's a false positive because "index" is never negative. But it's also simple enough to make it unsigned which makes the code easier to audit. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-16KVM: x86: lower default for halt_poll_nsPaolo Bonzini
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to increase heavily even in cases where the performance improvement was small. In particular, bandwidth divided by CPU usage was as much as 60% lower. To some extent this is the expected effect of the patch, and the additional CPU utilization is only visible when running the benchmarks. However, halving the threshold also halves the extra CPU utilization (from +30-130% to +20-70%) and has no negative effect on performance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-15KVM: nVMX: fix EPT permissions as reported in exit qualificationPaolo Bonzini
This fixes the new ept_access_test_read_only and ept_access_test_read_write testcases from vmx.flat. The problem is that gpte_access moves bits around to switch from EPT bit order (XWR) to ACC_*_MASK bit order (RWX). This results in an incorrect exit qualification. To fix this, make pt_access and pte_access operate on raw PTE values (only with NX flipped to mean "can execute") and call gpte_access at the end of the walk. This lets us use pte_access to compute the exit qualification with XWR bit order. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-15KVM: VMX: Don't enable EPT A/D feature if EPT feature is disabledWanpeng Li
We can observe eptad kvm_intel module parameter is still Y even if ept is disabled which is weird. This patch will not enable EPT A/D feature if EPT feature is disabled. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-15KVM: x86: Fix load damaged SSEx MXCSR registerWanpeng Li
Reported by syzkaller: BUG: unable to handle kernel paging request at ffffffffc07f6a2e IP: report_bug+0x94/0x120 PGD 348e12067 P4D 348e12067 PUD 348e14067 PMD 3cbd84067 PTE 80000003f7e87161 Oops: 0003 [#1] SMP CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G OE 4.11.0+ #8 task: ffff92fdfb525400 task.stack: ffffbda6c3d04000 RIP: 0010:report_bug+0x94/0x120 RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202 do_trap+0x156/0x170 do_error_trap+0xa3/0x170 ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm] ? mark_held_locks+0x79/0xa0 ? retint_kernel+0x10/0x10 ? trace_hardirqs_off_thunk+0x1a/0x1c do_invalid_op+0x20/0x30 invalid_op+0x1e/0x30 RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm] ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm] kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm] kvm_vcpu_ioctl+0x384/0x780 [kvm] ? kvm_vcpu_ioctl+0x384/0x780 [kvm] ? sched_clock+0x13/0x20 ? __do_page_fault+0x2a0/0x550 do_vfs_ioctl+0xa4/0x700 ? up_read+0x1f/0x40 ? __do_page_fault+0x2a0/0x550 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 SDM mentioned that "The MXCSR has several reserved bits, and attempting to write a 1 to any of these bits will cause a general-protection exception(#GP) to be generated". The syzkaller forks' testcase overrides xsave area w/ random values and steps on the reserved bits of MXCSR register. The damaged MXCSR register values of guest will be restored to SSEx MXCSR register before vmentry. This patch fixes it by catching userspace override MXCSR register reserved bits w/ random values and bails out immediately. Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-15kvm: nVMX: off by one in vmx_write_pml_buffer()Dan Carpenter
There are PML_ENTITY_NUM elements in the pml_address[] array so the > should be >= or we write beyond the end of the array when we do: pml_address[vmcs12->guest_pml_index--] = gpa; Fixes: c5f983f6e845 ("nVMX: Implement emulated Page Modification Logging") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-13Merge branch 'for-linus-4.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML fixes from Richard Weinberger: "No new stuff, just fixes" * 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Add missing NR_CPUS include um: Fix to call read_initrd after init_bootmem um: Include kbuild.h instead of duplicating its macros um: Fix PTRACE_POKEUSER on x86_64 um: Set number of CPUs um: Fix _print_addr()
2017-05-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, docs: update memory.stat description with workingset* entries mm: vmscan: scan until it finds eligible pages mm, thp: copying user pages must schedule on collapse dax: fix PMD data corruption when fault races with write dax: fix data corruption when fault races with write ext4: return to starting transaction in ext4_dax_huge_fault() mm: fix data corruption due to stale mmap reads dax: prevent invalidation of mapped DAX entries Tigran has moved mm, vmalloc: fix vmalloc users tracking properly mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin gcov: support GCC 7.1 mm, vmstat: Remove spurious WARN() during zoneinfo print time: delete current_fs_time() hwpoison, memcg: forcibly uncharge LRU pages
2017-05-12Tigran has movedAndrew Morton
Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-12Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "Incremental fixes and a small feature addition on top of the main libnvdimm 4.12 pull request: - Geert noticed that tinyconfig was bloated by BLOCK selecting DAX. The size regression is fixed by moving all dax helpers into the dax-core and only specifying "select DAX" for FS_DAX and dax-capable drivers. He also asked for clarification of the NR_DEV_DAX config option which, on closer look, does not need to be a config option at all. Mike also throws in a DEV_DAX_PMEM fixup for good measure. - Ben's attention to detail on -stable patch submissions caught a case where the recent fixes to arch_copy_from_iter_pmem() missed a condition where we strand dirty data in the cache. This is tagged for -stable and will also be included in the rework of the pmem api to a proposed {memcpy,copy_user}_flushcache() interface for 4.13. - Vishal adds a feature that missed the initial pull due to pending review feedback. It allows the kernel to clear media errors when initializing a BTT (atomic sector update driver) instance on a pmem namespace. - Ross noticed that the dax_device + dax_operations conversion broke __dax_zero_page_range(). The nvdimm unit tests fail to check this path, but xfstests immediately trips over it. No excuse for missing this before submitting the 4.12 pull request. These all pass the nvdimm unit tests and an xfstests spot check. The set has received a build success notification from the kbuild robot" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: filesystem-dax: fix broken __dax_zero_page_range() conversion libnvdimm, btt: ensure that initializing metadata clears poison libnvdimm: add an atomic vs process context flag to rw_bytes x86, pmem: Fix cache flushing for iovec write < 8 bytes device-dax: kill NR_DEV_DAX block, dax: move "select DAX" from BLOCK to FS_DAX device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
2017-05-12Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates/fixes from Ingo Molnar: "Mostly tooling updates, but also two kernel fixes: a call chain handling robustness fix and an x86 PMU driver event definition fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/callchain: Force USER_DS when invoking perf_callchain_user() tools build: Fixup sched_getcpu feature test perf tests kmod-path: Don't fail if compressed modules aren't supported perf annotate: Fix AArch64 comment char perf tools: Fix spelling mistakes perf/x86: Fix Broadwell-EP DRAM RAPL events perf config: Refactor a duplicated code for obtaining config file name perf symbols: Allow user probes on versioned symbols perf symbols: Accept symbols starting at address 0 tools lib string: Adopt prefixcmp() from perf and subcmd perf units: Move parse_tag_value() to units.[ch] perf ui gtk: Move gtk .so name to the only place where it is used perf tools: Move HAS_BOOL define to where perl headers are used perf memswap: Split the byteswap memory range wrappers from util.[ch] perf tools: Move event prototypes from util.h to event.h perf buildid: Move prototypes from util.h to build-id.h
2017-05-12Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - two boot crash fixes - unwinder fixes - kexec related kernel direct mappings enhancements/fixes - more Clang support quirks - minor cleanups - Documentation fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix a typo in Documentation x86/build: Don't add -maccumulate-outgoing-args w/o compiler support x86/boot/32: Fix UP boot on Quark and possibly other platforms x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() x86/kexec/64: Use gbpages for identity mappings if available x86/mm: Add support for gbpages to kernel_ident_mapping_init() x86/boot: Declare error() as noreturn x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() x86/microcode/AMD: Remove redundant NULL check on mc
2017-05-12Merge tag 'for-linus-4.12b-rc0c-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "This contains two fixes for booting under Xen introduced during this merge window and two fixes for older problems, where one is just much more probable due to another merge window change" * tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: adjust early dom0 p2m handling to xen hypervisor behavior x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen xen/x86: Do not call xen_init_time_ops() until shared_info is initialized x86/xen: fix xsave capability setting
2017-05-11xen: adjust early dom0 p2m handling to xen hypervisor behaviorJuergen Gross
When booted as pv-guest the p2m list presented by the Xen is already mapped to virtual addresses. In dom0 case the hypervisor might make use of 2M- or 1G-pages for this mapping. Unfortunately while being properly aligned in virtual and machine address space, those pages might not be aligned properly in guest physical address space. So when trying to obtain the guest physical address of such a page pud_pfn() and pmd_pfn() must be avoided as those will mask away guest physical address bits not being zero in this special case. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-11x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under XenJuergen Gross
When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set on AMD cpus. This bug/feature bit is kind of special as it will be used very early when switching threads. Setting the bit and clearing it a little bit later leaves a critical window where things can go wrong. This time window has enlarged a little bit by using setup_clear_cpu_cap() instead of the hypervisor's set_cpu_features callback. It seems this larger window now makes it rather easy to hit the problem. The proper solution is to never set the bit in case of Xen. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>