linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-11-17	KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was ↵	Liran Alon
	reinjected to L2 vmx_check_nested_events() should return -EBUSY only in case there is a pending L1 event which requires a VMExit from L2 to L1 but such a VMExit is currently blocked. Such VMExits are blocked either because nested_run_pending=1 or an event was reinjected to L2. vmx_check_nested_events() should return 0 in case there are no pending L1 events which requires a VMExit from L2 to L1 or if a VMExit from L2 to L1 was done internally. However, upstream commit which introduced blocking in case an event was reinjected to L2 (commit acc9ab601327 ("KVM: nVMX: Fix pending events injection")) contains a bug: It returns -EBUSY even if there are no pending L1 events which requires VMExit from L2 to L1. This commit fix this issue. Fixes: acc9ab601327 ("KVM: nVMX: Fix pending events injection") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: ioapic: Preserve read-only values in the redirection table	Nikita Leshenko
	According to 82093AA (IOAPIC) manual, Remote IRR and Delivery Status are read-only. QEMU implements the bits as RO in commit 479c2a1cb7fb ("ioapic: keep RO bits for IOAPIC entry"). Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered	Nikita Leshenko
	Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for IOAPICs without an EOI register. They simulate the EOI message manually by changing the trigger mode to edge and then back to level, with the entry being masked during this. QEMU implements this feature in commit ed1263c363c9 ("ioapic: clear remote irr bit for edge-triggered interrupts") As a side effect, this commit removes an incorrect behavior where Remote IRR was cleared when the redirection table entry was rewritten. This is not consistent with the manual and also opens an opportunity for a strange behavior when a redirection table entry is modified from an interrupt handler that handles the same entry: The modification will clear the Remote IRR bit even though the interrupt handler is still running. Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq	Nikita Leshenko
	Remote IRR for level-triggered interrupts was previously checked in ioapic_set_irq, but since we now have a check in ioapic_service we can remove the redundant check from ioapic_set_irq. This commit doesn't change semantics. Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: ioapic: Don't fire level irq when Remote IRR set	Nikita Leshenko
	Avoid firing a level-triggered interrupt that has the Remote IRR bit set, because that means that some CPU is already processing it. The Remote IRR bit will be cleared after an EOI and the interrupt will refire if the irq line is still asserted. This behavior is aligned with QEMU's IOAPIC implementation that was introduced by commit f99b86b94987 ("x86: ioapic: ignore level irq during processing") in QEMU. Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race	Nikita Leshenko
	KVM uses ioapic_handled_vectors to track vectors that need to notify the IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an interrupt with old configuration is pending or running and ioapic_handled_vectors only remembers the newest configuration; thus EOI from the old interrupt is not delievered to the IOAPIC. A previous commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race") addressed this issue by adding pending edge-triggered interrupts to ioapic_handled_vectors, fixing this race for edge-triggered interrupts. The commit explicitly ignored level-triggered interrupts, but this race applies to them as well: 1) IOAPIC sends a level triggered interrupt vector to VCPU0 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC to route the vector to VCPU1. The reconfiguration rewrites only the upper 32 bits of the IOREDTBLn register. (Causes KVM to update ioapic_handled_vectors for VCPU0 and it no longer includes the vector.) 3) VCPU0 sends EOI for the vector, but it's not delievered to the IOAPIC because the ioapic_handled_vectors doesn't include the vector. 4) New interrupts are not delievered to VCPU1 because remote_irr bit is set forever. Therefore, the correct behavior is to add all pending and running interrupts to ioapic_handled_vectors. This commit introduces a slight performance hit similar to commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race") for the rare case that the vector is reused by a non-IOAPIC source on VCPU0. We prefer to keep solution simple and not handle this case just as the original commit does. Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race") Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: inject exceptions produced by x86_decode_insn	Paolo Bonzini
	Sometimes, a processor might execute an instruction while another processor is updating the page tables for that instruction's code page, but before the TLB shootdown completes. The interesting case happens if the page is in the TLB. In general, the processor will succeed in executing the instruction and nothing bad happens. However, what if the instruction is an MMIO access? If that happens, KVM invokes the emulator, and the emulator gets the updated page tables. If the update side had marked the code page as non present, the page table walk then will fail and so will x86_decode_insn. Unfortunately, even though kvm_fetch_guest_virt is correctly returning X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as a fatal error if the instruction cannot simply be reexecuted (as is the case for MMIO). And this in fact happened sometimes when rebooting Windows 2012r2 guests. Just checking ctxt->have_exception and injecting the exception if true is enough to fix the case. Thanks to Eduardo Habkost for helping in the debugging of this issue. Reported-by: Yanan Fu <yfu@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs	Eyal Moscovici
	Some guests use these unhandled MSRs very frequently. This cause dmesg to be populated with lots of aggregated messages on usage of ignored MSRs. As ignore_msrs=true means that the user is well-aware his guest use ignored MSRs, allow to also disable the prints on their usage. An example of such guest is ESXi which tends to access a lot to MSR 0x34 (MSR_SMI_COUNT) very frequently. In addition, we have observed this to cause unnecessary delays to guest execution. Such an example is ESXi which experience networking delays in it's guests (L2 guests) because of these prints (even when prints are rate-limited). This can easily be reproduced by pinging from one L2 guest to another. Once in a while, a peak in ping RTT will be observed. Removing these unhandled MSR prints solves the issue. Because these prints can help diagnose issues with guests, this commit only suppress them by a module parameter instead of removing them from code entirely. Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [Changed suppress_ignore_msrs_prints to report_ignored_msrs - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: fix em_fxstor() sleeping while in atomic	David Hildenbrand
	Commit 9d643f63128b ("KVM: x86: avoid large stack allocations in em_fxrstor") optimize the stack size, but introduced a guest memory access which might sleep while in atomic. Fix it by introducing, again, a second fxregs_state. Try to avoid large stacks by using noinline. Add some helpful comments. Reported by syzbot: in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109 2 locks held by syzkaller879109/2909: #0: (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] vcpu_load+0x1c/0x70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:154 #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest arch/x86/kvm/x86.c:6983 [inline] #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run arch/x86/kvm/x86.c:7061 [inline] #1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222 CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014 __might_sleep+0x95/0x190 kernel/sched/core.c:5967 __might_fault+0xab/0x1d0 mm/memory.c:4383 __copy_from_user include/linux/uaccess.h:71 [inline] __kvm_read_guest_page+0x58/0xa0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771 kvm_vcpu_read_guest_page+0x44/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791 kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407 kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466 segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819 em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022 x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471 x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698 kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854 handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400 vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718 vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline] vcpu_run arch/x86/kvm/x86.c:7061 [inline] kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222 kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x437fc9 RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9 RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000 R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000 Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in em_fxrstor") Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure	Wanpeng Li
	Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls in L1) null pointer deference and also L0 calltrace when EPT=0 on both L0 and L1. In L1: BUG: unable to handle kernel paging request at ffffffffc015bf8f IP: vmx_vcpu_run+0x202/0x510 [kvm_intel] PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161 Oops: 0003 [#1] PREEMPT SMP CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6 RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel] Call Trace: WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002 In L0: -----------[ cut here ]------------ WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel] CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc7+ #25 RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel] Call Trace: paging64_page_fault+0x500/0xde0 [kvm] ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm] ? nonpaging_page_fault+0x3b0/0x3b0 [kvm] ? __asan_storeN+0x12/0x20 ? paging64_gva_to_gpa+0xb0/0x120 [kvm] ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm] ? lock_acquire+0x2c0/0x2c0 ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel] ? vmx_get_segment+0x2a6/0x310 [kvm_intel] ? sched_clock+0x1f/0x30 ? check_chain_key+0x137/0x1e0 ? __lock_acquire+0x83c/0x2420 ? kvm_multiple_exception+0xf2/0x220 [kvm] ? debug_check_no_locks_freed+0x240/0x240 ? debug_smp_processor_id+0x17/0x20 ? __lock_is_held+0x9e/0x100 kvm_mmu_page_fault+0x90/0x180 [kvm] kvm_handle_page_fault+0x15c/0x310 [kvm] ? __lock_is_held+0x9e/0x100 handle_exception+0x3c7/0x4d0 [kvm_intel] vmx_handle_exit+0x103/0x1010 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm] The commit avoids to load host state of vmcs12 as vmcs01's guest state since vmcs12 is not modified (except for the VM-instruction error field) if the checking of vmcs control area fails. However, the mmu context is switched to nested mmu in prepare_vmcs02() and it will not be reloaded since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME fails. This patch fixes it by reloading mmu context when nested VMLAUNCH/VMRESUME fails. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry	Wanpeng Li
	According to the SDM, if the "load IA32_BNDCFGS" VM-entry controls is 1, the following checks are performed on the field for the IA32_BNDCFGS MSR: - Bits reserved in the IA32_BNDCFGS MSR must be 0. - The linear address in bits 63:12 must be canonical. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: X86: Fix operand/address-size during instruction decoding	Wanpeng Li
	Pedro reported: During tests that we conducted on KVM, we noticed that executing a "PUSH %ES" instruction under KVM produces different results on both memory and the SP register depending on whether EPT support is enabled. With EPT the SP is reduced by 4 bytes (and the written value is 0-padded) but without EPT support it is only reduced by 2 bytes. The difference can be observed when the CS.DB field is 1 (32-bit) but not when it's 0 (16-bit). The internal segment descriptor cache exist even in real/vm8096 mode. The CS.D also should be respected instead of just default operand/address-size/66H prefix/67H prefix during instruction decoding. This patch fixes it by also adjusting operand/address-size according to CS.D. Reported-by: Pedro Fonseca <pfonseca@cs.washington.edu> Tested-by: Pedro Fonseca <pfonseca@cs.washington.edu> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Pedro Fonseca <pfonseca@cs.washington.edu> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: Don't re-execute instruction when not passing CR2 value	Liran Alon
	In case of instruction-decode failure or emulation failure, x86_emulate_instruction() will call reexecute_instruction() which will attempt to use the cr2 value passed to x86_emulate_instruction(). However, when x86_emulate_instruction() is called from emulate_instruction(), cr2 is not passed (passed as 0) and therefore it doesn't make sense to execute reexecute_instruction() logic at all. Fixes: 51d8b66199e9 ("KVM: cleanup emulate_instruction") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure	Liran Alon
	On this case, handle_emulation_failure() fills kvm_run with internal-error information which it expects to be delivered to user-mode for further processing. However, the code reports a wrong return-value which makes KVM to never return to user-mode on this scenario. Fixes: 6d77dbfc88e3 ("KVM: inject #UD if instruction emulation fails and exit to userspace") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: Exit to user-mode on #UD intercept when emulator requires	Liran Alon
	Instruction emulation after trapping a #UD exception can result in an MMIO access, for example when emulating a MOVBE on a processor that doesn't support the instruction. In this case, the #UD vmexit handler must exit to user mode, but there wasn't any code to do so. Add it for both VMX and SVM. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: nVMX/nSVM: Don't intercept #UD when running L2	Liran Alon
	When running L2, #UD should be intercepted by L1 or just forwarded directly to L2. It should not reach L0 x86 emulator. Therefore, set intercept for #UD only based on L1 exception-bitmap. Also add WARN_ON_ONCE() on L0 #UD intercept handlers to make sure it is never reached while running L2. This improves commit ae1f57670703 ("KVM: nVMX: Do not emulate #UD while in guest mode") by removing an unnecessary exit from L2 to L0 on #UD when L1 doesn't intercept it. In addition, SVM L0 #UD intercept handler doesn't handle correctly the case it is raised from L2. In this case, it should forward the #UD to guest instead of x86 emulator. As done in VMX #UD intercept handler. This commit fixes this issue as-well. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk	Liran Alon
	When guest passes KVM it's pvclock-page GPA via WRMSR to MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM don't initialize pvclock-page to some start-values. It just requests a clock-update which will happen before entering to guest. The clock-update logic will call kvm_setup_pvclock_page() to update the pvclock-page with info. However, kvm_setup_pvclock_page() wrongly assumes that the version-field is initialized to an even number. This is wrong because at first-time write, field could be any-value. Fix simply makes sure that if first-time version-field is odd, increment it once more to make it even and only then start standard logic. This follows same logic as done in other pvclock shared-pages (See kvm_write_wall_clock() and record_steal_time()). Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	kvm: vmx: Allow disabling virtual NMI support	Paolo Bonzini
	To simplify testing of these rarely used code paths, add a module parameter that turns it on. One eventinj.flat test (NMI after iret) fails when loading kvm_intel with vnmi=0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	kvm: vmx: Reinstate support for CPUs without virtual NMI	Paolo Bonzini
	This is more or less a revert of commit 2c82878b0cb3 ("KVM: VMX: require virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines only had virtual NMIs in some SKUs. The revert is not trivial because in the meanwhile there have been several fixes to nested NMI injection. Therefore, the entire vNMI state is moved to struct loaded_vmcs. Another change compared to before the patch is a simplification here: if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked && !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis( get_vmcs12(vcpu))))) { The final condition here is always true (because nested_cpu_has_virtual_nmis is always false) and is removed. Fixes: 2c82878b0cb38fd516fd612c67852a6bbf282003 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	KVM: SVM: obey guest PAT	Paolo Bonzini
	For many years some users of assigned devices have reported worse performance on AMD processors with NPT than on AMD without NPT, Intel or bare metal. The reason turned out to be that SVM is discarding the guest PAT setting and uses the default (PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-, PA3=UC). The guest might be using a different setting, and especially might want write combining but isn't getting it (instead getting slow UC or UC- accesses). Thanks a lot to geoff@hostfission.com for noticing the relation to the g_pat setting. The patch has been tested also by a bunch of people on VFIO users forums. Fixes: 709ddebf81cb40e3c36c6109a7892e8b93a09464 Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196409 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Nick Sarnie <commendsarnex@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-11-17	Merge tag 'kvm-arm-gicv4-for-v4.15' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD GICv4 Support for KVM/ARM for v4.15
2017-11-17	ALSA: hda: Fix too short HDMI/DP chmap reporting	Takashi Iwai
	We got a regression report about the HD-audio HDMI chmap, where some surround channels are reported as UNKNOWN. The git bisection pointed the culprit at the commit 9b3dc8aa3fb1 ("ALSA: hda - Register chmap obj as priv data instead of codec"). The story behind scene is like this: - While moving the code out of the legacy HDA to the HDA common place, the patch modifies the code to obtain the chmap array indirectly in a byte array, and it expands it to kctl value array. - At the latter operation, the size of the array is wrongly passed by sizeof() to the pointer. - It can be 4 on 32bit arch, thus too short for 6+ channels. (And that's the reason why it didn't hit other persons; it's 8 on 64bit arch, thus it's usually enough.) The code was further changed meanwhile, but the problem persisted. Let's fix it by correctly evaluating the array size. Fixes: 9b3dc8aa3fb1 ("ALSA: hda - Register chmap obj as priv data instead of codec") Reported-by: VDR User <user.vdr@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-17	ALSA: usb-audio: uac1: Invalidate ctl on interrupt	Julian Scheel
	When an interrupt occurs, the value of at least one of the belonging controls should have changed. To make sure they get re-read from device on the next read, invalidate the cache. This was correctly implemented for uac2 already, but missing for uac1. Signed-off-by: Julian Scheel <julian@jusst.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-17	afs: Fix file locking	David Howells
	Fix the AFS file locking whereby the use of the big kernel lock (which could be slept with) was replaced by a spinlock (which couldn't). The problem is that the AFS code was doing stuff inside the critical section that might call schedule(), so this is a broken transformation. Fix this by the following means: (1) Use a state machine with a proper state that can only be changed under the spinlock rather than using a collection of bit flags. (2) Cache the key used for the lock and the lock type in the afs_vnode struct so that the manager work function doesn't have to refer to a file_lock struct that's been dequeued. This makes signal handling safer. (4) Move the unlock from afs_do_unlk() to afs_fl_release_private() which means that unlock is achieved in other circumstances too. (5) Unlock the file on the server before taking the next conflicting lock. Also change: (1) Check the permits on a file before actually trying the lock. (2) fsync the file before effecting an explicit unlock operation. We don't fsync if the lock is erased otherwise as we might not be in a context where we can actually do that. Further fixes: (1) Fixed-fileserver address rotation is made to work. It's only used by the locking functions, so couldn't be tested before. Fixes: 72f98e72551f ("locks: turn lock_flocks into a spinlock") Signed-off-by: David Howells <dhowells@redhat.com> cc: jlayton@redhat.com
2017-11-17	Merge branch 'nfp-flower-fixes-and-typo-in-ethtool-stats-name'	David S. Miller
	Jakub Kicinski says: ==================== nfp: flower fixes and typo in ethtool stats name This set comes from the flower offload team. From Pieter we have a fix to the semantics of the flag telling FW whether to allocate or free a mask and correction of a typo in name of one of the MAC statistics (reveive -> received, we use past participle to match HW docs). Dirk fixes propagation of max MTU to representors. John improves VXLAN offload. The old code was not using egress_dev at all, so Jiri missed it in his conversion. The validation of ingress port is still not perfect, we will have to wait for shared block dust to settle to tackle it. This is how John explains the cases: The following example rule is now correctly offloaded in net-next kernel: tc filter add dev vxlan0 ... enc_dst_port 4789 ... skip_sw \ action redirect dev nfp_p0 The following rule will not be offloaded to the NFP (previously it incorrectly matched vxlan packets - it shouldn't as ingress dev is not a vxlan netdev): tc filter add dev nfp_p0 ... enc_dst_port 4789 ... skip_sw \ action redirect dev nfp_p0 Rules that are not matching on tunnels and are an egress offload are rejected. The standard match code assumes the offloaded repr is the ingress port. Rejecting egress offloads removes the chances of false interpretation of the rules on the NFP. A know issue is that the following rule example could still be offloaded and incorrectly match tunnel data: tc filter add dev dummy ... enc_dst_port 4789 ... skip_sw \ action redirect dev nfp_p0 Because the egress register callback does not give information on the ingress netdev, the patch assumes that if it is not a repr then it is the correct tunnel netdev. This may not be the case. The chances of this happening is reduced as it is enforced that the rule match on the well known vxlan port but it is still possible. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-17	nfp: remove false positive offloads in flower vxlan	John Hurley
	Pass information to the match offload on whether or not the repr is the ingress or egress dev. Only accept tunnel matches if repr is the egress dev. This means rules such as the following are successfully offloaded: tc .. add dev vxlan0 .. enc_dst_port 4789 .. action redirect dev nfp_p0 While rules such as the following are rejected: tc .. add dev nfp_p0 .. enc_dst_port 4789 .. action redirect dev vxlan0 Also reject non tunnel flows that are offloaded to an egress dev. Non tunnel matches assume that the offload dev is the ingress port and offload a match accordingly. Fixes: 611aec101ab7 ("nfp: compile flower vxlan tunnel metadata match fields") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-17	nfp: register flower reprs for egress dev offload	John Hurley
	Register a callback for offloading flows that have a repr as their egress device. The new egdev_register function is added to net-next for the 4.15 release. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-17	nfp: inherit the max_mtu from the PF netdev	Dirk van der Merwe
	The PF netdev is used for data transfer for reprs, so reprs inherit the maximum MTU settings of the PF netdev. Fixes: 5de73ee46704 ("nfp: general representor implementation") Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-17	nfp: fix vlan receive MAC statistics typo	Pieter Jansen van Vuuren
	Correct typo in vlan receive MAC stats. Previously the MAC statistics reported in ethtool for vlan receive contained a typo resulting in ethtool reporting rx_vlan_reveive_ok instead of rx_vlan_received_ok. Fixes: a5950182c00e ("nfp: map mac_stats and vf_cfg BARs") Fixes: 098ce840c9ef ("nfp: report MAC statistics in ethtool") Reported-by: Brendan Galloway <brendan.galloway@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-17	nfp: fix flower offload metadata flag usage	Pieter Jansen van Vuuren
	Hardware has no notion of new or last mask id, instead it makes use of the message type (i.e. add flow or del flow) in combination with a single bit in metadata flags to determine when to add or delete a mask id. Previously we made use of the new or last flags to indicate that a new mask should be allocated or deallocated, respectively. This incorrect behaviour is fixed by making use single bit in metadata flags to indicate mask allocation or deallocation. Fixes: 43f84b72c50d ("nfp: add metadata to each flow offload") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-17	virto_net: remove empty file 'virtio_net.'	Joel Stanley
	Looks like this was mistakenly added to the tree as part of commit 186b3c998c50 ("virtio-net: support XDP_REDIRECT"). Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-16	platform/x86: dell-laptop: Allocate buffer before rfkill use	Mario Limonciello
	On machines using rfkill interface the buffer needs to have been allocated before the initial use (memset) of it. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2017-11-16	platform/x86: dell-wmi: Relay failed initial probe to dependent drivers	Mario Limonciello
	dell-wmi and dell-smbios-wmi are dependent upon dell-wmi-descriptor finishing probe successfully to probe themselves. Currently if dell-wmi-descriptor fails probing in a non-recoverable way (such as invalid header) dell-wmi and dell-smbios-wmi will continue to try to redo probing due to deferred probing. To solve this have the dependent drivers query the dell-wmi-descriptor driver whether the descriptor has been determined valid. The possible results are: -ENODEV: Descriptor GUID missing from WMI bus -EPROBE_DEFER: Descriptor not yet probed, dependent driver should wait and use deferred probing < 0: Descriptor probed, invalid. Dependent driver should return an error. 0: Successful descriptor probe, dependent driver can continue Successful descriptor probe still doesn't mean that the descriptor driver is necessarily bound at the time of initialization of dependent driver. Userspace can unbind the driver, so all methods used from driver should still be verified to return success values otherwise deferred probing be used. Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2017-11-16	platform/x86: dell-wmi-descriptor: check if memory was allocated	Mario Limonciello
	devm_kzalloc will return NULL pointer if no memory was allocated. This should be checked. This problem also existed when the driver was dell-wmi.c. Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2017-11-16	Merge tag 'armsoc-drivers' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "This branch contains platform-related driver updates for ARM and ARM64, these are the areas that bring the changes: New drivers: - driver support for Renesas R-Car V3M (R8A77970) - power management support for Amlogic GX - a new driver for the Tegra BPMP thermal sensor - a new bus driver for Technologic Systems NBUS Changes for subsystems that prefer to merge through arm-soc: - the usual updates for reset controller drivers from Philipp Zabel, with five added drivers for SoCs in the arc, meson, socfpa, uniphier and mediatek families - updates to the ARM SCPI and PSCI frameworks, from Sudeep Holla, Heiner Kallweit and Lorenzo Pieralisi Changes specific to some ARM-based SoC - the Freescale/NXP DPAA QBMan drivers from PowerPC can now work on ARM as well - several changes for power management on Broadcom SoCs - various improvements on Qualcomm, Broadcom, Amlogic, Atmel, Mediatek - minor Cleanups for Samsung, TI OMAP SoCs" [ NOTE! This doesn't work without the previous ARM SoC device-tree pull, because the R8A77970 driver is missing a header file that came from that pull. The fact that this got merged afterwards only fixes it at this point, and bisection of that driver will fail if/when you walk into the history of that driver. - Linus ] * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (96 commits) soc: amlogic: meson-gx-pwrc-vpu: fix power-off when powered by bootloader bus: add driver for the Technologic Systems NBUS memory: omap-gpmc: Remove deprecated gpmc_update_nand_reg() soc: qcom: remove unused label soc: amlogic: gx pm domain: add PM and OF dependencies drivers/firmware: psci_checker: Add missing destroy_timer_on_stack() dt-bindings: power: add amlogic meson power domain bindings soc: amlogic: add Meson GX VPU Domains driver soc: qcom: Remote filesystem memory driver dt-binding: soc: qcom: Add binding for rmtfs memory of: reserved_mem: Accessor for acquiring reserved_mem of/platform: Generalize /reserved-memory handling soc: mediatek: pwrap: fix fatal compiler error soc: mediatek: pwrap: fix compiler errors arm64: mediatek: cleanup message for platform selection soc: Allow test-building of MediaTek drivers soc: mediatek: place Kconfig for all SoC drivers under menu soc: mediatek: pwrap: add support for MT7622 SoC soc: mediatek: pwrap: add common way for setup CS timing extenstion soc: mediatek: pwrap: add MediaTek MT6380 as one slave of pwrap ..
2017-11-16	Merge tag 'armsoc-dt' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM device-tree updates from Arnd Bergmann: "We add device tree files for a couple of additional SoCs in various areas: Allwinner R40/V40 for entertainment, Broadcom Hurricane 2 for networking, Amlogic A113D for audio, and Renesas R-Car V3M for automotive. As usual, lots of new boards get added based on those and other SoCs: - Actions S500 based CubieBoard6 single-board computer - Amlogic Meson-AXG A113D based development board - Amlogic S912 based Khadas VIM2 single-board computer - Amlogic S912 based Tronsmart Vega S96 set-top-box - Allwinner H5 based NanoPi NEO Plus2 single-board computer - Allwinner R40 based Banana Pi M2 Ultra and Berry single-board computers - Allwinner A83T based TBS A711 Tablet - Broadcom Hurricane 2 based Ubiquiti UniFi Switch 8 - Broadcom bcm47xx based Luxul XAP-1440/XAP-810/ABR-4500/XBR-4500 wireless access points and routers - NXP i.MX51 based Zodiac Inflight Innovations RDU1 board - NXP i.MX53 based GE Healthcare PPD biometric monitor - NXP i.MX6 based Pistachio single-board computer - NXP i.MX6 based Vining-2000 automotive diagnostic interface - NXP i.MX6 based Ka-Ro TX6 Computer-on-Module in additional variants - Qualcomm MSM8974 (Snapdragon 800) based Fairphone 2 phone - Qualcomm MSM8974pro (Snapdragon 801) based Sony Xperia Z2 Tablet - Realtek RTD1295 based set-top-boxes MeLE V9 and PROBOX2 AVA - Renesas R-Car V3M (R8A77970) SoC and "Eagle" reference board - Renesas H3ULCB and M3ULCB "Kingfisher" extension infotainment boards - Renasas r8a7745 based iWave G22D-SODIMM SoM - Rockchip rk3288 based Amarula Vyasa single-board computer - Samsung Exynos5800 based Odroid HC1 single-board computer For existing SoC support, there was a lot of ongoing work, as usual most of that concentrated on the Renesas, Rockchip, OMAP, i.MX, Amlogic and Allwinner platforms, but others were also active. Rob Herring and many others worked on reducing the number of issues that the latest version of 'dtc' now warns about. Unfortunately there is still a lot left to do. A rework of the ARM foundation model introduced several new files for common variations of the model" * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (599 commits) arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3 dt-bindings: bus: Add documentation for the Technologic Systems NBUS arm64: dts: actions: s900-bubblegum-96: Add fake uart5 clock ARM: dts: owl-s500: Add CubieBoard6 dt-bindings: arm: actions: Add CubieBoard6 ARM: dts: owl-s500-guitar-bb-rev-b: Add fake uart3 clock ARM: dts: owl-s500: Set power domains for CPU2 and CPU3 arm: dts: mt7623: remove unused compatible string for pio node arm: dts: mt7623: update usb related nodes arm: dts: mt7623: update crypto node ARM: dts: sun8i: a711: Enable USB OTG ARM: dts: sun8i: a711: Add regulator support ARM: dts: sun8i: a83t: bananapi-m3: Enable AP6212 WiFi on mmc1 ARM: dts: sun8i: a83t: cubietruck-plus: Enable AP6330 WiFi on mmc1 ARM: dts: sun8i: a83t: Move mmc1 pinctrl setting to dtsi file ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator nodes ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes ARM: dts: sunxi: Add dtsi for AXP81x PMIC arm64: dts: allwinner: H5: Restore EMAC changes ...
2017-11-16	scsi: Use 'blist_flags_t' for scsi_devinfo flags	Hannes Reinecke
	As per recommendation from Linus we should be using a distinct type for blacklist flags. [mkp: was cut against an older kernel, applied by hand] Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-16	Merge tag 'armsoc-soc' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform updates from Arnd Bergmann: "Most of the commits are for defconfig changes, to enable newly added drivers or features that people have started using. For the changed lines lines, we have mostly cleanups, the affected platforms are OMAP, Versatile, EP93xx, Samsung, Broadcom, i.MX, and Actions. The largest single change is the introduction of the TI "sysc" bus driver, with the intention of cleaning up more legacy code. Two new SoC platforms get added this time: - Allwinner R40 is a modernized version of the A20 chip, now with a Quad-Core ARM Cortex-A7. According to the manufacturer, it is intended for "Smart Hardware" - Broadcom Hurricane 2 (Aka Strataconnect BCM5334X) is a family of chips meant for managed gigabit ethernet switches, based around a Cortex-A9 CPU. Finally, we gain SMP support for two platforms: Renesas R-Car E2 and Amlogic Meson8/8b, which were previously added but only supported uniprocessor operation" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (118 commits) ARM: multi_v7_defconfig: Select RPMSG_VIRTIO as module ARM: multi_v7_defconfig: enable CONFIG_GPIO_UNIPHIER arm64: defconfig: enable CONFIG_GPIO_UNIPHIER ARM: meson: enable MESON_IRQ_GPIO in Kconfig for meson8b ARM: meson: Add SMP bringup code for Meson8 and Meson8b ARM: smp_scu: allow the platform code to read the SCU CPU status ARM: smp_scu: add a helper for powering on a specific CPU dt-bindings: Amlogic: Add Meson8 and Meson8b SMP related documentation ARM: OMAP3: Delete an unnecessary variable initialisation in omap3xxx_hwmod_init() ARM: OMAP3: Use common error handling code in omap3xxx_hwmod_init() ARM: defconfig: select the right SX150X driver arm64: defconfig: Enable QCOM_IOMMU arm64: Add ThunderX drivers to defconfig arm64: defconfig: Enable Tegra PCI controller cpufreq: imx6q: Move speed grading check to cpufreq driver arm64: defconfig: re-enable Qualcomm DB410c USB ARM: configs: stm32: Add MDMA support in STM32 defconfig ARM: imx: Enable cpuidle for i.MX6DL starting at 1.1 bus: ti-sysc: Fix unbalanced pm_runtime_enable by adding remove bus: ti-sysc: mark PM functions as __maybe_unused ...
2017-11-16	PM / runtime: Drop children check from __pm_runtime_set_status()	Rafael J. Wysocki
	The check for "active" children in __pm_runtime_set_status(), when trying to set the parent device status to "suspended", doesn't really make sense, because in fact it is not invalid to set the status of a device with runtime PM disabled to "suspended" in any case. It is invalid to enable runtime PM for a device with its status set to "suspended" while its child_count reference counter is nonzero, but the check in __pm_runtime_set_status() doesn't really cover that situation. For this reason, drop the children check from __pm_runtime_set_status() and add a check against child_count reference counters of "suspended" devices to pm_runtime_enable(). Fixes: a8636c89648a (PM / Runtime: Don't allow to suspend a device with an active child) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Johan Hovold <johan@kernel.org>
2017-11-16	dm bufio: fix integer overflow when limiting maximum cache size	Eric Biggers
	The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: 95d402f057f2 ("dm: add bufio") Cc: <stable@vger.kernel.org> # v3.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16	dm: clear all discard attributes in queue_limits when discards are disabled	Mike Snitzer
	Otherwise, it can happen that the QUEUE_FLAG_DISCARD isn't set but the various discard attributes (which get exposed via sysfs) may be set. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16	dm: do not set 'discards_supported' in targets that do not need it	Mike Snitzer
	The DM target's 'discards_supported' flag is intended to act as an override. Meaning, even if the underlying storage doesn't support discards the DM target will. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16	dm: discard support requires all targets in a table support discards	Mike Snitzer
	A DM device with a mix of discard capabilities (due to some underlying devices not having discard support) _should_ just return -EOPNOTSUPP for the region of the device that doesn't support discards (even if only by way of the underlying driver formally not supporting discards). BUT, that does ask the underlying driver to handle something that it never advertised support for. In doing so we're exposing users to the potential for a underlying disk driver hanging if/when a discard is issued a the device that is incapable and never claimed to support discards. Fix this by requiring that each DM target in a DM table provide discard support as a prereq for a DM device to advertise support for discards. This may cause some configurations that were happily supporting discards (even in the face of a mix of discard support) to stop supporting discards -- but the risk of users hitting driver hangs, and forced reboots, outweighs supporting those fringe mixed discard configurations. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16	dm mpath: remove annoying message of 'blk_get_request() returned -11'	Ming Lei
	It is very normal to see allocation failure, especially with blk-mq request_queues, so it's unnecessary to report this error and annoy people. In practice this 'blk_get_request() returned -11' error gets logged quite frequently when a blk-mq DM multipath device sees heavy IO. This change is marked for stable@ because the annoying message in question was included in stable@ commit 7083abbbf. Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-16	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio updates from Michael Tsirkin: "Fixes in qemu, vhost and virtio" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: fw_cfg: fix the command line module name vhost/vsock: fix uninitialized vhost_vsock->guest_cid vhost: fix end of range for access_ok vhost/scsi: Use safe iteration in vhost_scsi_complete_cmd_work() virtio_balloon: fix deadlock on OOM
2017-11-16	Merge tag 'for-linus-4.15-rc1-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Xen features and fixes for v4.15-rc1 Apart from several small fixes it contains the following features: - a series by Joao Martins to add vdso support of the pv clock interface - a series by Juergen Gross to add support for Xen pv guests to be able to run on 5 level paging hosts - a series by Stefano Stabellini adding the Xen pvcalls frontend driver using a paravirtualized socket interface" * tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits) xen/pvcalls: fix potential endless loop in pvcalls-front.c xen/pvcalls: Add MODULE_LICENSE() MAINTAINERS: xen, kvm: track pvclock-abi.h changes x86/xen/time: setup vcpu 0 time info page x86/xen/time: set pvclock flags on xen_time_init() x86/pvclock: add setter for pvclock_pvti_cpu0_va ptp_kvm: probe for kvm guest availability xen/privcmd: remove unused variable pageidx xen: select grant interface version xen: update arch/x86/include/asm/xen/cpuid.h xen: add grant interface version dependent constants to gnttab_ops xen: limit grant v2 interface to the v1 functionality xen: re-introduce support for grant v2 interface xen: support priv-mapping in an HVM tools domain xen/pvcalls: remove redundant check for irq >= 0 xen/pvcalls: fix unsigned less than zero error check xen/time: Return -ENODEV from xen_get_wallclock() xen/pvcalls-front: mark expected switch fall-through xen: xenbus_probe_frontend: mark expected switch fall-throughs xen/time: do not decrease steal time after live migration on xen ...
2017-11-16	Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.15 Common: - Python 3 support in kvm_stat - Accounting of slabs to kmemcg ARM: - Optimized arch timer handling for KVM/ARM - Improvements to the VGIC ITS code and introduction of an ITS reset ioctl - Unification of the 32-bit fault injection logic - More exact external abort matching logic PPC: - Support for running hashed page table (HPT) MMU mode on a host that is using the radix MMU mode; single threaded mode on POWER 9 is added as a pre-requisite - Resolution of merge conflicts with the last second 4.14 HPT fixes - Fixes and cleanups s390: - Some initial preparation patches for exitless interrupts and crypto - New capability for AIS migration - Fixes x86: - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and after-reset state - Refined dependencies for VMX features - Fixes for nested SMI injection - A lot of cleanups" * tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits) KVM: s390: provide a capability for AIS state migration KVM: s390: clear_io_irq() requests are not expected for adapter interrupts KVM: s390: abstract conversion between isc and enum irq_types KVM: s390: vsie: use common code functions for pinning KVM: s390: SIE considerations for AP Queue virtualization KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup KVM: PPC: Book3S HV: Cosmetic post-merge cleanups KVM: arm/arm64: fix the incompatible matching for external abort KVM: arm/arm64: Unify 32bit fault injection KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared KVM: arm/arm64: vgic-its: New helper functions to free the caches KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device arm/arm64: KVM: Load the timer state when enabling the timer KVM: arm/arm64: Rework kvm_timer_should_fire KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit KVM: arm/arm64: Move phys_timer_emulate function KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps ...
2017-11-16	RDMA: Add Jason Gunthorpe as a co-maintainer	Jason Gunthorpe
	As was discussed in September and October, add Jason along with Doug to have a team maintainership model for the RDMA subystem. Mellanox Technologies will be funding Jason's independent work on the maintainership. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-16	Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm	Linus Torvalds
	Pull ARM updates from Russell King: - add support for ELF fdpic binaries on both MMU and noMMU platforms - linker script cleanups - support for compressed .data section for XIP images - discard memblock arrays when possible - various cleanups - atomic DMA pool updates - better diagnostics of missing/corrupt device tree - export information to allow userspace kexec tool to place images more inteligently, so that the device tree isn't overwritten by the booting kernel - make early_printk more efficient on semihosted systems - noMMU cleanups - SA1111 PCMCIA update in preparation for further cleanups * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (38 commits) ARM: 8719/1: NOMMU: work around maybe-uninitialized warning ARM: 8717/2: debug printch/printascii: translate '\n' to "\r\n" not "\n\r" ARM: 8713/1: NOMMU: Support MPU in XIP configuration ARM: 8712/1: NOMMU: Use more MPU regions to cover memory ARM: 8711/1: V7M: Add support for MPU to M-class ARM: 8710/1: Kconfig: Kill CONFIG_VECTORS_BASE ARM: 8709/1: NOMMU: Disallow MPU for XIP ARM: 8708/1: NOMMU: Rework MPU to be mostly done in C ARM: 8707/1: NOMMU: Update MPU accessors to use cp15 helpers ARM: 8706/1: NOMMU: Move out MPU setup in separate module ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel() ARM: 8705/1: early_printk: use printascii() rather than printch() ARM: 8703/1: debug.S: move hexbuf to a writable section ARM: add additional table to compressed kernel ARM: decompressor: fix BSS size calculation pcmcia: sa1111: remove special sa1111 mmio accessors pcmcia: sa1111: use sa1111_get_irq() to obtain IRQ resources ARM: better diagnostics with missing/corrupt dtb ARM: 8699/1: dma-mapping: Remove init_dma_coherent_pool_size() ARM: 8698/1: dma-mapping: Mark atomic_pool as __ro_after_init ..
2017-11-16	Merge tag 'powerpc-4.15-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "A bit of a small release, I suspect in part due to me travelling for KS. But my backlog of patches to review is smaller than usual, so I think in part folks just didn't send as much this cycle. Non-highlights: - Five fixes for the >128T address space handling, both to fix bugs in our implementation and to bring the semantics exactly into line with x86. Highlights: - Support for a new OPAL call on bare metal machines which gives us a true NMI (ie. is not masked by MSR[EE]=0) for debugging etc. - Support for Power9 DD2 in the CXL driver. - Improvements to machine check handling so that uncorrectable errors can be reported into the generic memory_failure() machinery. - Some fixes and improvements for VPHN, which is used under PowerVM to notify the Linux partition of topology changes. - Plumbing to enable TM (transactional memory) without suspend on some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND). - Support for emulating vector loads form cache-inhibited memory, on some Power9 revisions. - Disable the fast-endian switch "syscall" by default (behind a CONFIG), we believe it has never had any users. - A major rework of the API drivers use when initiating and waiting for long running operations performed by OPAL firmware, and changes to the powernv_flash driver to use the new API. - Several fixes for the handling of FP/VMX/VSX while processes are using transactional memory. - Optimisations of TLB range flushes when using the radix MMU on Power9. - Improvements to the VAS facility used to access coprocessors on Power9, and related improvements to the way the NX crypto driver handles requests. - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit. Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A. Kennington III" * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits) powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature powerpc/64s: Fix masking of SRR1 bits on instruction fault powerpc/64s: mm_context.addr_limit is only used on hash powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary powerpc/64s/hash: Fix fork() with 512TB process address space powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Fix 512T hint detection to use >= 128T powerpc: Fix DABR match on hash based systems powerpc/signal: Properly handle return value from uprobe_deny_signal() powerpc/fadump: use kstrtoint to handle sysfs store powerpc/lib: Implement UACCESS_FLUSHCACHE API powerpc/lib: Implement PMEM API powerpc/powernv/npu: Don't explicitly flush nmmu tlb powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm() powerpc/powernv/idle: Round up latency and residency values powerpc/kprobes: refactor kprobe_lookup_name for safer string operations powerpc/kprobes: Blacklist emulate_update_regs() from kprobes powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace powerpc/kprobes: Disable preemption before invoking probe handler for optprobes ...