summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-02-13sched/core: Fix DEBUG_SPINLOCK annotation for rq->lockPeter Zijlstra
Mark noticed that he had sporadic "spinlock recursion" warnings from the DEBUG_SPINLOCK code. Now rq->lock is special in that the owner changes in the middle of a context switch. It so happens that we fix up the lock.owner too late, @prev can run (remotely) the moment prev->on_cpu is cleared, this then allows @prev to again try and acquire this rq->lock and trigger this warning. So we have to switch lock.owner before clearing prev->on_cpu. Do this by moving the DEBUG_SPINLOCK annotation from after switch_to() to before switch_to() and collect all lockdep annotations there into prepare_lock_switch() to mirror the existing finish_lock_switch(). Debugged-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13sched/rt: Make update_curr_rt() more accurateWen Yang
rq->clock_task may be updated between the two calls of rq_clock_task() in update_curr_rt(). Calling rq_clock_task() only once makes it more accurate and efficient, taking update_curr() as reference. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: zhong.weidong@zte.com.cn Link: http://lkml.kernel.org/r/1517882008-44552-1-git-send-email-wen.yang99@zte.com.cn Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13sched/deadline: Make update_curr_dl() more accurateWen Yang
rq->clock_task may be updated between the two calls of rq_clock_task() in update_curr_dl(). Calling rq_clock_task() only once makes it more accurate and efficient, taking update_curr() as reference. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: zhong.weidong@zte.com.cn Link: http://lkml.kernel.org/r/1517882148-44599-1-git-send-email-wen.yang99@zte.com.cn Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/mm/kcore: Add vsyscall page to /proc/kcore conditionallyJia Zhang
The vsyscall page should be visible only if vsyscall=emulate/native when dumping /proc/kcore. Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1518446694-21124-3-git-send-email-zhang.jia@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user pageJia Zhang
Commit: df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data") ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y. However, accessing the vsyscall user page will cause an SMAP fault. Replace memcpy() with copy_from_user() to fix this bug works, but adding a common way to handle this sort of user page may be useful for future. Currently, only vsyscall page requires KCORE_USER. Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13selftests/x86: Do not rely on "int $0x80" in test_mremap_vdso.cDominik Brodowski
On 64-bit builds, we should not rely on "int $0x80" working (it only does if CONFIG_IA32_EMULATION=y is enabled). Without this patch, the move test may succeed, but the "int $0x80" causes a segfault, resulting in a false negative output of this self-test. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Link: http://lkml.kernel.org/r/20180211111013.16888-4-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13selftests/x86: Fix build bug caused by the 5lvl test which has been moved to ↵Dominik Brodowski
the VM directory Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Fixes: 235266b8e11c "selftests/vm: move 128TB mmap boundary test to generic directory" Link: http://lkml.kernel.org/r/20180211111013.16888-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13selftests/x86/pkeys: Remove unused functionsIngo Molnar
This also gets rid of two build warnings: protection_keys.c: In function ‘dumpit’: protection_keys.c:419:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result] write(1, buf, nr_read); ^~~~~~~~~~~~~~~~~~~~~~ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13selftests/x86: Clean up and document sscanf() usageDominik Brodowski
Replace a couple of magically connected buffer length literal constants with a common definition that makes their relationship obvious. Also document why our sscanf() usage is safe. No intended functional changes. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Link: http://lkml.kernel.org/r/20180211205924.GA23210@light.dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13selftests/x86: Fix vDSO selftest segfault for vsyscall=noneDominik Brodowski
The vDSO selftest tries to execute a vsyscall unconditionally, even if it is not present on the test system (e.g. if booted with vsyscall=none or with CONFIG_LEGACY_VSYSCALL_NONE=y set. Fix this by copying (and tweaking) the vsyscall check from test_vsyscall.c Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Link: http://lkml.kernel.org/r/20180211111013.16888-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Remove the unused 'icebp' macroBorislav Petkov
That macro was touched around 2.5.8 times, judging by the full history linux repo, but it was unused even then. Get rid of it already. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux@dominikbrodowski.net Link: http://lkml.kernel.org/r/20180212201318.GD14640@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Fix paranoid_entry() frame pointer warningJosh Poimboeuf
With the following commit: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros") ... one of my suggested improvements triggered a frame pointer warning: arch/x86/entry/entry_64.o: warning: objtool: paranoid_entry()+0x11: call without frame pointer save/setup The warning is correct for the build-time code, but it's actually not relevant at runtime because of paravirt patching. The paravirt swapgs call gets replaced with either a SWAPGS instruction or NOPs at runtime. Go back to the previous behavior by removing the ELF function annotation for paranoid_entry() and adding an unwind hint, which effectively silences the warning. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Fixes: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros") Link: http://lkml.kernel.org/r/20180212174503.5acbymg5z6p32snu@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Indent PUSH_AND_CLEAR_REGS and POP_REGS properlyDominik Brodowski
... same as the other macros in arch/x86/entry/calling.h Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-8-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and ↵Dominik Brodowski
SAVE_AND_CLEAR_REGS macros Previously, error_entry() and paranoid_entry() saved the GP registers onto stack space previously allocated by its callers. Combine these two steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro for that. This adds a significant amount ot text size. However, Ingo Molnar points out that: "these numbers also _very_ significantly over-represent the extra footprint. The assumptions that resulted in us compressing the IRQ entry code have changed very significantly with the new x86 IRQ allocation code we introduced in the last year: - IRQ vectors are usually populated in tightly clustered groups. With our new vector allocator code the typical per CPU allocation percentage on x86 systems is ~3 device vectors and ~10 fixed vectors out of ~220 vectors - i.e. a very low ~6% utilization (!). [...] The days where we allocated a lot of vectors on every CPU and the compression of the IRQ entry code text mattered are over. - Another issue is that only a small minority of vectors is frequent enough to actually matter to cache utilization in practice: 3-4 key IPIs and 1-2 device IRQs at most - and those vectors tend to be tightly clustered as well into about two groups, and are probably already on 2-3 cache lines in practice. For the common case of 'cache cold' IRQs it's the depth of the call chain and the fragmentation of the resulting I$ that should be the main performance limit - not the overall size of it. - The CPU side cost of IRQ delivery is still very expensive even in the best, most cached case, as in 'over a thousand cycles'. So much stuff is done that maybe contemporary x86 IRQ entry microcode already prefetches the IDT entry and its expected call target address."[*] [*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need modification. Previously, %rsp was manually decreased by 15*8; with this patch, %rsp is decreased by 15 pushq instructions. [jpoimboe@redhat.com: unwind hint improvements] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Use PUSH_AND_CLEAN_REGS in more casesDominik Brodowski
entry_SYSCALL_64_after_hwframe() and nmi() can be converted to use PUSH_AND_CLEAN_REGS instead of opencoded variants thereof. Due to the interleaving, the additional XOR-based clearing of R8 and R9 in entry_SYSCALL_64_after_hwframe() should not have any noticeable negative implications. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-6-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macroDominik Brodowski
Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS. This macro uses PUSH instead of MOV and should therefore be faster, at least on newer CPUs. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-5-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Interleave XOR register clearing with PUSH instructionsDominik Brodowski
Same as is done for syscalls, interleave XOR with PUSH instructions for exceptions/interrupts, in order to minimize the cost of the additional instructions required for register clearing. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-4-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Merge the POP_C_REGS and POP_EXTRA_REGS macros into a single ↵Dominik Brodowski
POP_REGS macro The two special, opencoded cases for POP_C_REGS can be handled by ASM macros. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/entry/64: Merge SAVE_C_REGS and SAVE_EXTRA_REGS, remove unused extensionsDominik Brodowski
All current code paths call SAVE_C_REGS and then immediately SAVE_EXTRA_REGS. Therefore, merge these two macros and order the MOV sequeneces properly. While at it, remove the macros to save all except specific registers, as these macros have been unused for a long time. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Link: http://lkml.kernel.org/r/20180211104949.12992-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/speculation: Clean up various Spectre related detailsIngo Molnar
Harmonize all the Spectre messages so that a: dmesg | grep -i spectre ... gives us most Spectre related kernel boot messages. Also fix a few other details: - clarify a comment about firmware speculation control - s/KPTI/PTI - remove various line-breaks that made the code uglier Acked-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmapKarimAllah Ahmed
We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all MSR accesses or create a valid L02 MSR bitmap and use that. This decision has to be made every time we evaluate whether we are going to generate the L02 MSR bitmap. Before commit: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") ... this was probably OK since the decision was always identical. This is no longer the case now since the MSR bitmap might actually change once we decide to not intercept SPEC_CTRL and PRED_CMD. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arjan.van.de.ven@intel.com Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: kvm@vger.kernel.org Cc: sironi@amazon.de Link: http://lkml.kernel.org/r/1518305967-31356-6-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRsKarimAllah Ahmed
These two variables should check whether SPEC_CTRL and PRED_CMD are supposed to be passed through to L2 guests or not. While msr_write_intercepted_l01 would return 'true' if it is not passed through. So just invert the result of msr_write_intercepted_l01 to implement the correct semantics. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Jim Mattson <jmattson@google.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arjan.van.de.ven@intel.com Cc: dave.hansen@intel.com Cc: kvm@vger.kernel.org Cc: sironi@amazon.de Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL") Link: http://lkml.kernel.org/r/1518305967-31356-5-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), ↵David Woodhouse
by always inlining iterator helper methods With retpoline, tight loops of "call this function for every XXX" are very much pessimised by taking a prediction miss *every* time. This one is by far the biggest contributor to the guest launch time with retpoline. By marking the iterator slot_handle_…() functions always_inline, we can ensure that the indirect function call can be optimised away into a direct call and it actually generates slightly smaller code because some of the other conditionals can get optimised away too. Performance is now pretty close to what we see with nospectre_v2 on the command line. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Filippo Sironi <sironi@amazon.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arjan.van.de.ven@intel.com Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: karahmed@amazon.de Cc: kvm@vger.kernel.org Cc: rkrcmar@redhat.com Link: http://lkml.kernel.org/r/1518305967-31356-4-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"David Woodhouse
This reverts commit 64e16720ea0879f8ab4547e3b9758936d483909b. We cannot call C functions like that, without marking all the call-clobbered registers as, well, clobbered. We might have got away with it for now because the __ibp_barrier() function was *fairly* unlikely to actually use any other registers. But no. Just no. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arjan.van.de.ven@intel.com Cc: dave.hansen@intel.com Cc: jmattson@google.com Cc: karahmed@amazon.de Cc: kvm@vger.kernel.org Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: sironi@amazon.de Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13x86/speculation: Correct Speculation Control microcode blacklist againDavid Woodhouse
Arjan points out that the Intel document only clears the 0xc2 microcode on *some* parts with CPUID 506E3 (INTEL_FAM6_SKYLAKE_DESKTOP stepping 3). For the Skylake H/S platform it's OK but for Skylake E3 which has the same CPUID it isn't (yet) cleared. So removing it from the blacklist was premature. Put it back for now. Also, Arjan assures me that the 0x84 microcode for Kaby Lake which was featured in one of the early revisions of the Intel document was never released to the public, and won't be until/unless it is also validated as safe. So those can change to 0x80 which is what all *other* versions of the doc have identified. Once the retrospective testing of existing public microcodes is done, we should be back into a mode where new microcodes are only released in batches and we shouldn't even need to update the blacklist for those anyway, so this tweaking of the list isn't expected to be a thing which keeps happening. Requested-by: Arjan van de Ven <arjan.van.de.ven@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arjan.van.de.ven@intel.com Cc: dave.hansen@intel.com Cc: kvm@vger.kernel.org Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1518449255-2182-1-git-send-email-dwmw@amazon.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-12drm/i915: Don't wake the device up to check if the engine is asleepChris Wilson
If the entire device is powered off, we can safely assume that the engine is also asleep (and idle). Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: a091d4ee931b ("drm/i915: Hold a wakeref for probing the ring registers") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180212093928.6005-1-chris@chris-wilson.co.uk (cherry picked from commit 74d00d28a15c8452f65de0a9477b52d95639cc63) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-12drm/i915: Avoid truncation before clamping userspace's priority valueChris Wilson
Userspace provides a 64b value for the priority, we need to be careful to preserve the full range before validation to prevent truncation (and letting an illegal value pass). Reported-by: Antonio Argenziano <antonio.argenziano@intel.com> Fixes: ac14fbd460d0 ("drm/i915/scheduler: Support user-defined priorities") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Antonio Argenziano <antonio.argenziano@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180208085151.11480-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit 11a18f631959fd1ca10856c836a827683536770c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-12drm/i915/perf: Fix compiler warning for string truncationChris Wilson
drivers/gpu/drm/i915/i915_oa_cnl.c: In function ‘i915_perf_load_test_config_cnl’: drivers/gpu/drm/i915/i915_oa_cnl.c:99:2: error: ‘strncpy’ output truncated before terminating nul copying 36 bytes from a string of the same length [-Werror=stringop-truncation] v2: strlcpy Fixes: 95690a02fb5d ("drm/i915/perf: enable perf support on CNL") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180208102403.5587-2-chris@chris-wilson.co.uk (cherry picked from commit 020580ff8edd50e64ae1bf47e560c61e5e2f29fc) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-12drm/i915/perf: Fix compiler warning for string truncationChris Wilson
drivers/gpu/drm/i915/i915_oa_cflgt3.c: In function ‘i915_perf_load_test_config_cflgt3’: drivers/gpu/drm/i915/i915_oa_cflgt3.c:87:2: error: ‘strncpy’ output truncated before terminating nul copying 36 bytes from a string of the same length [-Werror=stringop-truncation] v2: strlcpy Fixes: 4407eaa9b0cc ("drm/i915/perf: add support for Coffeelake GT3") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180208102403.5587-1-chris@chris-wilson.co.uk (cherry picked from commit 43df81d324cdd7056ad0ce3df709aff8dce856b7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-12hwmon: (k10temp) Only apply temperature offset if result is positiveGuenter Roeck
A user reports a really bad temperature on Ryzen 1950X. k10temp-pci-00cb Adapter: PCI adapter temp1: +4294948.3°C (high = +70.0°C) This will happen if the temperature reported by the chip is lower than the offset temperature. This has been seen in the field if "Sense MI Skew" and/or "Sense MI Offset" BIOS parameters were set to unexpected values. Let's report a temperature of 0 degrees C in that case. Fixes: 1b50b776355f ("hwmon: (k10temp) Add support for temperature offsets") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-02-12nvme: Don't use a stack buffer for keep-alive commandRoland Dreier
In nvme_keep_alive() we pass a request with a pointer to an NVMe command on the stack into blk_execute_rq_nowait(). However, the block layer doesn't guarantee that the request is fully queued before blk_execute_rq_nowait() returns. If not, and the request is queued after nvme_keep_alive() returns, then we'll end up using stack memory that might have been overwritten to form the NVMe command we pass to hardware. Fix this by keeping a special command struct in the nvme_ctrl struct right next to the delayed work struct used for keep-alives. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2018-02-12Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - oversize stack frames on mn10300 in sha3-generic - warning on old compilers in sha3-generic - API error in sun4i_ss_prng - potential dead-lock in sun4i_ss_prng - null-pointer dereference in sha512-mb - endless loop when DECO acquire fails in caam - kernel oops when hashing empty message in talitos" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: sun4i_ss_prng - convert lock to _bh in sun4i_ss_prng_generate crypto: sun4i_ss_prng - fix return value of sun4i_ss_prng_generate crypto: caam - fix endless loop when DECO acquire fails crypto: sha3-generic - Use __optimize to support old compilers compiler-gcc.h: __nostackprotector needs gcc-4.4 and up compiler-gcc.h: Introduce __optimize function attribute crypto: sha3-generic - deal with oversize stack frames crypto: talitos - fix Kernel Oops on hashing an empty file crypto: sha512-mb - initialize pending lengths correctly
2018-02-12dma-mapping: fix a comment typoChristoph Hellwig
Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-12dma-direct: comment the dma_direct_free calling conventionChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-12dma-direct: mark as is_physChristoph Hellwig
Various PCI_DMA_BUS_IS_PHYS implementations rely on this flag to make proper decisions for block and networking addressability. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-12ia64: fix build failure with CONFIG_SWIOTLBCorentin Labbe
arch/ia64/kernel/pci-swiotlb.c is removed in commit 4fac8076df85 ("ia64: clean up swiotlb support") but pci-swiotlb.o is still present in Makefile, and so build fail when CONFIG_SWIOTLB is enabled. Fix the build failure by removing pci-swiotlb.o from makefile Fixes: 4fac8076df85 ("ia64: clean up swiotlb support") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-12arm64: Add missing Falkor part number for branch predictor hardeningShanker Donthineni
References to CPU part number MIDR_QCOM_FALKOR were dropped from the mailing list patch due to mainline/arm64 branch dependency. So this patch adds the missing part number. Fixes: ec82b567a74f ("arm64: Implement branch predictor hardening for Falkor") Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-12PM: cpuidle: Fix cpuidle_poll_state_init() prototypeRafael J. Wysocki
Commit f85942207516 (x86: PM: Make APM idle driver initialize polling state) made apm_init() call cpuidle_poll_state_init(), but that only is defined for CONFIG_CPU_IDLE set, so make the empty stub of it available for CONFIG_CPU_IDLE unset too to fix the resulting build issue. Fixes: f85942207516 (x86: PM: Make APM idle driver initialize polling state) Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12ACPI: dock: document sysfs interfaceAishwarya Pant
Description has been collected from git commit history and reading through code. Signed-off-by: Aishwarya Pant <aishpant@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12ACPI / DPTF: Document dptf_power sysfs atttributesAishwarya Pant
The descriptions have been collected from git commit logs and reading through code. Signed-off-by: Aishwarya Pant <aishpant@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12PM / runtime: Update links_count also if !CONFIG_SRCULukas Wunner
Commit baa8809f6097 (PM / runtime: Optimize the use of device links) added an invocation of pm_runtime_drop_link() to __device_link_del(). However there are two variants of that function, one for CONFIG_SRCU and another for !CONFIG_SRCU, and the commit only modified the former. Fixes: baa8809f6097 (PM / runtime: Optimize the use of device links) Cc: v4.10+ <stable@vger.kernel.org> # v4.10+ Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12PM / wakeirq: Fix unbalanced IRQ enable for wakeirqTony Lindgren
If a device is runtime PM suspended when we enter suspend and has a dedicated wake IRQ, we can get the following warning: WARNING: CPU: 0 PID: 108 at kernel/irq/manage.c:526 enable_irq+0x40/0x94 [ 102.087860] Unbalanced enable for IRQ 147 ... (enable_irq) from [<c06117a8>] (dev_pm_arm_wake_irq+0x4c/0x60) (dev_pm_arm_wake_irq) from [<c0618360>] (device_wakeup_arm_wake_irqs+0x58/0x9c) (device_wakeup_arm_wake_irqs) from [<c0615948>] (dpm_suspend_noirq+0x10/0x48) (dpm_suspend_noirq) from [<c01ac7ac>] (suspend_devices_and_enter+0x30c/0xf14) (suspend_devices_and_enter) from [<c01adf20>] (enter_state+0xad4/0xbd8) (enter_state) from [<c01ad3ec>] (pm_suspend+0x38/0x98) (pm_suspend) from [<c01ab3e8>] (state_store+0x68/0xc8) This is because the dedicated wake IRQ for the device may have been already enabled earlier by dev_pm_enable_wake_irq_check(). Fix the issue by checking for runtime PM suspended status. This issue can be easily reproduced by setting serial console log level to zero, letting the serial console idle, and suspend the system from an ssh terminal. On resume, dmesg will have the warning above. The reason why I have not run into this issue earlier has been that I typically run my PM test cases from on a serial console instead over ssh. Fixes: c84345597558 (PM / wakeirq: Enable dedicated wakeirq for suspend) Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12Documentation/ABI: update cpuidle sysfs documentationAishwarya Pant
Update cpuidle documentation using git logs and existing documentation in Documentation/cpuidle/sysfs.txt. This might be useful for scripting and tracking changes in the ABI. Signed-off-by: Aishwarya Pant <aishpant@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12mtd: nand: MTD_NAND_MARVELL should depend on HAS_DMAGeert Uytterhoeven
If NO_DMA=y: ERROR: "bad_dma_ops" [drivers/mtd/nand/marvell_nand.ko] undefined! Add a dependency on HAS_DMA to fix this. Fixes: 02f26ecf8c772751 ("mtd: nand: add reworked Marvell NAND controller driver") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Miquel Raynal <miquel.raynal@free-electrons.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-02-12mtd: nand: vf610: set correct ooblayoutStefan Agner
With commit 3cf32d180227 ("mtd: nand: vf610: switch to mtd_ooblayout_ops") the driver started to use the NAND cores default large page ooblayout. However, shortly after commit 6a623e076944 ("mtd: nand: add ooblayout for old hamming layout") changed the default layout to the old hamming layout, which is not what vf610_nfc is using. Specify the default large page layout explicitly. Fixes: 6a623e076944 ("mtd: nand: add ooblayout for old hamming layout") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-02-12Merge branch 'opp/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp Pull and OPP update for 4.16-rc2 from Viresh Kumar. * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: opp: cpu: Replace GFP_ATOMIC with GFP_KERNEL in dev_pm_opp_init_cpufreq_table
2018-02-12device property: Constify device_get_match_data()Andy Shevchenko
Constify device_get_match_data() as OF and ACPI variants return constant value. Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12ACPI / bus: Rename acpi_get_match_data() to acpi_device_get_match_data()Andy Shevchenko
Do the renaming to be consistent with its sibling, i.e. of_device_get_match_data(). No functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12ACPI / bus: Remove checks in acpi_get_match_data()Andy Shevchenko
As well as its sibling of_device_get_match_data() has no such checks, no need to do it in acpi_get_match_data(). First of all, we are not supposed to call fwnode API like this without driver attached. Second, since __acpi_match_device() does check input parameter there is no need to duplicate it outside. And last but not least one, the API should still serve the cases when ACPI device is enumerated via PRP0001. In such case driver has neither ACPI table nor driver data there. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-12ACPI / bus: Do not traverse through non-existed device tableAndy Shevchenko
When __acpi_match_device() is called it would be possible to have ACPI ID table a NULL pointer. To avoid potential dereference, check for this before traverse. While here, remove redundant 'else'. Note, this patch implies a bit of refactoring acpi_of_match_device() to return pointer to OF ID when matched followed by refactoring __acpi_match_device() to return either ACPI or OF ID when matches. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>