path: root/arch/x86/kernel
Age  Commit message  Author
2018-07-24  x86/apic: Future-proof the TSC_DEADLINE quirk for SKX  (Len Brown)
All SKX with stepping higher than 4 support the TSC_DEADLINE, no matter the microcode version. Without this patch, upcoming SKX steppings will not be able to use their TSC_DEADLINE timer. Signed-off-by: Len Brown <len.brown@intel.com> Cc: <stable@kernel.org> # v4.14+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 616dd5872e ("x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping") Link: http://lkml.kernel.org/r/d0c7129e509660be9ec6b233284b8d42d90659e8.1532207856.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-24  x86/platform/pcspeaker: Use PTR_ERR_OR_ZERO() to fix ptr_ret.cocci warning  (YueHaibing)
The ptr_ret.cocci script generates the following warning: arch/x86/kernel/pcspeaker.c:12:8-14: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO() rather than an open-coded version to fix this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: kstewart@linuxfoundation.org Cc: pombredanne@nexb.com Link: http://lkml.kernel.org/r/20180720073213.14996-1-yuehaibing@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
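For context, the transformation the script suggests looks roughly like this (a minimal sketch; the function body is paraphrased from arch/x86/kernel/pcspeaker.c):

	#include <linux/err.h>
	#include <linux/platform_device.h>

	static __init int add_pcspkr(void)
	{
		struct platform_device *pd;

		pd = platform_device_register_simple("pcspkr", -1, NULL, 0);

		/* Before: return IS_ERR(pd) ? PTR_ERR(pd) : 0;  (open-coded) */
		return PTR_ERR_OR_ZERO(pd);
	}
	device_initcall(add_pcspkr);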
2018-07-21  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 fix from Ingo Molnar: "A single fix for an MCE-polling regression, which prevented the disabling of polling" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE: Remove min interval polling limitation
2018-07-21  Merge branch 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 pti fixes from Ingo Molnar: "An APM fix, and a BTS hardware-tracing fix related to PTI changes" * 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apm: Don't access __preempt_count with zeroed fs x86/events/intel/ds: Fix bts_interrupt_threshold alignment
2018-07-20  kernfs: allow creating kernfs objects with arbitrary uid/gid  (Dmitry Torokhov)
This change allows creating kernfs files and directories with arbitrary uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments. The "simple" kernfs_create_file() and kernfs_create_dir() are left alone and always create objects belonging to the global root. When creating symlinks ownership (uid/gid) is taken from the target kernfs object. Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
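A sketch of the resulting API shape (argument order per the description above; see include/linux/kernfs.h for the authoritative prototypes):

	#include <linux/kernfs.h>
	#include <linux/uidgid.h>

	/* Extended variant: callers can now pass explicit ownership. */
	struct kernfs_node *kernfs_create_dir_ns(struct kernfs_node *parent,
						 const char *name, umode_t mode,
						 kuid_t uid, kgid_t gid,
						 void *priv, const void *ns);

	/* The "simple" wrapper keeps the old GLOBAL_ROOT_UID/GID behaviour: */
	static inline struct kernfs_node *
	kernfs_create_dir(struct kernfs_node *parent, const char *name,
			  umode_t mode, void *priv)
	{
		return kernfs_create_dir_ns(parent, name, mode,
					    GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
					    priv, NULL);
	}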
2018-07-20  Merge tag 'drm-intel-next-2018-07-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-next  (Dave Airlie)
On GEM side: - GuC related fixes (Chris, Michal) - GTT read-only pages support (Jon, Chris) - More selftests fixes (Chris) - More GPU reset improvements (Chris) - Flush caches after GGTT writes (Chris) - Handle recursive shrinker for vma->last_active allocation (Chris) - Other execlists fixes (Chris) On Display side: - GLK HDMI fix (Clint) - Rework and cleanup around HPD pin (Ville) - Preparation work for Display Stream Compression support coming on ICL (Anusha) - Nuke LVDS lid notification (Ville) - Assume eDP is always connected (Ville) - Kill intel panel detection (Ville) Signed-off-by: Dave Airlie <airlied@redhat.com> # gpg: Signature made Fri 20 Jul 2018 01:51:45 AM AEST # gpg: using RSA key FA625F640EEB13CA # gpg: Good signature from "Rodrigo Vivi <rodrigo.vivi@intel.com>" # gpg: aka "Rodrigo Vivi <rodrigo.vivi@gmail.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 6D20 7068 EEDD 6509 1C2C E2A3 FA62 5F64 0EEB 13CA # Conflicts: # drivers/gpu/drm/i915/intel_lrc.c Link: https://patchwork.freedesktop.org/patch/msgid/20180719171257.GA12199@intel.com
2018-07-20  x86/ldt: Enable LDT user-mapping for PAE  (Joerg Roedel)
This adds the needed special case for PAE to get the LDT mapped into the user page-table when PTI is enabled. The big difference to the other paging modes is that on PAE there is no full top-level PGD entry available for the LDT, but only a PMD entry. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-37-git-send-email-joro@8bytes.org
2018-07-20  x86/ldt: Split out sanity check in map_ldt_struct()  (Joerg Roedel)
This splits out the mapping sanity check and the actual mapping of the LDT to user-space from the map_ldt_struct() function in a way so that it is re-usable for PAE paging. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-36-git-send-email-joro@8bytes.org
2018-07-20  x86/ldt: Define LDT_END_ADDR  (Joerg Roedel)
It marks the end of the address-space range reserved for the LDT. The LDT-code will use it when unmapping the LDT for user-space. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-35-git-send-email-joro@8bytes.org
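As a sketch, such a definition might look like this per paging mode (values illustrative, mirroring the PMD-vs-PGD distinction the PAE patch describes):

	/* End of the address-space range reserved for the LDT. */
	#ifdef CONFIG_X86_64
	# define LDT_END_ADDR	(LDT_BASE_ADDR + PGDIR_SIZE)
	#else  /* 32-bit PAE: the LDT only gets a PMD entry, not a full PGD entry */
	# define LDT_END_ADDR	(LDT_BASE_ADDR + PMD_SIZE)
	#endif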
2018-07-20  x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit  (Joerg Roedel)
The pti_clone_kernel_text() function references __end_rodata_hpage_align, which is only present on x86-64. This makes sense as the end of the rodata section is not huge-page aligned on 32 bit. Nevertheless a symbol is required for the function that points at the right address for both 32 and 64 bit. Introduce __end_rodata_aligned for that purpose and use it in pti_clone_kernel_text(). Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-28-git-send-email-joro@8bytes.org
2018-07-20  x86/pgtable/32: Allocate 8k page-tables when PTI is enabled  (Joerg Roedel)
Allocate a kernel and a user page-table root when PTI is enabled. Also allocate a full page per root for PAE because otherwise the bit to flip in CR3 to switch between them would be non-constant, which creates a lot of hassle. Keep that for a later optimization. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-18-git-send-email-joro@8bytes.org
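The allocation side, roughly (a sketch; GFP flags and helper names may differ from the actual patch):

	/* With PTI the pgd needs two consecutive pages: kernel root + user root. */
	#ifdef CONFIG_PAGE_TABLE_ISOLATION
	#define PGD_ALLOCATION_ORDER	1	/* 8k */
	#else
	#define PGD_ALLOCATION_ORDER	0	/* 4k */
	#endif

	static inline pgd_t *_pgd_alloc(void)
	{
		return (pgd_t *)__get_free_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO,
						 PGD_ALLOCATION_ORDER);
	}

	static inline void _pgd_free(pgd_t *pgd)
	{
		free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
	}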
2018-07-20  x86/entry: Rename update_sp0 to update_task_stack  (Joerg Roedel)
The function does not update sp0 anymore; it makes the task stack visible to the entry code, either by writing it to sp1 or by doing a hypercall. Rename the function to get rid of the misleading name. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-15-git-send-email-joro@8bytes.org
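A sketch of the renamed helper, assuming the shape described above (sp1 write on native 32 bit, hypercall path on Xen PV):

	static inline void update_task_stack(struct task_struct *task)
	{
		/* sp0 points to the entry trampoline stack and stays constant: */
	#ifdef CONFIG_X86_32
		this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
	#else
		if (static_cpu_has(X86_FEATURE_XENPV))
			load_sp0(task_top_of_stack(task));	/* hypercall path */
	#endif
	}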
2018-07-20  x86/entry/32: Enter the kernel via trampoline stack  (Joerg Roedel)
Use the entry-stack as a trampoline to enter the kernel. The entry-stack is already in the cpu_entry_area and will be mapped to userspace when PTI is enabled. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-8-git-send-email-joro@8bytes.org
2018-07-20  x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler  (Joerg Roedel)
x86_tss.sp0 will be used to point to the entry stack later to use it as a trampoline stack for other kernel entry points besides SYSENTER. So store the real task stack pointer in x86_tss.sp1, which is otherwise unused by the hardware, as Linux doesn't make use of Ring 1. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-4-git-send-email-joro@8bytes.org
2018-07-20  x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack  (Joerg Roedel)
The stack address doesn't need to be stored in tss.sp0 if the stack is switched manually like on sysenter. Rename the offset so that it still makes sense when its location is changed in later patches. This stack will also be used for all kernel-entry points, not just sysenter. Reflect that and the fact that it is the offset to the task-stack location in the name as well. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-3-git-send-email-joro@8bytes.org
2018-07-20  x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c  (Joerg Roedel)
These offsets will be used in 32 bit assembly code as well, so make them available for all of x86 code. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-2-git-send-email-joro@8bytes.org
2018-07-20  x86/tsc: Make use of tsc_calibrate_cpu_early()  (Pavel Tatashin)
During early boot enable tsc_calibrate_cpu_early() and switch to tsc_calibrate_cpu() only later. Do this unconditionally, because it is unknown what methods other CPUs will use to calibrate once they are onlined. If the tsc frequency is still unknown by the time tsc_init() is called, do only pit_hpet_ptimer_calibrate_cpu() to calibrate, as this function contains the only methods which have not been called and tried earlier. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-27-pasha.tatashin@oracle.com
2018-07-20  x86/tsc: Split native_calibrate_cpu() into early and late parts  (Pavel Tatashin)
During early boot TSC and CPU frequency can be calibrated using MSR, CPUID, and quick PIT calibration methods. The other methods PIT/HPET/PMTIMER are available only after ACPI is initialized. Split native_calibrate_cpu() into early and late parts so they can be called separately during early and late tsc calibration. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-26-pasha.tatashin@oracle.com
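A sketch of the early part (helper names follow the existing calibration code in arch/x86/kernel/tsc.c; only methods that work before ACPI is initialized):

	unsigned long native_calibrate_cpu_early(void)
	{
		unsigned long flags, fast_calibrate = cpu_khz_from_cpuid();

		if (!fast_calibrate)
			fast_calibrate = cpu_khz_from_msr();
		if (!fast_calibrate) {
			local_irq_save(flags);
			fast_calibrate = quick_pit_calibrate();
			local_irq_restore(flags);
		}
		return fast_calibrate;
	}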
2018-07-20  x86/tsc: Use TSC as sched clock early  (Pavel Tatashin)
All prerequisites for enabling TSC as the sched clock early in the boot process are available now: - Early attempt of TSC calibration - Early availability of static branch patching If the TSC frequency can be established in the early calibration, enable the static key which switches the sched clock to use TSC. [ tglx: Massaged changelog ] Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-22-pasha.tatashin@oracle.com
2018-07-20  x86/tsc: Initialize cyc2ns when tsc frequency is determined  (Pavel Tatashin)
cyc2ns converts tsc to nanoseconds, and it is handled in a per-cpu data structure. Currently, the setup code for cyc2ns data for every possible CPU goes through the same sequence of calculations as for the boot CPU, but is based on the same tsc frequency as the boot CPU, and thus this is not necessary. Initialize the boot CPU when the tsc frequency is determined. Copy the calculated data from the boot CPU to the other CPUs in tsc_init(). In addition do the following: - Remove unnecessary zeroing of cyc2ns data by removing cyc2ns_data_init() - Split set_cyc2ns_scale() into two functions, so set_cyc2ns_scale() can be called when the system is up, and wraps around __set_cyc2ns_scale(), which can be called directly while the system is booting and avoids saving/restoring IRQs and the idle sleep/wakeup events. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-21-pasha.tatashin@oracle.com
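A sketch of the set_cyc2ns_scale() split described above (the computation body is elided):

	static void __set_cyc2ns_scale(unsigned long khz, int cpu,
				       unsigned long long tsc_now)
	{
		/* ... compute and publish the mult/shift pair for @cpu ... */
	}

	/* Runtime wrapper: safe once the system is up. */
	static void set_cyc2ns_scale(unsigned long khz, int cpu,
				     unsigned long long tsc_now)
	{
		unsigned long flags;

		local_irq_save(flags);
		sched_clock_idle_sleep_event();
		if (khz)
			__set_cyc2ns_scale(khz, cpu, tsc_now);
		sched_clock_idle_wakeup_event();
		local_irq_restore(flags);
	}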
2018-07-20  x86/tsc: Calibrate tsc only once  (Pavel Tatashin)
During boot tsc is calibrated twice: once in tsc_early_delay_calibrate(), and the second time in tsc_init(). Rename tsc_early_delay_calibrate() to tsc_early_init(), and rework it so the calibration is done only early, and make tsc_init() use the values already determined in tsc_early_init(). Sometimes it is not possible to determine the tsc frequency early, as a required subsystem is not yet initialized; in such a case, try again later in tsc_init(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-20-pasha.tatashin@oracle.com
2018-07-20  x86/tsc: Redefine notsc to behave as tsc=unstable  (Pavel Tatashin)
Currently, the notsc kernel parameter disables the use of the TSC by sched_clock(). However, this parameter does not prevent the kernel from accessing tsc in other places. The only rationale to boot with notsc is to avoid timing discrepancies on multi-socket systems where TSCs are not properly synchronized, and thus exclude TSC from being used for time keeping. But that prevents using TSC as sched_clock() as well, which is not necessary as the core sched_clock() implementation can handle non-synchronized TSC-based sched clocks just fine. However, there is another method to solve the above problem: booting with the tsc=unstable parameter. This parameter allows sched_clock() to use TSC and just excludes it from timekeeping. So there is no real reason to keep notsc, but for compatibility reasons the parameter has to stay. Make it behave like 'tsc=unstable' instead. [ tglx: Massaged changelog ] Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
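The redefinition is essentially a one-liner in arch/x86/kernel/tsc.c; a sketch:

	/* notsc now only excludes the TSC from timekeeping, like tsc=unstable. */
	static int __init notsc_setup(char *str)
	{
		mark_tsc_unstable("boot parameter notsc");
		return 1;
	}
	__setup("notsc", notsc_setup);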
2018-07-20  x86/CPU: Call detect_nopl() only on the BSP  (Borislav Petkov)
Make it use the setup_* variants and have it be called only on the BSP and drop the call in generic_identify() - X86_FEATURE_NOPL will be replicated to the APs through the forced caps. Helps to keep the mess at a manageable level. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-11-pasha.tatashin@oracle.com
2018-07-20  x86/jump_label: Initialize static branching early  (Pavel Tatashin)
Static branching is useful to runtime patch branches that are used in hot path, but are infrequently changed. The x86 clock framework is one example that uses static branches to setup the best clock during boot and never changes it again. It is desired to enable the TSC based sched clock early to allow fine grained boot time analysis early on. That requires the static branching functionality to be functional early as well. Static branching requires patching nop instructions, thus, arch_init_ideal_nops() must be called prior to jump_label_init(). Do all the necessary steps to call arch_init_ideal_nops() right after early_cpu_init(), which also allows to insert a call to jump_label_init() right after that. jump_label_init() will be called again from the generic init code, but the code is protected against reinitialization already. [ tglx: Massaged changelog ] Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-10-pasha.tatashin@oracle.com
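The resulting boot ordering, sketched (call sites paraphrased from the changelog; elided code marked):

	/* Sketch of the order in setup_arch(): */
	void __init setup_arch(char **cmdline_p)
	{
		/* ... */
		early_cpu_init();
		arch_init_ideal_nops();	/* pick the right NOPs before any patching */
		jump_label_init();	/* later re-invocation is a protected no-op */
		/* ... */
	}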
2018-07-20  x86/alternatives, jumplabel: Use text_poke_early() before mm_init()  (Pavel Tatashin)
It is supposed to be safe to modify static branches after jump_label_init(). But, because static key modifying code eventually calls text_poke() it can end up accessing a struct page which has not been initialized yet. Here is how to quickly reproduce the problem. Insert code like this into init/main.c: | +static DEFINE_STATIC_KEY_FALSE(__test); | asmlinkage __visible void __init start_kernel(void) | { | char *command_line; |@@ -587,6 +609,10 @@ asmlinkage __visible void __init start_kernel(void) | vfs_caches_init_early(); | sort_main_extable(); | trap_init(); |+ { |+ static_branch_enable(&__test); |+ WARN_ON(!static_branch_likely(&__test)); |+ } | mm_init(); The following warnings show up: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:701 text_poke+0x20d/0x230 RIP: 0010:text_poke+0x20d/0x230 Call Trace: ? text_poke_bp+0x50/0xda ? arch_jump_label_transform+0x89/0xe0 ? __jump_label_update+0x78/0xb0 ? static_key_enable_cpuslocked+0x4d/0x80 ? static_key_enable+0x11/0x20 ? start_kernel+0x23e/0x4c8 ? secondary_startup_64+0xa5/0xb0 ---[ end trace abdc99c031b8a90a ]--- If the code above is moved after mm_init(), no warning is shown, as struct pages are initialized during handover from memblock. Use text_poke_early() in static branching until early boot IRQs are enabled and from there switch to text_poke(). Also, ensure text_poke() is never invoked when uninitialized memory access may happen by adding a !after_bootmem assertion. Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-9-pasha.tatashin@oracle.com
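A sketch of the added assertion (the flag name follows the changelog; the patching body is elided):

	void *text_poke(void *addr, const void *opcode, size_t len)
	{
		/* struct pages must be initialized before patching live text. */
		BUG_ON(!after_bootmem);
		/* ... existing fixmap-based patching ... */
		return addr;
	}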
2018-07-20  x86/kvmclock: Switch kvmclock data to a PER_CPU variable  (Thomas Gleixner)
The previous removal of the memblock dependency from kvmclock introduced a static data array sized 64 bytes * CONFIG_NR_CPUS. That's wasteful on large systems when kvmclock is not used. Replace it with: - A static page sized array of pvclock data. It's page sized because the pvclock data of the boot CPU is mapped into the VDSO; otherwise random other data would be exposed to the vDSO - A PER_CPU variable of pvclock data pointers. This is used to access the pvclock data storage on each CPU. The setup is done in two stages: - Early boot stores the pointer to the static page for the boot CPU in the per cpu data. - In the preparatory stage of CPU hotplug assign either an element of the static array (when the CPU number is in that range) or allocate memory and initialize the per cpu pointer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-8-pasha.tatashin@oracle.com
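The two-stage storage, sketched in C (the array size constant is illustrative):

	/* Page-sized static array: the boot CPU's element is mapped into the vDSO. */
	#define HVC_BOOT_ARRAY_SIZE \
		(PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))

	static struct pvclock_vsyscall_time_info
		hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);

	/* Per-CPU pointer: a static slot for low CPU numbers, kmalloc'ed above. */
	static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);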
2018-07-20  x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock  (Thomas Gleixner)
There is no point to have this in the kvm code itself and call it from there. This can be called from an initcall and the parameter is cleared when the hypervisor is not KVM. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-7-pasha.tatashin@oracle.com
2018-07-20  x86/kvmclock: Mark variables __initdata and __ro_after_init  (Thomas Gleixner)
The kvmclock parameter is init data and the other variables are not modified after init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-6-pasha.tatashin@oracle.com
2018-07-20  x86/kvmclock: Cleanup the code  (Thomas Gleixner)
- Clean up the MSR write for the wall clock. The type casts to (int) are sloppy because the wrmsr parameters are u32 and aside from that wrmsrl() already provides the high/low split for free. - Remove the pointless get_cpu()/put_cpu() dance from various functions. Either they are called during early init where the CPU is guaranteed to be 0, or they are already called from non-preemptible context where smp_processor_id() can be used safely. - Simplify the convoluted check for kvmclock in the init function. - Mark the parameter parsing function __init. No point in keeping it around. - Convert to pr_info() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-5-pasha.tatashin@oracle.com
2018-07-20  x86/kvmclock: Decrapify kvm_register_clock()  (Thomas Gleixner)
The return value is pointless because the wrmsr cannot fail if KVM_FEATURE_CLOCKSOURCE or KVM_FEATURE_CLOCKSOURCE2 are set. kvm_register_clock() is only called locally so wants to be static. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-4-pasha.tatashin@oracle.com
2018-07-20  x86/kvmclock: Remove page size requirement from wall_clock  (Thomas Gleixner)
There is no requirement for wall_clock data to be page aligned or page sized. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-3-pasha.tatashin@oracle.com
2018-07-20  x86/kvmclock: Remove memblock dependency  (Pavel Tatashin)
KVM clock is initialized later compared to other hypervisor clocks because it has a dependency on the memblock allocator. Bring it in line with other hypervisors by using memory from the BSS instead of allocating it. The benefits: - Remove ifdef from common code - Earlier availability of the clock - Remove dependency on memblock, and reduce code The downside: - Static allocation of the per cpu data structures sized NR_CPUS * 64byte Will be addressed in follow up patches. [ tglx: Split out from larger series ] Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-2-pasha.tatashin@oracle.com
2018-07-19  Merge branch 'linus' into x86/timers  (Thomas Gleixner)
Pick up upstream changes to avoid conflicts
2018-07-19  x86: Avoid pr_cont() in show_opcodes()  (Rasmus Villemoes)
If concurrent printk() messages are emitted, then pr_cont() makes it extremely hard to decode which part of the output belongs to what. See the convoluted example at: https://syzkaller.appspot.com/text?tag=CrashReport&x=139d342c400000 Avoid this by using a proper prefix for each line and by using %ph format in show_opcodes() which emits the 'Code:' line in one go. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: joe@perches.com Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/1532009278-5953-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
2018-07-18  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull kvm fixes from Paolo Bonzini: "Miscellaneous bugfixes, plus a small patchlet related to Spectre v2" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvmclock: fix TSC calibration for nested guests KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. x86/kvmclock: set pvti_cpu0_va after enabling kvmclock x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
2018-07-18  kvmclock: fix TSC calibration for nested guests  (Peng Hao)
Inside a nested guest, access to hardware can be slow enough that tsc_read_refs() always returns ULLONG_MAX, causing tsc_refine_calibration_work() to be called periodically and the nested guest to spend a lot of time reading the ACPI timer. However, if the TSC frequency is available from the pvclock page, we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the 'refine' recalibration operation. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> [Commit message rewritten. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
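The fix amounts to forcing the feature bit in the kHz callback; a sketch (the this_cpu_pvti() accessor is illustrative):

	static unsigned long kvm_get_tsc_khz(void)
	{
		/* The pvclock page provides the frequency; marking it known
		 * skips the expensive 'refine' recalibration in nested guests. */
		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
		return pvclock_tsc_khz(this_cpu_pvti());
	}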
2018-07-17  x86/MCE: Remove min interval polling limitation  (Dewet Thibaut)
commit b3b7c4795c ("x86/MCE: Serialize sysfs changes") introduced a min interval limitation when setting the check interval for polled MCEs. However, the logic is that 0 disables polling for corrected MCEs, see Documentation/x86/x86_64/machinecheck. The limitation prevents disabling polling. Remove this limitation and allow the value 0 to disable polling again. Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes") Signed-off-by: Dewet Thibaut <thibaut.dewet@nokia.com> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180716084927.24869-1-alexander.sverdlin@nokia.com
2018-07-16  x86/apm: Don't access __preempt_count with zeroed fs  (Ville Syrjälä)
APM_DO_POP_SEGS does not restore fs/gs which were zeroed by APM_DO_ZERO_SEGS. Trying to access __preempt_count with zeroed fs doesn't really work. Move the ibrs call outside the APM_DO_SAVE_SEGS/APM_DO_RESTORE_SEGS invocations so that fs is actually restored before calling preempt_enable(). Fixes the following sort of oopses: [ 0.313581] general protection fault: 0000 [#1] PREEMPT SMP [ 0.313803] Modules linked in: [ 0.314040] CPU: 0 PID: 268 Comm: kapmd Not tainted 4.16.0-rc1-triton-bisect-00090-gdd84441a7971 #19 [ 0.316161] EIP: __apm_bios_call_simple+0xc8/0x170 [ 0.316161] EFLAGS: 00210016 CPU: 0 [ 0.316161] EAX: 00000102 EBX: 00000000 ECX: 00000102 EDX: 00000000 [ 0.316161] ESI: 0000530e EDI: dea95f64 EBP: dea95f18 ESP: dea95ef0 [ 0.316161] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 0.316161] CR0: 80050033 CR2: 00000000 CR3: 015d3000 CR4: 000006d0 [ 0.316161] Call Trace: [ 0.316161] ? cpumask_weight.constprop.15+0x20/0x20 [ 0.316161] on_cpu0+0x44/0x70 [ 0.316161] apm+0x54e/0x720 [ 0.316161] ? __switch_to_asm+0x26/0x40 [ 0.316161] ? __schedule+0x17d/0x590 [ 0.316161] kthread+0xc0/0xf0 [ 0.316161] ? proc_apm_show+0x150/0x150 [ 0.316161] ? kthread_create_worker_on_cpu+0x20/0x20 [ 0.316161] ret_from_fork+0x2e/0x38 [ 0.316161] Code: da 8e c2 8e e2 8e ea 57 55 2e ff 1d e0 bb 5d b1 0f 92 c3 5d 5f 07 1f 89 47 0c 90 8d b4 26 00 00 00 00 90 8d b4 26 00 00 00 00 90 <64> ff 0d 84 16 5c b1 74 7f 8b 45 dc 8e e0 8b 45 d8 8e e8 8b 45 [ 0.316161] EIP: __apm_bios_call_simple+0xc8/0x170 SS:ESP: 0068:dea95ef0 [ 0.316161] ---[ end trace 656253db2deaa12c ]--- Fixes: dd84441a7971 ("x86/speculation: Use IBRS if available before calling into firmware") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180709133534.5963-1-ville.syrjala@linux.intel.com
2018-07-16  Merge 4.18-rc5 into char-misc-next  (Greg Kroah-Hartman)
We want the char-misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-15  Merge tag 'v4.18-rc5' into sched/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-15  x86/kvmclock: set pvti_cpu0_va after enabling kvmclock  (Radim Krčmář)
pvti_cpu0_va is the address of shared kvmclock data structure. pvti_cpu0_va is currently kept unset (1) on 32 bit systems, (2) when kvmclock vsyscall is disabled, and (3) if kvmclock is not stable. This poses a problem, because kvm_ptp needs pvti_cpu0_va, but (1) can work on 32 bit, (2) has little relation to the vsyscall, and (3) does not need stable kvmclock (although kvmclock won't be used for system clock if it's not stable, so kvm_ptp is pointless in that case). Expose pvti_cpu0_va whenever kvmclock is enabled to allow all users to work with it. This fixes a regression found on Gentoo: https://bugs.gentoo.org/658544. Fixes: 9f08890ab906 ("x86/pvclock: add setter for pvclock_pvti_cpu0_va") Cc: stable@vger.kernel.org Reported-by: Andreas Steinmetz <ast@domdv.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-13  x86/bugs, kvm: Introduce boot-time control of L1TF mitigations  (Jiri Kosina)
Introduce the 'l1tf=' kernel command line option to allow for boot-time switching of mitigation that is used on processors affected by L1TF. The possible values are: full Provides all available mitigations for the L1TF vulnerability. Disables SMT and enables all mitigations in the hypervisors. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. full,force Same as 'full', but disables SMT control. Implies the 'nosmt=force' command line option. sysfs control of SMT and the hypervisor flush control is disabled. flush Leaves SMT enabled and enables the conditional hypervisor mitigation. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. flush,nosmt Disables SMT and enables the conditional hypervisor mitigation. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. If SMT is reenabled or flushing disabled at runtime hypervisors will issue a warning. flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is started in a potentially insecure configuration. off Disables hypervisor mitigations and doesn't emit any warnings. Default is 'flush'. Let KVM adhere to these semantics, which means: - 'l1tf=full,force' : Perform L1D flushes. No runtime control possible. - 'l1tf=full' - 'l1tf=flush' - 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if SMT has been runtime enabled or L1D flushing has been run-time enabled - 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted. - 'l1tf=off' : L1D flushes are not performed and no warnings are emitted. KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush' module parameter except when l1tf=full,force is set. This makes KVM's private 'nosmt' option redundant, and as it is a bit non-systematic anyway (this is something to control globally, not on hypervisor level), remove that option. Add the missing Documentation entry for the l1tf vulnerability sysfs file while at it. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
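The command line plumbing amounts to an early_param() handler mapping the strings above to a mitigation mode; a sketch (enum names illustrative):

	static int __init l1tf_cmdline(char *str)
	{
		if (!str)
			return -EINVAL;

		if (!strcmp(str, "off"))
			l1tf_mitigation = L1TF_MITIGATION_OFF;
		else if (!strcmp(str, "flush,nowarn"))
			l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOWARN;
		else if (!strcmp(str, "flush"))
			l1tf_mitigation = L1TF_MITIGATION_FLUSH;
		else if (!strcmp(str, "flush,nosmt"))
			l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOSMT;
		else if (!strcmp(str, "full"))
			l1tf_mitigation = L1TF_MITIGATION_FULL;
		else if (!strcmp(str, "full,force"))
			l1tf_mitigation = L1TF_MITIGATION_FULL_FORCE;

		return 0;
	}
	early_param("l1tf", l1tf_cmdline);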
2018-07-13  cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early  (Thomas Gleixner)
The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support SMT) when the sysfs SMT control file is initialized. That was fine so far as this was only required to make the output of the control file correct and to prevent writes in that case. With the upcoming l1tf command line parameter, this needs to be set up before the L1TF mitigation selection and command line parsing happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
2018-07-13  x86/kvm: Allow runtime control of L1D flush  (Thomas Gleixner)
All mitigation modes can be switched at run time with a static key now: - Use sysfs_streq() instead of strcmp() to handle the trailing new line from sysfs writes correctly. - Make the static key management handle multiple invocations properly. - Set the module parameter file to RW Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de
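A sketch of the parse helper using sysfs_streq() (table and field names illustrative):

	static int vmentry_l1d_flush_parse(const char *s)
	{
		unsigned int i;

		for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
			/* sysfs_streq() tolerates the trailing '\n' that sysfs
			 * writes carry, which a plain strcmp() would reject. */
			if (sysfs_streq(s, vmentry_l1d_param[i].option))
				return vmentry_l1d_param[i].cmd;
		}
		return -EINVAL;
	}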
2018-07-13  x86/l1tf: Handle EPT disabled state properly  (Thomas Gleixner)
If Extended Page Tables (EPT) are disabled or not supported, no L1D flushing is required. The setup function can just avoid setting up the L1D flush for the EPT=n case. Invoke it after the hardware setup has been done and enable_ept has the correct state, and expose the EPT disabled state in the mitigation status as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de
2018-07-13  x86/l1tf: Introduce vmx status variable  (Thomas Gleixner)
Store the effective mitigation of VMX in a status variable and use it to report the VMX state in the l1tf sysfs file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de
2018-07-12  x86/intel_rdt: Fix possible circular lock dependency  (Reinette Chatre)
Lockdep is reporting a possible circular locking dependency: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc1-test-test+ #4 Not tainted ------------------------------------------------------ user_example/766 is trying to acquire lock: 0000000073479a0f (rdtgroup_mutex){+.+.}, at: pseudo_lock_dev_mmap but task is already holding lock: 000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&mm->mmap_sem){++++}: _copy_to_user+0x1e/0x70 filldir+0x91/0x100 dcache_readdir+0x54/0x160 iterate_dir+0x142/0x190 __x64_sys_getdents+0xb9/0x170 do_syscall_64+0x86/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #1 (&sb->s_type->i_mutex_key#3){++++}: start_creating+0x60/0x100 debugfs_create_dir+0xc/0xc0 rdtgroup_pseudo_lock_create+0x217/0x4d0 rdtgroup_schemata_write+0x313/0x3d0 kernfs_fop_write+0xf0/0x1a0 __vfs_write+0x36/0x190 vfs_write+0xb7/0x190 ksys_write+0x52/0xc0 do_syscall_64+0x86/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (rdtgroup_mutex){+.+.}: __mutex_lock+0x80/0x9b0 pseudo_lock_dev_mmap+0x2f/0x170 mmap_region+0x3d6/0x610 do_mmap+0x387/0x580 vm_mmap_pgoff+0xcf/0x110 ksys_mmap_pgoff+0x170/0x1f0 do_syscall_64+0x86/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: rdtgroup_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&sb->s_type->i_mutex_key#3); lock(&mm->mmap_sem); lock(rdtgroup_mutex); *** DEADLOCK *** 1 lock held by user_example/766: #0: 000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x110 rdtgroup_mutex is already being released temporarily during pseudo-lock region creation to prevent the potential deadlock between rdtgroup_mutex and mm->mmap_sem that is obtained during device_create(). Move the debugfs creation into this area to avoid the same circular dependency. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: vikas.shivappa@linux.intel.com Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/fffb57f9c6b8285904c9a60cc91ce21591af17fe.1531332480.git.reinette.chatre@intel.com
2018-07-10  x86/gpu: reserve ICL's graphics stolen memory  (Paulo Zanoni)
ICL changes the registers and addresses to 64 bits. I also briefly looked at implementing a u64 version of the PCI config read functions, but I concluded this wouldn't be trivial, so it's not worth doing it for a single user that can't have any racing problems while reading the register in two separate operations. v2: - Scrub the development (non-public) changelog (Joonas). - Remove the i915.ko bits so this can be easily backported in order to properly avoid stolen memory even on machines without i915.ko (Joonas). - CC stable for the reasons above. Issue: VIZ-9250 CC: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Fixes: 412310019a20 ("drm/i915/icl: Add initial Icelake definitions.") Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180504203252.28048-1-paulo.r.zanoni@intel.com
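The 64-bit base is read as two 32-bit config reads, as described above; a sketch (register offset macros illustrative):

	static resource_size_t __init
	gen11_stolen_base(int num, int slot, int func, size_t stolen_size)
	{
		u64 bsm;

		/* Two 32-bit reads; no race is possible this early in boot. */
		bsm = read_pci_config(num, slot, func, INTEL_GEN11_BSM_DW0);
		bsm &= INTEL_BSM_MASK;
		bsm |= (u64)read_pci_config(num, slot, func,
					    INTEL_GEN11_BSM_DW1) << 32;

		return bsm;
	}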
2018-07-08  Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86/pti updates from Thomas Gleixner: "Two small fixes correcting the handling of SSB mitigations on AMD processors" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR x86/bugs: Update when to check for the LS_CFG SSBD mitigation
2018-07-07  x86/mtrr: Don't copy out-of-bounds data in mtrr_write  (Jann Horn)
Don't access the provided buffer out of bounds - this can cause a kernel out-of-bounds read when invoked through sys_splice() or other things that use kernel_write()/__kernel_write(). Fixes: 7f8ec5a4f01a ("x86/mtrr: Convert to use strncpy_from_user() helper") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180706215003.156702-1-jannh@google.com
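The fix bounds the copy by the caller-provided length as well as the local buffer; a sketch (the surrounding function body is paraphrased):

	static ssize_t mtrr_write(struct file *file, const char __user *buf,
				  size_t len, loff_t *ppos)
	{
		char line[LINE_SIZE];

		memset(line, 0, LINE_SIZE);
		/* Bound the copy by the caller's length, not just the buffer
		 * size; a short kernel_write() must not read past its buffer. */
		len = min_t(size_t, len, LINE_SIZE - 1);
		if (copy_from_user(line, buf, len))
			return -EFAULT;
		/* ... parse the command in line[] ... */
		return len;
	}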