summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2017-02-05x86/CPU/AMD: Bring back Compute Unit IDBorislav Petkov
Commit: a33d331761bc ("x86/CPU/AMD: Fix Bulldozer topology") restored the initial approach we had with the Fam15h topology of enumerating CU (Compute Unit) threads as cores. And this is still correct - they're beefier than HT threads but still have some shared functionality. Our current approach has a problem with the Mad Max Steam game, for example. Yves Dionne reported a certain "choppiness" while playing on v4.9.5. That problem stems most likely from the fact that the CU threads share resources within one CU and when we schedule to a thread of a different compute unit, this incurs latency due to migrating the working set to a different CU through the caches. When the thread siblings mask mirrors that aspect of the CUs and threads, the scheduler pays attention to it and tries to schedule within one CU first. Which takes care of the latency, of course. Reported-by: Yves Dionne <yves.dionne@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.9 Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30x86/microcode: Do not access the initrd after it has been freedBorislav Petkov
When we look for microcode blobs, we first try builtin and if that doesn't succeed, we fallback to the initrd supplied to the kernel. However, at some point doing boot, that initrd gets jettisoned and we shouldn't access it anymore. But we do, as the below KASAN report shows. That's because find_microcode_in_initrd() doesn't check whether the initrd is still valid or not. So do that. ================================================================== BUG: KASAN: use-after-free in find_cpio_data Read of size 1 by task swapper/1/0 page:ffffea0000db9d40 count:0 mapcount:0 mapping: (null) index:0x1 flags: 0x100000000000000() raw: 0100000000000000 0000000000000000 0000000000000001 00000000ffffffff raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.10.0-rc5-debug-00075-g2dbde22 #3 Hardware name: Dell Inc. XPS 13 9360/0839Y6, BIOS 1.2.3 12/01/2016 Call Trace: dump_stack ? _atomic_dec_and_lock ? __dump_page kasan_report_error ? pointer ? find_cpio_data __asan_report_load1_noabort ? find_cpio_data find_cpio_data ? vsprintf ? dump_stack ? get_ucode_user ? print_usage_bug find_microcode_in_initrd __load_ucode_intel ? collect_cpu_info_early ? debug_check_no_locks_freed load_ucode_intel_ap ? collect_cpu_info ? trace_hardirqs_on ? flat_send_IPI_mask_allbutself load_ucode_ap ? get_builtin_firmware ? flush_tlb_func ? do_raw_spin_trylock ? cpumask_weight cpu_init ? trace_hardirqs_off ? play_dead_common ? native_play_dead ? hlt_play_dead ? syscall_init ? arch_cpu_idle_dead ? do_idle start_secondary start_cpu Memory state around the buggy address: ffff880036e74f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880036e74f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff880036e75000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff880036e75080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880036e75100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Tested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170126165833.evjemhbqzaepirxo@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-12x86/unwind: Include __schedule() in stack tracesJosh Poimboeuf
In the following commit: 0100301bfdf5 ("sched/x86: Rewrite the switch_to() code") ... the layout of the 'inactive_task_frame' struct was designed to have a frame pointer header embedded in it, so that the unwinder could use the 'bp' and 'ret_addr' fields to report __schedule() on the stack (or ret_from_fork() for newly forked tasks which haven't actually run yet). Finish the job by changing get_frame_pointer() to return a pointer to inactive_task_frame's 'bp' field rather than 'bp' itself. This allows the unwinder to start one frame higher on the stack, so that it properly reports __schedule(). Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/598e9f7505ed0aba86e8b9590aa528c6c7ae8dcd.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-12x86/unwind: Disable KASAN checks for non-current tasksJosh Poimboeuf
There are a handful of callers to save_stack_trace_tsk() and show_stack() which try to unwind the stack of a task other than current. In such cases, it's remotely possible that the task is running on one CPU while the unwinder is reading its stack from another CPU, causing the unwinder to see stack corruption. These cases seem to be mostly harmless. The unwinder has checks which prevent it from following bad pointers beyond the bounds of the stack. So it's not really a bug as long as the caller understands that unwinding another task will not always succeed. In such cases, it's possible that the unwinder may read a KASAN-poisoned region of the stack. Account for that by using READ_ONCE_NOCHECK() when reading the stack of another task. Use READ_ONCE() when reading the stack of the current task, since KASAN warnings can still be useful for finding bugs in that case. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-09x86/microcode/intel: Add a helper which gives the microcode revisionBorislav Petkov
Since on Intel we're required to do CPUID(1) first, before reading the microcode revision MSR, let's add a special helper which does the required steps so that we don't forget to do them next time, when we want to read the microcode revision. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-09x86/CPU: Add native CPUID variants returning a single datumBorislav Petkov
... similarly to the cpuid_<reg>() variants. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-05x86/cpu: Fix typo in the comment for AnniedaleAndy Shevchenko
The proper spelling of Anniedale SoC with 'e' in the middle. Fix typo in the comment line in intel-family.h header. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170102092229.87036-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-29mm: optimize PageWaiters bit use for unlock_page()Linus Torvalds
In commit 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot. However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically. On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit. On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7. So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too. This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time. The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit. So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte(); and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hickup x86 has, for example, but some other architectures may not even care. All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be. (The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit). Acked-by: Nick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-25clocksource: Use a plain u64 instead of cycle_tThomas Gleixner
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-23Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "There's a number of fixes: - a round of fixes for CPUID-less legacy CPUs - a number of microcode loader fixes - i8042 detection robustization fixes - stack dump/unwinder fixes - x86 SoC platform driver fixes - a GCC 7 warning fix - virtualization related fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) Revert "x86/unwind: Detect bad stack return address" x86/paravirt: Mark unused patch_default label x86/microcode/AMD: Reload proper initrd start address x86/platform/intel/quark: Add printf attribute to imr_self_test_result() x86/platform/intel-mid: Switch MPU3050 driver to IIO x86/alternatives: Do not use sync_core() to serialize I$ x86/topology: Document cpu_llc_id x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic x86/asm: Rewrite sync_core() to use IRET-to-self x86/microcode/intel: Replace sync_core() with native_cpuid() Revert "x86/boot: Fail the boot if !M486 and CPUID is missing" x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6 x86/tools: Fix gcc-7 warning in relocs.c x86/unwind: Dump stack data on warnings x86/unwind: Adjust last frame check for aligned function stacks x86/init: Fix a couple of comment typos x86/init: Remove i8042_detect() from platform ops Input: i8042 - Trust firmware a bit more when probing on X86 x86/init: Add i8042 state to the platform data ...
2016-12-22Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache allocation interface from Thomas Gleixner: "This provides support for Intel's Cache Allocation Technology, a cache partitioning mechanism. The interface is odd, but the hardware interface of that CAT stuff is odd as well. We tried hard to come up with an abstraction, but that only allows rather simple partitioning, but no way of sharing and dealing with the per package nature of this mechanism. In the end we decided to expose the allocation bitmaps directly so all combinations of the hardware can be utilized. There are two ways of associating a cache partition: - Task A task can be added to a resource group. It uses the cache partition associated to the group. - CPU All tasks which are not member of a resource group use the group to which the CPU they are running on is associated with. That allows for simple CPU based partitioning schemes. The main expected user sare: - Virtualization so a VM can only trash only the associated part of the cash w/o disturbing others - Real-Time systems to seperate RT and general workloads. - Latency sensitive enterprise workloads - In theory this also can be used to protect against cache side channel attacks" [ Intel RDT is "Resource Director Technology". The interface really is rather odd and very specific, which delayed this pull request while I was thinking about it. The pull request itself came in early during the merge window, I just delayed it until things had calmed down and I had more time. But people tell me they'll use this, and the good news is that it is _so_ specific that it's rather independent of anything else, and no user is going to depend on the interface since it's pretty rare. So if push comes to shove, we can just remove the interface and nothing will break ] * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/intel_rdt: Implement show_options() for resctrlfs x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount x86/intel_rdt: Fix setting of closid when adding CPUs to a group x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee x86/intel_rdt: Reset per cpu closids on unmount x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A x86/intel_rdt: Prevent deadlock against hotplug lock x86/intel_rdt: Protect info directory from removal x86/intel_rdt: Add info files to Documentation x86/intel_rdt: Export the minimum number of set mask bits in sysfs x86/intel_rdt: Propagate error in rdt_mount() properly x86/intel_rdt: Add a missing #include MAINTAINERS: Add maintainer for Intel RDT resource allocation x86/intel_rdt: Add scheduler hook x86/intel_rdt: Add schemata file x86/intel_rdt: Add tasks files x86/intel_rdt: Add cpus file x86/intel_rdt: Add mkdir to resctrl file system x86/intel_rdt: Add "info" files to resctrl file system ...
2016-12-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Early fixes for x86. Instead of the (botched) revert, the lockdep/might_sleep splat has a real fix provided by Andrea" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) kvm: take srcu lock around kvm_steal_time_set_preempted() kvm: fix schedule in atomic in kvm_steal_time_set_preempted() KVM: hyperv: fix locking of struct kvm_hv fields KVM: x86: Expose Intel AVX512IFMA/AVX512VBMI/SHA features to guest. kvm: nVMX: Correct a VMX instruction error code for VMPTRLD
2016-12-19x86/asm: Rewrite sync_core() to use IRET-to-selfAndy Lutomirski
Aside from being excessively slow, CPUID is problematic: Linux runs on a handful of CPUs that don't have CPUID. Use IRET-to-self instead. IRET-to-self works everywhere, so it makes testing easy. For reference, On my laptop, IRET-to-self is ~110ns, CPUID(eax=1, ecx=0) is ~83ns on native and very very slow under KVM, and MOV-to-CR2 is ~42ns. While we're at it: sync_core() serves a very specific purpose. Document it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/5c79f0225f68bc8c40335612bf624511abb78941.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernelsAndy Lutomirski
We support various non-Intel CPUs that don't have the CPUID instruction, so the M486 test was wrong. For now, fix it with a big hammer: handle missing CPUID on all 32-bit CPUs. Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/685bd083a7c036f7769510b6846315b17d6ba71f.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19x86/unwind: Dump stack data on warningsJosh Poimboeuf
The unwinder warnings are good at finding unexpected unwinder issues, but they often don't give enough data to be able to fully diagnose them. Print a one-time stack dump when a warning is detected. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/15607370e3ddb1732b6a73d5c65937864df16ac8.1481904011.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19x86/init: Fix a couple of comment typosDmitry Torokhov
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-5-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19x86/init: Remove i8042_detect() from platform opsDmitry Torokhov
Now that i8042 uses flag in legacy platform data, i8042_detect() is no longer used and can be removed. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Takashi Iwai <tiwai@suse.de> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-4-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19x86/init: Add i8042 state to the platform dataDmitry Torokhov
Add i8042 state to the platform data to help i8042 driver make decision whether to probe for i8042 or not. We recognize 3 states: platform/subarch ca not possible have i8042 (as is the case with Inrel MID platform), firmware (such as ACPI) reports that i8042 is absent from the device, or i8042 may be present and the driver should probe for it. The intent is to allow i8042 driver abort initialization on x86 if PNP data (absence of both keyboard and mouse PNP devices) agrees with firmware data. It will also allow us to remove i8042_detect later. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Takashi Iwai <tiwai@suse.de> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-2-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-18Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "This is the last functional update from the tip tree for 4.10. It got delayed due to a newly reported and anlyzed variant of BIOS bug and the resulting wreckage: - Seperation of TSC being marked realiable and the fact that the platform provides the TSC frequency via CPUID/MSRs and making use for it for GOLDMONT. - TSC adjust MSR validation and sanitizing: The TSC adjust MSR contains the offset to the hardware counter. The sum of the adjust MSR and the counter is the TSC value which is read via RDTSC. On at least two machines from different vendors the BIOS sets the TSC adjust MSR to negative values. This happens on cold and warm boot. While on cold boot the offset is a few milliseconds, on warm boot it basically compensates the power on time of the system. The BIOSes are not even using the adjust MSR to set all CPUs in the package to the same offset. The offsets are different which renders the TSC unusable, What's worse is that the TSC deadline timer has a HW feature^Wbug. It malfunctions when the TSC adjust value is negative or greater equal 0x80000000 resulting in silent boot failures, hard lockups or non firing timers. This looks like some hardware internal 32/64bit issue with a sign extension problem. Intel has been silent so far on the issue. The update contains sanity checks and keeps the adjust register within working limits and in sync on the package. As it looks like this disease is spreading via BIOS crapware, we need to address this urgently as the boot failures are hard to debug for users" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Limit the adjust value further x86/tsc: Annotate printouts as firmware bug x86/tsc: Force TSC_ADJUST register to value >= zero x86/tsc: Validate TSC_ADJUST after resume x86/tsc: Validate cpumask pointer before accessing it x86/tsc: Fix broken CONFIG_X86_TSC=n build x86/tsc: Try to adjust TSC if sync test fails x86/tsc: Prepare warp test for TSC adjustment x86/tsc: Move sync cleanup to a safe place x86/tsc: Sync test only for the first cpu in a package x86/tsc: Verify TSC_ADJUST from idle x86/tsc: Store and check TSC ADJUST MSR x86/tsc: Detect random warps x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() x86/tsc: Finalize the split of the TSC_RELIABLE flag x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable x86/tsc: Mark TSC frequency determined by CPUID as known x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
2016-12-18Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes and cleanups from Thomas Gleixner: "This set of updates contains: - Robustification for the logical package managment. Cures the AMD and virtualization issues. - Put the correct start_cpu() return address on the stack of the idle task. - Fixups for the fallout of the nodeid <-> cpuid persistent mapping modifciations - Move the x86/MPX specific mm_struct member to the arch specific mm_context where it belongs - Cleanups for C89 struct initializers and useless function arguments" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/floppy: Use designated initializers x86/mpx: Move bd_addr to mm_context_t x86/mm: Drop unused argument 'removed' from sync_global_pgds() ACPI/NUMA: Do not map pxm to node when NUMA is turned off x86/acpi: Use proper macro for invalid node x86/smpboot: Prevent false positive out of bounds cpumask access warning x86/boot/64: Push correct start_cpu() return address x86/boot/64: Use 'push' instead of 'call' in start_cpu() x86/smpboot: Make logical package management more robust
2016-12-18x86/floppy: Use designated initializersKees Cook
Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161217213705.GA1248@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-17Merge branch 'kbuild' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - prototypes for x86 asm-exported symbols (Adam Borowski) and a warning about missing CRCs (Nick Piggin) - asm-exports fix for LTO (Nicolas Pitre) - thin archives improvements (Nick Piggin) - linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick Piggin) - genksyms support for __builtin_va_list keyword - misc minor fixes * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: x86/kbuild: enable modversions for symbols exported from asm kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case scripts/kallsyms: remove last remnants of --page-offset option make use of make variable CURDIR instead of calling pwd kbuild: cmd_export_list: tighten the sed script kbuild: minor improvement for thin archives build kbuild: modpost warn if export version crc is missing kbuild: keep data tables through dead code elimination kbuild: improve linker compatibility with lib-ksyms.o build genksyms: Regenerate parser kbuild/genksyms: handle va_list type kbuild: thin archives for multi-y targets kbuild: kallsyms allow 3-pass generation if symbols size has changed
2016-12-17x86/mpx: Move bd_addr to mm_context_tMark Rutland
Currently bd_addr lives in mm_struct, which is otherwise architecture independent. Architecture-specific data is supposed to live within mm_context_t (itself contained in mm_struct). Other x86-specific context like the pkey accounting data lives in mm_context_t, and there's no readon the MPX data can't also live there. So as to keep the arch-specific data togather, and to set a good example for others, this patch moves bd_addr into x86's mm_context_t. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1481892055-24596-1-git-send-email-mark.rutland@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-16KVM: hyperv: fix locking of struct kvm_hv fieldsPaolo Bonzini
Introduce a new mutex to avoid an AB-BA deadlock between kvm->lock and vcpu->mutex. Protect accesses in kvm_hv_setup_tsc_page too, as suggested by Roman. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-15Merge tag 'trace-v4.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This release has a few updates: - STM can hook into the function tracer - Function filtering now supports more advance glob matching - Ftrace selftests updates and added tests - Softirq tag in traces now show only softirqs - ARM nop added to non traced locations at compile time - New trace_marker_raw file that allows for binary input - Optimizations to the ring buffer - Removal of kmap in trace_marker - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file - Other various fixes and clean ups" * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits) selftests: ftrace: Shift down default message verbosity kprobes/trace: Fix kprobe selftest for newer gcc tracing/kprobes: Add a helper method to return number of probe hits tracing/rb: Init the CPU mask on allocation tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too fgraph: Handle a case where a tracer ignores set_graph_notrace tracing: Replace kmap with copy_from_user() in trace_marker writing ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it tracing: Allow benchmark to be enabled at early_initcall() tracing: Have system enable return error if one of the events fail tracing: Do not start benchmark on boot up tracing: Have the reg function allow to fail ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline ring-buffer: Froce rb_update_write_stamp() to be inlined ring-buffer: Force inline of hotpath helper functions tracing: Make __buffer_unlock_commit() always_inline tracing: Make tracepoint_printk a static_key ring-buffer: Always inline rb_event_data() ring-buffer: Make rb_reserve_next_event() always inlined ...
2016-12-15x86/mm: Drop unused argument 'removed' from sync_global_pgds()Kirill A. Shutemov
Since commit af2cf278ef4f ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()") there are no callers of sync_global_pgds() which set the 'removed' argument to 1. Remove the argument and the related conditionals in the function. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20161214234403.137556-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-15x86/tsc: Force TSC_ADJUST register to value >= zeroThomas Gleixner
Roland reported that his DELL T5810 sports a value add BIOS which completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with random negative TSC_ADJUST values, different on all CPUs. That renders the TSC useless because the sycnchronization check fails. Roland tested the new TSC_ADJUST mechanism. While it manages to readjust the TSCs he needs to disable the TSC deadline timer, otherwise the machine just stops booting. Deeper investigation unearthed that the TSC deadline timer is sensitive to the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an interrupt storm caused by the TSC deadline timer. This does not make any sense and it's hard to imagine what kind of hardware wreckage is behind that misfeature, but it's reliably reproducible on other systems which have TSC_ADJUST and TSC deadline timer. While it would be understandable that a big enough negative value which moves the resulting TSC readout into the negative space could have the described effect, this happens even with a adjust value of -1, which keeps the TSC readout definitely in the positive space. The compare register for the TSC deadline timer is set to a positive value larger than the TSC, but despite not having reached the deadline the interrupt is raised immediately. If this happens on the boot CPU, then the machine dies silently because this setup happens before the NMI watchdog is armed. Further experiments showed that any other adjustment of TSC_ADJUST works as expected as long as it stays in the positive range. The direction of the adjustment has no influence either. See the lkml link for further analysis. Yet another proof for the theory that timers are designed by janitors and the underlying (obviously undocumented) mechanisms which allow BIOSes to wreckage them are considered a feature. Well done Intel - NOT! To address this wreckage add the following sanity measures: - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0 - If the TSC_ADJUST value on any cpu is negative, set it to 0 - Prevent the cross package synchronization mechanism from setting negative TSC_ADJUST values. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-15x86/tsc: Validate TSC_ADJUST after resumeThomas Gleixner
Some 'feature' BIOSes fiddle with the TSC_ADJUST register during suspend/resume which renders the TSC unusable. Add sanity checks into the resume path and restore the original value if it was adjusted. Reported-and-tested-by: Roland Scheidegger <rscheidegger_lists@hispeed.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Kevin Stanton <kevin.b.stanton@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Allen Hung <allen_hung@dell.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-14Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.10: API: - add skcipher walk interface - add asynchronous compression (acomp) interface - fix algif_aed AIO handling of zero buffer Algorithms: - fix unaligned access in poly1305 - fix DRBG output to large buffers Drivers: - add support for iMX6UL to caam - fix givenc descriptors (used by IPsec) in caam - accelerated SHA256/SHA512 for ARM64 from OpenSSL - add SSE CRCT10DIF and CRC32 to ARM/ARM64 - add AEAD support to Chelsio chcr - add Armada 8K support to omap-rng" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (148 commits) crypto: testmgr - fix overlap in chunked tests again crypto: arm/crc32 - accelerated support based on x86 SSE implementation crypto: arm64/crc32 - accelerated support based on x86 SSE implementation crypto: arm/crct10dif - port x86 SSE implementation to ARM crypto: arm64/crct10dif - port x86 SSE implementation to arm64 crypto: testmgr - add/enhance test cases for CRC-T10DIF crypto: testmgr - avoid overlap in chunked tests crypto: chcr - checking for IS_ERR() instead of NULL crypto: caam - check caam_emi_slow instead of re-lookup platform crypto: algif_aead - fix AIO handling of zero buffer crypto: aes-ce - Make aes_simd_algs static crypto: algif_skcipher - set error code when kcalloc fails crypto: caam - make aamalg_desc a proper module crypto: caam - pass key buffers with typesafe pointers crypto: arm64/aes-ce-ccm - Fix AEAD decryption length MAINTAINERS: add crypto headers to crypto entry crypt: doc - remove misleading mention of async API crypto: doc - fix header file name crypto: api - fix comment typo crypto: skcipher - Add separate walker for AEAD decryption ..
2016-12-13Merge tag 'for-linus-4.10-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Xen features and fixes for 4.10 These are some fixes, a move of some arm related headers to share them between arm and arm64 and a series introducing a helper to make code more readable. The most notable change is David stepping down as maintainer of the Xen hypervisor interface. This results in me sending you the pull requests for Xen related code from now on" * tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits) xen/balloon: Only mark a page as managed when it is released xenbus: fix deadlock on writes to /proc/xen/xenbus xen/scsifront: don't request a slot on the ring until request is ready xen/x86: Increase xen_e820_map to E820_X_MAX possible entries x86: Make E820_X_MAX unconditionally larger than E820MAX xen/pci: Bubble up error and fix description. xen: xenbus: set error code on failure xen: set error code on failures arm/xen: Use alloc_percpu rather than __alloc_percpu arm/arm64: xen: Move shared architecture headers to include/xen/arm xen/events: use xen_vcpu_id mapping for EVTCHNOP_status xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing xen-scsifront: Add a missing call to kfree MAINTAINERS: update XEN HYPERVISOR INTERFACE xenfs: Use proc_create_mount_point() to create /proc/xen xen-platform: use builtin_pci_driver xen-netback: fix error handling output xen: make use of xenbus_read_unsigned() in xenbus xen: make use of xenbus_read_unsigned() in xen-pciback xen: make use of xenbus_read_unsigned() in xen-fbfront ...
2016-12-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "Small release, the most interesting stuff is x86 nested virt improvements. x86: - userspace can now hide nested VMX features from guests - nested VMX can now run Hyper-V in a guest - support for AVX512_4VNNIW and AVX512_FMAPS in KVM - infrastructure support for virtual Intel GPUs. PPC: - support for KVM guests on POWER9 - improved support for interrupt polling - optimizations and cleanups. s390: - two small optimizations, more stuff is in flight and will be in 4.11. ARM: - support for the GICv3 ITS on 32bit platforms" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits) arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest KVM: arm/arm64: timer: Check for properly initialized timer on init KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs KVM: x86: Handle the kthread worker using the new API KVM: nVMX: invvpid handling improvements KVM: nVMX: check host CR3 on vmentry and vmexit KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry KVM: nVMX: propagate errors from prepare_vmcs02 KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation KVM: nVMX: support restore of VMX capability MSRs KVM: nVMX: generate non-true VMX MSRs based on true versions KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. KVM: x86: Add kvm_skip_emulated_instruction and use it. KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 KVM: VMX: Reorder some skip_emulated_instruction calls KVM: x86: Add a return value to kvm_emulate_cpuid KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h ...
2016-12-14x86/kbuild: enable modversions for symbols exported from asmAdam Borowski
Commit 4efca4ed ("kbuild: modversions for EXPORT_SYMBOL() for asm") adds modversion support for symbols exported from asm files. Architectures must include C-style declarations for those symbols in asm/asm-prototypes.h in order for them to be versioned. Add these declarations for x86, and an architecture-independent file that can be used for common symbols. With f27c2f6 reverting 8ab2ae6 ("default exported asm symbols to zero") we produce a scary warning on x86, this commit fixes that. Signed-off-by: Adam Borowski <kilobyte@angband.pl> Tested-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Michal Marek <mmarek@suse.com>
2016-12-13Merge tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "This is the main pull request for drm for 4.10 kernel. New drivers: - ZTE VOU display driver (zxdrm) - Amlogic Meson Graphic Controller GXBB/GXL/GXM SoCs (meson) - MXSFB support (mxsfb) Core: - Format handling has been reworked - Better atomic state debugging - drm_mm leak debugging - Atomic explicit fencing support - fbdev helper ops - Documentation updates - MST fbcon fixes Bridge: - Silicon Image SiI8620 driver Panel: - Add support for new simple panels i915: - GVT Device model - Better HDMI2.0 support on skylake - More watermark fixes - GPU idling rework for suspend/resume - DP Audio workarounds - Scheduler prep-work - Opregion CADL handling - GPU scheduler and priority boosting amdgfx/radeon: - Support for virtual devices - New VM manager for non-contig VRAM buffers - UVD powergating - SI register header cleanup - Cursor fixes - Powermanagement fixes nouveau: - Powermangement reworks for better voltage/clock changes - Atomic modesetting support - Displayport Multistream (MST) support. - GP102/104 hang and cursor fixes - GP106 support hisilicon: - hibmc support (BMC chip for aarch64 servers) armada: - add tracing support for overlay change - refactor plane support - de-midlayer the driver omapdrm: - Timing code cleanups rcar-du: - R8A7792/R8A7796 support - Misc fixes. sunxi: - A31 SoC display engine support imx-drm: - YUV format support - Cleanup plane atomic update mali-dp: - Misc fixes dw-hdmi: - Add support for HDMI i2c master controller tegra: - IOMMU support fixes - Error handling fixes tda998x: - Fix connector registration - Improved robustness - Fix infoframe/audio compliance virtio: - fix busid issues - allocate more vbufs qxl: - misc fixes and cleanups. vc4: - Fragment shader threading - ETC1 support - VEC (tv-out) support msm: - A5XX GPU support - Lots of atomic changes tilcdc: - Misc fixes and cleanups. etnaviv: - Fix dma-buf export path - DRAW_INSTANCED support - fix driver on i.MX6SX exynos: - HDMI refactoring fsl-dcu: - fbdev changes" * tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux: (1343 commits) drm/nouveau/kms/nv50: fix atomic regression on original G80 drm/nouveau/bl: Do not register interface if Apple GMUX detected drm/nouveau/bl: Assign different names to interfaces drm/nouveau/bios/dp: fix handling of LevelEntryTableIndex on DP table 4.2 drm/nouveau/ltc: protect clearing of comptags with mutex drm/nouveau/gr/gf100-: handle GPC/TPC/MPC trap drm/nouveau/core: recognise GP106 chipset drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas drm/nouveau/gr/gf100-: FECS intr handling is not relevant on proprietary ucode drm/nouveau/gr/gf100-: properly ack all FECS error interrupts drm/nouveau/fifo/gf100-: recover from host mmu faults drm: Add fake controlD* symlinks for backwards compat drm/vc4: Don't use drm_put_dev drm/vc4: Document VEC DT binding drm/vc4: Add support for the VEC (Video Encoder) IP drm: Add TV connector states to drm_connector_state drm: Turn DRM_MODE_SUBCONNECTOR_xx definitions into an enum drm/vc4: Fix ->clock_select setting for the VEC encoder drm/amdgpu/dce6: Set MASTER_UPDATE_MODE to 0 in resume_mc_access as well drm/amdgpu: use pin rather than pin_restricted in a few cases ...
2016-12-12Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp hotplug updates from Thomas Gleixner: "This is the final round of converting the notifier mess to the state machine. The removal of the notifiers and the related infrastructure will happen around rc1, as there are conversions outstanding in other trees. The whole exercise removed about 2000 lines of code in total and in course of the conversion several dozen bugs got fixed. The new mechanism allows to test almost every hotplug step standalone, so usage sites can exercise all transitions extensively. There is more room for improvement, like integrating all the pointlessly different architecture mechanisms of synchronizing, setting cpus online etc into the core code" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) tracing/rb: Init the CPU mask on allocation soc/fsl/qbman: Convert to hotplug state machine soc/fsl/qbman: Convert to hotplug state machine zram: Convert to hotplug state machine KVM/PPC/Book3S HV: Convert to hotplug state machine arm64/cpuinfo: Convert to hotplug state machine arm64/cpuinfo: Make hotplug notifier symmetric mm/compaction: Convert to hotplug state machine iommu/vt-d: Convert to hotplug state machine mm/zswap: Convert pool to hotplug state machine mm/zswap: Convert dst-mem to hotplug state machine mm/zsmalloc: Convert to hotplug state machine mm/vmstat: Convert to hotplug state machine mm/vmstat: Avoid on each online CPU loops mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead() tracing/rb: Convert to hotplug state machine oprofile/nmi timer: Convert to hotplug state machine net/iucv: Use explicit clean up labels in iucv_init() x86/pci/amd-bus: Convert to hotplug state machine x86/oprofile/nmi: Convert to hotplug state machine ...
2016-12-12Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode update from Ingo Molnar: "The biggest change (by Borislav Petkov) is a thorough rewrite of the Intel microcode loader and its interactions with the core code. The biggest conceptual change is the decoupling of the microcode loading on boot and application processors (which load the microcode in different scenarios), so that both parse the input patches with as few assumptions as possible - this also fixes various kernel address space randomization bugs. (The AP side then goes on and caches the result to improve boot performance.) Since the AMD side already did this, this change also opened up the path towards more unification/simplification of the core microcode loading infrastructure: 10 files changed, 647 insertions(+), 940 deletions(-) which speaks for itself" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Bump driver version, update copyrights x86/microcode: Rework microcode loading x86/microcode/intel: Remove intel_lib.c x86/microcode/amd: Move private inlines to .c and mark local functions static x86/microcode: Collect CPU info on resume x86/microcode: Issue the debug printk on resume only on success x86/microcode/amd: Hand down the CPU family x86/microcode: Export the microcode cache linked list x86/microcode: Remove one #ifdef clause x86/microcode/intel: Simplify generic_load_microcode() x86/microcode: Move driver authors to CREDITS x86/microcode: Run the AP-loading routine only on the application processors
2016-12-12Merge branch 'x86-idle-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 idle updates from Ingo Molnar: "There were two bigger changes in this development cycle: - remove idle notifiers: 32 files changed, 74 insertions(+), 803 deletions(-) These notifiers were of questionable value and the main usecase, the i7300 driver, was essentially unmaintained and can be removed, plus modern power management concepts don't need the callback - so use this golden opportunity and get rid of this opaque and fragile callback from a latency sensitive code path. (Len Brown, Thomas Gleixner) - improve the AMD Erratum 400 workaround that used high overhead MSR polling in the idle loop (Borisla Petkov, Thomas Gleixner)" * 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove empty idle.h header x86/amd: Simplify AMD E400 aware idle routine x86/amd: Check for the C1E bug post ACPI subsystem init x86/bugs: Separate AMD E400 erratum and C1E bug x86/cpufeature: Provide helper to set bugs bits x86/idle: Remove enter_idle(), exit_idle() x86: Remove x86_test_and_clear_bit_percpu() x86/idle: Remove is_idle flag x86/idle: Remove idle_notifier i7300_idle: Remove this driver
2016-12-12Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "The main changes in this cycle were: - do a large round of simplifications after all CPUs do 'eager' FPU context switching in v4.9: remove CR0 twiddling, remove leftover eager/lazy bts, etc (Andy Lutomirski) - more FPU code simplifications: remove struct fpu::counter, clarify nomenclature, remove unnecessary arguments/functions and better structure the code (Rik van Riel)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Remove clts() x86/fpu: Remove stts() x86/fpu: Handle #NM without FPU emulation as an error x86/fpu, lguest: Remove CR0.TS support x86/fpu, kvm: Remove host CR0.TS manipulation x86/fpu: Remove irq_ts_save() and irq_ts_restore() x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs() x86/fpu: Get rid of two redundant clts() calls x86/fpu: Finish excising 'eagerfpu' x86/fpu: Split old_fpu & new_fpu handling into separate functions x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state() x86/fpu: Split old & new FPU code paths x86/fpu: Remove __fpregs_(de)activate() x86/fpu: Rename lazy restore functions to "register state valid" x86/fpu, kvm: Remove KVM vcpu->fpu_counter x86/fpu: Remove struct fpu::counter x86/fpu: Remove use_eager_fpu() x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction x86/fpu: Hard-disable lazy FPU mode x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code
2016-12-12Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Two cleanups in the LDT handling code, by Dan Carpenter and Thomas Gleixner" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Make all size computations unsigned x86/ldt: Make a size argument unsigned
2016-12-12Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "The main changes in this cycle were: - Makefile improvements (Paul Bolle) - KConfig cleanups to better separate 32-bit only, 64-bit only and generic feature enablement sections (Ingo Molnar)" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Remove three unneeded genhdr-y entries x86/build: Don't use $(LINUXINCLUDE) twice x86/kconfig: Sort the 'config X86' selects alphabetically x86/kconfig: Clean up 32-bit compat options x86/kconfig: Clean up IA32_EMULATION select x86/kconfig, x86/pkeys: Move pkeys selects to X86_INTEL_MEMORY_PROTECTION_KEYS x86/kconfig: Move 64-bit only arch Kconfig selects to 'config X86_64' x86/kconfig: Move 32-bit only arch Kconfig selects to 'config X86_32'
2016-12-12Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this development cycle were: - a large number of call stack dumping/printing improvements: higher robustness, better cross-context dumping, improved output, etc. (Josh Poimboeuf) - vDSO getcpu() performance improvement for future Intel CPUs with the RDPID instruction (Andy Lutomirski) - add two new Intel AVX512 features and the CPUID support infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela, He Chen) - more copy-user unification (Borislav Petkov) - entry code assembly macro simplifications (Alexander Kuleshov) - vDSO C/R support improvements (Dmitry Safonov) - misc fixes and cleanups (Borislav Petkov, Paul Bolle)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) scripts/decode_stacktrace.sh: Fix address line detection on x86 x86/boot/64: Use defines for page size x86/dumpstack: Make stack name tags more comprehensible selftests/x86: Add test_vdso to test getcpu() x86/vdso: Use RDPID in preference to LSL when available x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() x86/cpufeatures: Enable new AVX512 cpu features x86/cpuid: Provide get_scattered_cpuid_leaf() x86/cpuid: Cleanup cpuid_regs definitions x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants x86/unwind: Ensure stack grows down x86/vdso: Set vDSO pointer only after success x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE x86/unwind: Detect bad stack return address x86/dumpstack: Warn on stack recursion x86/unwind: Warn on bad frame pointer x86/decoder: Use stderr if insn sanity test fails x86/decoder: Use stdout if insn decoder test is successful mm/page_alloc: Remove kernel address exposure in free_reserved_area() x86/dumpstack: Remove raw stack dump ...
2016-12-12Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "Misc changes: - optimize (reduce) IRQ handler tracing overhead (Wanpeng Li) - clean up MSR helpers (Borislav Petkov) - fix build warning on some configs (Sebastian Andrzej Siewior)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Cleanup/streamline MSR helpers x86/apic: Prevent tracing on apic_msr_write_eoi() x86/msr: Add wrmsr_notrace() x86/apic: Get rid of "warning: 'acpi_ioapic_lock' defined but not used"
2016-12-12Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Ingo Molnar: "The main changes in this development cycle were: - more AMD northbridge support work, mostly in preparation for Fam17h CPUs (Yazen Ghannam, Borislav Petkov) - cleanups/refactorings and fixes (Borislav Petkov, Tony Luck, Yinghai Lu)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Include the PPIN in MCE records when available x86/mce/AMD: Add system physical address translation for AMD Fam17h x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h x86/amd_nb: Add Fam17h Data Fabric as "Northbridge" x86/amd_nb: Make all exports EXPORT_SYMBOL_GPL x86/amd_nb: Make amd_northbridges internal to amd_nb.c x86/mce/AMD: Reset Threshold Limit after logging error x86/mce/AMD: Fix HWID_MCATYPE calculation by grouping arguments x86/MCE: Correct TSC timestamping of error records x86/RAS: Hide SMCA bank names x86/RAS: Rename smca_bank_names to smca_names x86/RAS: Simplify SMCA HWID descriptor struct x86/RAS: Simplify SMCA bank descriptor struct x86/MCE: Dump MCE to dmesg if no consumers x86/RAS: Add TSC timestamp to the injected MCE x86/MCE: Do not look at panic_on_oops in the severity grading
2016-12-12Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducig a notion of 'better cores', which the scheduler will prefer to schedule single threaded workloads on. (Tim Chen, Srinivas Pandruvada) - enhance the handling of asymmetric capacity CPUs further (Morten Rasmussen) - improve/fix load handling when moving tasks between task groups (Vincent Guittot) - simplify and clean up the cputime code (Stanislaw Gruszka) - improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent Guittot) - make struct kthread kmalloc()ed and related fixes (Oleg Nesterov) - add uaccess atomicity debugging (when using access_ok() in the wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra) - implement various fixes, cleanups and other enhancements (Daniel Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched/core: Use load_avg for selecting idlest group sched/core: Fix find_idlest_group() for fork kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() kthread: Don't use to_live_kthread() in kthread_[un]park() kthread: Don't use to_live_kthread() in kthread_stop() Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function" kthread: Make struct kthread kmalloc'ed x86/uaccess, sched/preempt: Verify access_ok() context sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> cpufreq/intel_pstate: Use CPPC to get max performance acpi/bus: Set _OSC for diverse core support acpi/bus: Enable HWP CPPC objects x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU x86/sysctl: Add sysctl for ITMT scheduling feature x86: Enable Intel Turbo Boost Max Technology 3.0 x86/topology: Define x86's arch_update_cpu_topology sched: Extend scheduler's asym packing sched/fair: Clean up the tunable parameter definitions ...
2016-12-12Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The tree got pretty big in this development cycle, but the net effect is pretty good: 115 files changed, 673 insertions(+), 1522 deletions(-) The main changes were: - Rework and generalize the mutex code to remove per arch mutex primitives. (Peter Zijlstra) - Add vCPU preemption support: add an interface to query the preemption status of vCPUs and use it in locking primitives - this optimizes paravirt performance. (Pan Xinhui, Juergen Gross, Christian Borntraeger) - Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to clean up and improve the s390 lock yielding machinery and its core kernel impact. (Christian Borntraeger) - Micro-optimize mutexes some more. (Waiman Long) - Reluctantly add the to-be-deprecated mutex_trylock_recursive() interface on a temporary basis, to give the DRM code more time to get rid of its locking hacks. Any other users will be NAK-ed on sight. (We turned off the deprecation warning for the time being to not pollute the build log.) (Peter Zijlstra) - Improve the rtmutex code a bit, in light of recent long lived bugs/races. (Thomas Gleixner) - Misc fixes, cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/paravirt: Fix bool return type for PVOP_CALL() x86/paravirt: Fix native_patch() locking/ww_mutex: Use relaxed atomics locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock() Documentation/virtual/kvm: Support the vCPU preemption check x86/xen: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check x86/kvm: Support the vCPU preemption check kvm: Introduce kvm_write_guest_offset_cached() locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests locking/spinlocks, s390: Implement vcpu_is_preempted(cpu) locking/core, powerpc: Implement vcpu_is_preempted(cpu) sched/core: Introduce the vcpu_is_preempted(cpu) interface sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q locking/core: Provide common cpu_relax_yield() definition locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily ...
2016-12-12Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this development cycle were: - Implement EFI dev path parser and other changes to fully support thunderbolt devices on Apple Macbooks (Lukas Wunner) - Add RNG seeding via the EFI stub, on ARM/arm64 (Ard Biesheuvel) - Expose EFI framebuffer configuration to user-space, to improve tooling (Peter Jones) - Misc fixes and cleanups (Ivan Hu, Wei Yongjun, Yisheng Xie, Dan Carpenter, Roy Franz)" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Make efi_random_alloc() allocate below 4 GB on 32-bit thunderbolt: Compile on x86 only thunderbolt, efi: Fix Kconfig dependencies harder thunderbolt, efi: Fix Kconfig dependencies thunderbolt: Use Device ROM retrieved from EFI x86/efi: Retrieve and assign Apple device properties efi: Allow bitness-agnostic protocol calls efi: Add device path parser efi/arm*/libstub: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table efi/libstub: Add random.c to ARM build efi: Add support for seeding the RNG from a UEFI config table MAINTAINERS: Add ARM and arm64 EFI specific files to EFI subsystem efi/libstub: Fix allocation size calculations efi/efivar_ssdt_load: Don't return success on allocation failure efifb: Show framebuffer layout as device attributes efi/efi_test: Use memdup_user() as a cleanup efi/efi_test: Fix uninitialized variable 'rv' efi/efi_test: Fix uninitialized variable 'datasize' efi/arm*: Fix efi_init() error handling efi: Remove unused include of <linux/version.h>
2016-12-11Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11x86/paravirt: Fix bool return type for PVOP_CALL()Peter Zijlstra
Commit: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") introduced a paravirt op with bool return type [*] It turns out that the PVOP_CALL*() macros miscompile when rettype is bool. Code that looked like: 83 ef 01 sub $0x1,%edi ff 15 32 a0 d8 00 callq *0xd8a032(%rip) # ffffffff81e28120 <pv_lock_ops+0x20> 84 c0 test %al,%al ended up looking like so after PVOP_CALL1() was applied: 83 ef 01 sub $0x1,%edi 48 63 ff movslq %edi,%rdi ff 14 25 20 81 e2 81 callq *0xffffffff81e28120 48 85 c0 test %rax,%rax Note how it tests the whole of %rax, even though a typical bool return function only sets %al, like: 0f 95 c0 setne %al c3 retq This is because ____PVOP_CALL() does: __ret = (rettype)__eax; and while regular integer type casts truncate the result, a cast to bool tests for any !0 value. Fix this by explicitly truncating to sizeof(rettype) before casting. [*] The actual bug should've been exposed in commit: 446f3dc8cc0a ("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests") but that didn't properly implement the paravirt call. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3cded4179481 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()") Link: http://lkml.kernel.org/r/20161208154349.346057680@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-10x86/ldt: Make all size computations unsignedThomas Gleixner
ldt->size can never be negative. The helper functions take 'unsigned int' arguments which are assigned from ldt->size. The related user space user_desc struct member entry_number is unsigned as well. But ldt->size itself and a few local variables which are related to ldt->size are type 'int' which makes no sense whatsoever and results in typecasts which make the eyes bleed. Clean it up and convert everything which is related to ldt->size to unsigned it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dan Carpenter <dan.carpenter@oracle.com>