summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-02-09x86/fpu: Fix FNSAVE usage in eagerfpu modeAndy Lutomirski
In eager fpu mode, having deactivated FPU without immediately reloading some other context is illegal. Therefore, to recover from FNSAVE, we can't just deactivate the state -- we need to reload it if we're not actively context switching. We had this wrong in fpu__save() and fpu__copy(). Fix both. __kernel_fpu_begin() was fine -- add a comment. This fixes a warning triggerable with nofxsr eagerfpu=on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/60662444e13c76f06e23c15c5dcdba31b4ac3d67.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/fpu: Fix math emulation in eager fpu modeAndy Lutomirski
Systems without an FPU are generally old and therefore use lazy FPU switching. Unsurprisingly, math emulation in eager FPU mode is a bit buggy. Fix it. There were two bugs involving kernel code trying to use the FPU registers in eager mode even if they didn't exist and one BUG_ON() that was incorrect. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: Honour passed pgprot in track_pfn_insert() and track_pfn_remap()Matthew Wilcox
track_pfn_insert() overwrites the pgprot that is passed in with a value based on the VMA's page_prot. This is a problem for people trying to do clever things with the new vm_insert_pfn_prot() as it will simply overwrite the passed protection flags. If we use the current value of the pgprot as the base, then it will behave as people are expecting. Also fix track_pfn_remap() in the same way. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1453742717-10326-2-git-send-email-matthew.r.wilcox@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/boot: Use proper array element type in memset() size calculationAlexander Kuleshov
I changed open coded zeroing loops to explicit memset()s in the following commit: 5e9ebbd87a99 ("x86/boot: Micro-optimize reset_early_page_tables()") The base for the size argument of memset was sizeof(pud_p/pmd_p), which are pointers - but the initialized array has pud_t/pmd_t elements. Luckily the two types had the same size, so this did not result in any runtime misbehavior. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1455025494-4063-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09locking/atomics: Update comment about READ_ONCE() and structuresKonrad Rzeszutek Wilk
The comment is out of data. Also point out the performance drawback of the barrier();__builtin_memcpy(); barrier() followed by another copy from stack (__u) to lvalue; Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453757600-11441-1-git-send-email-konrad.wilk@oracle.com [ Made it a bit more readable. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09sched/numa: Spread memory according to CPU and memory useRik van Riel
The pseudo-interleaving in NUMA placement has a fundamental problem: using hard usage thresholds to spread memory equally between nodes can prevent workloads from converging, or keep memory "trapped" on nodes where the workload is barely running any more. In order for workloads to properly converge, the memory migration should not be stopped when nodes reach parity, but instead be distributed according to how heavily memory is used from each node. This way memory migration and task migration reinforce each other, instead of one putting the brakes on the other. Remove the hard thresholds from the pseudo-interleaving code, and instead use a more gradual policy on memory placement. This also seems to improve convergence of workloads that do not run flat out, but sleep in between bursts of activity. We still want to slow down NUMA scanning and migration once a workload has settled on a few actively used nodes, so keep the 3/4 hysteresis in place. Keep track of whether a workload is actively running on multiple nodes, so task_numa_migrate does a full scan of the system for better task placement. In the case of running 3 SPECjbb2005 instances on a 4 node system, this code seems to result in fairer distribution of memory between nodes, with more memory bandwidth for each instance. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: mgorman@suse.de Link: http://lkml.kernel.org/r/20160125170739.2fc9a641@annuminas.surriel.com [ Minor readability tweaks. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()Andy Lutomirski
DMI cacheability is very confused on x86. dmi_early_remap() uses early_ioremap(), which uses FIXMAP_PAGE_IO, which is __PAGE_KERNEL_IO, which is __PAGE_KERNEL, which is cached. Don't ask me why this makes any sense. dmi_remap() uses ioremap(), which requests an uncached mapping. However, on non-EFI systems, the DMI data generally lives between 0xf0000 and 0x100000, which is in the legacy ISA range, which triggers a special case in the PAT code that overrides the cache mode requested by ioremap() and forces a WB mapping. On a UEFI boot, however, the DMI table can live at any physical address. On my laptop, it's around 0x77dd0000. That's nowhere near the legacy ISA range, so the ioremap() implicit uncached type is honored and we end up with a UC- mapping. UC- is a very, very slow way to read from main memory, so dmi_walk() is likely to take much longer than necessary. Given that, even on UEFI, we do early cached DMI reads, it seems safe to just ask for cached access. Switch to ioremap_cache(). I haven't tried to benchmark this, but I'd guess it saves several milliseconds of boot time. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean Delvare <jdelvare@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/3147c38e51f439f3c8911db34c7d4ab22d854915.1453791969.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: If INVPCID is available, use it to flush global mappingsAndy Lutomirski
On my Skylake laptop, INVPCID function 2 (flush absolutely everything) takes about 376ns, whereas saving flags, twiddling CR4.PGE to flush global mappings, and restoring flags takes about 539ns. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: Add a 'noinvpcid' boot option to turn off INVPCIDAndy Lutomirski
This adds a chicken bit to turn off INVPCID in case something goes wrong. It's an early_param() because we do TLB flushes before we parse __setup() parameters. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: Add INVPCID helpersAndy Lutomirski
This adds helpers for each of the four currently-specified INVPCID modes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/kasan: Write protect kasan zero shadowAndrey Ryabinin
After kasan_init() executed, no one is allowed to write to kasan_zero_page, so write protect it. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1452516679-32040-3-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/kasan: Clear kasan_zero_page after TLB flushAndrey Ryabinin
Currently we clear kasan_zero_page before __flush_tlb_all(). This works with current implementation of native_flush_tlb[_global]() because it doesn't cause do any writes to kasan shadow memory. But any subtle change made in native_flush_tlb*() could break this. Also current code seems doesn't work for paravirt guests (lguest). Only after the TLB flush we can be sure that kasan_zero_page is not used as early shadow anymore (instrumented code will not write to it). So it should cleared it only after the TLB flush. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09flow_dissector: Fix unaligned access in __skb_flow_dissector when used by ↵Alexander Duyck
eth_get_headlen This patch fixes an issue with unaligned accesses when using eth_get_headlen on a page that was DMA aligned instead of being IP aligned. The fact is when trying to check the length we don't need to be looking at the flow label so we can reorder the checks to first check if we are supposed to gather the flow label and then make the call to actually get it. v2: Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can cause us to check for the flow label. Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09ALSA: timer: Fix race at concurrent readsTakashi Iwai
snd_timer_user_read() has a potential race among parallel reads, as qhead and qused are updated outside the critical section due to copy_to_user() calls. Move them into the critical section, and also sanitize the relevant code a bit. Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-09ALSA: firewire-digi00x: Drop bogus const type qualifier on dot_scrt()Geert Uytterhoeven
sound/firewire/digi00x/amdtp-dot.c:67: warning: type qualifiers ignored on function return type Drop the bogus "const" type qualifier on the return type of dot_scrt() to fix this. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-09ALSA: hda - Fix bad dereference of jack objectTakashi Iwai
The hda_jack_tbl entries are managed by snd_array for allowing multiple jacks. It's good per se, but the problem is that struct hda_jack_callback keeps the hda_jack_tbl pointer. Since snd_array doesn't preserve each pointer at resizing the array, we can't keep the original pointer but have to deduce the pointer at each time via snd_array_entry() instead. Actually, this resulted in the deference to the wrong pointer on codecs that have many pins such as CS4208. This patch replaces the pointer to the NID value as the search key. As an unexpected good side effect, this even simplifies the code, as only NID is needed in most cases. Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-09locking/lockdep: Eliminate lockdep_init()Andrey Ryabinin
Lockdep is initialized at compile time now. Get rid of lockdep_init(). Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Krinkin <krinkin.m.u@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: mm-commits@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09locking/lockdep: Convert hash tables to hlistsAndrew Morton
Mike said: : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i.e. : kernel with CONFIG_UBSAN_ALIGNMENT=y fails to load without even any error : message. : : The problem is that ubsan callbacks use spinlocks and might be called : before lockdep is initialized. Particularly this line in the : reserve_ebda_region function causes problem: : : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); : : If i put lockdep_init() before reserve_ebda_region call in : x86_64_start_reservations kernel loads well. Fix this ordering issue permanently: change lockdep so that it uses hlists for the hash tables. Unlike a list_head, an hlist_head is in its initialized state when it is all-zeroes, so lockdep is ready for operation immediately upon boot - lockdep_init() need not have run. The patch will also save some memory. Probably lockdep_init() and lockdep_initialized can be done away with now. Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com> Reported-by: Mike Krinkin <krinkin.m.u@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: mm-commits@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09Merge branch 'locking/urgent' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09ALSA: timer: Fix race between stop and interruptTakashi Iwai
A slave timer element also unlinks at snd_timer_stop() but it takes only slave_active_lock. When a slave is assigned to a master, however, this may become a race against the master's interrupt handling, eventually resulting in a list corruption. The actual bug could be seen with a syzkaller fuzzer test case in BugLink below. As a fix, we need to take timeri->timer->lock when timer isn't NULL, i.e. assigned to a master, while the assignment to a master itself is protected by slave_active_lock. BugLink: http://lkml.kernel.org/r/CACT4Y+Y_Bm+7epAb=8Wi=AaWd+DYS7qawX52qxdCfOfY49vozQ@mail.gmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-09sched/debug: Make schedstats a runtime tunable that is disabled by defaultMel Gorman
schedstats is very useful during debugging and performance tuning but it incurs overhead to calculate the stats. As such, even though it can be disabled at build time, it is often enabled as the information is useful. This patch adds a kernel command-line and sysctl tunable to enable or disable schedstats on demand (when it's built in). It is disabled by default as someone who knows they need it can also learn to enable it when necessary. The benefits are dependent on how scheduler-intensive the workload is. If it is then the patch reduces the number of cycles spent calculating the stats with a small benefit from reducing the cache footprint of the scheduler. These measurements were taken from a 48-core 2-socket machine with Xeon(R) E5-2670 v3 cpus although they were also tested on a single socket machine 8-core machine with Intel i7-3770 processors. netperf-tcp 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean 64 560.45 ( 0.00%) 575.98 ( 2.77%) Hmean 128 766.66 ( 0.00%) 795.79 ( 3.80%) Hmean 256 950.51 ( 0.00%) 981.50 ( 3.26%) Hmean 1024 1433.25 ( 0.00%) 1466.51 ( 2.32%) Hmean 2048 2810.54 ( 0.00%) 2879.75 ( 2.46%) Hmean 3312 4618.18 ( 0.00%) 4682.09 ( 1.38%) Hmean 4096 5306.42 ( 0.00%) 5346.39 ( 0.75%) Hmean 8192 10581.44 ( 0.00%) 10698.15 ( 1.10%) Hmean 16384 18857.70 ( 0.00%) 18937.61 ( 0.42%) Small gains here, UDP_STREAM showed nothing intresting and neither did the TCP_RR tests. The gains on the 8-core machine were very similar. tbench4 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Hmean mb/sec-1 500.85 ( 0.00%) 522.43 ( 4.31%) Hmean mb/sec-2 984.66 ( 0.00%) 1018.19 ( 3.41%) Hmean mb/sec-4 1827.91 ( 0.00%) 1847.78 ( 1.09%) Hmean mb/sec-8 3561.36 ( 0.00%) 3611.28 ( 1.40%) Hmean mb/sec-16 5824.52 ( 0.00%) 5929.03 ( 1.79%) Hmean mb/sec-32 10943.10 ( 0.00%) 10802.83 ( -1.28%) Hmean mb/sec-64 15950.81 ( 0.00%) 16211.31 ( 1.63%) Hmean mb/sec-128 15302.17 ( 0.00%) 15445.11 ( 0.93%) Hmean mb/sec-256 14866.18 ( 0.00%) 15088.73 ( 1.50%) Hmean mb/sec-512 15223.31 ( 0.00%) 15373.69 ( 0.99%) Hmean mb/sec-1024 14574.25 ( 0.00%) 14598.02 ( 0.16%) Hmean mb/sec-2048 13569.02 ( 0.00%) 13733.86 ( 1.21%) Hmean mb/sec-3072 12865.98 ( 0.00%) 13209.23 ( 2.67%) Small gains of 2-4% at low thread counts and otherwise flat. The gains on the 8-core machine were slightly different tbench4 on 8-core i7-3770 single socket machine Hmean mb/sec-1 442.59 ( 0.00%) 448.73 ( 1.39%) Hmean mb/sec-2 796.68 ( 0.00%) 794.39 ( -0.29%) Hmean mb/sec-4 1322.52 ( 0.00%) 1343.66 ( 1.60%) Hmean mb/sec-8 2611.65 ( 0.00%) 2694.86 ( 3.19%) Hmean mb/sec-16 2537.07 ( 0.00%) 2609.34 ( 2.85%) Hmean mb/sec-32 2506.02 ( 0.00%) 2578.18 ( 2.88%) Hmean mb/sec-64 2511.06 ( 0.00%) 2569.16 ( 2.31%) Hmean mb/sec-128 2313.38 ( 0.00%) 2395.50 ( 3.55%) Hmean mb/sec-256 2110.04 ( 0.00%) 2177.45 ( 3.19%) Hmean mb/sec-512 2072.51 ( 0.00%) 2053.97 ( -0.89%) In constract, this shows a relatively steady 2-3% gain at higher thread counts. Due to the nature of the patch and the type of workload, it's not a surprise that the result will depend on the CPU used. hackbench-pipes 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v3r1 Amean 1 0.0637 ( 0.00%) 0.0660 ( -3.59%) Amean 4 0.1229 ( 0.00%) 0.1181 ( 3.84%) Amean 7 0.1921 ( 0.00%) 0.1911 ( 0.52%) Amean 12 0.3117 ( 0.00%) 0.2923 ( 6.23%) Amean 21 0.4050 ( 0.00%) 0.3899 ( 3.74%) Amean 30 0.4586 ( 0.00%) 0.4433 ( 3.33%) Amean 48 0.5910 ( 0.00%) 0.5694 ( 3.65%) Amean 79 0.8663 ( 0.00%) 0.8626 ( 0.43%) Amean 110 1.1543 ( 0.00%) 1.1517 ( 0.22%) Amean 141 1.4457 ( 0.00%) 1.4290 ( 1.16%) Amean 172 1.7090 ( 0.00%) 1.6924 ( 0.97%) Amean 192 1.9126 ( 0.00%) 1.9089 ( 0.19%) Some small gains and losses and while the variance data is not included, it's close to the noise. The UMA machine did not show anything particularly different pipetest 4.5.0-rc1 4.5.0-rc1 vanilla nostats-v2r2 Min Time 4.13 ( 0.00%) 3.99 ( 3.39%) 1st-qrtle Time 4.38 ( 0.00%) 4.27 ( 2.51%) 2nd-qrtle Time 4.46 ( 0.00%) 4.39 ( 1.57%) 3rd-qrtle Time 4.56 ( 0.00%) 4.51 ( 1.10%) Max-90% Time 4.67 ( 0.00%) 4.60 ( 1.50%) Max-93% Time 4.71 ( 0.00%) 4.65 ( 1.27%) Max-95% Time 4.74 ( 0.00%) 4.71 ( 0.63%) Max-99% Time 4.88 ( 0.00%) 4.79 ( 1.84%) Max Time 4.93 ( 0.00%) 4.83 ( 2.03%) Mean Time 4.48 ( 0.00%) 4.39 ( 1.91%) Best99%Mean Time 4.47 ( 0.00%) 4.39 ( 1.91%) Best95%Mean Time 4.46 ( 0.00%) 4.38 ( 1.93%) Best90%Mean Time 4.45 ( 0.00%) 4.36 ( 1.98%) Best50%Mean Time 4.36 ( 0.00%) 4.25 ( 2.49%) Best10%Mean Time 4.23 ( 0.00%) 4.10 ( 3.13%) Best5%Mean Time 4.19 ( 0.00%) 4.06 ( 3.20%) Best1%Mean Time 4.13 ( 0.00%) 4.00 ( 3.39%) Small improvement and similar gains were seen on the UMA machine. The gain is small but it stands to reason that doing less work in the scheduler is a good thing. The downside is that the lack of schedstats and tracepoints may be surprising to experts doing performance analysis until they find the existence of the schedstats= parameter or schedstats sysctl. It will be automatically activated for latencytop and sleep profiling to alleviate the problem. For tracepoints, there is a simple warning as it's not safe to activate schedstats in the context when it's known the tracepoint may be wanted but is unavailable. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode: Document builtin microcode loading methodBorislav Petkov
Add some text and an example to Documentation/x86/early-microcode.txt explaining how to build in microcode. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-18-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/AMD: Issue microcode updated message laterBorislav Petkov
Before this, we issued this message from save_microcode_in_initrd() which is called from free_initrd_mem(), i.e., only when we have an initrd enabled. However, we can update from builtin microcode too but then we don't issue the update message. Fix it by issuing that message on the generic driver init path. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-17-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Cleanup get_matching_model_microcode()Borislav Petkov
Reflow arguments, sort local variables in reverse christmas tree, kill "out" label. No functionality change. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-16-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Remove unused arg of get_matching_model_microcode()Borislav Petkov
@cpu is unused, kill it. No functionality change. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-15-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Rename mc_saved_in_initrdBorislav Petkov
Rename it to mc_tmp_ptrs to denote better what it is - a temporary array for saving pointers to microcode blobs. And "initrd" is not accurate anymore since initrd is not the only source for early microcode. Therefore, rename copy_initrd_ptrs() to copy_ptrs() simply and "initrd_start" to "offset". And then do the following convention: the global variable is called "mc_tmp_ptrs" and the local function arguments "mc_ptrs" for differentiation. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-14-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Use *wrmsrl variantsBorislav Petkov
... and drop the 32-bit casting games which we had to do at the time because wrmsr() was unforgiving then, see c3fd0bd5e19a from the full history tree: commit c3fd0bd5e19aaff9cdd104edff136a2023db657e Author: Linus Torvalds <torvalds@home.osdl.org> Date: Tue Feb 17 23:23:41 2004 -0800 Fix up the microcode update on regular 32-bit x86. Our wrmsr() is a bit unforgiving and really doesn't like 64-bit values. ... Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-13-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Cleanup apply_microcode_intel()Borislav Petkov
Get rid of local variable cpu_num as it is equal to @cpu now. Deref cpu_data() only when it is really needed at the end. No functionality change. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-12-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Move the BUG_ON up and turn it into WARN_ONBorislav Petkov
If we're going to BUG_ON() because we're running on the wrong CPU, we better do it as the first thing we do when entering that function. And also, turn it into a WARN_ON() because it is not worth to panic the system if we apply the microcode on the wrong CPU - we're simply going to exit early. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Rename mc_intel variable to mcBorislav Petkov
Well, it is apparent what it points to - microcode. And since it is the intel loader, no need for the "_intel" suffix. Use "!" for the 0/NULL checks, while at it. No functionality change. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-10-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Rename mc_saved_count to num_savedBorislav Petkov
It is shorter and easier on the eyes. Change the "== 0" tests to "!..." while at it. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-9-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Rename local variables of type struct mc_saved_dataBorislav Petkov
So it is always a head-twister when trying to stare at code which has a bunch of struct mc_saved_data *mc_saved_data; local function variables *and* a global mc_saved_data of the same name. Rename all locals to "mcs" to differentiate from the global one. No functionality change. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/AMD: Drop redundant printk prefixBorislav Petkov
It is supplied by pr_fmt already. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode: Issue update message only onceBorislav Petkov
This is especially annoying on large boxes: x86: Booting SMP configuration: .... node #0, CPUs: #1 microcode: CPU1 microcode updated early to revision 0x428, date = 2014-05-29 #2 microcode: CPU2 microcode updated early to revision 0x428, date = 2014-05-29 #3 ... so issue the update message only once. $ grep microcode /proc/cpuinfo shows whether every core got updated properly. Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode: Remove an unneeded NULL checkDan Carpenter
"uci" is an element of the ucode_cpu_info[] array, it can't be NULL. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/1454499225-21544-5-git-send-email-bp@alien8.de Link: http://lkml.kernel.org/r/20140120103046.GC14233@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode: Remove redundant __setup() param parsingBorislav Petkov
We do parse for the disable microcode loader chicken bit very early. After the driver merge, the __setup() param parsing method is not needed anymore so get rid of it. In addition, fix a compiler warning from an old SLES11 gcc (4.3.4) reported by Jan Beulich <jbeulich@suse.com>: arch/x86/kernel/cpu/microcode/core.c: In function ‘load_ucode_bsp’: arch/x86/kernel/cpu/microcode/core.c:96: warning: array subscript is above array bounds Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode/intel: Make early loader look for builtin microcode tooBorislav Petkov
Set the initrd @start depending on the presence of an initrd. Otherwise, builtin microcode loading doesn't work as the start is wrong and we're using it to compute offset to the microcode blobs. Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.4 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/microcode: Untangle from BLK_DEV_INITRDBorislav Petkov
Thomas Voegtle reported that doing oldconfig with a .config which has CONFIG_MICROCODE enabled but BLK_DEV_INITRD disabled prevents the microcode loading mechanism from being built. So untangle it from the BLK_DEV_INITRD dependency so that oldconfig doesn't turn it off and add an explanatory text to its Kconfig help what the supported methods for supplying microcode are. Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.4 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454499225-21544-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09of: of_mdio: Add marvell, 88e1145 to whitelist of PHY compatibilities.Aaro Koskinen
Commit ae461131960b ("of: of_mdio: Add a whitelist of PHY compatibilities.") missed one compatible string used in in-tree DTBs: in OCTEON, for selected boards, the kernel DTB pruning code will overwrite the DTB compatible string with "marvell,88e1145", which is missing from the whitelist. Add it. The patch fixes broken networking on EdgeRouter Lite. Fixes: ae461131960b ("of: of_mdio: Add a whitelist of PHY compatibilities.") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09ARM: dts: orion5x: fix the missing mtd flash on linkstation lswtglRoger Shimizu
MTD flash stores u-boot and u-boot environment on linkstation lswtgl. The latter one can be easily read/write by u-boot-tools package in Debian. Fixes: dc57844a736f ("ARM: dts: orion5x: add buffalo linkstation ls-wtgl") Signed-off-by: Roger Shimizu <rogershimizu@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
2016-02-09ARM: dts: kirkwood: use unique machine name for ds112Heinrich Schuchardt
Downstream packages like Debian flash-kernel use /proc/device-tree/model to determine which dtb file to install. Hence each dts in the Linux kernel should provide a unique model identifier. Commit 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology NAS devices") created the new files kirkwood-ds111.dts and kirkwood-ds112.dts using the same model identifier. This patch provides a unique model identifier for the Synology DiskStation DS112. Fixes: 2d0a7addbd10 ("ARM: Kirkwood: Add support for many Synology NAS devices") Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
2016-02-09kprobes: Optimize hot path by using percpu counter to collect 'nhit' statisticsMartin KaFai Lau
When doing ebpf+kprobe on some hot TCP functions (e.g. tcp_rcv_established), the kprobe_dispatcher() function shows up in 'perf report'. In kprobe_dispatcher(), there is a lot of cache bouncing on 'tk->nhit++'. 'tk->nhit' and 'tk->tp.flags' also share the same cacheline. perf report (cycles:pp): 8.30% ipv4_dst_check 4.74% copy_user_enhanced_fast_string 3.93% dst_release 2.80% tcp_v4_rcv 2.31% queued_spin_lock_slowpath 2.30% _raw_spin_lock 1.88% mlx4_en_process_rx_cq 1.84% eth_get_headlen 1.81% ip_rcv_finish ~~~~ 1.71% kprobe_dispatcher ~~~~ 1.55% mlx4_en_xmit 1.09% __probe_kernel_read perf report after patch: 9.15% ipv4_dst_check 5.00% copy_user_enhanced_fast_string 4.12% dst_release 2.96% tcp_v4_rcv 2.50% _raw_spin_lock 2.39% queued_spin_lock_slowpath 2.11% eth_get_headlen 2.03% mlx4_en_process_rx_cq 1.69% mlx4_en_xmit 1.19% ip_rcv_finish 1.12% __probe_kernel_read 1.02% ehci_hcd_cleanup Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Kernel Team <kernel-team@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454531308-2441898-1-git-send-email-kafai@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09locking/lockdep: Fix stack trace caching logicDmitry Vyukov
check_prev_add() caches saved stack trace in static trace variable to avoid duplicate save_trace() calls in dependencies involving trylocks. But that caching logic contains a bug. We may not save trace on first iteration due to early return from check_prev_add(). Then on the second iteration when we actually need the trace we don't save it because we think that we've already saved it. Let check_prev_add() itself control when stack is saved. There is another bug. Trace variable is protected by graph lock. But we can temporary release graph lock during printing. Fix this by invalidating cached stack trace when we release graph lock. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: glider@google.com Cc: kcc@google.com Cc: peter@hurleysoftware.com Cc: sasha.levin@oracle.com Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09selinux: nlmsgtab: add SOCK_DESTROY to the netlink mapping tablesLorenzo Colitti
Without this, using SOCK_DESTROY in enforcing mode results in: SELinux: unrecognized netlink message type=21 for sclass=32 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09sctp: translate network order to host order when users get a hmacidXin Long
Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") corrected the hmacid byte-order when setting a hmacid. but the same issue also exists on getting a hmacid. We fix it by changing hmacids to host order when users get them with getsockopt. Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09enic: increment devcmd2 result ring in case of timeoutSandeep Pillai
Firmware posts the devcmd result in result ring. In case of timeout, driver does not increment the current result pointer and firmware could post the result after timeout has occurred. During next devcmd, driver would be reading the result of previous devcmd. Fix this by incrementing result even in case of timeout. Fixes: 373fb0873d43 ("enic: add devcmd2") Signed-off-by: Sandeep Pillai <sanpilla@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segsSiva Reddy Kallam
tg3_tso_bug() can hit a condition where the entire tx ring is not big enough to segment the GSO packet. For example, if MSS is very small, gso_segs can exceed the tx ring size. When we hit the condition, it will cause tx timeout. tg3_tso_bug() is called to handle TSO and DMA hardware bugs. For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet. For DMA bugs, we can still fall back to linearize the SKB and let the hardware transmit the TSO packet. This patch adds a function tg3_tso_bug_gso_check() to check if there are enough tx descriptors for GSO before calling tg3_tso_bug(). The caller will then handle the error appropriately - drop or lineraize the SKB. v2: Corrected patch description to avoid confusion. Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Acked-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible fixes: - Handle spaces in file names obtained from /proc/pid/maps (Marcin Ślusarz) New features: - Improved support for Java, using the JVMTI agent library to do jitdumps that then will be inserted in synthesized PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized ELF files stored in ~/.debug and keyed with build-ids, to allow symbol resolution and even annotation with source line info, see the changeset comments to see how to use it (Stephane Eranian) Documentation changes: - Document mmore variables in the 'perf config' man page (Taeung Song) Infrastructure changes: - Improve a bit the 'make -C tools/perf build-test' output (Arnaldo Carvalho de Melo) - Do 'build-test' in parallel, using 'make -j' (Arnaldo Carvalho de Melo) - Fix handling of 'clean' in multi-target make invokations for parallell builds (Jiri Olsa) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/asm/bitops: Force inlining of test_and_set_bit and friendsDenys Vlasenko
Sometimes GCC mysteriously doesn't inline very small functions we expect to be inlined, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 Arguably, GCC should do better, but GCC people aren't willing to invest time into it and are asking to use __always_inline instead. With this .config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os here's an example of functions getting deinlined many times: test_and_set_bit (166 copies, ~1260 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f ab 3e lock bts %rdi,(%rsi) 72 04 jb <test_and_set_bit+0xf> 31 c0 xor %eax,%eax eb 05 jmp <test_and_set_bit+0x14> b8 01 00 00 00 mov $0x1,%eax 5d pop %rbp c3 retq test_and_clear_bit (124 copies, ~1000 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f b3 3e lock btr %rdi,(%rsi) 72 04 jb <test_and_clear_bit+0xf> 31 c0 xor %eax,%eax eb 05 jmp <test_and_clear_bit+0x14> b8 01 00 00 00 mov $0x1,%eax 5d pop %rbp c3 retq change_bit (3 copies, 8 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f bb 3e lock btc %rdi,(%rsi) 5d pop %rbp c3 retq clear_bit_unlock (2 copies, 11 calls) 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 48 0f b3 3e lock btr %rdi,(%rsi) 5d pop %rbp c3 retq This patch works it around via s/inline/__always_inline/. Code size decrease by ~13.5k after the patch: text data bss dec filename 92110727 20826144 36417536 149354407 vmlinux.before 92097234 20826176 36417536 149340946 vmlinux.after Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Graf <tgraf@suug.ch> Link: http://lkml.kernel.org/r/1454881887-1367-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09net:Add sysctl_max_skb_fragsHans Westgaard Ry
Devices may have limits on the number of fragments in an skb they support. Current codebase uses a constant as maximum for number of fragments one skb can hold and use. When enabling scatter/gather and running traffic with many small messages the codebase uses the maximum number of fragments and may thereby violate the max for certain devices. The patch introduces a global variable as max number of fragments. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>