2016-09-30  ARC: module: support R_ARC_32_PCREL relocation  (Vineet Gupta)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30  arc: perf: Enable generic "cache-references" and "cache-misses" events  (Alexey Brodkin)
We used to live with PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES not specified on ARC. Those events are actually aliases of 2 cache events that we do support, so this change wires up the "cache-references" and "cache-misses" events the same way as "L1-dcache-loads" and "L1-dcache-load-misses". While at it, add debug info for cache events, as well as a subtle fix in the HW events debug info: the config value is much better represented in hex, so we can see not only the event index but also any other control bits that are set. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30  ARC: [plat-eznps] add missing atomic_fetch_xxx operations  (Noam Camus)
Fix build breakage since the recent changes to the generic atomic operations: add the couple of missing macros which are now mandatory. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30  ARCv2: Implement atomic64 based on LLOCKD/SCONDD instructions  (Vineet Gupta)
The ARCv2 ISA provides 64-bit exclusive load/stores, so use them to implement the 64-bit atomics and elide the spinlock-based generic 64-bit atomics. Boot tested with the atomic64 self-tests (and God bless the person who wrote them; I realized my inline assembly is sloppy as hell). Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
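For illustration, a minimal sketch of what an LLOCKD/SCONDD-based 64-bit atomic add looks like (a simplified approximation of the arch/arc code, not necessarily the exact upstream implementation):

    static inline void atomic64_add(s64 a, atomic64_t *v)
    {
            s64 val;

            __asm__ __volatile__(
            "1:                            \n"
            "       llockd  %0, [%1]       \n"   /* 64-bit exclusive load */
            "       add.f   %L0, %L0, %L2  \n"   /* add low word, set carry */
            "       adc     %H0, %H0, %H2  \n"   /* add high word with carry */
            "       scondd  %0, [%1]       \n"   /* 64-bit conditional store */
            "       bnz     1b             \n"   /* retry if reservation was lost */
            : "=&r"(val)
            : "r"(&v->counter), "ir"(a)
            : "cc");
    }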
2016-09-30  ARCv2: Support dynamic peripheral address space in HS38 rel 3.0 cores  (Vineet Gupta)
HS release 3.0 provides even more flexibility in specifying the volatile address space for mapping peripherals. With HS 2.1, @start was made flexible/programmable; with HS 3.0 even @end can be set up (vs. being fixed to 0xFFFF_FFFF before). So add code to reflect that, and while at it remove an unused struct definition. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30  ARCv2: identify HS38 rel 3.0 cores  (Vineet Gupta)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30  ARCv2: Add support for ZeBu Emulation platform for HS cores  (Vineet Gupta)
The cool thing is that the same kernel image can run on:
 - the nsim OSCI simulation platform
 - SDPlite FPGA setups
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30  arc: Add "model" properly in device tree description of all boards  (Alexey Brodkin)
As was discussed quite some time ago (see https://lkml.org/lkml/2015/11/5/862), it is good practice to add a "model" property in .dts. Moreover, as per ePAPR the "model" property is required and should look like "manufacturer,model", so do that here. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Jonas Gorski <jonas.gorski@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robh@kernel.org> Cc: Christian Ruppert <christian.ruppert@alitech.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30  MAINTAINERS: Switch to kernel.org email address for Javi Merino  (Javi Merino)
Change my email address to my kernel.org account instead of the ARM one. Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30  x86/entry/64: Fix context tracking state warning when load_gs_index fails  (Wanpeng Li)
This warning:

    WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50
    CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13
    Call Trace:
     dump_stack+0x99/0xd0
     __warn+0xd1/0xf0
     warn_slowpath_null+0x1d/0x20
     enter_from_user_mode+0x32/0x50
     error_entry+0x6d/0xc0
     ? general_protection+0x12/0x30
     ? native_load_gs_index+0xd/0x20
     ? do_set_thread_area+0x19c/0x1f0
     SyS_set_thread_area+0x24/0x30
     do_int80_syscall_32+0x7c/0x220
     entry_INT80_compat+0x38/0x50
     ...

can be reproduced by running the GS testcase of the ldt_gdt test unit in the x86 selftests. do_int80_syscall_32() calls enter_from_user_mode() to switch the context tracking state from user to kernel. The load_gs_index() call can fail with a user gsbase; if that happens, gsbase is fixed up and execution proceeds. However, enter_from_user_mode() is then called again on the fixup path, even though the context tracking state is already "kernel". This patch fixes it by just fixing up gsbase and telling lockdep that IRQs are off once load_gs_index() fails with a user gsbase. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID  (Andy Lutomirski)
Otherwise arch_task_struct_size == 0 and we die. While we're at it, set X86_FEATURE_ALWAYS, too. Reported-by: David Saggiorato <david@saggiorato.net> Tested-by: David Saggiorato <david@saggiorato.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86") Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  x86/asm: Get rid of __read_cr4_safe()  (Andy Lutomirski)
We use __read_cr4() vs __read_cr4_safe() inconsistently. On CR4-less CPUs, all CR4 bits are effectively clear, so we can make the code simpler and more robust by making __read_cr4() always fix up faults on 32-bit kernels. This may fix some bugs on old 486-like CPUs, but I don't have any easy way to test that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: david@saggiorato.net Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
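Conceptually, the always-fixed-up 32-bit read looks like this sketch (exception-table usage assumed; the upstream code differs in detail):

    static inline unsigned long native_read_cr4(void)
    {
            unsigned long val = 0;  /* all CR4 bits read as clear if the mov faults */
    #ifdef CONFIG_X86_32
            /* CR4 doesn't exist on 486-class CPUs: let the fault be fixed up */
            asm volatile("1: mov %%cr4, %0\n"
                         "2:\n"
                         _ASM_EXTABLE(1b, 2b)
                         : "+r" (val));
    #else
            asm volatile("mov %%cr4, %0" : "=r" (val));
    #endif
            return val;
    }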
2016-09-30  Merge branch 'x86/urgent' into x86/asm  (Thomas Gleixner)
Get the CR4 fixes so we can apply the final cleanup.
2016-09-30  x86/vdso: Fix building on big endian host  (Segher Boessenkool)
We need to call GET_LE to read hdr->e_type. Fixes: 57f90c3dfc75 ("x86/vdso: Error out if the vDSO isn't a valid DSO") Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: linux-next@vger.kernel.org Link: http://lkml.kernel.org/r/20160929193442.GA16617@gate.crashing.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30  x86/boot: Fix another __read_cr4() case on 486  (Andy Lutomirski)
The condition for reading CR4 was wrong: there are some CPUs with CPUID but not CR4. Rather than trying to make the condition exact, use __read_cr4_safe(). Fixes: 18bc7bd523e0 ("x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly") Reported-by: david@saggiorato.net Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Link: http://lkml.kernel.org/r/8c453a61c4f44ab6ff43c29780ba04835234d2e5.1475178369.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-30  sched/irqtime: Consolidate irqtime flushing code  (Frederic Weisbecker)
The code that flushes irqtime nsecs stats to kcpustat is roughly the same for hardirq and softirq, so let's consolidate that common code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-6-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/irqtime: Consolidate accounting synchronization with u64_stats API  (Frederic Weisbecker)
The irqtime accounting currently implements its own ad hoc version of the u64_stats API. Let's consolidate it with the proper library instead. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  u64_stats: Introduce IRQs disabled helpers  (Frederic Weisbecker)
Introduce light versions of the u64_stats helpers for contexts where either preemption or IRQs are disabled. This way we can make this library usable by the scheduler irqtime accounting, which currently implements its own ad hoc version. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
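The idea, in sketch form (helper names illustrative, not necessarily the ones the patch introduces): on 64-bit the seqcount is unnecessary and the helpers compile away; on 32-bit they bump the seqcount without the preempt/IRQ-disable dance, since the caller already guarantees exclusion:

    #if BITS_PER_LONG == 64
    static inline void u64_stats_update_begin_raw(struct u64_stats_sync *s) { }
    static inline void u64_stats_update_end_raw(struct u64_stats_sync *s) { }
    #else
    /* caller must already have preemption or IRQs disabled */
    static inline void u64_stats_update_begin_raw(struct u64_stats_sync *s)
    {
            raw_write_seqcount_begin(&s->seq);
    }

    static inline void u64_stats_update_end_raw(struct u64_stats_sync *s)
    {
            raw_write_seqcount_end(&s->seq);
    }
    #endif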
2016-09-30  sched/irqtime: Remove needless IRQs disablement on kcpustat update  (Frederic Weisbecker)
The callers of the functions performing irqtime kcpustat updates already have IRQs disabled, so there is no need to disable them again. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/irqtime: No need for preempt-safe accessors  (Frederic Weisbecker)
We can safely use the preempt-unsafe accessors for irqtime when we flush its counters to kcpustat as IRQs are disabled at this time. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1474849761-12678-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/fair: Fix min_vruntime tracking  (Peter Zijlstra)
While going through enqueue/dequeue to review the movement of set_curr_task() I noticed that the (2nd) update_min_vruntime() call in dequeue_entity() is suspect. It turns out it's actually wrong, because it will consider cfs_rq->curr, which could be the entity we just normalized. This mixes different vruntime forms and leads to failure. The purpose of the second update_min_vruntime() is to move min_vruntime forward if the entity we just removed is the one that was holding it back; _except_ for the DEQUEUE_SAVE case, because then we know it's a temporary removal and it will come back. However, since we do put_prev_task() _after_ dequeue(), cfs_rq->curr will still be set (and, per the above, can be transformed into a different unit), so update_min_vruntime() should also consider curr->on_rq. This also fixes another corner case where the enqueue (which also does update_curr()->update_min_vruntime()) happens on the rq->lock break in schedule(), between dequeue and put_prev_task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 1e876231785d ("sched: Fix ->min_vruntime calculation in dequeue_entity()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
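The gist of the fix, sketched (simplified from the actual fair-class code; the rbtree handling is elided):

    static void update_min_vruntime(struct cfs_rq *cfs_rq)
    {
            struct sched_entity *curr = cfs_rq->curr;
            u64 vruntime = cfs_rq->min_vruntime;

            if (curr) {
                    if (curr->on_rq)        /* only trust curr if it is still queued */
                            vruntime = curr->vruntime;
                    else                    /* dequeued: its vruntime may be normalized */
                            curr = NULL;
            }

            /* ... also factor in the leftmost entity in the rbtree ... */

            /* monotonicity: min_vruntime never goes backwards */
            cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
    }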
2016-09-30  sched/debug: Add SCHED_WARN_ON()  (Peter Zijlstra)
Provide SCHED_WARN_ON as wrapper for WARN_ON_ONCE() to avoid CONFIG_SCHED_DEBUG wrappery. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
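The wrapper is essentially this (a sketch; note the !CONFIG_SCHED_DEBUG variant still evaluates its argument, so side effects are preserved):

    #ifdef CONFIG_SCHED_DEBUG
    #define SCHED_WARN_ON(x)        WARN_ONCE(x, #x)
    #else
    #define SCHED_WARN_ON(x)        ((void)(x))
    #endif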
2016-09-30  sched/core: Fix set_user_nice()  (Peter Zijlstra)
Almost all scheduler functions update state with the following pattern:

    if (queued)
            dequeue_task(rq, p, DEQUEUE_SAVE);
    if (running)
            put_prev_task(rq, p);

    /* update state */

    if (queued)
            enqueue_task(rq, p, ENQUEUE_RESTORE);
    if (running)
            set_curr_task(rq, p);

set_user_nice() however misses the running part, cure this. This was found by asserting we never enqueue 'current'. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/fair: Introduce set_curr_task() helper  (Peter Zijlstra)
Now that the ia64-only set_curr_task() symbol is gone, provide a helper just like put_prev_task(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core, ia64: Rename set_curr_task()  (Peter Zijlstra)
Rename the ia64-only set_curr_task() function to free up the name. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core: Fix incorrect utilization accounting when switching to fair class  (Vincent Guittot)
When a task switches to the fair scheduling class, the period between now and the last update of its utilization is accounted as running time, whatever happened during this period. This incorrect accounting applies to the task and also to the task group branch. When changing the property of a running task, like its list of allowed CPUs or its scheduling class, we follow the sequence:

 - dequeue task
 - put task
 - change the property
 - set task as current task
 - enqueue task

The end of the sequence doesn't follow the normal sequence (as per __schedule()), which is:

 - enqueue a task
 - then set the task as current task

This incorrect ordering is the root cause of the incorrect utilization accounting. Update the sequence to follow the right one:

 - dequeue task
 - put task
 - change the property
 - enqueue task
 - set task as current task

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: dietmar.eggemann@arm.com Cc: linaro-kernel@lists.linaro.org Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1473666472-13749-8-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core: Optimize SCHED_SMT  (Peter Zijlstra)
Avoid pointless SCHED_SMT code when running on !SMT hardware. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core: Rewrite and improve select_idle_siblings()  (Peter Zijlstra)
select_idle_siblings() is a known pain point for a number of workloads; it either does too much or not enough, and sometimes just does the plain wrong thing. This rewrite attempts to address a number of issues (but sadly not all). The current code does an unconditional sched_domain iteration, with the intent of finding an idle core (on SMT hardware). The problems this patch tries to address are:

 - it's pointless to look for idle cores if the machine is really busy; at that point you're just wasting cycles;
 - its behaviour is inconsistent between SMT and !SMT hardware, in that !SMT hardware ends up doing a scan for any idle CPU in the LLC domain, while SMT hardware does a scan for idle cores and, if that fails, falls back to a scan for idle threads on the 'target' core.

The new code replaces the sched_domain scan with 3 explicit scans:

 1) search for an idle core in the LLC
 2) search for an idle CPU in the LLC
 3) search for an idle thread in the 'target' core

where 1 and 3 are conditional on SMT support, and 1 and 2 have runtime heuristics to skip the step. Step 1) is conditional on sd_llc_shared->has_idle_cores; when a CPU goes idle and sd_llc_shared->has_idle_cores is false, we scan all SMT siblings of the CPU going idle. Similarly, we clear sd_llc_shared->has_idle_cores when we fail to find an idle core. Step 2) tracks the average cost of the scan and compares this to the average idle time guestimate for the CPU doing the wakeup. There is a significant fudge factor involved to deal with the variability of the averages; hackbench especially was sensitive to this. Step 3) is unconditional; we assume (also per step 1) that scanning all SMT siblings in a core is 'cheap'. With this, SMT systems gain step 2, which cures a few benchmarks -- notably one from Facebook. One 'feature' of the sched_domain iteration, which we preserve in the new code, is that it starts scanning from the 'target' CPU instead of scanning the cpumask in cpu id order. This keeps multiple CPUs in the LLC that are scanning for idle from ganging up and finding the same CPU quite as much. The downside is that tasks can end up hopping across the LLC for no apparent reason. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
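For orientation, the resulting control flow looks roughly like this sketch (simplified; the helper names correspond to the three steps above, the exact upstream signatures may differ):

    static int select_idle_sibling(struct task_struct *p, int target)
    {
            struct sched_domain *sd;
            int i;

            if (idle_cpu(target))
                    return target;

            sd = rcu_dereference(per_cpu(sd_llc, target));
            if (!sd)
                    return target;

            i = select_idle_core(p, sd, target);    /* 1) idle core in the LLC (SMT only) */
            if ((unsigned)i < nr_cpumask_bits)
                    return i;

            i = select_idle_cpu(p, sd, target);     /* 2) idle CPU in the LLC (cost-gated) */
            if ((unsigned)i < nr_cpumask_bits)
                    return i;

            i = select_idle_smt(p, sd, target);     /* 3) idle thread in the 'target' core */
            if ((unsigned)i < nr_cpumask_bits)
                    return i;

            return target;
    }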
2016-09-30  x86/cmpxchg, locking/atomics: Remove superfluous definitions  (Nikolay Borisov)
cmpxchg contained definitions for unused (x)add_* operations, dating back to the original ticket spinlock implementation. Nowadays these are unused so remove them. Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1474913478-17757-1-git-send-email-n.borisov.lkml@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  x86, locking/spinlocks: Remove ticket (spin)lock implementation  (Peter Zijlstra)
We've unconditionally used the queued spinlock for many releases now. It's time to remove the old ticket lock code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Waiman.Long@hpe.com Cc: david.vrabel@citrix.com Cc: dhowells@redhat.com Cc: pbonzini@redhat.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20160518184302.GO3193@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  Merge branch 'linus' into locking/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core: Replace sd_busy/nr_busy_cpus with sched_domain_shared  (Peter Zijlstra)
Move the nr_busy_cpus thing from its hacky sd->parent->groups->sgc location into the much more natural sched_domain_shared location. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core: Introduce 'struct sched_domain_shared'  (Peter Zijlstra)
Since struct sched_domain is strictly per CPU, introduce a structure that is shared between all 'identical' sched_domains. Limit it to SD_SHARE_PKG_RESOURCES domains for now, as we'll only use it for shared cache state; if another use comes up later we can easily relax this. While the sched_groups are normally shared between CPUs, they are not natural to use when we need some shared state on a domain level -- since that would require the domain to have a parent, which is not a given. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core: Restructure destroy_sched_domain()  (Peter Zijlstra)
There is no point in doing a call_rcu() for each domain, only do a callback for the root sched domain and clean up the entire set in one go. Also make the entire call chain be called destroy_sched_domain*() to remove confusion with the free_sched_domains() call, which does an entirely different thing. Both cpu_attach_domain() callers of destroy_sched_domain() can live without the call_rcu() because at those points the sched_domain hasn't been published yet. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core: Remove unused @cpu argument from destroy_sched_domain*()  (Peter Zijlstra)
Small cleanup; nothing uses the @cpu argument so make it go away. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/wait: Introduce init_wait_entry()  (Oleg Nesterov)
The partial initialization of wait_queue_t in prepare_to_wait_event() looks ugly. This was done to shrink .text, but we can simply add a new helper which does the full initialization and shrinks the compiled code a bit more. And this way prepare_to_wait_event() can have more users. In particular, we are ready to remove the signal_pending_state() checks from the wait_bit_action_f helpers and change __wait_on_bit_lock() to use prepare_to_wait_event(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160906140055.GA6167@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
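The new helper is tiny; a sketch of the full initialization it performs (per the wait_queue_t layout of this era):

    void init_wait_entry(wait_queue_t *wait, int flags)
    {
            wait->flags = flags;
            wait->private = current;
            wait->func = autoremove_wake_function;
            INIT_LIST_HEAD(&wait->task_list);
    }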
2016-09-30  sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()  (Oleg Nesterov)
__wait_on_bit_lock() doesn't need abort_exclusive_wait() either. Right now it can't use prepare_to_wait_event() (see the next change), but it can do the additional finish_wait() if action() fails. abort_exclusive_wait() no longer has callers; remove it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160906140053.GA6164@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/wait: Avoid abort_exclusive_wait() in ___wait_event()  (Oleg Nesterov)
___wait_event() doesn't really need abort_exclusive_wait(); we can simply change prepare_to_wait_event() to remove the waiter from q->task_list if it was interrupted. This simplifies the code/logic, and this way prepare_to_wait_event() can have more users; see the next change. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160908164815.GA18801@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>

--
 include/linux/wait.h |  7 +------
 kernel/sched/wait.c  | 35 +++++++++++++++++++++++++----------
 2 files changed, 26 insertions(+), 16 deletions(-)
2016-09-30  sched/wait: Fix abort_exclusive_wait(), it should pass TASK_NORMAL to wake_up()  (Oleg Nesterov)
Otherwise this logic only works if mode is "compatible" with another exclusive waiter. If some wq has both TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE waiters, abort_exclusive_wait() won't wake an uninterruptible waiter. The main user is __wait_on_bit_lock(), and currently it is fine, but only because TASK_KILLABLE includes TASK_UNINTERRUPTIBLE and we do not have lock_page_interruptible() yet. Just use TASK_NORMAL and remove the "mode" arg from abort_exclusive_wait(). Yes, this means that (say) wake_up_interruptible() can wake up the non-interruptible waiter(s), but I think this is fine. And in fact I think that abort_exclusive_wait() must die; see the next change. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160906140047.GA6157@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/fair: Fix fixed point arithmetic width for shares and effective load  (Dietmar Eggemann)
Since commit: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels") we now have two different fixed point units for load: - 'shares' in calc_cfs_shares() has 20 bit fixed point unit on 64-bit kernels. Therefore use scale_load() on MIN_SHARES. - 'wl' in effective_load() has 10 bit fixed point unit. Therefore use scale_load_down() on tg->shares which has 20 bit fixed point unit on 64-bit kernels. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471874441-24701-1-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  sched/core, x86/topology: Fix NUMA in package topology bug  (Tim Chen)
Current code can call set_cpu_sibling_map() and invoke sched_set_topology() more than once (e.g. on CPU hotplug). When this happens after sched_init_smp() has been called, we lose the NUMA topology extension to sched_domain_topology in sched_init_numa(). This results in incorrect topology when the sched domain is rebuilt. This patch fixes the bug and issues a warning if we call sched_set_topology() after sched_init_smp(). Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/1474485552-141429-2-git-send-email-srinivas.pandruvada@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  Merge branch 'linus' into sched/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30  softirq: Let ksoftirqd do its job  (Eric Dumazet)
A while back, Paolo and Hannes sent an RFC patch adding threadable NAPI poll loop support (https://patchwork.ozlabs.org/patch/620657/). The problem seems to be that softirqs are very aggressive and are often handled by the current process, even if we are under stress and ksoftirqd was scheduled so that innocent threads would have more of a chance to make progress. This patch makes sure that if ksoftirqd is running, we let it perform the softirq work. Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/

Tested:
 - NIC receiving traffic handled by CPU 0
 - UDP receiver running on CPU 0, using a single UDP socket
 - Incoming flood of UDP packets targeting the UDP socket

Before the patch, the UDP receiver could almost never get CPU cycles and could only receive ~2,000 packets per second. After the patch, CPU cycles are split 50/50 between the user application and ksoftirqd/0, and we can effectively read ~900,000 packets per second: a huge improvement in a DoS situation. (Note that more packets are now dropped by the NIC itself, since the BH handlers get fewer CPU cycles to drain the RX ring buffer.) Since the load runs in well-identified thread context, an admin can more easily tune process scheduling parameters if needed. Reported-by: Paolo Abeni <pabeni@redhat.com> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1472665349.14381.356.camel@edumazet-glaptop3.roam.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
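The check boils down to something like this sketch in the softirq entry paths:

    static bool ksoftirqd_running(void)
    {
            struct task_struct *tsk = __this_cpu_read(ksoftirqd);

            /* ksoftirqd is already active: let it drain the pending softirqs */
            return tsk && (tsk->state == TASK_RUNNING);
    }

    /* e.g. at the top of invoke_softirq(): */
    if (ksoftirqd_running())
            return;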
2016-09-30  m68k: Migrate exception table users off module.h and onto extable.h  (Paul Gortmaker)
This file was only including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and avoid all the extra header content in module.h that we don't really need to compile this. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2016-09-30  sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock  (Xin Long)
When sctp dumps all the ep->assocs, it needs to call lock_sock first, but it currently locks the sock inside rcu_read_lock, and lock_sock may sleep, which would break rcu_read_lock. This patch gets and holds one sock while traversing the list; after getting out of rcu_read_lock, it locks and dumps the sock, then traverses the list again to get the next one, until all sctp socks are dumped. For sctp_diag_dump_one, it fixes the issue by holding the asoc and moving cb() out of rcu_read_lock in sctp_transport_lookup_process. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
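The safe pattern, sketched generically (next_sctp_sock() is a hypothetical iterator standing in for the actual list walk):

    rcu_read_lock();
    sk = next_sctp_sock(net, &pos);
    if (sk)
            sock_hold(sk);          /* pin the sock so it can't go away */
    rcu_read_unlock();

    if (sk) {
            lock_sock(sk);          /* may sleep: legal now, outside RCU */
            /* ... dump ep->assocs ... */
            release_sock(sk);
            sock_put(sk);
    }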
2016-09-30  Merge branch 'sctp-fixes'  (David S. Miller)
Xin Long says:
====================
sctp: a bunch of fixes for prsctp policies

This patchset fixes 2 issues for the prsctp policies:

1. Patches 1 and 2 fix the "netperf-Throughput_Mbps -37.2% regression" issue when overloading the CPU.
2. Patch 3 fixes "prsctp policies should check both sides' prsctp_capable, instead of only the local side".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30  sctp: change to check peer prsctp_capable when using prsctp policies  (Xin Long)
Before using the prsctp policies, sctp currently checks asoc->prsctp_enable to see whether prsctp is enabled. However, asoc->prsctp_enable being set only means the local host supports prsctp; sctp should not abandon packets if the peer host doesn't enable prsctp. So this patch checks asoc->peer.prsctp_capable, which is set only when both the local and peer ends enable prsctp, instead of asoc->prsctp_enable, to see whether prsctp is enabled on both sides. Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
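In effect the checks change like this (illustrative; the surrounding conditions vary per policy):

    /* before: set whenever the local host supports PR-SCTP */
    if (asoc->prsctp_enable && SCTP_PR_TTL_ENABLED(sinfo->sinfo_flags))
            /* ... abandon the chunk ... */

    /* after: set only when both ends negotiated PR-SCTP */
    if (asoc->peer.prsctp_capable && SCTP_PR_TTL_ENABLED(sinfo->sinfo_flags))
            /* ... abandon the chunk ... */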
2016-09-30  sctp: remove prsctp_param from sctp_chunk  (Xin Long)
sctp currently uses chunk->prsctp_param to save the prsctp parameter for all the prsctp policies, but we never needed to introduce prsctp_param to sctp_chunk in the first place. We can just use chunk->sinfo.sinfo_timetolive for the RTX and BUF policies, and reuse msg->expires_at for the TTL policy, as the prsctp policies and the old expires policy are mutually exclusive. This patch removes prsctp_param from sctp_chunk, reusing the msg's expires_at for TTL and the chunk's sinfo.sinfo_timetolive for the RTX and BUF policies. Note that sctp can't use the chunk's sinfo.sinfo_timetolive for the TTL policy, as that needs a u64 variable to save the expires_at time. This also fixes the "netperf-Throughput_Mbps -37.2% regression" issue. Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30  sctp: move sent_count to the memory hole in sctp_chunk  (Xin Long)
Running pahole on sctp_chunk shows it has 2 memory holes:

    struct sctp_chunk {
            struct list_head list;
            atomic_t refcnt;
            /* XXX 4 bytes hole, try to pack */
            ...
            long unsigned int prsctp_param;
            int sent_count;
            /* XXX 4 bytes hole, try to pack */

This patch moves sent_count up to fill the 1st hole and eliminate the 2nd one. It's not just another struct compaction; it also fixes the "netperf-Throughput_Mbps -37.2% regression" issue when overloading the CPU. Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30  tg3: Avoid NULL pointer dereference in tg3_io_error_detected()  (Milton Miller)
While the driver is probing the adapter, an error may occur before the netdev structure is allocated and attached to pci_dev. In this case, not only is netdev unavailable, but the tg3 private structure is also unavailable, as it is just an offset from the NULL pointer, so dereferences must be skipped. The following trace is seen when the error is triggered:

    [1.402247] Unable to handle kernel paging request for data at address 0x00001a99
    [1.402410] Faulting instruction address: 0xc0000000007e33f8
    [1.402450] Oops: Kernel access of bad area, sig: 11 [#1]
    [1.402481] SMP NR_CPUS=2048 NUMA PowerNV
    [1.402513] Modules linked in:
    [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu
    [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: c000001fe4e88000
    [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: c000000000595ea0
    [1.402787] REGS: c000001fe4e8b790 TRAP: 0300 Not tainted (4.4.0-36-generic)
    [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28000422 XER: 20000000
    [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: 42000000 SOFTE: 1
    GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 0000000000000000
    GPR04: 0000000000000001 0000000000000000 0000000000000039 0000000000000299
    GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 0000000000000006
    GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 c000003ca1bffd00
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d52768
    GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 0000000000000002
    GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 c000003ca1b52000
    [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340
    [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340

This patch avoids the NULL pointer dereference by moving the access after the netdev NULL pointer check in tg3_io_error_detected(). Also, we add a check for netdev being NULL in tg3_io_resume() [suggested by Michael Chan]. Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery") Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error") Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
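The shape of the fix, as a hedged sketch (simplified; the real handler does considerably more):

    static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
                                                  pci_channel_state_t state)
    {
            struct net_device *netdev = pci_get_drvdata(pdev);
            struct tg3 *tp;

            /* probe may have failed before netdev was attached to pci_dev */
            if (!netdev)
                    return PCI_ERS_RESULT_DISCONNECT;

            tp = netdev_priv(netdev);       /* safe only after the NULL check */

            /* ... error handling using tp ... */
            return PCI_ERS_RESULT_NEED_RESET;
    }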