summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2015-02-04locking/mutex: Explicitly mark task as running after wakeupDavidlohr Bueso
By the time we wake up and get the lock after being asleep in the slowpath, we better be running. As good practice, be explicit about this and avoid any mischief. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421717961.4903.11.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04Merge tag 'v3.19-rc7' into locking/core, to refresh the branch before ↵Ingo Molnar
applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/Documentation: Remove unneeded wordSharon Dvir
The second 'mutex' shouldn't be there, it can't be about the mutex, as the mutex can't be freed, but unlocked, the memory where the mutex resides however, can be freed. Signed-off-by: Sharon Dvir <sharon.dvir1@mail.huji.ac.il> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422827252-31363-1-git-send-email-sharon.dvir1@mail.huji.ac.il Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched: Pull resched loop to __schedule() callersFrederic Weisbecker
__schedule() disables preemption during its job and re-enables it afterward without doing a preemption check to avoid recursion. But if an event happens after the context switch which requires rescheduling, we need to check again if a task of a higher priority needs the CPU. A preempt irq can raise such a situation. To handle that, __schedule() loops on need_resched(). But preempt_schedule_*() functions, which call __schedule(), also loop on need_resched() to handle missed preempt irqs. Hence we end up with the same loop happening twice. Lets simplify that by attributing the need_resched() loop responsibility to all __schedule() callers. There is a risk that the outer loop now handles reschedules that used to be handled by the inner loop with the added overhead of caller details (inc/dec of PREEMPT_ACTIVE, irq save/restore) but assuming those inner rescheduling loop weren't too frequent, this shouldn't matter. Especially since the whole preemption path is now losing one loop in any case. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1422404652-29067-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Remove cpu_active_mask from cpudl_find()Xunlei Pang
cpu_active_mask is rarely changed (only on hotplug), so remove this operation to gain a little performance. If there is a change in cpu_active_mask, rq_online_dl() and rq_offline_dl() should take care of it normally, so cpudl::free_cpus carries enough information for us. For the rare case when a task is put onto a dying cpu (which rq_offline_dl() can't handle in a timely fashion), it will be handled through _cpu_down()->...->multi_cpu_stop()->migration_call() ->migrate_tasks(), preventing the task from hanging on the dead cpu. Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [peterz: changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421642980-10045-2-git-send-email-pang.xunlei@linaro.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched: Fix hrtick_start() on UPWanpeng Li
The commit 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range") forgot to change the UP version of hrtick_start(), do so now. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Fixes: 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range") [ Fixed the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-7-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Avoid pointless __setscheduler()Wanpeng Li
There is no need to dequeue/enqueue and push/pull if there are no scheduling parameters changed for the DL class. Both fair and RT classes already check if parameters changed for them to avoid unnecessary overhead. This patch add the parameters changed test for the DL class in order to reduce overhead. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [ Fixed up the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-5-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix stale yield statePeter Zijlstra
When we fail to start the deadline timer in update_curr_dl(), we forget to clear ->dl_yielded, resulting in wrecked time keeping. Since the natural place to clear both ->dl_yielded and ->dl_throttled is in replenish_dl_entity(); both are after all waiting for that event; make it so. Luckily since 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") the task_on_rq_queued() condition in dl_task_timer() must be true, and can therefore call enqueue_task_dl() unconditionally. Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix hrtick for a non-leftmost taskWanpeng Li
After update_curr_dl() the current task might not be the leftmost task anymore. In that case do not start a new hrtick for it. In this case NEED_RESCHED will be set and the next schedule will start the hrtick for the new task if and when appropriate. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Acked-by: Juri Lelli <juri.lelli@arm.com> [ Rewrote the changelog and comment. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-2-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04Merge branch 'sched/urgent' into sched/core, to merge fixes before applying ↵Ingo Molnar
new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix deadline parameter modification handlingPeter Zijlstra
Commit 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") removed the hrtimer_try_cancel() function call out from init_dl_task_timer(), which gets called from __setparam_dl(). The result is that we can now re-init the timer while its active -- this is bad and corrupts timer state. Furthermore; changing the parameters of an active deadline task is tricky in that you want to maintain guarantees, while immediately effective change would allow one to circumvent the CBS guarantees -- this too is bad, as one (bad) task should not be able to affect the others. Rework things to avoid both problems. We only need to initialize the timer once, so move that to __sched_fork() for new tasks. Then make sure __setparam_dl() doesn't affect the current running state but only updates the parameters used to calculate the next scheduling period -- this guarantees the CBS functions as expected (albeit slightly pessimistic). This however means we need to make sure __dl_clear_params() needs to reset the active state otherwise new (and tasks flipping between classes) will not properly (re)compute their first instance. Todo: close class flipping CBS hole. Todo: implement delayed BW release. Reported-by: Luca Abeni <luca.abeni@unitn.it> Acked-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Luca Abeni <luca.abeni@unitn.it> Fixes: 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150128140803.GF23038@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-03PM / hibernate: exclude freed pages from allocated pages printoutWonhong Kwon
hibernate_preallocate_memory() prints out that how many pages are allocated, but it doesn't take into consideration the pages freed by free_unnecessary_pages(). Therefore, it always shows the count more than actually allocated. Signed-off-by: Wonhong Kwon <wonhong.kwon@lge.com> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-03tracing: Have mkdir and rmdir be part of tracefsSteven Rostedt (Red Hat)
The tracing "instances" directory can create sub tracing buffers with mkdir, and remove them with rmdir. As a mkdir will also create all the files and directories that control the sub buffer the inode mutexes need to be released before this is done, to avoid deadlocks. It is better to let the tracing system unlock the inode mutexes before calling the functions that create the files within the new directory (or deletes the files from the one being destroyed). Now that tracing has been converted over to tracefs, the tracefs file system can be modified to accommodate this feature. It still releases the locks, but the filesystem itself can take care of the ugly business and let the user just do what it needs. The tracing system now attaches a descriptor to the directory dentry that can have userspace create or remove sub directories. If this descriptor does not exist for a dentry, then that dentry can not be used to create other directories. This descriptor holds a mkdir and rmdir method that only takes a character string as an argument. The tracefs file system will first make a copy of the dentry name before releasing the locks. Then it will pass the copied name to the methods. It is up to the tracing system that supplied the methods to handle races with duplicate names and such as all the inode mutexes would be released when the functions are called. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03tracing: Automatically mount tracefs on debugfs/tracingSteven Rostedt (Red Hat)
As tools currently rely on the tracing directory in debugfs, we can not just created a tracefs infrastructure and expect sysadmins to mount the new tracefs to have their old tools work. Instead, the debugfs tracing directory is still created and the tracefs file system is mounted there when the debugfs filesystem is mounted. No longer does the tracing infrastructure update the debugfs file system, but instead interacts with the tracefs file system. But now, it still appears to the user like nothing changed, except you also have the feature of mounting just the tracing system without needing all of debugfs! Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03tracing: Convert the tracing facility over to use tracefsSteven Rostedt (Red Hat)
debugfs was fine for the tracing facility as a quick way to get an interface. Now that tracing has matured, it should separate itself from debugfs such that it can be mounted separately without needing to mount all of debugfs with it. That is, users resist using tracing because it requires mounting debugfs. Having tracing have its own file system lets users get the features of tracing without needing to bring in the rest of the kernel's debug infrastructure. Another reason for tracefs is that debubfs does not support mkdir. Currently, to create instances, one does a mkdir in the tracing/instance directory. This is implemented via a hack that forces debugfs to do something it is not intended on doing. By converting over to tracefs, this hack can be removed and mkdir can be properly implemented. This patch does not address this yet, but it lays the ground work for that to be done. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03tracing: Create cmdline tracer options on tracing fs initSteven Rostedt (Red Hat)
The options for cmdline tracers are not created if the debugfs system is not ready yet. If tracing has started before debugfs is up, then the option files for the tracer are not created. Create them when creating the tracing directory if the current tracer requires option files. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03tracing: Only create tracer options files if directory existsSteven Rostedt (Red Hat)
Do not bother creating tracer options if no tracing directory exists. If a tracer is enabled via the command line, and is started before the tracing directory is created, then it wont have its tracer specific options created. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03Merge tag 'v3.19-rc7' into x86/asm, to refresh the branch before pulling in ↵Ingo Molnar
new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-02Merge branch 'debugfs_automount' of ↵Steven Rostedt (Red Hat)
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into trace/ftrace/tracefs Pull in Al Viro's changes to debugfs that implement the new primitive: debugfs_create_automount(), that creates a directory in debugfs that will safely mount another file system automatically when debugfs is mounted. This will let tracefs automount itself on top of debugfs/tracing directory.
2015-02-02tracing: Separate out initializing top level dir from instancesSteven Rostedt (Red Hat)
The top level trace array is treated a little different than the instances, as it has to deal with more of the general tracing. The tr->dir is the tracing directory, which is an immutable dentry, where as the tr->dir of instances are the dentry that was created, and can be destroyed later. These should have different functions accessing them. As only tracing_init_dentry() deals with the top level array, fold the code for it into that function, and remove the trace_init_dentry_tr() that was also used by the instances to get their directory dentry. Add a tracing_get_dentry() to just get the tracing dir entry for instances as well as the top level array. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-02tracing: Make tracing_init_dentry_tr() staticSteven Rostedt (Red Hat)
tracing_init_dentry_tr() is not used outside of trace.c, it should be static. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-02perf: provide sysfs_show for struct perf_pmu_events_attrCody P Schafer
(struct perf_pmu_events_attr) is defined in include/linux/perf_event.h, but the only "show" for it is in x86 and contains x86 specific stuff. Make a generic one for those of us who are just using the event_str. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-01sched: don't cause task state changes in nested sleep debuggingLinus Torvalds
Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an event groups fix, two PMU driver fixes and a CPU model variant addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Tighten (and fix) the grouping condition perf/x86/intel: Add model number for Airmont perf/rapl: Fix crash in rapl_scale() perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization perf probe: Fix probing kretprobes perf symbols: Introduce 'for' method to iterate over the symbols with a given name perf probe: Do not rely on map__load() filter to find symbols perf symbols: Introduce method to iterate symbols ordered by name perf symbols: Return the first entry with a given name in find_by_name method perf annotate: Fix memory leaks in LOCK handling perf annotate: Handle ins parsing failures perf scripting perl: Force to use stdbool perf evlist: Remove extraneous 'was' on error message
2015-01-30sched/deadline: Modify cpudl::free_cpus to reflect rd->onlineXunlei Pang
Currently, cpudl::free_cpus contains all CPUs during init, see cpudl_init(). When calling cpudl_find(), we have to add rd->span to avoid selecting the cpu outside the current root domain, because cpus_allowed cannot be depended on when performing clustered scheduling using the cpuset, see find_later_rq(). This patch adds cpudl_set_freecpu() and cpudl_clear_freecpu() for changing cpudl::free_cpus when doing rq_online_dl()/rq_offline_dl(), so we can avoid the rd->span operation when calling cpudl_find() in find_later_rq(). Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421642980-10045-1-git-send-email-pang.xunlei@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched/idle: Add missing checks to the exit condition of cpu_idle_poll()Preeti U Murthy
cpu_idle_poll() is entered into when either the cpu_idle_force_poll is set or tick_check_broadcast_expired() returns true. The exit condition from cpu_idle_poll() is tif_need_resched(). However this does not take into account scenarios where cpu_idle_force_poll changes or tick_check_broadcast_expired() returns false, without setting the resched flag. So a cpu will be caught in cpu_idle_poll() needlessly, thereby wasting power. Add an explicit check on cpu_idle_force_poll and tick_check_broadcast_expired() to the exit condition of cpu_idle_poll() to avoid this. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150121105655.15279.59626.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched: Fix missing preemption opportunityFrederic Weisbecker
If an interrupt fires in cond_resched(), between the call to __schedule() and the PREEMPT_ACTIVE count decrementation, and that interrupt sets TIF_NEED_RESCHED, the call to preempt_schedule_irq() will be ignored due to the PREEMPT_ACTIVE count. This kind of scenario, with irq preemption being delayed because it's interrupting a preempt-disabled area, is usually fixed up after preemption is re-enabled back with an explicit call to preempt_schedule(). This is what preempt_enable() does but a raw preempt count decrement as performed by __preempt_count_sub(PREEMPT_ACTIVE) doesn't handle delayed preemption check. Therefore when such a race happens, the rescheduling is going to be delayed until the next scheduler or preemption entrypoint. This can be a problem for scheduler latency sensitive workloads. Lets fix that by consolidating cond_resched() with preempt_schedule() internals. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Ingo Molnar <mingo@kernel.org> Original-patch-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421946484-9298-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched/rt: Reduce rq lock contention by eliminating locking of non-feasible ↵Tim Chen
target This patch adds checks that prevens futile attempts to move rt tasks to a CPU with active tasks of equal or higher priority. This reduces run queue lock contention and improves the performance of a well known OLTP benchmark by 0.7%. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Cc: Suruchi Kadu <suruchi.a.kadu@intel.com> Cc: Doug Nelson<doug.nelson@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30Merge branch 'sched/urgent' into sched/coreIngo Molnar
Merge all pending fixes and refresh the tree, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30PM / sleep: export suspend_resume trace eventTodd E Brandt
Export the suspend_resume tracepoint so it can be used in loadable modules. Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-29ftrace: allow architectures to specify ftrace compile optionsHeiko Carstens
If the kernel is compiled with function tracer support the -pg compile option is passed to gcc to generate extra code into the prologue of each function. This patch replaces the "open-coded" -pg compile flag with a CC_FLAGS_FTRACE makefile variable which architectures can override if a different option should be used for code generation. Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-28trace: Use 64-bit timekeepingTina Ruchandani
The ring_buffer_producer uses 'struct timeval' to measure its start and end times. 'struct timeval' on 32-bit systems will have its tv_sec value overflow in year 2038 and beyond. This patch replaces struct timeval with 'ktime_t' which uses 64-bit representation for nanoseconds. Link: http://lkml.kernel.org/r/20150128141611.GA2701@tinar Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-28tracing: Add array printing helperDave Martin
If a trace event contains an array, there is currently no standard way to format this for text output. Drivers are currently hacking around this by a) local hacks that use the trace_seq functionailty directly, or b) just not printing that information. For fixed size arrays, formatting of the elements can be open-coded, but this gets cumbersome for arrays of non-trivial size. These approaches result in non-standard content of the event format description delivered to userspace, so userland tools needs to be taught to understand and parse each array printing method individually. This patch implements a __print_array() helper that tracepoint implementations can use instead of reinventing it. A simple C-style syntax is used to delimit the array and its elements {like,this}. So that the helpers can be used with large static arrays as well as dynamic arrays, they take a pointer and element count: they can be used with __get_dynamic_array() for use with dynamic arrays. Link: http://lkml.kernel.org/r/1422449335-8289-2-git-send-email-javi.merino@arm.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-28Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28Merge tag 'pr-20150114-x86-entry' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull x86/entry enhancements from Andy Lutomirski: " This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com " This tree semantically depends on and is based on the following RCU commit: 734d16801349 ("rcu: Make rcu_nmi_enter() handle nesting") ... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumaskMike Galbraith
While creating an exclusive cpuset, we passed cpuset_cpumask_can_shrink() an empty cpumask (cur), and dl_bw_of(cpumask_any(cur)) made boom with it: CPU: 0 PID: 6942 Comm: shield.sh Not tainted 3.19.0-master #19 Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 task: ffff880224552450 ti: ffff8800caab8000 task.ti: ffff8800caab8000 RIP: 0010:[<ffffffff81073846>] [<ffffffff81073846>] cpuset_cpumask_can_shrink+0x56/0xb0 [...] Call Trace: [<ffffffff810cb82a>] validate_change+0x18a/0x200 [<ffffffff810cc877>] cpuset_write_resmask+0x3b7/0x720 [<ffffffff810c4d58>] cgroup_file_write+0x38/0x100 [<ffffffff811d953a>] kernfs_fop_write+0x12a/0x180 [<ffffffff8116e1a3>] vfs_write+0xb3/0x1d0 [<ffffffff8116ed06>] SyS_write+0x46/0xb0 [<ffffffff8159ced6>] system_call_fastpath+0x16/0x1b Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Acked-by: Zefan Li <lizefan@huawei.com> Fixes: f82f80426f7a ("sched/deadline: Ensure that updates to exclusive cpusets don't break AC") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422417235.5716.5.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28perf: Tighten (and fix) the grouping conditionPeter Zijlstra
The fix from 9fc81d87420d ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87420d ("perf: Fix events installation during moving group") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28sched/fair: Avoid using uninitialized variable in preferred_group_nid()Jan Beulich
At least some gcc versions - validly afaict - warn about potentially using max_group uninitialized: There's no way the compiler can prove that the body of the conditional where it and max_faults get set/ updated gets executed; in fact, without knowing all the details of other scheduler code, I can't prove this either. Generally the necessary change would appear to be to clear max_group prior to entering the inner loop, and break out of the outer loop when it ends up being all clear after the inner one. This, however, seems inefficient, and afaict the same effect can be achieved by exiting the outer loop when max_faults is still zero after the inner loop. [ mingo: changed the solution to zero initialization: uninitialized_var() needs to die, as it's an actively dangerous construct: if in the future a known-proven-good piece of code is changed to have a true, buggy uninitialized variable, the compiler warning is then supressed... The better long term solution is to clean up the code flow, so that even simple minded compilers (and humans!) are able to read it without getting a headache. ] Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/54C2139202000078000588F7@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: arch/arm/boot/dts/imx6sx-sdb.dts net/sched/cls_bpf.c Two simple sets of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27tracing: Remove newline from trace_printk warning bannerBorislav Petkov
Remove the output-confusing newline below: [ 0.191328] ********************************************************** [ 0.191493] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** [ 0.191586] ** ** ... Link: http://lkml.kernel.org/r/1422375440-31970-1-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> [ added an extra '\n' by itself, to keep what it was suppose to do ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't OOPS on socket AIO, from Christoph Hellwig. 2) Scheduled scans should be aborted upon RFKILL, from Emmanuel Grumbach. 3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish. 4) Fix RCU locking across copy_to_user() in bpf code, from Alexei Starovoitov. 5) Lots of crash, memory leak, short TX packet et al bug fixes in sh_eth from Ben Hutchings. 6) Fix memory corruption in SCTP wrt. INIT collitions, from Daniel Borkmann. 7) Fix return value logic for poll handlers in netxen, enic, and bnx2x. From Eric Dumazet and Govindarajulu Varadarajan. 8) Header length calculation fix in mac80211 from Fred Chou. 9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths. From Ezequiel Garcia. 10) udp_diag has bogus logic in it's hash chain skipping, copy same fix tcp diag used. From Herbert Xu. 11) amd-xgbe programs wrong rx flow control register, from Thomas Lendacky. 12) Fix race leading to use after free in ping receive path, from Subash Abhinov Kasiviswanathan. 13) Cache redirect routes otherwise we can get a heavy backlog of rcu jobs liberating DST_NOCACHE entries. From Hannes Frederic Sowa. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits) net: don't OOPS on socket aio stmmac: prevent probe drivers to crash kernel bnx2x: fix napi poll return value for repoll ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too sh_eth: Fix DMA-API usage for RX buffers sh_eth: Check for DMA mapping errors on transmit sh_eth: Ensure DMA engines are stopped before freeing buffers sh_eth: Remove RX overflow log messages ping: Fix race in free in receive path udp_diag: Fix socket skipping within chain can: kvaser_usb: Fix state handling upon BUS_ERROR events can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT can: kvaser_usb: Send correct context to URB completion can: kvaser_usb: Do not sleep in atomic context ipv4: try to cache dst_entries which would cause a redirect samples: bpf: relax test_maps check bpf: rcu lock must not be held when calling copy_to_user() net: sctp: fix slab corruption from use after free on INIT collisions net: mv643xx_eth: Fix highmem support in non-TSO egress path sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers ...
2015-01-26bpf: rcu lock must not be held when calling copy_to_user()Alexei Starovoitov
BUG: sleeping function called from invalid context at mm/memory.c:3732 in_atomic(): 0, irqs_disabled(): 0, pid: 671, name: test_maps 1 lock held by test_maps/671: #0: (rcu_read_lock){......}, at: [<0000000000264190>] map_lookup_elem+0xe8/0x260 Call Trace: ([<0000000000115b7e>] show_trace+0x12e/0x150) [<0000000000115c40>] show_stack+0xa0/0x100 [<00000000009b163c>] dump_stack+0x74/0xc8 [<000000000017424a>] ___might_sleep+0x23a/0x248 [<00000000002b58e8>] might_fault+0x70/0xe8 [<0000000000264230>] map_lookup_elem+0x188/0x260 [<0000000000264716>] SyS_bpf+0x20e/0x840 Fix it by allocating temporary buffer to store map element value. Fixes: db20fd2b0108 ("bpf: add lookup/update/delete/iterate methods to BPF maps") Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26Merge branch 'for-3.19-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "The lifetime rules of cgroup hierarchies always have been somewhat counter-intuitive and cgroup core tried to enforce that hierarchies w/o userland-visible usages must die in finite amount of time so that the controllers can be reused for other hierarchies; unfortunately, this can't be implemented reasonably for the memory controller - the kmemcg part doesn't have any way to forcefully drain the existing usages, leading to an interruptible hang if a following mount attempts to use the controller in any way. So, it seems like we're stuck with "hierarchies live on till they die whenever that may be" at least for now. This pretty much confines attaching controllers to hierarchies to before the hierarchies are actively used by making dynamic configurations post active usages unreliable. This has never been reliable and should be fine in practice given how cgroups are used. After the patch, hierarchies aren't killed if it isn't already drained. A following mount attempt of the same mount options will reuse the existing hierarchy. Mount attempts with differing options will fail w/ -EBUSY" * 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: prevent mount hang due to memory controller lifetime
2015-01-26kexec, Kconfig: spell "architecture" properlyBorislav Petkov
Grepping for "archicture" showed it actually twice! Most unusual spelling error, very interesting. :) Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-01-26Merge branch 'linus' into irq/coreThomas Gleixner
Reason: Pull in upstream fixes on which new patches depend on.
2015-01-25new fs_pin killing logicsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25get rid of the second argument of acct_kill()Al Viro
Replace the old ns->bacct only with NULL and only if it still points to acct. And assign the new value to it *before* calling acct_kill() in acct_on(). That way we don't need to pass the new acct to acct_kill(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25take count and rcu_head out of fs_pinAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25pull bumping refcount into ->kill()Al Viro
there will be one more change of ->kill() calling conventions; this isn't final. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25kill pin_put()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>