summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2013-08-08cpuset: drop "const" qualifiers from struct cpuset instancesTejun Heo
cpuset uses "const" qualifiers on struct cpuset in some functions; however, it doesn't work well when a value derived from returned const pointer has to be passed to an accessor. It's C after all. Drop the "const" qualifiers except for the trivially leaf ones. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/Tejun Heo
The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-08userns: limit the maximum depth of user_namespace->parent chainOleg Nesterov
Ensure that user_namespace->parent chain can't grow too much. Currently we use the hardroded 32 as limit. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-07perf: Do not get values from disabled counters in group format readJiri Olsa
It's possible some of the counters in the group could be disabled when sampling member of the event group is reading the rest via PERF_SAMPLE_READ sample type processing. Disabled counters could then produce wrong numbers. Fixing that by reading only enabled counters for PERF_SAMPLE_READ sample type processing. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-wwkjb0bbcuslnz0klrmqi26r@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-07perf: Add PERF_EVENT_IOC_ID ioctl to return event IDJiri Olsa
The only way to get the event ID is by reading the event fd, followed by parsing the ID value out of the returned data. While this is ok for current read format used by perf tool, it is not ok when we use PERF_FORMAT_GROUP format. With this format the data are returned for the whole group and there's no way to find out what ID belongs to our fd (if we are not group leader event). Adding a simple ioctl that returns event primary ID for given fd. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-v1bn5cto707jn0bon34afqr1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-07Merge tag 'trace-fixes-3.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Oleg Nesterov has been working hard in closing all the holes that can lead to race conditions between deleting an event and accessing an event debugfs file. This included a fix to the debugfs system (acked by Greg Kroah-Hartman). We think that all the holes have been patched and hopefully we don't find more. I haven't marked all of them for stable because I need to examine them more to figure out how far back some of the changes need to go. Along the way, some other fixes have been made. Alexander Z Lam fixed some logic where the wrong buffer was being modifed. Andrew Vagin found a possible corruption for machines that actually allocate cpumask, as a reference to one was being zeroed out by mistake. Dhaval Giani found a bad prototype when tracing is not configured. And I not only had some changes to help Oleg, but also finally fixed a long standing bug that Dave Jones and others have been hitting, where a module unload and reload can cause the function tracing accounting to get screwed up" * tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix reset of time stamps during trace_clock changes tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set tracing: Fix fields of struct trace_iterator that are zeroed by mistake tracing/uprobes: Fail to unregister if probe event files are in use tracing/kprobes: Fail to unregister if probe event files are in use tracing: Add comment to describe special break case in probe_remove_event_call() tracing: trace_remove_event_call() should fail if call/file is in use debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs) ftrace: Check module functions being traced on reload ftrace: Consolidate some duplicate code for updating ftrace ops tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private tracing: Introduce remove_event_file_dir() tracing: Change f_start() to take event_mutex and verify i_private != NULL tracing: Change event_filter_read/write to verify i_private != NULL tracing: Change event_enable/disable_read() to verify i_private != NULL tracing: Turn event/id->i_private into call->event.type
2013-08-06x86, asmlinkage, power: Make various symbols used by the suspend asm code ↵Andi Kleen
visible Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-16-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06Merge branch 'for-3.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "Fix for a minor memory leak bug in the cgroup init failure path" * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix a leak when percpu_ref_init() fails
2013-08-06Merge branch 'for-3.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull two workqueue fixes from Tejun Heo: "A lockdep notation update so that nested work_on_cpu() invocations don't lead to spurious lockdep warnings and fix for an unbound attr bug which made what's shown in sysfs deviate from the actual ones. Both patches have pretty limited scope" * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: copy workqueue_attrs with all fields workqueue: allow work_on_cpu() to be called recursively
2013-08-06printk: Fix return of braille_register_console()Steven Rostedt
Some of my configs I test with have CONFIG_A11Y_BRAILLE_CONSOLE set. When I started testing against v3.11-rc4 my console went bonkers. Using ktest to bisect the issue, it came down to: commit bbeddf52a "printk: move braille console support into separate braille.[ch] files" Looking into the patch I found the problem. It's with the return of braille_register_console(). As anything other than NULL is considered a failure. But for those of us that have CONFIG_A11Y_BRAILLE_CONSOLE set but do not define a "brl" or "brl=" on the command line, we still may want a console that those with sight can still use. Return NULL (success) if "brl" or "brl=" is not on the console line. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-06Revert "ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child)"Oleg Nesterov
This reverts commit fab840fc2d542fabcab903db8e03589a6702ba5f. This commit even has the test-case to prove that the tracee can be killed by SIGTRAP if the debugger does not remove the breakpoints before PTRACE_DETACH. However, this is exactly what wineserver deliberately does, set_thread_context() calls PTRACE_ATTACH + PTRACE_DETACH just for PTRACE_POKEUSER(DR*) in between. So we should revert this fix and document that PTRACE_DETACH should keep the breakpoints. Reported-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-06userns: unshare_userns(&cred) should not populate cred on failureOleg Nesterov
unshare_userns(new_cred) does *new_cred = prepare_creds() before create_user_ns() which can fail. However, the caller expects that it doesn't need to take care of new_cred if unshare_userns() fails. We could change the single caller, sys_unshare(), but I think it would be more clean to avoid the side effects on failure, so with this patch unshare_userns() does put_cred() itself and initializes *new_cred only if create_user_ns() succeeeds. Cc: stable@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-05register_console: prevent adding the same console twiceAndreas Bießmann
This patch guards the console_drivers list to be corrupted. The for_each_console() macro insist on a strictly forward list ended by NULL: con0->next->con1->next->NULL Without this patch it may happen easily to destroy this list for example by adding 'earlyprintk' twice, especially on embedded devices where the early console is often a single static instance. This will result in the following list: con0->next->con0 This in turn will result in an endless loop in console_unlock() later on by printing the first __log_buf line endlessly. Signed-off-by: Andreas Bießmann <andreas@biessmann.de> Cc: Kay Sievers <kay@vrfy.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-02tracing: Fix reset of time stamps during trace_clock changesAlexander Z Lam
Fixed two issues with changing the timestamp clock with trace_clock: - The global buffer was reset on instance clock changes. Change this to pass the correct per-instance buffer - ftrace_now() is used to set buf->time_start in tracing_reset_online_cpus(). This was incorrect because ftrace_now() used the global buffer's clock to return the current time. Change this to use buffer_ftrace_now() which returns the current time for the correct per-instance buffer. Also removed tracing_reset_current() because it is not used anywhere Link: http://lkml.kernel.org/r/1375493777-17261-2-git-send-email-azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-02tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct bufferAlexander Z Lam
Releasing the free_buffer file in an instance causes the global buffer to be stopped when TRACE_ITER_STOP_ON_FREE is enabled. Operate on the correct buffer. Link: http://lkml.kernel.org/r/1375493777-17261-1-git-send-email-azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-02tracing: Fix fields of struct trace_iterator that are zeroed by mistakeAndrew Vagin
tracing_read_pipe zeros all fields bellow "seq". The declaration contains a comment about that, but it doesn't help. The first field is "snapshot", it's true when current open file is snapshot. Looks obvious, that it should not be zeroed. The second field is "started". It was converted from cpumask_t to cpumask_var_t (v2.6.28-4983-g4462344), in other words it was converted from cpumask to pointer on cpumask. Currently the reference on "started" memory is lost after the first read from tracing_read_pipe and a proper object will never be freed. The "started" is never dereferenced for trace_pipe, because trace_pipe can't have the TRACE_FILE_ANNOTATE options. Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org Cc: stable@vger.kernel.org # 2.6.30 Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-02cgroup: Merge branch 'for-3.11-fixes' into for-3.12Tejun Heo
for-3.12 branch is about to receive invasive updates which are dependent on da0a12caff ("cgroup: fix a leak when percpu_ref_init() fails"). Given the amount of scheduled changes, I think it'd less painful to pull in for-3.11-fixes as preparation. Pull in for-3.11-fixes into for-3.12. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-02Merge tag 'pm+acpi-3.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: - Revert two cpuidle commits added during the 3.8 development cycle that turn out to have introduced a significant performance regression as requested by Jeremy Eder. - The recent patches that made the freezer less heavy-weight introduced a regression causing user-space-driven hibernation using the ioctl() interface to block indefinitely when the hibernate process executes try_to_freeze(). Fix from Colin Cross addresses this by adding a process flag to mark the hibernate/suspend process to inform the freezer that that process should be ignored. - One of the recent cpufreq reverts uncovered a problem in the core causing the cpufreq driver module refcount to become negative after a system suspend-resume cycle. Fix from Rafael J Wysocki. - The evaluation of the ACPI battery _BIX method has never worked correctly, because the commit that added support for it forgot to take the "Revision" field in the return package into account. As a result, the reading of battery info doesn't work at all on some systems, which is addressed by a fix from Lan Tianyu. * tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes ACPI / battery: Fix parsing _BIX return value cpufreq: Fix cpufreq driver module refcount balance after suspend/resume Revert "cpuidle: Quickly notice prediction failure for repeat mode" Revert "cpuidle: Quickly notice prediction failure in general case"
2013-08-02hung_task debugging: Print more info when reporting the problemOleg Nesterov
printk(KERN_ERR) from check_hung_task() likely means we have a bug, but unlike BUG_ON()/WARN_ON ()it doesn't show the kernel version, this complicates the bug-reports investigation. Add the additional pr_err() to print tainted/release/version like dump_stack_print_info() does, the output becomes: INFO: task perl:504 blocked for more than 2 seconds. Not tainted 3.11.0-rc1-10367-g136bb46-dirty #1763 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ... While at it, turn the old printk's into pr_err(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: ahecox@redhat.com Cc: Christopher Williams <cww@redhat.com> Cc: dwysocha@redhat.com Cc: gavin@redhat.com Cc: Mandeep Singh Baines <msb@chromium.org> Cc: nshi@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130801165941.GA17544@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-01tracing/uprobes: Fail to unregister if probe event files are in useSteven Rostedt (Red Hat)
Uprobes suffer the same problem that kprobes have. There's a race between writing to the "enable" file and removing the probe. The probe checks for it being in use and if it is not, goes about deleting the probe and the event that represents it. But the problem with that is, after it checks if it is in use it can be enabled, and the deletion of the event (access to the probe) will fail, as it is in use. But the uprobe will still be deleted. This is a problem as the event can reference the uprobe that was deleted. The fix is to remove the event first, and check to make sure the event removal succeeds. Then it is safe to remove the probe. When the event exists, either ftrace or perf can enable the probe and prevent the event from being removed. Link: http://lkml.kernel.org/r/20130704034038.991525256@goodmis.org Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-01cgroup: rename cgroup_pidlist->mutexLi Zefan
It's a rw_semaphore not a mutex. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-01cgroup: restructure the failure path in cgroup_write_event_control()Li Zefan
It uses a single label and checks the validity of each pointer. This is err-prone, and actually we had a bug because one of the check was insufficient. Use multi lables as we do in other places. v2: - drop initializations of local variables. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-01workqueue: copy workqueue_attrs with all fieldsShaohua Li
$echo '0' > /sys/bus/workqueue/devices/xxx/numa $cat /sys/bus/workqueue/devices/xxx/numa I got 1. It should be 0, the reason is copy_workqueue_attrs() called in apply_workqueue_attrs() doesn't copy no_numa field. Fix it by making copy_workqueue_attrs() copy ->no_numa too. This would also make get_unbound_pool() set a pool's ->no_numa attribute according to the workqueue attributes used when the pool was created. While harmelss, as ->no_numa isn't a pool attribute, this is a bit confusing. Clear it explicitly. tj: Updated description and comments a bit. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2013-07-31tracing/kprobes: Fail to unregister if probe event files are in useSteven Rostedt (Red Hat)
When a probe is being removed, it cleans up the event files that correspond to the probe. But there is a race between writing to one of these files and deleting the probe. This is especially true for the "enable" file. CPU 0 CPU 1 ----- ----- fd = open("enable",O_WRONLY); probes_open() release_all_trace_probes() unregister_trace_probe() if (trace_probe_is_enabled(tp)) return -EBUSY write(fd, "1", 1) __ftrace_set_clr_event() call->class->reg() (kprobe_register) enable_trace_probe(tp) __unregister_trace_probe(tp); list_del(&tp->list) unregister_probe_event(tp) <-- fails! free_trace_probe(tp) write(fd, "0", 1) __ftrace_set_clr_event() call->class->unreg (kprobe_register) disable_trace_probe(tp) <-- BOOM! A test program was written that used two threads to simulate the above scenario adding a nanosleep() interval to change the timings and after several thousand runs, it was able to trigger this bug and crash: BUG: unable to handle kernel paging request at 00000005000000f9 IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7 PGD 7808a067 PUD 0 Oops: 0000 [#1] PREEMPT SMP Dumping ftrace buffer: --------------------------------- Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000 RIP: 0010:[<ffffffff810dee70>] [<ffffffff810dee70>] probes_open+0x3b/0xa7 RSP: 0018:ffff880076e53c38 EFLAGS: 00010203 RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003 RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000 RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000 R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35 R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450 FS: 00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0 Stack: ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35 ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0 ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440 Call Trace: [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30 [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b [<ffffffff81130f78>] do_dentry_open+0x162/0x226 [<ffffffff81131186>] finish_open+0x46/0x54 [<ffffffff8113f30b>] do_last+0x7f6/0x996 [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44 [<ffffffff8113f6dd>] path_openat+0x232/0x496 [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a [<ffffffff81131f4e>] do_sys_open+0x70/0x102 [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197 [<ffffffff81131ffe>] SyS_open+0x1e/0x20 [<ffffffff81522742>] system_call_fastpath+0x16/0x1b Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7 RIP [<ffffffff810dee70>] probes_open+0x3b/0xa7 RSP <ffff880076e53c38> CR2: 00000005000000f9 ---[ end trace 35f17d68fc569897 ]--- The unregister_trace_probe() must be done first, and if it fails it must fail the removal of the kprobe. Several changes have already been made by Oleg Nesterov and Masami Hiramatsu to allow moving the unregister_probe_event() before the removal of the probe and exit the function if it fails. This prevents the tp structure from being used after it is freed. Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds
Merge more patches from Andrew Morton: "A bunch of fixes. Plus Joe's printk move and rework. It's not a -rc3 thing but now would be a nice time to offload it, while things are quiet. I've been sitting on it all for a couple of weeks, no issues" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: vmpressure: make sure there are no events queued after memcg is offlined vmpressure: do not check for pending work to prevent from new work vmpressure: change vmpressure::sr_lock to spinlock printk: rename struct log to struct printk_log printk: use pointer for console_cmdline indexing printk: move braille console support into separate braille.[ch] files printk: add console_cmdline.h printk: move to separate directory for easier modification drivers/rtc/rtc-twl.c: fix: rtcX/wakealarm attribute isn't created mm: zbud: fix condition check on allocation size thp, mm: avoid PageUnevictable on active/inactive lru lists mm/swap.c: clear PageActive before adding pages onto unevictable list arch/x86/platform/ce4100/ce4100.c: include reboot.h mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG rapidio: fix use after free in rio_unregister_scan() .gitignore: ignore *.lz4 files MAINTAINERS: dynamic debug: Jason's not there... dmi_scan: add comments on dmi_present() and the loop in dmi_scan_machine() ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page() mm: mempolicy: fix mbind_range() && vma_adjust() interaction
2013-07-31printk: rename struct log to struct printk_logJoe Perches
Rename the struct to enable moving portions of printk.c to separate files. The rename changes output of /proc/vmcoreinfo. Signed-off-by: Joe Perches <joe@perches.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31printk: use pointer for console_cmdline indexingJoe Perches
Make the code a bit more compact by always using a pointer for the active console_cmdline. Move overly indented code to correct indent level. Signed-off-by: Joe Perches <joe@perches.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31printk: move braille console support into separate braille.[ch] filesJoe Perches
Create files with prototypes and static inlines for braille support. Make braille_console functions return 1 on success. Corrected CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup return value to NULL. Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31printk: add console_cmdline.hJoe Perches
Add an include file for the console_cmdline struct so that the braille console driver can be separated. Signed-off-by: Joe Perches <joe@perches.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31printk: move to separate directory for easier modificationJoe Perches
Make it easier to break up printk into bite-sized chunks. Remove printk path/filename from comment. Signed-off-by: Joe Perches <joe@perches.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31mm: sched: numa: fix NUMA balancing when !SCHED_DEBUGDave Kleikamp
Commit 3105b86a9fee ("mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG") defined numabalancing_enabled to control the enabling and disabling of automatic NUMA balancing, but it is never used. I believe the intention was to use this in place of sched_feat_numa(NUMA). Currently, if SCHED_DEBUG is not defined, sched_feat_numa(NUMA) will never be changed from the initial "false". Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix association failures not triggering a connect-failure event in cfg80211, from Johannes Berg. 2) Eliminate a potential NULL deref with older iptables tools when configuring xt_socket rules, from Eric Dumazet. 3) Missing RTNL locking in wireless regulatory code, from Johannes Berg. 4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey Khoroshilov. 5) Fix usb URB leak in usb_8dev CAN driver, also from Alexey Khoroshilov. 6) VXLAN namespace teardown fails to unregister devices, from Stephen Hemminger. 7) Fix multicast settings getting dropped by firmware in qlcnic driver, from Sucheta Chakraborty. 8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar. 9) Fix a nasty bug in bridging where an active timer would get reinitialized with a setup_timer() call. From Eric Dumazet. 10) Fix use after free in new mlx5 driver, from Dan Carpenter. 11) Fix freed pointer reference in ipv6 multicast routing on namespace cleanup, from Hannes Frederic Sowa. 12) Some usbnet drivers report TSO and SG in their feature set, but the usbnet layer doesn't really support them. From Eric Dumazet. 13) Fix crash on EEH errors in tg3 driver, from Gavin Shan. 14) Drop cb_lock when requesting modules in genetlink, from Stanislaw Gruszka. 15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from Dan Carpenter. 16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to endless loops, from Uwe Kleine-König. 17) Fix hangs from loading mvneta driver, from Arnaud Patard. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits) mlx5: fix error return code in mlx5_alloc_uuars() mvneta: Try to fix mvneta when compiled as module mvneta: Fix hang when loading the mvneta driver atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE af_key: more info leaks in pfkey messages net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link net_sched: Fix stack info leak in cbq_dump_wrr(). igb: fix vlan filtering in promisc mode when not in VT mode ixgbe: Fix Tx Hang issue with lldpad on 82598EB genetlink: release cb_lock before requesting additional module net: fec: workaround stop tx during errata ERR006358 qlcnic: Fix diagnostic interrupt test for 83xx adapters. qlcnic: Fix setting Guest VLAN qlcnic: Fix operation type and command type. qlcnic: Fix initialization of work function. Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring" atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring net/tg3: Fix warning from pci_disable_device() net/tg3: Fix kernel crash ...
2013-07-31tracing: Add comment to describe special break case in probe_remove_event_call()Steven Rostedt (Red Hat)
The "break" used in the do_for_each_event_file() is used as an optimization as the loop is really a double loop. The loop searches all event files for each trace_array. There's only one matching event file per trace_array and after we find the event file for the trace_array, the break is used to jump to the next trace_array and start the search there. As this is not a standard way of using "break" in C code, it requires a comment right before the break to let people know what is going on. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31tracing: trace_remove_event_call() should fail if call/file is in useOleg Nesterov
Change trace_remove_event_call(call) to return the error if this call is active. This is what the callers assume but can't verify outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c need the additional changes, unregister_trace_probe() should abort if trace_remove_event_call() fails. The caller is going to free this call/file so we must ensure that nobody can use them after trace_remove_event_call() succeeds. debugfs should be fine after the previous changes and event_remove() does TRACE_REG_UNREGISTER, but still there are 2 reasons why we need the additional checks: - There could be a perf_event(s) attached to this tp_event, so the patch checks ->perf_refcount. - TRACE_REG_UNREGISTER can be suppressed by FTRACE_EVENT_FL_SOFT_MODE, so we simply check FTRACE_EVENT_FL_ENABLED protected by event_mutex. Link: http://lkml.kernel.org/r/20130729175033.GB26284@redhat.com Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31cgroup: convert cgroup_ida to cgroup_idrLi Zefan
This enables us to lookup a cgroup by its id. v4: - add a comment for idr_remove() in cgroup_offline_fn(). v3: - on success, idr_alloc() returns the id but not 0, so fix the BUG_ON() in cgroup_init(). - pass the right value to idr_alloc() so that the id for dummy cgroup is 0. Signed-off-by: Li Zefan <lizefan@huawei.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-07-31cgroup: more naming cleanupsLi Zefan
Constantly use @cset for css_set variables and use @cgrp as cgroup variables. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-07-31cgroup: remove struct cgroup_seqfile_stateLi Zefan
We can use struct cfent instead. v2: - remove cgroup_seqfile_release(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-07-31cgroup: remove sparse tags from offline_css()Li Zefan
This should have been removed in commit d7eeac1913ff ("cgroup: hold cgroup_mutex before calling css_offline"). While at it, update the comments. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-07-31cgroup: fix a leak when percpu_ref_init() failsLi Zefan
ss->css_free() is not called when perfcpu_ref_init() fails. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-07-30ftrace: Check module functions being traced on reloadSteven Rostedt (Red Hat)
There's been a nasty bug that would show up and not give much info. The bug displayed the following warning: WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230() Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk Call Trace: [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20 [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230 [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0 [<ffffffff811401cc>] ? kfree+0x2c/0x110 [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150 [<ffffffff81149f1e>] __fput+0xae/0x220 [<ffffffff8114a09e>] ____fput+0xe/0x10 [<ffffffff8105fa22>] task_work_run+0x72/0x90 [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0 [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff815c0f88>] int_signal+0x12/0x17 ---[ end trace 793179526ee09b2c ]--- It was finally narrowed down to unloading a module that was being traced. It was actually more than that. When functions are being traced, there's a table of all functions that have a ref count of the number of active tracers attached to that function. When a function trace callback is registered to a function, the function's record ref count is incremented. When it is unregistered, the function's record ref count is decremented. If an inconsistency is detected (ref count goes below zero) the above warning is shown and the function tracing is permanently disabled until reboot. The ftrace callback ops holds a hash of functions that it filters on (and/or filters off). If the hash is empty, the default means to filter all functions (for the filter_hash) or to disable no functions (for the notrace_hash). When a module is unloaded, it frees the function records that represent the module functions. These records exist on their own pages, that is function records for one module will not exist on the same page as function records for other modules or even the core kernel. Now when a module unloads, the records that represents its functions are freed. When the module is loaded again, the records are recreated with a default ref count of zero (unless there's a callback that traces all functions, then they will also be traced, and the ref count will be incremented). The problem is that if an ftrace callback hash includes functions of the module being unloaded, those hash entries will not be removed. If the module is reloaded in the same location, the hash entries still point to the functions of the module but the module's ref counts do not reflect that. With the help of Steve and Joern, we found a reproducer: Using uinput module and uinput_release function. cd /sys/kernel/debug/tracing modprobe uinput echo uinput_release > set_ftrace_filter echo function > current_tracer rmmod uinput modprobe uinput # check /proc/modules to see if loaded in same addr, otherwise try again echo nop > current_tracer [BOOM] The above loads the uinput module, which creates a table of functions that can be traced within the module. We add uinput_release to the filter_hash to trace just that function. Enable function tracincg, which increments the ref count of the record associated to uinput_release. Remove uinput, which frees the records including the one that represents uinput_release. Load the uinput module again (and make sure it's at the same address). This recreates the function records all with a ref count of zero, including uinput_release. Disable function tracing, which will decrement the ref count for uinput_release which is now zero because of the module removal and reload, and we have a mismatch (below zero ref count). The solution is to check all currently tracing ftrace callbacks to see if any are tracing any of the module's functions when a module is loaded (it already does that with callbacks that trace all functions). If a callback happens to have a module function being traced, it increments that records ref count and starts tracing that function. There may be a strange side effect with this, where tracing module functions on unload and then reloading a new module may have that new module's functions being traced. This may be something that confuses the user, but it's not a big deal. Another approach is to disable all callback hashes on module unload, but this leaves some ftrace callbacks that may not be registered, but can still have hashes tracing the module's function where ftrace doesn't know about it. That situation can cause the same bug. This solution solves that case too. Another benefit of this solution, is it is possible to trace a module's function on unload and load. Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com Reported-by: Jörn Engel <joern@logfs.org> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Steve Hodgson <steve@purestorage.com> Tested-by: Steve Hodgson <steve@purestorage.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-30watchdog: Make it work under full dynticksFrederic Weisbecker
A perf event can be used without forcing the tick to stay alive if it doesn't use a frequency but a sample period and if it doesn't throttle (raise storm of events). Since the lockup detector neither use a perf event frequency nor should ever throttle due to its high period, it can now run concurrently with the full dynticks feature. So remove the hack that disabled the watchdog. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Anish Singh <anish198519851985@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-9-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30perf: Implement finer grained full dynticks kickFrederic Weisbecker
Currently the full dynticks subsystem keep the tick alive as long as there are perf events running. This prevents the tick from being stopped as long as features such that the lockup detectors are running. As a temporary fix, the lockup detector is disabled by default when full dynticks is built but this is not a long term viable solution. To fix this, only keep the tick alive when an event configured with a frequency rather than a period is running on the CPU, or when an event throttles on the CPU. These are the only purposes of the perf tick, especially now that the rotation of flexible events is handled from a seperate hrtimer. The tick can be shutdown the rest of the time. Original-patch-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30perf: Account freq events per cpuFrederic Weisbecker
This is going to be used by the full dynticks subsystem as a finer-grained information to know when to keep and when to stop the tick. Original-patch-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-7-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30perf: Migrate per cpu event accountingFrederic Weisbecker
When an event is migrated, move the event per-cpu accounting accordingly so that branch stack and cgroup events work correctly on the new CPU. Original-patch-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-6-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30perf: Split the per-cpu accounting part of the event accounting codeFrederic Weisbecker
This way we can use the per-cpu handling seperately. This is going to be used by to fix the event migration code accounting. Original-patch-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30perf: Factor out event accounting code to account_event()/__free_event()Frederic Weisbecker
Gather all the event accounting code to a single place, once all the prerequisites are completed. This simplifies the refcounting. Original-patch-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30perf: Sanitize get_callchain_buffer()Frederic Weisbecker
In case of allocation failure, get_callchain_buffer() keeps the refcount incremented for the current event. As a result, when get_callchain_buffers() returns an error, we must cleanup what it did by cancelling its last refcount with a call to put_callchain_buffers(). This is a hack in order to be able to call free_event() after that failure. The original purpose of that was to simplify the failure path. But this error handling is actually counter intuitive, ugly and not very easy to follow because one expect to see the resources used to perform a service to be cleaned by the callee if case of failure, not by the caller. So lets clean this up by cancelling the refcount from get_callchain_buffer() in case of failure. And correctly free the event accordingly in perf_event_alloc(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30perf: Fix branch stack refcount leak on callchain init failureFrederic Weisbecker
On callchain buffers allocation failure, free_event() is called and all the accounting performed in perf_event_alloc() for that event is cancelled. But if the event has branch stack sampling, it is unaccounted as well from the branch stack sampling events refcounts. This is a bug because this accounting is performed after the callchain buffer allocation. As a result, the branch stack sampling events refcount can become negative. To fix this, move the branch stack event accounting before the callchain buffer allocation. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30generic-ipi: Kill unnecessary variable - csd_flagsXie XiuQi
After commit 8969a5ede0f9e17da4b943712429aef2c9bcd82b ("generic-ipi: remove kmalloc()"), wait = 0 can be guaranteed, and all callsites of generic_exec_single() do an unconditional csd_lock() now. So csd_flags is unnecessary now. Remove it. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Link: http://lkml.kernel.org/r/51F72DA1.7010401@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-30mutex: Fix w/w mutex deadlock injectionMaarten Lankhorst
The check needs to be for > 1, because ctx->acquired is already incremented. This will prevent ww_mutex_lock_slow from returning -EDEADLK and not locking the mutex. It caused a lot of false gpu lockups on radeon with CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y because a function that shouldn't be able to return -EDEADLK did. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/51F775B5.201@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>