summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2014-11-18mpx: Extend siginfo structure to include bound violation informationQiaowei Ren
This patch adds new fields about bound violation into siginfo structure. si_lower and si_upper are respectively lower bound and upper bound when bound violation is caused. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-mm@kvack.org Cc: linux-mips@linux-mips.org Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20141114151819.1908C900@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-17audit: convert status version to a feature bitmapRichard Guy Briggs
The version field defined in the audit status structure was found to have limitations in terms of its expressibility of features supported. This is distict from the get/set features call to be able to command those features that are present. Converting this field from a version number to a feature bitmap will allow distributions to selectively backport and support certain features and will allow upstream to be able to deprecate features in the future. It will allow userspace clients to first query the kernel for which features are actually present and supported. Currently, EINVAL is returned rather than EOPNOTSUP, which isn't helpful in determining if there was an error in the command, or if it simply isn't supported yet. Past features are not represented by this bitmap, but their use may be converted to EOPNOTSUP if needed in the future. Since "version" is too generic to convert with a #define, use a union in the struct status, introducing the member "feature_bitmap" unionized with "version". Convert existing AUDIT_VERSION_* macros over to AUDIT_FEATURE_BITMAP* counterparts, leaving the former for backwards compatibility. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: minor whitespace tweaks] Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-11-16perf: Improve the perf_sample_data struct layoutPeter Zijlstra
This patch reorders fields in the perf_sample_data struct in order to minimize the number of cachelines touched in perf_sample_data_init(). It also removes some intializations which are redundant with the code in kernel/events/core.c Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411559322-16548-7-git-send-email-eranian@google.com Cc: cebbert.lkml@gmail.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: jolsa@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16perf: Add ability to sample machine state on interruptStephane Eranian
Enable capture of interrupted machine state for each sample. Registers to sample are passed per event in the sample_regs_intr bitmask. To sample interrupt machine state, the PERF_SAMPLE_INTR_REGS must be passed in sample_type. The list of available registers is arch dependent and provided by asm/perf_regs.h Registers are laid out as u64 in the order of the bit order of sample_intr_regs. This patch also adds a new ABI version PERF_ATTR_SIZE_VER4 because we extend the perf_event_attr struct with a new u64 field. Reviewed-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: cebbert.lkml@gmail.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1411559322-16548-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICKWanpeng Li
Introduce start_hrtick_dl for !CONFIG_SCHED_HRTICK to align with the fair class. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415670747-58726-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/deadline: Remove unnecessary definitions in cpudeadline.hpang.xunlei
Actually, cpudl_set() and cpudl_init() can never be used without CONFIG_SMP. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415260327-30465-4-git-send-email-pang.xunlei@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/cpupri: Remove unnecessary definitions in cpupri.hpang.xunlei
Actually, cpupri_set() and cpupri_init() can never be used without CONFIG_SMP. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: "pang.xunlei" <pang.xunlei@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415260327-30465-1-git-send-email-pang.xunlei@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()Wanpeng Li
Do not call dequeue_pushable_dl_task() when failing to push an eligible task, as it remains pushable, merely not at this particular moment. Actually the patch is the same behavior as commit 311e800e16f6 ("sched, rt: Fix rq->rt.pushable_tasks bug in push_rt_task()" in -rt side. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415258564-8573-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/fair: Fix stale overloaded status in the busiest group finding logicWanpeng Li
Commit caeb178c60f4 ("sched/fair: Make update_sd_pick_busiest() return 'true' on a busier sd") changes groups to be ranked in the order of overloaded > imbalance > other, and busiest group is picked according to this order. sgs->group_capacity_factor is used to check if the group is overloaded. When the child domain prefers tasks to go to siblings first, the sgs->group_capacity_factor will be set lower than one in order to move all the excess tasks away. However, group overloaded status is not updated when sgs->group_capacity_factor is set to lower than one, which leads to us missing to find the busiest group. This patch fixes it by updating group overloaded status when sg capacity factor is set to one, in order to find the busiest group accurately. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415144690-25196-1-git-send-email-wanpeng.li@linux.intel.com [ Fixed the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched: Move p->nr_cpus_allowed check to select_task_rq()Wanpeng Li
Move the p->nr_cpus_allowed check into kernel/sched/core.c: select_task_rq(). This change will make fair.c, rt.c, and deadline.c all start with the same logic. Suggested-and-Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "pang.xunlei" <pang.xunlei@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415150077-59053-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/completion: Document when to use wait_for_completion_io_*()Wolfram Sang
As discussed in [1], accounting IO is meant for blkio only. Document that so driver authors won't use them for device io. [1] http://thread.gmane.org/gmane.linux.drivers.i2c/20470 Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415098901-2768-1-git-send-email-wsa@the-dreams.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/fair: Kill task_struct::numa_entry and numa_group::task_listKirill Tkhai
Nobody iterates over numa_group::task_list, this just confuses the readers. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415358456.28592.17.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵Ingo Molnar
applying more changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistencyStanislaw Gruszka
Commit d670ec13178d0 "posix-cpu-timers: Cure SMP wobbles" fixes one glibc test case in cost of breaking another one. After that commit, calling clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&Y) can result of Y time being smaller than X time. Reproducer/tester can be found further below, it can be compiled and ran by: gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread while ./tst-cpuclock2 ; do : ; done This reproducer, when running on a buggy kernel, will complain about "clock_gettime difference too small". Issue happens because on start in thread_group_cputimer() we initialize sum_exec_runtime of cputimer with threads runtime not yet accounted and then add the threads runtime to running cputimer again on scheduler tick, making it's sum_exec_runtime bigger than actual threads runtime. KOSAKI Motohiro posted a fix for this problem, but that patch was never applied: https://lkml.org/lkml/2013/5/26/191 . This patch takes different approach to cure the problem. It calls update_curr() when cputimer starts, that assure we will have updated stats of running threads and on the next schedule tick we will account only the runtime that elapsed from cputimer start. That also assure we have consistent state between cpu times of individual threads and cpu time of the process consisted by those threads. Full reproducer (tst-cpuclock2.c): #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <time.h> #include <pthread.h> #include <stdint.h> #include <inttypes.h> /* Parameters for the Linux kernel ABI for CPU clocks. */ #define CPUCLOCK_SCHED 2 #define MAKE_PROCESS_CPUCLOCK(pid, clock) \ ((~(clockid_t) (pid) << 3) | (clockid_t) (clock)) static pthread_barrier_t barrier; /* Help advance the clock. */ static void *chew_cpu(void *arg) { pthread_barrier_wait(&barrier); while (1) ; return NULL; } /* Don't use the glibc wrapper. */ static int do_nanosleep(int flags, const struct timespec *req) { clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED); return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL); } static int64_t tsdiff(const struct timespec *before, const struct timespec *after) { int64_t before_i = before->tv_sec * 1000000000ULL + before->tv_nsec; int64_t after_i = after->tv_sec * 1000000000ULL + after->tv_nsec; return after_i - before_i; } int main(void) { int result = 0; pthread_t th; pthread_barrier_init(&barrier, NULL, 2); if (pthread_create(&th, NULL, chew_cpu, NULL) != 0) { perror("pthread_create"); return 1; } pthread_barrier_wait(&barrier); /* The test. */ struct timespec before, after, sleeptimeabs; int64_t sleepdiff, diffabs; const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 }; /* The relative nanosleep. Not sure why this is needed, but its presence seems to make it easier to reproduce the problem. */ if (do_nanosleep(0, &sleeptime) != 0) { perror("clock_nanosleep"); return 1; } /* Get the current time. */ if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &before) < 0) { perror("clock_gettime[2]"); return 1; } /* Compute the absolute sleep time based on the current time. */ uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec; sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000; sleeptimeabs.tv_nsec = nsec % 1000000000; /* Sleep for the computed time. */ if (do_nanosleep(TIMER_ABSTIME, &sleeptimeabs) != 0) { perror("absolute clock_nanosleep"); return 1; } /* Get the time after the sleep. */ if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &after) < 0) { perror("clock_gettime[3]"); return 1; } /* The time after sleep should always be equal to or after the absolute sleep time passed to clock_nanosleep. */ sleepdiff = tsdiff(&sleeptimeabs, &after); if (sleepdiff < 0) { printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff); result = 1; printf("Before %llu.%09llu\n", before.tv_sec, before.tv_nsec); printf("After %llu.%09llu\n", after.tv_sec, after.tv_nsec); printf("Sleep %llu.%09llu\n", sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec); } /* The difference between the timestamps taken before and after the clock_nanosleep call should be equal to or more than the duration of the sleep. */ diffabs = tsdiff(&before, &after); if (diffabs < sleeptime.tv_nsec) { printf("clock_gettime difference too small: %" PRId64 "\n", diffabs); result = 1; } pthread_cancel(th); return result; } Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/cputime: Fix cpu_timer_sample_group() double accountingPeter Zijlstra
While looking over the cpu-timer code I found that we appear to add the delta for the calling task twice, through: cpu_timer_sample_group() thread_group_cputimer() thread_group_cputime() times->sum_exec_runtime += task_sched_runtime(); *sample = cputime.sum_exec_runtime + task_delta_exec(); Which would make the sample run ahead, making the sleep short. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16sched/numa: Avoid selecting oneself as swap targetPeter Zijlstra
Because the whole numa task selection stuff runs with preemption enabled (its long and expensive) we can end up migrating and selecting oneself as a swap target. This doesn't really work out well -- we end up trying to acquire the same lock twice for the swap migrate -- so avoid this. Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141110100328.GF29390@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16perf: Fix corruption of sibling list with hotplugMark Rutland
When a CPU hotplugged out, we call perf_remove_from_context() (via perf_event_exit_cpu()) to rip each CPU-bound event out of its PMU's cpu context, but leave siblings grouped together. Freeing of these events is left to the mercy of the usual refcounting. When a CPU-bound event's refcount drops to zero we cross-call to __perf_remove_from_context() to clean it up, detaching grouped siblings. This works when the relevant CPU is online, but will fail if the CPU is currently offline, and we won't detach the event from its siblings before freeing the event, leaving the sibling list corrupt. If the sibling list is later walked (e.g. because the CPU cam online again before a remaining sibling's refcount drops to zero), we will walk the now corrupted siblings list, potentially dereferencing garbage values. Given that the events should never be scheduled again (as we removed them from their context), we can simply detatch siblings when the CPU goes down in the first place. If the CPU comes back online, the redundant call to __perf_remove_from_context() is safe. Reported-by: Drew Richardson <drew.richardson@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: vincent.weaver@maine.edu Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1415203904-25308-2-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-15PM / Runtime: Kconfig: move ia64 dependency to arch/ia64/KconfigKevin Hilman
The IA64_HP_SIM dependency on PM_RUNTIME should be done in the arch Kconfig instead of in the PM core. Move it accordingly. NOTE: arch/ia64/Kconfig currently does a 'select PM', which since commit 1eb208aea317 (PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)) is effectively a noop unless PM_SLEEP or PM_RUNTIME are set elsewhere. Signed-off-by: Kevin Hilman <khilman@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-14Merge tag 'pm+acpi-3.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are three regression fixes, two recent (generic power domains, suspend-to-idle) and one older (cpufreq), an ACPI blacklist entry for one more machine having problems with Windows 8 compatibility, a minor cpufreq driver fix (cpufreq-dt) and a fixup for new callback definitions (generic power domains). Specifics: - Fix a crash in the suspend-to-idle code path introduced by a recent commit that forgot to check a pointer against NULL before dereferencing it (Dmitry Eremin-Solenikov). - Fix a boot crash on Exynos5 introduced by a recent commit making that platform use generic Device Tree bindings for power domains which exposed a weakness in the generic power domains framework leading to that crash (Ulf Hansson). - Fix a crash during system resume on systems where cpufreq depends on Operation Performance Points (OPP) for functionality, but CONFIG_OPP is not set. This leads the cpufreq driver registration to fail, but the resume code attempts to restore the pre-suspend cpufreq configuration (which does not exist) nevertheless and crashes. From Geert Uytterhoeven. - Add a new ACPI blacklist entry for Dell Vostro 3546 that has problems if it is reported as Windows 8 compatible to the BIOS (Adam Lee). - Fix swapped arguments in an error message in the cpufreq-dt driver (Abhilash Kesavan). - Fix up the prototypes of new callbacks in struct generic_pm_domain to make them more useful. Users of those callbacks will be added in 3.19 and it's better for them to be based on the correct struct definition in mainline from the start. From Ulf Hansson and Kevin Hilman" * tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Domains: Fix initial default state of the need_restore flag PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set PM / Domains: Change prototype for the attach and detach callbacks cpufreq: Avoid crash in resume on SMP without OPP cpufreq: cpufreq-dt: Fix arguments in clock failure error message ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546
2014-11-14function_graph: Fix micro seconds notationsByungchul Park
Usually, "msecs" notation means milli-seconds, and "usecs" notation means micro-seconds. Since the unit used in the code is micro-seconds, the notation should be replaced from msecs to usecs. Link: http://lkml.kernel.org/r/1415171926-9782-2-git-send-email-byungchul.park@lge.com Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-14ftrace-graph: show latency-format on print_graph_irq()Daniel Bristot de Oliveira
On the function_graph tracer, the print_graph_irq() function prints a trace line with the flag ==========> on an irq handler entry, and the flag <========== on an irq handler return. But when the latency-format is enable, it is not printing the latency-format flags, causing the following error in the trace output: 0) ==========> | 0) d... | smp_apic_timer_interrupt() { This patch fixes this issue by printing the latency-format flags when it is enable. Link: http://lkml.kernel.org/r/7c2e226dac20c940b6242178fab7f0e3c9b5ce58.1415233316.git.bristot@redhat.com Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-14trace: Replace single-character seq_puts with seq_putcRasmus Villemoes
Printing a single character to a seqfile might as well be done with seq_putc instead of seq_puts; this avoids a strlen() call and a memory access. It also shaves another few bytes off the generated code. Link: http://lkml.kernel.org/r/1415479332-25944-4-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/chelsio/cxgb4vf/sge.c drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c sge.c was overlapping two changes, one to use the new __dev_alloc_page() in net-next, and one to use s->fl_pg_order in net. ixgbe_phy.c was a set of overlapping whitespace changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13tracing: Merge consecutive seq_puts callsRasmus Villemoes
Consecutive seq_puts calls with literal strings can be merged to a single call. This reduces the size of the generated code, and can also lead to slight .rodata reduction (because of fewer nul and padding bytes). It should also shave a off a few clock cycles. Link: http://lkml.kernel.org/r/1415479332-25944-3-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13tracing: Replace seq_printf by simpler equivalentsRasmus Villemoes
Using seq_printf to print a simple string or a single character is a lot more expensive than it needs to be, since seq_puts and seq_putc exist. These patches do seq_printf(m, s) -> seq_puts(m, s) seq_printf(m, "%s", s) -> seq_puts(m, s) seq_printf(m, "%c", c) -> seq_putc(m, c) Subsequent patches will simplify further. Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13tracing: kdb: Fix kernel livelock with empty buffersDaniel Thompson
Currently kdb's ftdump command will livelock by constantly printk'ing the empty string at KERN_EMERG level if it run when the ftrace system is not in use. This occurs because trace_empty() never returns false when the ring buffers are left at the start of a non-consuming read [launched by ring_buffer_read_start()]. This patch changes the loop exit condition to use the result of trace_find_next_entry_inc(). Effectively this switches the non-consuming kdb dumper to follow the approach of the non-consuming userspace interface [s_next()] rather than the consuming ftrace_dump(). Link: http://lkml.kernel.org/r/1415277716-19419-3-git-send-email-daniel.thompson@linaro.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13tracing: kdb: Fix kernel panic during ftdumpDaniel Thompson
Currently kdb's ftdump command unconditionally crashes due to a null pointer de-reference whenever the command is run. This in turn causes the kernel to panic. The abridged stacktrace (gathered with ARCH=arm) is: --- cut here --- [<c09535ac>] (panic) from [<c02132dc>] (die+0x264/0x440) [<c02132dc>] (die) from [<c0952eb8>] (__do_kernel_fault.part.11+0x74/0x84) [<c0952eb8>] (__do_kernel_fault.part.11) from [<c021f954>] (do_page_fault+0x1d0/0x3c4) [<c021f954>] (do_page_fault) from [<c020846c>] (do_DataAbort+0x48/0xac) [<c020846c>] (do_DataAbort) from [<c0213c58>] (__dabt_svc+0x38/0x60) Exception stack(0xc0deba88 to 0xc0debad0) ba80: e8c29180 00000001 e9854304 e9854300 c0f567d8 c0df2580 baa0: 00000000 00000000 00000000 c0f117b8 c0e3a3c0 c0debb0c 00000000 c0debad0 bac0: 0000672e c02f4d60 60000193 ffffffff [<c0213c58>] (__dabt_svc) from [<c02f4d60>] (kdb_ftdump+0x1e4/0x3d8) [<c02f4d60>] (kdb_ftdump) from [<c02ce328>] (kdb_parse+0x2b8/0x698) [<c02ce328>] (kdb_parse) from [<c02ceef0>] (kdb_main_loop+0x52c/0x784) [<c02ceef0>] (kdb_main_loop) from [<c02d1b0c>] (kdb_stub+0x238/0x490) --- cut here --- The NULL deref occurs due to the initialized use of struct trace_iter's buffer_iter member. This is a regression, albeit a fairly elderly one. It was introduced by commit 6d158a813efc ("tracing: Remove NR_CPUS array from trace_iterator"). This patch solves this by providing a collection of ring_buffer_iter(s) and using this to initialize buffer_iter. Note that static allocation is used solely because the trace_iter itself is also static allocated. Static allocation also means that we have to NULL-ify the pointer during cleanup to avoid use-after-free problems. Link: http://lkml.kernel.org/r/1415277716-19419-2-git-send-email-daniel.thompson@linaro.org Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13tracing: Fix traceoff_on_warning handling on boot command lineLuis Claudio R. Goncalves
According to the documentation, adding "traceoff_on_warning" to the boot command line should be enough to enable the feature. But right now it is necessary to specify "traceoff_on_warning=". Along with fixing that, also verify if the value passed, if any, is either "0" or "off". Link: http://lkml.kernel.org/r/20141112231400.GL12281@uudg.org Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13ftrace: Have the control_ops get a trampolineSteven Rostedt (Red Hat)
With the new logic, if only a single user of ftrace function hooks is used, it will get its own trampoline assigned to it. The problem is that the control_ops is an indirect ops that perf ops uses. What that means is that when perf registers its ops with register_ftrace_function(), it has the CONTROL flag set and gets added to the control list instead of the global ftrace list. The control_ops gets added to that instead and the mcount trampoline calls the control_ops function. The control_ops function will iterate the control list and call the ops functions that are attached to it. But currently the trampoline is added to the perf ops and not the control ops, and when ftrace tries to find a trampoline hook for it, it fails to find one and gives the following splat: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 10133 at kernel/trace/ftrace.c:2033 ftrace_get_addr_new+0x6f/0xc0() Modules linked in: [...] CPU: 0 PID: 10133 Comm: perf Tainted: P 3.18.0-rc1-test+ #388 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 00000000000007f1 ffff8800c2643bc8 ffffffff814fca6e ffff88011ea0ed01 0000000000000000 ffff8800c2643c08 ffffffff81041ffd 0000000000000000 ffffffff810c388c ffffffff81a5a350 ffff880119b00000 ffffffff810001c8 Call Trace: [<ffffffff814fca6e>] dump_stack+0x46/0x58 [<ffffffff81041ffd>] warn_slowpath_common+0x81/0x9b [<ffffffff810c388c>] ? ftrace_get_addr_new+0x6f/0xc0 [<ffffffff810001c8>] ? 0xffffffff810001c8 [<ffffffff81042031>] warn_slowpath_null+0x1a/0x1c [<ffffffff810c388c>] ftrace_get_addr_new+0x6f/0xc0 [<ffffffff8102e938>] ftrace_replace_code+0xd6/0x334 [<ffffffff810c4116>] ftrace_modify_all_code+0x41/0xc5 [<ffffffff8102eba6>] arch_ftrace_update_code+0x10/0x19 [<ffffffff810c293c>] ftrace_run_update_code+0x21/0x42 [<ffffffff810c298f>] ftrace_startup_enable+0x32/0x34 [<ffffffff810c3049>] ftrace_startup+0x14e/0x15a [<ffffffff810c307c>] register_ftrace_function+0x27/0x40 [<ffffffff810dc118>] perf_ftrace_event_register+0x3e/0xee [<ffffffff810dbfbe>] perf_trace_init+0x29d/0x2a9 [<ffffffff810eb422>] perf_tp_event_init+0x27/0x3a [<ffffffff810f18bc>] perf_init_event+0x9e/0xed [<ffffffff810f1ba4>] perf_event_alloc+0x299/0x330 [<ffffffff810f236b>] SYSC_perf_event_open+0x3ee/0x816 [<ffffffff8115a066>] ? mntput+0x2d/0x2f [<ffffffff81142b00>] ? __fput+0xa7/0x1b2 [<ffffffff81091300>] ? do_gettimeofday+0x22/0x3a [<ffffffff810f279c>] SyS_perf_event_open+0x9/0xb [<ffffffff81502a92>] system_call_fastpath+0x12/0x17 ---[ end trace 81a53565150e4982 ]--- Bad trampoline accounting at: ffffffff810001c8 (run_init_process+0x0/0x2d) (10000001) Update the control_ops trampoline instead of the perf ops one. Reported-by: lkp@01.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13kernel/panic.c: update comments for print_taintedXie XiuQi
Commit 69361eef9056 ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag, but failed to update the comments for print_tainted(). So, update the comments. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', ↵Paul E. McKenney
'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD cpu.2014.11.03a: Changes for per-CPU variables. doc.2014.11.13a: Documentation updates. fixes.2014.11.13a: Miscellaneous fixes. signal.2014.10.29a: Signal changes. rt.2014.10.29a: Real-time changes. torture.2014.11.03a: torture-test changes.
2014-11-13rcu: Fix FIXME in rcu_tasks_kthread()Paul E. McKenney
This commit affines rcu_tasks_kthread() to the housekeeping CPUs in CONFIG_NO_HZ_FULL builds. This is just a default, so systems administrators are free to put this kthread somewhere else if they wish. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit fixes from Paul Moore: "After he sent the initial audit pull request for 3.18, Eric asked me to take over the management of the audit tree, hence this pull request to fix a couple of problems with audit. As you can see below, the changes are minimal: adding some whitespace to a string so userspace parses it correctly, and fixing a problem with audit's usage of fsnotify that was causing audit watch rules to be lost. Neither of these patches were very controversial on the mailing lists and they fix real problems, getting them into 3.18 would be a good thing" * 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit: audit: keep inode pinned audit: AUDIT_FEATURE_CHANGE message format missing delimiting space
2014-11-11audit: keep inode pinnedMiklos Szeredi
Audit rules disappear when an inode they watch is evicted from the cache. This is likely not what we want. The guilty commit is "fsnotify: allow marks to not pin inodes in core", which didn't take into account that audit_tree adds watches with a zero mask. Adding any mask should fix this. Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org # 2.6.36+ Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-11-11tracing: Add entry->next_cpu to trace_ctxwake_bin()Jiang Liu
Function trace_ctxwake_bin() misses ctx_switch_entry->next_cpu field, so user will get stale value for "next_cpu". Link: http://lkml.kernel.org/p/1377176379-27908-1-git-send-email-liuj97@gmail.com Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11tracing: Move tracing_sched_{switch,wakeup}() into wakeup tracerSteven Rostedt (Red Hat)
The only code that references tracing_sched_switch_trace() and tracing_sched_wakeup_trace() is the wakeup latency tracer. Those two functions use to belong to the sched_switch tracer which has long been removed. These functions were left behind because the wakeup latency tracer used them. But since the wakeup latency tracer is the only one to use them, they should be static functions inside that code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11tracing: Kill the dead code in probe_sched_switch() and probe_sched_wakeup()Oleg Nesterov
After the previous patch it is clear that "tracer_enabled" can never be true, we can remove the "if (tracer_enabled)" code in probe_sched_switch() and probe_sched_wakeup(). Plus we can obviously remove tracer_enabled, ctx_trace, and sched_stopped as well. Link: http://lkml.kernel.org/p/20140723193503.GA30217@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11tracing: Kill tracing_{start,stop}_sched_switch_record() and ↵Oleg Nesterov
tracing_sched_switch_assign_trace() tracing_{start,stop}_sched_switch_record() have no callers since 87d80de2800d "tracing: Remove obsolete sched_switch tracer". The last caller of tracing_sched_switch_assign_trace() was removed by 30dbb20e68e6 "tracing: Remove boot tracer". Link: http://lkml.kernel.org/p/20140723193501.GA30214@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11ftrace: Add more information to ftrace_bug() outputSteven Rostedt (Red Hat)
With the introduction of the dynamic trampolines, it is useful that if things go wrong that ftrace_bug() produces more information about what the current state is. This can help debug issues that may arise. Ftrace has lots of checks to make sure that the state of the system it touchs is exactly what it expects it to be. When it detects an abnormality it calls ftrace_bug() and disables itself to prevent any further damage. It is crucial that ftrace_bug() produces sufficient information that can be used to debug the situation. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolinesSteven Rostedt (Red Hat)
When the static ftrace_ops (like function tracer) enables tracing, and it is the only callback that is referencing a function, a trampoline is dynamically allocated to the function that calls the callback directly instead of calling a loop function that iterates over all the registered ftrace ops (if more than one ops is registered). But when it comes to dynamically allocated ftrace_ops, where they may be freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe to free the trampoline. If a task was preempted while executing on the trampoline, there's currently no way to know when it will be off that trampoline. But this is not true when it comes to !CONFIG_PREEMPT. The current method of calling schedule_on_each_cpu() will force tasks off the trampoline, becaues they can not schedule while on it (kernel preemption is not configured). That means it is safe to free a dynamically allocated ftrace ops trampoline when CONFIG_PREEMPT is not configured. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11kernel/debug/debug_core.c: Logging clean-upFabian Frederick
-Convert printk( to pr_foo() -Add pr_fmt -Coalesce formats Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kgdb: timeout if secondary CPUs ignore the roundupDaniel Thompson
Currently if an active CPU fails to respond to a roundup request the CPU that requested the roundup will become stuck. This needlessly reduces the robustness of the debugger. This patch introduces a timeout allowing the system state to be examined even when the system contains unresponsive processors. It also modifies kdb's cpu command to make it censor attempts to switch to unresponsive processors and to report their state as (D)ead. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Allow access to sensitive commands to be restricted by defaultDaniel Thompson
Currently kiosk mode must be explicitly requested by the bootloader or userspace. It is convenient to be able to change the default value in a similar manner to CONFIG_MAGIC_SYSRQ_DEFAULT_MASK. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Add enable mask for groups of commandsAnton Vorontsov
Currently all kdb commands are enabled whenever kdb is deployed. This makes it difficult to deploy kdb to help debug certain types of systems. Android phones provide one example; the FIQ debugger found on some Android devices has a deliberately weak set of commands to allow the debugger to enabled very late in the production cycle. Certain kiosk environments offer another interesting case where an engineer might wish to probe the system state using passive inspection commands without providing sufficient power for a passer by to root it. Without any restrictions, obtaining the root rights via KDB is a matter of a few commands, and works everywhere. For example, log in as a normal user: cbou:~$ id uid=1001(cbou) gid=1001(cbou) groups=1001(cbou) Now enter KDB (for example via sysrq): Entering kdb (current=0xffff8800065bc740, pid 920) due to Keyboard Entry kdb> ps 23 sleeping system daemon (state M) processes suppressed, use 'ps A' to see all. Task Addr Pid Parent [*] cpu State Thread Command 0xffff8800065bc740 920 919 1 0 R 0xffff8800065bca20 *bash 0xffff880007078000 1 0 0 0 S 0xffff8800070782e0 init [...snip...] 0xffff8800065be3c0 918 1 0 0 S 0xffff8800065be6a0 getty 0xffff8800065b9c80 919 1 0 0 S 0xffff8800065b9f60 login 0xffff8800065bc740 920 919 1 0 R 0xffff8800065bca20 *bash All we need is the offset of cred pointers. We can look up the offset in the distro's kernel source, but it is unnecessary. We can just start dumping init's task_struct, until we see the process name: kdb> md 0xffff880007078000 0xffff880007078000 0000000000000001 ffff88000703c000 ................ 0xffff880007078010 0040210000000002 0000000000000000 .....!@......... [...snip...] 0xffff8800070782b0 ffff8800073e0580 ffff8800073e0580 ..>.......>..... 0xffff8800070782c0 0000000074696e69 0000000000000000 init............ ^ Here, 'init'. Creds are just above it, so the offset is 0x02b0. Now we set up init's creds for our non-privileged shell: kdb> mm 0xffff8800065bc740+0x02b0 0xffff8800073e0580 0xffff8800065bc9f0 = 0xffff8800073e0580 kdb> mm 0xffff8800065bc740+0x02b8 0xffff8800073e0580 0xffff8800065bc9f8 = 0xffff8800073e0580 And thus gaining the root: kdb> go cbou:~$ id uid=0(root) gid=0(root) groups=0(root) cbou:~$ bash root:~# p.s. No distro enables kdb by default (although, with a nice KDB-over-KMS feature availability, I would expect at least some would enable it), so it's not actually some kind of a major issue. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Categorize kdb commands (similar to SysRq categorization)Daniel Thompson
This patch introduces several new flags to collect kdb commands into groups (later allowing them to be optionally disabled). This follows similar prior art to enable/disable magic sysrq commands. The commands have been categorized as follows: Always on: go (w/o args), env, set, help, ?, cpu (w/o args), sr, dmesg, disable_nmi, defcmd, summary, grephelp Mem read: md, mdr, mdp, mds, ef, bt (with args), per_cpu Mem write: mm Reg read: rd Reg write: go (with args), rm Inspect: bt (w/o args), btp, bta, btc, btt, ps, pid, lsmod Flow ctrl: bp, bl, bph, bc, be, bd, ss Signal: kill Reboot: reboot All: cpu, kgdb, (and all of the above), nmi_console Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Remove KDB_REPEAT_NONE flagAnton Vorontsov
Since we now treat KDB_REPEAT_* as flags, there is no need to pass KDB_REPEAT_NONE. It's just the default behaviour when no flags are specified. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Use KDB_REPEAT_* values as flagsAnton Vorontsov
The actual values of KDB_REPEAT_* enum values and overall logic stayed the same, but we now treat the values as flags. This makes it possible to add other flags and combine them, plus makes the code a lot simpler and shorter. But functionality-wise, there should be no changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Rename kdb_register_repeat() to kdb_register_flags()Anton Vorontsov
We're about to add more options for commands behaviour, so let's give a more generic name to the low-level kdb command registration function. There are just various renames, no functional changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flagsAnton Vorontsov
We're about to add more options for command behaviour, so let's expand the meaning of kdb_repeat_t. So far we just do various renames, there should be no functional changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11kdb: Remove currently unused kdbtab_t->cmd_flagsAnton Vorontsov
The struct member is never used in the code, so we can remove it. We will introduce real flags soon by renaming cmd_repeat to cmd_flags. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>