summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2015-06-11audit: Fix check of return value of strnlen_user()Jan Kara
strnlen_user() returns 0 when it hits fault, not -1. Fix the test in audit_log_single_execve_arg(). Luckily this shouldn't ever happen unless there's a kernel bug so it's mostly a cosmetic fix. CC: Paul Moore <pmoore@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-06-11ring-buffer-benchmark: Fix the wrong sched_priority of producerWang Long
The producer should be used producer_fifo as its sched_priority, so correct it. Link: http://lkml.kernel.org/r/1433923957-67842-1-git-send-email-long.wanglong@huawei.com Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Wang Long <long.wanglong@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-10sched, numa: do not hint for NUMA balancing on VM_MIXEDMAP mappingsMel Gorman
Jovi Zhangwei reported the following problem Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages with GFP_COMP flag. [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping: (null) index:0x0 [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head) [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page)) [Mon May 25 05:29:33 2015] ------------[ cut here ]------------ [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661! [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP In this case it was triggered by running tcpdump but it's not necessary reproducible on all systems. sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap Compound pages cannot be migrated and it was not expected that such pages be marked for NUMA balancing. This did not take into account that drivers such as net/packet/af_packet.c may insert compound pages into userspace with vm_insert_page. This patch tells the NUMA balancing protection scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that compound pages are marked for migration. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Jovi Zhangwei <jovi@cloudflare.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-10ring-buffer-benchmark: Fix the wrong typeWang Long
The macro 'module_param' shows that the type of the variable disable_reader and write_iteration is unsigned integer. so, we change their type form int to unsigned int. Link: http://lkml.kernel.org/r/1433923927-67782-1-git-send-email-long.wanglong@huawei.com Signed-off-by: Wang Long <long.wanglong@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-10ring-buffer-benchmark: Fix the wrong param in module_paramWang Long
The {producer|consumer}_{nice|fifo} parameters are integer type, we should use 'int' as the second param in module_param. For example(consumer_fifo): the default value of consumer_fifo is -1. Without this patch: # cat /sys/module/ring_buffer_benchmark/parameters/consumer_fifo 4294967295 With this patch: # cat /sys/module/ring_buffer_benchmark/parameters/consumer_fifo -1 Link: http://lkml.kernel.org/r/1433923873-67712-1-git-send-email-long.wanglong@huawei.com Signed-off-by: Wang Long <long.wanglong@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-10clocksource: Use current logging styleJoe Perches
clocksource messages aren't prefixed in dmesg so it's a bit unclear what subsystem emits the messages. Use pr_fmt and pr_<level> to auto-prefix the messages appropriately. Miscellanea: o Remove "Warning" from KERN_WARNING level messages o Align "timekeeping watchdog: " messages o Coalesce formats o Align multiline arguments Signed-off-by: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1432579795.2846.75.camel@perches.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10time: Refactor usecs_to_jiffiesNicholas Mc Guire
Refactor the usecs_to_jiffies conditional code part in time.c and jiffies.h putting it into conditional functions rather than #ifdefs to improve readability. This is analogous to the msecs_to_jiffies() cleanup in commit ca42aaf0c861 ("time: Refactor msecs_to_jiffies") Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Cc: Masahiro Yamada <yamada.m@jp.panasonic.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Andrew Hunter <ahh@google.com> Cc: Paul Turner <pjt@google.com> Cc: Michal Marek <mmarek@suse.cz> Link: http://lkml.kernel.org/r/1432832996-12129-1-git-send-email-hofrat@osadl.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10cgroup: fix uninitialised iterator in for_each_subsys_whichAleksa Sarai
Fix the fact that @ssid is uninitialised in the case where CGROUP_SUBSYS_COUNT = 0 by setting ssid to 0. Fixes: cb4a31675270 ("cgroup: use bitmask to filter for_each_subsys") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2015-06-09x86/mpx: Clean up the code by not passing a task pointer around when unnecessaryDave Hansen
The MPX code can only work on the current task. You can not, for instance, enable MPX management in another process or thread. You can also not handle a fault for another process or thread. Despite this, we pass a task_struct around prolifically. This patch removes all of the task struct passing for code paths where the code can not deal with another task (which turns out to be all of them). This has no functional changes. It's just a cleanup. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-06-08Merge 4.1-rc7 into driver-core-nextGreg Kroah-Hartman
We want the fixes in this branch as well for testing and merge resolution. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08cgroup: replace explicit ss_mask checking with for_each_subsys_whichAleksa Sarai
Replace the explicit checking against ss_masks inside a for_each_subsys block with for_each_subsys_which(..., ss_mask), to take advantage of the more readable (and more efficient) macro. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2015-06-08cgroup: use bitmask to filter for_each_subsysAleksa Sarai
Add a new macro for_each_subsys_which that allows all enabled cgroup subsystems to be filtered by a bitmask, such that mask & (1 << ssid) determines if the subsystem is to be processed in the loop body (where ssid is the unique id of the subsystem). Also replace the need_forkexit_callback with two separate bitmasks for each callback to make (ss->{fork,exit}) checks unnecessary. tj: add a short comment for "if (!CGROUP_SUBSYS_COUNT)". Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2015-06-07perf/x86/intel: Introduce PERF_RECORD_LOST_SAMPLESKan Liang
After enlarging the PEBS interrupt threshold, there may be some mixed up PEBS samples which are discarded by the kernel. This patch makes the kernel emit a PERF_RECORD_LOST_SAMPLES record with the number of possible discarded records when it is impossible to demux the samples. It makes sure the user is not left in the dark about such discards. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285195-14269-8-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07perf/x86/intel: Handle multiple records in the PEBS bufferYan, Zheng
When the PEBS interrupt threshold is larger than one record and the machine supports multiple PEBS events, the records of these events are mixed up and we need to demultiplex them. Demuxing the records is hard because the hardware is deficient. The hardware has two issues that, when combined, create impossible scenarios to demux. The first issue is that the 'status' field of the PEBS record is a copy of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a problem let us first describe the regular PEBS cycle: A) the CTRn value reaches 0: - the corresponding bit in GLOBAL_STATUS gets set - we start arming the hardware assist < some unspecified amount of time later -- this could cover multiple events of interest > B) the hardware assist is armed, any next event will trigger it C) a matching event happens: - the hardware assist triggers and generates a PEBS record this includes a copy of GLOBAL_STATUS at this moment - if we auto-reload we (re)set CTRn - we clear the relevant bit in GLOBAL_STATUS Now consider the following chain of events: A0, B0, A1, C0 The event generated for counter 0 will include a status with counter 1 set, even though its not at all related to the record. A similar thing can happen with a !PEBS event if it just happens to overflow at the right moment. The second issue is that the hardware will only emit one record for two or more counters if the event that triggers the assist is 'close'. The 'close' can be several cycles. In some cases even the complete assist, if the event is something that doesn't need retirement. For instance, consider this chain of events: A0, B0, A1, B1, C01 Where C01 is an event that triggers both hardware assists, we will generate but a single record, but again with both counters listed in the status field. This time the record pertains to both events. Note that these two cases are different but undistinguishable with the data as generated. Therefore demuxing records with multiple PEBS bits (we can safely ignore status bits for !PEBS counters) is impossible. Furthermore we cannot emit the record to both events because that might cause a data leak -- the events might not have the same privileges -- so what this patch does is discard such events. The assumption/hope is that such discards will be rare. Here lists some possible ways you may get high discard rate. - when you count the same thing multiple times. But it is not a useful configuration. - you can be unfortunate if you measure with a userspace only PEBS event along with either a kernel or unrestricted PEBS event. Imagine the event triggering and setting the overflow flag right before entering the kernel. Then all kernel side events will end up with multiple bits set. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> [ Changelog improvements. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07sched/numa: Only consider less busy nodes as numa balancing destinationsRik van Riel
Changeset a43455a1d572 ("sched/numa: Ensure task_numa_migrate() checks the preferred node") fixes an issue where workloads would never converge on a fully loaded (or overloaded) system. However, it introduces a regression on less than fully loaded systems, where workloads converge on a few NUMA nodes, instead of properly staying spread out across the whole system. This leads to a reduction in available memory bandwidth, and usable CPU cache, with predictable performance problems. The root cause appears to be an interaction between the load balancer and NUMA balancing, where the short term load represented by the load balancer differs from the long term load the NUMA balancing code would like to base its decisions on. Simply reverting a43455a1d572 would re-introduce the non-convergence of workloads on fully loaded systems, so that is not a good option. As an aside, the check done before a43455a1d572 only applied to a task's preferred node, not to other candidate nodes in the system, so the converge-on-too-few-nodes problem still happens, just to a lesser degree. Instead, try to compensate for the impedance mismatch between the load balancer and NUMA balancing by only ever considering a lesser loaded node as a destination for NUMA balancing, regardless of whether the task is trying to move to the preferred node, or to another node. This patch also addresses the issue that a system with a single runnable thread would never migrate that thread to near its memory, introduced by 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced"). A test where the main thread creates a large memory area, and spawns a worker thread to iterate over the memory (placed on another node by select_task_rq_fair), after which the main thread goes to sleep and waits for the worker thread to loop over all the memory now sees the worker thread migrated to where the memory is, instead of having all the memory migrated over like before. Jirka has run a number of performance tests on several systems: single instance SpecJBB 2005 performance is 7-15% higher on a 4 node system, with higher gains on systems with more cores per socket. Multi-instance SpecJBB 2005 (one per node), linpack, and stream see little or no changes with the revert of 095bebf61a46 and this patch. Reported-by: Artem Bityutski <dedekind1@gmail.com> Reported-by: Jirka Hladky <jhladky@redhat.com> Tested-by: Jirka Hladky <jhladky@redhat.com> Tested-by: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150528095249.3083ade0@annuminas.surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07Revert 095bebf61a46 ("sched/numa: Do not move past the balance point if ↵Rik van Riel
unbalanced") Commit 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced") broke convergence of workloads with just one runnable thread, by making it impossible for the one runnable thread on the system to move from one NUMA node to another. Instead, the thread would remain where it was, and pull all the memory across to its location, which is much slower than just migrating the thread to where the memory is. The next patch has a better fix for the issue that 095bebf61a46 tried to address. Reported-by: Jirka Hladky <jhladky@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: mgorman@suse.de Link: http://lkml.kernel.org/r/1432753468-7785-2-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07sched/fair: Prevent throttling in early pick_next_task_fair()Ben Segall
The optimized task selection logic optimistically selects a new task to run without first doing a full put_prev_task(). This is so that we can avoid a put/set on the common ancestors of the old and new task. Similarly, we should only call check_cfs_rq_runtime() to throttle eligible groups if they're part of the common ancestry, otherwise it is possible to end up with no eligible task in the simple task selection. Imagine: /root /prev /next /A /B If our optimistic selection ends up throttling /next, we goto simple and our put_prev_task() ends up throttling /prev, after which we're going to bug out in set_next_entity() because there aren't any tasks left. Avoid this scenario by only throttling common ancestors. Reported-by: Mohammed Naser <mnaser@vexxhost.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Ben Segall <bsegall@google.com> [ munged Changelog ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: pjt@google.com Fixes: 678d5718d8d0 ("sched/fair: Optimize cgroup pick_next_task_fair()") Link: http://lkml.kernel.org/r/xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07preempt: Use preempt_schedule_context() as the official tracing preemption pointFrederic Weisbecker
preempt_schedule_context() is a tracing safe preemption point but it's only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing recursion issues since commit: b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers") introduced function based preemp_count_*() ops. Lets make it available on all configs and give it a more appropriate name for its new position. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07sched: Make preempt_schedule_context() function-tracing safeFrederic Weisbecker
Since function tracing disables preemption, it needs a safe preemption point to use when preemption is re-enabled without worrying about tracing recursion. Ie: to avoid tracing recursion, that preemption point can't be traced (use of notrace qualifier) and it can't call any traceable function before that preemption point disables preemption itself, which disarms the recursion. preempt_schedule() was fine until commit: b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers") because PREEMPT_ACTIVE (which has the property to disable preemption and this disarm tracing preemption recursion) was set before calling any further function. But that commit introduced the use of preempt_count_add/sub() functions to set PREEMPT_ACTIVE and because these functions are called before preemption gets a chance to be disabled, we have a tracing recursion. preempt_schedule_context() is one of the possible preemption functions used by tracing. Its special purpose is to avoid tracing recursion against context tracking. Lets enhance this function to become more generally tracing safe by disabling preemption with raw accessors, such that no function is called before preemption gets disabled and disarm the tracing recursion. This function is going to become the specific tracing-safe preemption point in further commit. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07lockdep: Fix a race between /proc/lock_stat and module unloadPeter Zijlstra
The lock_class iteration of /proc/lock_stat is not serialized against the lockdep_free_key_range() call from module unload. Therefore it can happen that we find a class of which ->name/->key are no longer valid. There is a further bug in zap_class() that left ->name dangling. Cure this. Use RCU_INIT_POINTER() because NULL. Since lockdep_free_key_range() is rcu_sched serialized, we can read both ->name and ->key under rcu_read_lock_sched() (preempt-disable) and be assured that if we observe a !NULL value it stays safe to use for as long as we hold that lock. If we observe both NULL, skip the entry. Reported-by: Jerome Marchand <jmarchan@redhat.com> Tested-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150602105013.GS3644@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07bpf: allow programs to write to certain skb fieldsAlexei Starovoitov
allow programs read/write skb->mark, tc_index fields and ((struct qdisc_skb_cb *)cb)->data. mark and tc_index are generically useful in TC. cb[0]-cb[4] are primarily used to pass arguments from one program to another called via bpf_tail_call() which can be seen in sockex3_kern.c example. All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs. mark, tc_index are writeable from tc_cls_act only. cb[0]-cb[4] are writeable by both sockets and tc_cls_act. Add verifier tests and improve sample code. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-05Merge branch 'linus' into irq/coreThomas Gleixner
Get the urgent fixes from upstream to avoid conflicts.
2015-06-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "The biggest chunk of the changes are two regression fixes: a HT workaround fix and an event-group scheduling fix. It's been verified with 5 days of fuzzer testing. Other fixes: - eBPF fix - a BIOS breakage detection fix - PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix a refactoring bug perf/x86: Tweak broken BIOS rules during check_hw_exists() perf/x86/intel/pt: Untangle pt_buffer_reset_markers() perf: Disallow sparse AUX allocations for non-SG PMUs in overwrite mode perf/x86: Improve HT workaround GP counter constraint perf/x86: Fix event/group validation perf: Fix race in BPF program unregister
2015-06-04compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap()Helge Deller
In the functions compat_get_bitmap() and compat_put_bitmap() the variable nr_compat_longs stores how many compat_ulong_t words should be copied in a loop. The copy loop itself is this: if (nr_compat_longs-- > 0) { if (__get_user(um, umask)) return -EFAULT; } else { um = 0; } Since nr_compat_longs gets unconditionally decremented in each loop and since it's type is unsigned this could theoretically lead to out of bounds accesses to userspace if nr_compat_longs wraps around to (unsigned)(-1). Although the callers currently do not trigger out-of-bounds accesses, we should better implement the loop in a safe way to completely avoid such warp-arounds. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
2015-06-04signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()Oleg Nesterov
selinux_bprm_committed_creds()->__flush_signals() is not right, we shouldn't clear TIF_SIGPENDING unconditionally. There can be other reasons for signal_pending(): freezing(), JOBCTL_PENDING_MASK, and potentially more. Also change this code to check fatal_signal_pending() rather than SIGNAL_GROUP_EXIT, it looks a bit better. Now we can kill __flush_signals() before it finds another buggy user. Note: this code looks racy, we can flush a signal which was sent after the task SID has been updated. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-06-03Merge branch 'locking/core' into x86/core, to prepare for dependent patchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03lockdep: Do not break user-visible stringBorislav Petkov
Remove the line-break in the user-visible string and add the missing space in this error message: WARNING: lockdep init error! lock-(console_sem).lock was acquiredbefore lockdep_init Also: - don't yell, it's just a debug warning - denote references to function calls with '()' - standardize the lock name quoting - and finish the sentence. The result: WARNING: lockdep init error: lock '(console_sem).lock' was acquired before lockdep_init(). Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150602133827.GD19887@pd.tnic [ Added a few more stylistic tweaks to the error message. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03Merge branches 'x86/mm', 'x86/build', 'x86/apic' and 'x86/platform' into ↵Ingo Molnar
x86/core, to apply dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02livepatch: add module locking around kallsyms callsMiroslav Benes
The list of loaded modules is walked through in module_kallsyms_on_each_symbol (called by kallsyms_on_each_symbol). The module_mutex lock should be acquired to prevent potential corruptions in the list. This was uncovered with new lockdep asserts in module code introduced by the commit 0be964be0d45 ("module: Sanitize RCU usage and locking") in recent next- trees. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-06-02clockevents: Rename state to state_use_accessorsThomas Gleixner
The only sensible way to make abuse of core internal fields obvious and easy to grep for. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>
2015-06-02clockevents: Use set/get state helper functionsThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>
2015-06-02clockevents: Provide functions to set and get the stateThomas Gleixner
We want to rename dev->state, so provide proper get and set functions. Rename clockevents_set_state() to clockevents_switch_state() to avoid confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>
2015-06-02clockevents: Use helpers to check the state of a clockevent deviceViresh Kumar
Use accessor functions to check the state of clockevent devices in core code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/fa2b9869fd17f210eaa156ec2b594efd0230b6c7.1432192527.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-02Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Initialization/Kconfig updates: hide most Kconfig options from unsuspecting users. There's now a single high level configuration option: * * RCU Subsystem * Make expert-level adjustments to RCU configuration (RCU_EXPERT) [N/y/?] (NEW) Which if answered in the negative, leaves us with a single interactive configuration option: Offload RCU callback processing from boot-selected CPUs (RCU_NOCB_CPU) [N/y/?] (NEW) All the rest of the RCU options are configured automatically. - Remove all uses of RCU-protected array indexes: replace the rcu_[access|dereference]_index_check() APIs with READ_ONCE() and rcu_lockdep_assert(). - RCU CPU-hotplug cleanups. - Updates to Tiny RCU: a race fix and further code shrinkage. - RCU torture-testing updates: fixes, speedups, cleanups and documentation updates. - Miscellaneous fixes. - Documentation updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02Merge branch 'linus' into sched/core, to resolve conflictIngo Molnar
Conflicts: arch/sparc/include/asm/topology_64.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/phy/amd-xgbe-phy.c drivers/net/wireless/iwlwifi/Kconfig include/net/mac80211.h iwlwifi/Kconfig and mac80211.h were both trivial overlapping changes. The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and the bug fix that happened on the 'net' side is already integrated into the rest of the amd-xgbe driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31ebpf: misc core cleanupDaniel Borkmann
Besides others, move bpf_tail_call_proto to the remaining definitions of other protos, improve comments a bit (i.e. remove some obvious ones, where the code is already self-documenting, add objectives for others), simplify bpf_prog_array_compatible() a bit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31ebpf: allow bpf_ktime_get_ns_proto also for networkingDaniel Borkmann
As this is already exported from tracing side via commit d9847d310ab4 ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might as well want to move it to the core, so also networking users can make use of it, e.g. to measure diffs for certain flows from ingress/egress. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31bpf: add missing rcu protection when releasing programs from prog_arrayAlexei Starovoitov
Normally the program attachment place (like sockets, qdiscs) takes care of rcu protection and calls bpf_prog_put() after a grace period. The programs stored inside prog_array may not be attached anywhere, so prog_array needs to take care of preserving rcu protection. Otherwise bpf_tail_call() will race with bpf_prog_put(). To solve that introduce bpf_prog_put_rcu() helper function and use it in 3 places where unattached program can decrement refcnt: closing program fd, deleting/replacing program in prog_array. Fixes: 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs") Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-29audit: obsolete audit_context check is removed in audit_filter_rules()Mikhail Klementyev
Signed-off-by: Mikhail Klementyev <jollheef@riseup.net> [PM: patch applied by hand due to HTML mangling, rewrote subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-05-29audit: fix for typo in comment to function audit_log_link_denied()Shailendra Verma
Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-05-29Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull fixes for cpumask and modules from Rusty Russell: "** NOW WITH TESTING! ** Two fixes which got lost in my recent distraction. One is a weird cpumask function which needed to be rewritten, the other is a module bug which is cc:stable" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: cpumask_set_cpu_local_first => cpumask_local_spread, lament module: Call module notifier on failure after complete_formation()
2015-05-29ring-buffer: Add enum names for the context levelsSteven Rostedt (Red Hat)
Instead of having hard coded numbers for the context levels, use enums to describe them more. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-29workqueue: fix typos in commentsShailendra Verma
tj: dropped iff -> if, iff is if and only if not a typo. Spotted by Randy Dunlap. Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org>
2015-05-28ring-buffer: Remove useless unused tracing_off_permanent()Steven Rostedt (Red Hat)
The tracing_off_permanent() call is a way to disable all ring_buffers. Nothing uses it and nothing should use it, as tracing_off() and friends are better, as they disable the ring buffers related to tracing. The tracing_off_permanent() even disabled non tracing ring buffers. This is a bit drastic, and was added to handle NMIs doing outputs that could corrupt the ring buffer when only tracing used them. It is now obsolete and adds a little overhead, it should be removed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-28ring-buffer: Give NMIs a chance to lock the reader_lockSteven Rostedt (Red Hat)
Currently, if an NMI does a dump of a ring buffer, it disables all ring buffers from ever doing any writes again. This is because it wont take the locks for the cpu_buffer and this can cause corruption if it preempted a read, or a read happens on another CPU for the current cpu buffer. This is a bit overkill. First, it should at least try to take the lock, and if it fails then disable it. Also, there's no need to disable all ring buffers, even those that are unrelated to what is being read. Only disable the per cpu ring buffer that is being read if it can not get the lock for it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-28kernel/module.c: avoid ifdefs for sig_enforce declarationLuis R. Rodriguez
There's no need to require an ifdef over the declaration of sig_enforce as IS_ENABLED() can be used. While at it, there's no harm in exposing this kernel parameter outside of CONFIG_MODULE_SIG as it'd be a no-op on non module sig kernels. Now, technically we should in theory be able to remove the #ifdef'ery over the declaration of the module parameter as we are also trusting the bool_enable_only code for CONFIG_MODULE_SIG kernels but for now remain paranoid and keep it. With time if no one can put a bullet through bool_enable_only and if there are no technical requirements over not exposing CONFIG_MODULE_SIG_FORCE with the measures in place by bool_enable_only we could remove this last ifdef. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28kernel/workqueue.c: remove ifdefs over wq_power_efficientLuis R. Rodriguez
We can avoid an ifdef over wq_power_efficient's declaration by just using IS_ENABLED(). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28kernel/params.c: export param_ops_bool_enable_onlyLuis R. Rodriguez
This will grant access to this helper to code built as modules. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Howells <dhowells@redhat.com> Cc: Ming Lei <ming.lei@canonical.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Kyle McMartin <kyle@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>