summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2019-04-25bpf: mark registers in all frames after pkt/null checksPaul Chaignon
In case of a null check on a pointer inside a subprog, we should mark all registers with this pointer as either safe or unknown, in both the current and previous frames. Currently, only spilled registers and registers in the current frame are marked. Packet bound checks in subprogs have the same issue. This patch fixes it to mark registers in previous frames as well. A good reproducer for null checks looks as follow: 1: ptr = bpf_map_lookup_elem(map, &key); 2: ret = subprog(ptr) { 3: return ptr != NULL; 4: } 5: if (ret) 6: value = *ptr; With the above, the verifier will complain on line 6 because it sees ptr as map_value_or_null despite the null check in subprog 1. Note that this patch fixes another resulting bug when using bpf_sk_release(): 1: sk = bpf_sk_lookup_tcp(...); 2: subprog(sk) { 3: if (sk) 4: bpf_sk_release(sk); 5: } 6: if (!sk) 7: return 0; 8: return 1; In the above, mark_ptr_or_null_regs will warn on line 6 because it will try to free the reference state, even though it was already freed on line 3. Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)") Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-25seccomp: Make NEW_LISTENER and TSYNC flags exclusiveTycho Andersen
As the comment notes, the return codes for TSYNC and NEW_LISTENER conflict, because they both return positive values, one in the case of success and one in the case of error. So, let's disallow both of these flags together. While this is technically a userspace break, all the users I know of are still waiting on me to land this feature in libseccomp, so I think it'll be safe. Also, at present my use case doesn't require TSYNC at all, so this isn't a big deal to disallow. If someone wanted to support this, a path forward would be to add a new flag like TSYNC_AND_LISTENER_YES_I_UNDERSTAND_THAT_TSYNC_WILL_JUST_RETURN_EAGAIN, but the use cases are so different I don't see it really happening. Finally, it's worth noting that this does actually fix a UAF issue: at the end of seccomp_set_mode_filter(), we have: if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) { if (ret < 0) { listener_f->private_data = NULL; fput(listener_f); put_unused_fd(listener); } else { fd_install(listener, listener_f); ret = listener; } } out_free: seccomp_filter_free(prepared); But if ret > 0 because TSYNC raced, we'll install the listener fd and then free the filter out from underneath it, causing a UAF when the task closes it or dies. This patch also switches the condition to be simply if (ret), so that if someone does add the flag mentioned above, they won't have to remember to fix this too. Reported-by: syzbot+b562969adb2e04af3442@syzkaller.appspotmail.com Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") CC: stable@vger.kernel.org # v5.0+ Signed-off-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: James Morris <jamorris@linux.microsoft.com>
2019-04-25bpf: support BPF_PROG_QUERY for BPF_FLOW_DISSECTOR attach_typeStanislav Fomichev
target_fd is target namespace. If there is a flow dissector BPF program attached to that namespace, its (single) id is returned. v5: * drop net ref right after rcu unlock (Daniel Borkmann) v4: * add missing put_net (Jann Horn) v3: * add missing inline to skb_flow_dissector_prog_query static def (kbuild test robot <lkp@intel.com>) v2: * don't sleep in rcu critical section (Jakub Kicinski) * check input prog_cnt (exit early) Cc: Jann Horn <jannh@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-25livepatch: Replace klp_ktype_patch's default_attrs with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace klp_ktype_patch's default_attrs field with default_groups and use the ATTRIBUTE_GROUPS macro to create klp_patch_groups. This patch was tested by loading the livepatch-sample module and verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25cpufreq: schedutil: Replace default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace sugov_tunables_ktype's default_attrs field with default groups. Change "sugov_attributes" to "sugov_attrs" and use the ATTRIBUTE_GROUPS macro to create sugov_groups. This patch was tested by setting the scaling governor to schedutil and verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25padata: Replace padata_attr_type default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace padata_attr_type's default_attrs field with default_groups and use the ATTRIBUTE_GROUPS macro to create padata_default_groups. This patch was tested by loading the pcrypt module and verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25irqdesc: Replace irq_kobj_type's default_attrs field with groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace irq_kobj_type's default_attrs field with default_groups and use the ATTRIBUTE_GROUPS macro to create irq_groups. This patch was tested by verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25sched/numa: Fix a possible divide-by-zeroXie XiuQi
sched_clock_cpu() may not be consistent between CPUs. If a task migrates to another CPU, then se.exec_start is set to that CPU's rq_clock_task() by update_stats_curr_start(). Specifically, the new value might be before the old value due to clock skew. So then if in numa_get_avg_runtime() the expression: 'now - p->last_task_numa_placement' ends up as -1, then the divider '*period + 1' in task_numa_placement() is 0 and things go bang. Similar to update_curr(), check if time goes backwards to avoid this. [ peterz: Wrote new changelog. ] [ mingo: Tweaked the code comment. ] Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cj.chengjian@huawei.com Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-25crypto: shash - remove shash_desc::flagsEric Biggers
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op. With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_*() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep. Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all. Therefore, remove shash_desc::flags, and document that the crypto_shash_*() functions can be called from any context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-24dma-mapping: remove an unnecessary NULL checkDan Carpenter
We already dereferenced "dev" when we called get_dma_ops() so this NULL check is too late. We're not supposed to pass NULL "dev" pointers to dma_alloc_attrs(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-23seccomp: fix up grammar in commentTycho Andersen
This sentence is kind of a train wreck anyway, but at least dropping the extra pronoun helps somewhat. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2019-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-04-22 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) allow stack/queue helpers from more bpf program types, from Alban. 2) allow parallel verification of root bpf programs, from Alexei. 3) introduce bpf sysctl hook for trusted root cases, from Andrey. 4) recognize var/datasec in btf deduplication, from Andrii. 5) cpumap performance optimizations, from Jesper. 6) verifier prep for alu32 optimization, from Jiong. 7) libbpf xsk cleanup, from Magnus. 8) other various fixes and cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23bpf: drop bpf_verifier_lockAlexei Starovoitov
Drop bpf_verifier_lock for root to avoid being DoS-ed by unprivileged. The BPF verifier is now fully parallel. All unpriv users are still serialized by bpf_verifier_lock to avoid exhausting kernel memory by running N parallel verifications. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-23bpf: remove global variablesAlexei Starovoitov
Move three global variables protected by bpf_verifier_lock into 'struct bpf_verifier_env' to allow parallel verification. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-22audit: fix a memory leak bugWenwen Wang
In audit_rule_change(), audit_data_to_entry() is firstly invoked to translate the payload data to the kernel's rule representation. In audit_data_to_entry(), depending on the audit field type, an audit tree may be created in audit_make_tree(), which eventually invokes kmalloc() to allocate the tree. Since this tree is a temporary tree, it will be then freed in the following execution, e.g., audit_add_rule() if the message type is AUDIT_ADD_RULE or audit_del_rule() if the message type is AUDIT_DEL_RULE. However, if the message type is neither AUDIT_ADD_RULE nor AUDIT_DEL_RULE, i.e., the default case of the switch statement, this temporary tree is not freed. To fix this issue, only allocate the tree when the type is AUDIT_ADD_RULE or AUDIT_DEL_RULE. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-04-21function_graph: Have selftest also emulate tr->reset() as it did with tr->init()Steven Rostedt (VMware)
The function_graph boot up self test emulates the tr->init() function in order to add a wrapper around the function graph tracer entry code to test for lock ups and such. But it does not emulate the tr->reset(), and just calls the function_graph tracer tr->reset() function which will use its own fgraph_ops to unregister function tracing with. As the fgraph_ops is becoming more meaningful with the register_ftrace_graph() and unregister_ftrace_graph() functions, the two need to be the same. The emulated tr->init() uses its own fgraph_ops descriptor, which means the unregister_ftrace_graph() must use the same ftrace_ops, which the selftest currently does not do. By emulating the tr->reset() as the selftest does with the tr->init() it will be able to pass the same fgraph_ops descriptor to the unregister_ftrace_graph() as it did with the register_ftrace_graph(). Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-20Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Misc clocksource driver fixes, and a sched-clock wrapping fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() clocksource/drivers/timer-ti-dm: Remove omap_dm_timer_set_load_start clocksource/drivers/oxnas: Fix OX820 compatible clocksource/drivers/arm_arch_timer: Remove unneeded pr_fmt macro clocksource/drivers/npcm: select TIMER_OF
2019-04-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - various tooling fixes - kretprobe fixes - kprobes annotation fixes - kprobes error checking fix - fix the default events for AMD Family 17h CPUs - PEBS fix - AUX record fix - address filtering fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Avoid kretprobe recursion bug kprobes: Mark ftrace mcount handler functions nokprobe x86/kprobes: Verify stack frame on kretprobe perf/x86/amd: Add event map for AMD Family 17h perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() perf tools: Fix map reference counting perf evlist: Fix side band thread draining perf tools: Check maps for bpf programs perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() tools include uapi: Sync sound/asound.h copy perf top: Always sample time to satisfy needs of use of ordered queuing perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) tools lib traceevent: Fix missing equality check for strcmp perf stat: Disable DIR_FORMAT feature for 'perf stat record' perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view perf header: Fix lock/unlock imbalances when processing BPF/BTF info perf/x86: Fix incorrect PEBS_REGS perf/ring_buffer: Fix AUX record suppression perf/core: Fix the address filtering fix kprobes: Fix error check when reusing optimized probes
2019-04-20Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A deadline scheduler warning/race fix, and a cfs_period_us quota calculation workaround where the real fix looks too involved to merge immediately" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Correctly handle active 0-lag timers sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
2019-04-20Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "A lockdep warning fix and a script execution fix when atomics are generated" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomics: Don't assume that scripts are executable locking/lockdep: Make lockdep_unregister_key() honor 'debug_locks' again
2019-04-19sched/debug: Fix spelling mistake "logaritmic" -> "logarithmic"Colin Ian King
Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20181128152350.13622-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19cgroup: add tracing points for cgroup v2 freezerRoman Gushchin
Add cgroup:cgroup_freeze and cgroup:cgroup_unfreeze events, which are using the existing cgroup tracing infrastructure. Add the cgroup_event event class, which is similar to the cgroup class, but contains an additional integer field to store a new value (the level field is dropped). Also add two tracing events: cgroup_notify_populated and cgroup_notify_frozen, which are raised in a generic way using the TRACE_CGROUP_PATH() macro. This allows to trace cgroup state transitions and is generally helpful for debugging the cgroup freezer code. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-04-19cgroup: make TRACE_CGROUP_PATH irq-safeRoman Gushchin
To use the TRACE_CGROUP_PATH() macro with css_set_lock locked, let's make the macro irq-safe. It's necessary in order to trace cgroup freezer state transitions (frozen/not frozen), which are happening with css_set_lock locked. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-04-19cgroup: cgroup v2 freezerRoman Gushchin
Cgroup v1 implements the freezer controller, which provides an ability to stop the workload in a cgroup and temporarily free up some resources (cpu, io, network bandwidth and, potentially, memory) for some other tasks. Cgroup v2 lacks this functionality. This patch implements freezer for cgroup v2. Cgroup v2 freezer tries to put tasks into a state similar to jobctl stop. This means that tasks can be killed, ptraced (using PTRACE_SEIZE*), and interrupted. It is possible to attach to a frozen task, get some information (e.g. read registers) and detach. It's also possible to migrate a frozen tasks to another cgroup. This differs cgroup v2 freezer from cgroup v1 freezer, which mostly tried to imitate the system-wide freezer. However uninterruptible sleep is fine when all tasks are going to be frozen (hibernation case), it's not the acceptable state for some subset of the system. Cgroup v2 freezer is not supporting freezing kthreads. If a non-root cgroup contains kthread, the cgroup still can be frozen, but the kthread will remain running, the cgroup will be shown as non-frozen, and the notification will not be delivered. * PTRACE_ATTACH is not working because non-fatal signal delivery is blocked in frozen state. There are some interface differences between cgroup v1 and cgroup v2 freezer too, which are required to conform the cgroup v2 interface design principles: 1) There is no separate controller, which has to be turned on: the functionality is always available and is represented by cgroup.freeze and cgroup.events cgroup control files. 2) The desired state is defined by the cgroup.freeze control file. Any hierarchical configuration is allowed. 3) The interface is asynchronous. The actual state is available using cgroup.events control file ("frozen" field). There are no dedicated transitional states. 4) It's allowed to make any changes with the cgroup hierarchy (create new cgroups, remove old cgroups, move tasks between cgroups) no matter if some cgroups are frozen. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> No-objection-from-me-by: Oleg Nesterov <oleg@redhat.com> Cc: kernel-team@fb.com
2019-04-19cgroup: protect cgroup->nr_(dying_)descendants by css_set_lockRoman Gushchin
The number of descendant cgroups and the number of dying descendant cgroups are currently synchronized using the cgroup_mutex. The number of descendant cgroups will be required by the cgroup v2 freezer, which will use it to determine if a cgroup is frozen (depending on total number of descendants and number of frozen descendants). It's not always acceptable to grab the cgroup_mutex, especially from quite hot paths (e.g. exit()). To avoid this, let's additionally synchronize these counters using the css_set_lock. So, it's safe to read these counters with either cgroup_mutex or css_set_lock locked, and for changing both locks should be acquired. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kernel-team@fb.com
2019-04-19cgroup: implement __cgroup_task_count() helperRoman Gushchin
The helper is identical to the existing cgroup_task_count() except it doesn't take the css_set_lock by itself, assuming that the caller does. Also, move cgroup_task_count() implementation into kernel/cgroup/cgroup.c, as there is nothing specific to cgroup v1. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kernel-team@fb.com
2019-04-19cgroup: rename freezer.c into legacy_freezer.cRoman Gushchin
Freezer.c will contain an implementation of cgroup v2 freezer, so let's rename the v1 freezer to avoid naming conflicts. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kernel-team@fb.com
2019-04-19sched/topology: Update init_sched_domains() commentJuri Lelli
Holding hotplug lock is not a requirement anymore for callers of sched_ init_domains after commit: 6acce3ef8452 ("sched: Remove get_online_cpus() usage") Update the relative comment preceding init_sched_domains(). Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cgroups@vger.kernel.org Cc: lizefan@huawei.com Link: http://lkml.kernel.org/r/20181219133445.31982-2-juri.lelli@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19cgroup/cpuset: Update stale generate_sched_domains() commentsJuri Lelli
Commit: fc560a26acce ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()") removed the local list (q) that was used to perform a top-down scan of all cpusets; however, comments mentioning it were not updated. Update comments to reflect current implementation. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cgroups@vger.kernel.org Cc: lizefan@huawei.com Link: http://lkml.kernel.org/r/20181219133445.31982-1-juri.lelli@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19tick: Fix typos in commentsLaurent Gauthier
Signed-off-by: Laurent Gauthier <laurent.gauthier@soccasys.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19kernel/watchdog_hld.c: hard lockup message should end with a newlineSergey Senozhatsky
Separate print_modules() and hard lockup error message. Before the patch: NMI watchdog: Watchdog detected hard LOCKUP on cpu 1Modules linked in: nls_cp437 Link: http://lkml.kernel.org/r/20190412062557.2700-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19kprobes: Mark ftrace mcount handler functions nokprobeMasami Hiramatsu
Mark ftrace mcount handler functions nokprobe since probing on these functions with kretprobe pushes return address incorrectly on kretprobe shadow stack. Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094062044.6137.6419622920568680640.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19sched/core: Check quota and period overflow at usec to nsec conversionKonstantin Khlebnikov
Large values could overflow u64 and pass following sanity checks. # echo 18446744073750000 > cpu.cfs_period_us # cat cpu.cfs_period_us 40448 # echo 18446744073750000 > cpu.cfs_quota_us # cat cpu.cfs_quota_us 40448 After this patch they will fail with -EINVAL. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/155125502079.293431.3947497929372138600.stgit@buzz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19sched/core: Handle overflow in cpu_shares_write_u64Konstantin Khlebnikov
Bit shift in scale_load() could overflow shares. This patch saturates it to MAX_SHARES like following sched_group_set_shares(). Example: # echo 9223372036854776832 > cpu.shares # cat cpu.shares Before patch: 1024 After pattch: 262144 Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/155125501891.293431.3345233332801109696.stgit@buzz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19sched/rt: Check integer overflow at usec to nsec conversionKonstantin Khlebnikov
Example of unhandled overflows: # echo 18446744073709651 > cpu.rt_runtime_us # cat cpu.rt_runtime_us 99 # echo 18446744073709900 > cpu.rt_period_us # cat cpu.rt_period_us 348 After this patch they will fail with -EINVAL. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/155125501739.293431.5252197504404771496.stgit@buzz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19mm/resource: Use resource_overlaps() to simplify region_intersects()Wei Yang
The three checks in region_intersects() are basically an open-coded version of resource_overlaps() - so use the real thing. Also fix typos in comments while at it. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Yuan Yao <yuan.yao@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: bhelgaas@google.com Cc: bp@suse.de Cc: dan.j.williams@intel.com Cc: jack@suse.cz Cc: rdunlap@infradead.org Cc: tiwai@suse.de Link: http://lkml.kernel.org/r/20190305083432.23675-1-richardw.yang@linux.intel.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19rseq: Remove superfluous rseq_len from task_structMathieu Desnoyers
The rseq system call, when invoked with flags of "0" or "RSEQ_FLAG_UNREGISTER" values, expects the rseq_len parameter to be equal to sizeof(struct rseq), which is fixed-size and fixed-layout, specified in uapi linux/rseq.h. Expecting a fixed size for rseq_len is a design choice that ensures multiple libraries and application defining __rseq_abi in the same process agree on its exact size. Considering that this size is and will always be the same value, there is no point in saving this value within task_struct rseq_len. Remove this field from task_struct. No change in functionality intended. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20190305194755.2602-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19rseq: Clean up comments by reflecting removal of event counterMathieu Desnoyers
The "event counter" was removed from rseq before it was merged upstream. However, a few comments in the source code still refer to it. Adapt the comments to match reality. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20190305194755.2602-2-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19sched/core: Fix typo in commentJoel Savitz
Signed-off-by: Joel Savitz <jsavitz@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1551921213-813-1-git-send-email-jsavitz@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18ipv6: Add rate limit mask for ICMPv6 messagesStephen Suryaputra
To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP message types use larger numeric values, a simple bitmask doesn't fit. I use large bitmap. The input and output are the in form of list of ranges. Set the default to rate limit all error messages but Packet Too Big. For Packet Too Big, use ratemask instead of hard-coded. There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow() aren't called. This patch only adds them to icmpv6_echo_reply(). Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says that it is also acceptable to rate limit informational messages. Thus, I removed the current hard-coded behavior of icmpv6_mask_allow() that doesn't rate limit informational messages. v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL isn't defined, expand the description in ip-sysctl.txt and remove unnecessary conditional before kfree(). v3: Inline the bitmap instead of dynamically allocated. Still is a pointer to it is needed because of the way proc_do_large_bitmap work. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18sched/core: Make some functions staticYueHaibing
Fix these sparse warnings: kernel/sched/core.c:6577:11: warning: symbol 'min_cfs_quota_period' was not declared. Should it be static? kernel/sched/core.c:6657:5: warning: symbol 'tg_set_cfs_quota' was not declared. Should it be static? kernel/sched/core.c:6670:6: warning: symbol 'tg_get_cfs_quota' was not declared. Should it be static? kernel/sched/core.c:6683:5: warning: symbol 'tg_set_cfs_period' was not declared. Should it be static? kernel/sched/core.c:6693:6: warning: symbol 'tg_get_cfs_period' was not declared. Should it be static? kernel/sched/fair.c:2596:6: warning: symbol 'task_tick_numa' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190418144713.34332-1-yuehaibing@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18signal: use fdget() since we don't allow O_PATHChristian Brauner
As stated in the original commit for pidfd_send_signal() we don't allow to signal processes through O_PATH file descriptors since it is semantically equivalent to a write on the pidfd. We already correctly error out right now and return EBADF if an O_PATH fd is passed. This is because we use file->f_op to detect whether a pidfd is passed and O_PATH fds have their file->f_op set to empty_fops in do_dentry_open() and thus fail the test. Thus, there is no regression. It's just semantically correct to use fdget() and return an error right from there instead of taking a reference and returning an error later. Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jann Horn <jann@thejh.net> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-18Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM commits from Paul E. McKenney: - An LKMM commit adding support for synchronize_srcu_expedited() - A couple of straggling RCU flavor consolidation updates - Documentation updates. - Miscellaneous fixes - SRCU updates - RCU CPU stall-warning updates - Torture-test updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze()Chang-An Chen
tick_freeze() introduced by suspend-to-idle in commit 124cf9117c5f ("PM / sleep: Make it possible to quiesce timers during suspend-to-idle") uses timekeeping_suspend() instead of syscore_suspend() during suspend-to-idle. As a consequence generic sched_clock will keep going because sched_clock_suspend() and sched_clock_resume() are not invoked during suspend-to-idle which can result in a generic sched_clock wrap. On a ARM system with suspend-to-idle enabled, sched_clock is registered as "56 bits at 13MHz, resolution 76ns, wraps every 4398046511101ns", which means the real wrapping duration is 8796093022202ns. [ 134.551779] suspend-to-idle suspend (timekeeping_suspend()) [ 1204.912239] suspend-to-idle resume (timekeeping_resume()) ...... [ 1206.912239] suspend-to-idle suspend (timekeeping_suspend()) [ 5880.502807] suspend-to-idle resume (timekeeping_resume()) ...... [ 6000.403724] suspend-to-idle suspend (timekeeping_suspend()) [ 8035.753167] suspend-to-idle resume (timekeeping_resume()) ...... [ 8795.786684] (2)[321:charger_thread]...... [ 8795.788387] (2)[321:charger_thread]...... [ 0.057226] (0)[0:swapper/0]...... [ 0.061447] (2)[0:swapper/2]...... sched_clock was not stopped during suspend-to-idle, and sched_clock_poll hrtimer was not expired because timekeeping_suspend() was invoked during suspend-to-idle. It makes sched_clock wrap at kernel time 8796s. To prevent this, invoke sched_clock_suspend() and sched_clock_resume() in tick_freeze() together with timekeeping_suspend() and timekeeping_resume(). Fixes: 124cf9117c5f (PM / sleep: Make it possible to quiesce timers during suspend-to-idle) Signed-off-by: Chang-An Chen <chang-an.chen@mediatek.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: Corey Minyard <cminyard@mvista.com> Cc: <linux-mediatek@lists.infradead.org> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: <kuohong.wang@mediatek.com> Cc: <freddy.hsin@mediatek.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1553828349-8914-1-git-send-email-chang-an.chen@mediatek.com
2019-04-18irq_work: Do not raise an IPI when queueing work on the local CPUNicholas Piggin
The QEMU PowerPC/PSeries machine model was not expecting a self-IPI, and it may be a bit surprising thing to do, so have irq_work_queue_on do local queueing when target is the current CPU. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190409093403.20994-1-npiggin@gmail.com [ Simplified the preprocessor comments. Fixed unbalanced curly brackets pointed out by Thomas. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18watchdog: Fix typo in commentArash Fotouhi
Signed-off-by: Arash Fotouhi <arash@arashfotouhi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: loberman@redhat.com Cc: vincent.whitchurch@axis.com Link: http://lkml.kernel.org/r/1553308112-3513-1-git-send-email-arash@arashfotouhi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18locking/lockdep: Generate LOCKF_ bit compositesPeter Zijlstra
Instead of open-coding the bitmasks, generate them using the lockdep_states.h header. This prepares for additional states, which would make the manual masks tedious and error prone. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18locking/lockdep: Use expanded masks on find_usage_*() functionsFrederic Weisbecker
In order to optimize check_irq_usage() and factorize all the IRQ usage validations we'll need to be able to check multiple lock usage bits at once. Prepare the low level usage mask check functions for that purpose. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190402160244.32434-4-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18locking/lockdep: Map remaining magic numbers to lock usage mask namesFrederic Weisbecker
Clarify the code with mapping some more constant numbers that haven't been named after their corresponding LOCK_USAGE_* symbol. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190402160244.32434-3-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && ↵Frederic Weisbecker
CONFIG_PROVE_LOCKING valid_state() and print_usage_bug*() functions are not used beyond irq locking correctness checks under CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING. Sadly the "unused function" warning wouldn't fire because valid_state() is inline so the unused case has remained unseen until now. So move them inside the appropriate CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING section. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190402160244.32434-2-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>