summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2018-12-08tracing/uprobes: Use dyn_event framework for uprobe eventsMasami Hiramatsu
Use dyn_event framework for uprobe events. This shows uprobe events on "dynamic_events" file. User can also define new uprobe events via dynamic_events. Link: http://lkml.kernel.org/r/154140858481.17322.9091293846515154065.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing/kprobes: Use dyn_event framework for kprobe eventsMasami Hiramatsu
Use dyn_event framework for kprobe events. This shows kprobe events on "tracing/dynamic_events" file. User can also define new events via tracing/dynamic_events. Link: http://lkml.kernel.org/r/154140855646.17322.6619219995865980392.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Add unified dynamic event frameworkMasami Hiramatsu
Add unified dynamic event framework for ftrace kprobes, uprobes and synthetic events. Those dynamic events can be co-exist on same file because those syntax doesn't overlap. This introduces a framework part which provides a unified tracefs interface and operations. Link: http://lkml.kernel.org/r/154140852824.17322.12250362185969352095.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Integrate similar probe argument parsersMasami Hiramatsu
Integrate similar argument parsers for kprobes and uprobes events into traceprobe_parse_probe_arg(). Link: http://lkml.kernel.org/r/154140850016.17322.9836787731210512176.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Simplify creation and deletion of synthetic eventsMasami Hiramatsu
Since the event_mutex and synth_event_mutex ordering issue is gone, we can skip existing event check when adding or deleting events, and some redundant code in error path. This changes release_all_synth_events() to abort the process when it hits any error and returns the error code. It succeeds only if it has no error. Link: http://lkml.kernel.org/r/154140847194.17322.17960275728005067803.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Lock event_mutex before synth_event_mutexMasami Hiramatsu
synthetic event is using synth_event_mutex for protecting synth_event_list, and event_trigger_write() path acquires locks as below order. event_trigger_write(event_mutex) ->trigger_process_regex(trigger_cmd_mutex) ->event_hist_trigger_func(synth_event_mutex) On the other hand, synthetic event creation and deletion paths call trace_add_event_call() and trace_remove_event_call() which acquires event_mutex. In that case, if we keep the synth_event_mutex locked while registering/unregistering synthetic events, its dependency will be inversed. To avoid this issue, current synthetic event is using a 2 phase process to create/delete events. For example, it searches existing events under synth_event_mutex to check for event-name conflicts, and unlocks synth_event_mutex, then registers a new event under event_mutex locked. Finally, it locks synth_event_mutex and tries to add the new event to the list. But it can introduce complexity and a chance for name conflicts. To solve this simpler, this introduces trace_add_event_call_nolock() and trace_remove_event_call_nolock() which don't acquire event_mutex inside. synthetic event can lock event_mutex before synth_event_mutex to solve the lock dependency issue simpler. Link: http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing/uprobes: Add busy check when cleanup all uprobesMasami Hiramatsu
Add a busy check loop in cleanup_all_probes() before trying to remove all events in uprobe_events, the same way that kprobe_events does. Without this change, writing null to uprobe_events will try to remove events but if one of them is enabled, it will stop there leaving some events cleared and others not clceared. With this change, writing null to uprobe_events makes sure all events are not enabled before removing events. So, it clears all events, or returns an error (-EBUSY) with keeping all events. Link: http://lkml.kernel.org/r/154140841557.17322.12653952888762532401.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Change default buffer_percent to 50Steven Rostedt (VMware)
After running several tests, it appears that having the reader wait till half the buffer is full before starting to read (and causing its own events to fill up the ring buffer constantly), works well. It keeps trace-cmd (the main user of this interface) from dominating the traces it records. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Add tracefs file buffer_percentageSteven Rostedt (VMware)
Add a "buffer_percentage" file, that allows users to specify how much of the buffer (percentage of pages) need to be filled before waking up a task blocked on a per cpu trace_pipe_raw file. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08ring-buffer: Add percentage of ring buffer full to wake up readerSteven Rostedt (VMware)
Instead of just waiting for a page to be full before waking up a pending reader, allow the reader to pass in a "percentage" of pages that have content before waking up a reader. This should help keep the process of reading the events not cause wake ups that constantly cause reading of the buffer. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Have trace_stack nr_entries compare not be so subtleDan Carpenter
Dan Carpenter reviewed the trace_stack.c code and figured he found an off by one bug. "From reviewing the code, it seems possible for stack_trace_max.nr_entries to be set to .max_entries and in that case we would be reading one element beyond the end of the stack_dump_trace[] array. If it's not set to .max_entries then the bug doesn't affect runtime." Although it looks to be the case, it is not. Because we have: static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] = { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX }; struct stack_trace stack_trace_max = { .max_entries = STACK_TRACE_ENTRIES - 1, .entries = &stack_dump_trace[0], }; And: stack_trace_max.nr_entries = x; for (; x < i; x++) stack_dump_trace[x] = ULONG_MAX; Even if nr_entries equals max_entries, indexing with it into the stack_dump_trace[] array will not overflow the array. But if it is the case, the second part of the conditional that tests stack_dump_trace[nr_entries] to ULONG_MAX will always be true. By applying Dan's patch, it removes the subtle aspect of it and makes the if conditional slightly more efficient. Link: http://lkml.kernel.org/r/20180620110758.crunhd5bfep7zuiz@kili.mountain Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()Steven Rostedt (VMware)
The ret_stack processing is going to change, and that is going to break anything that is accessing the ret_stack directly. One user is the function graph profiler. By using the ftrace_graph_get_ret_stack() helper function, the profiler can access the ret_stack entry without relying on the implementation details of the stack itself. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08function_graph: Move ftrace_graph_ret_addr() to fgraph.cSteven Rostedt (VMware)
Move the function function_graph_ret_addr() to fgraph.c, as the management of the curr_ret_stack is going to change, and all the accesses to ret_stack needs to be done in fgraph.c. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08fgraph: Add new fgraph_ops structure to enable function graph hooksSteven Rostedt (VMware)
Currently the registering of function graph is to pass in a entry and return function. We need to have a way to associate those functions together where the entry can determine to run the return hook. Having a structure that contains both functions will facilitate the process of converting the code to be able to do such. This is similar to the way function hooks are enabled (it passes in ftrace_ops). Instead of passing in the functions to use, a single structure is passed in to the registering function. The unregister function is now passed in the fgraph_ops handle. When we allow more than one callback to the function graph hooks, this will let the system know which one to remove. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Rearrange functions in trace_sched_wakeup.cSteven Rostedt (VMware)
Rearrange the functions in trace_sched_wakeup.c so that there are fewer #ifdef CONFIG_FUNCTION_TRACER and #ifdef CONFIG_FUNCTION_GRAPH_TRACER, instead of having the #ifdefs spread all over. No functional change is made. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08fgraph: Move function graph specific code into fgraph.cSteven Rostedt (VMware)
To make the function graph infrastructure more managable, the code needs to be in its own file (fgraph.c). Move the code that is specific for managing the function graph infrastructure out of ftrace.c and into fgraph.c Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08function_graph: Do not expose the graph_time option when profiler is not ↵Steven Rostedt (VMware)
configured When the function profiler is not configured, the "graph_time" option is meaningless, as the function profiler is the only thing that makes use of it. Do not expose it if the profiler is not configured. Link: http://lkml.kernel.org/r/20181123061133.GA195223@google.com Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08ftrace: Create new ftrace_internal.h headerSteven Rostedt (VMware)
In order to move function graph infrastructure into its own file (fgraph.h) it needs to access various functions and variables in ftrace.c that are currently static. Create a new file called ftrace-internal.h that holds the function prototypes and the extern declarations of the variables needed by fgraph.c as well, and make them global in ftrace.c such that they can be used outside that file. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08function_graph: Remove the use of FTRACE_NOTRACE_DEPTHSteven Rostedt (VMware)
The curr_ret_stack is no longer set to a negative value when a function is not to be traced by the function graph tracer. Remove the usage of FTRACE_NOTRACE_DEPTH, as it is no longer needed. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-07blkcg: remove additional reference to the cssDennis Zhou
The previous patch in this series removed carrying around a pointer to the css in blkg. However, the blkg association logic still relied on taking a reference on the css to ensure we wouldn't fail in getting a reference for the blkg. Here the implicit dependency on the css is removed. The association continues to rely on the tryget logic walking up the blkg tree. This streamlines the three ways that association can happen: normal, swap, and writeback. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07blkcg: remove bio->bi_css and instead use bio->bi_blkgDennis Zhou
Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07bpf: verifier remove the rejection on BPF_ALU | BPF_ARSHJiong Wang
This patch remove the rejection on BPF_ALU | BPF_ARSH as we have supported them on interpreter and all JIT back-ends Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07bpf: interpreter support BPF_ALU | BPF_ARSHJiong Wang
This patch implements interpreting BPF_ALU | BPF_ARSH. Do arithmetic right shift on low 32-bit sub-register, and zero the high 32 bits. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07y2038: futex: Add support for __kernel_timespecArnd Bergmann
This prepares sys_futex for y2038 safe calling: the native syscall is changed to receive a __kernel_timespec argument, which will be switched to 64-bit time_t in the future. All the internal time handling gets changed to timespec64, and the compat_sys_futex entry point is moved under the CONFIG_COMPAT_32BIT_TIME check to provide compatibility for existing 32-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-07y2038: futex: Move compat implementation into futex.cArnd Bergmann
We are going to share the compat_sys_futex() handler between 64-bit architectures and 32-bit architectures that need to deal with both 32-bit and 64-bit time_t, and this is easier if both entry points are in the same file. In fact, most other system call handlers do the same thing these days, so let's follow the trend here and merge all of futex_compat.c into futex.c. In the process, a few minor changes have to be done to make sure everything still makes sense: handle_futex_death() and futex_cmpxchg_enabled() become local symbol, and the compat version of the fetch_robust_entry() function gets renamed to compat_fetch_robust_entry() to avoid a symbol clash. This is intended as a purely cosmetic patch, no behavior should change. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-07Merge tag 'gcc-plugins-v4.20-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc stackleak plugin fixes from Kees Cook: - Remove tracing for inserted stack depth marking function (Anders Roxell) - Move gcc-plugin pass location to avoid objtool warnings (Alexander Popov) * tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass stackleak: Mark stackleak_track_stack() as notrace
2018-12-06Merge tag 'trace-v4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "This is a single commit that fixes a bug in uprobes SDT code due to a missing mutex protection" * tag 'trace-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: Uprobes: Fix kernel oops with delayed_uprobe_remove()
2018-12-06signal: Add restore_user_sigmask()Deepa Dinamani
Refactor the logic to restore the sigmask before the syscall returns into an api. This is useful for versions of syscalls that pass in the sigmask and expect the current->sigmask to be changed during the execution and restored after the execution of the syscall. With the advent of new y2038 syscalls in the subsequent patches, we add two more new versions of the syscalls (for pselect, ppoll and io_pgetevents) in addition to the existing native and compat versions. Adding such an api reduces the logic that would need to be replicated otherwise. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06signal: Add set_user_sigmask()Deepa Dinamani
Refactor reading sigset from userspace and updating sigmask into an api. This is useful for versions of syscalls that pass in the sigmask and expect the current->sigmask to be changed during, and restored after, the execution of the syscall. With the advent of new y2038 syscalls in the subsequent patches, we add two more new versions of the syscalls (for pselect, ppoll, and io_pgetevents) in addition to the existing native and compat versions. Adding such an api reduces the logic that would need to be replicated otherwise. Note that the calls to sigprocmask() ignored the return value from the api as the function only returns an error on an invalid first argument that is hardcoded at these call sites. The updated logic uses set_current_blocked() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-06kobject: return error code if writing /sys/.../uevent failsPeter Rajnoha
Propagate error code back to userspace if writing the /sys/.../uevent file fails. Before, the write operation always returned with success, even if we failed to recognize the input string or if we failed to generate the uevent itself. With the error codes properly propagated back to userspace, we are able to react in userspace accordingly by not assuming and awaiting a uevent that is not delivered. Signed-off-by: Peter Rajnoha <prajnoha@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-06dma-direct: remove the mapping_error dma_map_ops methodChristoph Hellwig
The dma-direct code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-06kexec_file: kexec_walk_memblock() only walks a dedicated region at kdumpAKASHI Takahiro
In kdump case, there exists only one dedicated memblock region as usable memory (crashk_res). With this patch, kexec_walk_memblock() runs a given callback function on this region. Cosmetic change: 0 to MEMBLOCK_NONE at for_each_free_mem_range*() Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06powerpc, kexec_file: factor out memblock-based arch_kexec_walk_mem()AKASHI Takahiro
Memblock list is another source for usable system memory layout. So move powerpc's arch_kexec_walk_mem() to common code so that other memblock-based architectures, particularly arm64, can also utilise it. A moved function is now renamed to kexec_walk_memblock() and integrated into kexec_locate_mem_hole(), which will now be usable for all architectures with no need for overriding arch_kexec_walk_mem(). With this change, arch_kexec_walk_mem() need no longer be a weak function, and was now renamed to kexec_walk_resources(). Since powerpc doesn't support kdump in its kexec_file_load(), the current kexec_walk_memblock() won't work for kdump either in this form, this will be fixed in the next patch. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06s390, kexec_file: drop arch_kexec_mem_walk()AKASHI Takahiro
Since s390 already knows where to locate buffers, calling arch_kexec_mem_walk() has no sense. So we can just drop it as kbuf->mem indicates this while all other architectures sets it to 0 initially. This change is a preparatory work for the next patch, where all the variant memory walks, either on system resource or memblock, will be put in one common place so that it will satisfy all the architectures' need. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06kexec_file: make kexec_image_post_load_cleanup_default() globalAKASHI Takahiro
Change this function from static to global so that arm64 can implement its own arch_kimage_file_post_load_cleanup() later using kexec_image_post_load_cleanup_default(). Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-05Uprobes: Fix kernel oops with delayed_uprobe_remove()Ravi Bangoria
There could be a race between task exit and probe unregister: exit_mm() mmput() __mmput() uprobe_unregister() uprobe_clear_state() put_uprobe() delayed_uprobe_remove() delayed_uprobe_remove() put_uprobe() is calling delayed_uprobe_remove() without taking delayed_uprobe_lock and thus the race sometimes results in a kernel crash. Fix this by taking delayed_uprobe_lock before calling delayed_uprobe_remove() from put_uprobe(). Detailed crash log can be found at: Link: http://lkml.kernel.org/r/000000000000140c370577db5ece@google.com Link: http://lkml.kernel.org/r/20181205033423.26242-1-ravi.bangoria@linux.ibm.com Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reported-by: syzbot+cb1fb754b771caca0a88@syzkaller.appspotmail.com Fixes: 1cc33161a83d ("uprobes: Support SDT markers having reference count (semaphore)") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-05stackleak: Mark stackleak_track_stack() as notraceAnders Roxell
Function graph tracing recurses into itself when stackleak is enabled, causing the ftrace graph selftest to run for up to 90 seconds and trigger the softlockup watchdog. Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200 200 mcount_get_lr_addr x0 // pointer to function's saved lr (gdb) bt \#0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200 \#1 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153 \#2 0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106 \#3 0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507 \#4 0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286 \#5 ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321 \#6 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153 \#7 0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876 \#8 0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650 \#9 0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205 Rework so we mark stackleak_track_stack as notrace Co-developed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-05bpf: Change insn_offset to insn_off in bpf_func_infoMartin KaFai Lau
The later patch will introduce "struct bpf_line_info" which has member "line_off" and "file_off" referring back to the string section in btf. The line_"off" and file_"off" are more consistent to the naming convention in btf.h that means "offset" (e.g. name_off in "struct btf_type"). The to-be-added "struct bpf_line_info" also has another member, "insn_off" which is the same as the "insn_offset" in "struct bpf_func_info". Hence, this patch renames "insn_offset" to "insn_off" for "struct bpf_func_info". Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-05bpf: Improve the info.func_info and info.func_info_rec_size behaviorMartin KaFai Lau
1) When bpf_dump_raw_ok() == false and the kernel can provide >=1 func_info to the userspace, the current behavior is setting the info.func_info_cnt to 0 instead of setting info.func_info to 0. It is different from the behavior in jited_func_lens/nr_jited_func_lens, jited_ksyms/nr_jited_ksyms...etc. This patch fixes it. (i.e. set func_info to 0 instead of func_info_cnt to 0 when bpf_dump_raw_ok() == false). 2) When the userspace passed in info.func_info_cnt == 0, the kernel will set the expected func_info size back to the info.func_info_rec_size. It is a way for the userspace to learn the kernel expected func_info_rec_size introduced in commit 838e96904ff3 ("bpf: Introduce bpf_func_info"). An exception is the kernel expected size is not set when func_info is not available for a bpf_prog. This makes the returned info.func_info_rec_size has different values depending on the returned value of info.func_info_cnt. This patch sets the kernel expected size to info.func_info_rec_size independent of the info.func_info_cnt. 3) The current logic only rejects invalid func_info_rec_size if func_info_cnt is non zero. This patch also rejects invalid nonzero info.func_info_rec_size and not equal to the kernel expected size. 4) Set info.btf_id as long as prog->aux->btf != NULL. That will setup the later copy_to_user() codes look the same as others which then easier to understand and maintain. prog->aux->btf is not NULL only if prog->aux->func_info_cnt > 0. Breaking up info.btf_id from prog->aux->func_info_cnt is needed for the later line info patch anyway. A similar change is made to bpf_get_prog_name(). Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2018-12-05 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix bpf uapi pointers for 32-bit architectures, from Daniel. 2) improve verifer ability to handle progs with a lot of branches, from Alexei. 3) strict btf checks, from Yonghong. 4) bpf_sk_lookup api cleanup, from Joe. 5) other misc fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05bpf: add __weak hook for allocating executable memoryArd Biesheuvel
By default, BPF uses module_alloc() to allocate executable memory, but this is not necessary on all arches and potentially undesirable on some of them. So break out the module_alloc() and module_memfree() calls into __weak functions to allow them to be overridden in arch code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-05dma-mapping: fix lack of DMA address assignment in generic remap allocatorMarek Szyprowski
Commit bfd56cd60521 ("dma-mapping: support highmem in the generic remap allocator") replaced dma_direct_alloc_pages() with __dma_direct_alloc_pages(), which doesn't set dma_handle and zero allocated memory. Fix it by doing this directly in the caller function. Fixes: bfd56cd60521 ("dma-mapping: support highmem in the generic remap allocator") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-05timekeeping: Use proper seqcount initializerBart Van Assche
tk_core.seq is initialized open coded, but that misses to initialize the lockdep map when lockdep is enabled. Lockdep splats involving tk_core seq consequently lack a name and are hard to read. Use the proper initializer which takes care of the lockdep map initialization. [ tglx: Massaged changelog ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: tj@kernel.org Cc: johannes.berg@intel.com Link: https://lkml.kernel.org/r/20181128234325.110011-12-bvanassche@acm.org
2018-12-04Merge tag 'v4.20-rc5' into for-4.21/blockJens Axboe
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and also getting the merge fix that went into mainline that users are hitting testing for-4.21/block and/or for-next. * tag 'v4.20-rc5': (664 commits) Linux 4.20-rc5 PCI: Fix incorrect value returned from pcie_get_speed_cap() MAINTAINERS: Update linux-mips mailing list address ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set ...
2018-12-04bpf: add per-insn complexity limitAlexei Starovoitov
malicious bpf program may try to force the verifier to remember a lot of distinct verifier states. Put a limit to number of per-insn 'struct bpf_verifier_state'. Note that hitting the limit doesn't reject the program. It potentially makes the verifier do more steps to analyze the program. It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner instead of spending cpu time walking long link list. The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs with slight increase in number of "steps" it takes to successfully verify the programs: before after bpf_lb-DLB_L3.o 1940 1940 bpf_lb-DLB_L4.o 3089 3089 bpf_lb-DUNKNOWN.o 1065 1065 bpf_lxc-DDROP_ALL.o 28052 | 28162 bpf_lxc-DUNKNOWN.o 35487 | 35541 bpf_netdev.o 10864 10864 bpf_overlay.o 6643 6643 bpf_lcx_jit.o 38437 38437 But it also makes malicious program to be rejected in 0.4 seconds vs 6.5 Hence apply this limit to unprivileged programs only. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: improve verifier branch analysisAlexei Starovoitov
pathological bpf programs may try to force verifier to explode in the number of branch states: 20: (d5) if r1 s<= 0x24000028 goto pc+0 21: (b5) if r0 <= 0xe1fa20 goto pc+2 22: (d5) if r1 s<= 0x7e goto pc+0 23: (b5) if r0 <= 0xe880e000 goto pc+0 24: (c5) if r0 s< 0x2100ecf4 goto pc+0 25: (d5) if r1 s<= 0xe880e000 goto pc+1 26: (c5) if r0 s< 0xf4041810 goto pc+0 27: (d5) if r1 s<= 0x1e007e goto pc+0 28: (b5) if r0 <= 0xe86be000 goto pc+0 29: (07) r0 += 16614 30: (c5) if r0 s< 0x6d0020da goto pc+0 31: (35) if r0 >= 0x2100ecf4 goto pc+0 Teach verifier to recognize always taken and always not taken branches. This analysis is already done for == and != comparison. Expand it to all other branches. It also helps real bpf programs to be verified faster: before after bpf_lb-DLB_L3.o 2003 1940 bpf_lb-DLB_L4.o 3173 3089 bpf_lb-DUNKNOWN.o 1080 1065 bpf_lxc-DDROP_ALL.o 29584 28052 bpf_lxc-DUNKNOWN.o 36916 35487 bpf_netdev.o 11188 10864 bpf_overlay.o 6679 6643 bpf_lcx_jit.o 39555 38437 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04bpf: check pending signals while verifying programsAlexei Starovoitov
Malicious user space may try to force the verifier to use as much cpu time and memory as possible. Hence check for pending signals while verifying the program. Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN, since the kernel has to release the resources used for program verification. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03audit: shorten PATH cap values when zeroRichard Guy Briggs
Since the vast majority of files (99.993% on a typical system) have no fcaps, display "0" instead of the full zero-padded 16 hex digits in the two PATH record cap_f* fields to save netlink bandwidth and disk space. Simply changing the format to %x won't work since the value is two (or possibly more in the future) 32-bit hexadecimal values concatenated and bits in higher order values will be misrepresented. Passes audit-testsuite and userspace tools already work fine. Please see the github issue tracker for more details https://github.com/linux-audit/audit-kernel/issues/101 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-12-03cpuset: Remove set but not used variable 'cs'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: kernel/cgroup/cpuset.c: In function 'cpuset_cancel_attach': kernel/cgroup/cpuset.c:2167:17: warning: variable 'cs' set but not used [-Wunused-but-set-variable] It never used since introduction in commit 1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from subtree_control enabling") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>