summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2017-10-20bpf: require CAP_NET_ADMIN when using sockmap mapsJohn Fastabend
Restrict sockmap to CAP_NET_ADMIN. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb regionJohn Fastabend
SK_SKB BPF programs are run from the socket/tcp context but early in the stack before much of the TCP metadata is needed in tcp_skb_cb. So we can use some unused fields to place BPF metadata needed for SK_SKB programs when implementing the redirect function. This allows us to drop the preempt disable logic. It does however require an API change so sk_redirect_map() has been updated to additionally provide ctx_ptr to skb. Note, we do however continue to disable/enable preemption around actual BPF program running to account for map updates. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20bpf: enforce TCP only support for sockmapJohn Fastabend
Only TCP sockets have been tested and at the moment the state change callback only handles TCP sockets. This adds a check to ensure that sockets actually being added are TCP sockets. For net-next we can consider UDP support. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20kprobes: Disable the jprobes test codeMasami Hiramatsu
Disable jprobes test code because jprobes are deprecated. This code will be completely removed when the jprobe code is removed. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Ian McDonald <ian.mcdonald@jandi.co.nz> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlad Yasevich <vyasevich@gmail.com> Link: http://lkml.kernel.org/r/150724531730.5014.6377596890962355763.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20kprobes: Disable the jprobes APIsMasami Hiramatsu
Disable the jprobes APIs and comment out the jprobes API function code. This is in preparation of removing all jprobes related code (including kprobe's break_handler). Nowadays ftrace and other tracing features are mature enough to replace jprobes use-cases. Users can safely use ftrace and perf probe etc. for their use cases. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Ian McDonald <ian.mcdonald@jandi.co.nz> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlad Yasevich <vyasevich@gmail.com> Link: http://lkml.kernel.org/r/150724527741.5014.15465541485637899227.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=yMasami Hiramatsu
We want to wait for all potentially preempted kprobes trampoline execution to have completed. This guarantees that any freed trampoline memory is not in use by any task in the system anymore. synchronize_rcu_tasks() gives such a guarantee, so use it. Also, this guarantees to wait for all potentially preempted tasks on the instructions which will be replaced with a jump. Since this becomes a problem only when CONFIG_PREEMPT=y, enable CONFIG_TASKS_RCU=y for synchronize_rcu_tasks() in that case. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150845661962.5443.17724352636247312231.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-19doc: Fix various RCU docbook comment-header problemsPaul E. McKenney
Because many of RCU's files have not been included into docbook, a number of errors have accumulated. This commit fixes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19membarrier: Provide register expedited private commandMathieu Desnoyers
This introduces a "register private expedited" membarrier command which allows eventual removal of important memory barrier constraints on the scheduler fast-paths. It changes how the "private expedited" membarrier command (new to 4.14) is used from user-space. This new command allows processes to register their intent to use the private expedited command. This affects how the expedited private command introduced in 4.14-rc is meant to be used, and should be merged before 4.14 final. Processes are now required to register before using MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. This fixes a problem that arose when designing requested extensions to sys_membarrier() to allow JITs to efficiently flush old code from instruction caches. Several potential algorithms are much less painful if the user register intent to use this functionality early on, for example, before the process spawns the second thread. Registering at this time removes the need to interrupt each and every thread in that process at the first expedited sys_membarrier() system call. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19rcu/segcblist: Include rcupdate.hSebastian Andrzej Siewior
The RT build on ARM complains about non-existing ULONG_CMP_LT. This commit therefore includes rcupdate.h into rcu_segcblist.c. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19rcu: Add extended-quiescent-state testing advicePaul E. McKenney
If you add or remove calls to rcu_idle_enter(), rcu_user_enter(), rcu_irq_exit(), rcu_irq_exit_irqson(), rcu_idle_exit(), rcu_user_exit(), rcu_irq_enter(), rcu_irq_enter_irqson(), rcu_nmi_enter(), or rcu_nmi_exit(), you should run a full set of tests on a kernel built with CONFIG_RCU_EQS_DEBUG=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19rcu: Suppress lockdep false-positive ->boost_mtx complaintsPaul E. McKenney
RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an rt_mutex structure in locked state held by some other task. When that other task releases it, lockdep complains (quite accurately, but a bit uselessly) that the other task never acquired it. This complaint can suppress other, more helpful, lockdep complaints, and in any case it is a false positive. This commit therefore switches from rt_mutex_unlock() to rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations. Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(), addtional adjustments will be required. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19rcu: Do not include rtmutex_common.h unconditionallySebastian Andrzej Siewior
This commit adjusts include files and provides definitions in preparation for suppressing lockdep false-positive ->boost_mtx complaints. Without this preparation, architectures not supporting rt_mutex will get build failures. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19clockevents: Retry programming min delta up to 10 timesJames Hogan
When CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n, the call path hrtimer_reprogram -> clockevents_program_event -> clockevents_program_min_delta will not retry if the clock event driver returns -ETIME. If the driver could not satisfy the program_min_delta for any reason, the lack of a retry means the CPU may not receive a tick interrupt, potentially until the counter does a full period. This leads to rcu_sched timeout messages as the stalled CPU is detected by other CPUs, and other issues if the CPU is holding locks or other resources at the point at which it stalls. There have been a couple of observed mechanisms through which a clock event driver could not satisfy the requested min_delta and return -ETIME. With the MIPS GIC driver, the shared execution resource within MT cores means inconventient latency due to execution of instructions from other hardware threads in the core, within gic_next_event, can result in an event being set in the past. Additionally under virtualisation it is possible to get unexpected latency during a clockevent device's set_next_event() callback which can make it return -ETIME even for a delta based on min_delta_ns. It isn't appropriate to use MIN_ADJUST in the virtualisation case as occasional hypervisor induced high latency will cause min_delta_ns to quickly increase to the maximum. Instead, borrow the retry pattern from the MIN_ADJUST case, but without making adjustments. Retry up to 10 times, each time increasing the attempted delta by min_delta, before giving up. [ Matt: Reworked the loop and made retry increase the delta. ] Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: "Martin Schwidefsky" <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@mips.com> Link: https://lkml.kernel.org/r/1508422643-6075-1-git-send-email-matt.redfearn@mips.com
2017-10-19bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocationsDaniel Borkmann
PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu allocator. Given we support __GFP_NOWARN now, lets just let the allocation request fail naturally instead. The two call sites from BPF mistakenly assumed __GFP_NOWARN would work, so no changes needed to their actual __alloc_percpu_gfp() calls which use the flag already. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19bpf: fix splat for illegal devmap percpu allocationDaniel Borkmann
It was reported that syzkaller was able to trigger a splat on devmap percpu allocation due to illegal/unsupported allocation request size passed to __alloc_percpu(): [ 70.094249] illegal size (32776) or align (8) for percpu allocation [ 70.094256] ------------[ cut here ]------------ [ 70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630 [...] [ 70.094325] Call Trace: [ 70.094328] __alloc_percpu_gfp+0x12/0x20 [ 70.094330] dev_map_alloc+0x134/0x1e0 [ 70.094331] SyS_bpf+0x9bc/0x1610 [ 70.094333] ? selinux_task_setrlimit+0x5a/0x60 [ 70.094334] ? security_task_setrlimit+0x43/0x60 [ 70.094336] entry_SYSCALL_64_fastpath+0x1a/0xa5 This was due to too large max_entries for the map such that we surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to fail naturally here, so switch to __alloc_percpu_gfp() and pass __GFP_NOWARN instead. Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map") Reported-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Shankara Pailoor <sp3485@columbia.edu> Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19kernel/module: Delete an error message for a failed memory allocation in ↵Markus Elfring
add_module_usage() Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2017-10-19irqdomain: Add __rcu annotations to radix tree slotMasahiro Yamada
Fix different address spaces warning of sparse. kernel/irq/irqdomain.c:1463:14: warning: incorrect type in assignment (different address spaces) kernel/irq/irqdomain.c:1463:14: expected void **slot kernel/irq/irqdomain.c:1463:14: got void [noderef] <asn:4>** kernel/irq/irqdomain.c:1465:66: warning: incorrect type in argument 2 (different address spaces) kernel/irq/irqdomain.c:1465:66: expected void [noderef] <asn:4>**slot kernel/irq/irqdomain.c:1465:66: got void **slot Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-10-19irqdomain: Move revmap_trees_mutex to struct irq_domainMasahiro Yamada
The revmap_trees_mutex protects domain->revmap_tree. There is no need to make it global because it is allowed to modify revmap_tree of two different domains concurrently. Having said that, this would not be a actual bottleneck because the interrupt map/unmap does not occur quite often. Rather, the motivation is to tidy up the code from a data structure point of view. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-10-19livepatch: add transition noticesJoe Lawrence
Log a few kernel debug messages at the beginning of the following livepatch transition functions: klp_complete_transition() klp_cancel_transition() klp_init_transition() klp_reverse_transition() Also update the log notice message in klp_start_transition() for similar verbiage as the above messages. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-10-19livepatch: move transition "complete" notice into klp_complete_transition()Joe Lawrence
klp_complete_transition() performs a bit of housework before a transition to KLP_PATCHED or KLP_UNPATCHED is actually completed (including post-(un)patch callbacks). To be consistent, move the transition "complete" kernel log notice out of klp_try_complete_transition() and into klp_complete_transition(). Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-10-19livepatch: add (un)patch callbacksJoe Lawrence
Provide livepatch modules a klp_object (un)patching notification mechanism. Pre and post-(un)patch callbacks allow livepatch modules to setup or synchronize changes that would be difficult to support in only patched-or-unpatched code contexts. Callbacks can be registered for target module or vmlinux klp_objects, but each implementation is klp_object specific. - Pre-(un)patch callbacks run before any (un)patching transition starts. - Post-(un)patch callbacks run once an object has been (un)patched and the klp_patch fully transitioned to its target state. Example use cases include modification of global data and registration of newly available services/handlers. See Documentation/livepatch/callbacks.txt for details and samples/livepatch/ for examples. Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-10-19locking/static_keys: Improve uninitialized key warningBorislav Petkov
Right now it says: static_key_disable_cpuslocked used before call to jump_label_init ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:161 static_key_disable_cpuslocked+0x68/0x70 Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-rc5+ #1 Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016 task: ffffffff81c0e480 task.stack: ffffffff81c00000 RIP: 0010:static_key_disable_cpuslocked+0x68/0x70 RSP: 0000:ffffffff81c03ef0 EFLAGS: 00010096 ORIG_RAX: 0000000000000000 RAX: 0000000000000041 RBX: ffffffff81c32680 RCX: ffffffff81c5cbf8 RDX: 0000000000000001 RSI: 0000000000000092 RDI: 0000000000000002 RBP: ffff88807fffd240 R08: 726f666562206465 R09: 0000000000000136 R10: 0000000000000000 R11: 696e695f6c656261 R12: ffffffff82158900 R13: ffffffff8215f760 R14: 0000000000000001 R15: 0000000000000008 FS: 0000000000000000(0000) GS:ffff883f7f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88807ffff000 CR3: 0000000001c09000 CR4: 00000000000606b0 Call Trace: static_key_disable+0x16/0x20 start_kernel+0x15a/0x45d ? load_ucode_intel_bsp+0x11/0x2d secondary_startup_64+0xa5/0xb0 Code: 48 c7 c7 a0 15 cf 81 e9 47 53 4b 00 48 89 df e8 5f fc ff ff eb e8 48 c7 c6 \ c0 97 83 81 48 c7 c7 d0 ff a2 81 31 c0 e8 c5 9d f5 ff <0f> ff eb a7 0f ff eb \ b0 e8 eb a2 4b 00 53 48 89 fb e8 42 0e f0 but it doesn't tell me which key it is. So dump the key's name too: static_key_disable_cpuslocked(): static key 'virt_spin_lock_key' used before call to jump_label_init() And that makes pinpointing which key is causing that a lot easier. include/linux/jump_label.h | 14 +++++++------- include/linux/jump_label_ratelimit.h | 6 +++--- kernel/jump_label.c | 14 +++++++------- 3 files changed, 17 insertions(+), 17 deletions(-) Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171018152428.ffjgak4o25f7ept6@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18timer: Convert stub timer to timer_setup()Thomas Gleixner
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org>
2017-10-18workqueue: Convert timers to use timer_setup() (part 2)Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. (The prior workqueue patch missed a few timers.) Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Link: https://lkml.kernel.org/r/20171016225825.GA99101@beast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-10-18genirq: Add config option for reservation modeThomas Gleixner
The interrupt reservation mode requires reactivation of PCI/MSI interrupts. Create a config option, so the PCI code can set the corresponding flag when required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poulson <jopoulso@microsoft.com> Cc: Mihai Costache <v-micos@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-pci@vger.kernel.org Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Simon Xiao <sixiao@microsoft.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: devel@linuxdriverproject.org Cc: KY Srinivasan <kys@microsoft.com> Link: https://lkml.kernel.org/r/20171017075600.369375409@linutronix.de
2017-10-18timers: Avoid an unnecessary iteration in __run_timers()Zhenzhong Duan
If the base clock is behind jiffies in the soft irq expiry code then the next timer is retrieved by get_next_timer_interrupt() to avoid incrementing base clock one by one. If the next timer interrupt is past current jiffies then the base clock is set to jiffies - 1. At the call site this is incremented and another iteration through the expiry loop is executed which checks empty hash buckets. That's a pointless excercise because it's already known that the next timer is past jiffies. Set the base clock in that case to jiffies directly so it gets incremented to jiffies + 1 at the call site resulting in immediate termination of the expiry loop. [ tglx: Massaged changelog and added comment to the code ] Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Joe Jin <joe.jin@oracle.com> Cc: sboyd@codeaurora.org Cc: Srinivas Reddy Eeda <srinivas.eeda@oracle.com> Cc: john.stultz@linaro.org Link: https://lkml.kernel.org/r/7086a857-f90c-4616-bbe8-f7696f21626c@default
2017-10-18Revert "kprobes: Warn if optprobe handler tries to change execution path"Naveen N. Rao
This reverts commit: e863d539614641 ("kprobes: Warn if optprobe handler tries to change execution path") On PowerPC, we place a probe at kretprobe_trampoline to catch function returns and with CONFIG_OPTPROBES=y, this probe gets optimized. This works for us due to the way we handle the optprobe as described in commit: 762df10bad6954 ("powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()") With the above commit, we end up with a warning. As such, revert this change. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171017081834.3629-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18bpf: move knowledge about post-translation offsets out of verifierJakub Kicinski
Use the fact that verifier ops are now separate from program ops to define a separate set of callbacks for verification of already translated programs. Since we expect the analyzer ops to be defined only for a small subset of all program types initialize their array by hand (don't use linux/bpf_types.h). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: remove the verifier ops from program structureJakub Kicinski
Since the verifier ops don't have to be associated with the program for its entire lifetime we can move it to verifier's struct bpf_verifier_env. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: split verifier and program opsJakub Kicinski
struct bpf_verifier_ops contains both verifier ops and operations used later during program's lifetime (test_run). Split the runtime ops into a different structure. BPF_PROG_TYPE() will now append ## _prog_ops or ## _verifier_ops to the names. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: disallow arithmetic operations on context pointerJakub Kicinski
Commit f1174f77b50c ("bpf/verifier: rework value tracking") removed the crafty selection of which pointer types are allowed to be modified. This is OK for most pointer types since adjust_ptr_min_max_vals() will catch operations on immutable pointers. One exception is PTR_TO_CTX which is now allowed to be offseted freely. The intent of aforementioned commit was to allow context access via modified registers. The offset passed to ->is_valid_access() verifier callback has been adjusted by the value of the variable offset. What is missing, however, is taking the variable offset into account when the context register is used. Or in terms of the code adding the offset to the value passed to the ->convert_ctx_access() callback. This leads to the following eBPF user code: r1 += 68 r0 = *(u32 *)(r1 + 8) exit being translated to this in kernel space: 0: (07) r1 += 68 1: (61) r0 = *(u32 *)(r1 +180) 2: (95) exit Offset 8 is corresponding to 180 in the kernel, but offset 76 is valid too. Verifier will "accept" access to offset 68+8=76 but then "convert" access to offset 8 as 180. Effective access to offset 248 is beyond the kernel context. (This is a __sk_buff example on a debug-heavy kernel - packet mark is 8 -> 180, 76 would be data.) Dereferencing the modified context pointer is not as easy as dereferencing other types, because we have to translate the access to reading a field in kernel structures which is usually at a different offset and often of a different size. To allow modifying the pointer we would have to make sure that given eBPF instruction will always access the same field or the fields accessed are "compatible" in terms of offset and size... Disallow dereferencing modified context pointers and add to selftests the test case described here. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18perf/core: Export AUX buffer helpers to modulesWill Deacon
Perf PMU drivers using AUX buffers cannot be built as modules unless the AUX helpers are exported. This patch exports perf_aux_output_{begin,end,skip} and perf_get_aux to modules. Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-18genirq: export irq_get_percpu_devid_partition to modulesWill Deacon
Any modular driver using cluster-affine PPIs needs to be able to call irq_get_percpu_devid_partition so that it can enable the IRQ on the correct subset of CPUs. This patch exports the symbol so that it can be called from within a module. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-10-18bpf: cpumap add tracepointsJesper Dangaard Brouer
This adds two tracepoint to the cpumap. One for the enqueue side trace_xdp_cpumap_enqueue() and one for the kthread dequeue side trace_xdp_cpumap_kthread(). To mitigate the tracepoint overhead, these are invoked during the enqueue/dequeue bulking phases, thus amortizing the cost. The obvious use-cases are for debugging and monitoring. The non-intuitive use-case is using these as a feedback loop to know the system load. One can imagine auto-scaling by reducing, adding or activating more worker CPUs on demand. V4: tracepoint remove time_limit info, instead add sched info V8: intro struct bpf_cpu_map_entry members cpu+map_id in this patch Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: cpumap xdp_buff to skb conversion and allocationJesper Dangaard Brouer
This patch makes cpumap functional, by adding SKB allocation and invoking the network stack on the dequeuing CPU. For constructing the SKB on the remote CPU, the xdp_buff in converted into a struct xdp_pkt, and it mapped into the top headroom of the packet, to avoid allocating separate mem. For now, struct xdp_pkt is just a cpumap internal data structure, with info carried between enqueue to dequeue. If a driver doesn't have enough headroom it is simply dropped, with return code -EOVERFLOW. This will be picked up the xdp tracepoint infrastructure, to allow users to catch this. V2: take into account xdp->data_meta V4: - Drop busypoll tricks, keeping it more simple. - Skip RPS and Generic-XDP-recursive-reinjection, suggested by Alexei V5: correct RCU read protection around __netif_receive_skb_core. V6: Setting TASK_RUNNING vs TASK_INTERRUPTIBLE based on talk with Rik van Riel Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: XDP_REDIRECT enable use of cpumapJesper Dangaard Brouer
This patch connects cpumap to the xdp_do_redirect_map infrastructure. Still no SKB allocation are done yet. The XDP frames are transferred to the other CPU, but they are simply refcnt decremented on the remote CPU. This served as a good benchmark for measuring the overhead of remote refcnt decrement. If driver page recycle cache is not efficient then this, exposes a bottleneck in the page allocator. A shout-out to MST's ptr_ring, which is the secret behind is being so efficient to transfer memory pointers between CPUs, without constantly bouncing cache-lines between CPUs. V3: Handle !CONFIG_BPF_SYSCALL pointed out by kbuild test robot. V4: Make Generic-XDP aware of cpumap type, but don't allow redirect yet, as implementation require a separate upstream discussion. V5: - Fix a maybe-uninitialized pointed out by kbuild test robot. - Restrict bpf-prog side access to cpumap, open when use-cases appear - Implement cpu_map_enqueue() as a more simple void pointer enqueue V6: - Allow cpumap type for usage in helper bpf_redirect_map, general bpf-prog side restriction moved to earlier patch. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAPJesper Dangaard Brouer
The 'cpumap' is primarily used as a backend map for XDP BPF helper call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'. This patch implement the main part of the map. It is not connected to the XDP redirect system yet, and no SKB allocation are done yet. The main concern in this patch is to ensure the datapath can run without any locking. This adds complexity to the setup and tear-down procedure, which assumptions are extra carefully documented in the code comments. V2: - make sure array isn't larger than NR_CPUS - make sure CPUs added is a valid possible CPU V3: fix nitpicks from Jakub Kicinski <kubakici@wp.pl> V5: - Restrict map allocation to root / CAP_SYS_ADMIN - WARN_ON_ONCE if queue is not empty on tear-down - Return -EPERM on memlock limit instead of -ENOMEM - Error code in __cpu_map_entry_alloc() also handle ptr_ring_cleanup() - Moved cpu_map_enqueue() to next patch V6: all notice by Daniel Borkmann - Fix err return code in cpu_map_alloc() introduced in V5 - Move cpu_possible() check after max_entries boundary check - Forbid usage initially in check_map_func_compatibility() V7: - Fix alloc error path spotted by Daniel Borkmann - Did stress test adding+removing CPUs from the map concurrently - Fixed refcnt issue on cpu_map_entry, kthread started too soon - Make sure packets are flushed during tear-down, involved use of rcu_barrier() and kthread_run only exit after queue is empty - Fix alloc error path in __cpu_map_entry_alloc() for ptr_ring V8: - Nitpicking comments and gramma by Edward Cree - Fix missing semi-colon introduced in V7 due to rebasing - Move struct bpf_cpu_map_entry members cpu+map_id to tracepoint patch Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-17time: Use do_settimeofday64() internallyArnd Bergmann
do_settimeofday() is a wrapper around do_settimeofday64(), so that function can be called directly. The wrapper can be removed once the last user is gone. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: y2038@lists.linaro.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Link: https://lkml.kernel.org/r/20171013183452.3635956-1-arnd@arndb.de
2017-10-17posix-stubs: Use get_timespec64() and put_timespec64()Arnd Bergmann
This is a follow-up to commit 5c4994102fb5 ("posix-timers: Use get_timespec64() and put_timespec64()"), which left two system call using copy_from_user()/copy_to_user(). Change them as well for consistency. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: y2038@lists.linaro.org Cc: John Stultz <john.stultz@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Link: https://lkml.kernel.org/r/20171013183009.3442318-1-arnd@arndb.de
2017-10-16ftrace: Kill FTRACE_OPS_FL_PER_CPUPeter Zijlstra
The one and only user of FTRACE_OPS_FL_PER_CPU is gone, remove the lot. Link: http://lkml.kernel.org/r/20171011080224.372422809@infradead.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-16perf/ftrace: Small cleanupPeter Zijlstra
ops->flags _should_ be 0 at this point, so setting the flag using bitwise or is a bit daft. Link: http://lkml.kernel.org/r/20171011080224.315585202@infradead.org Requested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-16perf/ftrace: Fix function trace eventsPeter Zijlstra
The function-trace <-> perf interface is a tad messed up. Where all the other trace <-> perf interfaces use a single trace hook registration and use per-cpu RCU based hlist to iterate the events, function-trace actually needs multiple hook registrations in order to minimize function entry patching when filters are present. The end result is that we iterate events both on the trace hook and on the hlist, which results in reporting events multiple times. Since function-trace cannot use the regular scheme, fix it the other way around, use singleton hlists. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-16perf/ftrace: Revert ("perf/ftrace: Fix double traces of perf on ↵Peter Zijlstra
ftrace:function") Revert commit: 75e8387685f6 ("perf/ftrace: Fix double traces of perf on ftrace:function") The reason I instantly stumbled on that patch is that it only addresses the ftrace situation and doesn't mention the other _5_ places that use this interface. It doesn't explain why those don't have the problem and if not, why their solution doesn't work for ftrace. It doesn't, but this is just putting more duct tape on. Link: http://lkml.kernel.org/r/20171011080224.200565770@infradead.org Cc: Zhou Chengming <zhouchengming1@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-16tracing: bpf: Hide bpf trace events when they are not usedSteven Rostedt (VMware)
All the trace events defined in include/trace/events/bpf.h are only used when CONFIG_BPF_SYSCALL is defined. But this file gets included by include/linux/bpf_trace.h which is included by the networking code with CREATE_TRACE_POINTS defined. If a trace event is created but not used it still has data structures and functions created for its use, even though nothing is using them. To not waste space, do not define the BPF trace events in bpf.h unless CONFIG_BPF_SYSCALL is defined. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16Merge tag 'irqchip-4.14-3' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip updates for 4.14-rc5 from Marc Zyngier: - Fix unfortunate mistake in the GICv3 ITS binding example - Two fixes for the recently merged GICv4 support - GICv3 ITS 52bit PA fixes - Generic irqchip mask-ack fix, and its application to the tango irqchip
2017-10-14Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three fixes that address an SMP balancing performance regression" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Ensure load_balance() respects the active_mask sched/core: Address more wake_affine() regressions sched/core: Fix wake_affine() performance regression
2017-10-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Some tooling fixes plus three kernel fixes: a memory leak fix, a statistics fix and a crash fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix memory leaks on allocation failures perf/core: Fix cgroup time when scheduling descendants perf/core: Avoid freeing static PMU contexts when PMU is unregistered tools include uapi bpf.h: Sync kernel ABI header with tooling header perf pmu: Unbreak perf record for arm/arm64 with events with explicit PMU perf script: Add missing separator for "-F ip,brstack" (and brstackoff) perf callchain: Compare dsos (as well) for CCKEY_FUNCTION
2017-10-14Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two lockdep fixes for bugs introduced by the cross-release dependency tracking feature - plus a commit that disables it because performance regressed in an absymal fashion on some systems" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Disable cross-release features for now locking/selftest: Avoid false BUG report locking/lockdep: Fix stacktrace mess
2017-10-14Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "A CPU hotplug related fix, plus two related sanity checks" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/cpuhotplug: Enforce affinity setting on startup of managed irqs genirq/cpuhotplug: Add sanity check for effective affinity mask genirq: Warn when effective affinity is not updated