summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-07-15selftests/bpf: Add a test case for unix sockmapCong Wang
Add a test case to ensure redirection between two AF_UNIX datagram sockets work. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-11-xiyou.wangcong@gmail.com
2021-07-15selftests/bpf: Factor out add_to_sockmap()Cong Wang
Factor out a common helper add_to_sockmap() which adds two sockets into a sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-10-xiyou.wangcong@gmail.com
2021-07-15selftests/bpf: Factor out udp_socketpair()Cong Wang
Factor out a common helper udp_socketpair() which creates a pair of connected UDP sockets. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-9-xiyou.wangcong@gmail.com
2021-07-15af_unix: Implement unix_dgram_bpf_recvmsg()Cong Wang
We have to implement unix_dgram_bpf_recvmsg() to replace the original ->recvmsg() to retrieve skmsg from ingress_msg. AF_UNIX is again special here because the lack of sk_prot->recvmsg(). I simply add a special case inside unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-8-xiyou.wangcong@gmail.com
2021-07-15af_unix: Implement ->psock_update_sk_prot()Cong Wang
Now we can implement unix_bpf_update_proto() to update sk_prot, especially prot->close(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-7-xiyou.wangcong@gmail.com
2021-07-15af_unix: Add a dummy ->close() for sockmapCong Wang
Unlike af_inet, unix_proto is very different, it does not even have a ->close(). We have to add a dummy implementation to satisfy sockmap. Normally it is just a nop, it is introduced only for sockmap to replace it. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-6-xiyou.wangcong@gmail.com
2021-07-15af_unix: Set TCP_ESTABLISHED for datagram sockets tooCong Wang
Currently only unix stream socket sets TCP_ESTABLISHED, datagram socket can set this too when they connect to its peer socket. At least __ip4_datagram_connect() does the same. This will be used to determine whether an AF_UNIX datagram socket can be redirected to in sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-5-xiyou.wangcong@gmail.com
2021-07-15af_unix: Implement ->read_sock() for sockmapCong Wang
Implement ->read_sock() for AF_UNIX datagram socket, it is pretty much similar to udp_read_sock(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-4-xiyou.wangcong@gmail.com
2021-07-15sock_map: Lift socket state restriction for datagram socketsCong Wang
TCP and other connection oriented sockets have accept() for each incoming connection on the server side, hence they can just insert those fd's from accept() to sockmap, which are of course established. Now with datagram sockets begin to support sockmap and redirection, the restriction is no longer applicable to them, as they have no accept(). So we have to lift this restriction for them. This is fine, because inside bpf_sk_redirect_map() we still have another socket status check, sock_map_redirect_allowed(), as a guard. This also means they do not have to be removed from sockmap when disconnecting. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com
2021-07-15sock_map: Relax config dependency to CONFIG_NETCong Wang
Currently sock_map still has Kconfig dependency on CONFIG_INET, but there is no actual functional dependency on it after we introduce ->psock_update_sk_prot(). We have to extend it to CONFIG_NET now as we are going to support AF_UNIX. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-2-xiyou.wangcong@gmail.com
2021-07-15Revert "Makefile: Enable -Wimplicit-fallthrough for Clang"Linus Torvalds
This reverts commit b7eb335e26a9c7f258c96b3962c283c379d3ede0. It turns out that the problem with the clang -Wimplicit-fallthrough warning is not about the kernel source code, but about clang itself, and that the warning is unusable until clang fixes its broken ways. In particular, when you enable this warning for clang, you not only get warnings about implicit fallthroughs. You also get this: warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough] which is completely broken becasue it (a) doesn't even tell you where the problem is (seriously: no line numbers, no filename, no nothing). (b) is fundamentally broken anyway, because there are perfectly valid reasons to have a fallthrough statement even if it turns out that it can perhaps not be reached. In the kernel, an example of that second case is code in the scheduler: switch (state) { case cpuset: if (IS_ENABLED(CONFIG_CPUSETS)) { cpuset_cpus_allowed_fallback(p); state = possible; break; } fallthrough; case possible: where if CONFIG_CPUSETS is enabled you actually never hit the fallthrough case at all. But that in no way makes the fallthrough wrong. So the warning is completely broken, and enabling it for clang is a very bad idea. In the meantime, we can keep the gcc option enabled, and make the gcc build use -Wimplicit-fallthrough=5 which means that we will at least continue to require a proper fallthrough statement, and that gcc won't silently accept the magic comment versions. Because gcc does this all correctly, and while the odd "=5" part is kind of obscure, it's documented in [1]: "-Wimplicit-fallthrough=5 doesn’t recognize any comments as fallthrough comments, only attributes disable the warning" so if clang ever fixes its bad behavior we can try enabling it there again. Link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html [1] Cc: Kees Cook <keescook@chromium.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-15Merge branch 'Add bpf_get_func_ip helper'Alexei Starovoitov
Jiri Olsa says: ==================== Add bpf_get_func_ip helper that returns IP address of the caller function for trampoline and krobe programs. There're 2 specific implementation of the bpf_get_func_ip helper, one for trampoline progs and one for kprobe/kretprobe progs. The trampoline helper call is replaced/inlined by the verifier with simple move instruction. The kprobe/kretprobe is actual helper call that returns prepared caller address. Also available at: https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git bpf/get_func_ip v4 changes: - dropped jit/x86 check for get_func_ip tracing check [Alexei] - added code to bpf_get_func_ip_tracing [Alexei] and tested that it works without inlining [Alexei] - changed has_get_func_ip to check_get_func_ip [Andrii] - replaced test assert loop with explicit asserts [Andrii] - adde bpf_program__attach_kprobe_opts function and use it for offset setup [Andrii] - used bpf_program__set_autoload(false) for test6 [Andrii] - added Masami's ack v3 changes: - resend with Masami in cc and v3 in each patch subject v2 changes: - use kprobe_running to get kprobe instead of cpu var [Masami] - added support to add kprobe on function+offset and test for that [Alan] ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-07-15selftests/bpf: Add test for bpf_get_func_ip in kprobe+offset probeJiri Olsa
Adding test for bpf_get_func_ip in kprobe+ofset probe. Because of the offset value it's arch specific, enabling the new test only for x86_64 architecture. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-9-jolsa@kernel.org
2021-07-15libbpf: Allow specification of "kprobe/function+offset"Alan Maguire
kprobes can be placed on most instructions in a function, not just entry, and ftrace and bpftrace support the function+offset notification for probe placement. Adding parsing of func_name into func+offset to bpf_program__attach_kprobe() allows the user to specify SEC("kprobe/bpf_fentry_test5+0x6") ...for example, and the offset can be passed to perf_event_open_probe() to support kprobe attachment. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-8-jolsa@kernel.org
2021-07-15libbpf: Add bpf_program__attach_kprobe_opts functionJiri Olsa
Adding bpf_program__attach_kprobe_opts that does the same as bpf_program__attach_kprobe, but takes opts argument. Currently opts struct holds just retprobe bool, but we will add new field in following patch. The function is not exported, so there's no need to add size to the struct bpf_program_attach_kprobe_opts for now. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-7-jolsa@kernel.org
2021-07-15selftests/bpf: Add test for bpf_get_func_ip helperJiri Olsa
Adding test for bpf_get_func_ip helper for fentry, fexit, kprobe, kretprobe and fmod_ret programs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-6-jolsa@kernel.org
2021-07-15bpf: Add bpf_get_func_ip helper for kprobe programsJiri Olsa
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_KPROBE programs, so it's now possible to call bpf_get_func_ip from both kprobe and kretprobe programs. Taking the caller's address from 'struct kprobe::addr', which is defined for both kprobe and kretprobe. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-5-jolsa@kernel.org
2021-07-15bpf: Add bpf_get_func_ip helper for tracing programsJiri Olsa
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_TRACING programs, specifically for all trampoline attach types. The trampoline's caller IP address is stored in (ctx - 8) address. so there's no reason to actually call the helper, but rather fixup the call instruction and return [ctx - 8] value directly. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-4-jolsa@kernel.org
2021-07-16Merge tag 'drm-intel-fixes-2021-07-15' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Two regression fixes targeting stable: - Fix -EDEADLK handling regression (Ville) - Drop the page table optimisation (Matt) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YPA8y1DSCp2EbtpC@intel.com
2021-07-15Merge tag 'configfs-5.13-1' of git://git.infradead.org/users/hch/configfsLinus Torvalds
Pull configfs fix from Christoph Hellwig: - fix the read and write iterators (Bart Van Assche) * tag 'configfs-5.13-1' of git://git.infradead.org/users/hch/configfs: configfs: fix the read and write iterators
2021-07-16Merge tag 'drm-misc-fixes-2021-07-15' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull (less than what git shortlog provides): * fbdev: Avoid use-after-free by not deleting current video mode * ttm: Avoid NULL-ptr deref in ttm_range_man_fini() * vmwgfx: Fix a merge commit Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YO/yoFO+iSEqnIH0@linux-uq9g
2021-07-15Merge tag 'pwm/for-5.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm fixes from Thierry Reding: "A couple of fixes from Uwe that I missed for v5.14-rc1" * tag 'pwm/for-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: ep93xx: Ensure configuring period and duty_cycle isn't wrongly skipped pwm: berlin: Ensure configuring period and duty_cycle isn't wrongly skipped pwm: tiecap: Ensure configuring period and duty_cycle isn't wrongly skipped pwm: spear: Ensure configuring period and duty_cycle isn't wrongly skipped pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
2021-07-15bpf: Enable BPF_TRAMP_F_IP_ARG for trampolines with call_get_func_ipJiri Olsa
Enabling BPF_TRAMP_F_IP_ARG for trampolines that actually need it. The BPF_TRAMP_F_IP_ARG adds extra 3 instructions to trampoline code and is used only by programs with bpf_get_func_ip helper, which is added in following patch and sets call_get_func_ip bit. This patch ensures that BPF_TRAMP_F_IP_ARG flag is used only for trampolines that have programs with call_get_func_ip set. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-3-jolsa@kernel.org
2021-07-15bpf, x86: Store caller's ip in trampoline stackJiri Olsa
Storing caller's ip in trampoline's stack. Trampoline programs can reach the IP in (ctx - 8) address, so there's no change in program's arguments interface. The IP address is takes from [fp + 8], which is return address from the initial 'call fentry' call to trampoline. This IP address will be returned via bpf_get_func_ip helper helper, which is added in following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-2-jolsa@kernel.org
2021-07-15SMB3.1.1: fix mount failure to some servers when compression enabledSteve French
When sending the compression context to some servers, they rejected the SMB3.1.1 negotiate protocol because they expect the compression context to have a data length of a multiple of 8. Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-15cifs: added WARN_ON for all the count decrementsShyam Prasad N
We have a few ref counters srv_count, ses_count and tc_count which we use for ref counting. Added a WARN_ON during the decrement of each of these counters to make sure that they don't go below their minimum values. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-15cifs: fix missing null session check in mountSteve French
Although it is unlikely to be have ended up with a null session pointer calling cifs_try_adding_channels in cifs_mount. Coverity correctly notes that we are already checking for it earlier (when we return from do_dfs_failover), so at a minimum to clarify the code we should make sure we also check for it when we exit the loop so we don't end up calling cifs_try_adding_channels or mount_setup_tlink with a null ses pointer. Addresses-Coverity: 1505608 ("Derefernce after null check") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-15cifs: handle reconnect of tcon when there is no cached dfs referralPaulo Alcantara
When there is no cached DFS referral of tcon->dfs_path, then reconnect to same share. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-16Merge tag 'amd-drm-fixes-5.14-2021-07-14' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.14-2021-07-14: amdgpu: - SR-IOV fixes - RAS fixes - eDP fixes - SMU13 code unification to facilitate fixes in the future - Add new renoir DID - Yellow Carp fixes - Beige Goby fixes - Revert a bunch of TLB fixes that caused regressions - Revert an LTTPR display regression amdkfd - Fix VRAM access regression - SVM fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210714220858.5553-1-alexander.deucher@amd.com
2021-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Andrii Nakryiko says: ==================== pull-request: bpf 2021-07-15 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 5 day(s) which contain a total of 9 files changed, 37 insertions(+), 15 deletions(-). The main changes are: 1) Fix NULL pointer dereference in BPF_TEST_RUN for BPF_XDP_DEVMAP and BPF_XDP_CPUMAP programs, from Xuan Zhuo. 2) Fix use-after-free of net_device in XDP bpf_link, from Xuan Zhuo. 3) Follow-up fix to subprog poke descriptor use-after-free problem, from Daniel Borkmann and John Fastabend. 4) Fix out-of-range array access in s390 BPF JIT backend, from Colin Ian King. 5) Fix memory leak in BPF sockmap, from John Fastabend. 6) Fix for sockmap to prevent proc stats reporting bug, from John Fastabend and Jakub Sitnicki. 7) Fix NULL pointer dereference in bpftool, from Tobias Klauser. 8) AF_XDP documentation fixes, from Baruch Siach. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-15tracing: Do not reference char * as a string in histogramsSteven Rostedt (VMware)
The histogram logic was allowing events with char * pointers to be used as normal strings. But it was easy to crash the kernel with: # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger And open some files, and boom! BUG: unable to handle page fault for address: 00007f2ced0c3280 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strlen+0x0/0x20 Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246 RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280 RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98 R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074 R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280 FS: 00007f2ced0f8580(0000) GS:ffff93825a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0 Call Trace: event_hist_trigger+0x463/0x5f0 ? find_held_lock+0x32/0x90 ? sched_clock_cpu+0xe/0xd0 ? lock_release+0x155/0x440 ? kernel_init_free_pages+0x6d/0x90 ? preempt_count_sub+0x9b/0xd0 ? kernel_init_free_pages+0x6d/0x90 ? get_page_from_freelist+0x12c4/0x1680 ? __rb_reserve_next+0xe5/0x460 ? ring_buffer_lock_reserve+0x12a/0x3f0 event_triggers_call+0x52/0xe0 ftrace_syscall_enter+0x264/0x2c0 syscall_trace_enter.constprop.0+0x1ee/0x210 do_syscall_64+0x1c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Where it triggered a fault on strlen(key) where key was the filename. The reason is that filename is a char * to user space, and the histogram code just blindly dereferenced it, with obvious bad results. I originally tried to use strncpy_from_user/kernel_nofault() but found that there's other places that its dereferenced and not worth the effort. Just do not allow "char *" to act like strings. Link: https://lkml.kernel.org/r/20210715000206.025df9d2@rorschach.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Tom Zanussi <zanussi@kernel.org> Fixes: 79e577cbce4c4 ("tracing: Support string type key properly") Fixes: 5967bd5c4239 ("tracing: Let filter_assign_type() detect FILTER_PTR_STRING") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-15Merge tag 'Wimplicit-fallthrough-clang-5.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo Silva: "This fixes many fall-through warnings when building with Clang and -Wimplicit-fallthrough, and also enables -Wimplicit-fallthrough for Clang, globally. It's also important to notice that since we have adopted the use of the pseudo-keyword macro fallthrough, we also want to avoid having more /* fall through */ comments being introduced. Contrary to GCC, Clang doesn't recognize any comments as implicit fall-through markings when the -Wimplicit-fallthrough option is enabled. So, in order to avoid having more comments being introduced, we use the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang, will cause a warning in case a code comment is intended to be used as a fall-through marking. The patch for Makefile also enforces this. We had almost 4,000 of these issues for Clang in the beginning, and there might be a couple more out there when building some architectures with certain configurations. However, with the recent fixes I think we are in good shape and it is now possible to enable the warning for Clang" * tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) Makefile: Enable -Wimplicit-fallthrough for Clang powerpc/smp: Fix fall-through warning for Clang dmaengine: mpc512x: Fix fall-through warning for Clang usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang powerpc/powernv: Fix fall-through warning for Clang MIPS: Fix unreachable code issue MIPS: Fix fall-through warnings for Clang ASoC: Mediatek: MT8183: Fix fall-through warning for Clang power: supply: Fix fall-through warnings for Clang dmaengine: ti: k3-udma: Fix fall-through warning for Clang s390: Fix fall-through warnings for Clang dmaengine: ipu: Fix fall-through warning for Clang iommu/arm-smmu-v3: Fix fall-through warning for Clang mmc: jz4740: Fix fall-through warning for Clang PCI: Fix fall-through warning for Clang scsi: libsas: Fix fall-through warning for Clang video: fbdev: Fix fall-through warning for Clang math-emu: Fix fall-through warning cpufreq: Fix fall-through warning for Clang drm/msm: Fix fall-through warning in msm_gem_new_impl() ...
2021-07-15Merge branch 'bpf-timers'Daniel Borkmann
Alexei Starovoitov says: ==================== The first request to support timers in bpf was made in 2013 before sys_bpf syscall was added. That use case was periodic sampling. It was address with attaching bpf programs to perf_events. Then during XDP development the timers were requested to do garbage collection and health checks. They were worked around by implementing timers in user space and triggering progs with BPF_PROG_RUN command. The user space timers and perf_event+bpf timers are not armed by the bpf program. They're done asynchronously vs program execution. The XDP program cannot send a packet and arm the timer at the same time. The tracing prog cannot record an event and arm the timer right away. This large class of use cases remained unaddressed. The jiffy based and hrtimer based timers are essential part of the kernel development and with this patch set the hrtimer based timers will be available to bpf programs. TLDR: bpf timers is a wrapper of hrtimers with all the extra safety added to make sure bpf progs cannot crash the kernel. v6->v7: - address Andrii's comments and add his Acks. v5->v6: - address code review feedback from Martin and add his Acks. - add usercnt > 0 check to bpf_timer_init and remove timers_cancel_and_free second loop in map_free callbacks. - add cond_resched_rcu. v4->v5: - Martin noticed the following issues: . prog could be reallocated bpf_patch_insn_data(). Fixed by passing 'aux' into bpf_timer_set_callback, since 'aux' is stable during insn patching. . Added missing rcu_read_lock. . Removed redundant record_map. - Discovered few bugs with stress testing: . One cpu does htab_free_prealloced_timers->bpf_timer_cancel_and_free->hrtimer_cancel while another is trying to do something with the timer like bpf_timer_start/set_callback. Those ops try to acquire bpf_spin_lock that is already taken by bpf_timer_cancel_and_free, so both cpus spin forever. The same problem existed in bpf_timer_cancel(). One bpf prog on one cpu might call bpf_timer_cancel and wait, while another cpu is in the timer callback that tries to do bpf_timer_*() helper on the same timer. The fix is to do drop_prog_refcnt() and unlock. And only then hrtimer_cancel. Because of this had to add callback_fn != NULL check to bpf_timer_cb(). Also removed redundant bpf_prog_inc/put from bpf_timer_cb() and replaced with rcu_dereference_check similar to recent rcu_read_lock-removal from drivers. bpf_timer_cb is in softirq. . Managed to hit refcnt==0 while doing bpf_prog_put from bpf_timer_cancel_and_free(). That exposed the issue that bpf_prog_put wasn't ready to be called from irq context. Fixed similar to bpf_map_put which is irq ready. - Refactored BPF_CALL_1(bpf_spin_lock) into __bpf_spin_lock_irqsave() to make the main logic more clear, since Martin and Yonghong brought up this concern. v3->v4: 1. Split callback_fn from bpf_timer_start into bpf_timer_set_callback as suggested by Martin. That makes bpf timer api match one to one to kernel hrtimer api and provides greater flexibility. 2. Martin also discovered the following issue with uref approach: bpftool prog load xdp_timer.o /sys/fs/bpf/xdp_timer type xdp bpftool net attach xdpgeneric pinned /sys/fs/bpf/xdp_timer dev lo rm /sys/fs/bpf/xdp_timer nc -6 ::1 8888 bpftool net detach xdpgeneric dev lo The timer callback stays active in the kernel though the prog was detached and map usercnt == 0. It happened because 'bpftool prog load' pinned the prog only. The map usercnt went to zero. Subsequent attach and runs didn't affect map usercnt. The timer was able to start and bpf_prog_inc itself. When the prog was detached the prog stayed active. To address this issue added if (!atomic64_read(&(t->map->usercnt))) return -EPERM; to the first patch. Which means that timers are allowed only in the maps that are held by user space with open file descriptor or maps pinned in bpffs. 3. Discovered that timers in inner maps were broken. The inner map pointers are dynamic. Therefore changed bpf_timer_init() to accept explicit map pointer supplied by the program instead of hidden map pointer supplied by the verifier. To make sure that pointer to a timer actually belongs to that map added the verifier check in patch 3. 4. Addressed Yonghong's feedback. Improved comments and added dynamic in_nmi() check. Added Acks. v2->v3: The v2 approach attempted to bump bpf_prog refcnt when bpf_timer_start is called to make sure callback code doesn't disappear when timer is active and drop refcnt when timer cb is done. That led to a ton of race conditions between callback running and concurrent bpf_timer_init/start/cancel on another cpu, and concurrent bpf_map_update/delete_elem, and map destroy. Then v2.5 approach skipped prog refcnt altogether. Instead it remembered all timers that bpf prog armed in a link list and canceled them when prog refcnt went to zero. The race conditions disappeared, but timers in map-in-map could not be supported cleanly, since timers in inner maps have inner map's life time and don't match prog's life time. This v3 approach makes timers to be owned by maps. It allows timers in inner maps to be supported from the start. This apporach relies on "user refcnt" scheme used in prog_array that stores bpf programs for bpf_tail_call. The bpf_timer_start() increments prog refcnt, but unlike 1st approach the timer callback does decrement the refcnt. The ops->map_release_uref is responsible for cancelling the timers and dropping prog refcnt when user space reference to a map is dropped. That addressed all the races and simplified locking. Andrii presented a use case where specifying callback_fn in bpf_timer_init() is inconvenient vs specifying in bpf_timer_start(). The bpf_timer_init() typically is called outside for timer callback, while bpf_timer_start() most likely will be called from the callback. timer_cb() { ... bpf_timer_start(timer_cb); ...} looks like recursion and as infinite loop to the verifier. The verifier had to be made smarter to recognize such async callbacks. Patches 7,8,9 addressed that. Patch 1 and 2 refactoring. Patch 3 implements bpf timer helpers and locking. Patch 4 implements map side of bpf timer support. Patch 5 prevent pointer mismatch in bpf_timer_init. Patch 6 adds support for BTF in inner maps. Patch 7 teaches check_cfg() pass to understand async callbacks. Patch 8 teaches do_check() pass to understand async callbacks. Patch 9 teaches check_max_stack_depth() pass to understand async callbacks. Patches 10 and 11 are the tests. v1->v2: - Addressed great feedback from Andrii and Toke. - Fixed race between parallel bpf_timer_*() ops. - Fixed deadlock between timer callback and LRU eviction or bpf_map_delete/update. - Disallowed mmap and global timers. - Allow spin_lock and bpf_timer in an element. - Fixed memory leaks due to map destruction and LRU eviction. - A ton more tests. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-07-15perf trace: Free strings in trace__parse_events_option()Riccardo Mancini
ASan reports several memory leaks running: # perf test "88: Check open filename arg using perf trace + vfs_getname" The fourth of these leaks is related to some strings never being freed in trace__parse_events_option. This patch adds the missing frees. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/34d08535b11124106b859790549991abff5a7de8.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-15perf trace: Free syscall tp fields in evsel->privRiccardo Mancini
ASan reports several memory leaks running: # perf test "88: Check open filename arg using perf trace + vfs_getname" The third of these leaks is related to evsel->priv fields of sycalls never being deallocated. This patch adds the function evlist__free_syscall_tp_fields which iterates over all evsels in evlist, matching syscalls, and calling the missing frees. This new function is called at the end of trace__run, right before calling evlist__delete. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/46526611904ec5ff2768b59014e3afce8e0197d1.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-15perf trace: Free syscall->arg_fmtRiccardo Mancini
ASan reports several memory leaks running: # perf test "88: Check open filename arg using perf trace + vfs_getname" The second of these leaks is caused by the arg_fmt field of syscall not being deallocated. This patch adds a new function syscall__exit which is called on all syscalls.table entries in trace__exit, which will free the arg_fmt field. Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/d68f25c043d30464ac9fa79c3399e18f429bca82.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-15perf trace: Free malloc'd trace fields on exitRiccardo Mancini
ASan reports several memory leaks running: # perf test "88: Check open filename arg using perf trace + vfs_getname" The first of these leaks is related to struct trace fields never being deallocated. This patch adds the function trace__exit, which is called at the end of cmd_trace, replacing the existing deallocation, which is now moved inside the new function. This function deallocates: - ev_qualifier - ev_qualifier_ids.entries - syscalls.table - sctbl - perfconfig_events Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/de5945ed5c0cb882cbfa3268567d0bff460ff016.1626343282.git.rickyman7@gmail.com [ Removed needless initialization to zero, missing named initializers are zeroed by the compiler ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-15selftests/bpf: Add a test with bpf_timer in inner map.Alexei Starovoitov
Check that map-in-map supports bpf timers. Check that indirect "recursion" of timer callbacks works: timer_cb1() { bpf_timer_set_callback(timer_cb2); } timer_cb2() { bpf_timer_set_callback(timer_cb1); } Check that bpf_map_release htab_free_prealloced_timers bpf_timer_cancel_and_free hrtimer_cancel works while timer cb is running. "while true; do ./test_progs -t timer_mim; done" is a great stress test. It caught missing timer cancel in htab->extra_elems. timer_mim_reject.c is a negative test that checks that timer<->map mismatch is prevented. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-12-alexei.starovoitov@gmail.com
2021-07-15selftests/bpf: Add bpf_timer test.Alexei Starovoitov
Add bpf_timer test that creates timers in preallocated and non-preallocated hash, in array and in lru maps. Let array timer expire once and then re-arm it for 35 seconds. Arm lru timer into the same callback. Then arm and re-arm hash timers 10 times each. At the last invocation of prealloc hash timer cancel the array timer. Force timer free via LRU eviction and direct bpf_map_delete_elem. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-11-alexei.starovoitov@gmail.com
2021-07-15bpf: Teach stack depth check about async callbacks.Alexei Starovoitov
Teach max stack depth checking algorithm about async callbacks that don't increase bpf program stack size. Also add sanity check that bpf_tail_call didn't sneak into async cb. It's impossible, since PTR_TO_CTX is not available in async cb, hence the program cannot contain bpf_tail_call(ctx,...); Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-10-alexei.starovoitov@gmail.com
2021-07-15bpf: Implement verifier support for validation of async callbacks.Alexei Starovoitov
bpf_for_each_map_elem() and bpf_timer_set_callback() helpers are relying on PTR_TO_FUNC infra in the verifier to validate addresses to subprograms and pass them into the helpers as function callbacks. In case of bpf_for_each_map_elem() the callback is invoked synchronously and the verifier treats it as a normal subprogram call by adding another bpf_func_state and new frame in __check_func_call(). bpf_timer_set_callback() doesn't invoke the callback directly. The subprogram will be called asynchronously from bpf_timer_cb(). Teach the verifier to validate such async callbacks as special kind of jump by pushing verifier state into stack and let pop_stack() process it. Special care needs to be taken during state pruning. The call insn doing bpf_timer_set_callback has to be a prune_point. Otherwise short timer callbacks might not have prune points in front of bpf_timer_set_callback() which means is_state_visited() will be called after this call insn is processed in __check_func_call(). Which means that another async_cb state will be pushed to be walked later and the verifier will eventually hit BPF_COMPLEXITY_LIMIT_JMP_SEQ limit. Since push_async_cb() looks like another push_stack() branch the infinite loop detection will trigger false positive. To recognize this case mark such states as in_async_callback_fn. To distinguish infinite loop in async callback vs the same callback called with different arguments for different map and timer add async_entry_cnt to bpf_func_state. Enforce return zero from async callbacks. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-9-alexei.starovoitov@gmail.com
2021-07-15bpf: Relax verifier recursion check.Alexei Starovoitov
In the following bpf subprogram: static int timer_cb(void *map, void *key, void *value) { bpf_timer_set_callback(.., timer_cb); } the 'timer_cb' is a pointer to a function. ld_imm64 insn is used to carry this pointer. bpf_pseudo_func() returns true for such ld_imm64 insn. Unlike bpf_for_each_map_elem() the bpf_timer_set_callback() is asynchronous. Relax control flow check to allow such "recursion" that is seen as an infinite loop by check_cfg(). The distinction between bpf_for_each_map_elem() the bpf_timer_set_callback() is done in the follow up patch. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-8-alexei.starovoitov@gmail.com
2021-07-15bpf: Remember BTF of inner maps.Alexei Starovoitov
BTF is required for 'struct bpf_timer' to be recognized inside map value. The bpf timers are supported inside inner maps. Remember 'struct btf *' in inner_map_meta to make it available to the verifier in the sequence: struct bpf_map *inner_map = bpf_map_lookup_elem(&outer_map, ...); if (inner_map) timer = bpf_map_lookup_elem(&inner_map, ...); Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-7-alexei.starovoitov@gmail.com
2021-07-15bpf: Prevent pointer mismatch in bpf_timer_init.Alexei Starovoitov
bpf_timer_init() arguments are: 1. pointer to a timer (which is embedded in map element). 2. pointer to a map. Make sure that pointer to a timer actually belongs to that map. Use map_uid (which is unique id of inner map) to reject: inner_map1 = bpf_map_lookup_elem(outer_map, key1) inner_map2 = bpf_map_lookup_elem(outer_map, key2) if (inner_map1 && inner_map2) { timer = bpf_map_lookup_elem(inner_map1); if (timer) // mismatch would have been allowed bpf_timer_init(timer, inner_map2); } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-6-alexei.starovoitov@gmail.com
2021-07-15bpf: Add map side support for bpf timers.Alexei Starovoitov
Restrict bpf timers to array, hash (both preallocated and kmalloced), and lru map types. The per-cpu maps with timers don't make sense, since 'struct bpf_timer' is a part of map value. bpf timers in per-cpu maps would mean that the number of timers depends on number of possible cpus and timers would not be accessible from all cpus. lpm map support can be added in the future. The timers in inner maps are supported. The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free bpf_timer in a given map element. Similar to 'struct bpf_spin_lock' BTF is required and it is used to validate that map element indeed contains 'struct bpf_timer'. Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when map element data is reused in preallocated htab and lru maps. Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single map element. There could be one of each, but not more than one. Due to 'one bpf_timer in one element' restriction do not support timers in global data, since global data is a map of single element, but from bpf program side it's seen as many global variables and restriction of single global timer would be odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on maps with timers, since user space could have corrupted mmap element and crashed the kernel. The maps with timers cannot be readonly. Due to these restrictions search for bpf_timer in datasec BTF in case it was placed in the global data to report clear error. The previous patch allowed 'struct bpf_timer' as a first field in a map element only. Relax this restriction. Refactor lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free the timer when lru map deletes an element as a part of it eviction algorithm. Make sure that bpf program cannot access 'struct bpf_timer' via direct load/store. The timer operation are done through helpers only. This is similar to 'struct bpf_spin_lock'. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-5-alexei.starovoitov@gmail.com
2021-07-15bpf: Introduce bpf timers.Alexei Starovoitov
Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded in hash/array/lru maps as a regular field and helpers to operate on it: // Initialize the timer. // First 4 bits of 'flags' specify clockid. // Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed. long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, int flags); // Configure the timer to call 'callback_fn' static function. long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn); // Arm the timer to expire 'nsec' nanoseconds from the current time. long bpf_timer_start(struct bpf_timer *timer, u64 nsec, u64 flags); // Cancel the timer and wait for callback_fn to finish if it was running. long bpf_timer_cancel(struct bpf_timer *timer); Here is how BPF program might look like: struct map_elem { int counter; struct bpf_timer timer; }; struct { __uint(type, BPF_MAP_TYPE_HASH); __uint(max_entries, 1000); __type(key, int); __type(value, struct map_elem); } hmap SEC(".maps"); static int timer_cb(void *map, int *key, struct map_elem *val); /* val points to particular map element that contains bpf_timer. */ SEC("fentry/bpf_fentry_test1") int BPF_PROG(test1, int a) { struct map_elem *val; int key = 0; val = bpf_map_lookup_elem(&hmap, &key); if (val) { bpf_timer_init(&val->timer, &hmap, CLOCK_REALTIME); bpf_timer_set_callback(&val->timer, timer_cb); bpf_timer_start(&val->timer, 1000 /* call timer_cb2 in 1 usec */, 0); } } This patch adds helper implementations that rely on hrtimers to call bpf functions as timers expire. The following patches add necessary safety checks. Only programs with CAP_BPF are allowed to use bpf_timer. The amount of timers used by the program is constrained by the memcg recorded at map creation time. The bpf_timer_init() helper needs explicit 'map' argument because inner maps are dynamic and not known at load time. While the bpf_timer_set_callback() is receiving hidden 'aux->prog' argument supplied by the verifier. The prog pointer is needed to do refcnting of bpf program to make sure that program doesn't get freed while the timer is armed. This approach relies on "user refcnt" scheme used in prog_array that stores bpf programs for bpf_tail_call. The bpf_timer_set_callback() will increment the prog refcnt which is paired with bpf_timer_cancel() that will drop the prog refcnt. The ops->map_release_uref is responsible for cancelling the timers and dropping prog refcnt when user space reference to a map reaches zero. This uref approach is done to make sure that Ctrl-C of user space process will not leave timers running forever unless the user space explicitly pinned a map that contained timers in bpffs. bpf_timer_init() and bpf_timer_set_callback() will return -EPERM if map doesn't have user references (is not held by open file descriptor from user space and not pinned in bpffs). The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel and free the timer if given map element had it allocated. "bpftool map update" command can be used to cancel timers. The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because '__u64 :64' has 1 byte alignment of 8 byte padding. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-4-alexei.starovoitov@gmail.com
2021-07-15bpf: Factor out bpf_spin_lock into helpers.Alexei Starovoitov
Move ____bpf_spin_lock/unlock into helpers to make it more clear that quadruple underscore bpf_spin_lock/unlock are irqsave/restore variants. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-3-alexei.starovoitov@gmail.com
2021-07-15bpf: Prepare bpf_prog_put() to be called from irq context.Alexei Starovoitov
Currently bpf_prog_put() is called from the task context only. With addition of bpf timers the timer related helpers will start calling bpf_prog_put() from irq-saved region and in rare cases might drop the refcnt to zero. To address this case, first, convert bpf_prog_free_id() to be irq-save (this is similar to bpf_map_free_id), and, second, defer non irq appropriate calls into work queue. For example: bpf_audit_prog() is calling kmalloc and wake_up_interruptible, bpf_prog_kallsyms_del_all()->bpf_ksym_del()->spin_unlock_bh(). They are not safe with irqs disabled. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-2-alexei.starovoitov@gmail.com
2021-07-15perf lzma: Close lzma stream on exitRiccardo Mancini
ASan reports memory leaks when running: # perf test "88: Check open filename arg using perf trace + vfs_getname" One of these is caused by the lzma stream never being closed inside lzma_decompress_to_file(). This patch adds the missing lzma_end(). Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Fixes: 80a32e5b498a7547 ("perf tools: Add lzma decompression support for kernel module") Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/aaf50bdce7afe996cfc06e1bbb36e4a2a9b9db93.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-15perf script: Fix memory 'threads' and 'cpus' leaks on exitRiccardo Mancini
ASan reports several memory leaks while running: # perf test "82: Use vfs_getname probe to get syscall args filenames" Two of these are caused by some refcounts not being decreased on perf-script exit, namely script.threads and script.cpus. This patch adds the missing __put calls in a new perf_script__exit function, which is called at the end of cmd_script. This patch concludes the fixes of all remaining memory leaks in perf test "82: Use vfs_getname probe to get syscall args filenames". Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Fixes: cfc8874a48599249 ("perf script: Process cpu/threads maps") Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/5ee73b19791c6fa9d24c4d57f4ac1a23609400d7.1626343282.git.rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>