summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-05-10net: usb: qmi_wwan: remove redundant assignment to variable statusColin Ian King
The variable status is being initializeed with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: huawei_cdc_ncm: remove redundant assignment to variable retColin Ian King
The variable ret is being initializeed with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: usb: ax88179_178a: remove redundant assignment to variable retColin Ian King
The variable ret is being initializeed with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10dsa: sja1105: fix semicolon.cocci warningskbuild test robot
drivers/net/dsa/sja1105/sja1105_ethtool.c:481:11-12: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: ae1804de93f6 ("dsa: sja1105: dynamically allocate stats structure") CC: Arnd Bergmann <arnd@arndb.de> Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10ALSA: hda/realtek: Add quirk for Samsung NotebookMike Pozulp
Some models of the Samsung Notebook 9 have very quiet and distorted headphone output. This quirk changes the VREF value of the ALC298 codec NID 0x1a from default HIZ to new 100. [ adjusted to 5.7-base and rearranged in SSID order -- tiwai ] Signed-off-by: Mike Pozulp <pozulp.kernel@gmail.com> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207423 Link: https://lore.kernel.org/r/20200510032838.1989130-1-pozulp.kernel@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-05-10iio: sca3000: Remove an erroneous 'get_device()'Christophe JAILLET
This looks really unusual to have a 'get_device()' hidden in a 'dev_err()' call. Remove it. While at it add a missing \n at the end of the message. Fixes: 574fb258d636 ("Staging: IIO: VTI sca3000 series accelerometer driver (spi)") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-05-09Input: applespi - replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200507185347.GA14499@embeddedor Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-05-09octeontx2-pf: Use the napi_alloc_frag() to alloc the pool buffersKevin Hao
In the current codes, the octeontx2 uses its own method to allocate the pool buffers, but there are some issues in this implementation. 1. We have to run the otx2_get_page() for each allocation cycle and this is pretty error prone. As I can see there is no invocation of the otx2_get_page() in otx2_pool_refill_task(), this will leave the allocated pages have the wrong refcount and may be freed wrongly. 2. It wastes memory. For example, if we only receive one packet in a NAPI RX cycle, and then allocate a 2K buffer with otx2_alloc_rbuf() to refill the pool buffers and leave the remain area of the allocated page wasted. On a kernel with 64K page, 62K area is wasted. IMHO it is really unnecessary to implement our own method for the buffers allocate, we can reuse the napi_alloc_frag() to simplify our code. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09netprio_cgroup: Fix unlimited memory leak of v2 cgroupsZefan Li
If systemd is configured to use hybrid mode which enables the use of both cgroup v1 and v2, systemd will create new cgroup on both the default root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach task to the two cgroups. If the task does some network thing then the v2 cgroup can never be freed after the session exited. One of our machines ran into OOM due to this memory leak. In the scenario described above when sk_alloc() is called cgroup_sk_alloc() thought it's in v2 mode, so it stores the cgroup pointer in sk->sk_cgrp_data and increments the cgroup refcnt, but then sock_update_netprioidx() thought it's in v1 mode, so it stores netprioidx value in sk->sk_cgrp_data, so the cgroup refcnt will never be freed. Currently we do the mode switch when someone writes to the ifpriomap cgroup control file. The easiest fix is to also do the switch when a task is attached to a new cgroup. Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Reported-by: Yang Yingliang <yangyingliang@huawei.com> Tested-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Zefan Li <lizefan@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09IB/mlx4: Replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09bpf, runqslower: include proper uapi/bpf.hSong Liu
runqslower doesn't specify include path for uapi/bpf.h. This causes the following warning: In file included from runqslower.c:10: .../tools/testing/selftests/bpf/tools/include/bpf/bpf.h:234:38: warning: 'enum bpf_stats_type' declared inside parameter list will not be visible outside of this definition or declaration 234 | LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type); Fix this by adding -I tools/includ/uapi to the Makefile. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-09gcc-10: mark more functions __init to avoid section mismatch warningsLinus Torvalds
It seems that for whatever reason, gcc-10 ends up not inlining a couple of functions that used to be inlined before. Even if they only have one single callsite - it looks like gcc may have decided that the code was unlikely, and not worth inlining. The code generation difference is harmless, but caused a few new section mismatch errors, since the (now no longer inlined) function wasn't in the __init section, but called other init functions: Section mismatch in reference from the function kexec_free_initrd() to the function .init.text:free_initrd_mem() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memremap() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memunmap() So add the appropriate __init annotation to make modpost not complain. In both cases there were trivially just a single callsite from another __init function. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09Merge branch 'bpf_iter'Alexei Starovoitov
Yonghong Song says: ==================== Motivation: The current way to dump kernel data structures mostly: 1. /proc system 2. various specific tools like "ss" which requires kernel support. 3. drgn The dropback for the first two is that whenever you want to dump more, you need change the kernel. For example, Martin wants to dump socket local storage with "ss". Kernel change is needed for it to work ([1]). This is also the direct motivation for this work. drgn ([2]) solves this proble nicely and no kernel change is not needed. But since drgn is not able to verify the validity of a particular pointer value, it might present the wrong results in rare cases. In this patch set, we introduce bpf iterator. Initial kernel changes are still needed for interested kernel data, but a later data structure change will not require kernel changes any more. bpf program itself can adapt to new data structure changes. This will give certain flexibility with guaranteed correctness. In this patch set, kernel seq_ops is used to facilitate iterating through kernel data, similar to current /proc and many other lossless kernel dumping facilities. In the future, different iterators can be implemented to trade off losslessness for other criteria e.g. no repeated object visits, etc. User Interface: 1. Similar to prog/map/link, the iterator can be pinned into a path within a bpffs mount point. 2. The bpftool command can pin an iterator to a file bpftool iter pin <bpf_prog.o> <path> 3. Use `cat <path>` to dump the contents. Use `rm -f <path>` to remove the pinned iterator. 4. The anonymous iterator can be created as well. Please see patch #19 andd #20 for bpf programs and bpf iterator output examples. Note that certain iterators are namespace aware. For example, task and task_file targets only iterate through current pid namespace. ipv6_route and netlink will iterate through current net namespace. Please see individual patches for implementation details. Performance: The bpf iterator provides in-kernel aggregation abilities for kernel data. This can greatly improve performance compared to e.g., iterating all process directories under /proc. For example, I did an experiment on my VM with an application forking different number of tasks and each forked process opening various number of files. The following is the result with the latency with unit of microseconds: # of forked tasks # of open files # of bpf_prog calls # latency (us) 100 100 11503 7586 1000 1000 1013203 709513 10000 100 1130203 764519 The number of bpf_prog calls may be more than forked tasks multipled by open files since there are other tasks running on the system. The bpf program is a do-nothing program. One millions of bpf calls takes less than one second. Although the initial motivation is from Martin's sk_local_storage, this patch didn't implement tcp6 sockets and sk_local_storage. The /proc/net/tcp6 involves three types of sockets, timewait, request and tcp6 sockets. Some kind of type casting or other mechanism is needed to handle all these socket types in one bpf program. This will be addressed in future work. Currently, we do not support kernel data generated under module. This requires some BTF work. More work for more iterators, e.g., tcp, udp, bpf_map elements, etc. Changelog: v3 -> v4: - in bpf_seq_read(), if start() failed with an error, return that error to user space (Andrii) - in bpf_seq_printf(), if reading kernel memory failed for %s and %p{i,I}{4,6}, set buffer to empty string or address 0. Documented this behavior in uapi header (Andrii) - fix a few error handling issues for bpftool (Andrii) - A few other minor fixes and cosmetic changes. v2 -> v3: - add bpf_iter_unreg_target() to unregister a target, used in the error path of the __init functions. - handle err != 0 before handling overflow (Andrii) - reference count "task" for task_file target (Andrii) - remove some redundancy for bpf_map/task/task_file targets - add bpf_iter_unreg_target() in ip6_route_cleanup() - Handling "%%" format in bpf_seq_printf() (Andrii) - implement auto-attach for bpf_iter in libbpf (Andrii) - add macros offsetof and container_of in bpf_helpers.h (Andrii) - add tests for auto-attach and program-return-1 cases - some other minor fixes v1 -> v2: - removed target_feature, using callback functions instead - checking target to ensure program specified btf_id supported (Martin) - link_create change with new changes from Andrii - better handling of btf_iter vs. seq_file private data (Martin, Andrii) - implemented bpf_seq_read() (Andrii, Alexei) - percpu buffer for bpf_seq_printf() (Andrii) - better syntax for BPF_SEQ_PRINTF macro (Andrii) - bpftool fixes (Quentin) - a lot of other fixes RFC v2 -> v1: - rename bpfdump to bpf_iter - use bpffs instead of a new file system - use bpf_link to streamline and simplify iterator creation. References: [1]: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com [2]: https://github.com/osandov/drgn ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-09tools/bpf: selftests: Add bpf_iter selftestsYonghong Song
The added test includes the following subtests: - test verifier change for btf_id_or_null - test load/create_iter/read for ipv6_route/netlink/bpf_map/task/task_file - test anon bpf iterator - test anon bpf iterator reading one char at a time - test file bpf iterator - test overflow (single bpf program output not overflow) - test overflow (single bpf program output overflows) - test bpf prog returning 1 The ipv6_route tests the following verifier change - access fields in the variable length array of the structure. The netlink load tests the following verifier change - put a btf_id ptr value in a stack and accessible to tracing/iter programs. The anon bpf iterator also tests link auto attach through skeleton. $ test_progs -n 2 #2/1 btf_id_or_null:OK #2/2 ipv6_route:OK #2/3 netlink:OK #2/4 bpf_map:OK #2/5 task:OK #2/6 task_file:OK #2/7 anon:OK #2/8 anon-read-one-char:OK #2/9 file:OK #2/10 overflow:OK #2/11 overflow-e2big:OK #2/12 prog-ret-1:OK #2 bpf_iter:OK Summary: 1/12 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175923.2477637-1-yhs@fb.com
2020-05-09tools/bpf: selftests: Add iter progs for bpf_map/task/task_fileYonghong Song
The implementation is arbitrary, just to show how the bpf programs can be written for bpf_map/task/task_file. They can be costomized for specific needs. For example, for bpf_map, the iterator prints out: $ cat /sys/fs/bpf/my_bpf_map id refcnt usercnt locked_vm 3 2 0 20 6 2 0 20 9 2 0 20 12 2 0 20 13 2 0 20 16 2 0 20 19 2 0 20 %%% END %%% For task, the iterator prints out: $ cat /sys/fs/bpf/my_task tgid gid 1 1 2 2 .... 1944 1944 1948 1948 1949 1949 1953 1953 === END === For task/file, the iterator prints out: $ cat /sys/fs/bpf/my_task_file tgid gid fd file 1 1 0 ffffffff95c97600 1 1 1 ffffffff95c97600 1 1 2 ffffffff95c97600 .... 1895 1895 255 ffffffff95c8fe00 1932 1932 0 ffffffff95c8fe00 1932 1932 1 ffffffff95c8fe00 1932 1932 2 ffffffff95c8fe00 1932 1932 3 ffffffff95c185c0 This is able to print out all open files (fd and file->f_op), so user can compare f_op against a particular kernel file operations to find what it is. For example, from /proc/kallsyms, we can find ffffffff95c185c0 r eventfd_fops so we will know tgid 1932 fd 3 is an eventfd file descriptor. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175922.2477576-1-yhs@fb.com
2020-05-09tools/bpf: selftests: Add iterator programs for ipv6_route and netlinkYonghong Song
Two bpf programs are added in this patch for netlink and ipv6_route target. On my VM, I am able to achieve identical results compared to /proc/net/netlink and /proc/net/ipv6_route. $ cat /proc/net/netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode 000000002c42d58b 0 0 00000000 0 0 0 2 0 7 00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719 00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422 000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424 .... 00000000159a170d 15 1862 00000002 0 0 0 2 0 1886 000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076 00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683 000000008398fb08 16 0 00000000 0 0 0 2 0 27 $ cat /sys/fs/bpf/my_netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode 000000002c42d58b 0 0 00000000 0 0 0 2 0 7 00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719 00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422 000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424 .... 00000000159a170d 15 1862 00000002 0 0 0 2 0 1886 000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076 00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683 000000008398fb08 16 0 00000000 0 0 0 2 0 27 $ cat /proc/net/ipv6_route fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo $ cat /sys/fs/bpf/my_ipv6_route fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com
2020-05-09tools/bpftool: Add bpf_iter support for bptoolYonghong Song
Currently, only one command is supported bpftool iter pin <bpf_prog.o> <path> It will pin the trace/iter bpf program in the object file <bpf_prog.o> to the <path> where <path> should be on a bpffs mount. For example, $ bpftool iter pin ./bpf_iter_ipv6_route.o \ /sys/fs/bpf/my_route User can then do a `cat` to print out the results: $ cat /sys/fs/bpf/my_route fe800000000000000000000000000000 40 00000000000000000000000000000000 ... 00000000000000000000000000000000 00 00000000000000000000000000000000 ... 00000000000000000000000000000001 80 00000000000000000000000000000000 ... fe800000000000008c0162fffebdfd57 80 00000000000000000000000000000000 ... ff000000000000000000000000000000 08 00000000000000000000000000000000 ... 00000000000000000000000000000000 00 00000000000000000000000000000000 ... The implementation for ipv6_route iterator is in one of subsequent patches. This patch also added BPF_LINK_TYPE_ITER to link query. In the future, we may add additional parameters to pin command by parameterizing the bpf iterator. For example, a map_id or pid may be added to let bpf program only traverses a single map or task, similar to kernel seq_file single_open(). We may also add introspection command for targets/iterators by leveraging the bpf_iter itself. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200509175920.2477247-1-yhs@fb.com
2020-05-09tools/libpf: Add offsetof/container_of macro in bpf_helpers.hYonghong Song
These two helpers will be used later in bpf_iter bpf program bpf_iter_netlink.c. Put them in bpf_helpers.h since they could be useful in other cases. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175919.2477104-1-yhs@fb.com
2020-05-09tools/libbpf: Add bpf_iter supportYonghong Song
Two new libbpf APIs are added to support bpf_iter: - bpf_program__attach_iter Given a bpf program and additional parameters, which is none now, returns a bpf_link. - bpf_iter_create syscall level API to create a bpf iterator. The macro BPF_SEQ_PRINTF are also introduced. The format looks like: BPF_SEQ_PRINTF(seq, "task id %d\n", pid); This macro can help bpf program writers with nicer bpf_seq_printf syntax similar to the kernel one. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175917.2476936-1-yhs@fb.com
2020-05-09bpf: Support variable length array in tracing programsYonghong Song
In /proc/net/ipv6_route, we have struct fib6_info { struct fib6_table *fib6_table; ... struct fib6_nh fib6_nh[0]; } struct fib6_nh { struct fib_nh_common nh_common; struct rt6_info **rt6i_pcpu; struct rt6_exception_bucket *rt6i_exception_bucket; }; struct fib_nh_common { ... u8 nhc_gw_family; ... } The access: struct fib6_nh *fib6_nh = &rt->fib6_nh; ... fib6_nh->nh_common.nhc_gw_family ... This patch ensures such an access is handled properly. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175916.2476853-1-yhs@fb.com
2020-05-09bpf: Handle spilled PTR_TO_BTF_ID properly when checking stack_boundaryYonghong Song
This specifically to handle the case like below: // ptr below is a socket ptr identified by PTR_TO_BTF_ID u64 param[2] = { ptr, val }; bpf_seq_printf(seq, fmt, sizeof(fmt), param, sizeof(param)); In this case, the 16 bytes stack for "param" contains: 8 bytes for ptr with spilled PTR_TO_BTF_ID 8 bytes for val as STACK_MISC The current verifier will complain the ptr should not be visible to the helper. ... 16: (7b) *(u64 *)(r10 -64) = r2 18: (7b) *(u64 *)(r10 -56) = r1 19: (bf) r4 = r10 ; 20: (07) r4 += -64 ; BPF_SEQ_PRINTF(seq, fmt1, (long)s, s->sk_protocol); 21: (bf) r1 = r6 22: (18) r2 = 0xffffa8d00018605a 24: (b4) w3 = 10 25: (b4) w5 = 16 26: (85) call bpf_seq_printf#125 R0=inv(id=0) R1_w=ptr_seq_file(id=0,off=0,imm=0) R2_w=map_value(id=0,off=90,ks=4,vs=144,imm=0) R3_w=inv10 R4_w=fp-64 R5_w=inv16 R6=ptr_seq_file(id=0,off=0,imm=0) R7=ptr_netlink_sock(id=0,off=0,imm=0) R10=fp0 fp-56_w=mmmmmmmm fp-64_w=ptr_ last_idx 26 first_idx 13 regs=8 stack=0 before 25: (b4) w5 = 16 regs=8 stack=0 before 24: (b4) w3 = 10 invalid indirect read from stack off -64+0 size 16 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175915.2476783-1-yhs@fb.com
2020-05-09bpf: Add bpf_seq_printf and bpf_seq_write helpersYonghong Song
Two helpers bpf_seq_printf and bpf_seq_write, are added for writing data to the seq_file buffer. bpf_seq_printf supports common format string flag/width/type fields so at least I can get identical results for netlink and ipv6_route targets. For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW specifically indicates a write failure due to overflow, which means the object will be repeated in the next bpf invocation if object collection stays the same. Note that if the object collection is changed, depending how collection traversal is done, even if the object still in the collection, it may not be visited. For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to read kernel memory. Reading kernel memory may fail in the following two cases: - invalid kernel address, or - valid kernel address but requiring a major fault If reading kernel memory failed, the %s string will be an empty string and %p{i,I}{4,6} will be all 0. Not returning error to bpf program is consistent with what bpf_trace_printk() does for now. bpf_seq_printf may return -EBUSY meaning that internal percpu buffer for memory copy of strings or other pointees is not available. Bpf program can return 1 to indicate it wants the same object to be repeated. Right now, this should not happen on no-RT kernels since migrate_disable(), which guards bpf prog call, calls preempt_disable(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
2020-05-09bpf: Add PTR_TO_BTF_ID_OR_NULL supportYonghong Song
Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support. For tracing/iter program, the bpf program context definition, e.g., for previous bpf_map target, looks like struct bpf_iter__bpf_map { struct bpf_iter_meta *meta; struct bpf_map *map; }; The kernel guarantees that meta is not NULL, but map pointer maybe NULL. The NULL map indicates that all objects have been traversed, so bpf program can take proper action, e.g., do final aggregation and/or send final report to user space. Add btf_id_or_null_non0_off to prog->aux structure, to indicate that if the context access offset is not 0, set to PTR_TO_BTF_ID_OR_NULL instead of PTR_TO_BTF_ID. This bit is set for tracing/iter program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175912.2476576-1-yhs@fb.com
2020-05-09bpf: Add task and task/file iterator targetsYonghong Song
Only the tasks belonging to "current" pid namespace are enumerated. For task/file target, the bpf program will have access to struct task_struct *task u32 fd struct file *file where fd/file is an open file for the task. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175911.2476407-1-yhs@fb.com
2020-05-09net: bpf: Add netlink and ipv6_route bpf_iter targetsYonghong Song
This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
2020-05-09bpf: Add bpf_map iteratorYonghong Song
Implement seq_file operations to traverse all bpf_maps. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175909.2476096-1-yhs@fb.com
2020-05-09bpf: Implement common macros/helpers for target iteratorsYonghong Song
Macro DEFINE_BPF_ITER_FUNC is implemented so target can define an init function to capture the BTF type which represents the target. The bpf_iter_meta is a structure holding meta data, common to all targets in the bpf program. Additional marker functions are called before or after bpf_seq_read() show()/next()/stop() callback functions to help calculate precise seq_num and whether call bpf_prog inside stop(). Two functions, bpf_iter_get_info() and bpf_iter_run_prog(), are implemented so target can get needed information from bpf_iter infrastructure and can run the program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175907.2475956-1-yhs@fb.com
2020-05-09bpf: Create file bpf iteratorYonghong Song
To produce a file bpf iterator, the fd must be corresponding to a link_fd assocciated with a trace/iter program. When the pinned file is opened, a seq_file will be generated. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175906.2475893-1-yhs@fb.com
2020-05-09bpf: Create anonymous bpf iteratorYonghong Song
A new bpf command BPF_ITER_CREATE is added. The anonymous bpf iterator is seq_file based. The seq_file private data are referenced by targets. The bpf_iter infrastructure allocated additional space at seq_file->private before the space used by targets to store some meta data, e.g., prog: prog to run session_id: an unique id for each opened seq_file seq_num: how many times bpf programs are queried in this session done_stop: an internal state to decide whether bpf program should be called in seq_ops->stop() or not The seq_num will start from 0 for valid objects. The bpf program may see the same seq_num more than once if - seq_file buffer overflow happens and the same object is retried by bpf_seq_read(), or - the bpf program explicitly requests a retry of the same object Since module is not supported for bpf_iter, all target registeration happens at __init time, so there is no need to change bpf_iter_unreg_target() as it is used mostly in error path of the init function at which time no bpf iterators have been created yet. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
2020-05-09bpf: Implement bpf_seq_read() for bpf iteratorYonghong Song
bpf iterator uses seq_file to provide a lossless way to transfer data to user space. But we want to call bpf program after all objects have been traversed, and bpf program may write additional data to the seq_file buffer. The current seq_read() does not work for this use case. Besides allowing stop() function to write to the buffer, the bpf_seq_read() also fixed the buffer size to one page. If any single call of show() or stop() will emit data more than one page to cause overflow, -E2BIG error code will be returned to user space. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175904.2475468-1-yhs@fb.com
2020-05-09bpf: Support bpf tracing/iter programs for BPF_LINK_UPDATEYonghong Song
Added BPF_LINK_UPDATE support for tracing/iter programs. This way, a file based bpf iterator, which holds a reference to the link, can have its bpf program updated without creating new files. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175902.2475262-1-yhs@fb.com
2020-05-09bpf: Support bpf tracing/iter programs for BPF_LINK_CREATEYonghong Song
Given a bpf program, the step to create an anonymous bpf iterator is: - create a bpf_iter_link, which combines bpf program and the target. In the future, there could be more information recorded in the link. A link_fd will be returned to the user space. - create an anonymous bpf iterator with the given link_fd. The bpf_iter_link can be pinned to bpffs mount file system to create a file based bpf iterator as well. The benefit to use of bpf_iter_link: - using bpf link simplifies design and implementation as bpf link is used for other tracing bpf programs. - for file based bpf iterator, bpf_iter_link provides a standard way to replace underlying bpf programs. - for both anonymous and free based iterators, bpf link query capability can be leveraged. The patch added support of tracing/iter programs for BPF_LINK_CREATE. A new link type BPF_LINK_TYPE_ITER is added to facilitate link querying. Currently, only prog_id is needed, so there is no additional in-kernel show_fdinfo() and fill_link_info() hook is needed for BPF_LINK_TYPE_ITER link. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
2020-05-09bpf: Allow loading of a bpf_iter programYonghong Song
A bpf_iter program is a tracing program with attach type BPF_TRACE_ITER. The load attribute attach_btf_id is used by the verifier against a particular kernel function, which represents a target, e.g., __bpf_iter__bpf_map for target bpf_map which is implemented later. The program return value must be 0 or 1 for now. 0 : successful, except potential seq_file buffer overflow which is handled by seq_file reader. 1 : request to restart the same object In the future, other return values may be used for filtering or teminating the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
2020-05-09bpf: Implement an interface to register bpf_iter targetsYonghong Song
The target can call bpf_iter_reg_target() to register itself. The needed information: target: target name seq_ops: the seq_file operations for the target init_seq_private target callback to initialize seq_priv during file open fini_seq_private target callback to clean up seq_priv during file release seq_priv_size: the private_data size needed by the seq_file operations The target name represents a target which provides a seq_ops for iterating objects. The target can provide two callback functions, init_seq_private and fini_seq_private, called during file open/release time. For example, /proc/net/{tcp6, ipv6_route, netlink, ...}, net name space needs to be setup properly during file open and released properly during file release. Function bpf_iter_unreg_target() is also implemented to unregister a particular target. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175859.2474669-1-yhs@fb.com
2020-05-09Merge tag 'riscv-for-linus-5.7-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "A smattering of fixes and cleanups: - Dead code removal. - Exporting riscv_cpuid_to_hartid_mask for modules. - Per-CPU tracking of ISA features. - Setting max_pfn correctly when probing memory. - Adding a note to the VDSO so glibc can check the kernel's version without a uname(). - A fix to force the bootloader to initialize the boot spin tables, which still get used as a fallback when SBI-0.1 is enabled" * tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Remove unused code from STRICT_KERNEL_RWX riscv: force __cpu_up_ variables to put in data section riscv: add Linux note to vdso riscv: set max_pfn to the PFN of the last page RISC-V: Remove N-extension related defines RISC-V: Add bitmap reprensenting ISA features common across CPUs RISC-V: Export riscv_cpuid_to_hartid_mask() API
2020-05-09Merge branch ↵Jakub Kicinski
'mlxsw-spectrum-Enforce-some-HW-limitations-for-matchall-TC-offload' Ido Schimmel says: ==================== mlxsw: spectrum: Enforce some HW limitations for matchall TC offload Jiri says: There are some limitations for TC matchall classifier offload that are given by the mlxsw HW dataplane. It is not possible to do sampling on egress and also the mirror/sample vs. ACL (flower) ordering is fixed. So check this and forbid to offload incorrect setup. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09selftests: mlxsw: tc_restrictions: add couple of test for the correct ↵Jiri Pirko
matchall-flower ordering Make sure that the drive restricts incorrect order of inserted matchall vs. flower rules. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09selftests: mlxsw: tc_restrictions: add test to check sample action restrictionsJiri Pirko
Check that matchall rules with sample actions are not possible to be inserted to egress. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09selftests: mlxsw: rename tc_flower_restrictions.sh to tc_restrictions.shJiri Pirko
The file is about to contain matchall restrictions too, so change the name to make it more generic. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09mlxsw: spectrum_flower: Forbid to insert flower rules in collision with ↵Jiri Pirko
matchall rules On ingress, the matchall rules doing mirroring and sampling are offloaded into hardware blocks that are processed before any flower rules. On egress, the matchall mirroring rules are offloaded into hardware block that is processed after all flower rules. Therefore check the priorities of inserted flower rules against existing matchall rules and ensure the correct ordering. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09mlxsw: spectrum_matchall: Forbid to insert matchall rules in collision with ↵Jiri Pirko
flower rules On ingress, the matchall rules doing mirroring and sampling are offloaded into hardware blocks that are processed before any flower rules. On egress, the matchall mirroring rules are offloaded into hardware block that is processed after all flower rules. Therefore check the priorities of inserted matchall rules against existing flower rules and ensure the correct ordering. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09mlxsw: spectrum_matchall: Expose a function to get min and max rule priorityJiri Pirko
Introduce an infrastructure that allows to get minimum and maximum rule priority for specified chain. This is going to be used by a subsequent patch to enforce ordering between flower and matchall filters. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09mlxsw: spectrum_matchall: Put matchall list into substruct of flow structJiri Pirko
As there are going to be other matchall specific fields in flow structure, put the existing list field into matchall substruct. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09mlxsw: spectrum_flower: Expose a function to get min and max rule priorityJiri Pirko
Introduce an infrastructure that allows to get minimum and maximum rule priority for specified chain. This is going to be used by a subsequent patch to enforce ordering between flower and matchall filters. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09mlxsw: spectrum_matchall: Restrict sample action to be allowed only on ingressJiri Pirko
HW supports packet sampling on ingress only. Check and fail if user is adding sample on egress. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09gcc-10: avoid shadowing standard library 'free()' in cryptoLinus Torvalds
gcc-10 has started warning about conflicting types for a few new built-in functions, particularly 'free()'. This results in warnings like: crypto/xts.c:325:13: warning: conflicting types for built-in function ‘free’; expected ‘void(void *)’ [-Wbuiltin-declaration-mismatch] because the crypto layer had its local freeing functions called 'free()'. Gcc-10 is in the wrong here, since that function is marked 'static', and thus there is no chance of confusion with any standard library function namespace. But the simplest thing to do is to just use a different name here, and avoid this gcc mis-feature. [ Side note: gcc knowing about 'free()' is in itself not the mis-feature: the semantics of 'free()' are special enough that a compiler can validly do special things when seeing it. So the mis-feature here is that gcc thinks that 'free()' is some restricted name, and you can't shadow it as a local static function. Making the special 'free()' semantics be a function attribute rather than tied to the name would be the much better model ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09net: freescale: select CONFIG_FIXED_PHY where neededArnd Bergmann
I ran into a randconfig build failure with CONFIG_FIXED_PHY=m and CONFIG_GIANFAR=y: x86_64-linux-ld: drivers/net/ethernet/freescale/gianfar.o:(.rodata+0x418): undefined reference to `fixed_phy_change_carrier' It seems the same thing can happen with dpaa and ucc_geth, so change all three to do an explicit 'select FIXED_PHY'. The fixed-phy driver actually has an alternative stub function that theoretically allows building network drivers when fixed-phy is disabled, but I don't see how that would help here, as the drivers presumably would not work then. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09gcc-10: disable 'restrict' warning for nowLinus Torvalds
gcc-10 now warns about passing aliasing pointers to functions that take restricted pointers. That's actually a great warning, and if we ever start using 'restrict' in the kernel, it might be quite useful. But right now we don't, and it turns out that the only thing this warns about is an idiom where we have declared a few functions to be "printf-like" (which seems to make gcc pick up the restricted pointer thing), and then we print to the same buffer that we also use as an input. And people do that as an odd concatenation pattern, with code like this: #define sysfs_show_gen_prop(buffer, fmt, ...) \ snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__) where we have 'buffer' as both the destination of the final result, and as the initial argument. Yes, it's a bit questionable. And outside of the kernel, people do have standard declarations like int snprintf( char *restrict buffer, size_t bufsz, const char *restrict format, ... ); where that output buffer is marked as a restrict pointer that cannot alias with any other arguments. But in the context of the kernel, that 'use snprintf() to concatenate to the end result' does work, and the pattern shows up in multiple places. And we have not marked our own version of snprintf() as taking restrict pointers, so the warning is incorrect for now, and gcc picks it up on its own. If we do start using 'restrict' in the kernel (and it might be a good idea if people find places where it matters), we'll need to figure out how to avoid this issue for snprintf and friends. But in the meantime, this warning is not useful. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: disable 'stringop-overflow' warning for nowLinus Torvalds
This is the final array bounds warning removal for gcc-10 for now. Again, the warning is good, and we should re-enable all these warnings when we have converted all the legacy array declaration cases to flexible arrays. But in the meantime, it's just noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09hinic: add three net_device_ops of vfLuo bin
adds ndo_set_vf_rate/ndo_set_vf_spoofchk/ndo_set_vf_link_state to configure netdev of virtual function Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>