summaryrefslogtreecommitdiff
path: root/samples
AgeCommit message (Collapse)Author
2021-08-26samples: bpf: Fix uninitialized variable in xdp_redirect_cpuKumar Kartikeya Dwivedi
While at it, also improve help output when CPU number is greater than possible. Fixes: e531a220cc59 ("samples: bpf: Convert xdp_redirect_cpu to XDP samples helper") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210826120910.454081-1-memxor@gmail.com
2021-08-25samples: pktgen: add trap SIGINT for printing execution resultJuhee Kang
All pktgen samples can send indefinitely num messages per thread by setting the count option to 0(-n 0). If running sample with setting count 0 and press Ctrl-C to stop this program, the program prints the result of the execution so far. Currently, the samples besides sample{3...5} don't work properly. Because Ctrl-C stops the script, not just pktgen. This is results of samples: # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 0 Running... ctrl^C to stop ^CDevice: eth0@0 Result: OK: 569657(c569538+d118) usec, 84650 (60byte,0frags) 148597pps 71Mb/sec (71326560bps) errors: 0 # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -n 0 Running... ctrl^C to stop ^C In order to solve this, this commit adds trap SIGINT. Also, this commit changes control_c function to print_result to maintain consistency with other samples. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25samples: pktgen: fix to print when terminated normallyJuhee Kang
Currently, most pktgen samples print the execution result when the program is terminated normally. However, sample03 doesn't work appropriately. This is results of samples: # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 1 Running... ctrl^C to stop Device: eth0@0 Result: OK: 19(c5+d13) usec, 1 (60byte,0frags) 51762pps 24Mb/sec (24845760bps) errors: 0 # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample03_burst_single_flow.sh -n 1 Running... ctrl^C to stop The reason why it doesn't print the execution result when the program is terminated usually is that sample03 doesn't call the function which prints the result, unlike other samples. So, this commit solves this issue by calling the function before termination. Also, this commit changes control_c function to print_result to maintain consistency with other samples. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24samples: bpf: Convert xdp_redirect_map_multi to XDP samples helperKumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Also adapt to change of type of mac address map, so that no resizing is required. Add a new flag for sample mask that skips priting the from_device->to_device heading for each line, as xdp_redirect_map_multi may have two devices but the flow of data may be bidirectional, so the output would be confusing. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-23-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_redirect_map_multi_kern.o to XDP samples helperKumar Kartikeya Dwivedi
One of the notable changes is using a BPF_MAP_TYPE_HASH instead of array map to store mac addresses of devices, as the resizing behavior was based on max_ifindex, which unecessarily maximized the capacity of map beyond what was needed. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-22-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_redirect_map to XDP samples helperKumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Since get_mac_addr is already provided by XDP samples helper, we drop it. Also convert to XDP samples helper similar to prior samples to minimize duplication of code. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-21-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_redirect_map_kern.o to XDP samples helperKumar Kartikeya Dwivedi
Also update it to use consistent SEC("xdp") and SEC("xdp_devmap") naming, and use global variable instead of BPF map for copying the mac address. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-20-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_redirect_cpu to XDP samples helperKumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Similar to xdp_monitor, xdp_redirect_cpu was quite featureful except a few minor omissions (e.g. redirect errno reporting). All of these have been moved to XDP samples helper, hence drop the unneeded code and convert to usage of helpers provided by it. One of the important changes here is dropping of mprog-disable option, as we make that the default. Also, we support built-in programs for some common actions on the packet when it reaches kthread (pass, drop, redirect to device). If the user still needs to install a custom program, they can still supply a BPF object, however the program should be suitably tagged with SEC("xdp_cpumap") annotation so that the expected attach type is correct when updating our cpumap map element. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-19-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_redirect_cpu_kern.o to XDP samples helperKumar Kartikeya Dwivedi
Similar to xdp_monitor_kern, a lot of these BPF programs have been reimplemented properly consolidating missing features from other XDP samples. Hence, drop the unneeded code and rename to .bpf.c suffix. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-18-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_redirect to XDP samples helperKumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. One important note: The XDP samples helper handles ownership of installed XDP programs on devices, including responding to SIGINT and SIGTERM, so drop the code here and use the helpers we provide going forward for all xdp_redirect* conversions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_redirect_kern.o to XDP samples helperKumar Kartikeya Dwivedi
We moved swap_src_dst_mac to xdp_sample.bpf.h to be shared with other potential users, so drop it while moving code to the new file. Also, consistently use SEC("xdp") naming instead. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-16-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_monitor to XDP samples helperKumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to the xdp_sample_user.o helper, so we remove the duplicate functions here that are no longer needed. Thanks to BPF skeleton, we no longer depend on order of tracepoints to uninstall them on startup. Instead, the sample mask is used to install the needed tracepoints. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com
2021-08-24samples: bpf: Convert xdp_monitor_kern.o to XDP samples helperKumar Kartikeya Dwivedi
We already moved all the functionality it provided in XDP samples helper userspace and kernel BPF object, so just delete the unneeded code. We also add generation of BPF skeleton and compilation using clang -target bpf for files ending with .bpf.c suffix (to denote that they use vmlinux.h). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-14-memxor@gmail.com
2021-08-24samples: bpf: Add vmlinux.h generation supportKumar Kartikeya Dwivedi
Also, take this opportunity to depend on in-tree bpftool, so that we can use static linking support in subsequent commits for XDP samples BPF helper object. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-13-memxor@gmail.com
2021-08-24samples: bpf: Add devmap_xmit tracepoint statistics supportKumar Kartikeya Dwivedi
This adds support for retrieval and printing for devmap_xmit total and mutli mode tracepoint. For multi mode, we keep a hash map entry for each redirection stream, such that we can dynamically add and remove entries on output. The from_match and to_match will be set by individual samples when setting up the XDP program on these devices. The multi mode tracepoint is also handy for xdp_redirect_map_multi, where up to 32 devices can be specified. Also add samples_init_pre_load macro to finally set up the resized maps and mmap them in place for low overhead stats retrieval. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-12-memxor@gmail.com
2021-08-24samples: bpf: Add BPF support for devmap_xmit tracepointKumar Kartikeya Dwivedi
This adds support for the devmap_xmit tracepoint, and its multi device variant that can be used to obtain streams for each individual net_device to net_device redirection. This is useful for decomposing total xmit stats in xdp_monitor. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-11-memxor@gmail.com
2021-08-24samples: bpf: Add cpumap tracepoint statistics supportKumar Kartikeya Dwivedi
This consolidates retrieval and printing into the XDP sample helper. For the kthread stats, it expands xdp_stats separately with its own per-CPU stats. For cpumap enqueue, we display FROM->TO stats also with its per-CPU stats. The help out explains in detail the various aspects of the output. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-10-memxor@gmail.com
2021-08-24samples: bpf: Add BPF support for cpumap tracepointsKumar Kartikeya Dwivedi
These are invoked in two places, when the XDP frame or SKB (for generic XDP) enqueued to the ptr_ring (cpumap_enqueue) and when kthread processes the frame after invoking the CPUMAP program for it (returning stats for the batch). We use cpumap_map_id to filter on the map_id as a way to avoid printing incorrect stats for parallel sessions of xdp_redirect_cpu. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-9-memxor@gmail.com
2021-08-24samples: bpf: Add xdp_exception tracepoint statistics supportKumar Kartikeya Dwivedi
This implements the retrieval and printing, as well the help output. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-8-memxor@gmail.com
2021-08-24samples: bpf: Add BPF support for xdp_exception tracepointKumar Kartikeya Dwivedi
This would allow us to store stats for each XDP action, including their per-CPU counts. Consolidating this here allows all redirect samples to detect xdp_exception events. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-7-memxor@gmail.com
2021-08-24samples: bpf: Add redirect tracepoint statistics supportKumar Kartikeya Dwivedi
This implements per-errno reporting (for the ones we explicitly recognize), adds some help output, and implements the stats retrieval and printing functions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-6-memxor@gmail.com
2021-08-24samples: bpf: Add BPF support for redirect tracepointKumar Kartikeya Dwivedi
This adds the shared BPF file that will be used going forward for sharing tracepoint programs among XDP redirect samples. Since vmlinux.h conflicts with tools/include for READ_ONCE/WRITE_ONCE and ARRAY_SIZE, they are copied in to xdp_sample.bpf.h along with other helpers that will be required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-5-memxor@gmail.com
2021-08-24samples: bpf: Add basic infrastructure for XDP samplesKumar Kartikeya Dwivedi
This file implements some common helpers to consolidate differences in features and functionality between the various XDP samples and give them a consistent look, feel, and reporting capabilities. This commit only adds support for receive statistics, which does not rely on any tracepoint, but on the XDP program installed on the device by each XDP redirect sample. Some of the key features are: * A concise output format accompanied by helpful text explaining its fields. * An elaborate output format building upon the concise one, and folding out details in case of errors and staying out of view otherwise. * Printing driver names for devices redirecting packets. * Getting mac address for interface. * Printing summarized total statistics for the entire session. * Ability to dynamically switch between concise and verbose mode, using SIGQUIT (Ctrl + \). In later patches, the support will be extended for each tracepoint with its own custom output in concise and verbose mode. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-4-memxor@gmail.com
2021-08-24samples: bpf: Fix a couple of warningsKumar Kartikeya Dwivedi
cookie_uid_helper_example.c: In function ‘main’: cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive writing 10 bytes into a region of size between 8 and 58 [-Wformat-overflow=] 178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT", | ^~~~~~~~~~ /home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note: ‘sprintf’ output between 53 and 103 bytes into a destination of size 100 178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 179 | file); | ~~~~~ Fix by using snprintf and a sufficiently sized buffer. tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of size 11 [-Wstringop-overread] 35 | key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use size as 11. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-2-memxor@gmail.com
2021-08-17tracing: Add trace_event helper macros __string_len() and __assign_str_len()Steven Rostedt (VMware)
There's a few cases that a string that is to be recorded in a trace event, does not have a terminating 'nul' character, and instead, the tracepoint passes in the length of the string to record. Add two helper macros to the trace event code that lets this work easier, than tricks with "%.*s" logic. __string_len() which is similar to __string() for declaration, but takes a length argument. __assign_str_len() which is similar to __assign_str() for assiging the string, but it too takes a length argument. Note, the TRACE_EVENT() macro will allocate the location on the ring buffer to 'len + 1', that will be used to store the string into. It is a requirement that the 'len' used for this is a most the length of the string being recorded. This string can still use __get_str() just like strings created with __string() can use to retrieve the string. Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/ Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-16samples: pktgen: add missing IPv6 option to pktgen scriptsJuhee Kang
Currently, "sample04" and "sample05" are not working properly when running with an IPv6 option("-6"). The commit 0f06a6787e05 ("samples: Add an IPv6 "-6" option to the pktgen scripts") has omitted the addition of this option at "sample04" and "sample05". In order to support IPv6 option, this commit adds logic related to IPv6 option. Fixes: 0f06a6787e05 ("samples: Add an IPv6 "-6" option to the pktgen scripts") Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16samples: pktgen: pass the environment variable of normal user to sudoJuhee Kang
All pktgen samples can use the environment variable instead of option parameters(eg. $DEV is able to use instead of '-i' option). This is results of running sample as root and user: // running as root # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -v -n 1 Running... ctrl^C to stop // running as normal user $ DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -v -n 1 [...] ERROR: Please specify output device This results show the sample doesn't work properly when the sample runs as normal user. Because the sample is restarted by the function (root_check_run_with_sudo) to run with sudo. In this process, the environment variable of normal user doesn't propagate to sudo. It can be solved by using "-E"(--preserve-env) option of "sudo", which preserve normal user's existing environment variables. So this commit adds "-E" option in the function (root_check_run_with_sudo). Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-15samples/bpf: Define MAX_ENTRIES instead of a magic number in offwaketimeMuhammad Falak R Wani
Define MAX_ENTRIES instead of using 10000 as a magic number in various places. Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210815065013.15411-1-falakreyaz@gmail.com
2021-08-11vfio/mbochs: Fix close when multiple device FDs are openJason Gunthorpe
mbochs_close() iterates over global device state and frees it. Currently this is done every time a device FD is closed, but if multiple device FDs are open this could corrupt other still active FDs. Change this to use close_device() so it only runs on the last close. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/11-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/samples: Delete useless open/closeJason Gunthorpe
The core code no longer requires these ops to be defined, so delete these empty functions and leave the op as NULL. mtty's functions only log a pointless message, delete that entirely. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio: Introduce a vfio_uninit_group_dev() API callMax Gurtovoy
This pairs with vfio_init_group_dev() and allows undoing any state that is stored in the vfio_device unrelated to registration. Add appropriately placed calls to all the drivers. The following patch will use this to add pre-registration state for the device set. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/mbochs: Fix missing error unwind of mbochs_used_mbytesJason Gunthorpe
Convert mbochs to use an atomic scheme for this like mtty was changed into. The atomic fixes various race conditions with probing. Add the missing error unwind. Also add the missing kfree of mdev_state->pages. Fixes: 681c1615f891 ("vfio/mbochs: Convert to use vfio_register_group_dev()") Reported-by: Cornelia Huck <cohuck@redhat.com> Co-developed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/2-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/samples: Remove module get/putJason Gunthorpe
The patch to move the get/put to core and the patch to convert the samples to use vfio_device crossed in a way that this was missed. When both patches are together the samples do not need their own get/put. Fixes: 437e41368c01 ("vfio/mdpy: Convert to use vfio_register_group_dev()") Fixes: 681c1615f891 ("vfio/mbochs: Convert to use vfio_register_group_dev()") Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/1-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-10samples, bpf: Add an explict comment to handle nested vlan tagging.Muhammad Falak R Wani
A codeblock for handling nested vlan trips newbies into thinking it as duplicate code. Explicitly add a comment to clarify. Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809070046.32142-1-falakreyaz@gmail.com
2021-08-06samples/bpf: xdpsock: Remove forward declaration of ip_fast_csum()Niklas Söderlund
There is a forward declaration of ip_fast_csum() just before its implementation, remove the unneeded forward declaration. While at it mark the implementation as static inline. Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/bpf/20210806122855.26115-3-simon.horman@corigine.com
2021-08-06samples/bpf: xdpsock: Make the sample more useful outside the treeNiklas Söderlund
The xdpsock sample application is a useful base for experiment's around AF_XDP sockets. Compiling the sample outside of the kernel tree is made harder then it has to be as the sample includes two headers and that are not installed by 'make install_header' nor are usually part of distributions kernel headers. The first header asm/barrier.h is not used and can just be dropped. The second linux/compiler.h are only needed for the decorator __force and are only used in ip_fast_csum(), csum_fold() and csum_tcpudp_nofold(). These functions are copied verbatim from include/asm-generic/checksum.h and lib/checksum.c. While it's fine to copy and use these functions in the sample application the decorator brings no value and can be dropped together with the include. With this change it's trivial to compile the xdpsock sample outside the kernel tree from xdpsock_user.c and xdpsock.h. $ gcc -o xdpsock xdpsock_user.c -lbpf -lpthread Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/bpf/20210806122855.26115-2-simon.horman@corigine.com
2021-08-05bpf, samples: Add missing mprog-disable to xdp_redirect_cpu's optstringMatthew Cover
Commit ce4dade7f12a ("samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap") added the following option, but missed adding it to optstring: - mprog-disable: disable loading XDP program on cpumap entries Fix it and add the missing option character. Fixes: ce4dade7f12a ("samples/bpf: xdp_redirect_cpu: Load a eBPF program on cpumap") Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210731005632.13228-1-matthew.cover@stackpath.com
2021-08-01samples: mei: don't wait on read completion upon write.Alexander Usyskin
The original mei driver communication was strictly write command and receive response flow, the completion of write was determined when response was ready using select(). This paradigm is a long time not true. There can be write without a response and an unsolicited read. The driver is capable of handling those. Adjust also the sample code and remove select() on read() from the write flow. Add select to the read flow to showcase how to do the read with a timeout. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20210801072532.8668-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27samples: bpf: Add the omitted xdp samples to .gitignoreJuhee Kang
There are recently added xdp samples (xdp_redirect_map_multi and xdpsock_ctrl_proc) which are not managed by .gitignore. This commit adds these files to .gitignore. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210727041056.23455-2-claudiajkang@gmail.com
2021-07-27samples: bpf: Fix tracex7 error raised on the missing argumentJuhee Kang
The current behavior of 'tracex7' doesn't consist with other bpf samples tracex{1..6}. Other samples do not require any argument to run with, but tracex7 should be run with btrfs device argument. (it should be executed with test_override_return.sh) Currently, tracex7 doesn't have any description about how to run this program and raises an unexpected error. And this result might be confusing since users might not have a hunch about how to run this program. // Current behavior # ./tracex7 sh: 1: Syntax error: word unexpected (expecting ")") // Fixed behavior # ./tracex7 ERROR: Run with the btrfs device argument! In order to fix this error, this commit adds logic to report a message and exit when running this program with a missing argument. Additionally in test_override_return.sh, there is a problem with multiple directory(tmpmnt) creation. So in this commit adds a line with removing the directory with every execution. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210727041056.23455-1-claudiajkang@gmail.com
2021-07-27kdb: Rename members of struct kdbtab_tSumit Garg
Remove redundant prefix "cmd_" from name of members in struct kdbtab_t for better readibility. Suggested-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20210712134620.276667-5-sumit.garg@linaro.org Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2021-07-27kdb: Get rid of redundant kdb_register_flags()Sumit Garg
Commit e4f291b3f7bb ("kdb: Simplify kdb commands registration") allowed registration of pre-allocated kdb commands with pointer to struct kdbtab_t. Lets switch other users as well to register pre- allocated kdb commands via: - Changing prototype for kdb_register() to pass a pointer to struct kdbtab_t instead. - Embed kdbtab_t structure in kdb_macro_t rather than individual params. With these changes kdb_register_flags() becomes redundant and hence removed. Also, since we have switched all users to register pre-allocated commands, "is_dynamic" flag in struct kdbtab_t becomes redundant and hence removed as well. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20210712134620.276667-3-sumit.garg@linaro.org Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2021-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-07-15 The following pull-request contains BPF updates for your *net-next* tree. We've added 45 non-merge commits during the last 15 day(s) which contain a total of 52 files changed, 3122 insertions(+), 384 deletions(-). The main changes are: 1) Introduce bpf timers, from Alexei. 2) Add sockmap support for unix datagram socket, from Cong. 3) Fix potential memleak and UAF in the verifier, from He. 4) Add bpf_get_func_ip helper, from Jiri. 5) Improvements to generic XDP mode, from Kumar. 6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-14Merge tag 'net-5.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ...
2021-07-07samples/bpf: xdp_redirect_cpu_user: Cpumap qsize set larger defaultJesper Dangaard Brouer
Experience from production shows queue size of 192 is too small, as this caused packet drops during cpumap-enqueue on RX-CPU. This can be diagnosed with xdp_monitor sample program. This bpftrace program was used to diagnose the problem in more detail: bpftrace -e ' tracepoint:xdp:xdp_cpumap_kthread { @deq_bulk = lhist(args->processed,0,10,1); @drop_net = lhist(args->drops,0,10,1) } tracepoint:xdp:xdp_cpumap_enqueue { @enq_bulk = lhist(args->processed,0,10,1); @enq_drops = lhist(args->drops,0,10,1); }' Watch out for the @enq_drops counter. The @drop_net counter can happen when netstack gets invalid packets, so don't despair it can be natural, and that counter will likely disappear in newer kernels as it was a source of confusion (look at netstat info for reason of the netstack @drop_net counters). The production system was configured with CPU power-saving C6 state. Learn more in this blogpost[1]. And wakeup latency in usec for the states are: # grep -H . /sys/devices/system/cpu/cpu0/cpuidle/*/latency /sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0 /sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2 /sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10 /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:133 Deepest state take 133 usec to wakeup from (133/10^6). The link speed is 25Gbit/s ((25*10^9/8) in bytes/sec). How many bytes can arrive with in 133 usec at this speed: (25*10^9/8)*(133/10^6) = 415625 bytes. With MTU size packets this is 275 packets, and with minimum Ethernet (incl intergap overhead) 84 bytes it is 4948 packets. Clearly default queue size is too small. Setting default cpumap queue to 2048 as worst-case (small packet) at 10Gbit/s is 1979 packets with 133 usec wakeup time, +64 packet before kthread wakeup call (due to xdp_do_flush) worst-case 2043 packets. Thus, if a packet burst on RX-CPU will enqueue packets to a remote cpumap CPU that is in deep-sleep state it can overrun the cpumap queue. The production system was also configured to avoid deep-sleep via: tuned-adm profile network-latency [1] https://jeremyeder.com/2013/08/30/oh-did-you-expect-the-cpu/ Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/162523477604.786243.13372630844944530891.stgit@firesoul
2021-07-05bpf, samples: Fix xdpsock with '-M' parameter missing unload processWang Hai
Execute the following command and exit, then execute it again, the following error will be reported: $ sudo ./samples/bpf/xdpsock -i ens4f2 -M ^C $ sudo ./samples/bpf/xdpsock -i ens4f2 -M libbpf: elf: skipping unrecognized data section(16) .eh_frame libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame libbpf: Kernel error message: XDP program already attached ERROR: link set xdp fd failed Commit c9d27c9e8dc7 ("samples: bpf: Do not unload prog within xdpsock") removed the unloading prog code because of the presence of bpf_link. This is fine if XDP_SHARED_UMEM is disabled, but if it is enabled, unloading the prog is still needed. Fixes: c9d27c9e8dc7 ("samples: bpf: Do not unload prog within xdpsock") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210628091815.2373487-1-wanghai38@huawei.com
2021-07-05bpf, samples: Add -fno-asynchronous-unwind-tables to BPF Clang invocationToke Høiland-Jørgensen
The samples/bpf Makefile currently compiles BPF files in a way that will produce an .eh_frame section, which will in turn confuse libbpf and produce errors when loading BPF programs, like: libbpf: elf: skipping unrecognized data section(32) .eh_frame libbpf: elf: skipping relo section(33) .rel.eh_frame for section(32) .eh_frame Fix this by instruction Clang not to produce this section, as it's useless for BPF anyway. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210705103841.180260-1-toke@redhat.com
2021-07-03Merge tag 'vfio-v5.14-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Module reference fixes, structure renaming (Max Gurtovoy) - Export and use common pci_dev_trylock() (Luis Chamberlain) - Enable direct mdev device creation and probing by parent (Christoph Hellwig & Jason Gunthorpe) - Fix mdpy error path leak (Colin Ian King) - Fix mtty list entry leak (Jason Gunthorpe) - Enforce mtty device limit (Alex Williamson) - Resolve concurrent vfio-pci mmap faults (Alex Williamson) * tag 'vfio-v5.14-rc1' of git://github.com/awilliam/linux-vfio: vfio/pci: Handle concurrent vma faults vfio/mtty: Enforce available_instances vfio/mtty: Delete mdev_devices_list vfio: use the new pci_dev_trylock() helper to simplify try lock PCI: Export pci_dev_trylock() and pci_dev_unlock() vfio/mdpy: Fix memory leak of object mdev_state->vconfig vfio/iommu_type1: rename vfio_group struck to vfio_iommu_group vfio/mbochs: Convert to use vfio_register_group_dev() vfio/mdpy: Convert to use vfio_register_group_dev() vfio/mtty: Convert to use vfio_register_group_dev() vfio/mdev: Allow the mdev_parent_ops to specify the device driver to bind vfio/mdev: Remove CONFIG_VFIO_MDEV_DEVICE driver core: Export device_driver_attach() driver core: Don't return EPROBE_DEFER to userspace during sysfs bind driver core: Flow the return code from ->probe() through to sysfs bind driver core: Better distinguish probe errors in really_probe driver core: Pull required checks into driver_probe_device() vfio/platform: remove unneeded parent_module attribute vfio: centralize module refcount in subsystem layer
2021-06-30Merge tag 'net-next-5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect - allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads) - add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage - virtio/vsock: introduce SOCK_SEQPACKET support - add getsocketopt to retrieve netns cookie - ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses - ipv6: use prandom_u32() for ID generation - ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw) - icmp: add support for extended RFC 8335 PROBE (ping) - seg6: add support for SRv6 End.DT46 behavior - mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support - sctp: packetization Layer Path MTU Discovery (RFC 8899) - xfrm: speed up state addition with seq set - WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler - add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report Device APIs: - devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.) - don't require RCU read lock to be held around BPF hooks in NAPI context - page_pool: generic buffer recycling New hardware/drivers: - mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa) - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU) - NXP SJA1110 Automotive Ethernet 10-port switch - Qualcomm QCA8327 switch support (qca8k) - Mikrotik 10/25G NIC (atl1c) Driver changes: - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI) - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx - Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions - Marvell (prestera): - add flower and match all - devlink trap - link aggregation - Netronome (nfp): connection tracking offload - Intel 1GE (igc): add AF_XDP support - Marvell DPU (octeontx2): ingress ratelimit offload - Google vNIC (gve): new ring/descriptor format support - Qualcomm mobile (rmnet & ipa): inline checksum offload support - MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep - Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump - Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying - Micrel PHY (ksz886x/ksz8081): add cable test support" * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
2021-06-28Merge tag 'docs-5.14' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "This was a reasonably active cycle for documentation; this includes: - Some kernel-doc cleanups. That script is still regex onslaught from hell, but it has gotten a little better. - Improvements to the checkpatch docs, which are also used by the tool itself. - A major update to the pathname lookup documentation. - Elimination of :doc: markup, since our automarkup magic can create references from filenames without all the extra noise. - The flurry of Chinese translation activity continues. Plus, of course, the usual collection of updates, typo fixes, and warning fixes" * tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits) docs: path-lookup: use bare function() rather than literals docs: path-lookup: update symlink description docs: path-lookup: update get_link() ->follow_link description docs: path-lookup: update WALK_GET, WALK_PUT desc docs: path-lookup: no get_link() docs: path-lookup: update i_op->put_link and cookie description docs: path-lookup: i_op->follow_link replaced with i_op->get_link docs: path-lookup: Add macro name to symlink limit description docs: path-lookup: remove filename_mountpoint docs: path-lookup: update do_last() part docs: path-lookup: update path_mountpoint() part docs: path-lookup: update path_to_nameidata() part docs: path-lookup: update follow_managed() part docs: Makefile: Use CONFIG_SHELL not SHELL docs: Take a little noise out of the build process docs: x86: avoid using ReST :doc:`foo` markup docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup ...