Age | Commit message | Author
2021-08-25 | lan78xx: Fix white space and style issues | John Efstathiades
Fix white space and code style issues identified by checkpatch. Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | Merge branch 'xen-harden-netfront' | David S. Miller
Juergen Gross says: ==================== xen: harden netfront against malicious backends Xen backends of para-virtualized devices can live in dom0 kernel, dom0 user land, or in a driver domain. This means that a backend might reside in a less trusted environment than the Xen core components, so a backend should not be able to do harm to a Xen guest (it can still mess up I/O data, but it shouldn't be able to e.g. crash a guest by other means or cause a privilege escalation in the guest). Unfortunately netfront in the Linux kernel is fully trusting its backend. This series is fixing netfront in this regard. It was discussed to handle this as a security problem, but the topic was discussed in public before, so it isn't a real secret. It should be mentioned that a similar series has been posted some years ago by Marek Marczykowski-Górecki, but this series has not been applied due to a Xen header not having been available in the Xen git repo at that time. Additionally my series is fixing some more DoS cases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | xen/netfront: don't trust the backend response data blindly | Juergen Gross
Today netfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request. Note that only the tx queue needs special id handling, as for the rx queue the id is equal to the index in the ring page. Introduce a new indicator for the device whether it is broken and let the device stop working when it is set. Set this indicator in case the backend sets any weird data. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
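The validation idea described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct tx_queue, TX_RING_SIZE and handle_tx_response() are hypothetical names, not the netfront driver's data structures. The sketch accepts a response only if its id is in range and refers to an outstanding request, and otherwise marks the device broken so that it stops working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_RING_SIZE 8              /* hypothetical ring size */

struct tx_queue {
    bool pending[TX_RING_SIZE];     /* true while request id is outstanding */
    bool broken;                    /* set once the backend sent weird data */
};

/*
 * Accept a response only if its id is in range and refers to an
 * outstanding request. Anything else marks the device broken, and a
 * broken device refuses all further responses.
 */
bool handle_tx_response(struct tx_queue *q, uint16_t id)
{
    assert(q);

    if (q->broken)
        return false;

    if (id >= TX_RING_SIZE || !q->pending[id]) {
        q->broken = true;
        return false;
    }

    q->pending[id] = false;         /* request is now completed */
    return true;
}
```

A duplicate or out-of-range id is treated the same way as any other inconsistent data: the caller gets false and the queue stays disabled.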
2021-08-25 | xen/netfront: disentangle tx_skb_freelist | Juergen Gross
The tx_skb_freelist elements are in a single linked list with the request id used as link reference. The per element link field is in a union with the skb pointer of an in use request. Move the link reference out of the union in order to enable a later reuse of it for requests which need a populated skb pointer. Rename add_id_to_freelist() and get_id_from_freelist() to add_id_to_list() and get_id_from_list() in order to prepare using those for other lists as well. Define ~0 as value to indicate the end of a list and place that value into the link for a request not being on the list. When freeing a skb zero the skb pointer in the request. Use a NULL value of the skb pointer instead of skb_entry_is_link() for deciding whether a request has a skb linked to it. Remove skb_entry_set_link() and open code it instead as it is really trivial now. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
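A minimal sketch of an id-linked list with the link kept outside the skb pointer, assuming a small fixed slot array. The function names are borrowed from the commit message, but the signatures, struct slot, struct id_list, NUM_SLOTS and LINK_END are assumptions made for illustration and are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 4                 /* hypothetical number of request slots */
#define LINK_END  (~0u)             /* end of list, or "not on any list" */

struct slot {
    void *skb;                      /* NULL when no buffer is attached */
    unsigned int link;              /* next id on the list, or LINK_END */
};

struct id_list {
    unsigned int head;
    struct slot slots[NUM_SLOTS];
};

void list_init(struct id_list *l)
{
    assert(l);
    l->head = LINK_END;
    for (unsigned int i = 0; i < NUM_SLOTS; i++) {
        l->slots[i].skb = NULL;
        l->slots[i].link = LINK_END;
    }
}

/* Push id onto the front of the list. */
void add_id_to_list(struct id_list *l, unsigned int id)
{
    assert(l && id < NUM_SLOTS);
    l->slots[id].link = l->head;
    l->head = id;
}

/* Pop the first id, or return LINK_END if the list is empty. */
unsigned int get_id_from_list(struct id_list *l)
{
    assert(l);
    unsigned int id = l->head;

    if (id != LINK_END) {
        l->head = l->slots[id].link;
        l->slots[id].link = LINK_END;   /* slot is no longer on the list */
    }
    return id;
}
```

Because the link is a separate field, a slot can carry a populated skb pointer and still sit on a list, which is the property the commit message says a later patch relies on.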
2021-08-25 | xen/netfront: don't read data from request on the ring page | Juergen Gross
In order to avoid a malicious backend being able to influence the local processing of a request build the request locally first and then copy it to the ring page. Any reading from the request influencing the processing in the frontend needs to be done on the local instance. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
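The "build locally, then copy" pattern might look roughly like the following. struct tx_request and queue_request() are hypothetical and do not match the Xen ring ABI; the point is only that every later decision reads the private copy, never the slot the backend can write to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request layout, not the real Xen netif structure. */
struct tx_request {
    uint16_t id;
    uint16_t size;
    uint32_t gref;
};

/*
 * Fill in a private copy first, publish it to the shared slot with one
 * struct assignment, and keep using only the private copy afterwards.
 * Returns the size taken from the local instance.
 */
uint16_t queue_request(struct tx_request *ring_slot,
                       uint16_t id, uint16_t size, uint32_t gref)
{
    assert(ring_slot);

    struct tx_request local = { .id = id, .size = size, .gref = gref };

    *ring_slot = local;             /* copy to the (untrusted) ring page */
    return local.size;              /* never re-read from ring_slot */
}
```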
2021-08-25 | xen/netfront: read response from backend only once | Juergen Gross
In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
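As a rough sketch of the read-once approach: snapshot the shared response into a local buffer and validate and use only the snapshot. struct rx_response, MAX_LEN and consume_response() are assumptions for this example; real ring code would also need the appropriate memory barriers, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEN 4096                /* hypothetical upper bound for a payload */

/* Hypothetical response layout, not the real Xen netif structure. */
struct rx_response {
    uint16_t id;
    int16_t  status;                /* negative: error, otherwise length */
};

/*
 * Copy the response out of shared memory in one go, then check and use
 * only the copy. Returns the payload length, or -1 if the data is not
 * within the expected limits.
 */
int consume_response(const struct rx_response *shared,
                     struct rx_response *out)
{
    assert(shared && out);

    struct rx_response local;

    memcpy(&local, shared, sizeof(local));  /* the only read of shared */

    if (local.status < 0 || local.status > MAX_LEN)
        return -1;

    *out = local;
    return local.status;
}
```

If the backend rewrites the shared slot after the copy, the values already handed to the caller through out are unaffected.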
2021-08-25 | net: macb: Add a NULL check on desc_ptp | Harini Katakam
macb_ptp_desc will not return NULL under most circumstances with correct Kconfig and IP design config register. But for the sake of the extreme corner case, check for NULL when using the helper. In case of rx_tstamp, no action is necessary except to return (similar to timestamp disabled) and warn. In case of TX, return -EINVAL to let the skb be freed. Perform this check before marking skb in progress. Fixes coverity warning: (4) Event dereference: Dereferencing a null pointer "desc_ptp" Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
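The ordering described for the TX path (check for NULL first, mark the skb in progress only afterwards) can be illustrated as below. struct desc_ptp_sketch, struct tx_state and tx_timestamp_prepare() are hypothetical stand-ins, not the macb driver's types or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the descriptor and per-skb state. */
struct desc_ptp_sketch {
    unsigned int ts_1;
    unsigned int ts_2;
};

struct tx_state {
    int in_progress;
};

/*
 * Return -EINVAL before any state is modified when the descriptor
 * lookup yielded NULL, so the caller can simply free the skb.
 */
int tx_timestamp_prepare(const struct desc_ptp_sketch *desc,
                         struct tx_state *st)
{
    assert(st);

    if (desc == NULL)
        return -EINVAL;

    st->in_progress = 1;            /* only reached with a valid descriptor */
    return 0;
}
```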
2021-08-25 | qed: Enable automatic recovery on error condition. | Alok Prasad
This patch enables automatic recovery by default in case of various error conditions such as fw assert, hardware error, etc. This also ensures the driver can handle multiple iterations of assertion conditions. Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings | Michael Riesch
This reverts commit 2c896fb02e7f65299646f295a007bda043e0f382 "net: stmmac: dwmac-rk: add pd_gmac support for rk3399" and fixes unbalanced pm_runtime_enable warnings. In the commit to be reverted, support for power management was introduced to the Rockchip glue code. Later, power management support was introduced to the stmmac core code, resulting in multiple invocations of pm_runtime_{enable,disable,get_sync,put_sync}. The multiple invocations happen in rk_gmac_powerup and stmmac_{dvr_probe, resume} as well as in rk_gmac_powerdown and stmmac_{dvr_remove, suspend}, respectively, which are always called in conjunction. Fixes: 5ec55823438e850c91c6b92aec93fb04ebde29e2 ("net: stmmac: add clocks management for gmac driver") Signed-off-by: Michael Riesch <michael.riesch@wolfvision.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | net-next: When a bond has a massive number of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. goes down dramatically. | Gilad Naaman
The source of most of the slowdown is the `dev_addr_lists.c` module, which maintains a linked list of HW addresses. When using IPv6, this list grows for each IPv6 address added on a VLAN, since each IPv6 address has a multicast HW address associated with it. When performing any modification to the involved links, this list is traversed many times, often for nothing, all while holding the RTNL lock. Instead, this patch adds an auxiliary rbtree which cuts down traversal time significantly.

Performance can be seen with the following script:

    #!/bin/bash
    ip netns del test 2>/dev/null || true
    ip netns add test

    echo 1 | ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null

    set -e

    ip -n test link add foo type veth peer name bar
    ip -n test link add b1 type bond
    ip -n test link add florp type vrf table 10

    ip -n test link set bar master b1
    ip -n test link set foo up
    ip -n test link set bar up
    ip -n test link set b1 up
    ip -n test link set florp up

    VLAN_COUNT=1500
    BASE_DEV=b1

    echo Creating vlans
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done"

    echo Bringing them up
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i up; done"

    echo Assiging IPv6 Addresses
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test address add dev foo.\$i 2000::\$i/64; done"

    echo Attaching to VRF
    ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i master florp; done"

On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance before the patch is (truncated):

    Creating vlans
    real 108.35
    Bringing them up
    real 4.96
    Assiging IPv6 Addresses
    real 19.22
    Attaching to VRF
    real 458.84

After the patch:

    Creating vlans
    real 5.59
    Bringing them up
    real 5.07
    Assiging IPv6 Addresses
    real 5.64
    Attaching to VRF
    real 25.37

Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Lu Wei <luwei32@huawei.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 | can: mscan: mpc5xxx_can: mpc5xxx_can_probe(): remove useless BUG_ON() | Tang Bin
In the function mpc5xxx_can_probe(), the variable 'data' has already been determined in the above code, so the BUG_ON() in this place is useless, remove it. Link: https://lore.kernel.org/r/20210823141033.17876-1-tangbin@cmss.chinamobile.com Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-25 | can: mscan: mpc5xxx_can: mpc5xxx_can_probe(): use of_device_get_match_data to simplify code | Tang Bin
To retrieve OF match data, it is better and cleaner to use 'of_device_get_match_data' than 'of_match_device'. Link: https://lore.kernel.org/r/20210823113338.3568-4-tangbin@cmss.chinamobile.com Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-25 | can: rcar_canfd: rcar_canfd_handle_channel_tx(): fix redundant assignment | Lad Prabhakar
Fix redundant assignment of 'priv' to itself in rcar_canfd_handle_channel_tx(). Fixes: 76e9353a80e9 ("can: rcar_canfd: Add support for RZ/G2L family") Link: https://lore.kernel.org/r/20210820161449.18169-1-prabhakar.mahadev-lad.rj@bp.renesas.com Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-25 | can: rcar: Kconfig: Add helper dependency on COMPILE_TEST | Cai Huoqing
It is helpful for compile testing on other platforms (e.g. X86). Link: https://lore.kernel.org/r/20210825062341.2332-1-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-24 | net: phy: mediatek: add the missing suspend/resume callbacks | DENG Qingfang
Without suspend/resume callbacks, the PHY cannot be powered down/up administratively. Fixes: e40d2cca0189 ("net: phy: add MediaTek Gigabit Ethernet PHY driver") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210823044422.164184-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 | net: bridge: change return type of br_handle_ingress_vlan_tunnel | Kangmin Park
br_handle_ingress_vlan_tunnel() is only referenced in br_handle_frame(). If br_handle_ingress_vlan_tunnel() is called and returns a non-zero value, br_handle_frame() jumps to drop. But br_handle_ingress_vlan_tunnel() always returns 0, so the code that checks the return value and jumps to drop has no effect. Therefore, change the return type of br_handle_ingress_vlan_tunnel() to void and remove the if statement from br_handle_frame(). Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210823102118.17966-1-l4stpr0gr4m@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 | selftests/net: Use kselftest skip code for skipped tests | Po-Hsu Lin
Several test cases in the net directory still use exit 0 or exit 1 when they need to be skipped. Use the kselftest framework skip code instead so that the return status can be distinguished. Criterion to filter out what should be fixed in the net directory:

    grep -r "exit [01]" -B1 | grep -i skip

This change might cause some false positives if people are running these test scripts directly and only checking their return codes, which will change from 0 to 4. However, I think the impact should be small as most of our scripts here are already using this skip code. And there will be no such issue if running them with the kselftest framework. Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20210823085854.40216-1-po-hsu.lin@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24 | Merge branch 'Improve XDP samples usability and output' | Alexei Starovoitov
Kumar Kartikeya says:

====================
This set revamps XDP samples related to redirection to show better output and implement missing features consolidating all their differences and giving them a consistent look and feel, by implementing common features and command line options. Some of the TODO items like reporting redirect error numbers (ENETDOWN, EINVAL, ENOSPC, etc.) have also been implemented.

Some of the features are:
* Received packet statistics
* xdp_redirect/xdp_redirect_map tracepoint statistics
* xdp_redirect_err/xdp_redirect_map_err tracepoint statistics (with support for showing exact errno)
* xdp_cpumap_enqueue/xdp_cpumap_kthread tracepoint statistics
* xdp_devmap_xmit tracepoint statistics
* xdp_exception tracepoint statistics
* Per ifindex pair devmap_xmit stats shown dynamically (for xdp_monitor) to decompose the total.
* Use of BPF skeleton and BPF static linking to share BPF programs.
* Use of vmlinux.h and tp_btf for raw_tracepoint support.
* Removal of redundant -N/--native-mode option (enforced by default now)
* ... and massive cleanups all over the place.

All tracepoints also use raw_tp now, and tracepoints like xdp_redirect are only enabled when requested explicitly to capture successful redirection statistics.

The set of programs converted as part of this series are:
* xdp_redirect_cpu
* xdp_redirect_map_multi
* xdp_redirect_map
* xdp_redirect
* xdp_monitor

Explanation of the output:

There is now a concise output mode by default that shows primarily four fields:

    rx/s        Number of packets received per second
    redir/s     Number of packets successfully redirected per second
    err,drop/s  Aggregated count of errors per second (including dropped packets)
    xmit/s      Number of packets transmitted on the output device per second

Some examples:

    ; sudo ./xdp_redirect_map veth0 veth1 -s
    Redirecting from veth0 (ifindex 15; driver veth) to veth1 (ifindex 14; driver veth)
    veth0->veth1 0 rx/s 0 redir/s 0 err,drop/s 0 xmit/s
    veth0->veth1 9,998,660 rx/s 9,998,658 redir/s 0 err,drop/s 9,998,654 xmit/s
    ...

There is also a verbose mode, that can also be enabled by default using -v (--verbose). The output mode can be switched dynamically at runtime using Ctrl + \ (SIGQUIT).

To make the concise output more useful, the errors that occur are expanded inline (as if verbose mode was enabled) to let the user pin down the source of the problem without having to clutter output (or possibly miss it) or always use verbose mode.

For instance, let's consider a case where the output device link state is set to down while redirection is happening:

    [...]
    veth0->veth1 24,503,376 rx/s 0 err,drop/s 24,503,372 xmit/s
    veth0->veth1 25,044,775 rx/s 0 err,drop/s 25,044,783 xmit/s
    veth0->veth1 25,263,046 rx/s 4 err,drop/s 25,263,028 xmit/s
      redirect_err 4 error/s
        ENETDOWN 4 error/s
    [...]

The same holds for xdp_exception actions.

An example of how a complete xdp_redirect_map session would look:

    ; sudo ./xdp_redirect_map veth0 veth1
    Redirecting from veth0 (ifindex 5; driver veth) to veth1 (ifindex 4; driver veth)
    veth0->veth1 7,411,506 rx/s 0 err,drop/s 7,411,470 xmit/s
    veth0->veth1 8,931,770 rx/s 0 err,drop/s 8,931,771 xmit/s
    ^\
    veth0->veth1 8,787,295 rx/s 0 err,drop/s 8,787,325 xmit/s
      receive total 8,787,295 pkt/s 0 drop/s 0 error/s
        cpu:7 8,787,295 pkt/s 0 drop/s 0 error/s
      redirect_err 0 error/s
      xdp_exception 0 hit/s
      xmit veth0->veth1 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
        cpu:7 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
    veth0->veth1 8,842,610 rx/s 0 err,drop/s 8,842,606 xmit/s
      receive total 8,842,610 pkt/s 0 drop/s 0 error/s
        cpu:7 8,842,610 pkt/s 0 drop/s 0 error/s
      redirect_err 0 error/s
      xdp_exception 0 hit/s
      xmit veth0->veth1 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
        cpu:7 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
    ^C
    Packets received : 33,973,181
    Average packets/s : 4,246,648
    Packets transmitted : 33,973,172
    Average transmit/s : 4,246,647

The xdp_redirect tracepoint (for success stats) needs to be enabled explicitly using --stats/-s. Documentation for entire output and options is provided when user specifies --help/-h with a sample.

Changelog:
----------
v3 -> v4:
v3: https://lore.kernel.org/bpf/20210728165552.435050-1-memxor@gmail.com
* Address all feedback from Daniel
* Use READ_ONCE/WRITE_ONCE from linux/compiler.h (cannot directly include due to conflicts with vmlinux.h)
* Fix MAX_CPUS hardcoding by switching to mmapable array maps, that are resized based on the value of libbpf_num_possible_cpus
* s/ELEMENTS_OF/ARRAY_SIZE/g
* Use tools/include/linux/hashtable.h
* Coding style fixes
* Remove hyperlinks for tracepoints
* Split into smaller reviewable changes
* Restore support for specifying custom xdp_redirect_cpu cpumap prog with some enhancements, including built-in programs for common actions (pass, drop, redirect). By default, cpumap prog is now disabled.
* Misc bug fixes all over the place

The printing stuff is a lot more basic without hyperlink support, hence it has not been exported into a more general facility.

v2 -> v3
v2: https://lore.kernel.org/bpf/20210721212833.701342-1-memxor@gmail.com
* Address all feedback from Andrii
* Replace usage of libbpf hashmap (internal API) with custom one
* Rename ATOMIC_* macros to NO_TEAR_* to better reflect their use
* Use size_t as a portable word sized data type
* Set libbpf_set_strict_mode
* Invert conditions in BPF programs to exit early and reduce nesting
* Use canonical SEC("xdp") naming for all XDP BPF programs
* Add missing help description for cpumap enqueue and kthread tracepoints
* Move private struct declarations from xdp_sample_user.h to .c file
* Improve help output for cpumap enqueue and cpumap kthread tracepoints
* Fix a bug where keys array for BPF_MAP_LOOKUP_BATCH is overallocated
* Fix some conditions for printing stats (earlier only checked pps, now pps, drop, err and print if any is greater than zero)
* Fix alloc_stats_record to properly return and cleanup allocated memory on allocation failure instead of calling exit(3)
* Bump bpf_map_lookup_batch count to 32 to reduce lookup time with multiple devices in map
* Fix a bug where devmap_xmit_multi stats are not printed when previous record is missing (i.e. when the first time stats are printed), by simply using a dummy record that is zeroed out
* Also print per-CPU counts for devmap_xmit_multi which we collect already
* Change mac_map to be BPF_MAP_TYPE_HASH instead of array to prevent resizing to a large size when max_ifindex is high, in xdp_redirect_map_multi
* Fix instance of strerror(errno) in sample_install_xdp to use saved errno
* Provide a usage function from samples helper
* Provide a fix where incorrect stats are shown for parallel sessions of xdp_redirect_* samples by introducing matching support for input device(s), output device(s) and cpumap map id for enqueue and kthread stats. Only xdp_monitor doesn't filter stats, all others do.

RFC (v1) -> v2
RFC (v1): https://lore.kernel.org/bpf/20210528235250.2635167-1-memxor@gmail.com
* Address all feedback from Andrii
* Use BPF static linking
* Use vmlinux.h
* Use BPF_PROG macro
* Use global variables instead of maps
* Use of tp_btf for raw_tracepoint progs
* Switch to timerfd for polling
* Use libbpf hashmap for maintaining device sets for per ifindex pair devmap_xmit stats
* Fix Makefile to specify object dependencies properly
* Use in-tree bpftool
* ... misc fixes and cleanups all over the place
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-08-24 | samples: bpf: Convert xdp_redirect_map_multi to XDP samples helper | Kumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Also adapt to change of type of mac address map, so that no resizing is required. Add a new flag for sample mask that skips printing the from_device->to_device heading for each line, as xdp_redirect_map_multi may have two devices but the flow of data may be bidirectional, so the output would be confusing. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-23-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_redirect_map_multi_kern.o to XDP samples helper | Kumar Kartikeya Dwivedi
One of the notable changes is using a BPF_MAP_TYPE_HASH instead of array map to store mac addresses of devices, as the resizing behavior was based on max_ifindex, which unnecessarily maximized the capacity of map beyond what was needed. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-22-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_redirect_map to XDP samples helper | Kumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Since get_mac_addr is already provided by XDP samples helper, we drop it. Also convert to XDP samples helper similar to prior samples to minimize duplication of code. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-21-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_redirect_map_kern.o to XDP samples helper | Kumar Kartikeya Dwivedi
Also update it to use consistent SEC("xdp") and SEC("xdp_devmap") naming, and use global variable instead of BPF map for copying the mac address. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-20-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_redirect_cpu to XDP samples helper | Kumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. Similar to xdp_monitor, xdp_redirect_cpu was quite featureful except a few minor omissions (e.g. redirect errno reporting). All of these have been moved to XDP samples helper, hence drop the unneeded code and convert to usage of helpers provided by it. One of the important changes here is dropping of mprog-disable option, as we make that the default. Also, we support built-in programs for some common actions on the packet when it reaches kthread (pass, drop, redirect to device). If the user still needs to install a custom program, they can still supply a BPF object, however the program should be suitably tagged with SEC("xdp_cpumap") annotation so that the expected attach type is correct when updating our cpumap map element. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-19-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_redirect_cpu_kern.o to XDP samples helper | Kumar Kartikeya Dwivedi
Similar to xdp_monitor_kern, a lot of these BPF programs have been reimplemented properly consolidating missing features from other XDP samples. Hence, drop the unneeded code and rename to .bpf.c suffix. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-18-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_redirect to XDP samples helper | Kumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. One important note: The XDP samples helper handles ownership of installed XDP programs on devices, including responding to SIGINT and SIGTERM, so drop the code here and use the helpers we provide going forward for all xdp_redirect* conversions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_redirect_kern.o to XDP samples helper | Kumar Kartikeya Dwivedi
We moved swap_src_dst_mac to xdp_sample.bpf.h to be shared with other potential users, so drop it while moving code to the new file. Also, consistently use SEC("xdp") naming instead. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-16-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_monitor to XDP samples helper | Kumar Kartikeya Dwivedi
Use the libbpf skeleton facility and other utilities provided by XDP samples helper. A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to the xdp_sample_user.o helper, so we remove the duplicate functions here that are no longer needed. Thanks to BPF skeleton, we no longer depend on order of tracepoints to uninstall them on startup. Instead, the sample mask is used to install the needed tracepoints. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com
2021-08-24 | samples: bpf: Convert xdp_monitor_kern.o to XDP samples helper | Kumar Kartikeya Dwivedi
We already moved all the functionality it provided in XDP samples helper userspace and kernel BPF object, so just delete the unneeded code. We also add generation of BPF skeleton and compilation using clang -target bpf for files ending with .bpf.c suffix (to denote that they use vmlinux.h). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-14-memxor@gmail.com
2021-08-24 | samples: bpf: Add vmlinux.h generation support | Kumar Kartikeya Dwivedi
Also, take this opportunity to depend on in-tree bpftool, so that we can use static linking support in subsequent commits for XDP samples BPF helper object. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-13-memxor@gmail.com
2021-08-24 | samples: bpf: Add devmap_xmit tracepoint statistics support | Kumar Kartikeya Dwivedi
This adds support for retrieval and printing for devmap_xmit total and multi mode tracepoint. For multi mode, we keep a hash map entry for each redirection stream, such that we can dynamically add and remove entries on output. The from_match and to_match will be set by individual samples when setting up the XDP program on these devices. The multi mode tracepoint is also handy for xdp_redirect_map_multi, where up to 32 devices can be specified. Also add samples_init_pre_load macro to finally set up the resized maps and mmap them in place for low overhead stats retrieval. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-12-memxor@gmail.com
2021-08-24 | samples: bpf: Add BPF support for devmap_xmit tracepoint | Kumar Kartikeya Dwivedi
This adds support for the devmap_xmit tracepoint, and its multi device variant that can be used to obtain streams for each individual net_device to net_device redirection. This is useful for decomposing total xmit stats in xdp_monitor. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-11-memxor@gmail.com
2021-08-24 | samples: bpf: Add cpumap tracepoint statistics support | Kumar Kartikeya Dwivedi
This consolidates retrieval and printing into the XDP sample helper. For the kthread stats, it expands xdp_stats separately with its own per-CPU stats. For cpumap enqueue, we display FROM->TO stats also with its per-CPU stats. The help output explains in detail the various aspects of the output. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-10-memxor@gmail.com
2021-08-24 | samples: bpf: Add BPF support for cpumap tracepoints | Kumar Kartikeya Dwivedi
These are invoked in two places: when the XDP frame or SKB (for generic XDP) is enqueued to the ptr_ring (cpumap_enqueue), and when the kthread processes the frame after invoking the CPUMAP program for it (returning stats for the batch). We use cpumap_map_id to filter on the map_id as a way to avoid printing incorrect stats for parallel sessions of xdp_redirect_cpu. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-9-memxor@gmail.com
2021-08-24 | samples: bpf: Add xdp_exception tracepoint statistics support | Kumar Kartikeya Dwivedi
This implements the retrieval and printing, as well as the help output. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-8-memxor@gmail.com
2021-08-24 | samples: bpf: Add BPF support for xdp_exception tracepoint | Kumar Kartikeya Dwivedi
This would allow us to store stats for each XDP action, including their per-CPU counts. Consolidating this here allows all redirect samples to detect xdp_exception events. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-7-memxor@gmail.com
2021-08-24 | samples: bpf: Add redirect tracepoint statistics support | Kumar Kartikeya Dwivedi
This implements per-errno reporting (for the ones we explicitly recognize), adds some help output, and implements the stats retrieval and printing functions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-6-memxor@gmail.com
2021-08-24 | samples: bpf: Add BPF support for redirect tracepoint | Kumar Kartikeya Dwivedi
This adds the shared BPF file that will be used going forward for sharing tracepoint programs among XDP redirect samples. Since vmlinux.h conflicts with tools/include for READ_ONCE/WRITE_ONCE and ARRAY_SIZE, they are copied in to xdp_sample.bpf.h along with other helpers that will be required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-5-memxor@gmail.com
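For readers unfamiliar with the macros mentioned above, a common standalone rendering of READ_ONCE/WRITE_ONCE and ARRAY_SIZE is sketched below. This is an assumption about their general shape, written with the GNU __typeof__ extension, and is not claimed to be the exact text copied into xdp_sample.bpf.h; struct stats_sketch and stats_add() are hypothetical.

```c
#include <assert.h>

/* Force exactly one load or store of x through a volatile access. */
#define READ_ONCE(x)        (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)    ((*(volatile __typeof__(x) *)&(x)) = (v))

/* Number of elements in a true array (not a pointer). */
#define ARRAY_SIZE(a)       (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-CPU style counter record. */
struct stats_sketch {
    unsigned long processed;
};

/* Add n to the counter without letting the compiler tear the access. */
void stats_add(struct stats_sketch *s, unsigned long n)
{
    assert(s);
    WRITE_ONCE(s->processed, READ_ONCE(s->processed) + n);
}
```

Note that this gives no atomicity for the read-modify-write as a whole; it only keeps each individual load and store intact.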
2021-08-24 | samples: bpf: Add basic infrastructure for XDP samples | Kumar Kartikeya Dwivedi
This file implements some common helpers to consolidate differences in features and functionality between the various XDP samples and give them a consistent look, feel, and reporting capabilities. This commit only adds support for receive statistics, which does not rely on any tracepoint, but on the XDP program installed on the device by each XDP redirect sample. Some of the key features are: * A concise output format accompanied by helpful text explaining its fields. * An elaborate output format building upon the concise one, and folding out details in case of errors and staying out of view otherwise. * Printing driver names for devices redirecting packets. * Getting mac address for interface. * Printing summarized total statistics for the entire session. * Ability to dynamically switch between concise and verbose mode, using SIGQUIT (Ctrl + \). In later patches, the support will be extended for each tracepoint with its own custom output in concise and verbose mode. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-4-memxor@gmail.com
2021-08-24tools: include: Add ethtool_drvinfo definition to UAPI headerKumar Kartikeya Dwivedi
Instead of copying the whole header in, just add the struct definitions we need for now. In the future it can be synced as a copy of in-tree header if required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-3-memxor@gmail.com
2021-08-24samples: bpf: Fix a couple of warningsKumar Kartikeya Dwivedi
cookie_uid_helper_example.c: In function ‘main’: cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive writing 10 bytes into a region of size between 8 and 58 [-Wformat-overflow=] 178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT", | ^~~~~~~~~~ /home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note: ‘sprintf’ output between 53 and 103 bytes into a destination of size 100 178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 179 | file); | ~~~~~ Fix by using snprintf and a sufficiently sized buffer. tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of size 11 [-Wstringop-overread] 35 | key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use size as 11. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210821002010.845777-2-memxor@gmail.com
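As a rough sketch of the two fixes described above (not the actual patch), the first helper formats the rule with snprintf and reports truncation, and the escape sequence is stored in an array so its size can be taken with sizeof. `build_rule` and `clear_seq` are hypothetical names; `\033` is used instead of the GNU-specific `\e`.

```c
#include <stddef.h>
#include <stdio.h>

/* Clear-screen sequence from the warning: 10 characters plus the
 * terminating NUL, so sizeof(clear_seq) is 11. */
static const char clear_seq[] = "\033[1;1H\033[2J";

/* Hypothetical helper: format the iptables rule into buf.
 * Returns 0 on success, -1 if the output did not fit (buf is still
 * NUL-terminated when len > 0). */
static int build_rule(char *buf, size_t len, const char *pinned_path)
{
	int n = snprintf(buf, len,
			 "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
			 pinned_path);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Checking the snprintf return value against the buffer length is what turns silent truncation into a detectable error.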
2021-08-24bpf: Fix possible out of bound write in narrow load handlingAndrey Ignatov
Fix a verifier bug found by smatch static checker in [0].

This problem has never been seen in prod to my best knowledge. Fixing it still seems to be a good idea since it's hard to say for sure whether it's possible or not to have a scenario where a combination of convert_ctx_access() and a narrow load would lead to an out of bound write.

When narrow load is handled, one or two new instructions are added to insn_buf array, but before it was only checked that

    cnt >= ARRAY_SIZE(insn_buf)

And it's safe to add a new instruction to insn_buf[cnt++] only once. The second try will lead to out of bound write. And this is what can happen if `shift` is set.

Fix it by making sure that if the BPF_RSH instruction has to be added in addition to BPF_AND then there is enough space for two more instructions in insn_buf.

The full report [0] is below:

kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array

kernel/bpf/verifier.c
    12282
    12283                   insn->off = off & ~(size_default - 1);
    12284                   insn->code = BPF_LDX | BPF_MEM | size_code;
    12285           }
    12286
    12287           target_size = 0;
    12288           cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
    12289                                    &target_size);
    12290           if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                    Bounds check.
    12291               (ctx_field_size && !target_size)) {
    12292                   verbose(env, "bpf verifier is misconfigured\n");
    12293                   return -EINVAL;
    12294           }
    12295
    12296           if (is_narrower_load && size < target_size) {
    12297                   u8 shift = bpf_ctx_narrow_access_offset(
    12298                           off, size, size_default) * 8;
    12299                   if (ctx_field_size <= 4) {
    12300                           if (shift)
    12301                                   insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
                                                     ^^^^^
                                                     increment beyond end of array
    12302                                                                   insn->dst_reg,
    12303                                                                   shift);
--> 12304                           insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
                                             ^^^^^
                                             out of bounds write
    12305                                                           (1 << size * 8) - 1);
    12306                   } else {
    12307                           if (shift)
    12308                                   insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
    12309                                                                   insn->dst_reg,
    12310                                                                   shift);
    12311                           insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
                                    ^^^^^^^^^^^^^^^
                                    Same.
    12312                                                           (1ULL << size * 8) - 1);
    12313                   }
    12314           }
    12315
    12316           new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
    12317           if (!new_prog)
    12318                   return -ENOMEM;
    12319
    12320           delta += cnt - 1;
    12321
    12322           /* keep walking new program and skip insns we just inserted */
    12323           env->prog = new_prog;
    12324           insn = new_prog->insnsi + i + delta;
    12325   }
    12326
    12327   return 0;
    12328 }

[0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/

v1->v2:
- clarify that problem was only seen by static checker but not in prod;

Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com
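The requirement stated in this commit (room for two more slots when a shift is needed, one otherwise) can be modelled in user space as follows. This is a simplified sketch, not the kernel patch: `INSN_BUF_LEN`, `append_narrow_fixup` and `narrow_load` are hypothetical, the buffer holds marker bytes rather than BPF instructions, and the shift is passed in directly instead of being derived with bpf_ctx_narrow_access_offset().

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_BUF_LEN 16 /* assumed capacity for this sketch */

/* Append the optional RSH slot and the AND slot after the cnt slots
 * already in use. Returns the new count, or -1 if the buffer lacks
 * room for everything that must be appended. */
static int append_narrow_fixup(uint8_t *insn_buf, int cnt, unsigned int shift)
{
	int needed = shift ? 2 : 1;

	if (cnt < 0 || cnt + needed > INSN_BUF_LEN)
		return -1;
	if (shift)
		insn_buf[cnt++] = 'R'; /* stands in for BPF_RSH */
	insn_buf[cnt++] = 'A';         /* stands in for BPF_AND */
	return cnt;
}

/* What the RSH + AND pair computes for a 32-bit field: shift the wanted
 * bytes down, then mask to the narrow size (1, 2 or 4 bytes). */
static uint32_t narrow_load(uint32_t field, unsigned int shift, unsigned int size)
{
	uint32_t mask = size >= 4 ? 0xffffffffu : (1u << (size * 8)) - 1;

	return (field >> shift) & mask;
}
```

With 15 slots already used, a request with a non-zero shift is rejected because the second write would land one past the end of the buffer, while a request without a shift still fits.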
2021-08-24Merge branch 'bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG'Alexei Starovoitov
Xu Liu says: ==================== We'd like to be able to identify netns from sk_msg hooks to accelerate local process communication from different netns. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-08-24selftests/bpf: Test for get_netns_cookieXu Liu
Add test to use get_netns_cookie() from BPF_PROG_TYPE_SK_MSG. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820071712.52852-3-liuxu623@gmail.com
2021-08-24bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSGXu Liu
We'd like to be able to identify netns from sk_msg hooks to accelerate local process communication from different netns. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820071712.52852-2-liuxu623@gmail.com
2021-08-24Merge branch 'selftests/bpf: minor fixups'Alexei Starovoitov
Li Zhijian says: ==================== Fix a few issues reported by 0Day/LKP while running selftests/bpf. Changelog: V2: - folded previous similar standalone patch to [1/5], and added acked tag from Song Liu - added acked tag to [2/5], [3/5] from Song Liu - [4/5]: move test_bpftool.py to TEST_PROGS_EXTENDED, files in TEST_GEN_PROGS_EXTENDED are generated by make. Otherwise, it will break out-of-tree install: 'make O=/kselftest-build SKIP_TARGETS= V=1 -C tools/testing/selftests install INSTALL_PATH=/kselftest-install' - [5/5]: new patch ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-08-24selftests/bpf: Exit with KSFT_SKIP if no Makefile foundLi Zhijian
This would happen when we run the tests after installing kselftests: root@lkp-skl-d01 ~# /kselftests/run_kselftest.sh -t bpf:test_doc_build.sh TAP version 13 1..1 # selftests: bpf: test_doc_build.sh perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LC_ADDRESS = "en_US.UTF-8", LC_NAME = "en_US.UTF-8", LC_MONETARY = "en_US.UTF-8", LC_PAPER = "en_US.UTF-8", LC_IDENTIFICATION = "en_US.UTF-8", LC_TELEPHONE = "en_US.UTF-8", LC_MEASUREMENT = "en_US.UTF-8", LC_TIME = "en_US.UTF-8", LC_NUMERIC = "en_US.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). # skip: bpftool files not found! # ok 1 selftests: bpf: test_doc_build.sh # SKIP Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820025549.28325-1-lizhijian@cn.fujitsu.com
2021-08-24selftests/bpf: Add missing files required by test_bpftool.sh for installingLi Zhijian
test_bpftool.sh relies on bpftool and test_bpftool.py. 'make install' will install bpftool to INSTALL_PATH/bpf/bpftool, and export it to PATH so that it can be used after installing. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820015556.23276-5-lizhijian@cn.fujitsu.com
2021-08-24selftests/bpf: Add default bpftool built by selftests to PATHLi Zhijian
For 'make run_tests': selftests will build bpftool into tools/testing/selftests/bpf/tools/sbin/bpftool by default. ================== root@lkp-skl-d01 /opt/rootfs/v5.14-rc4# make -C tools/testing/selftests/bpf run_tests make: Entering directory '/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf' MKDIR include MKDIR libbpf MKDIR bpftool [...] GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/profiler.skel.h CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/prog.o GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pid_iter.skel.h CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pids.o LINK /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/bpftool INSTALL bpftool GEN vmlinux.h [...] # test_feature_dev_json (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel_full (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel_full_vs_not_full (test_bpftool.TestBpftool) ... ERROR # test_feature_macros (test_bpftool.TestBpftool) ... 
Error: bug: failed to retrieve CAP_BPF status: Invalid argument # ERROR # # ====================================================================== # ERROR: test_feature_dev_json (test_bpftool.TestBpftool) # ---------------------------------------------------------------------- # Traceback (most recent call last): # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 57, in wrapper # return f(*args, iface, **kwargs) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 82, in test_feature_dev_json # res = bpftool_json(["feature", "probe", "dev", iface]) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 42, in bpftool_json # res = _bpftool(args) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 34, in _bpftool # return subprocess.check_output(_args) # File "/usr/lib/python3.7/subprocess.py", line 395, in check_output # **kwargs).stdout # File "/usr/lib/python3.7/subprocess.py", line 487, in run # output=stdout, stderr=stderr) # subprocess.CalledProcessError: Command '['bpftool', '-j', 'feature', 'probe', 'dev', 'dummy0']' returned non-zero exit status 255. # ================== Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-4-lizhijian@cn.fujitsu.com
2021-08-24selftests/bpf: Make test_doc_build.sh work from script directoryLi Zhijian
Previously, it fails as below: ------------- root@lkp-skl-d01 /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf# ./test_doc_build.sh ++ realpath --relative-to=/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf ./test_doc_build.sh + SCRIPT_REL_PATH=test_doc_build.sh ++ dirname test_doc_build.sh + SCRIPT_REL_DIR=. ++ realpath /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/./../../../../ + KDIR_ROOT_DIR=/opt/rootfs/v5.14-rc4 + cd /opt/rootfs/v5.14-rc4 + for tgt in docs docs-clean + make -s -C /opt/rootfs/v5.14-rc4/. docs make: *** No rule to make target 'docs'. Stop. + for tgt in docs docs-clean + make -s -C /opt/rootfs/v5.14-rc4/. docs-clean make: *** No rule to make target 'docs-clean'. Stop. ----------- Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-3-lizhijian@cn.fujitsu.com
2021-08-24selftests/bpf: Enlarge select() timeout for test_mapsLi Zhijian
The 0Day robot observed that this test easily times out on a heavily loaded host. ------------------- # selftests: bpf: test_maps # Fork 1024 tasks to 'test_update_delete' # Fork 1024 tasks to 'test_update_delete' # Fork 100 tasks to 'test_hashmap' # Fork 100 tasks to 'test_hashmap_percpu' # Fork 100 tasks to 'test_hashmap_sizes' # Fork 100 tasks to 'test_hashmap_walk' # Fork 100 tasks to 'test_arraymap' # Fork 100 tasks to 'test_arraymap_percpu' # Failed sockmap unexpected timeout not ok 3 selftests: bpf: test_maps # exit=1 # selftests: bpf: test_lru_map # nr_cpus:8 ------------------- Since this test will be scheduled by 0Day on a random host that could have only a few CPUs (2-8), enlarge the timeout to avoid a false failure report. In practice, I tried pinning it to a single CPU with 'taskset 0x01 ./test_maps' and found that 10 seconds is likely enough, but I still prefer a larger value of 30. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-2-lizhijian@cn.fujitsu.com
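For context, a select() call with a configurable timeout looks roughly like the sketch below. It is a standalone illustration rather than the test_maps code; `wait_readable` is a hypothetical helper, and the 30-second value mentioned above would be passed as `timeout_sec`.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait until fd becomes readable or timeout_sec seconds elapse.
 * Returns 1 if readable, 0 on timeout, -1 on error (as select() does
 * for a single descriptor). */
static int wait_readable(int fd, long timeout_sec)
{
	fd_set rfds;
	struct timeval tv;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;
	return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

A larger timeout costs nothing when the descriptor becomes ready quickly, since select() returns as soon as data is available; it only delays the failure path.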