Age | Commit message (Collapse) | Author |
|
Fix white space and code style issues identified by checkpatch.
Signed-off-by: John Efstathiades <john.efstathiades@pebblebay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Juergen Gross says:
====================
xen: harden netfront against malicious backends
Xen backends of para-virtualized devices can live in dom0 kernel, dom0
user land, or in a driver domain. This means that a backend might
reside in a less trusted environment than the Xen core components, so
a backend should not be able to do harm to a Xen guest (it can still
mess up I/O data, but it shouldn't be able to e.g. crash a guest by
other means or cause a privilege escalation in the guest).
Unfortunately netfront in the Linux kernel is fully trusting its
backend. This series is fixing netfront in this regard.
It was discussed to handle this as a security problem, but the topic
was discussed in public before, so it isn't a real secret.
It should be mentioned that a similar series has been posted some years
ago by Marek Marczykowski-Górecki, but this series has not been applied
due to a Xen header not having been available in the Xen git repo at
that time. Additionally my series is fixing some more DoS cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Today netfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.
Note that only the tx queue needs special id handling, as for the rx
queue the id is equal to the index in the ring page.
Introduce a new indicator for the device whether it is broken and let
the device stop working when it is set. Set this indicator in case the
backend sets any weird data.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tx_skb_freelist elements are in a single linked list with the
request id used as link reference. The per element link field is in a
union with the skb pointer of an in use request.
Move the link reference out of the union in order to enable a later
reuse of it for requests which need a populated skb pointer.
Rename add_id_to_freelist() and get_id_from_freelist() to
add_id_to_list() and get_id_from_list() in order to prepare using
those for other lists as well. Define ~0 as value to indicate the end
of a list and place that value into the link for a request not being
on the list.
When freeing a skb zero the skb pointer in the request. Use a NULL
value of the skb pointer instead of skb_entry_is_link() for deciding
whether a request has a skb linked to it.
Remove skb_entry_set_link() and open code it instead as it is really
trivial now.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to avoid a malicious backend being able to influence the local
processing of a request build the request locally first and then copy
it to the ring page. Any reading from the request influencing the
processing in the frontend needs to be done on the local instance.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to avoid problems in case the backend is modifying a response
on the ring page while the frontend has already seen it, just read the
response into a local buffer in one go and then operate on that buffer
only.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
macb_ptp_desc will not return NULL under most circumstances with correct
Kconfig and IP design config register. But for the sake of the extreme
corner case, check for NULL when using the helper. In case of rx_tstamp,
no action is necessary except to return (similar to timestamp disabled)
and warn. In case of TX, return -EINVAL to let the skb be free. Perform
this check before marking skb in progress.
Fixes coverity warning:
(4) Event dereference:
Dereferencing a null pointer "desc_ptp"
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch enables automatic recovery by default in case of various
error condition like fw assert , hardware error etc.
This also ensure driver can handle multiple iteration of assertion
conditions.
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 2c896fb02e7f65299646f295a007bda043e0f382
"net: stmmac: dwmac-rk: add pd_gmac support for rk3399" and fixes
unbalanced pm_runtime_enable warnings.
In the commit to be reverted, support for power management was
introduced to the Rockchip glue code. Later, power management support
was introduced to the stmmac core code, resulting in multiple
invocations of pm_runtime_{enable,disable,get_sync,put_sync}.
The multiple invocations happen in rk_gmac_powerup and
stmmac_{dvr_probe, resume} as well as in rk_gmac_powerdown and
stmmac_{dvr_remove, suspend}, respectively, which are always called
in conjunction.
Fixes: 5ec55823438e850c91c6b92aec93fb04ebde29e2 ("net: stmmac: add clocks management for gmac driver")
Signed-off-by: Michael Riesch <michael.riesch@wolfvision.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.
The source of most of the slow down is the `dev_addr_lists.c` module,
which mainatins a linked list of HW addresses.
When using IPv6, this list grows for each IPv6 address added on a
VLAN, since each IPv6 address has a multicast HW address associated with
it.
When performing any modification to the involved links, this list is
traversed many times, often for nothing, all while holding the RTNL
lock.
Instead, this patch adds an auxilliary rbtree which cuts down
traversal time significantly.
Performance can be seen with the following script:
#!/bin/bash
ip netns del test || true 2>/dev/null
ip netns add test
echo 1 | ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null
set -e
ip -n test link add foo type veth peer name bar
ip -n test link add b1 type bond
ip -n test link add florp type vrf table 10
ip -n test link set bar master b1
ip -n test link set foo up
ip -n test link set bar up
ip -n test link set b1 up
ip -n test link set florp up
VLAN_COUNT=1500
BASE_DEV=b1
echo Creating vlans
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done"
echo Bringing them up
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test link set foo.\$i up; done"
echo Assiging IPv6 Addresses
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test address add dev foo.\$i 2000::\$i/64; done"
echo Attaching to VRF
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test link set foo.\$i master florp; done"
On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance
before the patch is (truncated):
Creating vlans
real 108.35
Bringing them up
real 4.96
Assiging IPv6 Addresses
real 19.22
Attaching to VRF
real 458.84
After the patch:
Creating vlans
real 5.59
Bringing them up
real 5.07
Assiging IPv6 Addresses
real 5.64
Attaching to VRF
real 25.37
Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Lu Wei <luwei32@huawei.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the function mpc5xxx_can_probe(), the variable 'data' has already
been determined in the above code, so the BUG_ON() in this place is
useless, remove it.
Link: https://lore.kernel.org/r/20210823141033.17876-1-tangbin@cmss.chinamobile.com
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
to simplify code
Retrieve OF match data, it's better and cleaner to use
'of_device_get_match_data' over 'of_match_device'.
Link: https://lore.kernel.org/r/20210823113338.3568-4-tangbin@cmss.chinamobile.com
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Fix redundant assignment of 'priv' to itself in
rcar_canfd_handle_channel_tx().
Fixes: 76e9353a80e9 ("can: rcar_canfd: Add support for RZ/G2L family")
Link: https://lore.kernel.org/r/20210820161449.18169-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
it's helpful for complie test in other platform(e.g.X86)
Link: https://lore.kernel.org/r/20210825062341.2332-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Without suspend/resume callbacks, the PHY cannot be powered down/up
administratively.
Fixes: e40d2cca0189 ("net: phy: add MediaTek Gigabit Ethernet PHY driver")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210823044422.164184-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
br_handle_ingress_vlan_tunnel() is only referenced in
br_handle_frame(). If br_handle_ingress_vlan_tunnel() is called and
return non-zero value, goto drop in br_handle_frame().
But, br_handle_ingress_vlan_tunnel() always return 0. So, the
routines that check the return value and goto drop has no meaning.
Therefore, change return type of br_handle_ingress_vlan_tunnel() to
void and remove if statement of br_handle_frame().
Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210823102118.17966-1-l4stpr0gr4m@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are several test cases in the net directory are still using
exit 0 or exit 1 when they need to be skipped. Use kselftest
framework skip code instead so it can help us to distinguish the
return status.
Criterion to filter out what should be fixed in net directory:
grep -r "exit [01]" -B1 | grep -i skip
This change might cause some false-positives if people are running
these test scripts directly and only checking their return codes,
which will change from 0 to 4. However I think the impact should be
small as most of our scripts here are already using this skip code.
And there will be no such issue if running them with the kselftest
framework.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20210823085854.40216-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kumar Kartikeya says:
====================
This set revamps XDP samples related to redirection to show better output and
implement missing features consolidating all their differences and giving them a
consistent look and feel, by implementing common features and command line
options. Some of the TODO items like reporting redirect error numbers
(ENETDOWN, EINVAL, ENOSPC, etc.) have also been implemented.
Some of the features are:
* Received packet statistics
* xdp_redirect/xdp_redirect_map tracepoint statistics
* xdp_redirect_err/xdp_redirect_map_err tracepoint statistics (with support for
showing exact errno)
* xdp_cpumap_enqueue/xdp_cpumap_kthread tracepoint statistics
* xdp_devmap_xmit tracepoint statistics
* xdp_exception tracepoint statistics
* Per ifindex pair devmap_xmit stats shown dynamically (for xdp_monitor) to
decompose the total.
* Use of BPF skeleton and BPF static linking to share BPF programs.
* Use of vmlinux.h and tp_btf for raw_tracepoint support.
* Removal of redundant -N/--native-mode option (enforced by default now)
* ... and massive cleanups all over the place.
All tracepoints also use raw_tp now, and tracepoints like xdp_redirect
are only enabled when requested explicitly to capture successful redirection
statistics.
The set of programs converted as part of this series are:
* xdp_redirect_cpu
* xdp_redirect_map_multi
* xdp_redirect_map
* xdp_redirect
* xdp_monitor
Explanation of the output:
There is now a concise output mode by default that shows primarily four fields:
rx/s Number of packets received per second
redir/s Number of packets successfully redirected per second
err,drop/s Aggregated count of errors per second (including dropped packets)
xmit/s Number of packets transmitted on the output device per second
Some examples:
; sudo ./xdp_redirect_map veth0 veth1 -s
Redirecting from veth0 (ifindex 15; driver veth) to veth1 (ifindex 14; driver veth)
veth0->veth1 0 rx/s 0 redir/s 0 err,drop/s 0 xmit/s
veth0->veth1 9,998,660 rx/s 9,998,658 redir/s 0 err,drop/s 9,998,654 xmit/s
...
There is also a verbose mode, that can also be enabled by default using -v (--verbose).
The output mode can be switched dynamically at runtime using Ctrl + \ (SIGQUIT).
To make the concise output more useful, the errors that occur are expanded inline
(as if verbose mode was enabled) to let the user pin down the source of the
problem without having to clutter output (or possibly miss it) or always use verbose mode.
For instance, let's consider a case where the output device link state is set to
down while redirection is happening:
[...]
veth0->veth1 24,503,376 rx/s 0 err,drop/s 24,503,372 xmit/s
veth0->veth1 25,044,775 rx/s 0 err,drop/s 25,044,783 xmit/s
veth0->veth1 25,263,046 rx/s 4 err,drop/s 25,263,028 xmit/s
redirect_err 4 error/s
ENETDOWN 4 error/s
[...]
The same holds for xdp_exception actions.
An example of how a complete xdp_redirect_map session would look:
; sudo ./xdp_redirect_map veth0 veth1
Redirecting from veth0 (ifindex 5; driver veth) to veth1 (ifindex 4; driver veth)
veth0->veth1 7,411,506 rx/s 0 err,drop/s 7,411,470 xmit/s
veth0->veth1 8,931,770 rx/s 0 err,drop/s 8,931,771 xmit/s
^\
veth0->veth1 8,787,295 rx/s 0 err,drop/s 8,787,325 xmit/s
receive total 8,787,295 pkt/s 0 drop/s 0 error/s
cpu:7 8,787,295 pkt/s 0 drop/s 0 error/s
redirect_err 0 error/s
xdp_exception 0 hit/s
xmit veth0->veth1 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
cpu:7 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
veth0->veth1 8,842,610 rx/s 0 err,drop/s 8,842,606 xmit/s
receive total 8,842,610 pkt/s 0 drop/s 0 error/s
cpu:7 8,842,610 pkt/s 0 drop/s 0 error/s
redirect_err 0 error/s
xdp_exception 0 hit/s
xmit veth0->veth1 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
cpu:7 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
^C
Packets received : 33,973,181
Average packets/s : 4,246,648
Packets transmitted : 33,973,172
Average transmit/s : 4,246,647
The xdp_redirect tracepoint (for success stats) needs to be enabled explicitly
using --stats/-s. Documentation for entire output and options is provided when
user specifies --help/-h with a sample.
Changelog:
----------
v3 -> v4:
v3: https://lore.kernel.org/bpf/20210728165552.435050-1-memxor@gmail.com
* Address all feedback from Daniel
* Use READ_ONCE/WRITE_ONCE from linux/compiler.h (cannot directly include
due to conflicts with vmlinux.h)
* Fix MAX_CPUS hardcoding by switching to mmapable array maps, that are
resized based on the value of libbpf_num_possible_cpus
* s/ELEMENTS_OF/ARRAY_SIZE/g
* Use tools/include/linux/hashtable.h
* Coding style fixes
* Remove hyperlinks for tracepoints
* Split into smaller reviewable changes
* Restore support for specifying custom xdp_redirect_cpu cpumap prog with some
enhancements, including built-in programs for common actions (pass, drop,
redirect). By default, cpumap prog is now disabled.
* Misc bug fixes all over the place
The printing stuff is a lot more basic without hyperlink support, hence it
has not been exported into a more general facility.
v2 -> v3
v2: https://lore.kernel.org/bpf/20210721212833.701342-1-memxor@gmail.com
* Address all feedback from Andrii
* Replace usage of libbpf hashmap (internal API) with custom one
* Rename ATOMIC_* macros to NO_TEAR_* to better reflect their use
* Use size_t as a portable word sized data type
* Set libbpf_set_strict_mode
* Invert conditions in BPF programs to exit early and reduce nesting
* Use canonical SEC("xdp") naming for all XDP BPF progams
* Add missing help description for cpumap enqueue and kthread tracepoints
* Move private struct declarations from xdp_sample_user.h to .c file
* Improve help output for cpumap enqueue and cpumap kthread tracepoints
* Fix a bug where keys array for BPF_MAP_LOOKUP_BATCH is overallocated
* Fix some conditions for printing stats (earlier only checked pps, now pps,
drop, err and print if any is greater than zero)
* Fix alloc_stats_record to properly return and cleanup allocated memory on
allocation failure instead of calling exit(3)
* Bump bpf_map_lookup_batch count to 32 to reduce lookup time with multiple
devices in map
* Fix a bug where devmap_xmit_multi stats are not printed when previous record
is missing (i.e. when the first time stats are printed), by simply using a
dummy record that is zeroed out
* Also print per-CPU counts for devmap_xmit_multi which we collect already
* Change mac_map to be BPF_MAP_TYPE_HASH instead of array to prevent resizing
to a large size when max_ifindex is high, in xdp_redirect_map_multi
* Fix instance of strerror(errno) in sample_install_xdp to use saved errno
* Provide a usage function from samples helper
* Provide a fix where incorrect stats are shown for parallel sessions of
xdp_redirect_* samples by introducing matching support for input device(s),
output device(s) and cpumap map id for enqueue and kthread stats.
Only xdp_monitor doesn't filter stats, all others do.
RFC (v1) -> v2
RFC (v1): https://lore.kernel.org/bpf/20210528235250.2635167-1-memxor@gmail.com
* Address all feedback from Andrii
* Use BPF static linking
* Use vmlinux.h
* Use BPF_PROG macro
* Use global variables instead of maps
* Use of tp_btf for raw_tracepoint progs
* Switch to timerfd for polling
* Use libbpf hashmap for maintaing device sets for per ifindex pair
devmap_xmit stats
* Fix Makefile to specify object dependencies properly
* Use in-tree bpftool
* ... misc fixes and cleanups all over the place
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper. Also adapt to change of type of mac address map, so that
no resizing is required.
Add a new flag for sample mask that skips priting the
from_device->to_device heading for each line, as xdp_redirect_map_multi
may have two devices but the flow of data may be bidirectional, so the
output would be confusing.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-23-memxor@gmail.com
|
|
One of the notable changes is using a BPF_MAP_TYPE_HASH instead of array
map to store mac addresses of devices, as the resizing behavior was
based on max_ifindex, which unecessarily maximized the capacity of map
beyond what was needed.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-22-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
Since get_mac_addr is already provided by XDP samples helper, we drop
it. Also convert to XDP samples helper similar to prior samples to
minimize duplication of code.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-21-memxor@gmail.com
|
|
Also update it to use consistent SEC("xdp") and SEC("xdp_devmap")
naming, and use global variable instead of BPF map for copying the mac
address.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-20-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
Similar to xdp_monitor, xdp_redirect_cpu was quite featureful except a
few minor omissions (e.g. redirect errno reporting). All of these have
been moved to XDP samples helper, hence drop the unneeded code and
convert to usage of helpers provided by it.
One of the important changes here is dropping of mprog-disable option,
as we make that the default. Also, we support built-in programs for some
common actions on the packet when it reaches kthread (pass, drop,
redirect to device). If the user still needs to install a custom
program, they can still supply a BPF object, however the program should
be suitably tagged with SEC("xdp_cpumap") annotation so that the
expected attach type is correct when updating our cpumap map element.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-19-memxor@gmail.com
|
|
Similar to xdp_monitor_kern, a lot of these BPF programs have been
reimplemented properly consolidating missing features from other XDP
samples. Hence, drop the unneeded code and rename to .bpf.c suffix.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-18-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
One important note:
The XDP samples helper handles ownership of installed XDP programs on
devices, including responding to SIGINT and SIGTERM, so drop the code
here and use the helpers we provide going forward for all xdp_redirect*
conversions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com
|
|
We moved swap_src_dst_mac to xdp_sample.bpf.h to be shared with other
potential users, so drop it while moving code to the new file.
Also, consistently use SEC("xdp") naming instead.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-16-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to
the xdp_sample_user.o helper, so we remove the duplicate functions here
that are no longer needed.
Thanks to BPF skeleton, we no longer depend on order of tracepoints to
uninstall them on startup. Instead, the sample mask is used to install
the needed tracepoints.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com
|
|
We already moved all the functionality it provided in XDP samples helper
userspace and kernel BPF object, so just delete the unneeded code.
We also add generation of BPF skeleton and compilation using clang
-target bpf for files ending with .bpf.c suffix (to denote that they use
vmlinux.h).
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-14-memxor@gmail.com
|
|
Also, take this opportunity to depend on in-tree bpftool, so that we can
use static linking support in subsequent commits for XDP samples BPF
helper object.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-13-memxor@gmail.com
|
|
This adds support for retrieval and printing for devmap_xmit total and
mutli mode tracepoint. For multi mode, we keep a hash map entry for each
redirection stream, such that we can dynamically add and remove entries
on output.
The from_match and to_match will be set by individual samples when
setting up the XDP program on these devices.
The multi mode tracepoint is also handy for xdp_redirect_map_multi,
where up to 32 devices can be specified.
Also add samples_init_pre_load macro to finally set up the resized maps
and mmap them in place for low overhead stats retrieval.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-12-memxor@gmail.com
|
|
This adds support for the devmap_xmit tracepoint, and its multi device
variant that can be used to obtain streams for each individual
net_device to net_device redirection. This is useful for decomposing
total xmit stats in xdp_monitor.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-11-memxor@gmail.com
|
|
This consolidates retrieval and printing into the XDP sample helper. For
the kthread stats, it expands xdp_stats separately with its own per-CPU
stats. For cpumap enqueue, we display FROM->TO stats also with its
per-CPU stats.
The help out explains in detail the various aspects of the output.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-10-memxor@gmail.com
|
|
These are invoked in two places, when the XDP frame or SKB (for generic
XDP) enqueued to the ptr_ring (cpumap_enqueue) and when kthread processes
the frame after invoking the CPUMAP program for it (returning stats for
the batch).
We use cpumap_map_id to filter on the map_id as a way to avoid printing
incorrect stats for parallel sessions of xdp_redirect_cpu.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-9-memxor@gmail.com
|
|
This implements the retrieval and printing, as well the help output.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-8-memxor@gmail.com
|
|
This would allow us to store stats for each XDP action, including their
per-CPU counts. Consolidating this here allows all redirect samples to
detect xdp_exception events.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-7-memxor@gmail.com
|
|
This implements per-errno reporting (for the ones we explicitly
recognize), adds some help output, and implements the stats retrieval
and printing functions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-6-memxor@gmail.com
|
|
This adds the shared BPF file that will be used going forward for
sharing tracepoint programs among XDP redirect samples.
Since vmlinux.h conflicts with tools/include for READ_ONCE/WRITE_ONCE
and ARRAY_SIZE, they are copied in to xdp_sample.bpf.h along with other
helpers that will be required.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-5-memxor@gmail.com
|
|
This file implements some common helpers to consolidate differences in
features and functionality between the various XDP samples and give them
a consistent look, feel, and reporting capabilities.
This commit only adds support for receive statistics, which does not
rely on any tracepoint, but on the XDP program installed on the device
by each XDP redirect sample.
Some of the key features are:
* A concise output format accompanied by helpful text explaining its
fields.
* An elaborate output format building upon the concise one, and folding
out details in case of errors and staying out of view otherwise.
* Printing driver names for devices redirecting packets.
* Getting mac address for interface.
* Printing summarized total statistics for the entire session.
* Ability to dynamically switch between concise and verbose mode, using
SIGQUIT (Ctrl + \).
In later patches, the support will be extended for each tracepoint with
its own custom output in concise and verbose mode.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-4-memxor@gmail.com
|
|
Instead of copying the whole header in, just add the struct definitions
we need for now. In the future it can be synced as a copy of in-tree
header if required.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-3-memxor@gmail.com
|
|
cookie_uid_helper_example.c: In function ‘main’:
cookie_uid_helper_example.c:178:69: warning: ‘ -j ACCEPT’ directive
writing 10 bytes into a region of size between 8 and 58
[-Wformat-overflow=]
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~
/home/kkd/src/linux/samples/bpf/cookie_uid_helper_example.c:178:9: note:
‘sprintf’ output between 53 and 103 bytes into a destination of size 100
178 | sprintf(rules, "iptables -A OUTPUT -m bpf --object-pinned %s -j ACCEPT",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179 | file);
| ~~~~~
Fix by using snprintf and a sufficiently sized buffer.
tracex4_user.c:35:15: warning: ‘write’ reading 12 bytes from a region of
size 11 [-Wstringop-overread]
35 | key = write(1, "\e[1;1H\e[2J", 12); /* clear screen */
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use size as 11.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-2-memxor@gmail.com
|
|
Fix a verifier bug found by smatch static checker in [0].
This problem has never been seen in prod to my best knowledge. Fixing it
still seems to be a good idea since it's hard to say for sure whether
it's possible or not to have a scenario where a combination of
convert_ctx_access() and a narrow load would lead to an out of bound
write.
When narrow load is handled, one or two new instructions are added to
insn_buf array, but before it was only checked that
cnt >= ARRAY_SIZE(insn_buf)
And it's safe to add a new instruction to insn_buf[cnt++] only once. The
second try will lead to out of bound write. And this is what can happen
if `shift` is set.
Fix it by making sure that if the BPF_RSH instruction has to be added in
addition to BPF_AND then there is enough space for two more instructions
in insn_buf.
The full report [0] is below:
kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array
kernel/bpf/verifier.c
12282
12283 insn->off = off & ~(size_default - 1);
12284 insn->code = BPF_LDX | BPF_MEM | size_code;
12285 }
12286
12287 target_size = 0;
12288 cnt = convert_ctx_access(type, insn, insn_buf, env->prog,
12289 &target_size);
12290 if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bounds check.
12291 (ctx_field_size && !target_size)) {
12292 verbose(env, "bpf verifier is misconfigured\n");
12293 return -EINVAL;
12294 }
12295
12296 if (is_narrower_load && size < target_size) {
12297 u8 shift = bpf_ctx_narrow_access_offset(
12298 off, size, size_default) * 8;
12299 if (ctx_field_size <= 4) {
12300 if (shift)
12301 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
^^^^^
increment beyond end of array
12302 insn->dst_reg,
12303 shift);
--> 12304 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
^^^^^
out of bounds write
12305 (1 << size * 8) - 1);
12306 } else {
12307 if (shift)
12308 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
12309 insn->dst_reg,
12310 shift);
12311 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
^^^^^^^^^^^^^^^
Same.
12312 (1ULL << size * 8) - 1);
12313 }
12314 }
12315
12316 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
12317 if (!new_prog)
12318 return -ENOMEM;
12319
12320 delta += cnt - 1;
12321
12322 /* keep walking new program and skip insns we just inserted */
12323 env->prog = new_prog;
12324 insn = new_prog->insnsi + i + delta;
12325 }
12326
12327 return 0;
12328 }
[0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/
v1->v2:
- clarify that problem was only seen by static checker but not in prod;
Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com
|
|
Xu Liu says:
====================
We'd like to be able to identify netns from sk_msg hooks
to accelerate local process communication form different netns.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add test to use get_netns_cookie() from BPF_PROG_TYPE_SK_MSG.
Signed-off-by: Xu Liu <liuxu623@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820071712.52852-3-liuxu623@gmail.com
|
|
We'd like to be able to identify netns from sk_msg hooks
to accelerate local process communication form different netns.
Signed-off-by: Xu Liu <liuxu623@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820071712.52852-2-liuxu623@gmail.com
|
|
Li Zhijian says:
====================
Fix a few issues reported by 0Day/LKP during runing selftests/bpf.
Changelog:
V2:
- folded previous similar standalone patch to [1/5], and add acked tag
from Song Liu
- add acked tag to [2/5], [3/5] from Song Liu
- [4/5]: move test_bpftool.py to TEST_PROGS_EXTENDED, files in TEST_GEN_PROGS_EXTENDED
are generated by make. Otherwise, it will break out-of-tree install:
'make O=/kselftest-build SKIP_TARGETS= V=1 -C tools/testing/selftests install INSTALL_PATH=/kselftest-install'
- [5/5]: new patch
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This would happend when we run the tests after install kselftests
root@lkp-skl-d01 ~# /kselftests/run_kselftest.sh -t bpf:test_doc_build.sh
TAP version 13
1..1
# selftests: bpf: test_doc_build.sh
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_ADDRESS = "en_US.UTF-8",
LC_NAME = "en_US.UTF-8",
LC_MONETARY = "en_US.UTF-8",
LC_PAPER = "en_US.UTF-8",
LC_IDENTIFICATION = "en_US.UTF-8",
LC_TELEPHONE = "en_US.UTF-8",
LC_MEASUREMENT = "en_US.UTF-8",
LC_TIME = "en_US.UTF-8",
LC_NUMERIC = "en_US.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
# skip: bpftool files not found!
#
ok 1 selftests: bpf: test_doc_build.sh # SKIP
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820025549.28325-1-lizhijian@cn.fujitsu.com
|
|
test_bpftool.sh relies on bpftool and test_bpftool.py.
'make install' will install bpftool to INSTALL_PATH/bpf/bpftool, and
export it to PATH so that it can be used after installing.
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210820015556.23276-5-lizhijian@cn.fujitsu.com
|
|
For 'make run_tests':
selftests will build bpftool into tools/testing/selftests/bpf/tools/sbin/bpftool
by default.
==================
root@lkp-skl-d01 /opt/rootfs/v5.14-rc4# make -C tools/testing/selftests/bpf run_tests
make: Entering directory '/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf'
MKDIR include
MKDIR libbpf
MKDIR bpftool
[...]
GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/profiler.skel.h
CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/prog.o
GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pid_iter.skel.h
CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pids.o
LINK /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/bpftool
INSTALL bpftool
GEN vmlinux.h
[...]
# test_feature_dev_json (test_bpftool.TestBpftool) ... ERROR
# test_feature_kernel (test_bpftool.TestBpftool) ... ERROR
# test_feature_kernel_full (test_bpftool.TestBpftool) ... ERROR
# test_feature_kernel_full_vs_not_full (test_bpftool.TestBpftool) ... ERROR
# test_feature_macros (test_bpftool.TestBpftool) ... Error: bug: failed to retrieve CAP_BPF status: Invalid argument
# ERROR
#
# ======================================================================
# ERROR: test_feature_dev_json (test_bpftool.TestBpftool)
# ----------------------------------------------------------------------
# Traceback (most recent call last):
# File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 57, in wrapper
# return f(*args, iface, **kwargs)
# File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 82, in test_feature_dev_json
# res = bpftool_json(["feature", "probe", "dev", iface])
# File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 42, in bpftool_json
# res = _bpftool(args)
# File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 34, in _bpftool
# return subprocess.check_output(_args)
# File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
# **kwargs).stdout
# File "/usr/lib/python3.7/subprocess.py", line 487, in run
# output=stdout, stderr=stderr)
# subprocess.CalledProcessError: Command '['bpftool', '-j', 'feature', 'probe', 'dev', 'dummy0']' returned non-zero exit status 255.
#
==================
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210820015556.23276-4-lizhijian@cn.fujitsu.com
|
|
Previously, it fails as below:
-------------
root@lkp-skl-d01 /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf# ./test_doc_build.sh
++ realpath --relative-to=/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf ./test_doc_build.sh
+ SCRIPT_REL_PATH=test_doc_build.sh
++ dirname test_doc_build.sh
+ SCRIPT_REL_DIR=.
++ realpath /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/./../../../../
+ KDIR_ROOT_DIR=/opt/rootfs/v5.14-rc4
+ cd /opt/rootfs/v5.14-rc4
+ for tgt in docs docs-clean
+ make -s -C /opt/rootfs/v5.14-rc4/. docs
make: *** No rule to make target 'docs'. Stop.
+ for tgt in docs docs-clean
+ make -s -C /opt/rootfs/v5.14-rc4/. docs-clean
make: *** No rule to make target 'docs-clean'. Stop.
-----------
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210820015556.23276-3-lizhijian@cn.fujitsu.com
|
|
0Day robot observed that it's easily timeout on a heavy load host.
-------------------
# selftests: bpf: test_maps
# Fork 1024 tasks to 'test_update_delete'
# Fork 1024 tasks to 'test_update_delete'
# Fork 100 tasks to 'test_hashmap'
# Fork 100 tasks to 'test_hashmap_percpu'
# Fork 100 tasks to 'test_hashmap_sizes'
# Fork 100 tasks to 'test_hashmap_walk'
# Fork 100 tasks to 'test_arraymap'
# Fork 100 tasks to 'test_arraymap_percpu'
# Failed sockmap unexpected timeout
not ok 3 selftests: bpf: test_maps # exit=1
# selftests: bpf: test_lru_map
# nr_cpus:8
-------------------
Since this test will be scheduled by 0Day to a random host that could have
only a few cpus(2-8), enlarge the timeout to avoid a false NG report.
In practice, i tried to pin it to only one cpu by 'taskset 0x01 ./test_maps',
and knew 10S is likely enough, but i still perfer to a larger value 30.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210820015556.23276-2-lizhijian@cn.fujitsu.com
|