Age | Commit message (Collapse) | Author |
|
All map redirect functions except XSK maps convert xdp_buff to xdp_frame
before enqueueing it. So move this conversion of out the map functions
and into xdp_do_redirect(). This removes a bit of duplicated code, but more
importantly it makes it possible to support caller-allocated xdp_frame
structures, which will be added in a subsequent commit.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103150812.87914-5-toke@redhat.com
|
|
Store the XDP mem ID inside the page_pool struct so it can be retrieved
later for use in bpf_prog_run().
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20220103150812.87914-4-toke@redhat.com
|
|
Add a new callback function to page_pool that, if set, will be called every
time a new page is allocated. This will be used from bpf_test_run() to
initialise the page data with the data provided by userspace when running
XDP programs with redirect turned on.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20220103150812.87914-3-toke@redhat.com
|
|
The functions that register an XDP memory model take a struct xdp_rxq as
parameter, but the RXQ is not actually used for anything other than pulling
out the struct xdp_mem_info that it embeds. So refactor the register
functions and export variants that just take a pointer to the xdp_mem_info.
This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a
page_pool instance that is not connected to any network device.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com
|
|
Ong Boon says:
====================
First of all, sorry for taking more time to get back to this series and
thanks to all valuble feedback in series-1 at [1] from Jesper and Song
Liu.
Since then I have looked into what Jesper suggested in [2] and worked on
revising the patch series into several patches for ease of review:
v1->v2:
1/7: [No change]. Add VLAN tag (ID & Priority) to the generated Tx-Only
frames.
2/7: [No change]. Add DMAC and SMAC setting to the generated Tx-Only
frames. If parameters are not set, previous DMAC and SMAC are used.
3/7: [New]. Add support for selecting different CLOCK for clock_gettime()
used in get_nsecs.
4/7: [New]. This is a total rework from series-1 3/4-patch [3]. It uses
clock_nanosleep() suggested by Jesper. In addition, added statistic
for Tx schedule variance under application stat (-a|--app-stats).
Make the cyclic Tx operation and --poll mode to be mutually-
exclusive. Still, the ability to specify TX cycle time and used
together with batch size and packet count remain the same.
5/7: [New]. Add the support for TX process schedule policy and priority
setting. By default, SCHED_OTHER policy is used. This too is matching
the schedule policy setting in [2].
6/7: [Change]. This is update from series-1 4/4-patch [4]. Added TX clean
process time-out in 1s granularity with configurable retries count
(-O|--retries).
7/7: [New]. Added timestamp for TX packet following pktgen_hdr format
matching the implementation in [2]. However, the sequence ID remains
the same as it is instead of process schedule diff in [2].
To summarize on what program options have been added with v2 series
using an example below:-
DMAC (-G) = fa:8d:f1:e2:0b:e8
SMAC (-H) = ce:17:07:17:3e:3a
VLAN tagged (-V)
VLAN ID (-J) = 12
VLAN Pri (-K) = 3
Tx Queue (-q) = 3
Cycle Time in us (-T) = 1000
Batch (-b) = 2
Packet Count = 6
Tx schedule policy (-W) = FIFO
Tx schedule priority (-U) = 50
Clock selection (-w) = REALTIME
Tx timeout retries(-O) = 5
Tx timestamp (-y)
Cyclic Tx schedule stat (-a)
Note: xdpsock sets UDP dest-port and src-port to 0x1000 as default.
Sending Board
=============
$ xdpsock -i eth0 -t -N -z -H ce:17:07:17:3e:3a -G fa:8d:f1:e2:0b:e8 \
-V -J 12 -K 3 -q 3 \
-T 1000 -b 2 -C 6 -W FIFO -U 50 -w REALTIME \
-O 5 -y -a
sock0@eth0:3 txonly xdp-drv
pps pkts 0.00
rx 0 0
tx 0 6
calls/s count
rx empty polls 0 0
fill fail polls 0 0
copy tx sendtos 0 0
tx wakeup sendtos 0 5
opt polls 0 0
period min ave max cycle
Cyclic TX 1000000 31033 32009 33397 3
Receiving Board
===============
$ tcpdump -nei eth0 udp port 0x1000 -vv -Q in -X \
--time-stamp-precision nano
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
03:46:40.520111580 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~....
0x0010: 0a0a 0a20 1000 1000 0018 e997 be9b e955 ...............U
0x0020: 0000 0000 61cd 2ba1 0006 987c ....a.+....|
03:46:40.520112163 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~....
0x0010: 0a0a 0a20 1000 1000 0018 e996 be9b e955 ...............U
0x0020: 0000 0001 61cd 2ba1 0006 987c ....a.+....|
03:46:40.521066860 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~....
0x0010: 0a0a 0a20 1000 1000 0018 e5af be9b e955 ...............U
0x0020: 0000 0002 61cd 2ba1 0006 9c62 ....a.+....b
03:46:40.521067012 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~....
0x0010: 0a0a 0a20 1000 1000 0018 e5ae be9b e955 ...............U
0x0020: 0000 0003 61cd 2ba1 0006 9c62 ....a.+....b
03:46:40.522061935 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~....
0x0010: 0a0a 0a20 1000 1000 0018 e1c5 be9b e955 ...............U
0x0020: 0000 0004 61cd 2ba1 0006 a04a ....a.+....J
03:46:40.522062173 ce:17:07:17:3e:3a > fa:8d:f1:e2:0b:e8, ethertype 802.1Q (0x8100), length 62: vlan 12, p 3, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 44)
10.10.10.16.4096 > 10.10.10.32.4096: [udp sum ok] UDP, length 16
0x0000: 4500 002c 0000 0000 4011 527e 0a0a 0a10 E..,....@.R~....
0x0010: 0a0a 0a20 1000 1000 0018 e1c4 be9b e955 ...............U
0x0020: 0000 0005 61cd 2ba1 0006 a04a ....a.+....J
I have tested the above with both tagged and untagged packet format and
based on the timestamp in tcpdump found that the timing of the batch
cyclic transmission is correct.
Appreciate if community can give the patch series v2 a try and point out
any gap.
Thanks
Boon Leong
[1] https://patchwork.kernel.org/project/netdevbpf/cover/20211124091821.3916046-1-boon.leong.ong@intel.com/
[2] https://github.com/netoptimizer/network-testing/blob/master/src/udp_pacer.c
[3] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-4-boon.leong.ong@intel.com/
[4] https://patchwork.kernel.org/project/netdevbpf/patch/20211124091821.3916046-5-boon.leong.ong@intel.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
It may be useful to add timestamp for Tx packets for continuous or cyclic
transmit operation. The timestamp and sequence ID of a Tx packet are
stored according to pktgen header format. To enable per-packet timestamp,
use -y|--tstamp option. If timestamp is off, pktgen header is not
included in the UDP payload. This means receiving side can use the magic
number for pktgen for differentiation.
The implementation supports both VLAN tagged and untagged option. By
default, the minimum packet size is set at 64B. However, if VLAN tagged
is on (-V), the minimum packet size is increased to 66B just so to fit
the pktgen_hdr size.
Added hex_dump() into the code path just for future cross-checking.
As before, simply change to "#define DEBUG_HEXDUMP 1" to inspect the
accuracy of TX packet.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-8-boon.leong.ong@intel.com
|
|
When user sets tx-pkt-count and in case where there are invalid Tx frame,
the complete_tx_only_all() process polls indefinitely. So, this patch
adds a time-out mechanism into the process so that the application
can terminate automatically after it retries 3*polling interval duration.
v1->v2:
Thanks to Jesper's and Song Liu's suggestion.
- clean-up git message to remove polling log
- make the Tx time-out retries configurable with 1s granularity
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-7-boon.leong.ong@intel.com
|
|
By default, TX schedule policy is SCHED_OTHER (round-robin time-sharing).
To improve TX cyclic scheduling, we add SCHED_FIFO policy and its priority
by using -W FIFO or --policy=FIFO and -U <PRIO> or --schpri=<PRIO>.
A) From xdpsock --app-stats, for SCHED_OTHER policy:
$ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a
period min ave max cycle
Cyclic TX 1000000 53507 75334 712642 6250
B) For SCHED_FIFO policy and schpri=50:
$ xdpsock -i eth0 -t -N -z -T 1000 -b 16 -C 100000 -a -W FIFO -U 50
period min ave max cycle
Cyclic TX 1000000 3699 24859 54397 6250
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-6-boon.leong.ong@intel.com
|
|
Tx cycle time is in micro-seconds unit. By combining the batch size (-b M)
and Tx cycle time (-T|--tx-cycle N), xdpsock now can transmit batch-size of
packets every N-us periodically. Cyclic TX operation is not applicable if
--poll mode is used.
To transmit 16 packets every 1ms cycle time for total of 100000 packets
silently:
$ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000
To print cyclic TX schedule variance stats, use --app-stats|-a:
$ xdpsock -i eth0 -T -N -z -T 1000 -b 16 -C 100000 -a
sock0@eth0:0 txonly xdp-drv
pps pkts 0.00
rx 0 0
tx 0 100000
calls/s count
rx empty polls 0 0
fill fail polls 0 0
copy tx sendtos 0 0
tx wakeup sendtos 0 6254
opt polls 0 0
period min ave max cycle
Cyclic TX 1000000 53507 75334 712642 6250
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-5-boon.leong.ong@intel.com
|
|
User specifies the clock selection by using -w CLOCK or --clock=CLOCK
where CLOCK=[REALTIME, TAI, BOOTTIME, MONOTONIC].
The default CLOCK selection is MONOTONIC.
The implementation of clock selection parsing is borrowed from
iproute2/tc/q_taprio.c
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211230035447.523177-4-boon.leong.ong@intel.com
|
|
To set Dest MAC address (-G|--tx-dmac) only:
$ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff
To set Source MAC address (-H|--tx-smac) only:
$ xdpsock -i eth0 -t -N -z -H 11:22:33:44:55:66
To set both Dest and Source MAC address:
$ xdpsock -i eth0 -t -N -z -G aa:bb:cc:dd:ee:ff \
-H 11:22:33:44:55:66
The default Dest and Source MAC address remain the same as before.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20211230035447.523177-3-boon.leong.ong@intel.com
|
|
In multi-queue environment testing, the support for VLAN-tag based
steering is useful. So, this patch adds the capability to add
VLAN tag (VLAN ID and Priority) to the generated Tx frame.
To set the VLAN ID=10 and Priority=2 for Tx only through TxQ=3:
$ xdpsock -i eth0 -t -N -z -q 3 -V -J 10 -K 2
If VLAN ID (-J) and Priority (-K) is set, it default to
VLAN ID = 1
VLAN Priority = 0.
For example, VLAN-tagged Tx only, xdp copy mode through TxQ=1:
$ xdpsock -i eth0 -t -N -c -q 1 -V
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211230035447.523177-2-boon.leong.ong@intel.com
|
|
Aleksander Jan Bajkowski says:
====================
net: lantiq_xrx200: improve ethernet performance
This patchset improves Ethernet performance by 15%.
NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500):
Down Up
Before 539 Mbps 599 Mbps
After 624 Mbps 695 Mbps
====================
Link: https://lore.kernel.org/r/20220104151144.181736-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can increase the efficiency of rx path by using buffers to receive
packets then build SKBs around them just before passing into the network
stack. In contrast, preallocating SKBs too early reduces CPU cache
efficiency.
NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500):
Down Up
Before 577 Mbps 648 Mbps
After 624 Mbps 695 Mbps
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500):
Down Up
Before 545 Mbps 625 Mbps
After 577 Mbps 648 Mbps
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NAT Performance results on BT Home Hub 5A (kernel 5.10.89, mtu 1500):
Down Up
Before 539 Mbps 599 Mbps
After 545 Mbps 625 Mbps
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the -L option of the testptp utility is specified with other
options (e.g. -p to enable PPS output), the user probably wants to
apply it to the pin configured by the -L option.
Reorder the code to set the pin function before other function requests
to avoid confusing users.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20220105152506.3256026-1-mlichvar@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
API created with simplistic assumptions about BPF map definitions.
It hasn’t worked for a while, deprecate it in preparation for
libbpf 1.0.
[0] Closes: https://github.com/libbpf/libbpf/issues/302
Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220105003120.2222673-1-christylee@fb.com
|
|
Deprecate bpf_map__is_offload_neutral(). It’s most probably broken
already. PERF_EVENT_ARRAY isn’t the only map that’s not suitable
for hardware offloading. Applications can directly check map type
instead.
[0] Closes: https://github.com/libbpf/libbpf/issues/306
Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220105000601.2090044-1-christylee@fb.com
|
|
If repeated legacy kprobes on same function in one process,
libbpf will register using the same probe name and got -EBUSY
error. So append index to the probe name format to fix this
problem.
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211227130713.66933-2-wangqiang.wq.frank@bytedance.com
|
|
Fix a bug in commit 46ed5fc33db9, which wrongly used the
func_name instead of probe_name to register legacy kprobe.
Fixes: 46ed5fc33db9 ("libbpf: Refactor and simplify legacy kprobe code")
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Qiang Wang <wangqiang.wq.frank@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20211227130713.66933-1-wangqiang.wq.frank@bytedance.com
|
|
With perf_buffer__poll() and perf_buffer__consume() APIs available,
there is no reason to expose bpf_perf_event_read_simple() API to
users. If users need custom perf buffer, they could re-implement
the function.
Mark bpf_perf_event_read_simple() and move the logic to a new
static function so it can still be called by other functions in the
same file.
[0] Closes: https://github.com/libbpf/libbpf/issues/310
Signed-off-by: Christy Lee <christylee@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211229204156.13569-1-christylee@fb.com
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch exposes SO_RCVBUF/SO_SNDBUF through bpf_getsockopt().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220104013153.97906-3-kuniyu@amazon.co.jp
|
|
The commit 4057765f2dee ("sock: consistent handling of extreme
SO_SNDBUF/SO_RCVBUF values") added a change to prevent underflow
in setsockopt() around SO_SNDBUF/SO_RCVBUF.
This patch adds the same change to _bpf_setsockopt().
Fixes: 4057765f2dee ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220104013153.97906-2-kuniyu@amazon.co.jp
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski"
"Networking fixes, including fixes from bpf, and WiFi. One last pull
request, turns out some of the recent fixes did more harm than good.
Current release - regressions:
- Revert "xsk: Do not sleep in poll() when need_wakeup set", made the
problem worse
- Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in
__fixed_phy_register", broke EPROBE_DEFER handling
- Revert "net: usb: r8152: Add MAC pass-through support for more
Lenovo Docks", broke setups without a Lenovo dock
Current release - new code bugs:
- selftests: set amt.sh executable
Previous releases - regressions:
- batman-adv: mcast: don't send link-local multicast to mcast routers
Previous releases - always broken:
- ipv4/ipv6: check attribute length for RTA_FLOW / RTA_GATEWAY
- sctp: hold endpoint before calling cb in
sctp_transport_lookup_process
- mac80211: mesh: embed mesh_paths and mpp_paths into
ieee80211_if_mesh to avoid complicated handling of sub-object
allocation failures
- seg6: fix traceroute in the presence of SRv6
- tipc: fix a kernel-infoleak in __tipc_sendmsg()"
* tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
selftests: set amt.sh executable
Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"
sfc: The RX page_ring is optional
iavf: Fix limit of total number of queues to active queues of VF
i40e: Fix incorrect netdev's real number of RX/TX queues
i40e: Fix for displaying message regarding NVM version
i40e: fix use-after-free in i40e_sync_filters_subtask()
i40e: Fix to not show opcode msg on unsuccessful VF MAC change
ieee802154: atusb: fix uninit value in atusb_set_extended_addr
mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh
mac80211: initialize variable have_higher_than_11mbit
sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc
netrom: fix copying in user data in nr_setsockopt
udp6: Use Segment Routing Header for dest address if present
icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
seg6: export get_srh() for ICMP handling
Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register"
ipv6: Do cleanup if attribute validation fails in multipath route
ipv6: Continue processing multipath route even if gateway attribute is invalid
net/fsl: Remove leftover definition in xgmac_mdio
...
|
|
Commit bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.")
added support for BPF_FUNC_timer_set_callback to
the __check_func_call() function. The test in __check_func_call() is
flaweed because it can mis-interpret a regular BPF-to-BPF pseudo-call
as a BPF_FUNC_timer_set_callback callback call.
Consider the conditional in the code:
if (insn->code == (BPF_JMP | BPF_CALL) &&
insn->imm == BPF_FUNC_timer_set_callback) {
The BPF_FUNC_timer_set_callback has value 170. This means that if you
have a BPF program that contains a pseudo-call with an instruction delta
of 170, this conditional will be found to be true by the verifier, and
it will interpret the pseudo-call as a callback. This leads to a mess
with the verification of the program because it makes the wrong
assumptions about the nature of this call.
Solution: include an explicit check to ensure that insn->src_reg == 0.
This ensures that calls cannot be mis-interpreted as an async callback
call.
Fixes: bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.")
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220105210150.GH1559@oracle.com
|
|
Add a description for all the modifiers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-7-hch@lst.de
|
|
Add pseudo-code to document all the different BPF_JMP / BPF_JMP64
opcodes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-6-hch@lst.de
|
|
Add pseudo-code to document all the different BPF_ALU / BPF_ALU64
opcodes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-5-hch@lst.de
|
|
Add a description for each opcode class.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-4-hch@lst.de
|
|
Add a little more stucture to the ALU/JMP documentation with sections and
improve the example text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-3-hch@lst.de
|
|
The eBPF instruction set document does not currently document the basic
instruction encoding. Add a section to do that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220103183556.41040-2-hch@lst.de
|
|
Add a new test case with mem_or_null typed register with off > 0 to ensure
it gets rejected by the verifier:
# ./test_verifier 1011
#1009/u check with invalid reg offset 0 OK
#1009/p check with invalid reg offset 0 OK
Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If we ever get to a point again where we convert a bogus looking <ptr>_or_null
typed register containing a non-zero fixed or variable offset, then lets not
reset these bounds to zero since they are not and also don't promote the register
to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to a unknown
register could be an avenue as well, but then if we run into this case it would
allow to leak a kernel pointer this way.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
sock_map_link() is called to update a sockmap entry with a sk. But, if the
sock_map_init_proto() call fails then we return an error to the map_update
op against the sockmap. In the error path though we need to cleanup psock
and dec the refcnt on any programs associated with the map, because we
refcnt them early in the update process to ensure they are pinned for the
psock. (This avoids a race where user deletes programs while also updating
the map with new socks.)
In current code we do the prog refcnt dec explicitely by calling
bpf_prog_put() when the program was found in the map. But, after commit
'38207a5e81230' in this error path we've already done the prog to psock
assignment so the programs have a reference from the psock as well. This
then causes the psock tear down logic, invoked by sk_psock_put() in the
error path, to similarly call bpf_prog_put on the programs there.
To be explicit this logic does the prog->psock assignment:
if (msg_*)
psock_set_prog(...)
Then the error path under the out_progs label does a similar check and
dec with:
if (msg_*)
bpf_prog_put(...)
And the teardown logic sk_psock_put() does ...
psock_set_prog(msg_*, NULL)
... triggering another bpf_prog_put(...). Then KASAN gives us this splat,
found by syzbot because we've created an inbalance between bpf_prog_inc and
bpf_prog_put calling put twice on the program.
BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline]
BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829
BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829
Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641
To fix clean up error path so it doesn't try to do the bpf_prog_put in the
error path once progs are assigned then it relies on the normal psock
tear down logic to do complete cleanup.
For completness we also cover the case whereh sk_psock_init_strp() fails,
but this is not expected because it indicates an incorrect socket type
and should be caught earlier.
Fixes: 38207a5e8123 ("bpf, sockmap: Attach map progs to psock early for feature probes")
Reported-by: syzbot+bb73e71cf4b8fd376a4f@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220104214645.290900-1-john.fastabend@gmail.com
|
|
Applications can be confused slightly because we do not always return the
same error code as expected, e.g. what the TCP stack normally returns. For
example on a sock err sk->sk_err instead of returning the sock_error we
return EAGAIN. This usually means the application will 'try again'
instead of aborting immediately. Another example, when a shutdown event
is received we should immediately abort instead of waiting for data when
the user provides a timeout.
These tend to not be fatal, applications usually recover, but introduces
bogus errors to the user or introduces unexpected latency. Before
'c5d2177a72a16' we fell back to the TCP stack when no data was available
so we managed to catch many of the cases here, although with the extra
latency cost of calling tcp_msg_wait_data() first.
To fix lets duplicate the error handling in TCP stack into tcp_bpf so
that we get the same error codes.
These were found in our CI tests that run applications against sockmap
and do longer lived testing, at least compared to test_sockmap that
does short-lived ping/pong tests, and in some of our test clusters
we deploy.
Its non-trivial to do these in a shorter form CI tests that would be
appropriate for BPF selftests, but we are looking into it so we can
ensure this keeps working going forward. As a preview one idea is to
pull in the packetdrill testing which catches some of this.
Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220104205918.286416-1-john.fastabend@gmail.com
|
|
The following error is reported when running "./test_progs -t for_each"
under arm64:
bpf_jit: multi-func JIT bug 58 != 56
[...]
JIT doesn't support bpf-to-bpf calls
The root cause is the size of BPF_PSEUDO_FUNC instruction increases
from 2 to 3 after the address of called bpf-function is settled and
there are two bpf-to-bpf calls in test_pkt_access. The generated
instructions are shown below:
0x48: 21 00 C0 D2 movz x1, #0x1, lsl #32
0x4c: 21 00 80 F2 movk x1, #0x1
0x48: E1 3F C0 92 movn x1, #0x1ff, lsl #32
0x4c: 41 FE A2 F2 movk x1, #0x17f2, lsl #16
0x50: 81 70 9F F2 movk x1, #0xfb84
Fixing it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so
the size of jited image will not change.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211231151018.3781550-1-houtao1@huawei.com
|
|
The four RGMII interface modes take care of the required RGMII delay
configuration at the PHY and should not be limited by the network MAC
driver. Sadly, gemini was only permitting RGMII mode with no delays,
which would require the required delay to be inserted via PCB tracking
or by the MAC.
However, there are designs that require the PHY to add the delay, which
is impossible without Gemini permitting the other three PHY interface
modes. Fix the driver to allow these.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Link: https://lore.kernel.org/r/E1n4mpT-002PLd-Ha@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
Fix RGMII delays for 88E1118
This series fixes the RGMII delays for 88E1118 Marvell PHYs, after
a report by Corentin Labbe that the Marvell driver fails to work.
Patch 1 cleans up the paged register accesses in m88e1118_config_init()
and patch 2 adds the RGMII delay configuration.
This comes with an element of risk as existing DT may need to be fixed
for this in a similar way as we have done in the recent past for other
PHY drivers that have misinterpreted the RGMII interface modes.
====================
Link: https://lore.kernel.org/r/YdR3wYFkm4eJApwb@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Corentin Labbe reports that the SSI 1328 does not work when allowing
the PHY to operate at gigabit speeds, but does work with the generic
PHY driver.
This appears to be because m88e1118_config_init() writes a fixed value
to the MSCR register, claiming that this is to enable 1G speeds.
However, this always sets bits 4 and 5, enabling RGMII transmit and
receive delays. The suspicion is that the original board this was
added for required the delays to make 1G speeds work.
Add the necessary configuration for RGMII delays for the 88E1118 to
bring this into line with the requirements for RGMII support, and thus
make the SSI 1328 work.
Corentin Labbe has tested this on gemini-ssi1328 and gemini-ns2502.
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use phy_write_paged() in m88e1118_config_init() to set the MSCR value.
We leave the other paged write for the LEDs in case the DT register
parsing is relying on this page.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
amt.sh test script will not work because it doesn't have execution
permission. So, it adds execution permission.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20220105144436.13415-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit f77b83b5bbab53d2be339184838b19ed2c62c0a5.
This change breaks multiple usb to ethernet dongles attached on Lenovo
USB hub.
Fixes: f77b83b5bbab ("net: usb: r8152: Add MAC passthrough support for more Lenovo Docks")
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Link: https://lore.kernel.org/r/20220105155102.8557-1-aaron.ma@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As reported by Johannes, the tracker allocated in
ethnl_default_notify() is not really needed, as this
function is not expected to change a device reference count.
Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20220105170849.2610470-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
qdisc_create() error path needs to use dev_put_track()
because qdisc_alloc() allocated the tracker.
Fixes: 606509f27f67 ("net/sched: add net device refcount tracker to struct Qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220104170439.3790052-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Here are two last fixes for this release cycle from the GPIO
subsystem:
- fix irq offset calculation in gpio-aspeed-sgpio
- update the MAINTAINERS entry for gpio-brcmstb"
* tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: update gpio-brcmstb maintainers
gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2022-01-05
Below I have a last minute fix for the atusb driver.
Pavel fixes a KASAN uninit report for the driver. This version is the
minimal impact fix to ease backporting. A bigger rework of the driver to
avoid potential similar problems is ongoing and will come through net-next
when ready.
* tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
ieee802154: atusb: fix uninit value in atusb_set_extended_addr
====================
Link: https://lore.kernel.org/r/20220105153914.512305-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean says:
====================
DSA cross-chip notifier cleanup
This series deletes the no-op cross-chip notifier support for MRP and
HSR, features which were introduced relatively recently and did not get
full review at the time. The new code is functionally equivalent, but
simpler.
Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Cc: George McCollister <george.mccollister@gmail.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The cross-chip notifiers for HSR are bypass operations, meaning that
even though all switches in a tree are notified, only the switch
specified in the info structure is targeted.
We can eliminate the unnecessary complexity by deleting the cross-chip
notifier logic and calling the ds->ops straight from port.c.
Cc: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|