.. SPDX-License-Identifier: GPL-2.0

===============
XDP RX Metadata
===============

This document describes how an eXpress Data Path (XDP) program can access
hardware metadata related to a packet using a set of helper functions,
and how it can pass that metadata on to other consumers.


General Design
==============

XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame.
Every device driver that wishes to expose additional packet metadata can
implement these kfuncs. The set of kfuncs is declared in ``include/net/xdp.h``
via ``XDP_METADATA_KFUNC_xxx``.

Currently, the following kfuncs are supported. In the future, as more
metadata is supported, this set will grow:

.. kernel-doc:: net/core/xdp.c
   :identifiers: bpf_xdp_metadata_rx_timestamp

.. kernel-doc:: net/core/xdp.c
   :identifiers: bpf_xdp_metadata_rx_hash

.. kernel-doc:: net/core/xdp.c
   :identifiers: bpf_xdp_metadata_rx_vlan_tag


An XDP program can use these kfuncs to read the metadata into stack
variables for its own consumption. Or, to pass the metadata on to other
consumers, an XDP program can store it into the metadata area carried
ahead of the packet. Not all packets will necessarily have the requested
metadata available, in which case the driver returns ``-ENODATA``.

Not all kfuncs have to be implemented by the device driver; when a kfunc is
not implemented, a default implementation that returns ``-EOPNOTSUPP`` is
used to indicate that the device driver has not implemented it.

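
The two error codes above call for different reactions, which the following
minimal sketch tries to make explicit. It is plain C rather than a BPF
program so that it compiles standalone. ``enum meta_status`` and
``classify_kfunc_ret`` are hypothetical names introduced for illustration,
and the sketch assumes the usual kernel convention that a kfunc returns 0 on
success.

.. code-block:: c

   #include <errno.h>

   /* Hypothetical classification of a metadata kfunc return value. */
   enum meta_status {
       META_VALID,       /* 0: the output argument was filled in */
       META_MISSING,     /* -ENODATA: this packet has no such metadata */
       META_UNSUPPORTED, /* -EOPNOTSUPP: the driver lacks this kfunc */
       META_ERROR,       /* any other value */
   };

   static enum meta_status classify_kfunc_ret(int ret)
   {
       if (ret == 0)
           return META_VALID;
       if (ret == -ENODATA)
           return META_MISSING;
       if (ret == -EOPNOTSUPP)
           return META_UNSUPPORTED;
       return META_ERROR;
   }

A program might, for example, record ``META_MISSING`` per packet and keep
calling the kfunc for later packets, while treating ``META_UNSUPPORTED`` as a
property of the device.
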

Within an XDP frame, the metadata layout (accessed via ``xdp_buff``) is
as follows::

  +----------+-----------------+------+
  | headroom | custom metadata | data |
  +----------+-----------------+------+
             ^                 ^
             |                 |
   xdp_buff->data_meta   xdp_buff->data

An XDP program can store individual metadata items into this ``data_meta``
area in whichever format it chooses. Later consumers of the metadata
will have to agree on the format by some out-of-band contract (like for
the AF_XDP use case, see below).


AF_XDP
======

The :doc:`af_xdp` use case implies that there is a contract between the BPF
program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and
the final consumer. Thus the BPF program manually reserves a fixed number of
bytes of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
of kfuncs to populate it. The userspace ``XSK`` consumer computes
``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata.
Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
``METADATA_SIZE`` is an application-specific constant (the ``AF_XDP`` receive
descriptor does *not* explicitly carry the size of the metadata).

Here is the ``AF_XDP`` consumer layout (note the missing ``data_meta`` pointer)::

  +----------+-----------------+------+
  | headroom | custom metadata | data |
  +----------+-----------------+------+
                               ^
                               |
                        rx_desc->address

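
The consumer-side arithmetic described above can be sketched in plain C.
This is only an illustration under stated assumptions: ``struct xdp_meta``
and ``read_meta`` are hypothetical, the layout stands in for whatever the BPF
program and the consumer agreed on, and ``pkt`` stands in for the pointer
that ``xsk_umem__get_data()`` would return for ``rx_desc->address``.

.. code-block:: c

   #include <stdint.h>
   #include <string.h>

   /* Hypothetical layout agreed on by the BPF program and the consumer. */
   struct xdp_meta {
       uint64_t rx_timestamp;
       uint32_t rx_hash;
       uint32_t rx_hash_type;
   };

   /* Application-specific constant; the descriptor does not carry it. */
   #define METADATA_SIZE sizeof(struct xdp_meta)

   /*
    * Copy the metadata that sits immediately in front of the packet data.
    * The caller must know that the producer reserved METADATA_SIZE bytes.
    */
   static void read_meta(const void *pkt, struct xdp_meta *out)
   {
       memcpy(out, (const uint8_t *)pkt - METADATA_SIZE, METADATA_SIZE);
   }

Copying with ``memcpy`` rather than casting the pointer avoids assuming
anything about the alignment of the packet data within the UMEM frame.
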

XDP_PASS
========

This is the path where the packets processed by the XDP program are passed
into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
contents. Currently, every driver has custom kernel code to parse
the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
conversion, and the XDP metadata is not used by the kernel when building
``skbs``. However, TC-BPF programs can access the XDP metadata area using
the ``data_meta`` pointer.

In the future, we'd like to support a case where an XDP program
can override some of the metadata used for building ``skbs``.

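
Reading metadata through ``data_meta``, as a TC-BPF program would, needs a
bounds check against ``data`` first, because the metadata area may be empty
or smaller than expected. The following is a sketch of that check in plain C
so that it compiles standalone. ``struct xdp_meta`` and ``meta_from_ctx``
are hypothetical names; in a real TC-BPF program the two pointers would be
derived from the ``data_meta`` and ``data`` fields of the program context.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical layout written earlier by the XDP program. */
   struct xdp_meta {
       uint64_t rx_timestamp;
       uint32_t rx_hash;
       uint32_t rx_hash_type;
   };

   /*
    * Return a pointer to the metadata if the area between data_meta and
    * data is large enough to hold it, or NULL otherwise.
    */
   static const struct xdp_meta *meta_from_ctx(const void *data_meta,
                                               const void *data)
   {
       const struct xdp_meta *meta = data_meta;

       /* The metadata must end at or before the start of packet data. */
       if ((const void *)(meta + 1) > data)
           return NULL;
       return meta;
   }

When ``data_meta`` equals ``data``, no metadata was stored and the function
returns ``NULL``.
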

bpf_redirect_map
================

``bpf_redirect_map`` can redirect the frame to a different device.
Some devices (like virtual ethernet links) support running a second XDP
program after the redirect. However, the final consumer doesn't have
access to the original hardware descriptor and can't access any of
the original metadata. The same applies to XDP programs installed
into devmaps and cpumaps.

This means that for redirected packets only custom metadata is
currently supported, which has to be prepared by the initial XDP program
before redirect. If the frame is eventually passed to the kernel, the
``skb`` created from such a frame won't have any hardware metadata
populated. If such a packet is later redirected into an ``XSK``,
the ``XSK`` consumer will also only have access to the custom metadata.


bpf_tail_call
=============

Adding programs that access metadata kfuncs to the ``BPF_MAP_TYPE_PROG_ARRAY``
is currently not supported.


Supported Devices
=================

It is possible to query which kfuncs a particular netdev implements via
netlink. See the ``xdp-rx-metadata-features`` attribute set in
``Documentation/netlink/specs/netdev.yaml``.


Driver Implementation
=====================

Certain devices may prepend metadata to received packets. However, as of now,
``AF_XDP`` lacks the ability to communicate the size of the ``data_meta`` area
to the consumer. Therefore, it is the responsibility of the driver to copy any
device-reserved metadata out of the metadata area and ensure that
``xdp_buff->data_meta`` is pointing to ``xdp_buff->data`` before presenting the
frame to the XDP program. This is necessary so that, after the XDP program
adjusts the metadata area, the consumer can reliably retrieve the metadata
address using the ``METADATA_SIZE`` offset.

The following diagram shows how custom metadata is positioned relative to the
packet data and how pointers are adjusted for metadata access::

              |<-- bpf_xdp_adjust_meta(xdp_buff, -METADATA_SIZE) --|
  new xdp_buff->data_meta                              old xdp_buff->data_meta
              |                                                    |
              |                                            xdp_buff->data
              |                                                    |
   +----------+----------------------------------------------------+------+
   | headroom |                  custom metadata                   | data |
   +----------+----------------------------------------------------+------+
              |                                                    |
              |                                            xdp_desc->addr
              |<------ xsk_umem__get_data() - METADATA_SIZE -------|

``bpf_xdp_adjust_meta`` ensures that ``METADATA_SIZE`` is aligned to 4 bytes,
does not exceed 252 bytes, and leaves sufficient space for building the
``xdp_frame``. If these conditions are not met, it returns a negative error.
In this case, the BPF program should not proceed to populate data into the
``data_meta`` area.

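
The two size conditions stated above can be modelled in a few lines of plain
C. This is a sketch of the constraints as described in this document, not
the kernel's implementation. ``meta_len_valid`` and the two macros are
hypothetical names, and the headroom requirement is not modelled because it
depends on the driver and the frame.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   #define META_ALIGN 4   /* length must be a multiple of 4 bytes */
   #define META_MAX   252 /* largest accepted length, per the text above */

   /* Hypothetical helper: true if len meets both size constraints. */
   static bool meta_len_valid(size_t len)
   {
       return (len % META_ALIGN) == 0 && len <= META_MAX;
   }

An application could run such a check on the size of its own metadata
structure during development, so that a layout change that breaks the
constraints is noticed before ``bpf_xdp_adjust_meta`` fails at run time.
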

Example
=======

See ``tools/testing/selftests/bpf/progs/xdp_metadata.c`` and
``tools/testing/selftests/bpf/prog_tests/xdp_metadata.c`` for an example of
a BPF program that handles XDP metadata.