Age | Commit message (Collapse) | Author |
|
Pass all individual BPF object files (generated from progs/*.c) through
`bpftool gen object` command to validate that BPF static linker doesn't
corrupt them.
As an additional sanity checks, validate that passing resulting object files
through linker again results in identical ELF files. Exact same ELF contents
can be guaranteed only after two passes, as after the first pass ELF sections
order changes, and thus .BTF.ext data sections order changes. That, in turn,
means that strings are added into the final BTF string sections in different
order, so .BTF strings data might not be exactly the same. But doing another
round of linking afterwards should result in the identical ELF file, which is
checked with additional `diff` command.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-12-andrii@kernel.org
|
|
Trigger vmlinux.h and BPF skeletons re-generation if detected that bpftool was
re-compiled. Otherwise full `make clean` is required to get updated skeletons,
if bpftool is modified.
Fixes: acbd06206bbb ("selftests/bpf: Add vmlinux.h selftest exercising tracing of syscalls")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-11-andrii@kernel.org
|
|
Add `bpftool gen object <output-file> <input_file>...` command to statically
link multiple BPF ELF object files into a single output BPF ELF object file.
This patch also updates bash completions and man page. Man page gets a short
section on `gen object` command, but also updates the skeleton example to show
off workflow for BPF application with two .bpf.c files, compiled individually
with Clang, then resulting object files are linked together with `gen object`,
and then final object file is used to generate usable BPF skeleton. This
should help new users understand realistic workflow w.r.t. compiling
mutli-file BPF application.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-10-andrii@kernel.org
|
|
Add optional name OBJECT_NAME parameter to `gen skeleton` command to override
default object name, normally derived from input file name. This allows much
more flexibility during build time.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-9-andrii@kernel.org
|
|
Add .BTF and .BTF.ext static linking logic.
When multiple BPF object files are linked together, their respective .BTF and
.BTF.ext sections are merged together. BTF types are not just concatenated,
but also deduplicated. .BTF.ext data is grouped by type (func info, line info,
core_relos) and target section names, and then all the records are
concatenated together, preserving their relative order. All the BTF type ID
references and string offsets are updated as necessary, to take into account
possibly deduplicated strings and types.
BTF DATASEC types are handled specially. Their respective var_secinfos are
accumulated separately in special per-section data and then final DATASEC
types are emitted at the very end during bpf_linker__finalize() operation,
just before emitting final ELF output file.
BTF data can also provide "section annotations" for some extern variables.
Such concept is missing in ELF, but BTF will have DATASEC types for such
special extern datasections (e.g., .kconfig, .ksyms). Such sections are called
"ephemeral" internally. Internally linker will keep metadata for each such
section, collecting variables information, but those sections won't be emitted
into the final ELF file.
Also, given LLVM/Clang during compilation emits BTF DATASECS that are
incomplete, missing section size and variable offsets for static variables,
BPF static linker will initially fix up such DATASECs, using ELF symbols data.
The final DATASECs will preserve section sizes and all variable offsets. This
is handled correctly by libbpf already, so won't cause any new issues. On the
other hand, it's actually a nice property to have a complete BTF data without
runtime adjustments done during bpf_object__open() by libbpf. In that sense,
BPF static linker is also a BTF normalizer.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-8-andrii@kernel.org
|
|
Introduce BPF static linker APIs to libbpf. BPF static linker allows to
perform static linking of multiple BPF object files into a single combined
resulting object file, preserving all the BPF programs, maps, global
variables, etc.
Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are
concatenated together. Similarly, code sections are also concatenated. All the
symbols and ELF relocations are also concatenated in their respective ELF
sections and are adjusted accordingly to the new object file layout.
Static variables and functions are handled correctly as well, adjusting BPF
instructions offsets to reflect new variable/function offset within the
combined ELF section. Such relocations are referencing STT_SECTION symbols and
that stays intact.
Data sections in different files can have different alignment requirements, so
that is taken care of as well, adjusting sizes and offsets as necessary to
satisfy both old and new alignment requirements.
DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG
section, which is ignored by libbpf in bpf_object__open() anyways. So, in
a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty
nice property, especially if resulting .o file is then used to generate BPF
skeleton.
Original string sections are ignored and instead we construct our own set of
unique strings using libbpf-internal `struct strset` API.
To reduce the size of the patch, all the .BTF and .BTF.ext processing was
moved into a separate patch.
The high-level API consists of just 4 functions:
- bpf_linker__new() creates an instance of BPF static linker. It accepts
output filename and (currently empty) options struct;
- bpf_linker__add_file() takes input filename and appends it to the already
processed ELF data; it can be called multiple times, one for each BPF
ELF object file that needs to be linked in;
- bpf_linker__finalize() needs to be called to dump final ELF contents into
the output file, specified when bpf_linker was created; after
bpf_linker__finalize() is called, no more bpf_linker__add_file() and
bpf_linker__finalize() calls are allowed, they will return error;
- regardless of whether bpf_linker__finalize() was called or not,
bpf_linker__free() will free up all the used resources.
Currently, BPF static linker doesn't resolve cross-object file references
(extern variables and/or functions). This will be added in the follow up patch
set.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
|
|
Add btf__add_type() API that performs shallow copy of a given BTF type from
the source BTF into the destination BTF. All the information and type IDs are
preserved, but all the strings encountered are added into the destination BTF
and corresponding offsets are rewritten. BTF type IDs are assumed to be
correct or such that will be (somehow) modified afterwards.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-6-andrii@kernel.org
|
|
Extract BTF logic for maintaining a set of strings data structure, used for
BTF strings section construction in writable mode, into separate re-usable
API. This data structure is going to be used by bpf_linker to maintains ELF
STRTAB section, which has the same layout as BTF strings section.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-5-andrii@kernel.org
|
|
Rename btf_add_mem() and btf_ensure_mem() helpers that abstract away details
of dynamically resizable memory to use libbpf_ prefix, as they are not
BTF-specific. No functional changes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-4-andrii@kernel.org
|
|
Extract and generalize the logic to iterate BTF type ID and string offset
fields within BTF types and .BTF.ext data. Expose this internally in libbpf
for re-use by bpf_linker.
Additionally, complete strings deduplication handling for BTF.ext (e.g., CO-RE
access strings), which was previously missing. There previously was no
case of deduplicating .BTF.ext data, but bpf_linker is going to use it.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-3-andrii@kernel.org
|
|
btf_type_by_id() is internal-only convenience API returning non-const pointer
to struct btf_type. Expose it outside of btf.c for re-use.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210318194036.3521577-2-andrii@kernel.org
|
|
Antoine Tenart says:
====================
net: xps: improve the xps maps handling
This series aims at fixing various issues with the xps code, including
out-of-bound accesses and use-after-free. While doing so we try to
improve the xps code maintainability and readability.
The main change is moving dev->num_tc and dev->nr_ids in the xps maps, to
avoid out-of-bound accesses as those two fields can be updated after the
maps have been allocated. This allows further reworks, to improve the
xps code readability and allow to stop taking the rtnl lock when
reading the maps in sysfs. The maps are moved to an array in net_device,
which simplifies the code a lot.
One future improvement may be to remove the use of xps_map_mutex from
net/core/dev.c, but that may require extra care.
Thanks!
Antoine
Since v3:
- Removed the 3 patches about the rtnl lock and __netif_set_xps_queue
as there are extra issues. Those patches were not tied to the
others, and I'll see want can be done as a separate effort.
- One small fix in patch 12.
Since v2:
- Patches 13-16 are new to the series.
- Fixed another issue I found while preparing v3 (use after free of
old xps maps).
- Kept the rtnl lock when calling netdev_get_tx_queue and
netdev_txq_to_tc.
- Use get_device/put_device when using the sb_dev.
- Take the rtnl lock in mlx5 and virtio_net when calling
netif_set_xps_queue.
- Fixed a coding style issue.
Since v1:
- Reordered the patches to improve readability and avoid introducing
issues in between patches.
- Use dev_maps->nr_ids to allocate the mask in xps_queue_show but
still default to nr_cpu_ids/dev->num_rx_queues in xps_queue_show
when dev_maps hasn't been allocated yet for backward
compatibility.:w
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In __netif_set_xps_queue, old map entries from the old dev_maps are
freed but their corresponding entry in the old dev_maps aren't NULLed.
Fix this.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When setting up an new dev_maps in __netif_set_xps_queue, we remove and
free maps from unused CPUs/rx-queues near the end of the function; by
calling remove_xps_queue. However it's possible those maps are also part
of the old not-freed-yet dev_maps, which might be used concurrently.
When that happens, a map can be freed while its corresponding entry in
the old dev_maps table isn't NULLed, leading to: "BUG: KASAN:
use-after-free" in different places.
This fixes the map freeing logic for unused CPUs/rx-queues, to also NULL
the map entries from the old dev_maps table.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most of the xps_cpus_show and xps_rxqs_show functions share the same
logic. Having it in two different functions does not help maintenance.
This patch moves their common logic into a new function, xps_queue_show,
to improve this.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU
protected, we do not have the need to protect the maps in the rtnl lock.
Move the rtnl unlock up so we reduce the rtnl locking section.
We also increase the reference count on the subordinate device if any,
as we don't want this device to be freed while we use it (now that the
rtnl lock isn't protecting it in the whole function).
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Improve the readability of the loop removing tx-queue from unused
CPUs/rx-queues in __netif_set_xps_queue. The change should only be
cosmetic.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds an helper, xps_copy_dev_maps, to copy maps from dev_maps
to new_dev_maps at a given index. The logic should be the same, with an
improved code readability and maintenance.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in
net_device. That will simplify a lot the code removing the need for lots
of if/else conditionals as the correct map will be available using its
offset in the array.
This should not modify the xps maps behaviour in any way.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the xps possible_mask. It was an optimization but we can just
loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That
simplifies the code a bit.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Embed nr_ids (the number of cpu for the xps cpus map, and the number of
rxqs for the xps cpus map) in dev_maps. That will help not accessing out
of bound memory if those values change after dev_maps was allocated.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The xps cpus/rxqs map is accessed using dev->num_tc, which is used when
allocating the map. But later updates of dev->num_tc can lead to having
a mismatch between the maps and how they're accessed. In such cases the
map values do not make any sense and out of bound accesses can occur
(that can be easily seen using KASAN).
This patch aims at fixing this by embedding num_tc into the maps, using
the value at the time the map is created. This brings two improvements:
- The maps can be accessed using the embedded num_tc, so we know for
sure we won't have out of bound accesses.
- Checks can be made before accessing the maps so we know the values
retrieved will make sense.
We also update __netif_set_xps_queue to conditionally copy old maps from
dev_maps in the new one only if the number of traffic classes from both
maps match.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make the implementations of xps_cpus_show and xps_rxqs_show to converge,
as the two share the same logic but diverted over time. This should not
modify their behaviour but will help future changes and improve
maintenance.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In net-sysfs, get_netdev_queue_index returns an unsigned int. Some of
its callers use an unsigned long to store the returned value. Update the
code to be consistent, this should only be cosmetic.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use bitmap_zalloc instead of zalloc_cpumask_var in xps_cpus_show to
align with xps_rxqs_show. This will improve maintenance and allow us to
factorize the two functions. The function should behave the same.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
BCM4908 has only 1 RGMII reg for controlling port 7.
Fixes: 73b7a6047971 ("net: dsa: bcm_sf2: support BCM4908's integrated switch")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simple macro like REG_RGMII_CNTRL_P() is insufficient as:
1. It doesn't validate port argument
2. It doesn't support chipsets with non-lineral RGMII regs layout
Missing port validation could result in getting register offset from out
of array. Random memory -> random offset -> random reads/writes. It
affected e.g. BCM4908 for REG_RGMII_CNTRL_P(7).
Fixes: a78e86ed586d ("net: dsa: bcm_sf2: Prepare for different register layouts")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__dev_alloc_name(), when supplied with a name containing '%d',
will search for the first available device number to generate a
unique device name.
Since commit ff92741270bf8b6e78aa885f166b68c7a67ab13a ("net:
introduce name_node struct to be used in hashlist") network
devices may have alternate names. __dev_alloc_name() does take
these alternate names into account, possibly generating a name
that is already taken and failing with -ENFILE as a result.
This demonstrates the bug:
# rmmod dummy 2>/dev/null
# ip link property add dev lo altname dummy0
# modprobe dummy numdummies=1
modprobe: ERROR: could not insert 'dummy': Too many open files in system
Instead of creating a device named dummy1, modprobe fails.
Fix this by checking all the names in the d->name_node list, not just d->name.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add device tree support to b53_mmap.c while keeping platform devices support.
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mohammad Athari Bin Ismail says:
====================
net: stmmac: EST interrupts and ethtool
This patchset adds support for handling EST interrupts and reporting EST
errors. Additionally, the errors are added into ethtool statistic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Below EST errors are added into ethtool statistic:
1) Constant Gate Control Error (CGCE):
The counter "mtl_est_cgce" increases everytime CGCE interrupt is
triggered.
2) Head-of-Line Blocking due to Scheduling (HLBS):
The counter "mtl_est_hlbs" increases everytime HLBS interrupt is
triggered.
3) Head-of-Line Blocking due to Frame Size (HLBF):
The counter "mtl_est_hlbf" increases everytime HLBF interrupt is
triggered.
4) Base Time Register error (BTRE):
The counter "mtl_est_btre" increases everytime BTRE interrupt is
triggered but BTRL not reaches maximum value of 15.
5) Base Time Register Error Loop Count (BTRL) reaches maximum value:
The counter "mtl_est_btrlm" increases everytime BTRE interrupt is
triggered and BTRL value reaches maximum value of 15.
Please refer to MTL_EST_STATUS register in DesignWare Cores Ethernet
Quality-of-Service Databook for more detail explanation.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Co-developed-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enabled EST related interrupts as below:
1) Constant Gate Control Error (CGCE)
2) Head-of-Line Blocking due to Scheduling (HLBS)
3) Head-of-Line Blocking due to Frame Size (HLBF).
4) Base Time Register error (BTRE)
5) Switch to S/W owned list Complete (SWLC)
For HLBS, the user will get the info of all the queues that shows this
error. For HLBF, the user will get the info of all the queue with the
latest frame size which causes the error. Frame size 0 indicates no
error.
The ISR handling takes place when EST feature is enabled by user.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Co-developed-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")
removed last selector of CONFIG_MV64X60.
As it is not a user selectable config item, all references to it
are stale. Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ong Boon Leong says:
====================
stmmac: add VLAN priority based RX steering
The current tc flower implementation in stmmac supports both L3 and L4
filter offloading. This patch adds the support of VLAN priority based
RX frame steering into different Rx Queues.
The patches have been tested on both configuration test (include L3/L4)
and traffic test (multi VLAN ping streams with RX Frame Steering) below:-
> tc qdisc delete dev eth0 ingress
> tc qdisc del dev eth0 parent root 2&> /dev/null
> tc qdisc del dev eth0 parent ffff: 2&> /dev/null
> tc qdisc add dev eth0 ingress
> tc filter add dev eth0 parent ffff: protocol ip flower dst_ip 192.168.0.1 \
src_ip 192.168.1.1 ip_proto tcp dst_port 5201 src_port 6201 action drop
> tc filter add dev eth0 parent ffff: protocol ip flower dst_ip 192.168.0.2 \
src_ip 192.168.1.2 ip_proto tcp dst_port 5202 src_port 6202 action drop
> tc filter show dev eth0 ingress
filter parent ffff: protocol ip pref 49151 flower chain 0
filter parent ffff: protocol ip pref 49151 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
dst_ip 192.168.0.2
src_ip 192.168.1.2
dst_port 5202
src_port 6202
in_hw in_hw_count 1
action order 1: gact action drop
random type none pass val 0
index 2 ref 1 bind 1
filter parent ffff: protocol ip pref 49152 flower chain 0
filter parent ffff: protocol ip pref 49152 flower chain 0 handle 0x1
eth_type ipv4
ip_proto tcp
dst_ip 192.168.0.1
src_ip 192.168.1.1
dst_port 5201
src_port 6201
in_hw in_hw_count 1
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1
> tc qdisc delete dev eth0 ingress
> tc qdisc del dev eth0 parent root 2&> /dev/null
> tc qdisc del dev eth0 parent ffff: 2&> /dev/null
> tc qdisc add dev eth0 ingress
> tc qdisc add dev eth0 root mqprio num_tc 4 \
map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 1@2 1@3 hw 0
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 0 hw_tc 3
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 1 hw_tc 2
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 2 hw_tc 1
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 3 hw_tc 0
> tc filter show dev eth0 ingress
filter parent ffff: protocol 802.1Q pref 49149 flower chain 0
filter parent ffff: protocol 802.1Q pref 49149 flower chain 0 handle 0x1 hw_tc 0
vlan_prio 3
in_hw in_hw_count 1
filter parent ffff: protocol 802.1Q pref 49150 flower chain 0
filter parent ffff: protocol 802.1Q pref 49150 flower chain 0 handle 0x1 hw_tc 1
vlan_prio 2
in_hw in_hw_count 1
filter parent ffff: protocol 802.1Q pref 49151 flower chain 0
filter parent ffff: protocol 802.1Q pref 49151 flower chain 0 handle 0x1 hw_tc 2
vlan_prio 1
in_hw in_hw_count 1
filter parent ffff: protocol 802.1Q pref 49152 flower chain 0
filter parent ffff: protocol 802.1Q pref 49152 flower chain 0 handle 0x1 hw_tc 3
vlan_prio 0
in_hw in_hw_count 1
> tc qdisc delete dev eth0 ingress
> ip address flush dev eth0
> ip address add 169.254.1.11/24 dev eth0
> ip link delete dev eth0.vlan1 2> /dev/null
> ip link add link eth0 name eth0.vlan1 type vlan id 1
> ip address flush dev eth0.vlan1 2> /dev/null
> ip address add 169.254.11.11/24 dev eth0.vlan1
> ip link delete dev eth0.vlan2 2> /dev/null
> ip link add link eth0 name eth0.vlan2 type vlan id 2
> ip address flush dev eth0.vlan2 2> /dev/null
> ip address add 169.254.12.11/24 dev eth0.vlan2
> ip link delete dev eth0.vlan3 2> /dev/null
> ip link add link eth0 name eth0.vlan3 type vlan id 3
> ip address flush dev eth0.vlan3 2> /dev/null
> ip address add 169.254.13.11/24 dev eth0.vlan3
> ip link delete dev eth0.vlan4 2> /dev/null
> ip link add link eth0 name eth0.vlan4 type vlan id 4
> ip address flush dev eth0.vlan4 2> /dev/null
> ip address add 169.254.14.11/24 dev eth0.vlan4
> ip address flush dev eth0
> ip address add 169.254.1.22/24 dev eth0
> ip link delete dev eth0.vlan1 2> /dev/null
> ip link add link eth0 name eth0.vlan1 type vlan id 1
> ip address flush dev eth0.vlan1 2> /dev/null
> ip address add 169.254.11.22/24 dev eth0.vlan1
> ip link delete dev eth0.vlan2 2> /dev/null
> ip link add link eth0 name eth0.vlan2 type vlan id 2
> ip address flush dev eth0.vlan2 2> /dev/null
> ip address add 169.254.12.22/24 dev eth0.vlan2
> ip link delete dev eth0.vlan3 2> /dev/null
> ip link add link eth0 name eth0.vlan3 type vlan id 3
> ip address flush dev eth0.vlan3 2> /dev/null
> ip address add 169.254.13.22/24 dev eth0.vlan3
> ip link delete dev eth0.vlan4 2> /dev/null
> ip link add link eth0 name eth0.vlan4 type vlan id 4
> ip address flush dev eth0.vlan4 2> /dev/null
> ip address add 169.254.14.22/24 dev eth0.vlan4
> mkdir -p /sys/fs/cgroup/net_prio/grp0
> echo eth0 0 > /sys/fs/cgroup/net_prio/grp0/net_prio.ifpriomap
> echo eth0.vlan1 0 > /sys/fs/cgroup/net_prio/grp0/net_prio.ifpriomap
> mkdir -p /sys/fs/cgroup/net_prio/grp1
> echo eth0 0 > /sys/fs/cgroup/net_prio/grp1/net_prio.ifpriomap
> echo eth0.vlan2 1 > /sys/fs/cgroup/net_prio/grp1/net_prio.ifpriomap
> mkdir -p /sys/fs/cgroup/net_prio/grp2
> echo eth0 0 > /sys/fs/cgroup/net_prio/grp2/net_prio.ifpriomap
> echo eth0.vlan3 2 > /sys/fs/cgroup/net_prio/grp2/net_prio.ifpriomap
> mkdir -p /sys/fs/cgroup/net_prio/grp3
> echo eth0 0 > /sys/fs/cgroup/net_prio/grp3/net_prio.ifpriomap
> echo eth0.vlan4 3 > /sys/fs/cgroup/net_prio/grp3/net_prio.ifpriomap
> tc qdisc del dev eth0 parent root 2&> /dev/null
> tc qdisc del dev eth0 parent ffff: 2&> /dev/null
> tc qdisc add dev eth0 ingress
> tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 1@2 1@3 hw 0
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 0 hw_tc 0
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 1 hw_tc 1
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 2 hw_tc 2
> tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 3 hw_tc 3
> ip link set eth0.vlan1 type vlan egress-qos-map 0:0
> ip link set eth0.vlan2 type vlan egress-qos-map 1:1
> ip link set eth0.vlan3 type vlan egress-qos-map 2:2
> ip link set eth0.vlan4 type vlan egress-qos-map 3:3
> tc filter show dev eth0 ingress
filter parent ffff: protocol 802.1Q pref 49149 flower chain 0
filter parent ffff: protocol 802.1Q pref 49149 flower chain 0 handle 0x1 hw_tc 3
vlan_prio 3
in_hw in_hw_count 1
filter parent ffff: protocol 802.1Q pref 49150 flower chain 0
filter parent ffff: protocol 802.1Q pref 49150 flower chain 0 handle 0x1 hw_tc 2
vlan_prio 2
in_hw in_hw_count 1
filter parent ffff: protocol 802.1Q pref 49151 flower chain 0
filter parent ffff: protocol 802.1Q pref 49151 flower chain 0 handle 0x1 hw_tc 1
vlan_prio 1
in_hw in_hw_count 1
filter parent ffff: protocol 802.1Q pref 49152 flower chain 0
filter parent ffff: protocol 802.1Q pref 49152 flower chain 0 handle 0x1 hw_tc 0
vlan_prio 0
in_hw in_hw_count 1
> echo 1 > /proc/irq/131/smp_affinity
> echo 1 > /proc/irq/132/smp_affinity
> echo 4 > /proc/irq/133/smp_affinity
> echo 4 > /proc/irq/134/smp_affinity
> echo 4 > /proc/irq/135/smp_affinity
> echo 4 > /proc/irq/136/smp_affinity
> echo 2 > /proc/irq/137/smp_affinity
> echo 2 > /proc/irq/138/smp_affinity
> ping -i 0.001 169.254.11.22 2&> /dev/null &
> PID1="$!"
> echo $PID1 > /sys/fs/cgroup/net_prio/grp0/cgroup.procs
> ping -i 0.001 169.254.12.22 2&> /dev/null &
> PID2="$!"
> echo $PID2 > /sys/fs/cgroup/net_prio/grp1/cgroup.procs
> ping -i 0.001 169.254.13.22 2&> /dev/null &
> PID3="$!"
> echo $PID3 > /sys/fs/cgroup/net_prio/grp2/cgroup.procs
> ping -i 0.001 169.254.14.22 2&> /dev/null &
> PID4="$!"
> echo $PID4 > /sys/fs/cgroup/net_prio/grp3/cgroup.procs
> ping -i 0.001 169.254.11.11 2&> /dev/null &
> PID1="$!"
> echo $PID1 > /sys/fs/cgroup/net_prio/grp0/cgroup.procs
> ping -i 0.001 169.254.12.11 2&> /dev/null &
> PID2="$!"
> echo $PID2 > /sys/fs/cgroup/net_prio/grp1/cgroup.procs
> ping -i 0.001 169.254.13.11 2&> /dev/null &
> PID3="$!"
> echo $PID3 > /sys/fs/cgroup/net_prio/grp2/cgroup.procs
> ping -i 0.001 169.254.14.11 2&> /dev/null &
> PID4="$!"
> echo $PID4 > /sys/fs/cgroup/net_prio/grp3/cgroup.procs
> watch -n 0.5 -d "cat /proc/interrupts | grep eth0"
131: 251918 41 0 0 IR-PCI-MSI 477184-edge eth0:rx-0
132: 18969 1 0 0 IR-PCI-MSI 477185-edge eth0:tx-0
133: 0 0 295872 0 IR-PCI-MSI 477186-edge eth0:rx-1
134: 0 0 16136 0 IR-PCI-MSI 477187-edge eth0:tx-1
135: 0 0 288042 0 IR-PCI-MSI 477188-edge eth0:rx-2
136: 0 0 16135 0 IR-PCI-MSI 477189-edge eth0:tx-2
137: 0 211177 0 0 IR-PCI-MSI 477190-edge eth0:rx-3
138: 2 16144 0 0 IR-PCI-MSI 477191-edge eth0:tx-3
139: 0 0 0 0 IR-PCI-MSI 477192-edge eth0:rx-4
140: 0 0 0 0 IR-PCI-MSI 477193-edge eth0:tx-4
141: 0 0 0 0 IR-PCI-MSI 477194-edge eth0:rx-5
142: 0 0 0 0 IR-PCI-MSI 477195-edge eth0:tx-5
143: 0 0 0 0 IR-PCI-MSI 477196-edge eth0:rx-6
144: 0 0 0 0 IR-PCI-MSI 477197-edge eth0:tx-6
145: 0 0 0 0 IR-PCI-MSI 477198-edge eth0:rx-7
146: 0 0 0 0 IR-PCI-MSI 477199-edge eth0:tx-7
157: 0 0 0 0 IR-PCI-MSI 477210-edge eth0:safety-ue
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We extend tc flower to support configuration of VLAN priority-based RX
frame steering hardware offloading.
To map VLAN <PCP> to Traffic Class <TC>:
$ tc filter add dev <IFNAME> parent ffff: protocol 802.1Q flower \
vlan_prio <PCP> hw_tc <TC>
Note: <TC> < N whereby "tc qdisc ... num_tc N ..."
To delete all tc flower configurations:
$ tc qdisc delete dev <IFNAME> ingress
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current tc_add_flow() and tc_del_flow() use hardware L3 & L4 filters
as offloading. The number of L3/L4 filters is read from L3L4FNUM field
from MAC_HW_Feature1 register and is used to alloc priv->tc_entries[].
For RX frame steering based on VLAN priority offloading, we use
MAC_RXQ_CTRL2 & MAC_RXQ_CTRL3 registers and all VLAN priority level
can be configured independent from L3 & L4 filters.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hariprasad Kelam says:
====================
octeontx2: miscellaneous fixes
This series of patches fixes various issues related to NPC MCAM entry
management, debugfs, devlink, CGX LMAC mapping, RSS config etc
Change-log:
v2:
Fixed below review comments
- corrected Fixed tag syntax with 12 digits SHA1
and providing space between SHA1 and subject line
- remove code improvement patch
- make commit description more clear
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Initialize l4_key_offset variable to fix uninitialized
variable compiler warning.
Fixes: b9b7421a01d8 ("octeontx2-af: Support ESP/AH RSS hashing")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
unmapping npc counter works in a way by traversing all mcam
entries to find which mcam rule is associated with counter.
But loop cursor variable 'entry' is not incremented before
checking next mcam entry which resulting in infinite loop.
This in turn hogs the kworker thread forever and no other
mbox message is processed by AF driver after that.
Fix this by updating entry value before checking next
mcam entry.
Fixes: a958dd59f9ce ("octeontx2-af: Map or unmap NPC MCAM entry and counter")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RSS configuration can not be get/set when interface is in down state
as they required mbox communication. RSS enable flag status
is used for set/get configuration. Current code do not clear the
RSS enable flag on interface down which lead to mbox error while
trying to set/get RSS configuration.
Fixes: 85069e95e531 ("octeontx2-pf: Receive side scaling support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current devlink code try to free already freed irqs as the
irq_allocate flag is not cleared after free leading to kernel
crash while removing rvu driver. The patch fixes the irq free
sequence and clears the irq_allocate flag on free.
Fixes: 7304ac4567bc ("octeontx2-af: Add mailbox IRQ and msg handlers")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CGX receive buffer size is a constant value and
cannot be read from CGX0 block always since
CGX0 may not enabled everytime. Hence return CGX
receive buffer size from first enabled CGX block
instead of CGX0.
Fixes: 6e54e1c5399a ("octeontx2-af: cn10K: MTU configuration")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MKEX profile describes what packet fields need to be extracted from
the input packet and how to place those packet fields in the output key
for MCAM matching. The MKEX profile can be in a way where higher layer
packet fields can overwrite lower layer packet fields in output MCAM
Key.
Hence MKEX profile is always ensured that there are no overlaps between
any of the layers. But the commit 42006910b5ea
("octeontx2-af: cleanup KPU config data") introduced TX TOS field which
overlaps with DMAC in MCAM key.
This led to AF driver returning error when TX rule is installed with
DMAC as match criteria since DMAC gets overwritten and cannot be
supported. This patch fixes the issue by removing TOS field from MKEX TX
profile.
Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the existing rsrc_alloc's format, there is misalignment for the
pcifunc entries whose VF's index is a double digit. This patch fixes
this.
pcifunc NPA NIX0 NIX1 SSO GROUP SSOWS
TIM CPT0 CPT1 REE0 REE1
PF0:VF0 8 5
PF0:VF1 9 3
PF0:VF10 18 10
PF0:VF11 19 8
PF0:VF12 20 11
PF0:VF13 21 9
PF0:VF14 22 12
PF0:VF15 23 10
PF1 0 0
Fixes: 23205e6d06d4 ("octeontx2-af: Dump current resource provisioning status")
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the ETHTOOL_GRXCLSRLALL ioctl ethtool uses
below structure to read number of rules from the driver.
struct ethtool_rxnfc {
__u32 cmd;
__u32 flow_type;
__u64 data;
struct ethtool_rx_flow_spec fs;
union {
__u32 rule_cnt;
__u32 rss_context;
};
__u32 rule_locs[0];
};
Driver must not modify rule_cnt member. But currently driver
modifies it by modifying rss_context. Hence fix it by using a
local variable.
Fixes: 81a4362016e7 ("octeontx2-pf: Add RSS multi group support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"There are still regressions being found and fixed in the zoned mode
and subpage code, the rest are fixes for bugs reported by users.
Regressions:
- subpage block support:
- readahead works on the proper block size
- fix last page zeroing
- zoned mode:
- linked list corruption for tree log
Fixes:
- qgroup leak after falloc failure
- tree mod log and backref resolving:
- extent buffer cloning race when resolving backrefs
- pin deleted leaves with active tree mod log users
- drop debugging flag from slab cache"
* tag 'for-5.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: always pin deleted leaves when there are active tree mod log users
btrfs: fix race when cloning extent buffer during rewind of an old root
btrfs: fix slab cache flags for free space tree bitmap
btrfs: subpage: make readahead work properly
btrfs: subpage: fix wild pointer access during metadata read failure
btrfs: zoned: fix linked list corruption after log root tree allocation failure
btrfs: fix qgroup data rsv leak caused by falloc failure
btrfs: track qgroup released data in own variable in insert_prealloc_file_extent
btrfs: fix wrong offset to zero out range beyond i_size
|
|
Pull VFIO fixes from Alex Williamson:
- Fix 32-bit issue with new unmap-all flag (Steve Sistare)
- Various Kconfig changes for better coverage (Jason Gunthorpe)
- Fix to batch pinning support (Daniel Jordan)
* tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfio:
vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()
vfio: Depend on MMU
ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST
vfio-platform: Add COMPILE_TEST to VFIO_PLATFORM
vfio: IOMMU_API should be selected
vfio/type1: fix unmap all on ILP32
|
|
Pull xfs fixes from Darrick Wong:
"A couple of minor corrections for the new idmapping functionality, and
a fix for a theoretical hang that could occur if we decide to abort a
mount after dirtying the quota inodes.
Summary:
- Fix quota accounting on creat() when id mapping is enabled
- Actually reclaim dirty quota inodes when mount fails
- Typo fixes for documentation
- Restrict both bulkstat calls on idmapped/namespaced mounts"
* tag 'xfs-5.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: also reject BULKSTAT_SINGLE in a mount user namespace
docs: ABI: Fix the spelling oustanding to outstanding in the file sysfs-fs-xfs
xfs: force log and push AIL to clear pinned inodes when aborting mount
xfs: fix quota accounting when a mount is idmapped
|
|
Naveen Mamindlapalli says:
====================
Add tc hardware offloads
This patch series adds support for tc hardware offloads.
Patch #1 adds support for offloading flows that matches IP tos and IP
protocol which will be used by tc hw offload support. Also
added ethtool n-tuple filter to code to offload the flows
matching the above fields.
Patch #2 adds tc flower hardware offload support on ingress traffic.
Patch #3 adds TC flower offload stats.
Patch #4 adds tc TC_MATCHALL egress ratelimiting offload.
* tc flower hardware offload in PF driver
The driver parses the flow match fields and actions received from the tc
subsystem and adds/delete MCAM rules for the same. Each flow contains set
of match and action fields. If the action or fields are not supported,
the rule cannot be offloaded to hardware. The tc uses same set of MCAM
rules allocated for ethtool n-tuple filters. So, at a time only one entity
can offload the flows to hardware, they're made mutually exclusive in the
driver.
Following match and actions are supported.
Match: Eth dst_mac, EtherType, 802.1Q {vlan_id,vlan_prio}, vlan EtherType,
IP proto {tcp,udp,sctp,icmp,icmp6}, IPv4 tos, IPv4{dst_ip,src_ip},
L4 proto {dst_port|src_port number}.
Actions: drop, accept, vlan pop, redirect to another port on the device.
The Hardware stats are also supported. Currently only packet counter stats
are updated.
* tc egress rate limiting support
Added TC-MATCHALL classifier offload with police action applied for all
egress traffic on the specified interface.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add TC_MATCHALL egress ratelimiting offload support with POLICE
action for entire traffic going out of the interface.
Eg: To ratelimit egress traffic to 100Mbps
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 egress matchall skip_sw \
action police rate 100Mbit burst 16Kbit
HW supports a max burst size of ~128KB.
Only one ratelimiting filter can be installed at a time.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|