2021-03-18selftests/bpf: Pass all BPF .o's through BPF static linkerAndrii Nakryiko
Pass all individual BPF object files (generated from progs/*.c) through `bpftool gen object` command to validate that BPF static linker doesn't corrupt them. As an additional sanity check, validate that passing resulting object files through linker again results in identical ELF files. Exact same ELF contents can be guaranteed only after two passes, as after the first pass ELF sections order changes, and thus .BTF.ext data sections order changes. That, in turn, means that strings are added into the final BTF string sections in different order, so .BTF strings data might not be exactly the same. But doing another round of linking afterwards should result in the identical ELF file, which is checked with additional `diff` command. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-12-andrii@kernel.org
2021-03-18selftests/bpf: Re-generate vmlinux.h and BPF skeletons if bpftool changedAndrii Nakryiko
Trigger vmlinux.h and BPF skeletons re-generation if detected that bpftool was re-compiled. Otherwise full `make clean` is required to get updated skeletons, if bpftool is modified. Fixes: acbd06206bbb ("selftests/bpf: Add vmlinux.h selftest exercising tracing of syscalls") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-11-andrii@kernel.org
2021-03-18bpftool: Add `gen object` command to perform BPF static linkingAndrii Nakryiko
Add `bpftool gen object <output-file> <input_file>...` command to statically link multiple BPF ELF object files into a single output BPF ELF object file. This patch also updates bash completions and man page. Man page gets a short section on `gen object` command, but also updates the skeleton example to show off workflow for BPF application with two .bpf.c files, compiled individually with Clang, then resulting object files are linked together with `gen object`, and then final object file is used to generate usable BPF skeleton. This should help new users understand realistic workflow w.r.t. compiling multi-file BPF application. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20210318194036.3521577-10-andrii@kernel.org
2021-03-18bpftool: Add ability to specify custom skeleton object nameAndrii Nakryiko
Add optional name OBJECT_NAME parameter to `gen skeleton` command to override default object name, normally derived from input file name. This allows much more flexibility during build time. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-9-andrii@kernel.org
2021-03-18libbpf: Add BPF static linker BTF and BTF.ext supportAndrii Nakryiko
Add .BTF and .BTF.ext static linking logic. When multiple BPF object files are linked together, their respective .BTF and .BTF.ext sections are merged together. BTF types are not just concatenated, but also deduplicated. .BTF.ext data is grouped by type (func info, line info, core_relos) and target section names, and then all the records are concatenated together, preserving their relative order. All the BTF type ID references and string offsets are updated as necessary, to take into account possibly deduplicated strings and types. BTF DATASEC types are handled specially. Their respective var_secinfos are accumulated separately in special per-section data and then final DATASEC types are emitted at the very end during bpf_linker__finalize() operation, just before emitting final ELF output file. BTF data can also provide "section annotations" for some extern variables. Such concept is missing in ELF, but BTF will have DATASEC types for such special extern datasections (e.g., .kconfig, .ksyms). Such sections are called "ephemeral" internally. Internally linker will keep metadata for each such section, collecting variables information, but those sections won't be emitted into the final ELF file. Also, given LLVM/Clang during compilation emits BTF DATASECS that are incomplete, missing section size and variable offsets for static variables, BPF static linker will initially fix up such DATASECs, using ELF symbols data. The final DATASECs will preserve section sizes and all variable offsets. This is handled correctly by libbpf already, so won't cause any new issues. On the other hand, it's actually a nice property to have a complete BTF data without runtime adjustments done during bpf_object__open() by libbpf. In that sense, BPF static linker is also a BTF normalizer. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-8-andrii@kernel.org
2021-03-18libbpf: Add BPF static linker APIsAndrii Nakryiko
Introduce BPF static linker APIs to libbpf. BPF static linker allows performing static linking of multiple BPF object files into a single combined resulting object file, preserving all the BPF programs, maps, global variables, etc. Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are concatenated together. Similarly, code sections are also concatenated. All the symbols and ELF relocations are also concatenated in their respective ELF sections and are adjusted accordingly to the new object file layout. Static variables and functions are handled correctly as well, adjusting BPF instructions offsets to reflect new variable/function offset within the combined ELF section. Such relocations are referencing STT_SECTION symbols and that stays intact. Data sections in different files can have different alignment requirements, so that is taken care of as well, adjusting sizes and offsets as necessary to satisfy both old and new alignment requirements. DWARF data sections are currently stripped out, as is the LLVM_ADDRSIG section, which is ignored by libbpf in bpf_object__open() anyway. So, in a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty nice property, especially if resulting .o file is then used to generate BPF skeleton. Original string sections are ignored and instead we construct our own set of unique strings using libbpf-internal `struct strset` API. To reduce the size of the patch, all the .BTF and .BTF.ext processing was moved into a separate patch. The high-level API consists of just 4 functions: - bpf_linker__new() creates an instance of BPF static linker. It accepts output filename and (currently empty) options struct; - bpf_linker__add_file() takes input filename and appends it to the already processed ELF data; it can be called multiple times, once for each BPF ELF object file that needs to be linked in; - bpf_linker__finalize() needs to be called to dump final ELF contents into the output file, specified when bpf_linker was created; after bpf_linker__finalize() is called, no more bpf_linker__add_file() and bpf_linker__finalize() calls are allowed, they will return error; - regardless of whether bpf_linker__finalize() was called or not, bpf_linker__free() will free up all the used resources. Currently, BPF static linker doesn't resolve cross-object file references (extern variables and/or functions). This will be added in the follow up patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
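The concatenation of same-named sections with differing alignment requirements, described in the commit message above, can be sketched roughly as follows. This is a minimal illustration and not libbpf's code: `struct out_sec`, `align_up()` and `out_sec_append()` are hypothetical names, and alignments are assumed to be powers of two.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one combined output section (not libbpf's struct). */
struct out_sec {
	unsigned char *data;
	size_t size;   /* bytes used so far */
	size_t align;  /* strictest alignment seen so far (power of two) */
};

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Append one input section's bytes to the combined section.
 * Returns the offset at which the input data now starts, or (size_t)-1 on
 * allocation failure. Symbols and relocations that pointed into the input
 * section would have to be shifted by this offset.
 */
static size_t out_sec_append(struct out_sec *dst, const void *src,
			     size_t src_size, size_t src_align)
{
	size_t start = align_up(dst->size, src_align);
	size_t new_size = start + src_size;
	unsigned char *tmp = realloc(dst->data, new_size ? new_size : 1);

	if (!tmp)
		return (size_t)-1;
	/* Zero the padding inserted to satisfy the new data's alignment. */
	memset(tmp + dst->size, 0, start - dst->size);
	memcpy(tmp + start, src, src_size);
	dst->data = tmp;
	dst->size = new_size;
	if (src_align > dst->align)
		dst->align = src_align;
	return start;
}
```

The returned start offset is the value by which offsets into the appended input section would be adjusted in the combined layout.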
2021-03-18libbpf: Add generic BTF type shallow copy APIAndrii Nakryiko
Add btf__add_type() API that performs shallow copy of a given BTF type from the source BTF into the destination BTF. All the information and type IDs are preserved, but all the strings encountered are added into the destination BTF and corresponding offsets are rewritten. BTF type IDs are assumed to be correct or such that will be (somehow) modified afterwards. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-6-andrii@kernel.org
2021-03-18libbpf: Extract internal set-of-strings datastructure APIsAndrii Nakryiko
Extract BTF logic for maintaining a set of strings data structure, used for BTF strings section construction in writable mode, into separate re-usable API. This data structure is going to be used by bpf_linker to maintain ELF STRTAB section, which has the same layout as BTF strings section. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-5-andrii@kernel.org
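As a rough illustration of what a set-of-strings structure with the STRTAB layout does, the sketch below stores NUL-terminated strings back to back, keeps the empty string at offset 0, and returns the existing offset when a string is added twice. The names `struct str_set`, `str_set_init()` and `str_set_add()` are hypothetical, and the linear scan is an assumption made for brevity; it is not how libbpf's internal implementation looks strings up.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified string set for illustration only. */
struct str_set {
	char *data;   /* NUL-terminated strings laid out back to back */
	size_t len;   /* bytes used */
};

/* Start with the single empty string at offset 0, as in ELF STRTAB. */
static int str_set_init(struct str_set *s)
{
	s->data = calloc(1, 1);
	s->len = s->data ? 1 : 0;
	return s->data ? 0 : -1;
}

/*
 * Return the offset of str, appending it only if it is not already present.
 * Returns -1 on allocation failure.
 */
static long str_set_add(struct str_set *s, const char *str)
{
	size_t off, n = strlen(str) + 1;
	char *tmp;

	/* Linear scan over existing strings (a real set would hash). */
	for (off = 0; off < s->len; off += strlen(s->data + off) + 1)
		if (strcmp(s->data + off, str) == 0)
			return (long)off;

	tmp = realloc(s->data, s->len + n);
	if (!tmp)
		return -1;
	memcpy(tmp + s->len, str, n);
	s->data = tmp;
	off = s->len;
	s->len += n;
	return (long)off;
}
```

Because offsets of existing strings never change, references that were already rewritten to point into the set stay valid as more strings are added.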
2021-03-18libbpf: Rename internal memory-management helpersAndrii Nakryiko
Rename btf_add_mem() and btf_ensure_mem() helpers that abstract away details of dynamically resizable memory to use libbpf_ prefix, as they are not BTF-specific. No functional changes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-4-andrii@kernel.org
2021-03-18libbpf: Generalize BTF and BTF.ext type ID and strings iterationAndrii Nakryiko
Extract and generalize the logic to iterate BTF type ID and string offset fields within BTF types and .BTF.ext data. Expose this internally in libbpf for re-use by bpf_linker. Additionally, complete strings deduplication handling for BTF.ext (e.g., CO-RE access strings), which was previously missing. There previously was no case of deduplicating .BTF.ext data, but bpf_linker is going to use it. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-3-andrii@kernel.org
2021-03-18libbpf: Expose btf_type_by_id() internallyAndrii Nakryiko
btf_type_by_id() is internal-only convenience API returning non-const pointer to struct btf_type. Expose it outside of btf.c for re-use. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-2-andrii@kernel.org
2021-03-18Merge branch 'net-xps-improve-the-xps-maps-handling'David S. Miller
Antoine Tenart says: ==================== net: xps: improve the xps maps handling This series aims at fixing various issues with the xps code, including out-of-bound accesses and use-after-free. While doing so we try to improve the xps code maintainability and readability. The main change is moving dev->num_tc and dev->nr_ids into the xps maps, to avoid out-of-bound accesses as those two fields can be updated after the maps have been allocated. This allows further reworks, to improve the xps code readability and allow to stop taking the rtnl lock when reading the maps in sysfs. The maps are moved to an array in net_device, which simplifies the code a lot. One future improvement may be to remove the use of xps_map_mutex from net/core/dev.c, but that may require extra care. Thanks! Antoine Since v3: - Removed the 3 patches about the rtnl lock and __netif_set_xps_queue as there are extra issues. Those patches were not tied to the others, and I'll see what can be done as a separate effort. - One small fix in patch 12. Since v2: - Patches 13-16 are new to the series. - Fixed another issue I found while preparing v3 (use after free of old xps maps). - Kept the rtnl lock when calling netdev_get_tx_queue and netdev_txq_to_tc. - Use get_device/put_device when using the sb_dev. - Take the rtnl lock in mlx5 and virtio_net when calling netif_set_xps_queue. - Fixed a coding style issue. Since v1: - Reordered the patches to improve readability and avoid introducing issues in between patches. - Use dev_maps->nr_ids to allocate the mask in xps_queue_show but still default to nr_cpu_ids/dev->num_rx_queues in xps_queue_show when dev_maps hasn't been allocated yet for backward compatibility. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: NULL the old xps map entries when freeing themAntoine Tenart
In __netif_set_xps_queue, old map entries from the old dev_maps are freed but their corresponding entries in the old dev_maps aren't NULLed. Fix this. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: fix use after free in xpsAntoine Tenart
When setting up a new dev_maps in __netif_set_xps_queue, we remove and free maps from unused CPUs/rx-queues near the end of the function by calling remove_xps_queue. However it's possible those maps are also part of the old not-freed-yet dev_maps, which might be used concurrently. When that happens, a map can be freed while its corresponding entry in the old dev_maps table isn't NULLed, leading to: "BUG: KASAN: use-after-free" in different places. This fixes the map freeing logic for unused CPUs/rx-queues, to also NULL the map entries from the old dev_maps table. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
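The hazard described above, a map shared between the old and the new table being freed while the old table still points at it, can be modelled in plain C. The sketch below is a simplification under stated assumptions: `struct maps_table`, `struct queue_map` and `free_shared_map()` are hypothetical names, and the RCU protection the kernel relies on is deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_IDS 4  /* hypothetical table size for the sketch */

struct queue_map { int len; };

/* Hypothetical stand-in for an xps dev_maps table. */
struct maps_table { struct queue_map *entry[NR_IDS]; };

/*
 * Free the map at index idx. The same map pointer may be shared by the
 * new table and the old, not-yet-freed table, so clear every reference
 * before freeing to avoid leaving a dangling pointer behind.
 */
static void free_shared_map(struct maps_table *new_tbl,
			    struct maps_table *old_tbl, size_t idx)
{
	struct queue_map *map = new_tbl->entry[idx];

	if (!map)
		return;
	new_tbl->entry[idx] = NULL;
	if (old_tbl && old_tbl->entry[idx] == map)
		old_tbl->entry[idx] = NULL;
	free(map);
}
```

With both references cleared, a reader that still walks the old table sees an empty slot instead of freed memory.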
2021-03-18net-sysfs: move the xps cpus/rxqs retrieval in a common functionAntoine Tenart
Most of the xps_cpus_show and xps_rxqs_show functions share the same logic. Having it in two different functions does not help maintenance. This patch moves their common logic into a new function, xps_queue_show, to improve this. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net-sysfs: move the rtnl unlock up in the xps show helpersAntoine Tenart
Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU protected, we do not have the need to protect the maps in the rtnl lock. Move the rtnl unlock up so we reduce the rtnl locking section. We also increase the reference count on the subordinate device if any, as we don't want this device to be freed while we use it (now that the rtnl lock isn't protecting it in the whole function). Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: improve queue removal readability in __netif_set_xps_queueAntoine Tenart
Improve the readability of the loop removing tx-queue from unused CPUs/rx-queues in __netif_set_xps_queue. The change should only be cosmetic. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: add an helper to copy xps maps to the new dev_mapsAntoine Tenart
This patch adds a helper, xps_copy_dev_maps, to copy maps from dev_maps to new_dev_maps at a given index. The logic should be the same, with an improved code readability and maintenance. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: move the xps maps to an arrayAntoine Tenart
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in net_device. That will simplify the code a lot, removing the need for lots of if/else conditionals as the correct map will be available using its offset in the array. This should not modify the xps maps behaviour in any way. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: remove the xps possible_maskAntoine Tenart
Remove the xps possible_mask. It was an optimization but we can just loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That simplifies the code a bit. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: embed nr_ids in the xps mapsAntoine Tenart
Embed nr_ids (the number of CPUs for the xps cpus map, and the number of rxqs for the xps rxqs map) in dev_maps. That will help avoid accessing out-of-bound memory if those values change after dev_maps was allocated. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: embed num_tc in the xps mapsAntoine Tenart
The xps cpus/rxqs map is accessed using dev->num_tc, which is used when allocating the map. But later updates of dev->num_tc can lead to having a mismatch between the maps and how they're accessed. In such cases the map values do not make any sense and out of bound accesses can occur (that can be easily seen using KASAN). This patch aims at fixing this by embedding num_tc into the maps, using the value at the time the map is created. This brings two improvements: - The maps can be accessed using the embedded num_tc, so we know for sure we won't have out of bound accesses. - Checks can be made before accessing the maps so we know the values retrieved will make sense. We also update __netif_set_xps_queue to conditionally copy old maps from dev_maps in the new one only if the number of traffic classes from both maps match. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
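A minimal sketch of the idea, assuming a flat table indexed by `id * num_tc + tc`: the sizes used for bounds checking live in the map itself, so a later change to the device's traffic-class count cannot make the lookup run past the allocation. `struct xps_maps_sketch` and `xps_slot()` are hypothetical names and the layout is simplified relative to the kernel's structure.

```c
#include <stddef.h>

/* Hypothetical, simplified layout; the kernel's struct differs. */
struct xps_maps_sketch {
	unsigned int nr_ids;  /* CPUs or rx-queues the table was sized for */
	int num_tc;           /* traffic classes at allocation time */
	int *slots;           /* nr_ids * num_tc entries */
};

/*
 * Look up the slot for (id, tc) using only the sizes stored in the map.
 * Returns NULL if the request does not fit the table as it was allocated.
 */
static int *xps_slot(const struct xps_maps_sketch *maps,
		     unsigned int id, int tc)
{
	if (!maps || id >= maps->nr_ids || tc < 0 || tc >= maps->num_tc)
		return NULL;
	return &maps->slots[(size_t)id * maps->num_tc + tc];
}
```

A caller that gets NULL back knows the map and the current device configuration disagree and can skip the access instead of reading a meaningless value.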
2021-03-18net-sysfs: make xps_cpus_show and xps_rxqs_show consistentAntoine Tenart
Make the implementations of xps_cpus_show and xps_rxqs_show converge, as the two share the same logic but diverged over time. This should not modify their behaviour but will help future changes and improve maintenance. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net-sysfs: store the return of get_netdev_queue_index in an unsigned intAntoine Tenart
In net-sysfs, get_netdev_queue_index returns an unsigned int. Some of its callers use an unsigned long to store the returned value. Update the code to be consistent, this should only be cosmetic. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net-sysfs: convert xps_cpus_show to bitmap_zallocAntoine Tenart
Use bitmap_zalloc instead of zalloc_cpumask_var in xps_cpus_show to align with xps_rxqs_show. This will improve maintenance and allow us to factorize the two functions. The function should behave the same. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: dsa: bcm_sf2: fix BCM4908 RGMII reg(s)Rafał Miłecki
BCM4908 has only 1 RGMII reg for controlling port 7. Fixes: 73b7a6047971 ("net: dsa: bcm_sf2: support BCM4908's integrated switch") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: dsa: bcm_sf2: add function finding RGMII registerRafał Miłecki
Simple macro like REG_RGMII_CNTRL_P() is insufficient as: 1. It doesn't validate port argument 2. It doesn't support chipsets with non-linear RGMII regs layout Missing port validation could result in getting register offset from out of array. Random memory -> random offset -> random reads/writes. It affected e.g. BCM4908 for REG_RGMII_CNTRL_P(7). Fixes: a78e86ed586d ("net: dsa: bcm_sf2: Prepare for different register layouts") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
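The two shortcomings listed above can be illustrated with a small lookup function. This is only a sketch: the offsets in `rgmii_cntrl_off[]` are made up for the example, and `rgmii_cntrl_reg()` is a hypothetical name, not the driver's function.

```c
/* Hypothetical port count and register offsets, for illustration only. */
enum { NUM_PORTS = 8 };

static const int rgmii_cntrl_off[NUM_PORTS] = {
	0x60, 0x64, 0x68, -1, -1, -1, -1, 0x6c,  /* -1: no RGMII register */
};

/*
 * Return the RGMII control register offset for a port, or -1 if the port
 * is out of range or has no such register. Unlike a bare array-indexing
 * macro, a function can validate its argument, and a table lets the
 * layout be non-linear.
 */
static int rgmii_cntrl_reg(int port)
{
	if (port < 0 || port >= NUM_PORTS)
		return -1;
	return rgmii_cntrl_off[port];
}
```

Callers would check for the negative return value before touching the hardware, which is what prevents the "random offset" reads and writes mentioned in the commit message.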
2021-03-18net: check all name nodes in __dev_alloc_nameJiri Bohac
__dev_alloc_name(), when supplied with a name containing '%d', will search for the first available device number to generate a unique device name. Since commit ff92741270bf8b6e78aa885f166b68c7a67ab13a ("net: introduce name_node struct to be used in hashlist") network devices may have alternate names. __dev_alloc_name() does not take these alternate names into account, possibly generating a name that is already taken and failing with -ENFILE as a result. This demonstrates the bug: # rmmod dummy 2>/dev/null # ip link property add dev lo altname dummy0 # modprobe dummy numdummies=1 modprobe: ERROR: could not insert 'dummy': Too many open files in system Instead of creating a device named dummy1, modprobe fails. Fix this by checking all the names in the d->name_node list, not just d->name. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
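The fix amounts to scanning alternate names as well as primary names when looking for the first free index. A user-space sketch of that search is shown below; `struct dev_sketch`, `mark_used()` and `first_free_index()` are hypothetical names, the 64-index limit is an arbitrary assumption, and the kernel's actual data structures (the name_node list) are not modelled.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ALT 4

/* Hypothetical device record: one primary name plus alternate names. */
struct dev_sketch {
	const char *name;
	const char *altnames[MAX_ALT];  /* unused entries are NULL */
};

/* Mark index i as used if candidate equals pattern with %d replaced by i. */
static void mark_used(const char *pattern, const char *candidate,
		      unsigned char *used, int max)
{
	char buf[32];
	int i;

	if (sscanf(candidate, pattern, &i) != 1 || i < 0 || i >= max)
		return;
	/* Re-generate the name to reject partial matches such as "dummy0x". */
	snprintf(buf, sizeof(buf), pattern, i);
	if (strcmp(buf, candidate) == 0)
		used[i] = 1;
}

/*
 * Return the lowest index whose generated name collides with neither a
 * primary nor an alternate name of any device, or -1 if none is free.
 */
static int first_free_index(const char *pattern, const struct dev_sketch *devs,
			    int ndevs, int max)
{
	unsigned char used[64] = {0};
	int d, a, i;

	if (max > 64)
		max = 64;
	for (d = 0; d < ndevs; d++) {
		mark_used(pattern, devs[d].name, used, max);
		for (a = 0; a < MAX_ALT; a++)
			if (devs[d].altnames[a])
				mark_used(pattern, devs[d].altnames[a], used, max);
	}
	for (i = 0; i < max; i++)
		if (!used[i])
			return i;
	return -1;
}
```

With a device `lo` carrying the alternate name `dummy0`, the search for `dummy%d` skips index 0, which matches the `dummy1` outcome the commit message expects.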
2021-03-18net: dsa: b53: mmap: Add device tree supportÁlvaro Fernández Rojas
Add device tree support to b53_mmap.c while keeping platform devices support. Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18Merge branch 'stmmac-EST-interrupts-and-ethtool'David S. Miller
Mohammad Athari Bin Ismail says: ==================== net: stmmac: EST interrupts and ethtool This patchset adds support for handling EST interrupts and reporting EST errors. Additionally, the errors are added into ethtool statistic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: stmmac: Add EST errors into ethtool statisticOng Boon Leong
The following EST errors are added into ethtool statistic: 1) Constant Gate Control Error (CGCE): The counter "mtl_est_cgce" increases every time CGCE interrupt is triggered. 2) Head-of-Line Blocking due to Scheduling (HLBS): The counter "mtl_est_hlbs" increases every time HLBS interrupt is triggered. 3) Head-of-Line Blocking due to Frame Size (HLBF): The counter "mtl_est_hlbf" increases every time HLBF interrupt is triggered. 4) Base Time Register error (BTRE): The counter "mtl_est_btre" increases every time BTRE interrupt is triggered but BTRL has not reached the maximum value of 15. 5) Base Time Register Error Loop Count (BTRL) reaches maximum value: The counter "mtl_est_btrlm" increases every time BTRE interrupt is triggered and BTRL value reaches maximum value of 15. Please refer to MTL_EST_STATUS register in DesignWare Cores Ethernet Quality-of-Service Databook for a more detailed explanation. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Co-developed-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
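Items 4 and 5 above describe two counters driven by the same interrupt and distinguished only by the loop count. A sketch of that decision follows; the bit positions of `EST_BTRE` and the BTRL field are assumptions made for the example (the databook is the authority on the real MTL_EST_STATUS layout), and `struct est_stats` and `est_count_btre()` are hypothetical names.

```c
#include <stdint.h>

/* Assumed bit layout for the sketch; check the databook for the real one. */
#define EST_BTRE        (1u << 1)
#define EST_BTRL_SHIFT  8
#define EST_BTRL_MASK   (0xfu << EST_BTRL_SHIFT)
#define EST_BTRL_MAX    15u

struct est_stats {
	unsigned long mtl_est_btre;   /* BTRE seen, loop count below maximum */
	unsigned long mtl_est_btrlm;  /* BTRE seen, loop count at maximum */
};

/* Update the two BTRE-related counters from one status register value. */
static void est_count_btre(struct est_stats *st, uint32_t status)
{
	uint32_t btrl;

	if (!(status & EST_BTRE))
		return;
	btrl = (status & EST_BTRL_MASK) >> EST_BTRL_SHIFT;
	if (btrl == EST_BTRL_MAX)
		st->mtl_est_btrlm++;
	else
		st->mtl_est_btre++;
}
```

Exactly one of the two counters moves per BTRE event, so their sum gives the total number of BTRE interrupts handled.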
2021-03-18net: stmmac: EST interrupts handling and error reportingVoon Weifeng
Enabled EST related interrupts as below: 1) Constant Gate Control Error (CGCE) 2) Head-of-Line Blocking due to Scheduling (HLBS) 3) Head-of-Line Blocking due to Frame Size (HLBF). 4) Base Time Register error (BTRE) 5) Switch to S/W owned list Complete (SWLC) For HLBS, the user will get the info of all the queues that shows this error. For HLBF, the user will get the info of all the queue with the latest frame size which causes the error. Frame size 0 indicates no error. The ISR handling takes place when EST feature is enabled by user. Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Co-developed-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: marvell: Remove reference to CONFIG_MV64X60Christophe Leroy
Commit 92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support") removed last selector of CONFIG_MV64X60. As it is not a user selectable config item, all references to it are stale. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18Merge branch 'stmmac-vlan-priority-rx-steering'David S. Miller
Ong Boon Leong says: ==================== stmmac: add VLAN priority based RX steering The current tc flower implementation in stmmac supports both L3 and L4 filter offloading. This patch adds the support of VLAN priority based RX frame steering into different Rx Queues. The patches have been tested on both configuration test (include L3/L4) and traffic test (multi VLAN ping streams with RX Frame Steering) below:- > tc qdisc delete dev eth0 ingress > tc qdisc del dev eth0 parent root 2&> /dev/null > tc qdisc del dev eth0 parent ffff: 2&> /dev/null > tc qdisc add dev eth0 ingress > tc filter add dev eth0 parent ffff: protocol ip flower dst_ip 192.168.0.1 \ src_ip 192.168.1.1 ip_proto tcp dst_port 5201 src_port 6201 action drop > tc filter add dev eth0 parent ffff: protocol ip flower dst_ip 192.168.0.2 \ src_ip 192.168.1.2 ip_proto tcp dst_port 5202 src_port 6202 action drop > tc filter show dev eth0 ingress filter parent ffff: protocol ip pref 49151 flower chain 0 filter parent ffff: protocol ip pref 49151 flower chain 0 handle 0x1 eth_type ipv4 ip_proto tcp dst_ip 192.168.0.2 src_ip 192.168.1.2 dst_port 5202 src_port 6202 in_hw in_hw_count 1 action order 1: gact action drop random type none pass val 0 index 2 ref 1 bind 1 filter parent ffff: protocol ip pref 49152 flower chain 0 filter parent ffff: protocol ip pref 49152 flower chain 0 handle 0x1 eth_type ipv4 ip_proto tcp dst_ip 192.168.0.1 src_ip 192.168.1.1 dst_port 5201 src_port 6201 in_hw in_hw_count 1 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 > tc qdisc delete dev eth0 ingress > tc qdisc del dev eth0 parent root 2&> /dev/null > tc qdisc del dev eth0 parent ffff: 2&> /dev/null > tc qdisc add dev eth0 ingress > tc qdisc add dev eth0 root mqprio num_tc 4 \ map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 1@1 1@2 1@3 hw 0 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 0 hw_tc 3 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 1 
hw_tc 2 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 2 hw_tc 1 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 3 hw_tc 0 > tc filter show dev eth0 ingress filter parent ffff: protocol 802.1Q pref 49149 flower chain 0 filter parent ffff: protocol 802.1Q pref 49149 flower chain 0 handle 0x1 hw_tc 0 vlan_prio 3 in_hw in_hw_count 1 filter parent ffff: protocol 802.1Q pref 49150 flower chain 0 filter parent ffff: protocol 802.1Q pref 49150 flower chain 0 handle 0x1 hw_tc 1 vlan_prio 2 in_hw in_hw_count 1 filter parent ffff: protocol 802.1Q pref 49151 flower chain 0 filter parent ffff: protocol 802.1Q pref 49151 flower chain 0 handle 0x1 hw_tc 2 vlan_prio 1 in_hw in_hw_count 1 filter parent ffff: protocol 802.1Q pref 49152 flower chain 0 filter parent ffff: protocol 802.1Q pref 49152 flower chain 0 handle 0x1 hw_tc 3 vlan_prio 0 in_hw in_hw_count 1 > tc qdisc delete dev eth0 ingress > ip address flush dev eth0 > ip address add 169.254.1.11/24 dev eth0 > ip link delete dev eth0.vlan1 2> /dev/null > ip link add link eth0 name eth0.vlan1 type vlan id 1 > ip address flush dev eth0.vlan1 2> /dev/null > ip address add 169.254.11.11/24 dev eth0.vlan1 > ip link delete dev eth0.vlan2 2> /dev/null > ip link add link eth0 name eth0.vlan2 type vlan id 2 > ip address flush dev eth0.vlan2 2> /dev/null > ip address add 169.254.12.11/24 dev eth0.vlan2 > ip link delete dev eth0.vlan3 2> /dev/null > ip link add link eth0 name eth0.vlan3 type vlan id 3 > ip address flush dev eth0.vlan3 2> /dev/null > ip address add 169.254.13.11/24 dev eth0.vlan3 > ip link delete dev eth0.vlan4 2> /dev/null > ip link add link eth0 name eth0.vlan4 type vlan id 4 > ip address flush dev eth0.vlan4 2> /dev/null > ip address add 169.254.14.11/24 dev eth0.vlan4 > ip address flush dev eth0 > ip address add 169.254.1.22/24 dev eth0 > ip link delete dev eth0.vlan1 2> /dev/null > ip link add link eth0 name eth0.vlan1 type vlan id 1 > ip address flush dev eth0.vlan1 2> 
/dev/null > ip address add 169.254.11.22/24 dev eth0.vlan1 > ip link delete dev eth0.vlan2 2> /dev/null > ip link add link eth0 name eth0.vlan2 type vlan id 2 > ip address flush dev eth0.vlan2 2> /dev/null > ip address add 169.254.12.22/24 dev eth0.vlan2 > ip link delete dev eth0.vlan3 2> /dev/null > ip link add link eth0 name eth0.vlan3 type vlan id 3 > ip address flush dev eth0.vlan3 2> /dev/null > ip address add 169.254.13.22/24 dev eth0.vlan3 > ip link delete dev eth0.vlan4 2> /dev/null > ip link add link eth0 name eth0.vlan4 type vlan id 4 > ip address flush dev eth0.vlan4 2> /dev/null > ip address add 169.254.14.22/24 dev eth0.vlan4 > mkdir -p /sys/fs/cgroup/net_prio/grp0 > echo eth0 0 > /sys/fs/cgroup/net_prio/grp0/net_prio.ifpriomap > echo eth0.vlan1 0 > /sys/fs/cgroup/net_prio/grp0/net_prio.ifpriomap > mkdir -p /sys/fs/cgroup/net_prio/grp1 > echo eth0 0 > /sys/fs/cgroup/net_prio/grp1/net_prio.ifpriomap > echo eth0.vlan2 1 > /sys/fs/cgroup/net_prio/grp1/net_prio.ifpriomap > mkdir -p /sys/fs/cgroup/net_prio/grp2 > echo eth0 0 > /sys/fs/cgroup/net_prio/grp2/net_prio.ifpriomap > echo eth0.vlan3 2 > /sys/fs/cgroup/net_prio/grp2/net_prio.ifpriomap > mkdir -p /sys/fs/cgroup/net_prio/grp3 > echo eth0 0 > /sys/fs/cgroup/net_prio/grp3/net_prio.ifpriomap > echo eth0.vlan4 3 > /sys/fs/cgroup/net_prio/grp3/net_prio.ifpriomap > tc qdisc del dev eth0 parent root 2&> /dev/null > tc qdisc del dev eth0 parent ffff: 2&> /dev/null > tc qdisc add dev eth0 ingress > tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 1@1 1@2 1@3 hw 0 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 0 hw_tc 0 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 1 hw_tc 1 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 2 hw_tc 2 > tc filter add dev eth0 parent ffff: protocol 802.1Q flower vlan_prio 3 hw_tc 3 > ip link set eth0.vlan1 type vlan egress-qos-map 0:0 > ip link set eth0.vlan2 type vlan 
egress-qos-map 1:1 > ip link set eth0.vlan3 type vlan egress-qos-map 2:2 > ip link set eth0.vlan4 type vlan egress-qos-map 3:3 > tc filter show dev eth0 ingress filter parent ffff: protocol 802.1Q pref 49149 flower chain 0 filter parent ffff: protocol 802.1Q pref 49149 flower chain 0 handle 0x1 hw_tc 3 vlan_prio 3 in_hw in_hw_count 1 filter parent ffff: protocol 802.1Q pref 49150 flower chain 0 filter parent ffff: protocol 802.1Q pref 49150 flower chain 0 handle 0x1 hw_tc 2 vlan_prio 2 in_hw in_hw_count 1 filter parent ffff: protocol 802.1Q pref 49151 flower chain 0 filter parent ffff: protocol 802.1Q pref 49151 flower chain 0 handle 0x1 hw_tc 1 vlan_prio 1 in_hw in_hw_count 1 filter parent ffff: protocol 802.1Q pref 49152 flower chain 0 filter parent ffff: protocol 802.1Q pref 49152 flower chain 0 handle 0x1 hw_tc 0 vlan_prio 0 in_hw in_hw_count 1 > echo 1 > /proc/irq/131/smp_affinity > echo 1 > /proc/irq/132/smp_affinity > echo 4 > /proc/irq/133/smp_affinity > echo 4 > /proc/irq/134/smp_affinity > echo 4 > /proc/irq/135/smp_affinity > echo 4 > /proc/irq/136/smp_affinity > echo 2 > /proc/irq/137/smp_affinity > echo 2 > /proc/irq/138/smp_affinity > ping -i 0.001 169.254.11.22 2&> /dev/null & > PID1="$!" > echo $PID1 > /sys/fs/cgroup/net_prio/grp0/cgroup.procs > ping -i 0.001 169.254.12.22 2&> /dev/null & > PID2="$!" > echo $PID2 > /sys/fs/cgroup/net_prio/grp1/cgroup.procs > ping -i 0.001 169.254.13.22 2&> /dev/null & > PID3="$!" > echo $PID3 > /sys/fs/cgroup/net_prio/grp2/cgroup.procs > ping -i 0.001 169.254.14.22 2&> /dev/null & > PID4="$!" > echo $PID4 > /sys/fs/cgroup/net_prio/grp3/cgroup.procs > ping -i 0.001 169.254.11.11 2&> /dev/null & > PID1="$!" > echo $PID1 > /sys/fs/cgroup/net_prio/grp0/cgroup.procs > ping -i 0.001 169.254.12.11 2&> /dev/null & > PID2="$!" > echo $PID2 > /sys/fs/cgroup/net_prio/grp1/cgroup.procs > ping -i 0.001 169.254.13.11 2&> /dev/null & > PID3="$!" 
> echo $PID3 > /sys/fs/cgroup/net_prio/grp2/cgroup.procs > ping -i 0.001 169.254.14.11 2&> /dev/null & > PID4="$!" > echo $PID4 > /sys/fs/cgroup/net_prio/grp3/cgroup.procs > watch -n 0.5 -d "cat /proc/interrupts | grep eth0" 131: 251918 41 0 0 IR-PCI-MSI 477184-edge eth0:rx-0 132: 18969 1 0 0 IR-PCI-MSI 477185-edge eth0:tx-0 133: 0 0 295872 0 IR-PCI-MSI 477186-edge eth0:rx-1 134: 0 0 16136 0 IR-PCI-MSI 477187-edge eth0:tx-1 135: 0 0 288042 0 IR-PCI-MSI 477188-edge eth0:rx-2 136: 0 0 16135 0 IR-PCI-MSI 477189-edge eth0:tx-2 137: 0 211177 0 0 IR-PCI-MSI 477190-edge eth0:rx-3 138: 2 16144 0 0 IR-PCI-MSI 477191-edge eth0:tx-3 139: 0 0 0 0 IR-PCI-MSI 477192-edge eth0:rx-4 140: 0 0 0 0 IR-PCI-MSI 477193-edge eth0:tx-4 141: 0 0 0 0 IR-PCI-MSI 477194-edge eth0:rx-5 142: 0 0 0 0 IR-PCI-MSI 477195-edge eth0:tx-5 143: 0 0 0 0 IR-PCI-MSI 477196-edge eth0:rx-6 144: 0 0 0 0 IR-PCI-MSI 477197-edge eth0:tx-6 145: 0 0 0 0 IR-PCI-MSI 477198-edge eth0:rx-7 146: 0 0 0 0 IR-PCI-MSI 477199-edge eth0:tx-7 157: 0 0 0 0 IR-PCI-MSI 477210-edge eth0:safety-ue ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18net: stmmac: add RX frame steering based on VLAN priority in tc flowerOng Boon Leong
We extend tc flower to support configuration of VLAN priority-based RX frame steering hardware offloading.

To map VLAN <PCP> to Traffic Class <TC>:

  $ tc filter add dev <IFNAME> parent ffff: protocol 802.1Q flower \
        vlan_prio <PCP> hw_tc <TC>

Note: <TC> must be less than N, where N is the value given in "tc qdisc ... num_tc N ...".

To delete all tc flower configurations:

  $ tc qdisc delete dev <IFNAME> ingress

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
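The mapping command and its <TC> < N constraint can be sketched as a small Python helper that validates the arguments before building the argument list. This is only an illustration: the function name `flower_vlan_prio_cmd` and the `num_tc` parameter are hypothetical, and the 0-7 range for <PCP> is assumed from the 3-bit 802.1Q priority field.

```python
def flower_vlan_prio_cmd(ifname, pcp, tc, num_tc):
    """Build the argv for mapping a VLAN priority (PCP) to a traffic class."""
    # The 802.1Q PCP field is 3 bits wide, so valid priorities are 0..7.
    if not 0 <= pcp <= 7:
        raise ValueError(f"vlan_prio {pcp} is outside 0..7")
    # The commit message requires <TC> < N, with N taken from "num_tc N".
    if not 0 <= tc < num_tc:
        raise ValueError(f"hw_tc {tc} must be below num_tc {num_tc}")
    return ["tc", "filter", "add", "dev", ifname, "parent", "ffff:",
            "protocol", "802.1Q", "flower",
            "vlan_prio", str(pcp), "hw_tc", str(tc)]


cmd = flower_vlan_prio_cmd("eth0", pcp=3, tc=3, num_tc=4)
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run` on a system where tc and the offload are available; the sketch itself does not execute anything.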
2021-03-18net: stmmac: restructure tc implementation for RX VLAN Priority steeringOng Boon Leong
The current tc_add_flow() and tc_del_flow() use hardware L3 & L4 filters for offloading. The number of L3/L4 filters is read from the L3L4FNUM field of the MAC_HW_Feature1 register and is used to allocate priv->tc_entries[].

For RX frame steering based on VLAN priority offloading, we use the MAC_RXQ_CTRL2 & MAC_RXQ_CTRL3 registers, and all VLAN priority levels can be configured independently of the L3 & L4 filters.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18Merge branch 'octeontx2-fixes'David S. Miller
Hariprasad Kelam says:

====================
octeontx2: miscellaneous fixes

This series of patches fixes various issues related to NPC MCAM entry management, debugfs, devlink, CGX LMAC mapping, RSS config, etc.

Change-log:
v2: Addressed the review comments below
- corrected the Fixes tag syntax to use a 12-digit SHA1 with a space between the SHA1 and the subject line
- removed the code improvement patch
- made the commit descriptions clearer
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18octeontx2-af: Fix uninitialized variable warningSubbaraya Sundeep
Initialize l4_key_offset variable to fix uninitialized variable compiler warning.

Fixes: b9b7421a01d8 ("octeontx2-af: Support ESP/AH RSS hashing")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18octeontx2-af: fix infinite loop in unmapping NPC counterHariprasad Kelam
Unmapping an NPC counter works by traversing all MCAM entries to find which MCAM rule is associated with the counter. But the loop cursor variable 'entry' is not incremented before checking the next MCAM entry, which results in an infinite loop. This in turn hogs the kworker thread forever, and no other mbox message is processed by the AF driver after that.

Fix this by updating the entry value before checking the next MCAM entry.

Fixes: a958dd59f9ce ("octeontx2-af: Map or unmap NPC MCAM entry and counter")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
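The class of bug described above, a `continue` that skips the cursor update, can be modelled in a few lines of Python. This is a simplified sketch and not the driver's code: `unmap_counter` and the `entry_to_counter` list are hypothetical stand-ins for the MCAM bookkeeping.

```python
def unmap_counter(entry_to_counter, counter):
    """Clear every entry mapped to `counter`; return the indices cleared."""
    cleared = []
    entry = 0
    while entry < len(entry_to_counter):
        index = entry
        # Advance the cursor BEFORE any `continue`. If this update sat after
        # the check below, a non-matching entry would be re-examined forever.
        entry = index + 1
        if entry_to_counter[index] != counter:
            continue
        entry_to_counter[index] = None
        cleared.append(index)
    return cleared


mapping = [3, 7, 3, None, 7]
cleared = unmap_counter(mapping, 7)
print(cleared, mapping)
```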
2021-03-18octeontx2-pf: Clear RSS enable flag on interace downGeetha sowjanya
RSS configuration cannot be read or set while the interface is down, as both operations require mbox communication. The RSS enable flag is checked before setting or getting the configuration. The current code does not clear the RSS enable flag on interface down, which leads to an mbox error when trying to set or get the RSS configuration.

Fixes: 85069e95e531 ("octeontx2-pf: Receive side scaling support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18octeontx2-af: Fix irq free in rvu teardownGeetha sowjanya
The current devlink code tries to free already-freed IRQs because the irq_allocate flag is not cleared after the free, which leads to a kernel crash while removing the rvu driver. This patch fixes the IRQ free sequence and clears the irq_allocate flag on free.

Fixes: 7304ac4567bc ("octeontx2-af: Add mailbox IRQ and msg handlers")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18octeontx2-af: Return correct CGX RX fifo sizeSubbaraya Sundeep
The CGX receive buffer size is a constant value, but it cannot always be read from the CGX0 block, since CGX0 may not be enabled. Hence return the CGX receive buffer size from the first enabled CGX block instead of CGX0.

Fixes: 6e54e1c5399a ("octeontx2-af: cn10K: MTU configuration")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18octeontx2-af: Remove TOS field from MKEX TXSubbaraya Sundeep
The MKEX profile describes which packet fields need to be extracted from the input packet and how to place those fields in the output key for MCAM matching. A profile can be laid out such that higher-layer packet fields overwrite lower-layer packet fields in the output MCAM key, so the MKEX profile must always ensure that there are no overlaps between any of the layers. But commit 42006910b5ea ("octeontx2-af: cleanup KPU config data") introduced a TX TOS field which overlaps with DMAC in the MCAM key.

This led to the AF driver returning an error when a TX rule is installed with DMAC as a match criterion, since DMAC gets overwritten and cannot be supported. This patch fixes the issue by removing the TOS field from the MKEX TX profile.

Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
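The no-overlap requirement on key fields can be illustrated with a short Python check over byte ranges. This is a sketch under assumptions: `find_overlaps` is a hypothetical helper, and the offsets in the example layout are invented for illustration; only the field widths (6 bytes for DMAC, 1 for TOS, 2 for EtherType) are standard.

```python
def find_overlaps(fields):
    """Return pairs of field names whose [offset, offset + length) ranges intersect.

    `fields` is a list of (name, byte_offset, byte_length) tuples.
    """
    overlaps = []
    ordered = sorted(fields, key=lambda f: f[1])
    for i, (name_a, off_a, len_a) in enumerate(ordered):
        for name_b, off_b, _len_b in ordered[i + 1:]:
            # `ordered` is sorted by offset, so b starts at or after a;
            # the two overlap exactly when b starts before a ends.
            if off_b < off_a + len_a:
                overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical key layout: TOS is placed inside the bytes used by DMAC.
layout = [("DMAC", 8, 6), ("TOS", 9, 1), ("ETYPE", 14, 2)]
conflicts = find_overlaps(layout)
print(conflicts)
```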
2021-03-18octeontx2-af: Formatting debugfs entry rsrc_alloc.Rakesh Babu
With the existing rsrc_alloc format, pcifunc entries whose VF index has two digits are misaligned. This patch fixes the alignment.

    pcifunc     NPA         NIX0        NIX1        SSO GROUP   SSOWS       TIM         CPT0        CPT1        REE0        REE1
    PF0:VF0     8           5
    PF0:VF1     9           3
    PF0:VF10    18          10
    PF0:VF11    19          8
    PF0:VF12    20          11
    PF0:VF13    21          9
    PF0:VF14    22          12
    PF0:VF15    23          10
    PF1         0           0

Fixes: 23205e6d06d4 ("octeontx2-af: Dump current resource provisioning status")
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
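The alignment issue comes down to padding every cell to a fixed width rather than separating cells with a fixed gap. A minimal Python sketch of the idea, where `format_row` and the width of 12 are assumptions and not the driver's actual format string:

```python
def format_row(cells, width=12):
    """Left-justify each cell in a fixed-width column, trimming trailing spaces."""
    return "".join(f"{cell:<{width}}" for cell in cells).rstrip()


rows = [
    ["pcifunc", "NPA", "NIX0"],
    ["PF0:VF1", "9", "3"],
    ["PF0:VF10", "18", "10"],
]
lines = [format_row(row) for row in rows]
print("\n".join(lines))
```

Because the padding depends only on the column width, a longer first cell such as `PF0:VF10` no longer shifts the cells that follow it.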
2021-03-18octeontx2-pf: Do not modify number of rulesSubbaraya Sundeep
In the ETHTOOL_GRXCLSRLALL ioctl, ethtool uses the structure below to read the number of rules from the driver.

    struct ethtool_rxnfc {
            __u32                           cmd;
            __u32                           flow_type;
            __u64                           data;
            struct ethtool_rx_flow_spec     fs;
            union {
                    __u32                   rule_cnt;
                    __u32                   rss_context;
            };
            __u32                           rule_locs[0];
    };

The driver must not modify the rule_cnt member. But currently the driver modifies it by writing to rss_context, which shares the same union storage. Hence fix it by using a local variable.

Fixes: 81a4362016e7 ("octeontx2-pf: Add RSS multi group support")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
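Because rule_cnt and rss_context are members of the same anonymous union, a write to one is visible through the other. The effect can be reproduced from Python with `ctypes`; the sketch below models only the union from the structure above, and the class names `_CountOrContext` and `RxnfcTail` are hypothetical.

```python
import ctypes


class _CountOrContext(ctypes.Union):
    # Both members occupy the same 4 bytes, as in the anonymous C union.
    _fields_ = [("rule_cnt", ctypes.c_uint32),
                ("rss_context", ctypes.c_uint32)]


class RxnfcTail(ctypes.Structure):
    # Only the union is modelled; the other struct members are omitted.
    _anonymous_ = ("u",)
    _fields_ = [("u", _CountOrContext)]


req = RxnfcTail()
req.rule_cnt = 16            # value supplied by the caller
req.rss_context = 3          # a write through the other member ...
clobbered = req.rule_cnt     # ... overwrites rule_cnt as well

# Pattern described by the fix: keep the context in a local variable instead.
req.rule_cnt = 16
rss_context_local = 3
preserved = req.rule_cnt

print(clobbered, preserved)
```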
2021-03-18Merge tag 'for-5.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds
Pull btrfs fixes from David Sterba:
 "There are still regressions being found and fixed in the zoned mode
  and subpage code, the rest are fixes for bugs reported by users.

  Regressions:

   - subpage block support:
      - readahead works on the proper block size
      - fix last page zeroing

   - zoned mode:
      - linked list corruption for tree log

  Fixes:

   - qgroup leak after falloc failure

   - tree mod log and backref resolving:
      - extent buffer cloning race when resolving backrefs
      - pin deleted leaves with active tree mod log users

   - drop debugging flag from slab cache"

* tag 'for-5.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: always pin deleted leaves when there are active tree mod log users
  btrfs: fix race when cloning extent buffer during rewind of an old root
  btrfs: fix slab cache flags for free space tree bitmap
  btrfs: subpage: make readahead work properly
  btrfs: subpage: fix wild pointer access during metadata read failure
  btrfs: zoned: fix linked list corruption after log root tree allocation failure
  btrfs: fix qgroup data rsv leak caused by falloc failure
  btrfs: track qgroup released data in own variable in insert_prealloc_file_extent
  btrfs: fix wrong offset to zero out range beyond i_size
2021-03-18Merge tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fixes from Alex Williamson:

 - Fix 32-bit issue with new unmap-all flag (Steve Sistare)

 - Various Kconfig changes for better coverage (Jason Gunthorpe)

 - Fix to batch pinning support (Daniel Jordan)

* tag 'vfio-v5.12-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()
  vfio: Depend on MMU
  ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TEST
  vfio-platform: Add COMPILE_TEST to VFIO_PLATFORM
  vfio: IOMMU_API should be selected
  vfio/type1: fix unmap all on ILP32
2021-03-18Merge tag 'xfs-5.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong:
 "A couple of minor corrections for the new idmapping functionality, and
  a fix for a theoretical hang that could occur if we decide to abort a
  mount after dirtying the quota inodes.

  Summary:

   - Fix quota accounting on creat() when id mapping is enabled

   - Actually reclaim dirty quota inodes when mount fails

   - Typo fixes for documentation

   - Restrict both bulkstat calls on idmapped/namespaced mounts"

* tag 'xfs-5.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: also reject BULKSTAT_SINGLE in a mount user namespace
  docs: ABI: Fix the spelling oustanding to outstanding in the file sysfs-fs-xfs
  xfs: force log and push AIL to clear pinned inodes when aborting mount
  xfs: fix quota accounting when a mount is idmapped
2021-03-18Merge branch 'octeon-tc-offloads'David S. Miller
Naveen Mamindlapalli says:

====================
Add tc hardware offloads

This patch series adds support for tc hardware offloads.

Patch #1 adds support for offloading flows that match IP tos and IP protocol, which will be used by tc hw offload support. It also adds ethtool n-tuple filter code to offload flows matching the above fields.
Patch #2 adds tc flower hardware offload support on ingress traffic.
Patch #3 adds TC flower offload stats.
Patch #4 adds tc TC_MATCHALL egress ratelimiting offload.

* tc flower hardware offload in PF driver

The driver parses the flow match fields and actions received from the tc subsystem and adds or deletes MCAM rules for them. Each flow contains a set of match and action fields. If the action or fields are not supported, the rule cannot be offloaded to hardware. tc uses the same set of MCAM rules allocated for ethtool n-tuple filters, so only one of the two can offload flows to hardware at a time; the driver makes them mutually exclusive.

The following matches and actions are supported.
Match: Eth dst_mac, EtherType, 802.1Q {vlan_id,vlan_prio}, vlan EtherType, IP proto {tcp,udp,sctp,icmp,icmp6}, IPv4 tos, IPv4{dst_ip,src_ip}, L4 proto {dst_port|src_port number}.
Actions: drop, accept, vlan pop, redirect to another port on the device.

Hardware stats are also supported. Currently only packet counter stats are updated.

* tc egress rate limiting support

Added TC-MATCHALL classifier offload with police action applied for all egress traffic on the specified interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18octeontx2-pf: TC_MATCHALL egress ratelimiting offloadSunil Goutham
Add TC_MATCHALL egress ratelimiting offload support with POLICE action for entire traffic going out of the interface.

E.g., to ratelimit egress traffic to 100Mbps:

  $ ethtool -K eth0 hw-tc-offload on
  $ tc qdisc add dev eth0 clsact
  $ tc filter add dev eth0 egress matchall skip_sw \
        action police rate 100Mbit burst 16Kbit

HW supports a max burst size of ~128KB. Only one ratelimiting filter can be installed at a time.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
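How a rate and a burst size interact in a policer can be sketched with a generic token bucket in Python. This is a textbook model and not the hardware's algorithm; the `TokenBucket` class is hypothetical, and the 2048-byte burst used below is an assumed reading of 16 Kbit as 16 x 1024 bits.

```python
class TokenBucket:
    """Generic token-bucket policer: tokens are bytes, refilled at `rate_bps`."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.burst = float(burst_bytes)     # bucket depth in bytes
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # timestamp of the last update

    def allow(self, now, size):
        """Return True if a packet of `size` bytes conforms at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


policer = TokenBucket(rate_bps=100_000_000, burst_bytes=2048)
first = policer.allow(0.0, 1500)      # fits in the initial burst
second = policer.allow(0.0, 1500)     # bucket has too few tokens left
third = policer.allow(0.0001, 1500)   # 100 us later, about 1250 bytes refilled
print(first, second, third)
```

A larger burst lets more back-to-back packets through before the rate limit takes effect, which is why the maximum burst the hardware supports matters when choosing the `burst` argument.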