summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-25ARM: dts: marvell: Fix some common switch mistakesLinus Walleij
Fix some errors in the Marvell MV88E6xxx switch descriptions: - The top node had no address size or cells. - switch0@0 is not OK, should be ethernet-switch@0. - The ports node should be named ethernet-ports - The ethernet-ports node should have port@0 etc children, no plural "ports" in the children. - Ports should be named ethernet-port@0 etc - PHYs should be named ethernet-phy@0 etc This serves as an example of fixes needed for introducing a schema for the bindings, but the patch can simply be applied. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25dt-bindings: net: mvusb: Fix up DSA exampleLinus Walleij
When adding a proper schema for the Marvell mx88e6xxx switch, the scripts start complaining about this embedded example: dtschema/dtc warnings/errors: net/marvell,mvusb.example.dtb: switch@0: ports: '#address-cells' is a required property from schema $id: http://devicetree.org/schemas/net/dsa/marvell,mv88e6xxx.yaml# net/marvell,mvusb.example.dtb: switch@0: ports: '#size-cells' is a required property from schema $id: http://devicetree.org/schemas/net/dsa/marvell,mv88e6xxx.yaml# Fix this up by extending the example with those properties in the ports node. While we are at it, rename "ports" to "ethernet-ports" and rename "switch" to "ethernet-switch" as this is recommended practice. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25dt-bindings: net: dsa: Require ports or ethernet-portsLinus Walleij
Bindings using dsa.yaml#/$defs/ethernet-ports specify that a DSA switch node need to have a ports or ethernet-ports subnode, and that is actually required, so add requirements using oneOf. Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25sched: act_ct: switch to per-action label countingFlorian Westphal
net->ct.labels_used was meant to convey 'number of ip/nftables rules that need the label extension allocated'. act_ct enables this for each net namespace, which voids all attempts to avoid ct->ext allocation when possible. Move this increment to the control plane to request label extension space allocation only when its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25wifi: rtw89: cleanup firmware elements parsingDmitry Antipov
When compiling with clang-18, I've noticed the following: drivers/net/wireless/realtek/rtw89/fw.c:389:28: warning: cast to smaller integer type 'enum rtw89_fw_type' from 'const void *' [-Wvoid-pointer-to-enum-cast] 389 | enum rtw89_fw_type type = (enum rtw89_fw_type)data; | ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/wireless/realtek/rtw89/fw.c:569:13: warning: cast to smaller integer type 'enum rtw89_rf_path' from 'const void *' [-Wvoid-pointer-to-enum-cast] 569 | rf_path = (enum rtw89_rf_path)data; | ^~~~~~~~~~~~~~~~~~~~~~~~ So avoid brutal everything-to-const-void-and-back casts, introduce 'union rtw89_fw_element_arg' to pass parameters to element handler callbacks, and adjust all of the related bits accordingly. Compile tested only. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231020040940.33154-1-dmantipov@yandex.ru
2023-10-25wifi: rt2x00: rework MT7620 PA/LNA RF calibrationShiji Yang
1. Move MT7620 PA/LNA calibration code to dedicated functions. 2. For external PA/LNA devices, restore RF and BBP registers before R-Calibration. 3. Do Rx DCOC calibration again before RXIQ calibration. 4. Add some missing LNA related registers' initialization. Signed-off-by: Shiji Yang <yangshiji66@outlook.com> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/TYAP286MB0315979F92DC563019B8F238BCD4A@TYAP286MB0315.JPNP286.PROD.OUTLOOK.COM
2023-10-25wifi: rt2x00: rework MT7620 channel config functionShiji Yang
1. Move the channel configuration code from rt2800_vco_calibration() to the rt2800_config_channel(). 2. Use MT7620 SoC specific AGC initial LNA value instead of the RT5592's value. 3. BBP{195,196} pairing write has been replaced with rt2800_bbp_glrt_write() to reduce redundant code. Signed-off-by: Shiji Yang <yangshiji66@outlook.com> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/TYAP286MB0315622A4340BFFA530B1B86BCD4A@TYAP286MB0315.JPNP286.PROD.OUTLOOK.COM
2023-10-25wifi: rt2x00: improve MT7620 register initializationShiji Yang
1. Do not hard reset the BBP. We can use soft reset instead. This change has some help to the calibration failure issue. 2. Enable falling back to legacy rate from the HT/RTS rate by setting the HT_FBK_TO_LEGACY register. 3. Implement MCS rate specific maximum PSDU size. It can improve the transmission quality under the low RSSI condition. 4. Set BBP_84 register value to 0x19. This is used for extension channel overlapping IOT. Signed-off-by: Shiji Yang <yangshiji66@outlook.com> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/TYAP286MB031553CCD4B7A3B89C85935DBCD4A@TYAP286MB0315.JPNP286.PROD.OUTLOOK.COM
2023-10-25net: hns3: add some link modes for hisilicon deviceHao Chen
Add HCLGE_SUPPORT_50G_R1_BIT and HCLGE_SUPPORT_100G_R2_BIT two capability bits and Corresponding link modes. Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25Merge branch 'dsa-microchip-WoL-support'David S. Miller
2023-10-25net: dsa: microchip: ksz9477: add Wake on LAN supportOleksij Rempel
Add WoL support for KSZ9477 family of switches. This code was tested on KSZ8563 chip. KSZ9477 family of switches supports multiple PHY events: - wake on Link Up - wake on Energy Detect. Since current UAPI can't differentiate between this PHY events, map all of them to WAKE_PHY. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25net: dsa: microchip: use wakeup-source DT property to enable PME outputOleksij Rempel
KSZ switches with WoL support signals wake event over PME pin. If this pin is attached to some external PMIC or System Controller can't be described as GPIO, the only way to describe it in the devicetree is to use wakeup-source property. So, add support for this property and enable PME switch output if this property is present. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25dt-bindings: net: dsa: microchip: add wakeup-source propertyOleksij Rempel
Add wakeup-source property to enable Wake on Lan functionality in the switch. Since PME wake pin is not always attached to the SoC, use wakeup-source instead of wakeup-gpios Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-25net: dsa: microchip: Add missing MAC address register offset for ksz8863Oleksij Rempel
Add the missing offset for the global MAC address register (REG_SW_MAC_ADDR) for the ksz8863 family of switches. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-24s390/qeth: replace deprecated strncpy with strscpyJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect new_entry->dbf_name to be NUL-terminated based on its use with strcmp(): | if (strcmp(entry->dbf_name, name) == 0) { Moreover, NUL-padding is not required as new_entry is kzalloc'd just before this assignment: | new_entry = kzalloc(sizeof(struct qeth_dbf_entry), GFP_KERNEL); ... rendering any future NUL-byte assignments (like the ones strncpy() does) redundant. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com> Tested-by: Thorsten Winkler <twinkler@linux.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231023-strncpy-drivers-s390-net-qeth_core_main-c-v1-1-e7ce65454446@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24s390/ctcm: replace deprecated strncpy with strscpyJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect chid to be NUL-terminated based on its use with format strings: CTCM_DBF_TEXT_(SETUP, CTC_DBF_INFO, "%s(%s) %s", CTCM_FUNTAIL, chid, ok ? "OK" : "failed"); Moreover, NUL-padding is not required as it is _only_ used in this one instance with a format string. Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. We can also drop the +1 from chid's declaration as we no longer need to be cautious about leaving a spot for a NUL-byte. Let's use the more idiomatic strscpy usage of (dest, src, sizeof(dest)) as this more closely ties the destination buffer to the length. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com> Tested-by: Thorsten Winkler <twinkler@linux.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231023-strncpy-drivers-s390-net-ctcm_main-c-v1-1-265db6e78165@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: ethernet: mtk_wed: remove wo pointer in wo_r32/wo_w32 signatureLorenzo Bianconi
wo pointer is no longer used in wo_r32 and wo_w32 routines so get rid of it. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/530537db0872f7523deff21f0a5dfdd9b75fdc9d.1698098459.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: ethernet: mtk_wed: fix firmware loading for MT7986 SoCLorenzo Bianconi
The WED mcu firmware does not contain all the memory regions defined in the dts reserved_memory node (e.g. MT7986 WED firmware does not contain cpu-boot region). Reverse the mtk_wed_mcu_run_firmware() logic to check all the fw sections are defined in the dts reserved_memory node. Fixes: c6d961aeaa77 ("net: ethernet: mtk_wed: move mem_region array out of mtk_wed_mcu_load_firmware") Tested-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/d983cbfe8ea562fef9264de8f0c501f7d5705bd5.1698098381.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITRIvan Vecera
The I40E_TXR_FLAGS_WB_ON_ITR is i40e_ring flag and not i40e_pf one. Fixes: 8e0764b4d6be42 ("i40e/i40evf: Add support for writeback on ITR feature for X722") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231023212714.178032-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24Merge branch ↵Jakub Kicinski
'net-ethernet-renesas-infrastructure-preparations-for-upcoming-driver' Wolfram Sang says: ==================== net: ethernet: renesas: infrastructure preparations for upcoming driver Before we upstream a new driver, Niklas and I thought that a few cleanups for Kconfig/Makefile will help readability and maintainability. Here they are, looking forward to comments. ==================== Link: https://lore.kernel.org/r/20231022205316.3209-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: ethernet: renesas: drop SoC names in KconfigWolfram Sang
Mentioning SoCs in Kconfig descriptions tends to get stale (e.g. RAVB is missing RZV2M) or imprecise (e.g. SH_ETH is not available on all R8A779x). Drop them instead of providing vague information. Improve the file description a tad while here. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20231022205316.3209-3-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: ethernet: renesas: group entries in MakefileWolfram Sang
A new Renesas driver shall be added soon. Prepare the Makefile by grouping the specific objects to the Kconfig symbol for better readability. Improve the file description a tad while here. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20231022205316.3209-2-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24Merge branch 'Add bpf programmable net device'Martin KaFai Lau
Daniel Borkmann says: ==================== This work adds a BPF programmable device which can operate in L3 or L2 mode where the BPF program is part of the xmit routine. It's program management is done via bpf_mprog and it comes with BPF link support. For details see patch 1 and following. Thanks! v3 -> v4: - Moved netkit_release_all() into ndo_uninit (Stan) - Two small commit msg corrections (Toke) - Added Acked/Reviewed-by v2 -> v3: - Remove setting dev->min_mtu to ETH_MIN_MTU (Andrew) - Do not populate ethtool info->version (Andrew) - Populate netdev private data before register_netdevice (Andrew) - Use strscpy for ifname template (Jakub) - Use GFP_KERNEL_ACCOUNT for link kzalloc (Jakub) - Carry and dump link attach type for bpftool (Toke) v1 -> v2: - Rename from meta (Toke, Andrii, Alexei) - Reuse skb_scrub_packet (Stan) - Remove IFF_META and use netdev_ops (Toke) - Add comment to multicast handler (Toke) - Remove silly version info (Toke) - Fix attach_type_name (Quentin) - Rework libbpf link attach api to be similar as tcx (Andrii) - Move flags last for bpf_netkit_opts (Andrii) - Rebased to bpf_mprog query api changes - Folded link support patch into main one ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24selftests/bpf: Add selftests for netkitDaniel Borkmann
Add a bigger batch of test coverage to assert correct operation of netkit devices and their BPF program management: # ./test_progs -t tc_netkit [...] [ 1.166267] bpf_testmod: loading out-of-tree module taints kernel. [ 1.166831] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.270957] tsc: Refined TSC clocksource calibration: 3407.988 MHz [ 1.272579] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns [ 1.275336] clocksource: Switched to clocksource tsc #257 tc_netkit_basic:OK #258 tc_netkit_device:OK #259 tc_netkit_multi_links:OK #260 tc_netkit_multi_opts:OK #261 tc_netkit_neigh_links:OK Summary: 5/0 PASSED, 0 SKIPPED, 0 FAILED [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-8-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24selftests/bpf: Add netlink helper libraryDaniel Borkmann
Add a minimal netlink helper library for the BPF selftests. This has been taken and cut down and cleaned up from iproute2. This covers basics such as netdevice creation which we need for BPF selftests / BPF CI given iproute2 package cannot cover it yet. Stanislav Fomichev suggested that this could be replaced in future by ynl tool generated C code once it has RTNL support to create devices. Once we get to this point the BPF CI would also need to add libmnl. If no further extensions are needed, a second option could be that we remove this code again once iproute2 package has support. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-7-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24bpftool: Extend net dump with netkit progsDaniel Borkmann
Add support to dump BPF programs on netkit via bpftool. This includes both the BPF link and attach ops programs. Dumped information contain the attach location, function entry name, program ID and link ID when applicable. Example with tc BPF link: # ./bpftool net xdp: tc: nk1(22) netkit/peer tc1 prog_id 43 link_id 12 [...] Example with json dump: # ./bpftool net --json | jq [ { "xdp": [], "tc": [ { "devname": "nk1", "ifindex": 18, "kind": "netkit/primary", "name": "tc1", "prog_id": 29, "prog_flags": [], "link_id": 8, "link_flags": [] } ], "flow_dissector": [], "netfilter": [] } ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-6-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24bpftool: Implement link show support for netkitDaniel Borkmann
Add support to dump netkit link information to bpftool in similar way as we have for XDP. The netkit link info only exposes the ifindex and the attach_type. Below shows an example link dump output, and a cgroup link is included for comparison, too: # bpftool link [...] 10: cgroup prog 2466 cgroup_id 1 attach_type cgroup_inet6_post_bind [...] 8: netkit prog 35 ifindex nk1(18) attach_type netkit_primary [...] Equivalent json output: # bpftool link --json [...] { "id": 10, "type": "cgroup", "prog_id": 2466, "cgroup_id": 1, "attach_type": "cgroup_inet6_post_bind" }, [...] { "id": 12, "type": "netkit", "prog_id": 61, "devname": "nk1", "ifindex": 21, "attach_type": "netkit_primary" } [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-5-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24libbpf: Add link-based API for netkitDaniel Borkmann
This adds bpf_program__attach_netkit() API to libbpf. Overall it is very similar to tcx. The API looks as following: LIBBPF_API struct bpf_link * bpf_program__attach_netkit(const struct bpf_program *prog, int ifindex, const struct bpf_netkit_opts *opts); The struct bpf_netkit_opts is done in similar way as struct bpf_tcx_opts for supporting bpf_mprog control parameters. The attach location for the primary and peer device is derived from the program section "netkit/primary" and "netkit/peer", respectively. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-4-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24tools: Sync if_link uapi headerDaniel Borkmann
Sync if_link uapi header to the latest version as we need the refresher in tooling for netkit device. Given it's been a while since the last sync and the diff is fairly big, it has been done as its own commit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-3-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24netkit, bpf: Add bpf programmable net deviceDaniel Borkmann
This work adds a new, minimal BPF-programmable device called "netkit" (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The core idea is that BPF programs are executed within the drivers xmit routine and therefore e.g. in case of containers/Pods moving BPF processing closer to the source. One of the goals was that in case of Pod egress traffic, this allows to move BPF programs from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms, for example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance (see measurements in the slides). In this initial version, the netkit device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon the device creation, the user can set the default policy to 'pass' or 'drop' for the case when no BPF program is attached. Additionally, the device can be operated in L3 (default) or L2 mode. The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API and dependency controls as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated. Going forward, we plan to use netkit devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management and the peer will operate in default drop mode, so that no traffic is leaving between the time when a Pod is brought up by the CNI plugin and programs attached by the agent. Additionally, the programs we attach via tcx on the physical devices are using bpf_redirect_peer() for inbound traffic into netkit device, hence the latter is also supporting the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device directly. Also, BIG TCP is supported on netkit device. For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one. An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://github.com/borkmann/iproute2/tree/pr/netkit Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.) Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24selftests: net: change ifconfig with ip commandSwarup Laxman Kotiaklapudi
Change ifconfig with ip command, on a system where ifconfig is not used this script will not work correcly. Test result with this patchset: sudo make TARGETS="net" kselftest .... TAP version 13 1..1 timeout set to 1500 selftests: net: route_localnet.sh run arp_announce test net.ipv4.conf.veth0.route_localnet = 1 net.ipv4.conf.veth1.route_localnet = 1 net.ipv4.conf.veth0.arp_announce = 2 net.ipv4.conf.veth1.arp_announce = 2 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84) bytes of data. 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.038 ms 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.068 ms 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.068 ms 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.068 ms 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.068 ms --- 127.25.3.14 ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4073ms rtt min/avg/max/mdev = 0.038/0.062/0.068/0.012 ms ok run arp_ignore test net.ipv4.conf.veth0.route_localnet = 1 net.ipv4.conf.veth1.route_localnet = 1 net.ipv4.conf.veth0.arp_ignore = 3 net.ipv4.conf.veth1.arp_ignore = 3 PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84) bytes of data. 64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.032 ms 64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.065 ms 64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.066 ms 64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.065 ms 64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.065 ms --- 127.25.3.14 ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4092ms rtt min/avg/max/mdev = 0.032/0.058/0.066/0.013 ms ok ok 1 selftests: net: route_localnet.sh ... Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231023123422.2895-1-swarupkotikalapudi@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24Merge tag 'wireless-2023-10-24' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Three more fixes: - don't drop all unprotected public action frames since some don't have a protected dual - fix pointer confusion in scanning code - fix warning in some connections with multiple links * tag 'wireless-2023-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: don't drop all unprotected public action frames wifi: cfg80211: fix assoc response warning on failed links wifi: cfg80211: pass correct pointer to rdev_inform_bss() ==================== Link: https://lore.kernel.org/r/20231024103540.19198-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24Merge branch 'switch-dsa-to-inclusive-terminology'Jakub Kicinski
Florian Fainelli says: ==================== Switch DSA to inclusive terminology One of the action items following Netconf'23 is to switch subsystems to use inclusive terminology. DSA has been making extensive use of the "master" and "slave" words which are now replaced by "conduit" and "user" respectively. ==================== Link: https://lore.kernel.org/r/20231023181729.1191071-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: dsa: Rename IFLA_DSA_MASTER to IFLA_DSA_CONDUITFlorian Fainelli
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI and creates an alias named IFLA_DSA_CONDUIT. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: dsa: Use conduit and user termsFlorian Fainelli
Use more inclusive terms throughout the DSA subsystem by moving away from "master" which is replaced by "conduit" and "slave" which is replaced by "user". No functional changes. Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24tsnep: Fix tsnep_request_irq() format-overflow warningGerhard Engleder
Compiler warns about a possible format-overflow in tsnep_request_irq(): drivers/net/ethernet/engleder/tsnep_main.c:884:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] sprintf(queue->name, "%s-rx-%d", name, ^ drivers/net/ethernet/engleder/tsnep_main.c:881:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] sprintf(queue->name, "%s-tx-%d", name, ^ drivers/net/ethernet/engleder/tsnep_main.c:878:49: warning: '-txrx-' directive writing 6 bytes into a region of size between 5 and 25 [-Wformat-overflow=] sprintf(queue->name, "%s-txrx-%d", name, ^~~~~~ Actually overflow cannot happen. Name is limited to IFNAMSIZ, because netdev_name() is called during ndo_open(). queue_index is single char, because less than 10 queues are supported. Fix warning with snprintf(). Additionally increase buffer to 32 bytes, because those 7 additional bytes were unused anyway. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310182028.vmDthIUa-lkp@intel.com/ Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231023183856.58373-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24Merge branch 'net-deduplicate-netdev-name-allocation'Jakub Kicinski
Jakub Kicinski says: ==================== net: deduplicate netdev name allocation After recent fixes we have even more duplicated code in netdev name allocation helpers. There are two complications in this code. First, __dev_alloc_name() clobbers its output arg even if allocation fails, forcing callers to do extra copies. Second as our experience in commit 55a5ec9b7710 ("Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"") and commit 029b6d140550 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") taught us, user space is very sensitive to the exact error codes. Align the callers of __dev_alloc_name(), and remove some of its complexity. v1: https://lore.kernel.org/all/20231020011856.3244410-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20231023152346.3639749-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: remove else after return in dev_prep_valid_name()Jakub Kicinski
Remove unnecessary else clauses after return. I copied this if / else construct from somewhere, it makes the code harder to read. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: remove dev_valid_name() check from __dev_alloc_name()Jakub Kicinski
__dev_alloc_name() is only called by dev_prep_valid_name(), which already checks that name is valid. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: trust the bitmap in __dev_alloc_name()Jakub Kicinski
Prior to restructuring __dev_alloc_name() handled both printf and non-printf names. In a clever attempt at code reuse it always prints the name into a buffer and checks if it's a duplicate. Trust the bitmap, and return an error if its full. This shrinks the possible ID space by one from 32K to 32K - 1, as previously the max value would have been tried as a valid ID. It seems very unlikely that anyone would care as we heard no requests to increase the max beyond 32k. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: reduce indentation of __dev_alloc_name()Jakub Kicinski
All callers of __dev_valid_name() go thru dev_prep_valid_name() which handles the non-printf case. Focus __dev_alloc_name() on the sprintf case, remove the indentation level. Minor functional change of returning -EINVAL if % is not found, which should now never happen. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: make dev_alloc_name() call dev_prep_valid_name()Jakub Kicinski
__dev_alloc_name() handles both the sprintf and non-sprintf target names. This complicates the code. dev_prep_valid_name() already handles the non-sprintf case, before calling __dev_alloc_name(), make the only other caller also go thru dev_prep_valid_name(). This way we can drop the non-sprintf handling in __dev_alloc_name() in one of the next changes. commit 55a5ec9b7710 ("Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"") and commit 029b6d140550 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") tell us that we can't start returning -EEXIST from dev_alloc_name() on name duplicates. Bite the bullet and pass the expected errno to dev_prep_valid_name(). dev_prep_valid_name() must now propagate out the allocated id for printf names. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: don't use input buffer of __dev_alloc_name() as a scratch spaceJakub Kicinski
Callers of __dev_alloc_name() want to pass dev->name as the output buffer. Make __dev_alloc_name() not clobber that buffer on failure, and remove the workarounds in callers. dev_alloc_name_ns() is now completely unnecessary. The extra strscpy() added here will be gone by the end of the patch series. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231023152346.3639749-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24Merge branch 'mptcp-convert-netlink-code-to-use-yaml-spec'Jakub Kicinski
Mat Martineau says: ==================== mptcp: convert Netlink code to use YAML spec This series from Davide converts most of the MPTCP Netlink interface (plus uAPI bits) to use sources generated by YNL using a YAML spec file. This new YAML file is useful to validate the API and to generate a good documentation page. Patch 1 modifies YNL spec to support "uns-admin-perm" for genetlink legacy. Patch 2 adds support for validating exact length of netlink attrs. Patch 3 converts Netlink structures from small_ops to ops to prepare the switch to YAML. Patch 4 adds the Netlink YAML spec for MPTCP. Patch 5 adds and uses a new header file generated from the new YAML spec. Patch 6 renames some handlers to match the ones generated from the YAML spec. Patch 7 adds and uses Netlink policies automatically generated from the YAML spec. ==================== Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-0-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: mptcp: use policy generated by YAML specDavide Caratti
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \ > --spec Documentation/netlink/specs/mptcp.yaml --source \ > -o net/mptcp/mptcp_pm_gen.c $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \ > --spec Documentation/netlink/specs/mptcp.yaml --header \ > -o net/mptcp/mptcp_pm_gen.h Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-7-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: mptcp: rename netlink handlers to mptcp_pm_nl_<blah>_{doit,dumpit}Davide Caratti
so that they will match names generated from YAML spec. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-6-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24uapi: mptcp: use header file generated from YAML specDavide Caratti
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode uapi \ > --spec Documentation/netlink/specs/mptcp.yaml \ > --header -o include/uapi/linux/mptcp_pm.h Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-5-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24Documentation: netlink: add a YAML spec for mptcpDavide Caratti
it describes most of the current netlink interface (uAPI definitions, doit/dumpit operations and attributes) Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-4-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: mptcp: convert netlink from small_ops to opsDavide Caratti
in the current MPTCP control plane, all operations use a netlink attribute of the same type "MPTCP_PM_ATTR". However, add/del/get/flush operations only parse the first element in the message _ the one that describes MPTCP endpoints (that was named MPTCP_PM_ATTR_ADDR and mostly used in ADD_ADDR operations _ probably the similarity of "attr", "addr" and "add" might cause some confusion to human readers). Convert MPTCP from 'small_ops' to 'ops', thus allowing different attributes for each single operation, hopefully makes all this clearer to human readers. - use a separate attribute set for add/del/get/flush address operation, binary compatible with the existing one, to store the endpoint address. MPTCP_PM_ENDPOINT_ADDR is added to the uAPI (with the same value as MPTCP_PM_ATTR_ADDR) for these operations. - convert mptcp_pm_ops[] and add policy files accordingly. this prepares MPTCP control plane to be described as YAML spec. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-3-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24tools: ynl-gen: add support for exact-len validationDavide Caratti
add support for 'exact-len' validation on netlink attributes. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340 Acked-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-2-16b1f701f900@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>