summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-01-07skbuff: Add skb parameter to the ubuf zerocopy callbackJonathan Lemon
Add an optional skb parameter to the zerocopy callback parameter, which is passed down from skb_zcopy_clear(). This gives access to the original skb, which is needed for upcoming RX zero-copy error handling. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07skbuff: replace sock_zerocopy_get with skb_zcopy_getJonathan Lemon
Rename the get routines for consistency. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07skbuff: replace sock_zerocopy_put() with skb_zcopy_put()Jonathan Lemon
Replace sock_zerocopy_put with the generic skb_zcopy_put() function. Pass 'true' as the success argument, as this is identical to no change. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07skbuff: Push status and refcounts into sock_zerocopy_callbackJonathan Lemon
Before this change, the caller of sock_zerocopy_callback would need to save the zerocopy status, decrement and check the refcount, and then call the callback function - the callback was only invoked when the refcount reached zero. Now, the caller just passes the status into the callback function, which saves the status and handles its own refcounts. This makes the behavior of the sock_zerocopy_callback identical to the tpacket and vhost callbacks. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07skbuff: simplify sock_zerocopy_putJonathan Lemon
All 'struct ubuf_info' users should have a callback defined as of commit 0a4a060bb204 ("sock: fix zerocopy_success regression with msg_zerocopy"). Remove the dead code path to consume_skb(), which makes assumptions about how the structure was allocated. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07skbuff: remove unused skb_zcopy_abort functionJonathan Lemon
skb_zcopy_abort() has no in-tree consumers, remove it. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07Merge branch 'dwmac-meson8b-picosecond-precision-rx-delay-support'Jakub Kicinski
Martin Blumenstingl says: ==================== dwmac-meson8b: picosecond precision RX delay support with the help of Jianxin Pan (many thanks!) the meaning of the "new" PRG_ETH1[19:16] register bits on Amlogic Meson G12A, G12B and SM1 SoCs are finally known. These SoCs allow fine-tuning the RGMII RX delay in 200ps steps (contrary to what I have thought in the past [0] these are not some "calibration" values). The vendor u-boot has code to automatically detect the best RX/TX delay settings. For now we keep it simple and add a device-tree property with 200ps precision to select the "right" RX delay for each board. While here, deprecate the "amlogic,rx-delay-ns" property as it's not used on any upstream .dts (yet). The driver is backwards compatible. I have tested this on an X96 Air 4GB board (not upstream yet). Testing with iperf3 gives 938 Mbits/sec in both directions (RX and TX). The following network settings were used in the .dts (2ns TX delay generated by the PHY, 800ps RX delay generated by the MAC as the PHY only supports 0ns or 2ns RX delays): &ext_mdio { external_phy: ethernet-phy@0 { /* Realtek RTL8211F (0x001cc916) */ reg = <0>; eee-broken-1000t; reset-assert-us = <10000>; reset-deassert-us = <30000>; reset-gpios = <&gpio GPIOZ_15 (GPIO_ACTIVE_LOW | GPIO_OPEN_DRAIN)>; interrupt-parent = <&gpio_intc>; /* MAC_INTR on GPIOZ_14 */ interrupts = <26 IRQ_TYPE_LEVEL_LOW>; }; }; &ethmac { status = "okay"; pinctrl-0 = <&eth_pins>, <&eth_rgmii_pins>; pinctrl-names = "default"; phy-mode = "rgmii-txid"; phy-handle = <&external_phy>; amlogic,rgmii-rx-delay-ps = <800>; }; To use the same settings from vendor u-boot (which in my case has broken Ethernet) the following commands can be used: mw.l 0xff634540 0x1621 mw.l 0xff634544 0x30000 phyreg w 0x0 0x1040 phyreg w 0x1f 0xd08 phyreg w 0x11 0x9 phyreg w 0x15 0x11 phyreg w 0x1f 0x0 phyreg w 0x0 0x9200 Also I have tested this on a X96 Max board without any .dts changes to confirm that other boards with the same IP block still work fine with these changes. Changes since v3 at [3]. - added Florian's Reviewed-by to patch 1 (thank you!) - rebased on top of net-next Changes since v2 at [2]: - use the generic property name "rx-internal-delay-ps" as suggested by Rob (thanks!). This affects patches #1 and #3. The biggest change is is in patch #1 which is why I didn't add Florian's and Andrew's Reviewed-by - added Andrew's and Florian's Reviewed-by to patches 2, 3, 4, 5 (many thanks to both!). I decided to do this despite renaming the property to the generic name "rx-internal-delay-ps" as it only affects the patch description and one line of code - updated patch description of patch #3 to explain why there's not a lot of validation when parsing the old device-tree property (in nanosecond precision) - dropped RFC status Changes since v1 at [1]: - updated patch 1 by making it more clear when the RX delay is applied. Thanks to Andrew for the suggestion! - added a fix to enabling the timing-adjustment clock only when really needed. Found by Andrew - thanks! - added testing not about X96 Max - v1 did not go to the netdev mailing list, v2 fixes this [0] https://lore.kernel.org/netdev/CAFBinCATt4Hi9rigj52nMf3oygyFbnopZcsakGL=KyWnsjY3JA@mail.gmail.com/ [1] https://patchwork.kernel.org/project/linux-amlogic/list/?series=384279&state=%2A&archive=both [2] https://patchwork.kernel.org/project/linux-amlogic/list/?series=384491&state=%2A&archive=both [3] https://patchwork.kernel.org/project/linux-amlogic/list/?series=406005&state=%2A&archive=both ==================== Link: https://lore.kernel.org/r/20210106134251.45264-1-martin.blumenstingl@googlemail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: stmmac: dwmac-meson8b: add support for the RGMII RX delay on G12AMartin Blumenstingl
Amlogic Meson G12A (and newer: G12B, SM1) SoCs have a more advanced RX delay logic. Instead of fine-tuning the delay in the nanoseconds range it now allows tuning in 200 picosecond steps. This support comes with new bits in the PRG_ETH1[19:16] register. Add support for validating the RGMII RX delay as well as configuring the register accordingly on these platforms. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: stmmac: dwmac-meson8b: move RGMII delays into a separate functionMartin Blumenstingl
Newer SoCs starting with the Amlogic Meson G12A have more a precise RGMII RX delay configuration register. This means more complexity in the code. Extract the existing RGMII delay configuration code into a separate function to make it easier to read/understand even when adding more logic in the future. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: stmmac: dwmac-meson8b: use picoseconds for the RGMII RX delayMartin Blumenstingl
Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX delay register which allows picoseconds precision. Parse the new "rx-internal-delay-ps" property or fall back to the value from the old "amlogic,rx-delay-ns" property. No upstream DTB uses the old "amlogic,rx-delay-ns" property (yet). Only include minimalistic logic to fall back to the old property, without any special validation (for example if the old and new property are given at the same time). Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clockMartin Blumenstingl
The timing-adjustment clock only has to be enabled when a) there is a 2ns RX delay configured using device-tree and b) the phy-mode indicates that the RX delay should be enabled. Only enable the RX delay if both are true, instead of (by accident) also enabling it when there's the 2ns RX delay configured but the phy-mode incicates that the RX delay is not used. Fixes: 9308c47640d515 ("net: stmmac: dwmac-meson8b: add support for the RX delay configuration") Reported-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07dt-bindings: net: dwmac-meson: use picoseconds for the RGMII RX delayMartin Blumenstingl
Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX delay register which allows picoseconds precision. Deprecate the old "amlogic,rx-delay-ns" in favour of the generic "rx-internal-delay-ps" property. For older SoCs the only known supported values were 0ns and 2ns. The new SoCs have support for RGMII RX delays between 0ps and 3000ps in 200ps steps. Don't carry over the description for the "rx-internal-delay-ps" property and inherit that from ethernet-controller.yaml instead. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07Merge branch 'reduce-coupling-between-dsa-and-broadcom-systemport-driver'Jakub Kicinski
Vladimir Oltean says: ==================== Reduce coupling between DSA and Broadcom SYSTEMPORT driver Upon a quick inspection, it seems that there is some code in the generic DSA layer that is somehow specific to the Broadcom SYSTEMPORT driver. The challenge there is that the hardware integration is very tight between the switch and the DSA master interface. However this does not mean that the drivers must also be as integrated as the hardware is. We can avoid creating a DSA notifier just for the Broadcom SYSTEMPORT, and we can move some Broadcom-specific queue mapping helpers outside of the common include/net/dsa.h. ==================== Link: https://lore.kernel.org/r/20210107012403.1521114-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: remove the DSA specific notifiersVladimir Oltean
This effectively reverts commit 60724d4bae14 ("net: dsa: Add support for DSA specific notifiers"). The reason is that since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), it appears that there is a generic way to achieve the same purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was converted to use the generic notifiers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: systemport: use standard netdevice notifier to detect DSA presenceVladimir Oltean
The SYSTEMPORT driver maps each port of the embedded Broadcom DSA switch port to a certain queue of the master Ethernet controller. For that it currently uses a dedicated notifier infrastructure which was added in commit 60724d4bae14 ("net: dsa: Add support for DSA specific notifiers"). However, since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), DSA is actually an upper of the Broadcom SYSTEMPORT as far as the netdevice adjacency lists are concerned. So naturally, the plain NETDEV_CHANGEUPPER net device notifiers are emitted. It looks like there is enough API exposed by DSA to the outside world already to make the call_dsa_notifiers API redundant. So let's convert its only user to plain netdev notifiers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: export dsa_slave_dev_checkVladimir Oltean
Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when they are enslaved to e.g. a bridge by calling netif_is_bridge_master(). Export this helper from DSA to get the equivalent functionality of determining whether the upper interface of a CHANGEUPPER notifier is a DSA switch interface or not. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: move the Broadcom tag information in a separate header fileVladimir Oltean
It is a bit strange to see something as specific as Broadcom SYSTEMPORT bits in the main DSA include file. Move these away into a separate header, and have the tagger and the SYSTEMPORT driver include them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07Merge branch 'offload-software-learnt-bridge-addresses-to-dsa'Jakub Kicinski
Vladimir Oltean says: ==================== Offload software learnt bridge addresses to DSA This series tries to make DSA behave a bit more sanely when bridged with "foreign" (non-DSA) interfaces and source address learning is not supported on the hardware CPU port (which would make things work more seamlessly without software intervention). When a station A connected to a DSA switch port needs to talk to another station B connected to a non-DSA port through the Linux bridge, DSA must explicitly add a route for station B towards its CPU port. Initial RFC was posted here: https://patchwork.ozlabs.org/project/netdev/cover/20201108131953.2462644-1-olteanv@gmail.com/ v2 was posted here: https://patchwork.kernel.org/project/netdevbpf/cover/20201213024018.772586-1-vladimir.oltean@nxp.com/ v3 was posted here: https://patchwork.kernel.org/project/netdevbpf/cover/20201213140710.1198050-1-vladimir.oltean@nxp.com/ This is a resend of the previous v3 with some added Reviewed-by tags. ==================== Link: https://lore.kernel.org/r/20210106095136.224739-1-olteanv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: ocelot: request DSA to fix up lack of address learning on CPU portVladimir Oltean
Given the following setup: ip link add br0 type bridge ip link set eno0 master br0 ip link set swp0 master br0 ip link set swp1 master br0 ip link set swp2 master br0 ip link set swp3 master br0 Currently, packets received on a DSA slave interface (such as swp0) which should be routed by the software bridge towards a non-switch port (such as eno0) are also flooded towards the other switch ports (swp1, swp2, swp3) because the destination is unknown to the hardware switch. This patch addresses the issue by monitoring the addresses learnt by the software bridge on eno0, and adding/deleting them as static FDB entries on the CPU port accordingly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge ↵Vladimir Oltean
neighbors Some DSA switches (and not only) cannot learn source MAC addresses from packets injected from the CPU. They only perform hardware address learning from inbound traffic. This can be problematic when we have a bridge spanning some DSA switch ports and some non-DSA ports (which we'll call "foreign interfaces" from DSA's perspective). There are 2 classes of problems created by the lack of learning on CPU-injected traffic: - excessive flooding, due to the fact that DSA treats those addresses as unknown - the risk of stale routes, which can lead to temporary packet loss To illustrate the second class, consider the following situation, which is common in production equipment (wireless access points, where there is a WLAN interface and an Ethernet switch, and these form a single bridging domain). AP 1: +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ | ^ ^ | | | | | | | Client A Client B | | | +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 will know that Clients A and B are reachable via wlan0 - the hardware fdb of a DSA switch driver today is not kept in sync with the software entries on other bridge ports, so it will not know that clients A and B are reachable via the CPU port UNLESS the hardware switch itself performs SA learning from traffic injected from the CPU. Nonetheless, a substantial number of switches don't. - the hardware fdb of the DSA switch on AP 2 may autonomously learn that Client A and B are reachable through swp0. Therefore, the software br0 of AP 2 also may or may not learn this. In the example we're illustrating, some Ethernet traffic has been going on, and br0 from AP 2 has indeed learnt that it can reach Client B through swp0. One of the wireless clients, say Client B, disconnects from AP 1 and roams to AP 2. The topology now looks like this: AP 1: +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ | ^ | | | Client A | | | Client B | | | v +------------+ +------------+ +------------+ +------------+ +------------+ | swp0 | | swp1 | | swp2 | | swp3 | | wlan0 | +------------+ +------------+ +------------+ +------------+ +------------+ +------------------------------------------------------------------------+ | br0 | +------------------------------------------------------------------------+ AP 2 - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change) - br0 of AP 1 will (possibly) know that Client B has left wlan0. There are cases where it might never find out though. Either way, DSA today does not process that notification in any way. - the hardware FDB of the DSA switch on AP 1 may learn autonomously that Client B can be reached via swp0, if it receives any packet with Client 1's source MAC address over Ethernet. - the hardware FDB of the DSA switch on AP 2 still thinks that Client B can be reached via swp0. It does not know that it has roamed to wlan0, because it doesn't perform SA learning from the CPU port. Now Client A contacts Client B. AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet segment. AP 2 sees a frame on swp0 and its fdb says that the destination is swp0. Hairpinning is disabled => drop. This problem comes from the fact that these switches have a 'blind spot' for addresses coming from software bridging. The generic solution is not to assume that hardware learning can be enabled somehow, but to listen to more bridge learning events. It turns out that the bridge driver does learn in software from all inbound frames, in __br_handle_local_finish. A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the addresses serviced by the bridge on 'foreign' interfaces. The software bridge also does the right thing on migration, by notifying that the old entry is deleted, so that does not need to be special-cased in DSA. When it is deleted, we just need to delete our static FDB entry towards the CPU too, and wait. The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE events received on its own interfaces, such as static FDB entries. Luckily we can change that, and DSA can listen to all switchdev FDB add/del events in the system and figure out if those events were emitted by a bridge that spans at least one of DSA's own ports. In case that is true, DSA will also offload that address towards its own CPU port, in the eventuality that there might be bridge clients attached to the DSA switch who want to talk to the station connected to the foreign interface. In terms of implementation, we need to keep the fdb_info->added_by_user check for the case where the switchdev event was targeted directly at a DSA switch port. But we don't need to look at that flag for snooped events. So the check is currently too late, we need to move it earlier. This also simplifies the code a bit, since we avoid uselessly allocating and freeing switchdev_work. We could probably do some improvements in the future. For example, multi-bridge support is rudimentary at the moment. If there are two bridges spanning a DSA switch's ports, and both of them need to service the same MAC address, then what will happen is that the migration of one of those stations will trigger the deletion of the FDB entry from the CPU port while it is still used by other bridge. That could be improved with reference counting but is left for another time. This behavior needs to be enabled at driver level by setting ds->assisted_learning_on_cpu_port = true. This is because we don't want to inflict a potential performance penalty (accesses through MDIO/I2C/SPI are expensive) to hardware that really doesn't need it because address learning on the CPU port works there. Reported-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDBVladimir Oltean
Right now, the following would happen for a switch driver that does not implement .port_fdb_add or .port_fdb_del. dsa_slave_switchdev_event returns NOTIFY_OK and schedules: -> dsa_slave_switchdev_event_work -> dsa_port_fdb_add -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD) -> dsa_switch_fdb_add -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP; -> an error is printed with dev_dbg, and dsa_fdb_offload_notify(switchdev_work) is not called. We can avoid scheduling the worker for nothing and say NOTIFY_DONE. Because we don't call dsa_fdb_offload_notify, the static FDB entry will remain just in the software bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: move switchdev event implementation under the same switch/case ↵Vladimir Oltean
statement We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events even for interfaces where dsa_slave_dev_check returns false, so we need that check inside the switch-case statement for SWITCHDEV_FDB_*. This movement also avoids a useless allocation / free of switchdev_work on the untreated "default event" case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_workVladimir Oltean
Currently DSA doesn't add FDB entries on the CPU port, because it only does so through switchdev, which is associated with a net_device, and there are none of those for the CPU port. But actually FDB addresses on the CPU port have some use cases of their own, if the switchdev operations are initiated from within the DSA layer. There is just one problem with the existing code: it passes a structure in dsa_switchdev_event_work which was retrieved directly from switchdev, so it contains a net_device. We need to generalize the contents to something that covers the CPU port as well: the "ds, port" tuple is fine for that. Note that the new procedure for notifying the successful FDB offload is inspired from the rocker model. Also, nothing was being done if added_by_user was false. Let's check for that a lot earlier, and don't actually bother to schedule the worker for nothing. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: be louder when a non-legacy FDB operation failsVladimir Oltean
The dev_close() call was added in commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") "to indicate inconsistent situation" when we could not delete an FDB entry from the port. bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master It is a bit drastic and at the same time not helpful if the above fails to only print with netdev_dbg log level, but on the other hand to bring the interface down. So increase the verbosity of the error message, and drop dev_close(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: bridge: notify switchdev of disappearance of old FDB entry upon migrationVladimir Oltean
Currently the bridge emits atomic switchdev notifications for dynamically learnt FDB entries. Monitoring these notifications works wonders for switchdev drivers that want to keep their hardware FDB in sync with the bridge's FDB. For example station A wants to talk to station B in the diagram below, and we are concerned with the behavior of the bridge on the DUT device: DUT +-------------------------------------+ | br0 | | +------+ +------+ +------+ +------+ | | | | | | | | | | | | | swp0 | | swp1 | | swp2 | | eth0 | | +-------------------------------------+ | | | Station A | | | | +--+------+--+ +--+------+--+ | | | | | | | | | | swp0 | | | | swp0 | | Another | +------+ | | +------+ | Another switch | br0 | | br0 | switch | +------+ | | +------+ | | | | | | | | | | | swp1 | | | | swp1 | | +--+------+--+ +--+------+--+ | Station B Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has the following property: frames injected from its control interface bypass the internal address analyzer logic, and therefore, this hardware does not learn from the source address of packets transmitted by the network stack through it. So, since bridging between eth0 (where Station B is attached) and swp0 (where Station A is attached) is done in software, the switchdev hardware will never learn the source address of Station B. So the traffic towards that destination will be treated as unknown, i.e. flooded. This is where the bridge notifications come in handy. When br0 on the DUT sees frames with Station B's MAC address on eth0, the switchdev driver gets these notifications and can install a rule to send frames towards Station B's address that are incoming from swp0, swp1, swp2, only towards the control interface. This is all switchdev driver private business, which the notification makes possible. All is fine until someone unplugs Station B's cable and moves it to the other switch: DUT +-------------------------------------+ | br0 | | +------+ +------+ +------+ +------+ | | | | | | | | | | | | | swp0 | | swp1 | | swp2 | | eth0 | | +-------------------------------------+ | | | Station A | | | | +--+------+--+ +--+------+--+ | | | | | | | | | | swp0 | | | | swp0 | | Another | +------+ | | +------+ | Another switch | br0 | | br0 | switch | +------+ | | +------+ | | | | | | | | | | | swp1 | | | | swp1 | | +--+------+--+ +--+------+--+ | Station B Luckily for the use cases we care about, Station B is noisy enough that the DUT hears it (on swp1 this time). swp1 receives the frames and delivers them to the bridge, who enters the unlikely path in br_fdb_update of updating an existing entry. It moves the entry in the software bridge to swp1 and emits an addition notification towards that. As far as the switchdev driver is concerned, all that it needs to ensure is that traffic between Station A and Station B is not forever broken. If it does nothing, then the stale rule to send frames for Station B towards the control interface remains in place. But Station B is no longer reachable via the control interface, but via a port that can offload the bridge port learning attribute. It's just that the port is prevented from learning this address, since the rule overrides FDB updates. So the rule needs to go. The question is via what mechanism. It sure would be possible for this switchdev driver to keep track of all addresses which are sent to the control interface, and then also listen for bridge notifier events on its own ports, searching for the ones that have a MAC address which was previously sent to the control interface. But this is cumbersome and inefficient. Instead, with one small change, the bridge could notify of the address deletion from the old port, in a symmetrical manner with how it did for the insertion. Then the switchdev driver would not be required to monitor learn/forget events for its own ports. It could just delete the rule towards the control interface upon bridge entry migration. This would make hardware address learning be possible again. Then it would take a few more packets until the hardware and software FDB would be in sync again. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07Merge branch 'r8169-improve-rtl8168g-phy-suspend-quirk'Jakub Kicinski
Heiner Kallweit says: ==================== r8169: improve RTL8168g PHY suspend quirk According to Realtek the ERI register 0x1a8 quirk is needed to work around a hw issue with the PHY on RTL8168g. The register needs to be changed before powering down the PHY. Currently we don't meet this requirement, however I'm not aware of any problems caused by this. Therefore I see the change as an improvement. The PHY driver has no means to access the chip ERI registers, therefore we have to intercept MDIO writes to the BMCR register. If the BMCR_PDOWN bit is going to be set, then let's apply the quirk before actually powering down the PHY. ==================== Link: https://lore.kernel.org/r/9303c2cf-c521-beea-c09f-63b5dfa91b9c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07r8169: improve RTL8168g PHY suspend quirkHeiner Kallweit
According to Realtek the ERI register 0x1a8 quirk is needed to work around a hw issue with the PHY on RTL8168g. The register needs to be changed before powering down the PHY. Currently we don't meet this requirement, however I'm not aware of any problems caused by this. Therefore I see the change as an improvement. The PHY driver has no means to access the chip ERI registers, therefore we have to intercept MDIO writes to BMCR register. If the BMCR_PDOWN bit is going to be set, then let's apply the quirk before actually powering down the PHY. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07r8169: move ERI access functions to avoid forward declarationHeiner Kallweit
No functional change here. We just move a code block to avoid a function forward declaration in a subsequent change. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: phy: replace mutex_is_locked with lockdep_assert_held in phylibHeiner Kallweit
Switch to lockdep_assert_held(_once), similar to what is being done in other subsystems. One advantage is that there's zero runtime overhead if lockdep support isn't enabled. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/ccc40b9d-8ee0-43a1-5009-2cc95ca79c85@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: phy: bcm7xxx: Add an entry for BCM72116Florian Fainelli
BCM72116 features a 28nm integrated EPHY, add an entry to match this PHY OUI. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210106170944.1253046-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07Merge branch 'udp_tunnel_nic-post-conversion-cleanup'Jakub Kicinski
udp_tunnel_nic: post conversion cleanup It has been two releases since we added the common infra for UDP tunnel port offload, and we have not heard of any major issues. Remove the old direct driver NDOs completely, and perform minor simplifications in the tunnel drivers. Link: https://lore.kernel.org/r/20210106210637.1839662-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07udp_tunnel: reshuffle NETIF_F_RX_UDP_TUNNEL_PORT checksJakub Kicinski
Move the NETIF_F_RX_UDP_TUNNEL_PORT feature check into udp_tunnel_nic_*_port() helpers, since they're always done right before the call. Add similar checks before calling the notifier. udp_tunnel_nic invokes the notifier without checking features which could result in some wasted cycles. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: remove ndo_udp_tunnel_* callbacksJakub Kicinski
All UDP tunnel port management is now routed via udp_tunnel_nic infra directly. Remove the old callbacks. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07udp_tunnel: remove REGISTER/UNREGISTER handling from tunnel driversJakub Kicinski
udp_tunnel_nic handles REGISTER and UNREGISTER event, now that all drivers use that infra we can drop the event handling in the tunnel drivers. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07udp_tunnel: hard-wire NDOs to udp_tunnel_nic_*_port() helpersJakub Kicinski
All drivers use udp_tunnel_nic_*_port() helpers, prepare for NDO removal by invoking those helpers directly. The helpers are safe to call on all devices, they check if device has the UDP tunnel state initialized. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: broadcom: Drop OF dependency from BGMAC_PLATFORMFlorian Fainelli
All of the OF code that is used has stubbed and will compile and link just fine, keeping COMPILE_TEST is enough. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210106191546.1358324-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07Merge branch 'bcm63xx_enet-major-makeover-of-driver'Jakub Kicinski
Sieng Piaw Liew says: ==================== bcm63xx_enet: major makeover of driver This patch series aim to improve the bcm63xx_enet driver by integrating the latest networking features, i.e. batched rx processing, BQL, build_skb, etc. The newer enetsw SoCs are found to be able to do unaligned rx DMA by adding NET_IP_ALIGN padding which, combined with these patches, improved packet processing performance by ~50% on BCM6328. Older non-enetsw SoCs still benefit mainly from rx batching. Performance improvement of ~30% is observed on BCM6333. The BCM63xx SoCs are designed for routers. As such, having BQL is beneficial as well as trivial to add. v3: * Simplify xmit_more patch by not moving around the code needlessly. * Fix indentation in xmit_more patch. * Fix indentation in build_skb patch. * Split rx ring cleanup patch from build_skb patch and precede build_skb patch for better understanding, as suggested by Florian Fainelli. v2: * Add xmit_more support and rx loop improvisation patches. * Moved BQL netdev_reset_queue() to bcm_enet_stop()/bcm_enetsw_stop() functions as suggested by Florian Fainelli. * Improved commit messages. ==================== Link: https://lore.kernel.org/r/20210106144208.1935-1-liew.s.piaw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07bcm63xx_enet: improve rx loopSieng Piaw Liew
Use existing rx processed count to track against budget, thereby making budget decrement operation redundant. rx_desc_count can be calculated outside the rx loop, making the loop a bit smaller. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07bcm63xx_enet: convert to build_skbSieng Piaw Liew
We can increase the efficiency of rx path by using buffers to receive packets then build SKBs around them just before passing into the network stack. In contrast, preallocating SKBs too early reduces CPU cache efficiency. Check if we're in NAPI context when refilling RX. Normally we're almost always running in NAPI context. Dispatch to napi_alloc_frag directly instead of relying on netdev_alloc_frag which does the same but with the overhead of local_bh_disable/enable. Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec performance. Included netif_receive_skb_list and NET_IP_ALIGN optimizations. Before: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 49.9 MBytes 41.9 Mbits/sec 197 sender [ 4] 0.00-10.00 sec 49.3 MBytes 41.3 Mbits/sec receiver After: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 171 MBytes 47.8 Mbits/sec 272 sender [ 4] 0.00-30.00 sec 170 MBytes 47.6 Mbits/sec receiver Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07bcm63xx_enet: consolidate rx SKB ring cleanup codeSieng Piaw Liew
The rx SKB ring use the same code for cleanup at various points. Combine them into a function to reduce lines of code. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07bcm63xx_enet: alloc rx skb with NET_IP_ALIGNSieng Piaw Liew
Use netdev_alloc_skb_ip_align on newer SoCs with integrated switch (enetsw) when refilling RX. Increases packet processing performance by 30% (with netif_receive_skb_list). Non-enetsw SoCs cannot function with the extra pad so continue to use the regular netdev_alloc_skb. Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec performance. Before: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 120 MBytes 33.7 Mbits/sec 277 sender [ 4] 0.00-30.00 sec 120 MBytes 33.5 Mbits/sec receiver After (+netif_receive_skb_list): [ 4] 0.00-30.00 sec 155 MBytes 43.3 Mbits/sec 354 sender [ 4] 0.00-30.00 sec 154 MBytes 43.1 Mbits/sec receiver Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07bcm63xx_enet: add xmit_more supportSieng Piaw Liew
Support bulking hardware TX queue by using netdev_xmit_more(). Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07bcm63xx_enet: add BQL supportSieng Piaw Liew
Add Byte Queue Limits support to reduce/remove bufferbloat in bcm63xx_enet. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07bcm63xx_enet: batch process rx pathSieng Piaw Liew
Use netif_receive_skb_list to batch process rx skb. Tested on BCM6328 320 MHz using iperf3 -M 512, increasing performance by 12.5%. Before: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 120 MBytes 33.7 Mbits/sec 277 sender [ 4] 0.00-30.00 sec 120 MBytes 33.5 Mbits/sec receiver After: [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 136 MBytes 37.9 Mbits/sec 203 sender [ 4] 0.00-30.00 sec 135 MBytes 37.7 Mbits/sec receiver Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07qmi_wwan: Increase headroom for QMAP SKBsKristian Evensen
When measuring the throughput (iperf3 + TCP) while routing on a not-so-powerful device (Mediatek MT7621, 880MHz CPU), I noticed that I achieved significantly lower speeds with QMI-based modems than for example a USB LAN dongle. The CPU was saturated in all of my tests. With the dongle I got ~300 Mbit/s, while I only measured ~200 Mbit/s with the modems. All offloads, etc. were switched off for the dongle, and I configured the modems to use QMAP (16k aggregation). The tests with the dongle were performed in my local (gigabit) network, while the LTE network the modems were connected to delivers 700-800 Mbit/s. Profiling the kernel revealed the cause of the performance difference. In qmimux_rx_fixup(), an SKB is allocated for each packet contained in the URB. This SKB has too little headroom, causing the check in skb_cow() (called from ip_forward()) to fail. pskb_expand_head() is then called and the SKB is reallocated. In the output from perf, I see that a significant amount of time is spent in pskb_expand_head() + support functions. In order to ensure that the SKB has enough headroom, this commit increases the amount of memory allocated in qmimux_rx_fixup() by LL_MAX_HEADER. The reason for using LL_MAX_HEADER and not a more accurate value, is that we do not know the type of the outgoing network interface. After making this change, I achieve the same throughput with the modems as with the dongle. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20210106122403.1321180-1-kristian.evensen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07Merge tag 'linux-can-next-for-5.12-20210106' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2021-01-06 The first 16 patches are by me and target the tcan4x5x SPI glue driver for the m_can CAN driver. First there are a several cleanup commits, then the SPI regmap part is converted to 8 bits per word, to make it possible to use that driver on SPI controllers that only support the 8 bit per word mode (such as the SPI cores on the raspberry pi). Oliver Hartkopp contributes a patch for the CAN_RAW protocol. The getsockopt() for CAN_RAW_FILTER is changed to return -ERANGE if the filterset does not fit into the provided user space buffer. The last two patches are by Joakim Zhang and add wakeup support to the flexcan driver for the i.MX8QM SoC. The dt-bindings docs are extended to describe the added property. * tag 'linux-can-next-for-5.12-20210106' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: flexcan: add CAN wakeup function for i.MX8QM dt-bindings: can: fsl,flexcan: add fsl,scu-index property to indicate a resource can: raw: return -ERANGE when filterset does not fit into user space buffer can: tcan4x5x: add support for half-duplex controllers can: tcan4x5x: rework SPI access can: tcan4x5x: add {wr,rd}_table can: tcan4x5x: add max_raw_{read,write} of 256 can: tcan4x5x: tcan4x5x_regmap: set reg_stride to 4 can: tcan4x5x: fix max register value can: tcan4x5x: tcan4x5x_regmap_init(): use spi as context pointer can: tcan4x5x: tcan4x5x_regmap_write(): remove not needed casts and replace 4 by sizeof can: tcan4x5x: rename regmap_spi_gather_write() -> tcan4x5x_regmap_gather_write() can: tcan4x5x: remove regmap async support can: tcan4x5x: tcan4x5x_bus: remove not needed read_flag_mask can: tcan4x5x: mark struct regmap_bus tcan4x5x_bus as constant can: tcan4x5x: move regmap code into seperate file can: tcan4x5x: rename tcan4x5x.c -> tcan4x5x-core.c can: tcan4x5x: beautify indention of tcan4x5x_of_match and tcan4x5x_id_table can: tcan4x5x: replace DEVICE_NAME by KBUILD_MODNAME ==================== Link: https://lore.kernel.org/r/20210107094900.173046-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-06net: dsa: print error on invalid port indexRafał Miłecki
Looking for an -EINVAL all over the dsa code could take hours for inexperienced DSA users. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210106090915.21439-1-zajec5@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-06can: flexcan: add CAN wakeup function for i.MX8QMJoakim Zhang
The System Controller Firmware (SCFW) is a low-level system function which runs on a dedicated Cortex-M core to provide power, clock, and resource management. It exists on some i.MX8 processors. e.g. i.MX8QM (QM, QP), and i.MX8QX (QXP, DX). SCU driver manages the IPC interface between host CPU and the SCU firmware running on M4. For i.MX8QM, stop mode request is controlled by System Controller Unit(SCU) firmware, this patch introduces FLEXCAN_QUIRK_SETUP_STOP_MODE_SCFW quirk for this function. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20201106105627.31061-6-qiangqing.zhang@nxp.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-06dt-bindings: can: fsl,flexcan: add fsl,scu-index property to indicate a resourceJoakim Zhang
For SoCs with SCU support, need setup stop mode via SCU firmware, so this property can help indicate a resource in SCU firmware. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20201106105627.31061-3-qiangqing.zhang@nxp.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-01-06can: raw: return -ERANGE when filterset does not fit into user space bufferOliver Hartkopp
Multiple filters (struct can_filter) can be set with the setsockopt() function, which was originally intended as a write-only operation. As getsockopt() also provides a CAN_RAW_FILTER option to read back the given filters, the caller has to provide an appropriate user space buffer. In the case this buffer is too small the getsockopt() silently truncates the filter information and gives no information about the needed space. This is safe but not convenient for the programmer. In net/core/sock.c the SO_PEERGROUPS sockopt had a similar requirement and solved it by returning -ERANGE in the case that the provided data does not fit into the given user space buffer and fills the required size into optlen, so that the caller can retry with a matching buffer length. This patch adopts this approach for CAN_RAW_FILTER getsockopt(). Reported-by: Phillip Schichtel <phillip@schich.tel> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-By: Phillip Schichtel <phillip@schich.tel> Link: https://lore.kernel.org/r/20201216174928.21663-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>