path: root/drivers
2021-10-18  ethernet: use eth_hw_addr_set() in unmaintained drivers (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
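The pattern applied by this and the later eth_hw_addr_set() conversions below is roughly the sketch that follows; hw_read_mac_byte() is a hypothetical stand-in for each driver's device-specific way of obtaining the address.

    #include <linux/etherdevice.h>

    /* Minimal sketch of the conversion pattern, assuming a hypothetical
     * hw_read_mac_byte() that models the driver-specific register read.
     */
    static void example_read_mac(struct net_device *netdev)
    {
    	u8 addr[ETH_ALEN];
    	int i;

    	for (i = 0; i < ETH_ALEN; i++)
    		addr[i] = hw_read_mac_byte(i);

    	/* Write through the helper instead of touching netdev->dev_addr
    	 * directly, so the rbtree used for fast address lookup stays
    	 * consistent.
    	 */
    	eth_hw_addr_set(netdev, addr);
    }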
2021-10-18  octeontx2-nic: fix mixed module build (Arnd Bergmann)
Building the VF and PF sides of this driver differently, with one being a loadable module and the other built-in, results in a link failure for the common PTP driver:

    ld.lld: error: undefined symbol: __this_module
    >>> referenced by otx2_ptp.c
    >>>               net/ethernet/marvell/octeontx2/nic/otx2_ptp.o:(otx2_ptp_init) in archive drivers/built-in.a
    >>> referenced by otx2_ptp.c
    >>>               net/ethernet/marvell/octeontx2/nic/otx2_ptp.o:(otx2_ptp_init) in archive drivers/built-in.a

Move the otx2_ptp.c code into a separate module that gets built for both configurations, making it built-in if at least one of the other two is built-in. Fixes: 43510ef4ddad ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18  net: ethernet: ave: Add compatible string and SoC-dependent data for NX1 SoC (Kunihiko Hayashi)
Add basic support for UniPhier NX1 SoC. This includes a compatible string and SoC-dependent data. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18  net: w5100: Make w5100_remove() return void (Uwe Kleine-König)
Up to now w5100_remove() returns zero unconditionally. Make it return void instead, which makes it easier to see in the callers that there is no error to handle. Also, the return value of platform and spi remove callbacks is ignored anyway. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18  net: ks8851: Make ks8851_remove_common() return void (Uwe Kleine-König)
Up to now ks8851_remove_common() returns zero unconditionally. Make it return void instead, which makes it easier to see in the callers that there is no error to handle. Also, the return value of platform and spi remove callbacks is ignored anyway. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18  net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types (Ahmed S. Darwish)
The only factor differentiating per-CPU bstats data type (struct gnet_stats_basic_cpu) from the packed non-per-CPU one (struct gnet_stats_basic_packed) was a u64_stats sync point inside the former. The two data types are now equivalent: earlier commits added a u64_stats sync point to the latter. Combine both data types into "struct gnet_stats_basic_sync". This eliminates redundancy and simplifies the bstats read/write APIs. Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit architectures, u64_stats sync points do not use sequence counter protection. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
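Based on the description above, the merged type should look roughly like the following sketch (field layout inferred from the commit message, not copied from the tree):

    #include <linux/u64_stats_sync.h>

    /* Sketch of the combined type: u64_stats_t fields plus the sync
     * point that previously only the per-CPU variant carried. On 64-bit
     * architectures the syncp is effectively a no-op.
     */
    struct gnet_stats_basic_sync {
    	u64_stats_t bytes;
    	u64_stats_t packets;
    	struct u64_stats_sync syncp;
    };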
2021-10-16  ethernet: ixgb: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). ixgb_get_ee_mac_addr() is used with a non-netdev->dev_addr pointer, so we can't deal with the problem inside it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: ibmveth: use ether_addr_to_u64() (Jakub Kicinski)
We'll want to make netdev->dev_addr const; remove the local helper, which is missing a const qualifier on its argument, and use ether_addr_to_u64() instead. Similar story to mlx4. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
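For reference, the generic helper that replaces the driver-local copy folds the six address bytes into a u64, roughly as follows (see <linux/etherdevice.h>); unlike the removed local helper, the argument is const-qualified:

    static inline u64 ether_addr_to_u64(const u8 *addr)
    {
    	u64 u = 0;
    	int i;

    	/* Fold the address MSB-first into the low 48 bits of a u64. */
    	for (i = 0; i < ETH_ALEN; i++)
    		u = u << 8 | addr[i];

    	return u;
    }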
2021-10-16  ethernet: enetc: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Pass a netdev into the helper instead of just the address, read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: ec_bhf: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Copy the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: enic: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Use a zeroed array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: bcmgenet: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: bnx2x: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: aquantia: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Use an array on the stack, then call eth_hw_addr_set(). eth_hw_addr_set() is called after the error checking; this is fine, as an error propagates all the way to a failed probe. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: amd: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: alteon: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Break the address apart into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: aeroflex: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. macaddr[] is a module param of type int, so copy the address into an array of u8 on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  ethernet: adaptec: use eth_hw_addr_set() (Jakub Kicinski)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced an rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  net: ipvtap: fix template string argument of device_create() call (Jean Sacren)
The last argument of a device_create() call should be a template (format) string. The tap_name variable should be an argument to that string, not the format argument itself. Add the template string and turn tap_name into its argument. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16  net: macvtap: fix template string argument of device_create() call (Jean Sacren)
The last argument of a device_create() call should be a template (format) string. The tap_name variable should be an argument to that string, not the format argument itself. Add the template string and turn tap_name into its argument. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
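The shape of both the ipvtap and macvtap fixes is the classic format-string correction; the surrounding variable names below are illustrative, not taken from the drivers:

    /* Before: tap_name is passed as the format string, so any '%'
     * in it would be parsed as a conversion specifier.
     */
    dev = device_create(class, parent, devt, drvdata, tap_name);

    /* After: a literal "%s" format with tap_name as its argument. */
    dev = device_create(class, parent, devt, drvdata, "%s", tap_name);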
2021-10-16  Merge tag 'mlx5-updates-2021-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (David S. Miller)
Saeed Mahameed says:

====================
mlx5-updates-2021-10-15

1) From Rongwei Liu: Use system_image_guid and native_port_num when bonding; don't rely on PCIe IDs anymore. With some specific NICs, the physical devices may have PCIe IDs like 0001:01:00.0/1 and 0002:02:00.0/1. All of these devices should have the same system_image_guid, and the device index can be queried from native_port_num. For matching sibling devices/ports of the same HCA, compare the HCA GUID reported on each device rather than just assuming PCIe IDs have similar attributes.

2) From Amir Tzin: Use HCA-defined timeouts. Replace hard-coded timeouts with values stored by firmware in the default timeouts register (DTOR). Timeouts are read during driver load. If DTOR is not supported by the firmware, fall back to hard-coded defaults instead.

3) From Shay Drory: Disable RoCE at HCA level. Disable RoCE in firmware when the devlink roce parameter is set to off.

4) A small set of trivial cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15  net/mlx5: Use system_image_guid to determine bonding (Rongwei Liu)
With specific NICs, the PFs may have different PCIe IDs, like 0001:01:00.0/1 and 0002:02:00.0/1. PFs with the same system_image_guid should be considered by the driver to be under the same physical NIC, and they are legal to bond together. If firmware doesn't support system_image_guid, set it to zero and fall back to using PCIe IDs. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5: Introduce new device index wrapper (Rongwei Liu)
Add a wrapper for retrieving the device index, for use by downstream patches. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5: Check return status first when querying system_image_guid (Rongwei Liu)
When querying system_image_guid from firmware, check the return value first. The buffer content is valid only if the query succeeds. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5: DR, Prefer kcalloc over open coded arithmetic (Len Baker)
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, refactor the code a bit to use the purpose-specific kcalloc() function instead of an open-coded size * count argument to kzalloc(). [1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
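The transformation is mechanical; num_entries and the element type below are illustrative:

    /* Before: the multiplication in the size argument can overflow
     * and silently produce a short allocation.
     */
    arr = kzalloc(num_entries * sizeof(*arr), GFP_KERNEL);

    /* After: kcalloc() performs the same multiplication with an
     * overflow check and returns NULL on wraparound.
     */
    arr = kcalloc(num_entries, sizeof(*arr), GFP_KERNEL);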
2021-10-15  net/mlx5e: Add extack msgs related to TC for better debug (Abhiram R N)
EOPNOTSUPP and EINVAL are returned from multiple places in the driver, so it is difficult to understand the failure reason from the error code alone. With a netlink extack message the exact reason is known, which will aid debugging. Signed-off-by: Abhiram R N <abhiramrn@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
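The typical shape of such a change is to attach a message to the extack right where the error code is chosen; the condition and helper below are illustrative, not the driver's actual code:

    /* Before: the caller only sees -EOPNOTSUPP, with no hint why. */
    if (!mlx5_some_capability(priv->mdev))
    	return -EOPNOTSUPP;

    /* After: the netlink extack carries the exact reason to userspace. */
    if (!mlx5_some_capability(priv->mdev)) {
    	NL_SET_ERR_MSG_MOD(extack, "Capability X is not supported by this device");
    	return -EOPNOTSUPP;
    }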
2021-10-15  net/mlx5: CT: Fix missing cleanup of ct nat table on init failure (Paul Blakey)
If CT fails to initialize its rhashtables, it doesn't destroy the ct nat global table. Destroy the ct nat global table on ct init failure. Fixes: d7cade513752 ("net/mlx5e: check return value of rhashtable_init") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5: Disable roce at HCA level (Shay Drory)
Currently, when a user disables RoCE via the devlink param, this change isn't passed down to the device. If the device allows disabling RoCE at the device level, make use of it. This instructs the device to skip memory allocations related to RoCE functionality that it would otherwise perform. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5i: Enable Rx steering for IPoIB via ethtool (Moosa Baransi)
Enable steering IPoIB packets via ethtool, the same way it is done today for Ethernet packets. Signed-off-by: Moosa Baransi <moosab@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5: Bridge, provide flow source hints (Vlad Buslov)
Currently, SMFS mode doesn't support rx-loopback flows, which causes bridge egress rules to be rejected because, without a hint, rules for both rx and tx destinations are created by default. Provide explicit flow source hints for compatibility with SMFS. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5: Read timeout values from DTOR (Amir Tzin)
Replace hard-coded timeouts with values stored by firmware in the default timeouts register (DTOR). Timeouts are read during driver load. If DTOR is not supported by the firmware, fall back to hard-coded defaults instead. Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  net/mlx5: Read timeout values from init segment (Amir Tzin)
Replace hard-coded timeouts with values stored in firmware's init segment. Timeouts are read from the init segment during driver load. If init segment timeouts are not supported, fall back to hard-coded defaults instead. Also move pre-initialization timeouts, which cannot be read from firmware, to the new mechanism. Signed-off-by: Amir Tzin <amirtz@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-15  ice: make use of ice_for_each_* macros (Maciej Fijalkowski)
Go through the code base and use the ice_for_each_* macros. While at it, introduce an ice_for_each_xdp_txq() macro that can be used for looping over the xdp_rings array. This commit does not introduce any new functionality. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
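Following the existing ice_for_each_txq()/ice_for_each_rxq() convention, the new macro presumably reduces to a bounded index loop, roughly:

    /* Sketch of the new macro, modeled on the existing ice_for_each_*
     * family; the num_xdp_txq field name is an assumption.
     */
    #define ice_for_each_xdp_txq(vsi, i) \
    	for ((i) = 0; (i) < (vsi)->num_xdp_txq; (i)++)

    /* Example use: iterate all XDP Tx rings of a VSI. */
    ice_for_each_xdp_txq(vsi, i)
    	ice_clean_tx_ring(vsi->xdp_rings[i]);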
2021-10-15  ice: introduce XDP_TX fallback path (Maciej Fijalkowski)
Under rare circumstances there might be a situation where the requirement of having one XDP Tx queue per CPU cannot be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by a spinlock. These accesses happen to be in the hot path, so introduce a static branch that will be triggered from the control plane when the driver cannot provide a Tx queue dedicated to XDP on each CPU. Currently, the chosen design allows any number of XDP Tx queues that is at least half the count of CPUs the platform has. For a lower number, the driver bails out and tells the user that there were not enough Tx resources to configure XDP. The sharing of rings is signalled via static branch enablement, which in turn indicates that the lock for xdp_ring accesses needs to be taken in the hot path. The static-branch approach has no impact on the performance of the non-fallback path. One thing worth mentioning is that the static branch acts as a global driver switch, meaning that if one PF runs out of Tx resources, the other PFs serviced by the ice driver will suffer as well. However, given that the HW handled by the ice driver has 1024 Tx queues per PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
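A sketch of the gating described above; the key and lock names are assumptions modeled on the commit text, not the driver's actual identifiers:

    #include <linux/jump_label.h>
    #include <linux/spinlock.h>

    /* Global switch, off by default: zero overhead on the hot path
     * until the control plane enables it. (Name assumed.)
     */
    DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);

    /* Control plane: fewer XDP Tx queues than CPUs, enable sharing. */
    if (vsi->num_xdp_txq < num_possible_cpus())
    	static_branch_inc(&ice_xdp_locking_key);

    /* Hot path: the lock is only ever taken in the fallback mode. */
    if (static_branch_unlikely(&ice_xdp_locking_key))
    	spin_lock(&xdp_ring->tx_lock);
    /* ... produce Tx descriptors on the shared xdp_ring ... */
    if (static_branch_unlikely(&ice_xdp_locking_key))
    	spin_unlock(&xdp_ring->tx_lock);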
2021-10-15  ice: optimize XDP_TX workloads (Maciej Fijalkowski)
Optimize Tx descriptor cleaning for XDP. The current approach doesn't really scale and chokes when multiple flows are handled. Introduce two ring fields, @next_dd and @next_rs, that keep track of the descriptor that should be looked at when the need for cleaning arises and the descriptor that should have the RS bit set, respectively. Note that at this point the threshold is a constant (32), but it is something that we could make configurable. The first thing is to get away from setting the RS bit on each descriptor. Do this only once NTU is higher than the current @next_rs value: in that case, grab tx_desc[next_rs], set the RS bit in the descriptor and advance @next_rs by 32. The second thing is to clean the Tx ring only when there are fewer than 32 free entries. For that case, look up tx_desc[next_dd] for a DD bit. This bit is written back by HW to let the driver know that xmit was successful. It will happen only for those descriptors that had the RS bit set. Clean only 32 descriptors and advance @next_dd by 32. The actual cleaning routine is moved from ice_napi_poll() down to ice_xmit_xdp_ring(). It is safe to do so as the XDP ring will not get any SKBs that would rely on interrupts for the cleaning. A nice side effect is that for the rare Tx fallback path (which the next patch is going to introduce) we don't have to trigger the SW irq to clean the ring. With those two concepts, the ring is kept almost full, but it is guaranteed that the driver will be able to produce Tx descriptors. This approach seems to work out well even though the Tx descriptors are produced one by one. The test was conducted with the ice HW bombarded with packets from an HW generator, configured to generate 30 flows. The xdp2 sample yields the following results:

    <snip>
    proto 17:   79973066 pkt/s
    proto 17:   80018911 pkt/s
    proto 17:   80004654 pkt/s
    proto 17:   79992395 pkt/s
    proto 17:   79975162 pkt/s
    proto 17:   79955054 pkt/s
    proto 17:   79869168 pkt/s
    proto 17:   79823947 pkt/s
    proto 17:   79636971 pkt/s
    </snip>

As that sample reports the Rx'ed frames, let's look at the sar output. It says that what we Rx'ed we do actually Tx, with no noticeable drops:

    Average:  IFACE       rxpck/s      txpck/s     rxkB/s     txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
    Average:  ens4f1  79842324.00  79842310.40 4678261.17 4678260.38     0.00     0.00      0.00    38.32

with tx_busy staying calm. When compared to the state before:

    Average:  IFACE       rxpck/s      txpck/s     rxkB/s     txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
    Average:  ens4f1  90919711.60  42233822.60 5327326.85 2474638.04     0.00     0.00      0.00    43.64

it can be observed that the amount of txpck/s is almost doubled, meaning that the performance is improved by around 90%. All of this is due to the drops in the driver: previously the tx_busy stat was bumped at a 7 Mpps rate. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
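In rough pseudo-C, the batched RS/DD scheme with the constant threshold of 32 looks like the sketch below; rs_bit, dd_bit, free_descs() and clean_batch() are simplified stand-ins for the real descriptor-field manipulation:

    #define TX_THRESH 32	/* constant for now; could be made configurable */

    /* Set the RS bit only once per TX_THRESH descriptors. */
    if (xdp_ring->next_to_use > xdp_ring->next_rs) {
    	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
    	tx_desc->cmd_type_offset_bsz |= rs_bit;	/* simplified */
    	xdp_ring->next_rs += TX_THRESH;
    }

    /* Clean only when fewer than TX_THRESH entries remain free, and
     * only if HW has written back the DD bit for the descriptor at
     * @next_dd, confirming the batch ending there was transmitted.
     */
    if (free_descs(xdp_ring) < TX_THRESH) {
    	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_dd);
    	if (tx_desc->cmd_type_offset_bsz & dd_bit) {	/* simplified */
    		clean_batch(xdp_ring, TX_THRESH);
    		xdp_ring->next_dd += TX_THRESH;
    	}
    }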
2021-10-15  ice: propagate xdp_ring onto rx_ring (Maciej Fijalkowski)
With rings being split, it is now convenient to introduce a pointer to the XDP ring within the Rx ring. For XDP_TX workloads this means that the xdp_rings array access, which was executed per processed frame, will be skipped. Also, read the XDP prog once per NAPI and, if the prog is present, set up the local xdp_ring pointer. Reading the prog a single time was discussed in [1], with some concern raised by Toke around dispatcher handling and the need to go through the RCU grace period in the ndo_bpf driver callback, but ice currently is tearing down NAPI instances regardless of the prog's presence on the VSI. Although the pointer to the XDP ring introduced to the Rx ring makes things a lot slimmer/simpler, I still feel that a single prog read per NAPI lifetime is beneficial. A further patch that introduces the fallback path will also benefit from it, as the xdp_ring pointer will be set during the XDP rings setup. [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/ Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: do not create xdp_frame on XDP_TX (Maciej Fijalkowski)
An xdp_frame is not needed for the XDP_TX data path in the ice driver's case. For this data path, cleaning of a sent descriptor will not happen anywhere outside of the driver, which means that the information about the underlying memory model carried via xdp_frame will never be used. Therefore, this conversion can simply be dropped, which relieves the CPU a bit. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: unify xdp_rings accesses (Maciej Fijalkowski)
There has been a long-lasting issue of improper xdp_rings indexing for XDP_TX and XDP_REDIRECT actions. Given that rx_ring->q_index is currently mixed with smp_processor_id(), there could be a situation where Tx descriptors are produced onto an XDP Tx ring but the tail is never bumped, for example when a particular queue id is pinned to a non-matching IRQ line. Address this problem by ignoring the user ring count setting and always initializing the xdp_rings array to be of num_possible_cpus() size. Then, always use smp_processor_id() as the index into the xdp_rings array. This provides serialization, as at any given time only a single softirq can run on a particular CPU. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
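The resulting ring lookup is essentially a one-liner; before and after, roughly (sketch based on the commit text):

    /* Before: q_index and smp_processor_id() were mixed, so a pinned
     * queue could produce descriptors on a ring whose tail was never
     * bumped.
     */
    xdp_ring = vsi->xdp_rings[rx_ring->q_index];

    /* After: the array is sized num_possible_cpus(), so the running
     * CPU id is always a valid index, and softirq execution serializes
     * all accesses to that ring on the CPU.
     */
    xdp_ring = vsi->xdp_rings[smp_processor_id()];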
2021-10-15  ice: split ice_ring onto Tx/Rx separate structs (Maciej Fijalkowski)
While it was convenient to have a generic ring structure that served both the Tx and Rx sides, the next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull the Tx ring out into new ice_tx_ring and ice_rx_ring structs. The Rx ring could be handled by the old ice_ring, which would reduce code churn within this patch, but it would make things asymmetric. Make a union out of the ring container within ice_q_vector so that it is possible to iterate over the newly introduced ice_tx_ring. Remove @size, as it's only accessed from the control path and can be calculated pretty easily. Change the definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. The sizes of the Rx and Tx ring structs are 256 and 192 bytes, respectively. In the Rx ring, xdp_rxq_info occupies its own cacheline, so that's the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: move ice_container_type onto ice_ring_container (Maciej Fijalkowski)
Currently ice_container_type is scoped only to ice_ethtool.c. The next commit, which splits the ice_ring struct into Rx/Tx-specific ring structs, is also going to modify the type of the linked list of rings within ice_ring_container. Therefore, the functions that take an ice_ring_container as an input argument will need to be aware of the ring type to look up. Embed ice_container_type within ice_ring_container and initialize it properly when allocating the q_vectors. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  ice: remove ring_active from ice_ring (Maciej Fijalkowski)
This field is dead and the driver is not making any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15  net: dpaa2: add adaptive interrupt coalescing (Ioana Ciornei)
Add support for adaptive interrupt coalescing to the dpaa2-eth driver. First of all, ETHTOOL_COALESCE_USE_ADAPTIVE_RX is defined as a supported coalesce parameter and the requested state is configured through the dpio APIs added in the previous patch. Besides the ethtool API interaction, we keep track of how many bytes and frames are dequeued per CDAN (Channel Data Availability Notification) and update the Net DIM instance through the dpaa2_io_update_net_dim() API. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15  soc: fsl: dpio: add Net DIM integration (Ioana Ciornei)
Use the generic dynamic interrupt moderation (dim) framework to implement adaptive interrupt coalescing on Rx. With the per-packet interrupt scheme, a high interrupt rate has been noted for moderate traffic flows, leading to high CPU utilization. The dpio driver exports new functions to enable/disable adaptive IRQ coalescing on a DPIO object, to query the state, or to update Net DIM with a new set of bytes and frames dequeued. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
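A sketch of what the dequeue-time update plausibly looks like, built around the dpaa2_io_update_net_dim() API named in the dpaa2-eth commit above; the counter fields and the rx_dim member on struct dpaa2_io are assumptions:

    #include <linux/dim.h>

    /* Fold the per-CDAN frame/byte counts into the Net DIM instance;
     * dim's worker then adjusts the coalescing parameters over time.
     */
    void dpaa2_io_update_net_dim(struct dpaa2_io *d, __u64 frames, __u64 bytes)
    {
    	struct dim_sample dim_sample = {};

    	d->event_ctr++;		/* one sample per CDAN (assumed fields) */
    	d->frames += frames;
    	d->bytes += bytes;

    	dim_update_sample(d->event_ctr, d->frames, d->bytes, &dim_sample);
    	net_dim(&d->rx_dim, dim_sample);
    }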
2021-10-15  net: dpaa2: add support for manual setup of IRQ coalescing (Ioana Ciornei)
Use the newly exported dpio driver API to manually configure the IRQ coalescing parameters requested by the user. The .get_coalesce() and .set_coalesce() net_device callbacks are implemented and directly export or set up the rx-usecs on all the configured channels. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15  soc: fsl: dpio: add support for irq coalescing per software portal (Ioana Ciornei)
In DPAA2 based SoCs, the IRQ coalescing support per software portal has 2 configurable parameters:

- the IRQ timeout period (QBMAN_CINH_SWP_ITPR): how many periods of 256 QBMAN cycles need to pass until a dequeue interrupt is asserted.
- the IRQ threshold (QBMAN_CINH_SWP_DQRR_ITR): how many dequeue responses in the DQRR ring would generate an IRQ.

Add support for setting up and querying these IRQ coalescing related parameters. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
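Converting a user-visible rx-usecs value into ITP units then reduces to fixed arithmetic; a sketch, assuming the QBMAN clock frequency in Hz (qman_freq) comes from the firmware attributes exposed by the next patch:

    #include <linux/math64.h>

    /* usecs -> QBMAN cycles -> ITP units (blocks of 256 cycles). */
    static u32 qbman_usecs_to_itp(u32 usecs, u32 qman_freq)
    {
    	u64 cycles = div_u64((u64)usecs * qman_freq, 1000000);

    	return (u32)div_u64(cycles, 256);
    }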
2021-10-15  soc: fsl: dpio: extract the QBMAN clock frequency from the attributes (Ioana Ciornei)
Through the dpio_get_attributes() firmware call, the dpio driver has access to the QBMAN clock frequency. Extend the structure which holds the firmware's response so that we can access this information. This will be needed in the next patches, which add support for interrupt coalescing that has to be configured based on this frequency. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15  tcp: switch orphan_count to bare per-cpu counters (Eric Dumazet)
Use of a percpu_counter structure to track the count of orphaned sockets is causing problems on modern hosts with 256 cpus or more. Stefan Bach reported serious spinlock contention in real workloads, which I was able to reproduce with a netfilter rule dropping incoming FIN packets:

    53.56%  server  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
            |
            ---queued_spin_lock_slowpath
               |
               --53.51%--_raw_spin_lock_irqsave
                         |
                         --53.51%--__percpu_counter_sum
                                   tcp_check_oom
                                   |
                                   |--39.03%--__tcp_close
                                   |          tcp_close
                                   |          inet_release
                                   |          inet6_release
                                   |          sock_close
                                   |          __fput
                                   |          ____fput
                                   |          task_work_run
                                   |          exit_to_usermode_loop
                                   |          do_syscall_64
                                   |          entry_SYSCALL_64_after_hwframe
                                   |          __GI___libc_close
                                   |
                                   --14.48%--tcp_out_of_resources
                                              tcp_write_timeout
                                              tcp_retransmit_timer
                                              tcp_write_timer_handler
                                              tcp_write_timer
                                              call_timer_fn
                                              expire_timers
                                              __run_timers
                                              run_timer_softirq
                                              __softirqentry_text_start

As explained in commit cf86a086a180 ("net/dst: use a smaller percpu_counter batch for dst entries accounting"), the default batch size is too big for the default value of tcp_max_orphans (262144). But even if we reduce batch sizes, there would still be cases where the estimated count of orphans is beyond the limit, and where tcp_too_many_orphans() has to call the expensive percpu_counter_sum_positive(). One solution is to use plain per-cpu counters, and have a timer periodically refresh this cache. Updating the cache every 100ms seems about right; tcp pressure state does not change radically over shorter periods. percpu_counter was nice 15 years ago while hosts had fewer than 16 cpus, not anymore by current standards. v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m, reported by kernel test robot <lkp@intel.com>. Remove unused socket argument from tcp_too_many_orphans(). Fixes: dd24c00191d5 ("net: Use a percpu_counter for orphan_count") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Stefan Bach <sfb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
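The scheme reduces to plain per-cpu integers plus a periodically refreshed cache; a minimal sketch with assumed names, not the actual tcp/inet code:

    #include <linux/percpu.h>

    /* Plain per-cpu counter: increments touch only local cachelines. */
    static DEFINE_PER_CPU(int, orphan_count);
    static int orphan_cache;	/* read lock-free by the pressure checks */

    static inline void orphan_inc(void)
    {
    	this_cpu_inc(orphan_count);	/* no shared lock, no contention */
    }

    /* Timer callback, roughly every 100ms: fold the per-cpu values so
     * tcp_check_oom() never has to sum across 256+ CPUs under a lock.
     */
    static void orphan_cache_refresh(void)
    {
    	int cpu, total = 0;

    	for_each_possible_cpu(cpu)
    		total += per_cpu(orphan_count, cpu);

    	WRITE_ONCE(orphan_cache, total > 0 ? total : 0);
    }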
2021-10-15  net: dsa: qca8k: move port config to dedicated struct (Ansuel Smith)
Move the ports-related config to a dedicated struct to keep things organized. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15  net: dsa: qca8k: set internal delay also for sgmii (Ansuel Smith)
The original QCA code reports port instability and says that SGMII also requires the internal delay to be set. Generalize the rgmii delay function and apply the advised values if they are not defined in the DT. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15  net: dsa: qca8k: add support for QCA8328 (Ansuel Smith)
The QCA8328 switch is the bigger brother of the qca8327. Same regs, different chip. Change the function to set the correct pin layout and introduce a new match_data to differentiate the two switches, as they have the same ID and their internal PHYs have the same ID. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>