path: root/drivers/net/ethernet/ibm
Age  Commit message  Author
2024-05-07  net: annotate writes on dev->mtu from ndo_change_mtu()  (Eric Dumazet)
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
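For context, the change amounts to pairing the existing lockless READ_ONCE() readers with a WRITE_ONCE() store in each driver's MTU handler. A minimal sketch of such a handler (driver name and the absence of range checks are illustrative, not taken from this tree):

    #include <linux/netdevice.h>

    /* Sketch of an ndo_change_mtu() handler using the annotated store. */
    static int example_change_mtu(struct net_device *dev, int new_mtu)
    {
            /* dev->mtu may be read without RTNL elsewhere, so annotate the write. */
            WRITE_ONCE(dev->mtu, new_mtu);
            return 0;
    }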
2024-04-24  net: ibm/emac: allocate dummy net_device dynamically  (Breno Leitao)
Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from the private struct by converting it into a pointer. Then leverage the new alloc_netdev_dummy() helper to allocate and initialize dummy devices. [1] https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/ Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
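A rough sketch of the conversion pattern described above, assuming a private structure that previously embedded the dummy device (structure and function names are illustrative, not the emac code):

    #include <linux/netdevice.h>

    struct example_priv {
            struct net_device *dummy_dev;   /* was: struct net_device dummy_dev; */
            struct napi_struct napi;
    };

    static int example_setup(struct example_priv *priv)
    {
            /* Allocate the dummy netdev instead of embedding it. */
            priv->dummy_dev = alloc_netdev_dummy(0);
            if (!priv->dummy_dev)
                    return -ENOMEM;
            return 0;
    }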
2024-04-18  ibmvnic: Return error code on TX scrq flush fail  (Nick Child)
In ibmvnic_xmit() if ibmvnic_tx_scrq_flush() returns H_CLOSED then it will inform upper level networking functions to disable tx queues. H_CLOSED signals that the connection with the vnic server is down and a transport event is expected to recover the device. Previously, ibmvnic_tx_scrq_flush() was hard-coded to return success. Therefore, the queues would remain active until ibmvnic_cleanup() is called within do_reset(). The problem is that do_reset() depends on the RTNL lock. If several ibmvnic devices are resetting then there can be a long wait time until the last device can grab the lock. During this time the tx/rx queues still appear active to upper level functions. FYI, we do make a call to netif_carrier_off() outside the RTNL lock but its calls to dev_deactivate() are also dependent on the RTNL lock. As a result, large amounts of retransmissions were observed in a short period of time, eventually leading to ETIMEOUT. This was specifically seen with HNV devices, likely because of even more RTNL dependencies. Therefore, ensure the return code of ibmvnic_tx_scrq_flush() is propagated to the xmit function to allow for an earlier (and lock-less) response to a transport event. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://lore.kernel.org/r/20240416164128.387920-1-nnac123@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
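A hedged sketch of the idea (illustrative only, not the exact driver code): once the flush reports that the CRQ is closed, the xmit path can stop the queues itself instead of waiting for the RTNL-serialized reset path.

    /* Illustrative: react in xmit as soon as the flush reports H_CLOSED. */
    rc = ibmvnic_tx_scrq_flush(adapter, tx_scrq);   /* now returns a status */
    if (rc == H_CLOSED) {
            /* Transport is down; a reset will recover the device later. */
            netif_tx_stop_all_queues(netdev);
            dev_kfree_skb_any(skb);
            return NETDEV_TX_OK;
    }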
2024-01-08  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER  (Kirill A. Shutemov)
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-13  ibmvnic: replace deprecated strncpy with strscpy  (Justin Stitt)
`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. NUL-padding is not required as the buffer is already memset to 0: | memset(adapter->fw_version, 0, 32); Note that another usage of strscpy exists on the same buffer: | strscpy((char *)adapter->fw_version, "N/A", sizeof(adapter->fw_version)); Considering the above, a suitable replacement is `strscpy` [2] because it guarantees NUL-termination on the destination buffer without unnecessary NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
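For reference, the replacement pattern described above looks roughly like this; the source variable name is hypothetical, and the buffer size follows the memset quoted in the message:

    /* Before: NUL-termination is not guaranteed if the source fills the buffer. */
    strncpy(adapter->fw_version, (char *)fw_version_from_vios, 32);

    /* After: strscpy() always NUL-terminates and does not pad the remainder. */
    strscpy(adapter->fw_version, (char *)fw_version_from_vios,
            sizeof(adapter->fw_version));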
2023-10-11  netdev: replace napi_reschedule with napi_schedule  (Christian Marangi)
Now that napi_schedule() returns a bool, we can drop napi_reschedule(), which performs the exact same function. The function comes from a very old commit bfe13f54f502 ("ibm_emac: Convert to use napi_struct independent of struct net_device") and its purpose has effectively been deprecated in favour of different logic. Convert every user of napi_reschedule to napi_schedule. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> # ath10k Acked-by: Nick Child <nnac123@linux.ibm.com> # ibm Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for can/dev/rx-offload.c Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20231009133754.9834-3-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
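The conversion is mechanical; a small before/after sketch (the private structure name is illustrative):

    bool rearmed;

    /* Before: napi_reschedule() returned true if polling was re-armed. */
    rearmed = napi_reschedule(&priv->napi);

    /* After: napi_schedule() now returns the same bool, so use it directly. */
    rearmed = napi_schedule(&priv->napi);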
2023-10-05  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04  ibmveth: Remove condition to recompute TCP header checksum.  (David Wilder)
In some OVS environments the TCP pseudo header checksum may need to be recomputed. Currently this is only done when the interface instance is configured for "Trunk Mode". We found the issue also occurs in some Kubernetes environments; these environments do not use "Trunk Mode", therefore the condition is removed. Performance tests with this change show only a fractional decrease in throughput (< 0.2%). Fixes: 7525de2516fb ("ibmveth: Set CHECKSUM_PARTIAL if NULL TCP CSUM.") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-20  net: ethernet: ibm: Convert to platform remove callback returning void  (Uwe Kleine-König)
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Eventually after all drivers are converted, .remove_new() is renamed to .remove(). Trivially convert these drivers from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
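A minimal sketch of the converted form (driver and function names are placeholders, probe omitted):

    #include <linux/platform_device.h>

    /* After the conversion: the callback returns void and is wired up via
     * .remove_new (previously a .remove callback returning an ignored int). */
    static void example_remove(struct platform_device *pdev)
    {
            /* release driver resources here; there is nothing useful to return */
    }

    static struct platform_driver example_driver = {
            .remove_new = example_remove,
            .driver = {
                    .name = "example",
            },
    };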
2023-08-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR.

Conflicts:

include/net/inet_sock.h
  f866fbc842de ("ipv4: fix data-races around inet->inet_id")
  c274af224269 ("inet: introduce inet->inet_flags")
  https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/

Adjacent changes:

drivers/net/bonding/bond_alb.c
  e74216b8def3 ("bonding: fix macvlan over alb bond support")
  f11e5bd159b0 ("bonding: support balance-alb with openvswitch")

drivers/net/ethernet/broadcom/bgmac.c
  d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
  23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()")

drivers/net/ethernet/broadcom/genet/bcmmii.c
  32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()")
  acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")

net/sctp/socket.c
  f866fbc842de ("ipv4: fix data-races around inet->inet_id")
  b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-23  ibmveth: Use dcbf rather than dcbfl  (Michael Ellerman)
When building for power4, newer binutils don't recognise the "dcbfl" extended mnemonic. dcbfl RA, RB is equivalent to dcbf RA, RB, 1. Switch to "dcbf" to avoid the build error. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-10  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Cross-merge networking fixes after downstream PR. No conflicts.

Adjacent changes:

drivers/net/ethernet/intel/igc/igc_main.c
  06b412589eef ("igc: Add lock to safeguard global Qbv variables")
  d3750076d464 ("igc: Add TransmissionOverrun counter")

drivers/net/ethernet/microsoft/mana/mana_en.c
  a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive")
  a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h")
  92272ec4107e ("eth: add missing xdp.h includes in drivers")

net/mptcp/protocol.h
  511b90e39250 ("mptcp: fix disconnect vs accept race")
  b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")

tools/testing/selftests/net/mptcp/mptcp_join.sh
  c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test")
  03668c65d153 ("selftests: mptcp: join: rework detailed report")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10  ibmvnic: Ensure login failure recovery is safe from other resets  (Nick Child)
If a login request fails, the recovery process should be protected against parallel resets. It is a known issue that freeing and registering CRQs in quick succession can result in a failover CRQ from the VIOS. Processing a failover during login recovery is dangerous for two reasons: 1. This will result in two parallel initialization processes, which can cause serious issues during login. 2. It is possible that the failover CRQ is received but never executed. We get notified of a pending failover through a transport event CRQ. The reset is not performed until an INIT CRQ request is received. Previously, if CRQ init failed during login recovery, the ibmvnic IRQ was freed and the login process returned an error. If failover_pending is true (a transport event was received), then the ibmvnic device would never be able to process the reset since it cannot receive the CRQ_INIT request due to the IRQ being freed. This left the device in an inoperable state. Therefore, the login failure recovery process must be hardened against these possible issues. Possible failovers (due to quick CRQ free and init) must be avoided and any issues during re-initialization should be dealt with instead of being propagated up the stack. This logic is similar to that of ibmvnic_probe(). Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-5-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10  ibmvnic: Do partial reset on login failure  (Nick Child)
Perform a partial reset before sending a login request if any of the following are true: 1. If a previous request times out. This can be dangerous because the VIOS could still receive the old login request at any point after the timeout. Therefore, it is best to re-register the CRQs and sub-CRQs before retrying. 2. If the previous request returns an error that is not described in PAPR. PAPR provides procedures if the login returns with partial success or aborted return codes (section L.5.1) but other values do not have a defined procedure. Previously, these conditions just returned an error from the login function rather than trying to resolve the issue. This can cause further issues since most callers of the login function are not prepared to handle an error when logging in. This improper cleanup can lead to the device being permanently DOWN'd. For example, if the VIOS believes that the device is already logged in then it will return INVALID_STATE (-7). If we never re-register CRQs then it will always think that the device is already logged in. This leaves the device inoperable. The partial reset involves freeing the sub-CRQs, freeing the CRQ then registering and initializing a new CRQ and sub-CRQs. This essentially restarts all communication with VIOS to allow for a fresh login attempt that will be unhindered by any previous failed attempts. Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-4-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10  ibmvnic: Handle DMA unmapping of login buffs in release functions  (Nick Child)
Rather than leaving the DMA unmapping of the login buffers to the login response handler, move this work into the login release functions. Previously, these functions were only used for freeing the allocated buffers. This could lead to issues if there is more than one outstanding login buffer request, which is possible if a login request times out. If a login request times out, then there is another call to send login. The send login function makes a call to the login buffer release function. In the past, this freed the buffers but did not DMA unmap. Therefore, the VIOS could still write to the old login (now freed) buffer. It is for this reason that it is a good idea to leave the DMA unmap call to the login buffer release functions. Since the login buffer release functions now handle DMA unmapping, remove the duplicate DMA unmapping in handle_login_rsp(). Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-3-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10  ibmvnic: Unmap DMA login rsp buffer on send login fail  (Nick Child)
If the LOGIN CRQ fails to send then we must DMA unmap the response buffer. Previously, if the CRQ failed then the memory was freed without DMA unmapping. Fixes: c98d9cc4170d ("ibmvnic: send_login should check for crq errors") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10  ibmvnic: Enforce stronger sanity checks on login response  (Nick Child)
Ensure that all offsets in a login response buffer are within the size of the allocated response buffer. Any offsets or lengths that surpass the allocation are likely the result of an incomplete response buffer. In these cases, a full reset is necessary. When attempting to login, the ibmvnic device will allocate a response buffer and pass a reference to the VIOS. The VIOS will then send the ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled with data. If the ibmvnic device does not get a response in 20 seconds, the old buffer is freed and a new login request is sent. With 2 outstanding requests, any LOGIN_RSP CRQ's could be for the older login request. If this is the case then the login response buffer (which is for the newer login request) could be incomplete and contain invalid data. Therefore, we must enforce strict sanity checks on the response buffer values. Testing has shown that the `off_rxadd_buff_size` value is filled in last by the VIOS and will be the smoking gun for these circumstances. Until VIOS can implement a mechanism for tracking outstanding response buffers and a method for mapping a LOGIN_RSP CRQ to a particular login response buffer, the best ibmvnic can do in this situation is perform a full reset. Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-06  ibmvnic: remove unused rc variable  (Yu Liao)
gcc with W=1 reports:

    drivers/net/ethernet/ibm/ibmvnic.c:194:13: warning: variable 'rc' set but not used [-Wunused-but-set-variable]

This variable is not used so remove it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308040609.zQsSXWXI-lkp@intel.com/ Signed-off-by: Yu Liao <liaoyu15@huawei.com> Reviewed-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-27  net: Explicitly include correct DT includes  (Rob Herring)
The DT of_device.h and of_platform.h headers date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Acked-by: Alex Elder <elder@linaro.org> Reviewed-by: Bhupesh Sharma <bhupesh.sharma@linaro.org> Reviewed-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230727014944.3972546-1-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-29  ibmvnic: Do not reset dql stats on NON_FATAL err  (Nick Child)
All ibmvnic resets make a call to netdev_tx_reset_queue() when re-opening the device. netdev_tx_reset_queue() resets the num_queued and num_completed byte counters. These stats are used in Byte Queue Limit (BQL) algorithms. The difference between these two stats tracks the number of bytes currently sitting on the physical NIC. ibmvnic increases the number of queued bytes through calls to netdev_tx_sent_queue() in the driver's xmit function. When VIOS reports that it is done transmitting bytes, the ibmvnic device increases the number of completed bytes through calls to netdev_tx_completed_queue(). It is important to note that the driver batches its transmit calls and num_queued is increased every time that an skb is added to the next batch, not necessarily when the batch is sent to VIOS for transmission. Unlike other reset types, a NON_FATAL reset will not flush the sub crq tx buffers. Therefore, it is possible for the batched skb array to be partially full. So if there is a call to netdev_tx_reset_queue() when re-opening the device, the value of num_queued (0) would not account for the skbs that are currently batched. Eventually, when the batch is sent to VIOS, the call to netdev_tx_completed_queue() would increase num_completed to a value greater than num_queued. This causes a BUG_ON crash:

    ibmvnic 30000002: Firmware reports error, cause: adapter problem. Starting recovery...
    ibmvnic 30000002: tx error 600
    ibmvnic 30000002: tx error 600
    ibmvnic 30000002: tx error 600
    ibmvnic 30000002: tx error 600
    ------------[ cut here ]------------
    kernel BUG at lib/dynamic_queue_limits.c:27!
    Oops: Exception in kernel mode, sig: 5 [....]
    NIP dql_completed+0x28/0x1c0
    LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
    Call Trace:
    ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
    ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
    __handle_irq_event_percpu+0x98/0x270
    ---[ end trace ]---

Therefore, do not reset the dql stats when performing a NON_FATAL reset. Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
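For readers unfamiliar with BQL, the three counters mentioned above are driven by the following calls; the sketch shows where each one sits in a generic driver, not the exact ibmvnic code:

    struct netdev_queue *txq = netdev_get_tx_queue(netdev, queue_index);

    /* xmit path: account bytes handed to the device (num_queued += skb->len). */
    netdev_tx_sent_queue(txq, skb->len);

    /* completion path: account what the device finished (num_completed += bytes).
     * The BUG_ON in dql_completed() fires if completed ever exceeds queued. */
    netdev_tx_completed_queue(txq, packets, bytes);

    /* open/reset path: zero both counters; skipped for NON_FATAL resets here. */
    netdev_tx_reset_queue(txq);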
2023-04-05  mm, treewide: redefine MAX_ORDER sanely  (Kirill A. Shutemov)
MAX_ORDER is currently defined as the number of orders the page allocator supports: a user can ask the buddy allocator for page orders between 0 and MAX_ORDER-1. This definition is counter-intuitive and has led to a number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders a user can ask the buddy allocator for is now 0..MAX_ORDER. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-16  net: Use of_property_read_bool() for boolean properties  (Rob Herring)
It is preferred to use typed property access functions (i.e. of_property_read_<type> functions) rather than low-level of_get_property/of_find_property functions for reading properties. Convert reading boolean properties to of_property_read_bool(). Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can Acked-by: Kalle Valo <kvalo@kernel.org> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
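The conversion amounts to replacing a raw presence check with the typed helper; a small sketch (the property name is just an example):

    #include <linux/of.h>

    /* Before: low-level presence test via of_get_property(). */
    bool has_feature_old = of_get_property(np, "example,has-feature", NULL) != NULL;

    /* After: typed boolean accessor. */
    bool has_feature_new = of_property_read_bool(np, "example,has-feature");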
2023-02-24  ibmvnic: Assign XPS map to correct queue index  (Nick Child)
When setting the XPS map value for TX queues, use the index of the transmit queue. Previously, the function was passing the index of the loop that iterates over all queues (RX and TX). This was causing invalid XPS map values. Fixes: 6831582937bd ("ibmvnic: Toggle between queue types in affinity mapping") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230223153944.44969-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31  ibmvnic: Toggle between queue types in affinity mapping  (Nick Child)
Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all the IRQs for transmit queues then assigning all the IRQs for receive queues. With multi-threaded processors, in a heavy RX or TX environment, physical cores would either be overloaded or underutilized (due to the IRQ assignment algorithm). This approach is sub-optimal because IRQs for the same subprocess (RX or TX) would be bound to adjacent CPU numbers, meaning they were more likely to be contending for the same core. For example, in a system with 64 CPUs and 32 queues, the IRQs would be bound to CPUs in the following pattern:

    IRQ type | CPU number
    -----------------------
    TX0      | 0-1
    TX1      | 2-3
    <etc>
    RX0      | 32-33
    RX1      | 34-35
    <etc>

Observe that in SMT-8, the first 4 tx queues would be sharing the same core. A more optimal algorithm would balance the number of RX and TX IRQs across the physical cores. Therefore, to increase performance, distribute RX and TX IRQs across cores by alternating between assigning IRQs for RX and TX queues to CPUs. In a system with 64 CPUs and 32 queues, this results in the following pattern:

    IRQ type | CPU number
    -----------------------
    TX0      | 0-1
    RX0      | 2-3
    TX1      | 4-5
    RX1      | 6-7
    <etc>

Observe that in SMT-8, there is an equal distribution of RX and TX IRQs per core. In the above case, each core handles 2 TX and 2 RX IRQs. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Link: https://lore.kernel.org/r/20230127214358.318152-1-nnac123@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-14  ibmvnic: Update XPS assignments during affinity binding  (Nick Child)
Transmit Packet Steering (XPS) maps CPU numbers to transmit queues. By running the same connection on the same set of CPUs, contention for the queue and the cache miss rate can be minimized. When assigning a CPU mask for a transmit queue's IRQ number, assign the same CPU mask as the set of CPUs that XPS should use for that queue. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
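The underlying API here is netif_set_xps_queue(); a hedged sketch of reusing the IRQ affinity mask for the XPS map (variable names are illustrative):

    /* Use the same CPU mask for this tx queue's XPS map as for its IRQ
     * affinity hint, so the IRQ, softirq and transmitting task share CPUs. */
    int rc = netif_set_xps_queue(netdev, queue_cpu_mask, tx_queue_index);
    if (rc)
            netdev_warn(netdev, "XPS assignment failed for txq %d\n",
                        tx_queue_index);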
2022-11-14  ibmvnic: Add hotpluggable CPU callbacks to reassign affinity hints  (Nick Child)
When CPUs are added and removed, ibmvnic devices will reassign hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD to signal to ibmvnic devices that a CPU has been removed and it is time to reset affinity hint assignments. On the other hand, when CPUs are being added, add a state instance to CPUHP_AP_ONLINE_DYN which will trigger a reassignment of affinity hints once the new CPUs are online. This implementation is based on the virtio_net driver. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
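A rough sketch of the hotplug-registration pattern this follows, using the dynamic online state; callback names are placeholders and the dedicated *_DEAD state is omitted for brevity:

    #include <linux/cpuhotplug.h>

    static int example_cpu_online(unsigned int cpu)
    {
            /* recompute and reapply affinity hints now that this CPU exists */
            return 0;
    }

    static int example_cpu_down_prep(unsigned int cpu)
    {
            /* drop hints that reference the CPU that is going away */
            return 0;
    }

    static int example_register_hotplug(void)
    {
            /* CPUHP_AP_ONLINE_DYN hands back a dynamically allocated state id. */
            int state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "net/example:online",
                                          example_cpu_online, example_cpu_down_prep);
            return state < 0 ? state : 0;
    }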
2022-11-14  ibmvnic: Assign IRQ affinity hints to device queues  (Nick Child)
Assign affinity hints to ibmvnic device queue interrupts. Affinity hints are assigned and removed during sub-crq init and teardown, respectively. This update should improve latency when the hints are utilized, as interrupt lines and processing are more equally distributed among CPUs. This implementation is based on the virtio_net driver. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
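The hint itself is set per interrupt; a minimal sketch assuming one IRQ per sub-CRQ (the exact helper the driver uses may differ):

    #include <linux/interrupt.h>

    /* Suggest a CPU set for this queue's interrupt so that balancing
     * places the IRQ on the intended CPUs. */
    rc = irq_set_affinity_hint(scrq_irq, queue_cpu_mask);
    if (rc)
            pr_warn("failed to set affinity hint for irq %u\n", scrq_irq);

    /* On teardown, clear the hint before freeing the IRQ. */
    irq_set_affinity_hint(scrq_irq, NULL);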
2022-11-10  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
drivers/net/can/pch_can.c
  ae64438be192 ("can: dev: fix skb drop check")
  1dd1b521be85 ("can: remove obsolete PCH CAN driver")
  https://lore.kernel.org/all/20221110102509.1f7d63cc@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-08  ibmveth: Reduce default tx queues to 8  (Nick Child)
Previously, the default number of transmit queues was 16. Due to resource concerns, set to 8 queues instead. Still allow the user to set more queues (max 16) if they like. Since the driver is virtualized away from the physical NIC, the purpose of multiple queues is purely to allow for parallel calls to the hypervisor. Therefore, there is no noticeable effect on performance by reducing queue count to 8. Fixes: d926793c1de9 ("ibmveth: Implement multi queue on xmit") Reported-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://lore.kernel.org/r/20221107203215.58206-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-02  ibmvnic: Free rwi on reset success  (Nick Child)
Free the rwi structure in the event that the last rwi in the list processed successfully. The logic in commit 4f408e1fa6e1 ("ibmvnic: retry reset if there are no other resets") introduces an issue that results in a 32 byte memory leak whenever the last rwi in the list gets processed. Fixes: 4f408e1fa6e1 ("ibmvnic: retry reset if there are no other resets") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://lore.kernel.org/r/20221031150642.13356-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
  2871edb32f46 ("can: kvaser_usb: Fix possible completions during init_completion")
  abb8670938b2 ("can: kvaser_usb_leaf: Ignore stale bus-off after start")
  8d21f5927ae6 ("can: kvaser_usb_leaf: Fix improved state not being reported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27  net: ehea: fix possible memory leak in ehea_register_port()  (Yang Yingliang)
If of_device_register() returns an error, the of node and the name allocated in dev_set_name() are leaked. Call put_device() to give up the reference that was set in device_initialize(), so that the of node is put in logical_port_release() and the name is freed in kobject_cleanup(). Fixes: 1acf2318dd13 ("ehea: dynamic add / remove port") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221025130011.1071357-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
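The fix follows the usual device-model rule that a failed registration still needs put_device() to drop the reference taken at initialization; schematically (ehea structure details and the error message are simplified):

    ret = of_device_register(&port->ofdev);
    if (ret) {
            pr_err("ehea: failed to register port device, ret=%d\n", ret);
            /* Drop the reference from device_initialize(); the release
             * callback then puts the of node and frees the dev_set_name()
             * allocation via kobject_cleanup(). */
            put_device(&port->ofdev.dev);
            return NULL;
    }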
2022-10-24  ibmveth: Always stop tx queues during close  (Nick Child)
netif_stop_all_queues must be called before calling H_FREE_LOGICAL_LAN. As a result, we can remove the pool_config field from the ibmveth adapter structure. Some device configuration changes call ibmveth_close in order to free the current resources held by the device. These functions then make their changes and call ibmveth_open to reallocate and reserve resources for the device. Prior to this commit, the flag pool_config was used to tell ibmveth_close that it should not halt the transmit queue. pool_config was introduced in commit 860f242eb534 ("[PATCH] ibmveth change buffer pools dynamically") to avoid interrupting the tx flow when making rx config changes. Since then, other commits adopted this approach, even if making tx config changes. The issue with this approach was that the hypervisor freed all of the devices control structures after the hcall H_FREE_LOGICAL_LAN was performed but the transmit queues were never stopped. So the higher layers in the network stack would continue transmission but any H_SEND_LOGICAL_LAN hcall would fail with H_PARAMETER until the hypervisor's structures for the device were allocated with the H_REGISTER_LOGICAL_LAN hcall in ibmveth_open. This resulted in no real networking harm but did cause several of these error messages to be logged: "h_send_logical_lan failed with rc=-4" So, instead of trying to keep the transmit queues alive during network configuration changes, just stop the queues, make necessary changes then restart the queues. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30  ibmveth: Ethtool set queue support  (Nick Child)
Implement channel management functions to allow dynamic addition and removal of transmit queues. The `ethtool --show-channels` and `ethtool --set-channels` commands can be used to get and set the number of queues, respectively. Allow the ability to add as many transmit queues as available processors but never allow more than the hard maximum of 16. The number of receive queues is one and cannot be modified. Depending on whether the requested number of queues is larger or smaller than the current value, either allocate or free long term buffers. Since long term buffer construction and destruction can occur in two different areas, from either channel set requests or device open/close, define functions for performing this work. If allocation of a new buffer fails, then attempt to revert back to the previous number of queues. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30  ibmveth: Implement multi queue on xmit  (Nick Child)
The `ndo_start_xmit` function is protected by a spinlock on the tx queue being used to transmit the skb. Allow concurrent calls to `ndo_start_xmit` by using more than one tx queue. This allows for greater throughput when several jobs are trying to transmit data. Introduce 16 tx queues (leave single rx queue as is) which each correspond to one DMA mapped long term buffer. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30  ibmveth: Copy tx skbs into a premapped buffer  (Nick Child)
Rather than DMA mapping and unmapping every outgoing skb, copy the skb into a buffer that was mapped during the driver's open function. Copying the skb and its frags has proven to be more time efficient than mapping and unmapping. As an effect, performance increases by 3-5 Gbits/s. Allocate and DMA map one contiguous 64KB buffer at `ndo_open`. This buffer is maintained until `ibmveth_close` is called. This buffer is large enough to hold the largest possible linear skb. During `ndo_start_xmit`, copy the skb and all of its frags into the contiguous buffer. By manually linearizing all the socket buffers, time is saved during memcpy and the frame can be handled more efficiently in FW. As a result, we no longer need to worry about the firmware limitation of handling a max of 6 frags. So, we only need to maintain 1 descriptor instead of 6 and can hardcode 0 for the other 5 descriptors during h_send_logical_lan. Since DMA allocation/mapping issues can no longer arise in xmit functions, we can further reduce code size by removing the need for a bounce buffer on DMA errors. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
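Conceptually the xmit path becomes a copy into an already-mapped area plus a single descriptor; a simplified sketch (field and variable names are illustrative, error handling omitted):

    /* tx_buf was allocated and DMA-mapped once in ndo_open(). */
    unsigned int len = skb->len;

    /* Linearize by copying the header and all frags into the flat buffer. */
    skb_copy_bits(skb, 0, tx_buf->addr, len);

    /* One descriptor now covers the whole frame; the other five stay zeroed. */
    desc.fields.flags_len = desc_flags | len;
    desc.fields.address = tx_buf->dma_addr;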
2022-09-28  net: drop the weight argument from netif_napi_add  (Jakub Kicinski)
We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
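After this change the common case needs no weight at all; a small sketch of both forms (the poll callback name is a placeholder):

    /* Common case: the default weight (NAPI_POLL_WEIGHT) is implied. */
    netif_napi_add(netdev, &priv->napi, example_poll);

    /* Drivers that really need a custom weight say so explicitly. */
    netif_napi_add_weight(netdev, &priv->napi, example_poll, 16);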
2022-09-21  net: ibm: emac: Switch to use dev_err_probe() helper  (Yang Yingliang)
dev_err() can be replaced with dev_err_probe(), which will check if the error code is -EPROBE_DEFER. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
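The helper folds the -EPROBE_DEFER special case into one call; a representative conversion (the message text is illustrative):

    /* Before */
    if (err) {
            dev_err(&pdev->dev, "can't get device resources\n");
            return err;
    }

    /* After: quiet (recorded as the deferral reason) for -EPROBE_DEFER,
     * printed as an error otherwise; returns err either way. */
    if (err)
            return dev_err_probe(&pdev->dev, err, "can't get device resources\n");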
2022-08-31  net: ethernet: move from strlcpy with unused retval to strscpy  (Wolfram Sang)
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2 Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5} Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-04  ibmvnic: Properly dispose of all skbs during a failover.  (Rick Lindsley)
During a reset, there may have been transmits in flight that are no longer valid and cannot be fulfilled. Resetting and clearing the queues is insufficient; each skb also needs to be explicitly freed so that upper levels are not left waiting for confirmation of a transmit that will never happen. If this happens frequently enough, the apparent backlog will cause TCP to begin "congestion control" unnecessarily, culminating in permanently decreased throughput. Fixes: d7c0ef36bde03 ("ibmvnic: Free and re-allocate scrqs when tx/rx scrqs change") Tested-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Rick Lindsley <ricklind@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-02  net: add skb_[inner_]tcp_all_headers helpers  (Eric Dumazet)
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)" to compute headers length for a TCP packet, but others use more convoluted (but equivalent) ways. Add skb_tcp_all_headers() and skb_inner_tcp_all_headers() helpers to harmonize this a bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
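The helper is simply a name for the common expression; roughly:

    /* Equivalent ways to compute the header bytes preceding the TCP payload. */
    unsigned int hdrs_old = skb_transport_offset(skb) + tcp_hdrlen(skb);
    unsigned int hdrs_new = skb_tcp_all_headers(skb);

    /* Typical use in a driver's TSO path: payload length = total - headers. */
    unsigned int payload = skb->len - skb_tcp_all_headers(skb);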
2022-05-08  eth: switch to netif_napi_add_weight()  (Jakub Kicinski)
Switch all Ethernet drivers which use custom napi weights to the new API. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-05  net: ethernet: Prepare cleanup of powerpc's asm/prom.h  (Christophe Leroy)
powerpc's asm/prom.h includes some headers that it doesn't need itself. In order to clean powerpc's asm/prom.h up in a further step, first clean all files that include asm/prom.h Some files don't need asm/prom.h at all. For those ones, just remove inclusion of asm/prom.h Some files don't need any of the items provided by asm/prom.h, but need some of the headers included by asm/prom.h. For those ones, add the needed headers that are brought by asm/prom.h at the moment and remove asm/prom.h Some files really need asm/prom.h but also need some of the headers included by asm/prom.h. For those one, leave asm/prom.h but also add the needed headers so that they can be removed from asm/prom.h in a later step. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/09a13d592d628de95d30943e59b2170af5b48110.1651663857.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
include/linux/netdevice.h
net/core/dev.c
  6510ea973d8d ("net: Use this_cpu_inc() to increment net->core_stats")
  794c24e9921f ("net-core: rx_otherhost_dropped to core_stats")
  https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/

drivers/net/wan/cosa.c
  d48fea8401cf ("net: cosa: fix error check return value of register_chrdev()")
  89fbca3307d4 ("net: wan: remove support for COSA and SRP synchronous serial boards")
  https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28  Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"  (Dany Madden)
This reverts commit 723ad916134784b317b72f3f6cf0f7ba774e5dae. When a client requests a channel or ring size larger than what the server can support, the server will cap the request to the supported max. So, the client would not be able to successfully request resources that exceed the server limit. Fixes: 723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits") Signed-off-by: Dany Madden <drt@linux.ibm.com> Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15  ibmvnic: Allow multiple ltbs in txpool ltb_set  (Sukadev Bhattiprolu)
Allow multiple LTBs in the txpool's ltb_set, i.e. rather than using a single large LTB, use several smaller LTBs. The first n-1 LTBs will all be of the same size. The size of the last LTB in the set depends on the number of buffers and the buffer (mtu) size. This strategy hopefully allows more reuse of the initial LTBs and also reduces the chances of an allocation failure (of the large LTB) when the system is low on memory. Suggested-by: Brian King <brking@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-15  ibmvnic: Allow multiple ltbs in rxpool ltb_set  (Sukadev Bhattiprolu)
Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all be of the same size. The size of the last LTB in the set depends on the number of buffers and the buffer (mtu) size. Having a set of LTBs per pool provides a couple of benefits. First, with the current value of IBMVNIC_MAX_LTB_SIZE of 16MB and an MTU of 9000, we need an LTB (DMA buffer) of that size, but the allocation can fail in low memory conditions. With a set of LTBs per pool, we can use several smaller (8MB) LTBs and hopefully have fewer allocation failures. (See also comments in ibmvnic.h on the trade-off with smaller LTBs.) Second, since the kernel limits the size of the DMA buffer to 16MB (based on MAX_ORDER), with a single DMA buffer per pool, the pool is also limited to 16MB. This in turn limits the number of buffers per pool to 1763 when MTU is 9000. With a set of LTBs per pool, we can have up to the max of 4096 buffers per pool even when MTU is 9000. Suggested-by: Brian King <brking@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
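A worked sketch of the sizing arithmetic described above; the constants mirror the values quoted in the message, while the helper and macro names are illustrative:

    #include <linux/kernel.h>

    /* Split one logically contiguous pool into several smaller LTBs. */
    #define EXAMPLE_MAX_LTB_SIZE    (16 << 20)  /* 16 MB, limited by MAX_ORDER */
    #define EXAMPLE_LTB_SIZE        (8 << 20)   /* preferred per-LTB allocation */

    static unsigned int example_num_ltbs(unsigned int num_bufs, unsigned int buf_size)
    {
            unsigned long long total = (unsigned long long)num_bufs * buf_size;

            /* e.g. with MTU 9000 and 4096 buffers the pool needs several 8 MB
             * LTBs, whereas a single LTB would have been capped at 16 MB
             * (roughly 1763 buffers at that MTU). */
            return DIV_ROUND_UP(total, EXAMPLE_LTB_SIZE);
    }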
2022-04-15  ibmvnic: convert rxpool ltb to a set of ltbs  (Sukadev Bhattiprolu)
Define and use interfaces that treat the long term buffer (LTB) of an rxpool as a set of LTBs rather than a single LTB. The set only has one LTB for now. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>