path: root/drivers/net
2022-10-01  net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQ  (Maxim Mikityanskiy)

XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on legacy RQ, adding a special case for XSK. The new branch essentially replaces the branch that was removed from the same place a few commits earlier. A check is made that DMA sync is not needed: the batching allocator falls back to returning one frame when DMA sync is needed, and that case is best handled by the loop in the standard path.

Performance improvement is up to 8% in the aligned mode and up to 9% in the unaligned mode.

Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps
Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps
Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps
Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps
Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps

CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

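As a rough illustration of the batching API, a minimal sketch of such an allocation loop follows; the mlx5e_post_xsk_wqe() helper and the alloc_units storage are hypothetical names for illustration, not the actual driver code:

  /* Sketch only: allocate up to wqe_bulk frames in one call. The pool
   * may return fewer than requested, so post only what was obtained.
   */
  struct xdp_buff **buffs = rq->wqe.alloc_units;   /* hypothetical storage */
  u32 done = xsk_buff_alloc_batch(rq->xsk_pool, buffs, wqe_bulk);
  u32 i;

  for (i = 0; i < done; i++)
          mlx5e_post_xsk_wqe(rq, buffs[i]);        /* hypothetical helper */

  /* A partial batch is fine; the remainder is retried on the next poll. */
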
2022-10-01  net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQ  (Maxim Mikityanskiy)

Allocation of XSK frames on legacy RQ can be made more efficient with a specialized routine that relies on certain assumptions: there is only one fragment, and allocation units (XSK frames) are not shared among multiple packets. This reduces the number of branches both in the XSK code and in the regular RQ, because with this approach there is only a single check whether it's an XSK or regular RQ.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Remove the outer loop when allocating legacy RQ WQEs  (Maxim Mikityanskiy)

Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As partial batches are allowed, there is no point in having a nested loop, so the outer loop is removed, and the batch size is increased up to the total number of WQEs to allocate, still no smaller than 8.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Use partial batches in legacy RQ with XSK  (Maxim Mikityanskiy)

The previous commit allowed allocating WQE batches in legacy RQ partially; however, XSK still checks whether there are enough frames in the fill ring. Remove this check so that batches can be allocated partially with XSK as well.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Use partial batches in legacy RQ  (Maxim Mikityanskiy)

Legacy RQ allocates WQEs in batches. If the batch allocation fails, the pages of the allocated part are released. This commit changes this behavior to allow using the pages that have already been allocated.

After this change, we need to be careful about indexing rq->wqe.frags[]. The WQ size is a power of two divisible by wqe_bulk (8), and the old code used whole bulks, which allowed using indices [8*K; 8*K+7] without overflowing. Now that bulks may be partial, the range can start at any location (not only at 8*K), so indices need to be wrapped around to avoid out-of-bounds array access.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

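A minimal sketch of the wrap-around described above; the helper itself is hypothetical, and only the power-of-two masking trick is the point:

  /* Sketch only: the WQ size is a power of two, so the modulo reduces
   * to a cheap mask. get_frag() is an illustrative helper name.
   */
  static struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix)
  {
          u32 wq_sz = mlx5_wq_cyc_get_size(&rq->wqe.wq);

          return &rq->wqe.frags[ix & (wq_sz - 1)];  /* wrap to stay in bounds */
  }
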
2022-10-01  net/mlx5e: Make the wqe_index_mask calculation more exact  (Maxim Mikityanskiy)

The old calculation of wqe_index_mask may give false positives, i.e. request bulking of pairs of WQEs when it's not strictly needed. For example, when the first fragment size is equal to PAGE_SIZE, bulking is not needed, even if the number of fragments is odd. Make the calculation more exact to cut out these false positives.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: Introduce wqe_index_mask for legacy RQ  (Maxim Mikityanskiy)

When fragments of different WQEs share the same page, mlx5e_post_rx_wqes must wait until the old WQE stops using the page; only then can the new WQE allocate the new page. Essentially, it means that if WQE index i is still in use, the allocation must stop before `i % bulk`, where bulk is the number of WQEs that may share the same page. As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the new wqe_index_mask field will be equal to `bulk - 1`.

At the same time, wqe_bulk remains for optimization purposes and stores `max(bulk, 8)`, which allows skipping the allocation until we have at least 8 WQEs free.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

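In code form, the relationship reads roughly as follows (a sketch based on the commit text, not the exact driver assignments):

  /* bulk is a power of two: i % bulk == i & (bulk - 1) */
  info->wqe_index_mask = bulk - 1;

  /* Keep wqe_bulk as an optimization: don't bother allocating
   * until at least 8 WQEs are free.
   */
  info->wqe_bulk = max_t(u16, bulk, 8);
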
2022-10-01  net/mlx5e: xsk: Drop the check for XSK state in mlx5e_xsk_wakeup  (Maxim Mikityanskiy)

The MLX5E_CHANNEL_STATE_XSK flag checked in mlx5e_xsk_wakeup indicates that XSK queues are open, but not necessarily activated. This check is not very useful, because:

0. Both XSK setup and netdev state transitions take the same state_lock mutex, so they can't happen at the same time.

1. If the netdev is up, xsk_is_bound can return true only when MLX5E_CHANNEL_STATE_XSK is set on the corresponding channel. mlx5e_xsk_wakeup is only called when xsk_is_bound is true.

2. If the XSK socket is bound, and the netdev is going up or down, mlx5e_xsk_wakeup can take one of two branches, depending on the return value of napi_if_scheduled_mark_missed:

2.1. True means one of two things: either NAPI was enabled at this point, which means MLX5E_CHANNEL_STATE_XSK was also set; or NAPI was disabled, and nothing really happened.

2.2. False means that NAPI was enabled by this point, which also implies MLX5E_CHANNEL_STATE_XSK was set.

Additionally, mlx5e_xsk_wakeup contains a subsequent check for MLX5E_SQ_STATE_ENABLED on async_icosq, and this flag implies MLX5E_CHANNEL_STATE_XSK too on XSK channels. As checking this flag doesn't cut any flows, remove the check.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-10-01  net/mlx5e: xsk: Use mlx5e_trigger_napi_icosq for XSK wakeup  (Maxim Mikityanskiy)

mlx5e_xsk_wakeup triggers an IRQ by posting a NOP to async_icosq, taking a spinlock to protect from concurrent access. There is already a function that does the same: mlx5e_trigger_napi_icosq. Use this function in mlx5e_xsk_wakeup.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear  (Daniel Golle)

Setting the ib1 state to MTK_FOE_STATE_UNBIND in the __mtk_foe_entry_clear routine, as done by commit 0e80707d94e4c8 ("net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear"), breaks flow offloading, at least on older MTK_NETSYS_V1 SoCs; OpenWrt users have confirmed the bug on MT7622 and MT7621 systems. Felix Fietkau suggested using MTK_FOE_STATE_INVALID instead, which works well on both MTK_NETSYS_V1 and MTK_NETSYS_V2.

Tested on MT7622 (Linksys E8450) and MT7986 (BananaPi BPI-R3).

Suggested-by: Felix Fietkau <nbd@nbd.name>
Fixes: 0e80707d94e4c8 ("net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear")
Fixes: 33fc42de33278b ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/YzY+1Yg0FBXcnrtc@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  nfp: add support restart of link auto-negotiation  (Fei Qin)

Add support for restarting link auto-negotiation. This may be initiated using:

  # ethtool -r <intf>

Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  nfp: add support for link auto negotiation  (Yinjun Zhang)

Report the auto-negotiation capability if it's supported in the management firmware, and advertise it if it's enabled. Changing the port speed is not allowed when autoneg is enabled.

The ethtool <intf> command displays the auto-neg capability:

  # ethtool enp1s0np0
  Settings for enp1s0np0:
          Supported ports: [ FIBRE ]
          Supported link modes:   Not reported
          Supported pause frame use: Symmetric
          Supports auto-negotiation: Yes
          Supported FEC modes: None RS BASER
          Advertised link modes:  Not reported
          Advertised pause frame use: Symmetric
          Advertised auto-negotiation: Yes
          Advertised FEC modes: None RS BASER
          Speed: 25000Mb/s
          Duplex: Full
          Auto-negotiation: on
          Port: FIBRE
          PHYAD: 0
          Transceiver: internal
          Link detected: yes

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  nfp: refine the ABI of getting `sp_indiff` info  (Yinjun Zhang)

Whether the application firmware is indifferent to port speed is a property of the firmware, not of the port. Use a new rtsym to get this property instead of parsing per-port TLV caps. With this change, the relevant code is moved to the `nfp_main` layer.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  nfp: avoid halt of driver init process when non-fatal error happens  (Yinjun Zhang)

It's not a fatal error when setting `hwinfo` into the management firmware fails, so there is no need to halt the whole driver initialization process.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  nfp: add support for reporting active FEC mode  (Yinjun Zhang)

The latest management firmware can now report the active FEC mode. Adapt the driver accordingly so that users can get the active FEC mode by running:

  # ethtool --show-fec <intf>

Also correct the use of the `fec` field.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  r8169: add rtl_disable_rxdvgate()  (Chunhao Lin)

rtl_disable_rxdvgate() is used to disable RXDV_GATE; it is the opposite of rtl_enable_rxdvgate(). Disabling RXDV_GATE does not require a delay, so this patch also removes the delay after disabling RXDV_GATE.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Link: https://lore.kernel.org/r/20220928171356.3951-1-hau@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  Merge tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next  (Jakub Kicinski)

Kalle Valo says:

====================
wireless-next patches for v6.1

Few stack changes and lots of driver changes in this round. As usual, brcmfmac has a lot of activity, and it gains new hardware support. ath11k improves WCN6750 support and adds other smaller features. And of course there are changes all over.

Note: in early September the wireless tree was merged into wireless-next to avoid some conflicts with mac80211 patches; this shouldn't cause any problems, but it's worth mentioning anyway.

Major changes:

mac80211
 - refactoring and preparation for the Wi-Fi 7 Multi-Link Operation (MLO) feature continues

brcmfmac
 - support CYW43439 SDIO chipset
 - support BCM4378 on Apple platforms
 - support CYW89459 PCIe chipset

rtw89
 - more work to get rtw8852c supported
 - P2P support
 - support for enabling and disabling MSDU aggregation via nl80211

mt76
 - tx status reporting improvements

ath11k
 - cold boot calibration support on WCN6750
 - Target Wake Time (TWT) debugfs support for STA interface
 - support to connect to a non-transmit MBSSID AP profile
 - enable remain-on-channel support on WCN6750
 - implement SRAM dump debugfs interface
 - enable threaded NAPI on all hardware
 - WoW support for WCN6750
 - support to provide transmit power from firmware via nl80211
 - support to get power save duration for each client
 - spectral scan support for 160 MHz

wcn36xx
 - add SNR from a received frame as a source of system entropy

* tag 'wireless-next-2022-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (231 commits)
  wifi: rtl8xxxu: Improve rtl8xxxu_queue_select
  wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
  wifi: rtl8xxxu: gen2: Enable 40 MHz channel width
  wifi: rtw89: 8852b: configure DLE mem
  wifi: rtw89: check DLE FIFO size with reserved size
  wifi: rtw89: mac: correct register of report IMR
  wifi: rtw89: pci: set power cut closed for 8852be
  wifi: rtw89: pci: add to do PCI auto calibration
  wifi: rtw89: 8852b: implement chip_ops::{enable,disable}_bb_rf
  wifi: rtw89: add DMA busy checking bits to chip info
  wifi: rtw89: mac: define DMA channel mask to avoid unsupported channels
  wifi: rtw89: pci: mask out unsupported TX channels
  iwlegacy: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
  ipw2x00: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
  wifi: iwlwifi: Track scan_cmd allocation size explicitly
  brcmfmac: Remove the call to "dtim_assoc" IOVAR
  brcmfmac: increase dcmd maximum buffer size
  brcmfmac: Support 89459 pcie
  brcmfmac: increase default max WOWL patterns to 16
  cw1200: fix incorrect check to determine if no element is found in list
  ...
====================

Link: https://lore.kernel.org/r/20220930150413.A7984C433D6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Clean up and fix error flows in mlx5e_alloc_rq  (Maxim Mikityanskiy)

Although mlx5e_rq_free_shampo can be called unconditionally, it belongs to the MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ case. Move it there to allow adding more init/cleanup actions to the striding RQ case.

If xdp_rxq_info_reg_mem_model fails, don't forget to destroy the page pool.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Move repeating clear_bit in mlx5e_rx_reporter_err_rq_cqe_recover  (Maxim Mikityanskiy)

The same clear_bit is called in both the error and success flows. Move the call so it's done only once, and remove the out label.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Split out channel (de)activation in rx_res  (Maxim Mikityanskiy)

To decrease the nesting level and reduce code duplication, create functions that redirect direct RQTs to the actual RQs or to drop_rq; these are used in the activation and deactivation flows of channels.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: xsk: Remove mlx5e_xsk_page_alloc_pool  (Maxim Mikityanskiy)

mlx5e_xsk_page_alloc_pool became a thin wrapper around xsk_buff_alloc. Drop it and call xsk_buff_alloc directly.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Convert struct mlx5e_alloc_unit to a union  (Maxim Mikityanskiy)

struct mlx5e_alloc_unit consists of a single union. Convert it to a union itself to simplify casting it to struct xdp_buff *, which will be used to implement XSK batching on striding RQ.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Remove DMA address from mlx5e_alloc_unit  (Maxim Mikityanskiy)

mlx5e_alloc_unit stores the DMA address and a pointer to either struct page (regular RQ) or struct xdp_buff (XSK RQ). This DMA address is redundant, because when a page or an XSK frame is allocated, the same address is also stored there. Some flows take the address from struct mlx5e_alloc_unit, and some take it from struct page or xdp_buff.

This commit removes the address from struct mlx5e_alloc_unit, halving its size and improving locality (this struct is used in an array), and also saving on unnecessary stores to the addr field. Almost all flows know unambiguously whether the DMA address should be taken from the page or from the xdp_buff. The exception is the allocation flows, where a new branch appears; it will be optimized out in the next commits.

struct mlx5e_alloc_unit used to be called mlx5e_dma_info.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Rename mlx5e_dma_info to prepare for removal of DMA address  (Maxim Mikityanskiy)

The next commit will remove the DMA address from the struct currently called mlx5e_dma_info, because the same value can be retrieved with page_pool_get_dma_addr(page) in almost all cases, with the notable exception of SHAMPO (the HW GRO implementation), which modifies this address on the fly, after the initial allocation.

To keep the SHAMPO logic intact, struct mlx5e_dma_info remains in the SHAMPO code, consisting of addr and page (XSK is not compatible with SHAMPO). The struct used in all other places is renamed to mlx5e_alloc_unit, allowing the next commit to remove the addr field without affecting SHAMPO. The new name means "allocation unit", and it's more appropriate once the field with the DMA address is removed.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Optimize the page cache reducing its size 2x  (Maxim Mikityanskiy)

The RX page cache stores dma_info structs, which consist of a pointer to struct page and a DMA address. In fact, the DMA address is extracted from struct page using page_pool_get_dma_addr when a page is pushed to the cache. By moving this call to the point when a page is popped from the cache, we can avoid storing the DMA address in the cache, effectively halving its size without losing any functionality.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

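A minimal sketch of the pop side under these assumptions; the cache layout and field names here are illustrative, not the driver's actual structures, while page_pool_get_dma_addr() is the real page_pool API:

  /* Sketch only: the cache stores bare struct page pointers; the DMA
   * address is recovered on pop instead of being cached alongside.
   */
  static bool cache_pop(struct mlx5e_page_cache *cache, struct page **pagep,
                        dma_addr_t *addr)
  {
          if (cache->head == cache->tail)
                  return false;                      /* cache is empty */

          *pagep = cache->pages[cache->head++ & (CACHE_SIZE - 1)];
          *addr = page_pool_get_dma_addr(*pagep);
          return true;
  }
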
2022-09-30  net/mlx5e: Fix calculations for ICOSQ size  (Maxim Mikityanskiy)

WQEs must not cross page boundaries; they are padded with NOPs if they don't fit the page. mlx5e_mpwrq_total_umr_wqebbs doesn't take this padding into account, risking not reserving enough space.

The padding is not straightforward to add to this calculation, because WQEs of different sizes may be mixed together in the queue. If each page ends with a big WQE that doesn't fit and requires at most its size minus 1 WQEBB of padding, the total space can be much bigger than in the case when smaller WQEs take advantage of this padding.

Replace the wrong exact calculation with the following estimation. Each padding can be at most the size of the maximum WQE used in the queue minus one WQEBB. Let's call the rest of the page "useful space". If we divide the total size of all needed WQEs by this useful space, rounding up, we get the number of pages that is enough to contain all these WQEs. This is correct because every WQE that lands on the boundary between two blocks of useful space would start in the useful space of one page and end in the padding of the same page, while our estimation reserves space for its tail in the next block, making the estimation no smaller than the real space occupied in the queue.

The code actually uses a looser estimation: instead of taking the maximum size of all used WQE types minus 1 WQEBB, it takes the maximum hardware size of a WQE. This is done for simplicity and extensibility.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

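In numbers, the estimation reads roughly as follows (a sketch with assumed input variables total_wqebbs and max_wqe_wqebbs, not the driver's actual function):

  u32 wqebbs_per_page = PAGE_SIZE / MLX5_SEND_WQE_BB;
  u32 useful_per_page = wqebbs_per_page - (max_wqe_wqebbs - 1);
  u32 pages = DIV_ROUND_UP(total_wqebbs, useful_per_page);
  u32 reserved_wqebbs = pages * wqebbs_per_page;  /* never underestimates */
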
2022-09-30  net/mlx5e: xsk: Use KSM for unaligned XSK  (Maxim Mikityanskiy)

UMR MTTs used in striding RQ have certain alignment requirements. While it's guaranteed to work when UMR pages are aligned to the UMR page size, in practice it works when UMR pages are aligned to 8 bytes. However, that is still not enough flexibility for the unaligned mode of XSK.

This patch leverages KSM to map UMR pages without alignment requirements when unaligned XSK is active. The downside is that KSM entries are twice as big as MTTs, which limits the maximum WQE size, so regular RQs and aligned XSK continue using MTTs.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5: Add MLX5_FLEXIBLE_INLEN to safely calculate cmd inlen  (Maxim Mikityanskiy)

Some commands use a flexible array after a common header. Add a macro to safely calculate the total input length of the command, detecting overflows and printing errors with specific values when such overflows happen.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

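As a rough idea of what such a guarded calculation can look like, here is a hedged sketch using the overflow helpers from <linux/overflow.h>; this is not the actual MLX5_FLEXIBLE_INLEN definition:

  /* Sketch only: inlen = header + count * entry, with overflow checks. */
  static int flexible_inlen(struct mlx5_core_dev *dev, size_t hdr,
                            size_t entry, size_t count, int *inlen)
  {
          size_t bytes;

          if (check_mul_overflow(entry, count, &bytes) ||
              check_add_overflow(hdr, bytes, &bytes) || bytes > INT_MAX) {
                  mlx5_core_err(dev, "inlen overflow: hdr %zu entry %zu count %zu\n",
                                hdr, entry, count);
                  return -ENOMEM;
          }

          *inlen = bytes;
          return 0;
  }
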
2022-09-30  net/mlx5e: Keep a separate MKey for striding RQ  (Maxim Mikityanskiy)

Currently, rq->mkey_be keeps a big-endian value of either the PA MKey (for legacy RQ, no address translation) or MTT MKey (for striding RQ, direct address translation). Striding RQ stores the same value in rq->umr_mkey in the native endianness.

The next commit will make striding RQ use KSM MKey (indirect address translation) for the unaligned mode of XSK, which will require storing both KSM MKey and PA MKey in the RQ struct. This commit optimizes fields of mlx5e_rq: umr_mkey is removed (it's redundant), mkey_be always points to the PA MKey, and mpwqe.umr_mkey_be points to the MTT MKey (or to the KSM MKey, starting from the next commit).

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: xsk: Use XSK frame size as striding RQ page size  (Maxim Mikityanskiy)

XSK RQs support striding RQ linear mode, but the stride size is always set to PAGE_SIZE. It may be larger than the XSK frame size, unnecessarily reducing the useful space in a WQE, but more importantly causing UMEM data corruption in certain cases.

Normally, a stride size bigger than the XSK frame size is not a problem if the hardware enforces the MTU. However, traffic between vports skips the hardware MTU check, and oversized packets may be received. If an oversized packet is bigger than the XSK frame but not bigger than the stride, it will overwrite the adjacent UMEM region. If the packet takes more than one stride, those strides can be recycled for reuse, so this is not a problem when the XSK frame size matches the stride size.

To reduce the impact of the above issue, attempt to use an MTT page size for striding RQ that matches the XSK frame size, allowing 2048-byte frames to be used safely on up-to-date firmware.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net/mlx5e: Use runtime page_shift for striding RQ  (Maxim Mikityanskiy)

This commit allows striding RQ to determine the MTT page size at runtime, instead of sticking to the compile-time PAGE_SIZE. This functionality will be used by a following commit that adjusts the MTT page size to the XSK frame size.

Stick with PAGE_SIZE for XSK on legacy RQ: frag_stride is not used in the data path there, it only helps calculate how pages are partitioned into fragments, and PAGE_SIZE ensures each fragment starts at the beginning of a new allocation unit (XSK frame).

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-30  net: stmmac: add a parse for new property 'snps,clk-csr'  (Jianguo Zhang)

Parse the new property 'snps,clk-csr' first, because the new property is the one documented in the binding file; if that fails, fall back to the old property 'clk_csr' for the legacy case.

Signed-off-by: Jianguo Zhang <jianguo.zhang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

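The fallback pattern described above can be sketched as follows (illustrative; the actual stmmac code may differ in details, but of_property_read_u32() is the standard OF helper and returns 0 on success):

  u32 clk_csr;

  /* Prefer the documented property, fall back to the legacy one. */
  if (of_property_read_u32(np, "snps,clk-csr", &clk_csr))
          of_property_read_u32(np, "clk_csr", &clk_csr);
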
2022-09-30  net/mlx5: Fix spelling mistake "syndrom" -> "syndrome"  (Colin Ian King)

There is a spelling mistake in a devlink_health_report message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  net: bna: Fix spelling mistake "muliple" -> "multiple"  (Colin Ian King)

There is a spelling mistake in a literal string in the array bnad_net_stats_strings. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  ibmveth: Ethtool set queue support  (Nick Child)

Implement channel management functions to allow dynamic addition and removal of transmit queues. The `ethtool --show-channels` and `ethtool --set-channels` commands can be used to get and set the number of queues, respectively. Allow adding as many transmit queues as there are available processors, but never more than the hard maximum of 16. The number of receive queues is one and cannot be modified.

Depending on whether the requested number of queues is larger or smaller than the current value, either allocate or free long term buffers. Since long term buffer construction and destruction can occur in two different areas, from either channel set requests or device open/close, define functions for performing this work. If allocation of a new buffer fails, attempt to revert back to the previous number of queues.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  ibmveth: Implement multi queue on xmit  (Nick Child)

The `ndo_start_xmit` function is protected by a spinlock on the tx queue being used to transmit the skb. Allow concurrent calls to `ndo_start_xmit` by using more than one tx queue. This allows for greater throughput when several jobs are trying to transmit data.

Introduce 16 tx queues (leaving the single rx queue as is), each corresponding to one DMA-mapped long term buffer.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  ibmveth: Copy tx skbs into a premapped buffer  (Nick Child)

Rather than DMA mapping and unmapping every outgoing skb, copy the skb into a buffer that was mapped during the driver's open function. Copying the skb and its frags has proven to be more time efficient than mapping and unmapping. As an effect, performance increases by 3-5 Gbits/s.

Allocate and DMA map one continuous 64KB buffer at `ndo_open`. This buffer is maintained until `ibmveth_close` is called. It is large enough to hold the largest possible linear skb. During `ndo_start_xmit`, copy the skb and all of its frags into the continuous buffer. By manually linearizing all the socket buffers, time is saved during memcpy as well as through more efficient handling in FW.

As a result, we no longer need to worry about the firmware limitation of handling a max of 6 frags, so we only need to maintain 1 descriptor instead of 6 and can hardcode 0 for the other 5 descriptors during h_send_logical_lan. Since DMA allocation/mapping issues can no longer arise in the xmit functions, we can further reduce code size by removing the need for a bounce buffer on DMA errors.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

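The copy step itself can be as simple as the sketch below; the long-term-buffer pointer is an assumed name, while skb_copy_bits() is the real kernel helper that walks the linear area and all frags:

  /* Sketch only: manually linearize the skb into the premapped buffer. */
  static void copy_skb_to_ltb(struct sk_buff *skb, void *ltb_buf)
  {
          skb_copy_bits(skb, 0, ltb_buf, skb->len);
  }
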
2022-09-30  bnx2: Fix spelling mistake "bufferred" -> "buffered"  (Colin Ian King)

There are spelling mistakes in two literal strings. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  net: lan966x: Fix spelling mistake "tarffic" -> "traffic"  (Colin Ian King)

There is a spelling mistake in a netdev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  net: bonding: Convert to use sysfs_emit()/sysfs_emit_at() APIs  (Wang Yufen)

Follow the advice of Documentation/filesystems/sysfs.rst: show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

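The conversion pattern looks roughly like this (a generic sketch, not a specific bonding attribute; sysfs_emit() is the real kernel API and bounds the write to PAGE_SIZE):

  static ssize_t example_show(struct device *dev,
                              struct device_attribute *attr, char *buf)
  {
          /* was: return sprintf(buf, "%d\n", value); */
          return sysfs_emit(buf, "%d\n", value);
  }
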
2022-09-30  net: tun: Convert to use sysfs_emit() APIs  (Wang Yufen)

Follow the advice of Documentation/filesystems/sysfs.rst: show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  tsnep: Use page pool for RX  (Gerhard Engleder)

Use the page pool for RX buffer handling. This makes the RX path more efficient and is required prework for future XDP support.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

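For orientation, a typical page-pool setup for an RX ring looks like the sketch below; the parameter values and the ring-size constant are assumptions, not tsnep's actual configuration:

  struct page_pool_params pp_params = {
          .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
          .order = 0,
          .pool_size = RX_RING_SIZE,        /* assumed ring-size constant */
          .nid = NUMA_NO_NODE,
          .dev = dmadev,
          .dma_dir = DMA_FROM_DEVICE,
          .max_len = PAGE_SIZE,
          .offset = 0,
  };

  rx->page_pool = page_pool_create(&pp_params);
  if (IS_ERR(rx->page_pool))
          return PTR_ERR(rx->page_pool);
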
2022-09-30  tsnep: Add EtherType RX flow classification support  (Gerhard Engleder)

Received Ethernet frames are assigned to the first RX queue by default. Based on the EtherType, Ethernet frames can be assigned to other RX queues. This enables processing of real-time Ethernet protocols on dedicated RX queues.

Add an RX flow classification interface for EtherType-based RX queue assignment.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  tsnep: Support multiple TX/RX queue pairs  (Gerhard Engleder)

Support additional TX/RX queue pairs if a dedicated interrupt is available. Interrupts are detected by name in the device tree.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  tsnep: Move interrupt from device to queue  (Gerhard Engleder)

For multiple queues, multiple interrupts shall be used. Therefore, rework the global interrupt into per-queue interrupts. Every interrupt name shall contain the interface name and queue information. To get a valid interface name, the interrupt request needs to be done during open, as in other drivers. Additionally, this allows the removal of some initialisation checks in the interrupt handler.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022-09-30  wifi: ath11k: fix warning in dma_free_coherent() of memory chunks while recovery  (Wen Gong)

Commit 26f3a021b37c ("ath11k: allocate smaller chunks of memory for firmware") and commit f6f92968e1e5 ("ath11k: qmi: try to allocate a big block of DMA memory first") changed ath11k to allocate the memory chunks for the target twice during WLAN load. The first attempt fails because of the large memory request, and the second attempt sometimes switches to allocating many small chunks, as in the logs below.

1st time failed:

[10411.640620] ath11k_pci 0000:05:00.0: qmi firmware request memory request
[10411.640625] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 6881280
[10411.640630] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 3784704
[10411.640658] ath11k_pci 0000:05:00.0: qmi dma allocation failed (6881280 B type 1), will try later with small size
[10411.640671] ath11k_pci 0000:05:00.0: qmi delays mem_request 2
[10411.640677] ath11k_pci 0000:05:00.0: qmi respond memory request delayed 1

2nd time success:

[10411.642004] ath11k_pci 0000:05:00.0: qmi firmware request memory request
[10411.642008] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642012] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642014] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642016] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642018] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642020] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642022] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642024] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642027] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642029] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288
[10411.642031] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 458752
[10411.642033] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 131072
[10411.642035] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642037] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642039] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642041] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642043] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642045] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 524288
[10411.642047] ath11k_pci 0000:05:00.0: qmi mem seg type 4 size 491520
[10411.642049] ath11k_pci 0000:05:00.0: qmi mem seg type 1 size 524288

Commit 5962f370ce41 ("ath11k: Reuse the available memory after firmware reload") then made the driver skip ath11k_qmi_free_resource(), which frees the memory chunks, during recovery. After that, when running a recovery test on WCN6855, the warning below appeared every time and recovery eventually failed.

[ 159.570318] BUG: Bad page state in process kworker/u16:5 pfn:33300
[ 159.570320] page:0000000096ffdbb9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x33300
[ 159.570324] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[ 159.570329] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
[ 159.570332] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 159.570334] page dumped because: nonzero _refcount
[ 159.570440] firewire_ohci syscopyarea sysfillrect psmouse sdhci_pci ahci sysimgblt firewire_core fb_sys_fops libahci crc_itu_t cqhci drm sdhci e1000e wmi video
[ 159.570460] CPU: 2 PID: 217 Comm: kworker/u16:5 Kdump: loaded Tainted: G B 5.19.0-rc1-wt-ath+ #3
[ 159.570465] Hardware name: LENOVO 418065C/418065C, BIOS 83ET63WW (1.33 ) 07/29/2011
[ 159.570467] Workqueue: qmi_msg_handler qmi_data_ready_work [qmi_helpers]
[ 159.570475] Call Trace:
[ 159.570476]  <TASK>
[ 159.570478]  dump_stack_lvl+0x49/0x5f
[ 159.570486]  dump_stack+0x10/0x12
[ 159.570493]  bad_page+0xab/0xf0
[ 159.570502]  check_free_page_bad+0x66/0x70
[ 159.570511]  __free_pages_ok+0x530/0x9a0
[ 159.570517]  ? __dev_printk+0x58/0x6b
[ 159.570525]  ? _dev_printk+0x56/0x72
[ 159.570534]  ? qmi_decode+0x119/0x470 [qmi_helpers]
[ 159.570543]  __free_pages+0x91/0xd0
[ 159.570548]  dma_free_contiguous+0x50/0x60
[ 159.570556]  dma_direct_free+0xe5/0x140
[ 159.570564]  dma_free_attrs+0x35/0x50
[ 159.570570]  ath11k_qmi_msg_mem_request_cb+0x2ae/0x3c0 [ath11k]
[ 159.570620]  qmi_invoke_handler+0xac/0xe0 [qmi_helpers]
[ 159.570630]  qmi_handle_message+0x6d/0x180 [qmi_helpers]
[ 159.570643]  qmi_data_ready_work+0x2ca/0x440 [qmi_helpers]
[ 159.570656]  process_one_work+0x227/0x440
[ 159.570667]  worker_thread+0x31/0x3d0
[ 159.570676]  ? process_one_work+0x440/0x440
[ 159.570685]  kthread+0xfe/0x130
[ 159.570692]  ? kthread_complete_and_exit+0x20/0x20
[ 159.570701]  ret_from_fork+0x22/0x30
[ 159.570712]  </TASK>

The reason is that when WLAN starts recovery, the type, size and count are not the same between the 1st and 2nd QMI_WLFW_REQUEST_MEM_IND messages, which leads to an incorrect size parameter for dma_free_coherent(). For chunk[1], the actual DMA size is 524288, allocated in the 2nd attempt of the initial WLAN load phase, while the size passed to dma_free_coherent() is 3784704, obtained in the 1st attempt of the recovery phase; hence the warning above.

Change to use prev_size of struct target_mem_chunk as the parameter for dma_free_coherent(), since prev_size is the real size of the last load/recovery. Also check both the type and size of struct target_mem_chunk before reusing the memory, to avoid a mismatched buffer size for the target. With this, the warning disappears and recovery succeeds.

When the 1st QMI_WLFW_REQUEST_MEM_IND for recovery arrives, chunk[0] is freed in ath11k_qmi_alloc_target_mem_chunk(), then dma_alloc_coherent() fails because of the large size, and chunk[1] is freed in ath11k_qmi_free_target_mem_chunk(); the remaining 18 chunks are reused for the 2nd QMI_WLFW_REQUEST_MEM_IND message.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3

Fixes: 5962f370ce41 ("ath11k: Reuse the available memory after firmware reload")
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220928073832.16251-1-quic_wgong@quicinc.com

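A simplified sketch of the fix described above; the chunk field names follow the commit text, but the surrounding control flow is illustrative, not the exact driver code:

  /* Sketch only: free with the size the chunk was actually allocated
   * with, and only reuse a chunk when both type and size still match.
   */
  if (chunk->vaddr) {
          if (chunk->prev_type == chunk->type &&
              chunk->prev_size == chunk->size)
                  continue;               /* sizes match: reuse as-is */

          /* type/size changed: free with the *previous* size */
          dma_free_coherent(ab->dev, chunk->prev_size,
                            chunk->vaddr, chunk->paddr);
          chunk->vaddr = NULL;
  }
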
2022-09-30  wifi: ath11k: Don't exit on wakeup failure  (Baochen Qiang)

Currently, ath11k_pcic_read() returns an error if wakeup() fails, which makes firmware crash debugging quite hard because we get nothing at all. Change it to proceed on wakeup failure; in that case we may still get something valid to check. There is no risk of being misled by incorrect content, because we are aware of the failure from the log printed.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220928015140.5431-1-quic_bqiang@quicinc.com

2022-09-30  carl9170: Replace zero-length array with DECLARE_FLEX_ARRAY() helper  (Gustavo A. R. Silva)

Zero-length arrays are deprecated, and we are moving towards adopting C99 flexible-array members instead. So, replace the zero-length array declarations in an anonymous union with the new DECLARE_FLEX_ARRAY() helper macro. This helper allows for flexible-array members in unions.

Link: https://github.com/KSPP/linux/issues/193
Link: https://github.com/KSPP/linux/issues/215
Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/YzIdWc8QSdZFHBYg@work

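The pattern looks like this generic sketch (an illustrative struct, not carl9170's actual frame layout; DECLARE_FLEX_ARRAY() is the real helper from <linux/stddef.h>):

  struct example_frame {
          __le16 len;
          union {
                  struct example_hdr hdr;          /* assumed header type */
                  DECLARE_FLEX_ARRAY(u8, payload); /* was: u8 payload[0]; */
          };
  };
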
2022-09-29  Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue  (Jakub Kicinski)

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-09-28 (ice)

Arkadiusz implements a single pin initialization function, checking feature bits, instead of having separate device functions, and updates sub-device IDs for recognizing E810T devices.

Martyna adds support for switchdev filters on the VLAN priority field.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add support for VLAN priority filters in switchdev
  ice: support features on new E810T variants
  ice: Merge pin initialization of E810 and E810T adapters
====================

Link: https://lore.kernel.org/r/20220928203217.411078-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-09-29  eth: alx: take rtnl_lock on resume  (Jakub Kicinski)

Zbynek reports that alx trips an rtnl assertion on resume:

  RTNL: assertion failed at net/core/dev.c (2891)
  RIP: 0010:netif_set_real_num_tx_queues+0x1ac/0x1c0
  Call Trace:
   <TASK>
   __alx_open+0x230/0x570 [alx]
   alx_resume+0x54/0x80 [alx]
   ? pci_legacy_resume+0x80/0x80
   dpm_run_callback+0x4a/0x150
   device_resume+0x8b/0x190
   async_resume+0x19/0x30
   async_run_entry_fn+0x30/0x130
   process_one_work+0x1e5/0x3b0

Indeed, the driver does not hold rtnl_lock during its internal close and re-open functions during suspend/resume. Note that this is not a huge bug, as the driver implements its own locking and does not implement changing the number of queues, but we need to silence the splat.

Fixes: 4a5fe57e7751 ("alx: use fine-grained locking instead of RTNL")
Reported-and-tested-by: Zbynek Michl <zbynek.michl@gmail.com>
Reviewed-by: Niels Dossche <dossche.niels@gmail.com>
Link: https://lore.kernel.org/r/20220928181236.1053043-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

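The shape of the fix is straightforward; a simplified sketch follows (the real resume handler does more than this, and the __alx_open() signature is taken from the trace above):

  static int alx_resume(struct device *dev)
  {
          struct alx_priv *alx = dev_get_drvdata(dev);
          int err;

          rtnl_lock();
          /* reopen under RTNL so netif_set_real_num_tx_queues() is happy */
          err = __alx_open(alx, true);
          rtnl_unlock();

          return err;
  }
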