linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-01-02	bnxt_en: Add bnxt_lookup_ntp_filter_from_idx() function	Michael Chan
	Add the helper function to look up the ntuple filter from the hash index and use it in bnxt_rx_flow_steer(). The helper function will also be used by user defined ntuple filters in the next patches. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02	bnxt_en: Add function to calculate Toeplitz hash	Pavan Chebbi
	For ntuple filters added by aRFS, the Toeplitz hash calculated by our NIC is available and is used to store the ntuple filter for quick retrieval. In the next patches, user defined ntuple filter support will be added and we need to calculate the same hash for these filters. The same hash function needs to be used so we can detect duplicates. Add the function bnxt_toeplitz() to calculate the Toeplitz hash for user defined ntuple filters. bnxt_toeplitz() uses the same Toeplitz key and the same key length as the NIC. bnxt_get_ntp_filter_idx() is added to return the hash index. For aRFS, the hash comes from the NIC. For user defined ntuple, we call bnxt_toeplitz() to calculate the hash index. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02	bnxt_en: Refactor L2 filter alloc/free firmware commands.	Michael Chan
	Refactor the L2 filter alloc/free logic so that these filters can be added/deleted by the user. The bp->ntp_fltr_bmap allocated size is also increased to allow enough IDs for L2 filters. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02	bnxt_en: Re-structure the bnxt_ntuple_filter structure.	Michael Chan
	With the new bnxt_l2_filter structure, we can now re-structure the bnxt_ntuple_filter structure to point to the bnxt_l2_filter structure. We eliminate the L2 ether address info from the ntuple filter structure as we can get the information from the L2 filter structure. Note that the source L2 MAC address is no longer used. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02	bnxt_en: Add bnxt_l2_filter hash table.	Michael Chan
	The current driver only has an array of 4 additional L2 unicast addresses to support the netdev uc address list. Generalize and expand this infrastructure with an L2 address hash table so we can support an expanded list of unicast addresses (for bridges, macvlans, OVS, etc). The L2 hash table infrastructure will also allow more generalized n-tuple filter support. This patch creates the bnxt_l2_filter structure and the hash table. This L2 filter structure has the same bnxt_filter_base structure as used in the bnxt_ntuple_filter structure. All currently supported L2 filters will now have an entry in this new table. Note that L2 filters may be created for the VF. VF filters should not be freed when the PF goes down. Add some logic in bnxt_free_l2_filters() to allow keeping the VF filters or to free everything during rmmod. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02	bnxt_en: Refactor bnxt_ntuple_filter structure.	Michael Chan
	This is in preparation to support user defined L2 (ether) filters, which will have many similarities with ntuple filters. Refactor bnxt_ntuple_filter structure to have a bnxt_filter_base structure that can be re-used by the L2 filters. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01	r8169: Fix PCI error on system resume	Kai-Heng Feng
	Some r8168 NICs stop working upon system resume: [ 688.051096] r8169 0000:02:00.1 enp2s0f1: rtl_ep_ocp_read_cond == 0 (loop: 10, delay: 10000). [ 688.175131] r8169 0000:02:00.1 enp2s0f1: Link is Down ... [ 691.534611] r8169 0000:02:00.1 enp2s0f1: PCI error (cmd = 0x0407, status_errs = 0x0000) Not sure if it's related, but those NICs have a BMC device at function 0: 02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Realtek RealManage BMC [10ec:816e] (rev 1a) Trial and error shows that increase the loop wait on rtl_ep_ocp_read_cond to 30 can eliminate the issue, so let rtl8168ep_driver_start() to wait a bit longer. Fixes: e6d6ca6e1204 ("r8169: Add support for another RTL8168FP") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	mlxbf_gige: fix receive packet race condition	David Thompson
	Under heavy traffic, the BlueField Gigabit interface can become unresponsive. This is due to a possible race condition in the mlxbf_gige_rx_packet function, where the function exits with producer and consumer indices equal but there are remaining packet(s) to be processed. In order to prevent this situation, read receive consumer index before the HW replenish so that the mlxbf_gige_rx_packet function returns an accurate return value even if a packet is received into just-replenished buffer prior to exiting this routine. If the just-replenished buffer is received and occupies the last RX ring entry, the interface would not recover and instead would encounter RX packet drops related to internal buffer shortages since the driver RX logic is not being triggered to drain the RX ring. This patch will address and prevent this "ring full" condition. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	Merge tag 'mlx5-updates-2023-12-20' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-12-20 mlx5 Socket direct support and management PF profile. Tariq Says: =========== Support Socket-Direct multi-dev netdev This series adds support for combining multiple devices (PFs) of the same port under one netdev instance. Passing traffic through different devices belonging to different NUMA sockets saves cross-numa traffic and allows apps running on the same netdev from different numas to still feel a sense of proximity to the device and achieve improved performance. We achieve this by grouping PFs together, and creating the netdev only once all group members are probed. Symmetrically, we destroy the netdev once any of the PFs is removed. The channels are distributed between all devices, a proper configuration would utilize the correct close numa when working on a certain app/cpu. We pick one device to be a primary (leader), and it fills a special role. The other devices (secondaries) are disconnected from the network in the chip level (set to silent mode). All RX/TX traffic is steered through the primary to/from the secondaries. Currently, we limit the support to PFs only, and up to two devices (sockets). =========== Armen Says: =========== Management PF support and module integration This patch rolls out comprehensive support for the Management Physical Function (MGMT PF) within the mlx5 driver. It involves updating the mlx5 interface header to introduce necessary definitions for MGMT PF and adding a new management PF netdev profile, which will allow the host side to communicate with the embedded linux on Blue-field devices. =========== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-27	igc: Check VLAN EtherType mask	Kurt Kanzenbach
	Currently the driver accepts VLAN EtherType steering rules regardless of the configured mask. And things might fail silently or with confusing error messages to the user. The VLAN EtherType can only be matched by full mask. Therefore, add a check for that. For instance the following rule is invalid, but the driver accepts it and ignores the user specified mask: \|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \ \| m 0x00ff action 0 \|Added rule with ID 63 \|root@host:~# ethtool --show-ntuple enp3s0 \|4 RX rings available \|Total 1 rules \| \|Filter: 63 \| Flow Type: Raw Ethernet \| Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Ethertype: 0x0 mask: 0xFFFF \| VLAN EtherType: 0x8100 mask: 0x0 \| VLAN: 0x0 mask: 0xffff \| User-defined: 0x0 mask: 0xffffffffffffffff \| Action: Direct to queue 0 After: \|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \ \| m 0x00ff action 0 \|rmgr: Cannot insert RX class rule: Operation not supported Fixes: 2b477d057e33 ("igc: Integrate flex filter into ethtool ops") Suggested-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27	igc: Check VLAN TCI mask	Kurt Kanzenbach
	Currently the driver accepts VLAN TCI steering rules regardless of the configured mask. And things might fail silently or with confusing error messages to the user. There are two ways to handle the VLAN TCI mask: 1. Match on the PCP field using a VLAN prio filter 2. Match on complete TCI field using a flex filter Therefore, add checks and code for that. For instance the following rule is invalid and will be converted into a VLAN prio rule which is not correct: \|root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \ \| action 1 \|Added rule with ID 61 \|root@host:~# ethtool --show-ntuple enp3s0 \|4 RX rings available \|Total 1 rules \| \|Filter: 61 \| Flow Type: Raw Ethernet \| Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Ethertype: 0x0 mask: 0xFFFF \| VLAN EtherType: 0x0 mask: 0xffff \| VLAN: 0x1 mask: 0x1fff \| User-defined: 0x0 mask: 0xffffffffffffffff \| Action: Direct to queue 1 After: \|root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \ \| action 1 \|rmgr: Cannot insert RX class rule: Operation not supported Fixes: 7991487ecb2d ("igc: Allow for Flex Filters to be installed") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27	igc: Report VLAN EtherType matching back to user	Kurt Kanzenbach
	Currently the driver allows to configure matching by VLAN EtherType. However, the retrieval function does not report it back to the user. Add it. Before: \|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 action 0 \|Added rule with ID 63 \|root@host:~# ethtool --show-ntuple enp3s0 \|4 RX rings available \|Total 1 rules \| \|Filter: 63 \| Flow Type: Raw Ethernet \| Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Ethertype: 0x0 mask: 0xFFFF \| Action: Direct to queue 0 After: \|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 action 0 \|Added rule with ID 63 \|root@host:~# ethtool --show-ntuple enp3s0 \|4 RX rings available \|Total 1 rules \| \|Filter: 63 \| Flow Type: Raw Ethernet \| Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF \| Ethertype: 0x0 mask: 0xFFFF \| VLAN EtherType: 0x8100 mask: 0x0 \| VLAN: 0x0 mask: 0xffff \| User-defined: 0x0 mask: 0xffffffffffffffff \| Action: Direct to queue 0 Fixes: 2b477d057e33 ("igc: Integrate flex filter into ethtool ops") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27	i40e: Fix filter input checks to prevent config with invalid values	Sudheer Mogilappagari
	Prevent VF from configuring filters with unsupported actions or use REDIRECT action with invalid tc number. Current checks could cause out of bounds access on PF side. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Reviewed-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27	ice: dpll: fix phase offset value	Arkadiusz Kubalewski
	Stop dividing the phase_offset value received from firmware. This fault is present since the initial implementation. The phase_offset value received from firmware is in 0.01ps resolution. Dpll subsystem is using the value in 0.001ps, raw value is adjusted before providing it to the user. The user can observe the value of phase offset with response to `pin-get` netlink message of dpll subsystem for an active pin: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \ --do pin-get --json '{"id":2}' Where example of correct response would be: {'board-label': 'C827_0-RCLKA', 'capabilities': 6, 'clock-id': 4658613174691613800, 'frequency': 1953125, 'id': 2, 'module-name': 'ice', 'parent-device': [{'direction': 'input', 'parent-id': 6, 'phase-offset': -216839550, 'prio': 9, 'state': 'connected'}, {'direction': 'input', 'parent-id': 7, 'phase-offset': -42930, 'prio': 8, 'state': 'connected'}], 'phase-adjust': 0, 'phase-adjust-max': 16723, 'phase-adjust-min': -16723, 'type': 'mux'} Provided phase-offset value (-42930) shall be divided by the user with DPLL_PHASE_OFFSET_DIVIDER to get actual value of -42.930 ps. Before the fix, the response was not correct: {'board-label': 'C827_0-RCLKA', 'capabilities': 6, 'clock-id': 4658613174691613800, 'frequency': 1953125, 'id': 2, 'module-name': 'ice', 'parent-device': [{'direction': 'input', 'parent-id': 6, 'phase-offset': -216839, 'prio': 9, 'state': 'connected'}, {'direction': 'input', 'parent-id': 7, 'phase-offset': -42, 'prio': 8, 'state': 'connected'}], 'phase-adjust': 0, 'phase-adjust-max': 16723, 'phase-adjust-min': -16723, 'type': 'mux'} Where phase-offset value (-42), after division (DPLL_PHASE_OFFSET_DIVIDER) would be: -0.042 ps. Fixes: 8a3a565ff210 ("ice: add admin commands to access cgu configuration") Fixes: 90e1c90750d7 ("ice: dpll: implement phase related callbacks") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27	ice: Shut down VSI with "link-down-on-close" enabled	Ngai-Mint Kwan
	Disabling netdev with ethtool private flag "link-down-on-close" enabled can cause NULL pointer dereference bug. Shut down VSI regardless of "link-down-on-close" state. Fixes: 8ac7132704f3 ("ice: Fix interface being down after reset with link-down-on-close flag on") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27	ice: Fix link_down_on_close message	Katarzyna Wieczerzycka
	The driver should not report an error message when for a medialess port the link_down_on_close flag is enabled and the physical link cannot be set down. Fixes: 8ac7132704f3 ("ice: Fix interface being down after reset with link-down-on-close flag on") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Katarzyna Wieczerzycka <katarzyna.wieczerzycka@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26	octeontx2-af: Fix marking couple of structure as __packed	Suman Ghosh
	Couple of structures was not marked as __packed. This patch fixes the same and mark them as __packed. Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26	idpf: avoid compiler introduced padding in virtchnl2_rss_key struct	Pavan Kumar Linga
	Size of the virtchnl2_rss_key struct should be 7 bytes but the compiler introduces a padding byte for the structure alignment. This results in idpf sending an additional byte of memory to the device control plane than the expected buffer size. As the control plane enforces virtchnl message size checks to validate the message, set RSS key message fails resulting in the driver load failure. Remove implicit compiler padding by using "__packed" structure attribute for the virtchnl2_rss_key struct. Also there is no need to use __DECLARE_FLEX_ARRAY macro for the 'key_flex' struct field. So drop it. Fixes: 0d7502a9b4a7 ("virtchnl: add virtchnl version 2 ops") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Scott Register <scott.register@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26	idpf: fix corrupted frames and skb leaks in singleq mode	Alexander Lobakin
	idpf_ring::skb serves only for keeping an incomplete frame between several NAPI Rx polling cycles, as one cycle may end up before processing the end of packet descriptor. The pointer is taken from the ring onto the stack before entering the loop and gets written there after the loop exits. When inside the loop, only the onstack pointer is used. For some reason, the logics is broken in the singleq mode, where the pointer is taken from the ring each iteration. This means that if a frame got fragmented into several descriptors, each fragment will have its own skb, but only the last one will be passed up the stack (containing garbage), leaving the rest leaked. Then, on ifdown, rxq::skb is being freed only in the splitq mode, while it can point to a valid skb in singleq as well. This can lead to a yet another skb leak. Just don't touch the ring skb field inside the polling loop, letting the onstack skb pointer work as expected: build a new skb if it's the first frame descriptor and attach a frag otherwise. On ifdown, free rxq::skb unconditionally if the pointer is non-NULL. Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Scott Register <scott.register@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-23	dpaa2-switch: cleanup the egress flood of an unused FDB	Ioana Ciornei
	In case a DPAA2 switch interface joins a bridge, the FDB used on the port will be changed to the one associated with the bridge. What this means exactly is that any VLAN installed on the port will need to be removed and then installed back so that it points to the new FDB. Once this is done, the previous FDB will become unused (no VLAN to point to it). Even though no traffic will reach this FDB, it's best to just cleanup the state of the FDB by zeroing its egress flood domain. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	dpaa2-switch: move a check to the prechangeupper stage	Ioana Ciornei
	Two different DPAA2 switch ports from two different DPSW instances cannot be under the same bridge. Instead of checking for this unsupported configuration in the CHANGEUPPER event, check it as early as possible in the PRECHANGEUPPER one. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	dpaa2-switch: reorganize the [pre]changeupper events	Ioana Ciornei
	Create separate functions, dpaa2_switch_port_prechangeupper and dpaa2_switch_port_changeupper, to be called directly when a DPSW port changes its upper device. This way we are not open-coding everything in the main event callback and we can easily extent, for example, with bond offload. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	dpaa2-switch: do not clear any interrupts automatically	Ioana Ciornei
	The DPSW object has multiple event sources multiplexed over the same IRQ. The driver has the capability to configure only some of these events to trigger the IRQ. The dpsw_get_irq_status() can clear events automatically based on the value stored in the 'status' variable passed to it. We don't want that to happen because we could get into a situation when we are clearing more events than we actually handled. Just resort to manually clearing the events that we handled. Also, since status is not used on the out path we remove its initialization to zero. This change does not have a user-visible effect because the dpaa2-switch driver enables and handles all the DPSW events which exist at the moment. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	dpaa2-switch: add ENDPOINT_CHANGED to the irq_mask	Ioana Ciornei
	Commit 84cba72956fd ("dpaa2-switch: integrate the MAC endpoint support") added support for MAC endpoints in the dpaa2-switch driver but omitted to add the ENDPOINT_CHANGED irq to the list of interrupt sources. Fix this by extending the list of events which can raise an interrupt by extending the mask passed to the dpsw_set_irq_mask() firmware API. There is no user visible impact even without this patch since whenever a switch interface is connected/disconnected from an endpoint both events are set (LINK_CHANGED and ENDPOINT_CHANGED) and, luckily, the LINK_CHANGED event could actually raise the interrupt and thus get the MAC/PHY SW configuration started. Even with this, it's better to just not rely on undocumented firmware behavior which can change. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	dpaa2-switch: print an error when the vlan is already configured	Ioana Ciornei
	Print a netdev error when we hit a case in which a specific VLAN is already configured on the port. While at it, change the already existing netdev_warn into an _err for consistency purposes. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	dpaa2-switch: declare the netdev as IFF_LIVE_ADDR_CHANGE capable	Ioana Ciornei
	There is no restriction around the change of the MAC address on the switch ports, thus declare the interface netdevs IFF_LIVE_ADDR_CHANGE capable. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	dpaa2-switch: set interface MAC address only on endpoint change	Ioana Ciornei
	There is no point in updating the MAC address of a switch interface each time the link state changes, this only needs to happen in case the endpoint changes (the switch interface is [dis]connected from/to a MAC). Just move the call to dpaa2_switch_port_set_mac_addr() under DPSW_IRQ_EVENT_ENDPOINT_CHANGED. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: ti: am65-cpsw: add sw tx/rx irq coalescing based on hrtimers	Grygorii Strashko
	Add SW IRQ coalescing based on hrtimers for TX and RX data path which can be enabled by ethtool commands: - RX coalescing ethtool -C eth1 rx-usecs 50 - TX coalescing can be enabled per TX queue - by default enables coalesing for TX0 ethtool -C eth1 tx-usecs 50 - configure TX0 ethtool -Q eth0 queue_mask 1 --coalesce tx-usecs 100 - configure TX1 ethtool -Q eth0 queue_mask 2 --coalesce tx-usecs 100 - configure TX0 and TX1 ethtool -Q eth0 queue_mask 3 --coalesce tx-usecs 100 --coalesce tx-usecs 100 show configuration for TX0 and TX1: ethtool -Q eth0 queue_mask 3 --show-coalesce Comparing to gro_flush_timeout and napi_defer_hard_irqs, this patch allows to enable IRQ coalesing for RX path separately. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: ti: am65-cpsw-qos: Add Frame Preemption MAC Merge support	Roger Quadros
	Add driver support for viewing / changing the MAC Merge sublayer parameters and seeing the verification state machine's current state via ethtool. As hardware does not support interrupt notification for verification events we resort to polling on link up. On link up we try a couple of times for verification success and if unsuccessful then give up. The Frame Preemption feature is described in the Technical Reference Manual [1] in section: 12.3.1.4.6.7 Intersperced Express Traffic (IET – P802.3br/D2.0) Due to Silicon Errata i2208 [2] we set limit min IET fragment size to 124 (excluding 4 bytes mCRC). [1] AM62x TRM - https://www.ti.com/lit/ug/spruiv7a/spruiv7a.pdf [2] AM62x Silicon Errata - https://www.ti.com/lit/er/sprz487c/sprz487c.pdf Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: ti: am65-cpsw: add mqprio qdisc offload in channel mode	Grygorii Strashko
	This patch adds MQPRIO Qdisc offload in full 'channel' mode which allows not only setting up pri:tc mapping, but also configuring TX shapers (rate-limiting) on external port FIFOs. The MQPRIO Qdisc offload is expected to work with or without VLAN/priority tagged packets. The CPSW external Port FIFO has 8 Priority queues. The rate-limit can be set for each of these priority queues. Which Priority queue a packet is assigned to depends on PN_REG_TX_PRI_MAP register which maps header priority to switch priority. The header priority of a packet is assigned via the RX_PRI_MAP_REG which maps packet priority to header priority. The packet priority is either the VLAN priority (for VLAN tagged packets) or the thread/channel offset. For simplicity, we assign the same priority queue to all queues of a Traffic Class so it can be rate-limited correctly. Configuration example: ethtool -L eth1 tx 5 ethtool --set-priv-flags eth1 p0-rx-ptype-rrobin off tc qdisc add dev eth1 parent root handle 100: mqprio num_tc 3 \ map 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 1@1 1@2 hw 1 mode channel \ shaper bw_rlimit min_rate 0 100mbit 200mbit max_rate 0 101mbit 202mbit tc qdisc replace dev eth2 handle 100: parent root mqprio num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 hw 1 ip link add link eth1 name eth1.100 type vlan id 100 ip link set eth1.100 type vlan egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 In the above example two ports share the same TX CPPI queue 0 for low priority traffic. 3 traffic classes are defined for eth1 and mapped to: TC0 - low priority, TX CPPI queue 0 -> ext Port 1 fifo0, no rate limit TC1 - prio 2, TX CPPI queue 1 -> ext Port 1 fifo1, CIR=100Mbit/s, EIR=1Mbit/s TC2 - prio 3, TX CPPI queue 2 -> ext Port 1 fifo2, CIR=200Mbit/s, EIR=2Mbit/s Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: am65-cpsw: Move register definitions to header file	Roger Quadros
	Move register definitions to header file. No functional change. Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: ti: am65-cpsw: Move code to avoid forward declaration	Roger Quadros
	Move this code around to avoid forward declaration. No functional change. Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: am65-cpsw: cleanup TAPRIO handling	Roger Quadros
	Handle offloading commands using switch-case in am65_cpsw_setup_taprio(). Move checks to am65_cpsw_taprio_replace(). Use NL_SET_ERR_MSG_MOD for error messages. Change error message from "Failed to set cycle time extension" to "cycle time extension not supported" Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: am65-cpsw: Rename TI_AM65_CPSW_TAS to TI_AM65_CPSW_QOS	Roger Quadros
	We will use this Kconfig option to not only enable TAS/EST offload but also other QoS features like Multiqueue priority descriptors and MAC-Merge/Frame Preemption. TI_AM65_CPSW_QOS seems a more appropriate Kconfig option name than TI_AM65_CPSW_TAS. Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23	net: ethernet: am65-cpsw: Build am65-cpsw-qos only if required	Roger Quadros
	Build am65-cpsw-qos only if CONFIG_TI_AM65_CPSW_TAS is enabled. Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22	octeontx2-af: Fix a double free issue	Suman Ghosh
	There was a memory leak during error handling in function npc_mcam_rsrcs_init(). Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs") Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22	Merge branch '1GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== intel: use bitfield operations Jesse Brandeburg says: After repeatedly getting review comments on new patches, and sporadic patches to fix parts of our drivers, we should just convert the Intel code to use FIELD_PREP() and FIELD_GET(). It's then "common" in the code and hopefully future change-sets will see the context and do-the-right-thing. This conversion was done with a coccinelle script which is mentioned in the commit messages. Generally there were only a couple conversions that were "undone" after the automatic changes because they tried to convert a non-contiguous mask. Patch 1 is required at the beginning of this series to fix a "forever" issue in the e1000e driver that fails the compilation test after conversion because the shift / mask was out of range. The second patch just adds all the new #includes in one go. The patch titled: "ice: fix pre-shifted bit usage" is needed to allow the use of the FIELD_* macros and fix up the unexpected "shifts included" defines found while creating this series. The rest are the conversion to use FIELD_PREP()/FIELD_GET(), and the occasional leXX_{get,set,encode}_bits() call, as suggested by Alex. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-21	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Paolo Abeni
	Cross-merge networking fixes after downstream PR. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 23c93c3b6275 ("bnxt_en: do not map packet buffers twice") 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") tools/testing/selftests/net/Makefile 2258b666482d ("selftests: add vlan hw filter tests") a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21	Merge branch '100GbE' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-18 (ice) This series contains updates to ice driver only. Jakes stops clearing of needed aggregator information. Dave adds a check for LAG device support before initializing the associated event handler. Larysa restores accounting of XDP queues in TC configurations. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix PF with enabled XDP going no-carrier after reset ice: alter feature support check for SRIOV and LAG ice: stop trashing VF VSI aggregator node ID information ==================== Link: https://lore.kernel.org/r/20231218192708.3397702-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21	net: ethernet: mtk_wed: fix possible NULL pointer dereference in ↵	Lorenzo Bianconi
	mtk_wed_wo_queue_tx_clean() In order to avoid a NULL pointer dereference, check entry->buf pointer before running skb_free_frag in mtk_wed_wo_queue_tx_clean routine. Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/3c1262464d215faa8acebfc08869798c81c96f4a.1702827359.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-20	net/mlx5: Implement management PF Ethernet profile	Armen Ratner
	Add management PF modules, which introduce support for the structures needed to create the resources for the MGMT PF to work. Also, add the necessary calls and functions to establish this functionality. Signed-off-by: Armen Ratner <armeng@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
2023-12-20	net/mlx5: Enable SD feature	Tariq Toukan
	Have an actual mlx5_sd instance in the core device, and fix the getter accordingly. This allows SD stuff to flow, the feature becomes supported only here. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5e: Block TLS device offload on combined SD netdev	Tariq Toukan
	1) Each TX TLS device offloaded context has its own TIS object. Extra work is needed to get it working in a SD environment, where a stream can move between different SQs (belonging to different mdevs). 2) Each RX TLS device offloaded context needs a DEK object from the DEK pool. Extra work is needed to get it working in a SD environment, as the DEK pool currently falsely depends on TX cap, and is on the primary device only. Disallow this combination for now. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5e: Support per-mdev queue counter	Tariq Toukan
	Each queue counter object counts some events (in hardware) for the RQs that are attached to it, like events of packet drops due to no receive WQE (rx_out_of_buffer). Each RQ can be attached to a queue counter only within the same vhca. To still cover all RQs with these counters, we create multiple instances, one per vhca. The result that's shown to the user is now the sum of all instances. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5e: Support cross-vhca RSS	Tariq Toukan
	Implement driver support for the HW feature that allows RX steering of one device to target other device's RQs. In SD multi-mdev netdev mode, we set the secondaries into silent mode, disconnecting them from the network. This feature is then used to steer traffic from the primary to the secondaries. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5e: Let channels be SD-aware	Tariq Toukan
	Distribute the channels between the different SD-devices to acheive local numa node performance on multiple numas. Each channel works against one specific mdev, creating all datapath queues against it. We distribute channels to mdevs in a round-robin policy. Example for 2 mdevs and 6 channels: +-------+---------+ \| ch ix \| mdev ix \| +-------+---------+ \| 0 \| 0 \| \| 1 \| 1 \| \| 2 \| 0 \| \| 3 \| 1 \| \| 4 \| 0 \| \| 5 \| 1 \| +-------+---------+ This round-robin distribution policy is preferred over another suggested intuitive distribution, in which we first distribute one half of the channels to mdev #0 and then the second half to mdev #1. We prefer round-robin for a reason: it is less influenced by changes in the number of channels. The mapping between channel index and mdev is fixed, no matter how many channels the user configures. As the channel stats are persistent to channels closure, changing the mapping every single time would turn the accumulative stats less representing of the channel's history. Per-channel objects should stop using the primary mdev (priv->mdev) directly, and instead move to using their own channel's mdev. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5e: Create EN core HW resources for all secondary devices	Tariq Toukan
	Traffic queues will be created on all devices, including the secondaries. Create the needed core layer resources for them as well. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5e: Create single netdev per SD group	Tariq Toukan
	Integrate the SD library calls into the auxiliary_driver ops in preparation for creating a single netdev for the multiple devices belonging to the same SD group. SD is still disabled at this stage. It is enabled by a downstream patch when all needed parts are implemented. The netdev is created only when the SD group, with all its participants, are ready. It is later destroyed if any of the participating devices drops. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5: SD, Add informative prints in kernel log	Tariq Toukan
	Print to kernel log when an SD group moves from/to ready state. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-20	net/mlx5: SD, Implement steering for primary and secondaries	Tariq Toukan
	Implement the needed SD steering adjustments for the primary and secondaries. While the SD multiple devices are used to avoid cross-numa memory, when it comes to chip level all traffic goes only through the primary device. The secondaries are forced to silent mode, to guarantee they are not involved in any unexpected ingress/egress traffic. In RX, secondary devices will not have steering objects. Traffic will be steered from the primary device to the RQs of a secondary device using advanced cross-vhca RX steering capabilities. In TX, the primary creates a new TX flow table, which is aliased by the secondaries. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>