summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-18net: ipv6: fix TCP GSO segmentation with NATFelix Fietkau
When updating the source/destination address, the TCP/UDP checksum needs to be updated as well. Fixes: bee88cd5bd83 ("net: add support for segmenting TCP fraglist GSO packets") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20250311212530.91519-1-nbd@nbd.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge branch 'udp_tunnel-gro-optimizations'Paolo Abeni
Paolo Abeni says: ==================== udp_tunnel: GRO optimizations The UDP tunnel GRO stage is source of measurable overhead for workload based on UDP-encapsulated traffic: each incoming packets requires a full UDP socket lookup and an indirect call. In the most common setups a single UDP tunnel device is used. In such case we can optimize both the lookup and the indirect call. Patch 1 tracks per netns the active UDP tunnels and replaces the socket lookup with a single destination port comparison when possible. Patch 2 tracks the different types of UDP tunnels and replaces the indirect call with a static one when there is a single UDP tunnel type active. I measure ~5% performance improvement in TCP over UDP tunnel stream tests on top of this series. v3: https://lore.kernel.org/netdev/cover.1741632298.git.pabeni@redhat.com/ v2: https://lore.kernel.org/netdev/cover.1741338765.git.pabeni@redhat.com/ v1: https://lore.kernel.org/netdev/cover.1741275846.git.pabeni@redhat.com/ ==================== Link: https://patch.msgid.link/cover.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18udp_tunnel: use static call for GRO hooks when possiblePaolo Abeni
It's quite common to have a single UDP tunnel type active in the whole system. In such a case we can replace the indirect call for the UDP tunnel GRO callback with a static call. Add the related accounting in the control path and switch to static call when possible. To keep the code simple use a static array for the registered tunnel types, and size such array based on the kernel config. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/6fd1f9c7651151493ecab174e7b8386a1534170d.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18udp_tunnel: create a fastpath GRO lookup.Paolo Abeni
Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single tunnel device per namespace. Track in each namespace the UDP tunnel socket respecting the above. When only a single one is present, store a reference in the netns. When such reference is not NULL, UDP tunnel GRO lookup just need to match the incoming packet destination port vs the socket local port. The tunnel socket never sets the reuse[port] flag[s]. When bound to no address and interface, no other socket can exist in the same netns matching the specified local port. Matching packets with non-local destination addresses will be aggregated, and eventually segmented as needed - no behavior changes intended. Note that the UDP tunnel socket reference is stored into struct netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep all the fastpath-related netns fields in the same struct and allow cacheline-based optimization. Currently both the IPv4 and IPv6 socket pointer share the same cacheline as the `udp_table` field. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/4d5c319c4471161829f50cb8436841de81a5edae.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: mana: Support holes in device list reply msgHaiyang Zhang
According to GDMA protocol, holes (zeros) are allowed at the beginning or middle of the gdma_list_devices_resp message. The existing code cannot properly handle this, and may miss some devices in the list. To fix, scan the entire list until the num_of_devs are found, or until the end of the list. Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@microsoft.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/1741723974-1534-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: ethernet: ti: am65-cpsw: Fix NAPI registration sequenceVignesh Raghavendra
Registering the interrupts for TX or RX DMA Channels prior to registering their respective NAPI callbacks can result in a NULL pointer dereference. This is seen in practice as a random occurrence since it depends on the randomness associated with the generation of traffic by Linux and the reception of traffic from the wire. Fixes: 681eb2beb3ef ("net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path") Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Co-developed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250311154259.102865-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18qed: remove cast to pointers passed to kfreeChen Ni
Remove unnecessary casts to pointer types passed to kfree. Issue detected by coccinelle: @@ type t1; expression *e; @@ -kfree((t1 *)e); +kfree(e); Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20250311070624.1037787-1-nichen@iscas.ac.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge branch 'mlx5-support-setting-a-parent-for-a-devlink-rate-node'Paolo Abeni
Tariq Toukan says: ==================== mlx5: Support setting a parent for a devlink rate node This series by Carolina adds mlx5 support for the setting of a parent to devlink rate nodes. By introducing a hierarchical level to scheduling nodes, these changes allow for more granular control over bandwidth allocation and isolation of Virtual Functions. Function renaming for parent setting on leafs: - net/mlx5: Rename devlink rate parent set function for leaf nodes Add support for hierarchy level tracking: - net/mlx5: Introduce hierarchy level tracking on scheduling nodes - net/mlx5: Preserve rate settings when creating a rate node Support setting parent for rate nodes: - net/mlx5: Add support for setting parent of nodes ==================== Link: https://patch.msgid.link/1741642016-44918-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net/mlx5: Add support for setting parent of nodesCarolina Jubran
Introduce `mlx5_esw_devlink_rate_node_parent_set()` to allow assigning a parent to scheduling nodes. Implement `mlx5_esw_qos_node_update_parent()` and `mlx5_esw_qos_node_validate_set_parent()` to enforce constraints on node reassignment. Don't allow reassignment of nodes with active rate objects. Update `esw_qos_node_set_parent()` to handle cases where the parent is NULL. A NULL parent indicates that the scheduling element is attached to the root scheduling element, and since only rate nodes can be connected to the root, this update is now necessary. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741642016-44918-5-git-send-email-tariqt@nvidia.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net/mlx5: Preserve rate settings when creating a rate nodeCarolina Jubran
Modify `esw_qos_create_node_sched_elem()` to receive max_rate and bw_share values while maintaining the previous configuration. This change is essential for the upcoming patch that will modify rate nodes and requires the existing settings to be preserved unless explicitly changed. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741642016-44918-4-git-send-email-tariqt@nvidia.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net/mlx5: Introduce hierarchy level tracking on scheduling nodesCarolina Jubran
Add a `level` field to `mlx5_esw_sched_node` to track the hierarchy depth of each scheduling node. This allows enforcement of the scheduling depth constraints based on `log_esw_max_sched_depth`. Modify `esw_qos_node_set_parent()` and `__esw_qos_alloc_node()` to correctly assign hierarchy levels. Ensure that nodes inherit their parent’s level incrementally. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741642016-44918-3-git-send-email-tariqt@nvidia.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net/mlx5: Rename devlink rate parent set function for leaf nodesCarolina Jubran
Rename `mlx5_esw_devlink_rate_parent_set()` to `mlx5_esw_devlink_rate_leaf_parent_set()` to distinguish setting a parent for leafs from nodes, which is not yet supported. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1741642016-44918-2-git-send-email-tariqt@nvidia.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge branch 'bnxt_en-driver-update'Paolo Abeni
Michael Chan says: ==================== bnxt_en: Driver update This patchset contains these updates to the driver: 1. New ethtool coredump type for FW to include cached context for live dump. 2. Support ENABLE_ROCE devlink generic parameter. 3. Support capability change flag from FW. 4. FW interface update. 5. Support .set_module_eeprom_by_page(). ==================== Link: https://patch.msgid.link/20250310183129.3154117-1-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: add .set_module_eeprom_by_page() supportDamodharam Ammepalli
Add support for .set_module_eeprom_by_page() callback which implements generic solution for modules eeprom access. This implementation also supports CMIS 5.0.3 compliant eeprom FW download. Sample Usage: ethtool --flash-module-firmware enp177s0np0 file dummy.bin Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-8-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: Refactor bnxt_get_module_eeprom_by_page()Michael Chan
In preparation for adding .set_module_eeprom_by_page(), extract the common error checking done in bnxt_get_module_eeprom_by_page() into a new common function that can be re-used for .set_module_eeprom_by_page(). Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-7-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: Update firmware interface to 1.10.3.97Michael Chan
The main changes are adding i2c write for module eeprom and a new v2 PCIe statistics structure. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-6-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: Query FW parameters when the CAPS_CHANGE bit is setshantiprasad shettar
Newer FW can set the CAPS_CHANGE flag during ifup if some capabilities or configurations have changed. For example, the CoS queue configurations may have changed. Support this new flag by treating it almost like FW reset. The driver will essentially rediscover all features and capabilities, reconfigure all backing store context memory, reset everything to default, and reserve all resources. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: shantiprasad shettar <shantiprasad.shettar@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-5-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: Add devlink support for ENABLE_ROCE nvm parameterPavan Chebbi
Add set/show support for the ENABLE_ROCE NVM parameter to enable/disable RoCE for a PF. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Co-developed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-4-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: Refactor bnxt_hwrm_nvm_req()Michael Chan
bnxt_hwrm_nvm_req() first searches the nvm_params[] array for the NVM parameter to set or get. The array entry contains all the NVM information about that parameter. The information is then used to send the FW message to set or get the parameter. Refactor it to only do the array search in bnxt_hwrm_nvm_req() and pass the array entry to the new function __bnxt_hwrm_nvm_req() to send the FW message. The next patch will be able to use the new function. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-3-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: Add support for a new ethtool dump flag 3Vasuthevan Maheswaran
When doing a live coredump with ethtool -w, the context data cached in the NIC is not dumped by the FW by default. The reason is that retrieving this cached context data with traffic running may cause problems. Add a new dump flag 3 to allow the option to include this cached context data which may be useful in some debug scenarios. Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Vasuthevan Maheswaran <vasuthevan.maheswaran@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-2-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge branch 'intel-wired-lan-driver-updates-2025-03-10-ice-ixgbe'Paolo Abeni
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-03-10 (ice, ixgbe) For ice: Paul adds generic checksum support for E830 devices. Karol refactors PTP code related to E825C; simplifying PHY register info struct, utilizing GENMASK, removing unused defines, etc. For ixgbe: Piotr adds PTP support for E610 devices. Jedrzej adds reporting when overheating is detected on E610 devices. The following are changes since commit 8ef890df4031121a94407c84659125cbccd3fdbe: net: move misc netdev_lock flavors to a separate header and are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE ==================== Link: https://patch.msgid.link/20250310174502.3708121-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ixgbe: add support for thermal sensor event receptionJedrzej Jagielski
E610 NICs unlike the previous devices utilising ixgbe driver are notified in the case of overheating by the FW ACI event. In event of overheat when threshold is exceeded, FW suspends all traffic and sends overtemp event to the driver. Then driver logs appropriate message and disables the adapter instance. The card remains in that state until the platform is rebooted. This approach is a solution to the fact current version of the E610 FW doesn't support reading thermal sensor data by the SW. So give to user at least any info that overtemp event has occurred, without interface disappearing from the OS without any note. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Jeremiah Lokan <jeremiahx.j.lokan@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250310174502.3708121-7-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ixgbe: add PTP support for E610 devicePiotr Kwapulinski
Add PTP support for E610 adapter. The E610 is based on X550 and adds firmware managed link, enhanced security capabilities and support for updated server manageability. It does not introduce any new PTP features compared to X550. Reviewed-by: Milena Olech <milena.olech@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250310174502.3708121-6-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ice: E825C PHY register cleanupKarol Kolacinski
Minor PTP register refactor, including logical grouping E825C 1-step timestamping registers. Remove unused register definitions (PHY_REG_GPCS_BITSLIP, PHY_REG_REVISION). Also, apply preferred GENMASK macro (instead of ICE_M) for register fields definition affected by this patch. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250310174502.3708121-5-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ice: Refactor E825C PHY registers info structKarol Kolacinski
Simplify ice_phy_reg_info_eth56g struct definition to include base address for the very first quad. Use base address info and 'step' value to determine address for specific PHY quad. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250310174502.3708121-4-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ice: rename ice_ptp_init_phc_eth56g functionKarol Kolacinski
Refactor the code by changing ice_ptp_init_phc_eth56g function name to ice_ptp_init_phc_e825, to be consistent with the naming pattern for other devices. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250310174502.3708121-3-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ice: Add E830 checksum offload supportPaul Greenwalt
E830 supports raw receive and generic transmit checksum offloads. Raw receive checksum support is provided by hardware calculating the checksum over the whole packet, regardless of type. The calculated checksum is provided to driver in the Rx flex descriptor. Then the driver assigns the checksum to skb->csum and sets skb->ip_summed to CHECKSUM_COMPLETE. Generic transmit checksum support is provided by hardware calculating the checksum given two offsets: the start offset to begin checksum calculation, and the offset to insert the calculated checksum in the packet. Support is advertised to the stack using NETIF_F_HW_CSUM feature. E830 has the following limitations when both generic transmit checksum offload and TCP Segmentation Offload (TSO) are enabled: 1. Inner packet header modification is not supported. This restriction includes the inability to alter TCP flags, such as the push flag. As a result, this limitation can impact the receiver's ability to coalesce packets, potentially degrading network throughput. 2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents support of Maximum Transmission Unit (MTU) greater than 1063 bytes. Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features(). When NETIF_F_HW_CSUM is requested the provided skb->csum_start and skb->csum_offset are passed to hardware in the Tx context descriptor generic checksum (GCS) parameters. Hardware calculates the 1's complement from skb->csum_start to the end of the packet, and inserts the result in the packet at skb->csum_offset. Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250310174502.3708121-2-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18ata: libata-core: Add ATA_QUIRK_NO_LPM_ON_ATI for certain Samsung SSDsNiklas Cassel
Before commit 7627a0edef54 ("ata: ahci: Drop low power policy board type") the ATI AHCI controllers specified board type 'board_ahci' rather than board type 'board_ahci'. This means that LPM was historically not enabled for the ATI AHCI controllers. By looking at commit 7a8526a5cd51 ("libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD."), it is clear that, for some unknown reason, that Samsung SSDs do not play nice with ATI AHCI controllers. (When using other AHCI controllers, NCQ can be enabled on these Samsung SSDs without issues.) In a similar way, from user reports, it is clear the ATI AHCI controllers can enable LPM on e.g. Maxtor HDDs perfectly fine, but when enabling LPM on certain Samsung SSDs, things break. (E.g. the SSDs will not get detected by the ATI AHCI controller even after a COMRESET.) Yet, when using LPM on these Samsung SSDs with other AHCI controllers, e.g. Intel AHCI controllers, these Samsung drives appear to work perfectly fine. Considering that the combination of ATI + Samsung, for some unknown reason, does not seem to work well, disable LPM when detecting an ATI AHCI controller with a problematic Samsung SSD. Apply this new ATA_QUIRK_NO_LPM_ON_ATI quirk for all Samsung SSDs that have already been reported to not play nice with ATI (ATA_QUIRK_NO_NCQ_ON_ATI). Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Suggested-by: Hans de Goede <hdegoede@redhat.com> Reported-by: Eric <eric.4.debian@grabatoulnz.fr> Closes: https://lore.kernel.org/linux-ide/Z8SBZMBjvVXA7OAK@eldamar.lan/ Tested-by: Eric <eric.4.debian@grabatoulnz.fr> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250317170348.1748671-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-03-18Merge branch 'net-phy-rework-linkmodes-handling-in-a-dedicated-file'Paolo Abeni
Maxime Chevallier says: ==================== net: phy: Rework linkmodes handling in a dedicated file This is V5 of the phy_caps series. In a nutshell, this series reworks the way we maintain the list of speed/duplex capablities for each linkmode so that we no longer have multiple definition of these associations. That will help making sure that when people add new linkmodes in include/uapi/linux/ethtool.h, they don't have to update phylib and phylink as well, making the process more straightforward and less error-prone. It also generalises the phy_caps interface to be able to lookup linkmodes from phy_interface_t, which is needed for the multi-port work I've been working on for a while. This V5 addresse Russell's and Paolo's reviews, namely : - Error out when encountering an unknown SPEED_XXX setting It prints an error and fails to initialize phylib. I've tested by introducing a dummy 1.6T speed, I guess it's only a matter of time before that actually happens :) - Deal more gracefully with the fixed-link settings, keeping some level of compatibility with what we had before by making sure we report a single BaseT mode like before. V1 : https://lore.kernel.org/netdev/20250222142727.894124-1-maxime.chevallier@bootlin.com/ V2 : https://lore.kernel.org/netdev/20250226100929.1646454-1-maxime.chevallier@bootlin.com/ V3 : https://lore.kernel.org/netdev/20250228145540.2209551-1-maxime.chevallier@bootlin.com/ V4 : https://lore.kernel.org/netdev/20250303090321.805785-1-maxime.chevallier@bootlin.com/ ==================== Link: https://patch.msgid.link/20250307173611.129125-1-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phylink: Use phy_caps to get an interface's capabilities and modesMaxime Chevallier
Phylink has internal code to get the MAC capabilities of a given PHY interface (what are the supported speed and duplex). Extract that into phy_caps, but use the link_capa for conversion. Add an internal phylink helper for the link caps -> mac caps conversion, and use this in phylink_caps_to_linkmodes(). Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-14-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phylink: Convert capabilities to linkmodes using phy_capsMaxime Chevallier
phylink_caps_to_linkmodes() is used to derive a list of linkmodes that can be conceivably exposed using a given set of speeds and duplex through phylink's MAC capabilities. This list can be derived from the link_caps array in phy_caps, provided we convert the MAC capabilities into a LINK_CAPA bitmask first. Introduce an internal phylink helper phylink_caps_to_link_caps() to convert from MAC capabilities into phy_caps, then phy_caps_linkmodes() to do the link_caps -> linkmodes conversion. This avoids having to update phylink for every new linkmode. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-13-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phylink: Add a mapping between MAC_CAPS and LINK_CAPSMaxime Chevallier
phylink allows MAC drivers to report the capabilities in terms of speed, duplex and pause support. This is done through a dedicated set of enum values in the form of the MAC_ capabilities. They are very close to what the LINK_CAPA_xxx can express, with the difference that LINK_CAPA don't have any information about Pause/Asym Pause support. To prepare converting phylink to using the phy_caps, add the mapping between MAC capabilities and phy_caps. While doing so, we move the phylink_caps_params array up a bit to simplify future commits. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-12-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: drop phy_settings and the associated lookup helpersMaxime Chevallier
The phy_settings array is no longer relevant as it has now been replaced by the link_caps array and associated phy_caps helpers. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-11-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phylink: Use phy_caps_lookup for fixed-link configurationMaxime Chevallier
When phylink creates a fixed-link configuration, it finds a matching linkmode to set as the advertised, lp_advertising and supported modes based on the speed and duplex of the fixed link. Use the newly introduced phy_caps_lookup to get these modes instead of phy_lookup_settings(). This has the side effect that the matched settings and configured linkmodes may now contain several linkmodes (the intersection of supported linkmodes from the phylink settings and the linkmodes that match speed/duplex) instead of the one from phy_lookup_settings(). Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-10-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: phy_device: Use link_capabilities lookup for PHY aneg configMaxime Chevallier
When configuring PHY advertising with autoneg disabled, we lookd for an exact linkmode to advertise and configure for the requested Speed and Duplex, specially at or over 1G. Using phy_caps_lookup allows us to build a list of the supported linkmodes at that speed that we can advertise instead of the first mode that matches. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-9-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: phy_caps: Allow looking-up link caps based on speed and duplexMaxime Chevallier
As the link_caps array is efficient for <speed,duplex> lookups, implement a function for speed/duplex lookups that matches a given mask. This replicates to some extent the phy_lookup_settings() behaviour, matching full link_capabilities instead of a single linkmode. phy.c's phy_santize_settings() and phylink's phylink_ethtool_ksettings_set() performs such lookup using the phy_settings table, but are only interested in the actual speed/duplex that were matched, rathet than the individual linkmode. Similar to phy_lookup_settings(), the newly introduced phy_caps_lookup() will run through the link_caps[] array by descending speed/duplex order. If the link_capabilities for a given <speed/duplex> tuple intersects the passed linkmodes, we consider that a match. Similar to phy_lookup_settings(), we also allow passing an 'exact' boolean, allowing non-exact match. Here, we MUST always match the linkmodes mask, but we allow matching on lower speed settings. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-8-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: phy_caps: Implement link_capabilities lookup by linkmodeMaxime Chevallier
In several occasions, phylib needs to lookup a set of matching speed and duplex against a given linkmode set. Instead of relying on the phy_settings array and thus iterate over the whole linkmodes list, use the link_capabilities array to lookup these matches, as we aren't interested in the actual link setting that matches but rather the speed and duplex for that setting. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-7-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: phy_caps: Introduce phy_caps_validMaxime Chevallier
With the link_capabilities array, it's trivial to validate a given mask againts a <speed, duplex> tuple. Create a helper for that purpose, and use it to replace a phy_settings lookup in phy_check_valid(); Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-6-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: phy_caps: Move __set_linkmode_max_speed to phy_capsMaxime Chevallier
Convert the __set_linkmode_max_speed to use the link_capabilities array. This makes it easy to clamp the linkmodes to a given max speed. Introduce a new helper phy_caps_linkmode_max_speed to replace the previous one that used phy_settings. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-5-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: phy_caps: Move phy_speeds to phy_capsMaxime Chevallier
Use the newly introduced link_capabilities array to derive the list of possible speeds when given a combination of linkmodes. As link_capabilities is indexed by speed, we don't have to iterate the whole phy_settings array. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-4-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: phy: Use an internal, searchable storage for the linkmodesMaxime Chevallier
The canonical definition for all the link modes is in linux/ethtool.h, which is complemented by the link_mode_params array stored in net/ethtool/common.h . That array contains all the metadata about each of these modes, including the Speed and Duplex information. Phylib and phylink needs that information as well for internal management of the link, which was done by duplicating that information in locally-stored arrays and lookup functions. This makes it easy for developpers adding new modes to forget modifying phylib and phylink accordingly. However, the link_mode_params array in net/ethtool/common.c is fairly inefficient to search through, as it isn't sorted in any manner. Phylib and phylink perform a lot of lookup operations, mostly to filter modes by speed and/or duplex. We therefore introduce the link_caps private array in phy_caps.c, that indexes linkmodes in a more efficient manner. Each element associated a tuple <speed, duplex> to a bitfield of all the linkmodes runs at these speed/duplex. We end-up with an array that's fairly short, easily addressable and that it optimised for the typical use-cases of phylib/phylink. That array is initialized at the same time as phylib. As the link_mode_params array is part of the net stack, which phylink depends on, it should always be accessible from phylib. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-3-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18net: ethtool: Export the link_mode_params definitionsMaxime Chevallier
link_mode_params contains a lookup table of all 802.3 link modes that are currently supported with structured data about each mode's speed, duplex, number of lanes and mediums. As a preparation for a port representation, export that table for the rest of the net stack to use. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250307173611.129125-2-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18efivarfs: fix NULL dereference on resumeJames Bottomley
LSMs often inspect the path.mnt of files in the security hooks, and this causes a NULL deref in efivarfs_pm_notify() because the path is constructed with a NULL path.mnt. Fix by obtaining from vfs_kern_mount() instead, and being very careful to ensure that deactivate_super() (potentially triggered by a racing userspace umount) is not called directly from the notifier, because it would deadlock when efivarfs_kill_sb() tried to unregister the notifier chain. [ Al notes: Umm... That's probably safe, but not as a long-term solution - it's too intimately dependent upon fs/super.c internals. The reasons why you can't run into ->s_umount deadlock here are non-trivial... ] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Link: https://lore.kernel.org/r/e54e6a2f-1178-4980-b771-4d9bafc2aa47@tnxip.de Link: https://lore.kernel.org/r/3e998bf87638a442cbc6864cdcd3d8d9e08ce3e3.camel@HansenPartnership.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-03-17Merge tag 'mm-hotfixes-stable-2025-03-17-20-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "15 hotfixes. 7 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 13 are for MM and the other two are for squashfs and procfs. All are singletons. Please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2025-03-17-20-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/page_alloc: fix memory accept before watermarks gets initialized mm: decline to manipulate the refcount on a slab page memcg: drain obj stock on cpu hotplug teardown mm/huge_memory: drop beyond-EOF folios with the right number of refs selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT mm: memcontrol: fix swap counter leak from offline cgroup mm/vma: do not register private-anon mappings with khugepaged during mmap squashfs: fix invalid pointer dereference in squashfs_cache_delete mm/migrate: fix shmem xarray update during migration mm/hugetlb: fix surplus pages in dissolve_free_huge_page() mm/damon/core: initialize damos->walk_completed in damon_new_scheme() mm/damon: respect core layer filters' allowance decision on ops layer filemap: move prefaulting out of hot write path proc: fix UAF in proc_get_inode()
2025-03-17MAINTAINERS: Remove myselfEric W. Biederman
Unfortunately I no longer have time to meaningfully take part in the linux kernel development. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-17Merge tag 'soc-fixes-6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The majority of these last fixes are for devicetree files. These address two important regressions for the Qualcomm SMMU and the Raspberry Pi 4 USB controller, as well as a larger number of patches fixing minor mistakes in board specific files for Rockchips, i.MX, starfive and broadcom. The non-DT changes are - A fix for an old boot regression on Renesas shmobile chips - Another boot time regression for for the Qualcomm PDR SoC driver, among a few other Qualcomm firmware driver fixes for efivars and tzmem - Minor Kconfig fixes for davinci and OMAP1 - Minor code fixes for sparx5 reset controllers, OMAP memory controller, i.MX SCU, cpufreq and SoC drivers and a Hisilicon SoC driver - One more update to the Asahi maintainers, adding Neal Gompa as a reviewer" * tag 'soc-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits) ARM: davinci: da850: fix selecting ARCH_DAVINCI_DA8XX soc: hisilicon: kunpeng_hccs: Fix incorrect string assembly memory: omap-gpmc: drop no compatible check reset: mchp: sparx5: Fix for lan966x ARM: shmobile: smp: Enforce shmobile_smp_* alignment MAINTAINERS: Add myself (Neal Gompa) as a reviewer for ARM Apple support MAINTAINERS: Add apple-spi driver & binding files arm64: dts: rockchip: slow down emmc freq for rock 5 itx ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC3200 ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC5300 ARM: dts: bcm2711: Don't mark timer regs unconfigured ARM: OMAP1: select CONFIG_GENERIC_IRQ_CHIP arm64: dts: rockchip: Add missing PCIe supplies to RockPro64 board dtsi arm64: dts: rockchip: Add avdd HDMI supplies to RockPro64 board dtsi arm64: dts: rockchip: Remove undocumented sdmmc property from lubancat-1 arm64: dts: rockchip: fix pinmux of UART5 for PX30 Ringneck on Haikou arm64: dts: rockchip: fix pinmux of UART0 for PX30 Ringneck on Haikou arm64: dts: rockchip: fix u2phy1_host status for NanoPi R4S arm64: dts: bcm2712: PL011 UARTs are actually r1p5 ARM: dts: bcm2711: PL011 UARTs are actually r1p5 ...
2025-03-17Merge tag 'probes-fixes-v6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Clean up tprobe correctly when module unload Tracepoint probes do not set TRACEPOINT_STUB on the 'tpoint' pointer when unloading a module, thus they show as a normal 'fprobe' instead of 'tprobe' and never come back - Fix leakage of tprobe module refcount When a tprobe's target module is loaded, it gets the module's refcount in the module notifier but forgot to put it after registering the probe on it. Fix it by getting the refcount only when registering tprobe. * tag 'probes-fixes-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: tprobe-events: Fix leakage of module refcount tracing: tprobe-events: Fix to clean up tprobe correctly when module unload
2025-03-17Merge branch ↵Paolo Abeni
'net-stmmac-avoid-unnecessary-work-in-stmmac_release-stmmac_dvr_remove' Russell King says: ==================== net: stmmac: avoid unnecessary work in stmmac_release()/stmmac_dvr_remove() This small series is a subset of a RFC I sent earlier. These two patches remove code that is unnecessary and/or wrong in these paths. Details in each commit. ==================== Link: https://patch.msgid.link/Z87bpDd7QYYVU0ML@shell.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-17net: stmmac: remove unnecessary stmmac_mac_set() in stmmac_release()Russell King (Oracle)
stmmac_release() calls phylink_stop() and then goes on to call stmmac_mac_set(, false). However, phylink_stop() will call stmmac_mac_link_down() before returning, which will do this work. Remove this unnecessary call. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Furong Xu <0x1207@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1trcI6-005rn8-GV@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-17net: stmmac: remove redundant racy tear-down in stmmac_dvr_remove()Russell King (Oracle)
While the network device is registered, it is published to userspace, and thus userspace can change its state. This means calling functions such as stmmac_stop_all_dma() and stmmac_mac_set() are racy. Moreover, unregister_netdev() will unpublish the network device, and then if appropriate call the .ndo_stop() method, which is stmmac_release(). This will first call phylink_stop() which will synchronously take the link down, resulting in stmmac_mac_link_down() and stmmac_mac_set(, false) being called. stmmac_release() will also call stmmac_stop_all_dma(). Consequently, neither of these two functions need to called prior to unregister_netdev() as that will safely call paths that will result in this work being done if necessary. Remove these redundant racy calls. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Furong Xu <0x1207@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1trcI1-005rn2-CZ@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>