linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-03-25	Merge branch 'net-stmmac-dwmac-rk-add-gmac-support-for-rk3528'	Jakub Kicinski
	Jonas Karlman says: ==================== net: stmmac: dwmac-rk: Add GMAC support for RK3528 The Rockchip RK3528 has two Ethernet controllers, one 100/10 MAC to be used with the integrated PHY and a second 1000/100/10 MAC to be used with an external Ethernet PHY. This series add initial support for the Ethernet controllers found in RK3528 and initial support to power up/down the integrated PHY. v2: https://lore.kernel.org/20250309232622.1498084-1-jonas@kwiboo.se v1: https://lore.kernel.org/20250306221402.1704196-1-jonas@kwiboo.se ==================== Link: https://patch.msgid.link/20250319214415.3086027-1-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: stmmac: dwmac-rk: Add initial support for RK3528 integrated PHY	Jonas Karlman
	Rockchip RK3528 (and RV1106) has a different integrated PHY compared to the integrated PHY on RK3228/RK3328. Current powerup/down operation is not compatible with the integrated PHY found in these newer SoCs. Add operations to powerup/down the integrated PHY found in RK3528. Use helpers that can be used by other GMAC variants in the future. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250319214415.3086027-6-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: stmmac: dwmac-rk: Add integrated_phy_powerdown operation	Jonas Karlman
	Rockchip RK3528 (and RV1106) has a different integrated PHY compared to the integrated PHY on RK3228/RK3328. Current powerup/down operation is not compatible with the integrated PHY found in these newer SoCs. Add a new integrated_phy_powerdown operation and change the call chain for integrated_phy_powerup to prepare support for the integrated PHY found in these newer SoCs. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250319214415.3086027-5-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: stmmac: dwmac-rk: Move integrated_phy_powerup/down functions	Jonas Karlman
	Rockchip RK3528 (and RV1106) has a different integrated PHY compared to the integrated PHY on RK3228/RK3328. Current powerup/down operation is not compatible with the integrated PHY found in these SoCs. Move the rk_gmac_integrated_phy_powerup/down functions to top of the file to prepare for them to be called directly by a GMAC variant specific powerup/down operation. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250319214415.3086027-4-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: stmmac: dwmac-rk: Add GMAC support for RK3528	David Wu
	Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC Ethernet QoS IP. Add initial support for the RK3528 GMAC variant. Signed-off-by: David Wu <david.wu@rock-chips.com> Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Link: https://patch.msgid.link/20250319214415.3086027-3-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	dt-bindings: net: rockchip-dwmac: Add compatible string for RK3528	Jonas Karlman
	Rockchip RK3528 has two Ethernet controllers based on Synopsys DWC Ethernet QoS IP. Add compatible string for the RK3528 variant. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250319214415.3086027-2-jonas@kwiboo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	bonding: check xdp prog when set bond mode	Wang Liang
	Following operations can trigger a warning[1]: ip netns add ns1 ip netns exec ns1 ip link add bond0 type bond mode balance-rr ip netns exec ns1 ip link set dev bond0 xdp obj af_xdp_kern.o sec xdp ip netns exec ns1 ip link set bond0 type bond mode broadcast ip netns del ns1 When delete the namespace, dev_xdp_uninstall() is called to remove xdp program on bond dev, and bond_xdp_set() will check the bond mode. If bond mode is changed after attaching xdp program, the warning may occur. Some bond modes (broadcast, etc.) do not support native xdp. Set bond mode with xdp program attached is not good. Add check for xdp program when set bond mode. [1] ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11 at net/core/dev.c:9912 unregister_netdevice_many_notify+0x8d9/0x930 Modules linked in: CPU: 0 UID: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.14.0-rc4 #107 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:unregister_netdevice_many_notify+0x8d9/0x930 Code: 00 00 48 c7 c6 6f e3 a2 82 48 c7 c7 d0 b3 96 82 e8 9c 10 3e ... RSP: 0018:ffffc90000063d80 EFLAGS: 00000282 RAX: 00000000ffffffa1 RBX: ffff888004959000 RCX: 00000000ffffdfff RDX: 0000000000000000 RSI: 00000000ffffffea RDI: ffffc90000063b48 RBP: ffffc90000063e28 R08: ffffffff82d39b28 R09: 0000000000009ffb R10: 0000000000000175 R11: ffffffff82d09b40 R12: ffff8880049598e8 R13: 0000000000000001 R14: dead000000000100 R15: ffffc90000045000 FS: 0000000000000000(0000) GS:ffff888007a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000d406b60 CR3: 000000000483e000 CR4: 00000000000006f0 Call Trace: <TASK> ? __warn+0x83/0x130 ? unregister_netdevice_many_notify+0x8d9/0x930 ? report_bug+0x18e/0x1a0 ? handle_bug+0x54/0x90 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? unregister_netdevice_many_notify+0x8d9/0x930 ? bond_net_exit_batch_rtnl+0x5c/0x90 cleanup_net+0x237/0x3d0 process_one_work+0x163/0x390 worker_thread+0x293/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0xec/0x1e0 ? __pfx_kthread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> ---[ end trace 0000000000000000 ]--- Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Wang Liang <wangliang74@huawei.com> Acked-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250321044852.1086551-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	Merge branch 'net-improve-stmmac-resume-rx-clocking'	Jakub Kicinski
	Russell King says: ==================== net: improve stmmac resume rx clocking stmmac has had a long history of problems with resuming, illustrated by reset failure due to the receive clock not running. Several attempts have been attempted over the years to address this issue, such as moving phylink_start() (now phylink_resume()) super early in stmmac_resume() in commit 90702dcd19c0 ("net: stmmac: fix MAC not working when system resume back with WoL a ctive.") However, this has the downside that stmmac_mac_link_up() can (and demonstrably is) called before or during the driver initialisation in another thread. This can cause issues as packets could begin to be queued, and the transmit/receive enable bits will be set before any initialisation has been done. Another attempt is used by dwmac-socfpga.c in commit 2d871aa07136 ("net: stmmac: add platform init/exit for Altera's ARM socfpga") which pre-dates the above commit. Neither of these two approaches consider the effect of EEE with a PHY that supports receive clock-stop and has that feature enabled (which the stmmac driver does enable). If the link is up, then there is the possibility for the receive path to be in low-power mode, and the PHY may stop its receive clock. This series addresses these issues by (each is not necessarily a separate patch): 1) introducing phylink_prepare_resume(), which can be used by MAC drivers to ensure that the PHY is resumed prior to doing any re-initialisation work. This call is added to stmmac_resume(). 2) moving phylink_resume() after all re-initialisation has completed, thereby ensuring that the hardware is ready to be enabled for packet reception/transmission. 3) with (1) and (2) addressed, the need for socfpga to have a private work-around is no longer necessary, so it is removed. 4) introducing phylink functions to block/unblock the receive clock- stop at the PHY. As these require PHY access over the MDIO bus, they can sleep, so are not suitable for atomic access. 5) the stmmac hardware requires the receive clock to be running for reset to complete. Depending on synthesis options, this requirement may also extend to writing various registers as well, e.g. setting the MAC address, writing some of the vlan registers, etc. Full details are in the databook. We add blocking/unblocking of the PHY receive clock-stop around parts of the main stmmac driver where we have a context that we can sleep. These are wrapped with the new phylink functions. However, depending on synthesis options, there could be other places where the net core calls the driver with a BH-disabled context where we can't sleep, and thus can't block the PHY from disabling its receive clock. These are documented with FIXME comments. Given the last paragraph above, I am wondering whether a better approach would be to ensure that receive clock-stop is always disabled at the PHY with stmmac. From what I can see, implementations do not document to this level of detail, which makes it difficult to tell which registers require the receive clock to be running to behave correctly. This patch series has been tested on the Tegra194 Jetson Xavier NX board kindly donated by NVidia, with two additional patches that are pending in patchwork - the first is required to have EEE's LPI mode passed through to the MAC on this platform to allow testing under PHY clock-stop scenarios. The second is a bug fix for PHYLIB and makes "eee off" functional, but should not affect this series. All patches on top of net-next commit f749448ce9f1 ("Merge branch 'net-mlx5-hw-steering-cleanups'") https://patchwork.kernel.org/project/netdevbpf/patch/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk/ https://patchwork.kernel.org/project/netdevbpf/patch/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk/ ==================== Link: https://patch.msgid.link/Z9ySeo61VYTClIJJ@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: stmmac: block PHY RXC clock-stop	Russell King (Oracle)
	The DesignWare core requires the receive clock to be running during certain operations. Ensure that we block PHY RXC clock-stop during these operations. This is a best-efforts change - not everywhere can be covered by this because of net's core locking, which means we can't access the MDIO bus to configure the PHY to disable RXC clock-stop in certain areas. These are marked with FIXME comments. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6p-008Vjz-Qy@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: phylink: add functions to block/unblock rx clock stop	Russell King (Oracle)
	Some MACs require the PHY receive clock to be running to complete setup actions. This may fail if the PHY has negotiated EEE, the MAC supports receive clock stop, and the link has entered LPI state. Provide a pair of APIs that MAC drivers can use to temporarily block the PHY disabling the receive clock. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6k-008Vjt-MZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: stmmac: socfpga: remove phy_resume() call	Russell King (Oracle)
	As the previous commit addressed DWGMAC resuming with a PHY in suspended state, there is now no need for socfpga to work around this. Remove this code. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6f-008Vjn-J1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: stmmac: address non-LPI resume failures properly	Russell King (Oracle)
	The Synopsys Designware GMAC core databook requires all clocks to be active in order to complete software reset, which we perform during resume. However, IEEE 802.3 allows a PHY to stop its clocks when placed in low-power mode, which happens when the system is suspended and WoL is not enabled. As an attempt to work around this, commit 36d18b5664ef ("net: stmmac: start phylink instance before stmmac_hw_setup()") started phylink early, but this has the side effect that the mac_link_up() method may be called before or during the initialisation of GMAC hardware. We also have the socfpga glue driver directly calling phy_resume() also as an attempt to work around this. In a previous commit, phylink_prepare_resume() has been introduced to give MAC drivers a way to ensure that the PHY is resumed prior to their initialisation of their MAC hardware. This commit adds the call, and moves the phylink_resume() call back to where it should be before the aforementioned commit. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6a-008Vjh-FG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: phylink: add phylink_prepare_resume()	Russell King (Oracle)
	When the system is suspended, the PHY may be placed in low-power mode by setting the BMCR 0.11 Power down bit. IEEE 802.3 states that the behaviour of the PHY in this state is implementation specific, and the PHY is not required to meet the RX_CLK and TX_CLK requirements. Essentially, this means that a PHY may stop the clocks that it is generating while in power down state. However, MACs exist which require the clocks from the PHY to be running in order to properly resume. phylink_prepare_resume() provides them with a way to clear the Power down bit early. Note, however, that IEEE 802.3 gives PHYs up to 500ms grace before the transmit and receive clocks meet the requirements after clearing the power down bit. Add a resume preparation function, which will ensure that the receive clock from the PHY is appropriately configured while resuming. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6V-008Vjb-AP@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	exportfs: add module description	Arnd Bergmann
	Every loadable module should have a description, to avoid a warning such as: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/exportfs/exportfs.o Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250324173242.1501003-1-arnd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25	exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() ↵	Oleg Nesterov
	and pidfs_exit() Consider a process with a group leader L and a sub-thread T. L does sys_exit(1), then T does sys_exit_group(2). In this case wait_task_zombie(L) will notice SIGNAL_GROUP_EXIT and use L->signal->group_exit_code, this is correct. But, before that, do_notify_parent(L) called by release_task(T) will use L->exit_code != L->signal->group_exit_code, and this is not consistent. We don't really care, I think that nobody relies on the info which comes with SIGCHLD, if nothing else SIGCHLD < SIGRTMIN can be queued only once. But pidfs_exit() is more problematic, I think pidfs_exit_info->exit_code should report ->group_exit_code in this case, just like wait_task_zombie(). TODO: with this change we can hopefully cleanup (or may be even kill) the similar SIGNAL_GROUP_EXIT checks, at least in wait_task_zombie(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250324171941.GA13114@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25	Merge branch 'sfc-devlink-flash-for-x4'	Jakub Kicinski
	Edward Cree says: ==================== sfc: devlink flash for X4 Updates to support devlink flash on X4 NICs. Patch #2 is needed for NVRAM_PARTITION_TYPE_AUTO, and patch #1 is needed because the latest MCDI headers from firmware no longer include MDIO read/write commands. v1: https://lore.kernel.org/cover.1742223233.git.ecree.xilinx@gmail.com ==================== Link: https://patch.msgid.link/cover.1742493016.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	sfc: support X4 devlink flash	Edward Cree
	Unlike X2 and EF100, we do not attempt to parse the firmware file to find an image within it; we simply hand the entire file to the MC, which is responsible for understanding any container formats we might use and validating that the firmware file is applicable to this NIC. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/9a72a74002a7819c780b0a18ce9294c9d4e1db12.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	sfc: update MCDI protocol headers	Edward Cree
	Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/bcb7597460a5a99d1dca4ef282f4aa2dd46ae545.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	sfc: rip out MDIO support	Edward Cree
	Unlike Siena, no EF10 board ever had an external PHY, and consequently MDIO handling isn't even built into the firmware. Since Siena has been split out into its own driver, the MDIO code can be deleted from the sfc driver. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/aa689d192ddaef7abe82709316c2be648a7bd66e.1742493017.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	netfs: add Paulo as maintainer and remove myself as Reviewer	Jeff Layton
	My role has changed since I originally agreed to help with netfs, and I'm no longer providing a lot of value here. Luckily, Paulo has agreed to step in as co-maintainer. Acked-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20250324-master-v1-1-e2dd2fdb15b4@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25	vmxnet3: unregister xdp rxq info in the reset path	Sankararaman Jayaraman
	vmxnet3 does not unregister xdp rxq info in the vmxnet3_reset_work() code path as vmxnet3_rq_destroy() is not invoked in this code path. So, we get below message with a backtrace. Missing unregister, handled but fix driver WARNING: CPU:48 PID: 500 at net/core/xdp.c:182 __xdp_rxq_info_reg+0x93/0xf0 This patch fixes the problem by moving the unregister code of XDP from vmxnet3_rq_destroy() to vmxnet3_rq_cleanup(). Fixes: 54f00cce1178 ("vmxnet3: Add XDP support.") Signed-off-by: Sankararaman Jayaraman <sankararaman.jayaraman@broadcom.com> Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com> Link: https://patch.msgid.link/20250320045522.57892-1-sankararaman.jayaraman@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net: reorganize IP MIB values (II)	Eric Dumazet
	Commit 14a196807482 ("net: reorganize IP MIB values") changed MIB values to group hot fields together. Since then 5 new fields have been added without caring about data locality. This patch moves IPSTATS_MIB_OUTPKTS, IPSTATS_MIB_NOECTPKTS, IPSTATS_MIB_ECT1PKTS, IPSTATS_MIB_ECT0PKTS, IPSTATS_MIB_CEPKTS to the hot portion of per-cpu data. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250320101434.3174412-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	tcp: avoid atomic operations on sk->sk_rmem_alloc	Eric Dumazet
	TCP uses generic skb_set_owner_r() and sock_rfree() for received packets, with socket lock being owned. Switch to private versions, avoiding two atomic operations per packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250320121604.3342831-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	Merge branch 'nexthop-convert-rtm_-new-del-nexthop-to-per-netns-rtnl'	Jakub Kicinski
	Kuniyuki Iwashima says: ==================== nexthop: Convert RTM_{NEW,DEL}NEXTHOP to per-netns RTNL. Patch 1 - 5 move some validation for RTM_NEWNEXTHOP so that it can be called without RTNL. Patch 6 & 7 converts RTM_NEWNEXTHOP and RTM_DELNEXTHOP to per-netns RTNL. Note that RTM_GETNEXTHOP and RTM_GETNEXTHOPBUCKET are not touched in this series. rtm_get_nexthop() can be easily converted to RCU, but rtm_dump_nexthop() needs more work due to the left-to-right rbtree walk, which looks prone to node deletion and tree rotation without a retry mechanism. v1: https://lore.kernel.org/netdev/20250318233240.53946-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20250319230743.65267-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	nexthop: Convert RTM_DELNEXTHOP to per-netns RTNL.	Kuniyuki Iwashima
	In rtm_del_nexthop(), only nexthop_find_by_id() and remove_nexthop() require RTNL as they touch net->nexthop.rb_root. Let's move RTNL down as rtnl_net_lock() before nexthop_find_by_id(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	nexthop: Convert RTM_NEWNEXTHOP to per-netns RTNL.	Kuniyuki Iwashima
	If we pass false to the rtnl_held param of lwtunnel_valid_encap_type(), we can move RTNL down before rtm_to_nh_config_rtnl(). Let's use rtnl_net_lock() in rtm_new_nexthop(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	nexthop: Remove redundant group len check in nexthop_create_group().	Kuniyuki Iwashima
	The number of NHA_GROUP entries is guaranteed to be non-zero in nh_check_attr_group(). Let's remove the redundant check in nexthop_create_group(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	nexthop: Check NLM_F_REPLACE and NHA_ID in rtm_new_nexthop().	Kuniyuki Iwashima
	nexthop_add() checks if NLM_F_REPLACE is specified without non-zero NHA_ID, which does not require RTNL. Let's move the check to rtm_new_nexthop(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	nexthop: Move NHA_OIF validation to rtm_to_nh_config_rtnl().	Kuniyuki Iwashima
	NHA_OIF needs to look up a device by __dev_get_by_index(), which requires RTNL. Let's move NHA_OIF validation to rtm_to_nh_config_rtnl(). Note that the proceeding checks made the original !cfg->nh_fdb check redundant. NHA_FDB is set -> NHA_OIF cannot be set NHA_FDB is set but false -> NHA_OIF must be set NHA_FDB is not set -> NHA_OIF must be set Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	nexthop: Split nh_check_attr_group().	Kuniyuki Iwashima
	We will push RTNL down to rtm_new_nexthop(), and then we want to move non-RTNL operations out of the scope. nh_check_attr_group() validates NHA_GROUP attributes, and nexthop_find_by_id() and some validation requires RTNL. Let's factorise such parts as nh_check_attr_group_rtnl() and call it from rtm_to_nh_config_rtnl(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250319230743.65267-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	nexthop: Move nlmsg_parse() in rtm_to_nh_config() to rtm_new_nexthop().	Kuniyuki Iwashima
	We will split rtm_to_nh_config() into non-RTNL and RTNL parts, and then the latter also needs tb. As a prep, let's move nlmsg_parse() to rtm_new_nexthop(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250319230743.65267-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	ipv6: fix _DEVADD() and _DEVUPD() macros	Eric Dumazet
	ip6_rcv_core() is using: __IP6_ADD_STATS(net, idev, IPSTATS_MIB_NOECTPKTS + (ipv6_get_dsfield(hdr) & INET_ECN_MASK), max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs)); This is currently evaluating both expressions twice. Fix _DEVADD() and _DEVUPD() macros to evaluate their arguments once. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250319212516.2385451-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	Merge branch 'mlx5-misc-enhancements-2025-03-19'	Jakub Kicinski
	Tariq Toukan says: ==================== mlx5 misc enhancements 2025-03-19 This series introduces multiple small misc enhancements from the team to the mlx5 core and Eth drivers. ==================== Link: https://patch.msgid.link/1742392983-153050-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net/mlx5e: TC, Don't offload CT commit if it's the last action	Jianbo Liu
	For CT action with commit argument, it's usually followed by the forward action, either to the output netdev or next chain. The default behavior for software is to drop by setting action attribute to TC_ACT_SHOT instead of TC_ACT_PIPE if it's the last action. But driver can't handle it, so block the offload for such case. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net/mlx5e: CT: Filter legacy rules that are unrelated to nic	Paul Blakey
	In nic mode CT setup where we do hairpin between the two nics, both nics register to the same flow table (per zone), and try to offload all rules on it. Instead, filter the rules that originated from the relevant nic (so only one side is offloaded for each nic). Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net/mlx5: Update pfnum retrieval for devlink port attributes	Shay Drory
	Align mlx5 driver usage of 'pfnum' with the documentation clarification introduced in commit bb70b0d48d8e ("devlink: Improve the port attributes description"). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net/mlx5: fw reset, check bridge accessibility at earlier stage	Amir Tzin
	Currently, mlx5_is_reset_now_capable() checks whether the pci bridge is accessible only on bridge hot plug capability check. If the pci bridge is not accessible, reset now will fail regardless of bridge hotplug capability. Move this check to function mlx5_is_reset_now_capable() which, in such case, aborts the reset and does so in the request phase instead of the reset now phase. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net/mlx5: Lag, use port selection tables when available	Mark Bloch
	As queue affinity is being deprecated and will no longer be supported in the future, Always check for the presence of the port selection namespace. When available, leverage it to distribute traffic across the physical ports via steering, ensuring compatibility with future NICs. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1742392983-153050-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	net/mlx5e: TX, Utilize WQ fragments edge for multi-packet WQEs	Tariq Toukan
	For simplicity reasons, the driver avoids crossing work queue fragment boundaries within the same TX WQE (Work-Queue Element). Until today, as the number of packets in a TX MPWQE (Multi-Packet WQE) descriptor is not known in advance, the driver pre-prepared contiguous memory for the largest possible WQE. For this, when getting too close to the fragment edge, having no room for the largest WQE possible, the driver was filling the fragment remainder with NOP descriptors, aligning the next descriptor to the beginning of the next fragment. Generating and handling these NOPs wastes resources, like: CPU cycles, work-queue entries fetched to the device, and PCI bandwidth. In this patch, we replace this NOPs filling mechanism in the TX MPWQE flow. Instead, we utilize the remaining entries of the fragment with a TX MPWQE. If this room turns out to be too small, we simply open an additional descriptor starting at the beginning of the next fragment. Performance benchmark: uperf test, single server against 3 clients. TCP multi-stream, bidir, traffic profile "2x350B read, 1400B write". Bottleneck is in inbound PCI bandwidth (device POV). +---------------+------------+------------+--------+ \| \| Before \| After \| \| +---------------+------------+------------+--------+ \| BW \| 117.4 Gbps \| 121.1 Gbps \| +3.1% \| +---------------+------------+------------+--------+ \| tx_packets \| 15 M/sec \| 15.5 M/sec \| +3.3% \| +---------------+------------+------------+--------+ \| tx_nops \| 3 M/sec \| 0 \| -100% \| +---------------+------------+------------+--------+ Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1742391746-118647-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25	firmware: cs_dsp: Ensure cs_dsp_load[_coeff]() returns 0 on success	Richard Fitzgerald
	Set ret = 0 on successful completion of the processing loop in cs_dsp_load() and cs_dsp_load_coeff() to ensure that the function returns 0 on success. All normal firmware files will have at least one data block, and processing this block will set ret == 0, from the result of either regmap_raw_write() or cs_dsp_parse_coeff(). The kunit tests create a dummy firmware file that contains only the header, without any data blocks. This gives cs_dsp a file to "load" that will not cause any side-effects. As there aren't any data blocks, the processing loop will not set ret == 0. Originally there was a line after the processing loop: ret = regmap_async_complete(regmap); which would set ret == 0 before the function returned. Commit fe08b7d5085a ("firmware: cs_dsp: Remove async regmap writes") changed the regmap write to a normal sync write, so the call to regmap_async_complete() wasn't necessary and was removed. It was overlooked that the ret here wasn't only to check the result of regmap_async_complete(), it also set the final return value of the function. Fixes: fe08b7d5085a ("firmware: cs_dsp: Remove async regmap writes") Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20250323170529.197205-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-25	dt-bindings: riscv: document vector crypto requirements	Conor Dooley
	The Unpriv spec states: \| The Zvknhb and Zvbc Vector Crypto Extensions --and accordingly the \| composite extensions Zvkn, Zvknc, Zvkng, and Zvksc-- require a Zve64x \| base, or application ("V") base Vector Extension. All of the other \| Vector Crypto Extensions can be built on any embedded (Zve*) or \| application ("V") base Vector Extension. Enforce the minimum requirement via schema. Link: https://github.com/riscv/riscv-isa-manual/blob/main/src/vector-crypto.adoc#extensions-overview Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250312-flask-relay-b36ee622b2c8@spud Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-25	dt-bindings: riscv: add vector sub-extension dependencies	Conor Dooley
	Section 33.18.2. Zve*: Vector Extensions for Embedded Processors in [1] says: \| The Zve32f and Zve64x extensions depend on the Zve32x extension. The Zve64f extension depends \| on the Zve32f and Zve64x extensions. The Zve64d extension depends on the Zve64f extension \| The Zve32x extension depends on the Zicsr extension. The Zve32f and Zve64f extensions depend \| upon the F extension \| The Zve64d extension depends upon the D extension Apply these rules to the bindings to help prevent invalid combinations. Link: https://github.com/riscv/riscv-isa-manual/releases/tag/riscv-isa-release-698e64a-2024-09-09 [1] Reviewed-by: Clément Léger <cleger@rivosinc.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250312-banking-crestless-58f3259a5018@spud Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-25	dt-bindings: riscv: d requires f	Conor Dooley
	Per the specifications, the d extension for double-precision floating point operations depends on the f extension for single-precision floating point. Add that requirement to the bindings. This differs from the Linux implementation, where single-precious only is not supported. Reviewed-by: Clément Léger <cleger@rivosinc.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250312-perpetual-daunting-ad489c9a857a@spud Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-25	RISC-V: add f & d extension validation checks	Conor Dooley
	Using Clement's new validation callbacks, support checking that dependencies have been satisfied for the floating point extensions. The check for "d" might be slightly confusingly shorter than that of "f", despite "d" depending on "f". This is because the requirement that a hart supporting double precision must also support single precision, should be validated by dt-bindings etc, not the kernel but lack of support for single precision only is a limitation of the kernel. Tested-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250312-reptile-platinum-62ee0f444a32@spud Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-25	RISC-V: add vector crypto extension validation checks	Conor Dooley
	Using Clement's new validation callbacks, support checking that dependencies have been satisfied for the vector crpyto extensions. Currently riscv_isa_extension_available(<vector crypto>) will return true on systems that support the extensions but vector itself has been disabled by the kernel, adding validation callbacks will prevent such a scenario from occuring and make the behaviour of the extension detection functions more consistent with user expectations - it's not expected to have to check for vector AND the specific crypto extension. The Unpriv spec states: \| The Zvknhb and Zvbc Vector Crypto Extensions --and accordingly the \| composite extensions Zvkn, Zvknc, Zvkng, and Zvksc-- require a Zve64x \| base, or application ("V") base Vector Extension. All of the other \| Vector Crypto Extensions can be built on any embedded (Zve*) or \| application ("V") base Vector Extension. While this could be used as the basis for checking that the correct base for individual crypto extensions, but that's not really the kernel's job in my opinion and it is sufficient to leave that sort of precision to the dt-bindings. The kernel only needs to make sure that vector, in some form, is available. Link: https://github.com/riscv/riscv-isa-manual/blob/main/src/vector-crypto.adoc#extensions-overview Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250312-entertain-shaking-b664142c2f99@spud Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-25	RISC-V: add vector extension validation checks	Conor Dooley
	Using Clement's new validation callbacks, support checking that dependencies have been satisfied for the vector extensions. From the kernel's perfective, it's not required to differentiate between the conditions for all the various vector subsets - it's the firmware's job to not report impossible combinations. Instead, the kernel only has to check that the correct config options are enabled and to enforce its requirement of the d extension being present for FPU support. Since vector will now be disabled proactively, there's no need to clear the bit in elf_hwcap in riscv_fill_hwcap() any longer. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250312-eclair-affluent-55b098c3602b@spud Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-25	cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory	Marc Dionne
	Commit c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.") changed cachefiles_get_directory, replacing "subdir" with a ERR_PTR from the result of cachefiles_inject_write_error, which is either 0 or some error code. This causes an oops when the resulting pointer is passed to vfs_mkdir. Use a similar pattern to what is used earlier in the function; replace subdir with either the return value from vfs_mkdir, or the ERR_PTR of the cachefiles_inject_write_error() return value, but only if it is non zero. Fixes: c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.") cc: netfs@lists.linux.dev Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/20250325125905.395372-1-marc.dionne@auristor.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25	exec: fix the racy usage of fs_struct->in_exec	Oleg Nesterov
	check_unsafe_exec() sets fs->in_exec under cred_guard_mutex, then execve() paths clear fs->in_exec lockless. This is fine if exec succeeds, but if it fails we have the following race: T1 sets fs->in_exec = 1, fails, drops cred_guard_mutex T2 sets fs->in_exec = 1 T1 clears fs->in_exec T2 continues with fs->in_exec == 0 Change fs/exec.c to clear fs->in_exec with cred_guard_mutex held. Reported-by: syzbot+1c486d0b62032c82a968@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@google.com/ Cc: stable@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250324160003.GA8878@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25	selftests/pidfd: fixes syscall number defines	Oleg Nesterov
	I had to spend some (a lot;) time to understand why pidfd_info_test (and more) fails with my patch under qemu on my machine ;) Until I applied the patch below. I think it is a bad idea to do the things like #ifndef __NR_clone3 #define __NR_clone3 -1 #endif because this can hide a problem. My working laptop runs Fedora-23 which doesn't have __NR_clone3/etc in /usr/include/. So "make" happily succeeds, but everything fails and it is not clear why. Link: https://lore.kernel.org/r/20250323174518.GB834@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-25	pidfs: cleanup the usage of do_notify_pidfd()	Oleg Nesterov
	If a single-threaded process exits do_notify_pidfd() will be called twice, from exit_notify() and right after that from do_notify_parent(). 1. Change exit_notify() to call do_notify_pidfd() if the exiting task is not ptraced and it is not a group leader. 2. Change do_notify_parent() to call do_notify_pidfd() unconditionally. If tsk is not ptraced, do_notify_parent() will only be called when it is a group-leader and thread_group_empty() is true. This means that if tsk is ptraced, do_notify_pidfd() will be called from do_notify_parent() even if tsk is a delay_group_leader(). But this case is less common, and apart from the unnecessary __wake_up() is harmless. Granted, this unnecessary __wake_up() can be avoided, but I don't want to do it in this patch because it's just a consequence of another historical oddity: we notify the tracer even if !thread_group_empty(), but do_wait() from debugger can't work until all other threads exit. With or without this patch we should either eliminate do_notify_parent() in this case, or change do_wait(WEXITED) to untrace the ptraced delay_group_leader() at least when ptrace_reparented(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250323171955.GA834@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>