summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-08net: replace ADDRLABEL with dynamic debugWang Liang
ADDRLABEL only works when it was set in compilation phase. Replace it with net_dbg_ratelimited(). Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702104417.1526138-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08xfs: remove the bt_bdev_file buftarg fieldChristoph Hellwig
And use bt_file for both bdev and shmem backed buftargs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-08xfs: rename the bt_bdev_* buftarg fieldsChristoph Hellwig
The extra bdev_ is weird, so drop it. Also improve the comment to make it clear these are the hardware limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-08xfs: refactor xfs_calc_atomic_write_unit_maxChristoph Hellwig
This function and the helpers used by it duplicate the same logic for AGs and RTGs. Use the xfs_group_type enum to unify both variants. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-08xfs: add a xfs_group_type_buftarg helperChristoph Hellwig
Generalize the xfs_group_type helper in the discard code to return a buftarg and move it to xfs_mount.h, and use the result in xfs_dax_notify_dev_failure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-08xfs: remove the call to sync_blockdev in xfs_configure_buftargChristoph Hellwig
This extra call is not needed as xfs_alloc_buftarg already calls sync_blockdev. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-08xfs: clean up the initial read logic in xfs_readsbChristoph Hellwig
The initial sb read is always for a device logical block size buffer. The device logical block size is provided in the bt_logical_sectorsize in struct buftarg, so use that instead of the confusingly named xfs_getsize_buftarg buffer that reads it from the bdev. Update the comments surrounding the code to better describe what is going on. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-08Revert "xfrm: destroy xfrm_state synchronously on net exit path"Sabrina Dubroca
This reverts commit f75a2804da391571563c4b6b29e7797787332673. With all states (whether user or kern) removed from the hashtables during deletion, there's no need for synchronous destruction of states. xfrm6_tunnel states still need to have been destroyed (which will be the case when its last user is deleted (not destroyed)) so that xfrm6_tunnel_free_spi removes it from the per-netns hashtable before the netns is destroyed. This has the benefit of skipping one synchronize_rcu per state (in __xfrm_state_destroy(sync=true)) when we exit a netns. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-08xfrm: delete x->tunnel as we delete xSabrina Dubroca
The ipcomp fallback tunnels currently get deleted (from the various lists and hashtables) as the last user state that needed that fallback is destroyed (not deleted). If a reference to that user state still exists, the fallback state will remain on the hashtables/lists, triggering the WARN in xfrm_state_fini. Because of those remaining references, the fix in commit f75a2804da39 ("xfrm: destroy xfrm_state synchronously on net exit path") is not complete. We recently fixed one such situation in TCP due to defered freeing of skbs (commit 9b6412e6979f ("tcp: drop secpath at the same time as we currently drop dst")). This can also happen due to IP reassembly: skbs with a secpath remain on the reassembly queue until netns destruction. If we can't guarantee that the queues are flushed by the time xfrm_state_fini runs, there may still be references to a (user) xfrm_state, preventing the timely deletion of the corresponding fallback state. Instead of chasing each instance of skbs holding a secpath one by one, this patch fixes the issue directly within xfrm, by deleting the fallback state as soon as the last user state depending on it has been deleted. Destruction will still happen when the final reference is dropped. A separate lockdep class for the fallback state is required since we're going to lock x->tunnel while x is locked. Fixes: 9d4139c76905 ("netns xfrm: per-netns xfrm_state_all list") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-07-08net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit()Eric Dumazet
tc_action_net_exit() got an rtnl exclusion in commit a159d3c4b829 ("net_sched: acquire RTNL in tc_action_net_exit()") Since then, commit 16af6067392c ("net: sched: implement reference counted action release") made this RTNL exclusion obsolete for most cases. Only tcf_action_offload_del() might still require it. Move the rtnl locking into tcf_idrinfo_destroy() when an offload action is found. Most netns do not have actions, yet deleting them is adding a lot of pressure on RTNL, which is for many the most contended mutex in the kernel. We are moving to a per-netns 'rtnl', so tc_action_net_exit() will not be able to grab 'rtnl' a single time for a batch of netns. Before the patch: perf probe -a rtnl_lock perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.305 MB perf.data (25 samples) ] After the patch: perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.304 MB perf.data (9 samples) ] Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@nvidia.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/20250702071230.1892674-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08Merge branch 'net-mctp-add-support-for-gateway-routing'Paolo Abeni
Jeremy Kerr says: ==================== net: mctp: Add support for gateway routing This series adds a gateway route type for the MCTP core, allowing non-local EIDs as the match for a route. Example setup using the mctp tools: mctp route add 9 via mctpi2c0 mctp neigh add 9 dev mctpi2c0 lladdr 0x1d mctp route add 10 gw 9 - will route packets to eid 10 through mctpi2c0, using a dest lladdr of 0x1d (ie, that of the directly-attached eid 9). The core change to support this is the introduction of a struct mctp_dst, which represents the result of a route lookup. Since this involves a bit of surgery through the routing code, we add a few tests along the way. We're introducing an ABI change in the new RTM_{NEW,GET,DEL}ROUTE netlink formats, with the support for a RTA_GATEWAY attribute. Because we need a network ID specified to fully-qualify a gateway EID, the RTA_GATEWAY attribute carries the (net, eid) tuple in full: struct mctp_fq_addr { unsigned int net; mctp_eid_t eid; } Of course, any questions, comments etc are most welcome. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> ==================== Link: https://patch.msgid.link/20250702-dev-forwarding-v5-0-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: Add tests for gateway routesJeremy Kerr
Add a few kunit tests for the gateway routing. Because we have multiple route types now (direct and gateway), rename mctp_test_create_route to mctp_test_create_route_direct, and add a _gateway variant too. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-14-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: add gateway routing supportJeremy Kerr
This change allows for gateway routing, where a route table entry may reference a routable endpoint (by network and EID), instead of routing directly to a netdevice. We add support for a RTM_GATEWAY attribute for netlink route updates, with an attribute format of: struct mctp_fq_addr { unsigned int net; mctp_eid_t eid; } - we need the net here to uniquely identify the target EID, as we no longer have the device reference directly (which would provide the net id in the case of direct routes). This makes route lookups recursive, as a route lookup that returns a gateway route must be resolved into a direct route (ie, to a device) eventually. We provide a limit to the route lookups, to prevent infinite loop routing. The route lookup populates a new 'nexthop' field in the dst structure, which now specifies the key for the neighbour table lookup on device output, rather than using the packet destination address directly. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-13-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: allow NL parsing directly into a struct mctp_routeJeremy Kerr
The netlink route parsing functions end up setting a bunch of output variables from the rt attributes. This will get messy when the routes become more complex. So, split the rt parsing into two types: a lookup (returning route target data suitable for a route lookup, like when deleting a route) and a populate (setting fields of a struct mctp_route). In doing this, we need to separate the route allocation from mctp_route_add, so add some comments on the lifetime semantics for the latter. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-12-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: remove routes by netid, not by deviceJeremy Kerr
In upcoming changes, a route may not have a device associated. Since the route is matched on the (network, eid) tuple, pass the netid itself into mctp_route_remove. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-11-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: pass net into route creationJeremy Kerr
We may not have a mdev pointer, from which we currently extract the net. Instead, pass the net directly. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-10-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: Add initial socket testsJeremy Kerr
Recent changes have modified the extaddr path a little, so add a couple of kunit tests to af-mctp.c. These check that we're correctly passing lladdr data between sendmsg/recvmsg and the routing layer. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-9-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: add sock test infrastructureJeremy Kerr
Add a new test object, for use with the af_mctp socket code. This is intially empty, but we'll start populating actual tests in an upcoming change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-8-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: move functions into utils.[ch]Jeremy Kerr
A future change will add another mctp test .c file, so move some of the common test setup from route.c into the utils object. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-7-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: Add extaddr routing output testJeremy Kerr
Test that the routing code preserves the haddr data in a skb through an input route operation. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-6-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: Add an addressed device constructorJeremy Kerr
Upcoming tests will check semantics of hardware addressing, which require a dev with ->addr_len != 0. Add a constructor to create a MCTP interface using a physically-addressed bus type. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-5-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: separate cb from direct-addressing routingJeremy Kerr
Now that we have the dst->haddr populated by sendmsg (when extended addressing is in use), we no longer need to stash the link-layer address in the skb->cb. Instead, only use skb->cb for incoming lladdr data. While we're at it: remove cb->src, as was never used. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-4-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: separate routing database from routing operationsJeremy Kerr
This change adds a struct mctp_dst, representing the result of a routing lookup. This decouples the struct mctp_route from the actual implementation of a routing operation. This will allow for future routing changes which may require more involved lookup logic, such as gateway routing - which may require multiple traversals of the routing table. Since we only use the struct mctp_route at lookup time, we no longer hold routes over a routing operation, as we only need it to populate the dst. However, we do hold the dev while the dst is active. This requires some changes to the route test infrastructure, as we no longer have a mock route to handle the route output operation, and transient dsts are created by the routing code, so we can't override them as easily. Instead, we use kunit->priv to stash a packet queue, and a custom dst_output function queues into that packet queue, which we can use for later expectations. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-3-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: make cloned_frag buffers more appropriately-sizedJeremy Kerr
In our input_cloned_frag test, we currently allocate our test buffers arbitrarily-sized at 100 bytes. We only expect to receive a max of 15 bytes from the socket, so reduce to a more appropriate size. There are some upcoming changes to the routing code which hit a frame-size limit on s390, so reduce the usage before that lands. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-2-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: don't use source cb data when forwarding, ensure pkt_type is setJeremy Kerr
In the output path, only check the skb->cb data when we know it's from a local socket; input packets will have source address information there instead. In order to detect when we're forwarding, set skb->pkt_type on input/output. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-1-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08platform/x86: ideapad-laptop: Fix kbd backlight not remembered among bootsRong Zhang
On some models supported by ideapad-laptop, the HW/FW can remember the state of keyboard backlight among boots. However, it is always turned off while shutting down, as a side effect of the LED class device unregistering sequence. This is inconvenient for users who always prefer turning on the keyboard backlight. Thus, set LED_RETAIN_AT_SHUTDOWN on the LED class device so that the state of keyboard backlight gets remembered, which also aligns with the behavior of manufacturer utilities on Windows. Fixes: 503325f84bc0 ("platform/x86: ideapad-laptop: add keyboard backlight control support") Cc: stable@vger.kernel.org Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Hans de Goede <hansg@kernel.org> Link: https://lore.kernel.org/r/20250707163808.155876-3-i@rong.moe Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-08platform/x86: ideapad-laptop: Fix FnLock not remembered among bootsRong Zhang
On devices supported by ideapad-laptop, the HW/FW can remember the FnLock state among boots. However, since the introduction of the FnLock LED class device, it is turned off while shutting down, as a side effect of the LED class device unregistering sequence. Many users always turn on FnLock because they use function keys much more frequently than multimedia keys. The behavior change is inconvenient for them. Thus, set LED_RETAIN_AT_SHUTDOWN on the LED class device so that the FnLock state gets remembered, which also aligns with the behavior of manufacturer utilities on Windows. Fixes: 07f48f668fac ("platform/x86: ideapad-laptop: add FnLock LED class device") Cc: stable@vger.kernel.org Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Hans de Goede <hansg@kernel.org> Link: https://lore.kernel.org/r/20250707163808.155876-2-i@rong.moe Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-08xfs: replace strncpy with memcpy in xattr listingPranav Tyagi
Use memcpy() in place of strncpy() in __xfs_xattr_put_listent(). The length is known and a null byte is added manually. No functional change intended. Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-08wifi: mac80211: fix rx link assignment for non-MLO stationsHari Chandrakanthan
Currently, ieee80211_rx_data_set_sta() does not correctly handle the case where the interface supports multiple links (MLO), but the station does not (non-MLO). This can lead to incorrect link assignment or unexpected warnings when accessing link information. Hence, add a fix to check if the station lacks valid link support and use its default link ID for rx->link assignment. If the station unexpectedly has valid links, fall back to the default link. This ensures correct link association and prevents potential issues in mixed MLO/non-MLO environments. Signed-off-by: Hari Chandrakanthan <quic_haric@quicinc.com> Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250630084119.3583593-1-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-08Merge branch 'add-broadcast_neighbor-for-no-stacking-networking-arch'Paolo Abeni
Tonghao Zhang says: ==================== add broadcast_neighbor for no-stacking networking arch For no-stacking networking arch, and enable the bond mode 4(lacp) in datacenter, the switch require arp/nd packets as session synchronization. More details please see patch. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Zengbing Tu <tuzengbing@didiglobal.com> ==================== Link: https://patch.msgid.link/cover.1751031306.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: bonding: send peer notify when failure recoveryTonghao Zhang
In LACP mode with broadcast_neighbor enabled, after LACP protocol recovery, the port can transmit packets. However, if the bond port doesn't send gratuitous ARP/ND packets to the switch, the switch won't return packets through the current interface. This causes traffic imbalance. To resolve this issue, when LACP protocol recovers, send ARP/ND packets if broadcast_neighbor is enabled. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/3993652dc093fffa9504ce1c2448fb9dea31d2d2.1751031306.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: bonding: add broadcast_neighbor netlink optionTonghao Zhang
User can config or display the bonding broadcast_neighbor option via iproute2/netlink. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/76b90700ba5b98027dfb51a2f3c5cfea0440a21b.1751031306.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: bonding: add broadcast_neighbor option for 802.3adTonghao Zhang
Stacking technology is a type of technology used to expand ports on Ethernet switches. It is widely used as a common access method in large-scale Internet data center architectures. Years of practice have proved that stacking technology has advantages and disadvantages in high-reliability network architecture scenarios. For instance, in stacking networking arch, conventional switch system upgrades require multiple stacked devices to restart at the same time. Therefore, it is inevitable that the business will be interrupted for a while. It is for this reason that "no-stacking" in data centers has become a trend. Additionally, when the stacking link connecting the switches fails or is abnormal, the stack will split. Although it is not common, it still happens in actual operation. The problem is that after the split, it is equivalent to two switches with the same configuration appearing in the network, causing network configuration conflicts and ultimately interrupting the services carried by the stacking system. To improve network stability, "non-stacking" solutions have been increasingly adopted, particularly by public cloud providers and tech companies like Alibaba, Tencent, and Didi. "non-stacking" is a method of mimicing switch stacking that convinces a LACP peer, bonding in this case, connected to a set of "non-stacked" switches that all of its ports are connected to a single switch (i.e., LACP aggregator), as if those switches were stacked. This enables the LACP peer's ports to aggregate together, and requires (a) special switch configuration, described in the linked article, and (b) modifications to the bonding 802.3ad (LACP) mode to send all ARP/ND packets across all ports of the active aggregator. Note that, with multiple aggregators, the current broadcast mode logic will send only packets to the selected aggregator(s). +-----------+ +-----------+ | switch1 | | switch2 | +-----------+ +-----------+ ^ ^ | | +-----------------+ | bond4 lacp | +-----------------+ | | | NIC1 | NIC2 +-----------------+ | server | +-----------------+ - https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/ Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com> Link: https://patch.msgid.link/84d0a044514157bb856a10b6d03a1028c4883561.1751031306.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08wifi: cfg80211: move away from using a fake platform deviceGreg Kroah-Hartman
Downloading regulatory "firmware" needs a device to hang off of, and so a platform device seemed like the simplest way to do this. Now that we have a faux device interface, use that instead as this "regulatory device" is not anything resembling a platform device at all. Cc: Johannes Berg <johannes@sipsolutions.net> Cc: <linux-wireless@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2025070116-growing-skeptic-494c@gregkh Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-08Merge tag 'mt76-next-2025-07-07' of https://github.com/nbd168/wirelessJohannes Berg
Felix Fietkay says: =================== mt76 patches for 6.17 - firmware recovery improvements for mt7915 - mlo improvements - fixes =================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-08Merge tag 'mt76-fixes-2025-07-07' of https://github.com/nbd168/wirelessJohannes Berg
Felix Fietkau says: =================== mt76 fixes for 6.16 =================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-07drm/xe/bmg: fix compressed VRAM handlingMatthew Auld
There looks to be an issue in our compression handling when the BO pages are very fragmented, where we choose to skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, such that we can hopefully do more work per blit operation. However in such a case we need to ensure the src PTEs are correctly tagged with a compression enabled PAT index on dgpu xe2+, otherwise the copy will simply treat the src memory as uncompressed, leading to corruption if the memory was compressed by the user. To fix this pass along use_comp_pat into emit_pte() on the src side, to indicate that compression should be considered. v2 (Jonathan): tweak the commit message Fixes: 523f191cc0c7 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com (cherry picked from commit f7a2fd776e57bd6468644bdecd91ab3aba57ba58) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"Matthew Brost
This reverts commit fe0154cf8222d9e38c60ccc124adb2f9b5272371. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit 3a1edef8f4b5 ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: fe0154cf8222 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 03d85ab36bcbcbe9dc962fccd3f8e54d7bb93b35) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07drm/xe: Allocate PF queue size on pow2 boundaryMatthew Brost
CIRC_SPACE does not work unless the size argument is a power of 2, allocate PF queue size on power of 2 boundary. Cc: stable@vger.kernel.org Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size") Fixes: 29582e0ea75c ("drm/xe: Add page queue multiplier") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250702213511.3226167-1-matthew.brost@intel.com (cherry picked from commit 491b9783126303755717c0cbde0b08ee59b6abab) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07drm/xe/pf: Clear all LMTT pages on allocMichal Wajdeczko
Our LMEM buffer objects are not cleared by default on alloc and during VF provisioning we only setup LMTT PTEs for the actually provisioned LMEM range. But beyond that valid range we might leave some stale data that could either point to some other VFs allocations or even to the PF pages. Explicitly clear all new LMTT page to avoid the risk that a malicious VF would try to exploit that gap. While around add asserts to catch any undesired PTE overwrites and low-level debug traces to track LMTT PT life-cycle. Fixes: b1d204058218 ("drm/xe/pf: Introduce Local Memory Translation Table") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250701220052.1612-1-michal.wajdeczko@intel.com (cherry picked from commit 3fae6918a3e27cce20ded2551f863fb05d4bef8d) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-07Merge branch 'net-mlx5-hws-optimize-matchers-icm-usage'Jakub Kicinski
Mark Bloch says: ==================== net/mlx5: HWS, Optimize matchers ICM usage This series optimizes ICM usage for unidirectional rules and empty matchers and with the last patch we make hardware steering the default FDB steering provider for NICs that don't support software steering. Hardware steering (HWS) uses a type of rule table container (RTC) that is unidirectional, so matchers consist of two RTCs to accommodate bidirectional rules. This small series enables resizing the two RTCs independently by tracking the number of rules separately. For extreme cases where all rules are unidirectional, this results in saving close to half the memory footprint. Results for inserting 1M unidirectional rules using a simple module: Pages Memory Before this patch: 300k 1.5GiB After this patch: 160k 900MiB The 'Pages' column measures the number of 4KiB pages the device requests for itself (the ICM). The 'Memory' column is the difference between peak usage and baseline usage (before starting the test) as reported by `free -h`. In addition, second to last patch of the series handles a case where all the matcher's rules were deleted: the large RTCs of the matcher are no longer required, and we can save some more ICM by shrinking the matcher to its initial size. Finally the last patch makes hardware steering the default mode when in swichdev for NICs that don't have software steering support. ==================== Link: https://patch.msgid.link/20250703185431.445571-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: Add HWS as secondary steering modeMoshe Shemesh
Add HW Steering (HWS) as a secondary option for device steering mode. If the device does not support SW Steering (SWS), HW Steering will be used as the default, provided it is supported. FW Steering will now be selected as the default only if both HWS and SWS are unavailable. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250703185431.445571-11-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, Shrink empty matchersYevgeny Kliteynik
Matcher size is dynamic: it starts at initial size, and then it grows through rehash as more and more rules are added to this matcher. When rules are deleted, matcher's size is not decreased. Rehash approach is greedy. The idea is: if the matcher got to a certain size at some point, chances are - it will get to this size again, so it is better to avoid costly rehash operations whenever possible. However, when all the rules of the matcher are deleted, this should be viewed as special case. If the matcher actually got to the point where it has zero rules, it might be an indication that some usecase from the past is no longer happening. This is where some ICM can be freed. This patch handles this case: when a number of rules in a matcher goes down to zero, the matcher's tables are shrunk to the initial size. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250703185431.445571-10-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, Rearrange to prevent forward declarationYevgeny Kliteynik
As a preparation for the following patch that will add support for shrinking empty matchers, rearrange the code to prevent forward declaration of functions. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250703185431.445571-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, Track matcher sizes individuallyVlad Dogaru
Track and grow matcher sizes individually for RX and TX RTCs. This allows RX-only or TX-only use cases to effectively halve the device resources they use. For testing we used a simple module that inserts 1M RX-only rules and measured the number of pages the device requests, and memory usage as reported by `free -h`. Pages Memory Before this patch: 300k 1.5GiB After this patch: 160k 900MiB Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250703185431.445571-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, Decouple matcher RX and TX sizesVlad Dogaru
Kernel HWS only uses FDB tables and, as such, creates two lower level containers (RTCs) for each matcher: one for RX and one for TX. Allow these RTCs to differ in size by converting the size part of the matcher attribute to a two element array. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250703185431.445571-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, Create STEs directly from matcherVlad Dogaru
Matchers were using the pool abstraction solely as a convenience to allocate two STE ranges. The pool's core functionality, that of allocating individual items from the range, was unused. Matchers rely either on the hardware to hash rules into a table, or on a user-provided index. Remove the STE pool from the matcher and allocate the STE ranges manually instead. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250703185431.445571-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, Refactor rule skip logicVlad Dogaru
Reduce nesting by adding a couple of early return statements. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250703185431.445571-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, Export rule skip logicVlad Dogaru
The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of RX and TX rules individually, so export this function for future usage. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250703185431.445571-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-07net/mlx5: HWS, remove incorrect commentYevgeny Kliteynik
Removing incorrect comment section that is probably some copy-paste artifact. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250703185431.445571-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>