summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-05ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-onlyGeert Uytterhoeven
if CONFIG_NET_IPGRE is enabled, but CONFIG_IPV6 is disabled: net/ipv4/ip_gre.c: In function ‘ipgre_err’: net/ipv4/ip_gre.c:144:22: error: variable ‘data_len’ set but not used [-Werror=unused-but-set-variable] 144 | unsigned int data_len = 0; | ^~~~~~~~ Fix this by moving all data_len processing inside the IPV6-only section that uses its result. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501121007.2GofXmh5-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d09113cfe2bfaca02f3dddf832fb5f48dd20958b.1738704881.git.geert@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge branch 'rxrpc-call-state-fixes'Jakub Kicinski
David Howells says: ==================== rxrpc: Call state fixes Here some call state fixes for AF_RXRPC. (1) Fix the state of a call to not treat the challenge-response cycle as part of an incoming call's state set. The problem is that it makes handling received of the final packet in the receive phase difficult as that wants to change the call state - but security negotiations may not yet be complete. (2) Fix a race between the changing of the call state at the end of the request reception phase of a service call, recvmsg() collecting the last data and sendmsg() trying to send the reply before the I/O thread has advanced the call state. Link: https://lore.kernel.org/20250203110307.7265-2-dhowells@redhat.com ==================== Link: https://patch.msgid.link/20250204230558.712536-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05rxrpc: Fix race in call state changing vs recvmsg()David Howells
There's a race in between the rxrpc I/O thread recording the end of the receive phase of a call and recvmsg() examining the state of the call to determine whether it has completed. The problem is that call->_state records the I/O thread's view of the call, not the application's view (which may lag), so that alone is not sufficient. To this end, the application also checks whether there is anything left in call->recvmsg_queue for it to pick up. The call must be in state RXRPC_CALL_COMPLETE and the recvmsg_queue empty for the call to be considered fully complete. In rxrpc_input_queue_data(), the latest skbuff is added to the queue and then, if it was marked as LAST_PACKET, the state is advanced... But this is two separate operations with no locking around them. As a consequence, the lack of locking means that sendmsg() can jump into the gap on a service call and attempt to send the reply - but then get rejected because the I/O thread hasn't advanced the state yet. Simply flipping the order in which things are done isn't an option as that impacts the client side, causing the checks in rxrpc_kernel_check_life() as to whether the call is still alive to race instead. Fix this by moving the update of call->_state inside the skb queue spinlocked section where the packet is queued on the I/O thread side. rxrpc's recvmsg() will then automatically sync against this because it has to take the call->recvmsg_queue spinlock in order to dequeue the last packet. rxrpc's sendmsg() doesn't need amending as the app shouldn't be calling it to send a reply until recvmsg() indicates it has returned all of the request. Fixes: 93368b6bd58a ("rxrpc: Move call state changes from recvmsg to I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250204230558.712536-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05rxrpc: Fix call state set to not include the SERVER_SECURING stateDavid Howells
The RXRPC_CALL_SERVER_SECURING state doesn't really belong with the other states in the call's state set as the other states govern the call's Rx/Tx phase transition and govern when packets can and can't be received or transmitted. The "Securing" state doesn't actually govern the reception of packets and would need to be split depending on whether or not we've received the last packet yet (to mirror RECV_REQUEST/ACK_REQUEST). The "Securing" state is more about whether or not we can start forwarding packets to the application as recvmsg will need to decode them and the decoding can't take place until the challenge/response exchange has completed. Fix this by removing the RXRPC_CALL_SERVER_SECURING state from the state set and, instead, using a flag, RXRPC_CALL_CONN_CHALLENGING, to track whether or not we can queue the call for reception by recvmsg() or notify the kernel app that data is ready. In the event that we've already received all the packets, the connection event handler will poke the app layer in the appropriate manner. Also there's a race whereby the app layer sees the last packet before rxrpc has managed to end the rx phase and change the state to one amenable to allowing a reply. Fix this by queuing the packet after calling rxrpc_end_rx_phase(). Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250204230558.712536-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net: sched: Fix truncation of offloaded action statisticsIdo Schimmel
In case of tc offload, when user space queries the kernel for tc action statistics, tc will query the offloaded statistics from device drivers. Among other statistics, drivers are expected to pass the number of packets that hit the action since the last query as a 64-bit number. Unfortunately, tc treats the number of packets as a 32-bit number, leading to truncation and incorrect statistics when the number of packets since the last query exceeds 0xffffffff: $ tc -s filter show dev swp2 ingress filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 skip_sw in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device swp1) stolen index 1 ref 1 bind 1 installed 58 sec used 0 sec Action statistics: Sent 1133877034176 bytes 536959475 pkt (dropped 0, overlimits 0 requeues 0) [...] According to the above, 2111-byte packets were redirected which is impossible as only 64-byte packets were transmitted and the MTU was 1500. Fix by treating packets as a 64-bit number: $ tc -s filter show dev swp2 ingress filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 skip_sw in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device swp1) stolen index 1 ref 1 bind 1 installed 61 sec used 0 sec Action statistics: Sent 1370624380864 bytes 21416005951 pkt (dropped 0, overlimits 0 requeues 0) [...] Which shows that only 64-byte packets were redirected (1370624380864 / 21416005951 = 64). Fixes: 380407023526 ("net/sched: Enable netdev drivers to update statistics of offloaded actions") Reported-by: Joe Botha <joe@atomic.ac> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250204123839.1151804-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05r8169: don't scan PHY addresses > 0Heiner Kallweit
The PHY address is a dummy, because r8169 PHY access registers don't support a PHY address. Therefore scan address 0 only. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/830637dd-4016-4a68-92b3-618fcac6589d@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05tun: revert fix group permission checkWillem de Bruijn
This reverts commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3. The blamed commit caused a regression when neither tun->owner nor tun->group is set. This is intended to be allowed, but now requires CAP_NET_ADMIN. Discussion in the referenced thread pointed out that the original issue that prompted this patch can be resolved in userspace. The relaxed access control may also make a device accessible when it previously wasn't, while existing users may depend on it to not be. This is a clean pure git revert, except for fixing the indentation on the gid_valid line that checkpatch correctly flagged. Fixes: 3ca459eaba1b ("tun: fix group permission check") Link: https://lore.kernel.org/netdev/CAFqZXNtkCBT4f+PwyVRmQGoT3p1eVa01fCG_aNtpt6dakXncUg@mail.gmail.com/ Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Stas Sergeev <stsp2@yandex.ru> Link: https://patch.msgid.link/20250204161015.739430-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net: flush_backlog() small changesEric Dumazet
Add READ_ONCE() around reads of skb->dev->reg_state, because this field can be changed from other threads/cpus. Instead of calling dev_kfree_skb_irq() and kfree_skb() while interrupts are masked and locks held, use a temporary list and use __skb_queue_purge_reason() Use SKB_DROP_REASON_DEV_READY drop reason to better describe why these skbs are dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250204144825.316785-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05s390/net: Remove LCS driverAswin Karuvally
The original Open Systems Adapter (OSA) was introduced by IBM in the mid-90s. These were then superseded by OSA-Express in 1999 which used Queued Direct IO to greatly improve throughput. The newer cards retained the older, slower non-QDIO (OSE) modes for compatibility with older systems. In Linux, the lcs driver was responsible for cards operating in the older OSE mode and the qeth driver was introduced to allow the OSA-Express cards to operate in the newer QDIO (OSD) mode. For an S390 machine from 1998 or later, there is no reason to use the OSE mode and lcs driver as all OSA cards since 1999 provide the faster OSD mode. As a result, it's been years since we have heard of a customer configuration involving the lcs driver. This patch removes the lcs driver. The technology it supports has been obsolete for past 25+ years and is irrelevant for current use cases. Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250204103135.1619097-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05cxgb4: Avoid a -Wflex-array-member-not-at-end warningGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Move the conflicting declaration to the end of the structure. Notice that `struct ethtool_dump` is a flexible structure --a structure that contains a flexible-array member. Fix the following warning: ./drivers/net/ethernet/chelsio/cxgb4/cxgb4.h:1215:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/Z6GBZ4brXYffLkt_@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge branch 'net_sched-two-security-bug-fixes-and-test-cases'Jakub Kicinski
Cong Wang says: ==================== net_sched: two security bug fixes and test cases This patchset contains two bug fixes reported in security mailing list, and test cases for both of them. ==================== Link: https://patch.msgid.link/20250204005841.223511-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog()Cong Wang
Integrate the test case provided by Mingi Cho into TDC. All test results: 1..4 ok 1 ca5e - Check class delete notification for ffff: ok 2 e4b7 - Check class delete notification for root ffff: ok 3 33a9 - Check ingress is not searchable on backlog update ok 4 a4b9 - Test class qlen notification Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-5-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05netem: Update sch->q.qlen before qdisc_tree_reduce_backlog()Cong Wang
qdisc_tree_reduce_backlog() notifies parent qdisc only if child qdisc becomes empty, therefore we need to reduce the backlog of the child qdisc before calling it. Otherwise it would miss the opportunity to call cops->qlen_notify(), in the case of DRR, it resulted in UAF since DRR uses ->qlen_notify() to maintain its active list. Fixes: f8d4bc455047 ("net/sched: netem: account for backlog updates from child qdisc") Cc: Martin Ottens <martin.ottens@fau.de> Reported-by: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-4-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0Quang Le
When limit == 0, pfifo_tail_enqueue() must drop new packet and increase dropped packets count of the qdisc. All test results: 1..16 ok 1 a519 - Add bfifo qdisc with system default parameters on egress ok 2 585c - Add pfifo qdisc with system default parameters on egress ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument ok 8 7787 - Add pfifo qdisc on egress with unsupported argument ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size ok 10 3df6 - Replace pfifo qdisc on egress with new queue size ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format ok 12 1298 - Add duplicate bfifo qdisc on egress ok 13 45a0 - Delete nonexistent bfifo qdisc ok 14 972b - Add prio qdisc on egress with invalid format for handles ok 15 4d39 - Delete bfifo qdisc twice ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0 Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05pfifo_tail_enqueue: Drop new packet when sch->limit == 0Quang Le
Expected behaviour: In case we reach scheduler's limit, pfifo_tail_enqueue() will drop a packet in scheduler's queue and decrease scheduler's qlen by one. Then, pfifo_tail_enqueue() enqueue new packet and increase scheduler's qlen by one. Finally, pfifo_tail_enqueue() return `NET_XMIT_CN` status code. Weird behaviour: In case we set `sch->limit == 0` and trigger pfifo_tail_enqueue() on a scheduler that has no packet, the 'drop a packet' step will do nothing. This means the scheduler's qlen still has value equal 0. Then, we continue to enqueue new packet and increase scheduler's qlen by one. In summary, we can leverage pfifo_tail_enqueue() to increase qlen by one and return `NET_XMIT_CN` status code. The problem is: Let's say we have two qdiscs: Qdisc_A and Qdisc_B. - Qdisc_A's type must have '->graft()' function to create parent/child relationship. Let's say Qdisc_A's type is `hfsc`. Enqueue packet to this qdisc will trigger `hfsc_enqueue`. - Qdisc_B's type is pfifo_head_drop. Enqueue packet to this qdisc will trigger `pfifo_tail_enqueue`. - Qdisc_B is configured to have `sch->limit == 0`. - Qdisc_A is configured to route the enqueued's packet to Qdisc_B. Enqueue packet through Qdisc_A will lead to: - hfsc_enqueue(Qdisc_A) -> pfifo_tail_enqueue(Qdisc_B) - Qdisc_B->q.qlen += 1 - pfifo_tail_enqueue() return `NET_XMIT_CN` - hfsc_enqueue() check for `NET_XMIT_SUCCESS` and see `NET_XMIT_CN` => hfsc_enqueue() don't increase qlen of Qdisc_A. The whole process lead to a situation where Qdisc_A->q.qlen == 0 and Qdisc_B->q.qlen == 1. Replace 'hfsc' with other type (for example: 'drr') still lead to the same problem. This violate the design where parent's qlen should equal to the sum of its childrens'qlen. Bug impact: This issue can be used for user->kernel privilege escalation when it is reachable. Fixes: 57dbb2d83d10 ("sched: add head drop fifo queue") Reported-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests: mptcp: connect: -f: no reconnectMatthieu Baerts (NGI0)
The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE by closing the connection with unread bytes in the receive queue. The xdisconnect() helper was used to stop the connection, but it does more than that: it will shut it down, then wait before reconnecting to the same address. This causes the mptcp_join's "fastclose test" to fail all the time. This failure is due to a recent change, with commit 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect"), but that went unnoticed because the test is currently ignored. The recent modification only shown an existing issue: xdisconnect() doesn't need to be used here, only the shutdown() part is needed. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250204-net-mptcp-sft-conn-f-v1-1-6b470c72fffa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05bridge: mdb: Allow replace of a host-joined groupPetr Machata
Attempts to replace an MDB group membership of the host itself are currently bounced: # ip link add name br up type bridge vlan_filtering 1 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2 Error: bridge: Group is already joined by host. A similar operation done on a member port would succeed. Ignore the check for replacement of host group memberships as well. The bit of code that this enables is br_multicast_host_join(), which, for already-joined groups only refreshes the MC group expiration timer, which is desirable; and a userspace notification, also desirable. Change a selftest that exercises this code path from expecting a rejection to expecting a pass. The rest of MDB selftests pass without modification. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/e5c5188b9787ae806609e7ca3aa2a0a501b9b5c4.1738685648.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests: net: suppress ReST file generation when building selftestsJakub Kicinski
Some selftests need libynl.a. When building it try to skip generating the ReST documentation, libynl.a does not depend on them. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203214850.1282291-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge branch 'net-sysfs-remove-the-rtnl_trylock-restart_syscall-construction'Jakub Kicinski
Antoine Tenart says: ==================== net-sysfs: remove the rtnl_trylock/restart_syscall construction The series initially aimed at improving spins (and thus delays) while accessing net sysfs under rtnl lock contention[1]. The culprit was the trylock/restart_syscall constructions. There wasn't much interest at the time but it got traction recently for other reasons (lowering the rtnl lock pressure). Since v1[2]: - Do not export rtnl_lock_interruptible [Stephen]. - Add netdev_warn_once messages in rx_queue_add_kobject [Jakub]. Since the RFC[1]: - Limit the breaking of the sysfs protection to sysfs_rtnl_lock() only as this is not needed in the whole rtnl locking section thanks to the additional check on dev_isalive(). This simplifies error handling as well as the unlocking path. - Used an interruptible version of rtnl_lock, as done by Jakub in his experiments. - Removed a WARN_ONCE_ONCE [Greg]. - Removed explicit inline markers [Stephen]. Most of the reasoning is explained in comments added in patch 1. This was tested by stress-testing net sysfs attributes (read/write ops) while adding/removing queues and adding/removing veths, all in parallel. I also used an OCP single node cluster, spawning lots of pods. [1] https://lore.kernel.org/all/20231018154804.420823-1-atenart@kernel.org/T/ [2] https://lore.kernel.org/all/20250117102612.132644-1-atenart@kernel.org/T/ ==================== Link: https://patch.msgid.link/20250204170314.146022-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: remove rtnl_trylock from queue attributesAntoine Tenart
Similar to the commit removing remove rtnl_trylock from device attributes we here apply the same technique to networking queues. Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-5-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: prevent uncleared queues from being re-addedAntoine Tenart
With the (upcoming) removal of the rtnl_trylock/restart_syscall logic and because of how Tx/Rx queues are implemented (and their requirements), it might happen that a queue is re-added before having the chance to be cleared. In such rare case, do not complete the queue addition operation. Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-4-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: move queue attribute groups outside the default groupsAntoine Tenart
Rx/tx queues embed their own kobject for registering their per-queue sysfs files. The issue is they're using the kobject default groups for this and entirely rely on the kobject refcounting for releasing their sysfs paths. In order to remove rtnl_trylock calls we need sysfs files not to rely on their associated kobject refcounting for their release. Thus we here move queues sysfs files from the kobject default groups to their own groups which can be removed separately. Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-3-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: remove rtnl_trylock from device attributesAntoine Tenart
There is an ABBA deadlock between net device unregistration and sysfs files being accessed[1][2]. To prevent this from happening all paths taking the rtnl lock after the sysfs one (actually kn->active refcount) use rtnl_trylock and return early (using restart_syscall)[3], which can make syscalls to spin for a long time when there is contention on the rtnl lock[4]. There are not many possibilities to improve the above: - Rework the entire net/ locking logic. - Invert two locks in one of the paths — not possible. But here it's actually possible to drop one of the locks safely: the kernfs_node refcount. More details in the code itself, which comes with lots of comments. Note that we check the device is alive in the added sysfs_rtnl_lock helper to disallow sysfs operations to run after device dismantle has started. This also help keeping the same behavior as before. Because of this calls to dev_isalive in sysfs ops were removed. [1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/ [2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/ [3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/ [4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/ Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-2-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge tag 'for-6.14-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - add lockdep annotation for relocation root to fix a splat warning while merging roots - fix assertion failure when splitting ordered extent after transaction abort - don't print 'qgroup inconsistent' message when rescan process updates qgroup data sooner than the subvolume deletion process - fix use-after-free (accessing the error number) when attempting to join an aborted transaction - avoid starting new transaction if not necessary when cleaning qgroup during subvolume drop * tag 'for-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: avoid starting new transaction when cleaning qgroup during subvolume drop btrfs: fix use-after-free when attempting to join an aborted transaction btrfs: do not output error message if a qgroup has been already cleaned up btrfs: fix assertion failure when splitting ordered extent after transaction abort btrfs: fix lockdep splat while merging a relocation root
2025-02-05net: phy: realtek: use string choices helpersHeiner Kallweit
Use string choices helpers to simplify the code. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501190707.qQS8PGHW-lkp@intel.com/ Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-04r8169: make Kconfig option for LED support user-visibleHeiner Kallweit
Make config option R8169_LEDS user-visible, so that users can remove support if not needed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d29f0cdb-32bf-435f-b59d-dc96bca1e3ab@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: phy: realtek: make HWMON support a user-visible Kconfig symbolHeiner Kallweit
Make config symbol REALTEK_PHY_HWMON user-visible, so that users can remove support if not needed. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/3466ee92-166a-4b0f-9ae7-42b9e046f333@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04netconsole: selftest: Add test for fragmented messagesBreno Leitao
Add a new selftest to verify netconsole's handling of messages that exceed the packet size limit and require fragmentation. The test sends messages with varying sizes and userdata, validating that: 1. Large messages are correctly fragmented and reassembled 2. Userdata fields are properly preserved across fragments 3. Messages work correctly with and without kernel release version appending The test creates a networking environment using netdevsim, sends messages through /dev/kmsg, and verifies the received fragments maintain message integrity. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: atlantic: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unused flexible-array member `buf` and, with this, fix the following warnings: drivers/net/ethernet/aquantia/atlantic/aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/ethernet/aquantia/atlantic/hw_atl/../aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Suggested-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Link: https://patch.msgid.link/Z6F3KZVfnAZ2FoJm@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: warn if NAPI instance wasn't shut downJakub Kicinski
Drivers should always disable a NAPI instance before removing it. If they don't the instance may be queued for polling. Since commit 86e25f40aa1e ("net: napi: Add napi_config") we also remove the NAPI from the busy polling hash table in napi_disable(), so not disabling would leave a stale entry there. Use of busy polling is relatively uncommon so bugs may be lurking in the drivers. Add an explicit warning. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250203215816.1294081-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04cavium/liquidio: Remove unused lio_get_device_idDr. David Alan Gilbert
lio_get_device_id() has been unused since 2018's commit 64fecd3ec512 ("liquidio: remove obsolete functions and data structures") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203183343.193691-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04mlxsw: spectrum_router: Remove unused functionsDr. David Alan Gilbert
mlxsw_sp_ipip_lb_ul_vr_id() has been unused since 2020's commit acde33bf7319 ("mlxsw: spectrum_router: Reduce mlxsw_sp_ipip_fib_entry_op_gre4()") mlxsw_sp_rif_exists() has been unused since 2023's commit 49c3a615d382 ("mlxsw: spectrum_router: Replay MACVLANs when RIF is made") mlxsw_sp_rif_vid() has been unused since 2023's commit a5b52692e693 ("mlxsw: spectrum_switchdev: Manage RIFs on PVID change") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250203190141.204951-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net/mlx5: Remove unused mlx5dr_domain_syncDr. David Alan Gilbert
mlx5dr_domain_sync() was added in 2019 by commit 70605ea545e8 ("net/mlx5: DR, Expose APIs for direct rule managing") but hasn't been used. Remove it. mlx5dr_domain_sync() was the only user of mlx5dr_send_ring_force_drain(). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250203185958.204794-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04mlx4: Remove unused functionsDr. David Alan Gilbert
The last use of mlx4_find_cached_mac() was removed in 2014 by commit 2f5bb473681b ("mlx4: Add ref counting to port MAC table for RoCE") mlx4_zone_free_entries() was added in 2014 by commit 7a89399ffad7 ("net/mlx4: Add mlx4_bitmap zone allocator") but hasn't been used. (The _unique version is used) Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250203185229.204279-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: qed: fix typosAndrew Kreimer
There are some typos in comments/messages: - Valiate -> Validate - acceptible -> acceptable - acces -> access - relased -> released Fix them via codespell. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203175419.4146-1-algonell@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: rose: lock the socket in rose_bind()Eric Dumazet
syzbot reported a soft lockup in rose_loopback_timer(), with a repro calling bind() from multiple threads. rose_bind() must lock the socket to avoid this issue. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+7ff41b5215f0c534534e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67a0f78d.050a0220.d7c5a.00a0.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250203170838.3521361-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04dt-bindings: net: faraday,ftgmac100: Add phys modeNinad Palsule
Aspeed device supports rgmii, rgmii-id, rgmii-rxid, rgmii-txid so document them. Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Ninad Palsule <ninad@linux.ibm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250203151306.276358-2-ninad@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04neighbour: remove neigh_parms_destroy()Eric Dumazet
neigh_parms_destroy() is a simple kfree(), no need for a forward declaration. neigh_parms_put() can instead call kfree() directly. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203151152.3163876-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: atlantic: fix warning during hot unplugJacob Moroni
Firmware deinitialization performs MMIO accesses which are not necessary if the device has already been removed. In some cases, these accesses happen via readx_poll_timeout_atomic which ends up timing out, resulting in a warning at hw_atl2_utils_fw.c:112: [ 104.595913] Call Trace: [ 104.595915] <TASK> [ 104.595918] ? show_regs+0x6c/0x80 [ 104.595923] ? __warn+0x8d/0x150 [ 104.595925] ? aq_a2_fw_deinit+0xcf/0xe0 [atlantic] [ 104.595934] ? report_bug+0x182/0x1b0 [ 104.595938] ? handle_bug+0x6e/0xb0 [ 104.595940] ? exc_invalid_op+0x18/0x80 [ 104.595942] ? asm_exc_invalid_op+0x1b/0x20 [ 104.595944] ? aq_a2_fw_deinit+0xcf/0xe0 [atlantic] [ 104.595952] ? aq_a2_fw_deinit+0xcf/0xe0 [atlantic] [ 104.595959] aq_nic_deinit.part.0+0xbd/0xf0 [atlantic] [ 104.595964] aq_nic_deinit+0x17/0x30 [atlantic] [ 104.595970] aq_ndev_close+0x2b/0x40 [atlantic] [ 104.595975] __dev_close_many+0xad/0x160 [ 104.595978] dev_close_many+0x99/0x170 [ 104.595979] unregister_netdevice_many_notify+0x18b/0xb20 [ 104.595981] ? __call_rcu_common+0xcd/0x700 [ 104.595984] unregister_netdevice_queue+0xc6/0x110 [ 104.595986] unregister_netdev+0x1c/0x30 [ 104.595988] aq_pci_remove+0xb1/0xc0 [atlantic] Fix this by skipping firmware deinitialization altogether if the PCI device is no longer present. Tested with an AQC113 attached via Thunderbolt by performing repeated unplug cycles while traffic was running via iperf. Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code") Signed-off-by: Jacob Moroni <mail@jakemoroni.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203143604.24930-3-mail@jakemoroni.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04bonding: delete always true device checkLeon Romanovsky
XFRM API makes sure that xs->xso.dev is valid in all XFRM offload callbacks. There is no need to check it again. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/0b2f8f5f09701bb43bbd83b94bfe5cb506b57adc.1738587150.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04Merge tag 'kthreads-fixes-2025-02-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks Pull kthreads fix from Frederic Weisbecker: - Properly handle return value when allocation fails for the preferred affinity * tag 'kthreads-fixes-2025-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: kthread: Fix return value on kzalloc() failure in kthread_affine_preferred()
2025-02-04Merge tag 'livepatching-for-6.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching fix from Petr Mladek: - Fix livepatching selftests for util-linux-2.40.x * tag 'livepatching-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: selftests: livepatch: handle PRINTK_CALLER in check_result()
2025-02-04Merge tag 'platform-drivers-x86-v6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - ideapad-laptop: Pass a correct pointer to the driver data - intel/ifs: Provide a link to the IFS test images - intel/pmc: Use large enough type when decoding LTR value * tag 'platform-drivers-x86-v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/ifs: Update documentation with image download path platform/x86/intel: pmc: fix ltr decode in pmc_core_ltr_show() platform/x86: ideapad-laptop: pass a correct pointer to the driver data
2025-02-04rxrpc: Fix the rxrpc_connection attend queue handlingDavid Howells
The rxrpc_connection attend queue is never used because conn::attend_link is never initialised and so is always NULL'd out and thus always appears to be busy. This requires the following fix: (1) Fix this the attend queue problem by initialising conn::attend_link. And, consequently, two further fixes for things masked by the above bug: (2) Fix rxrpc_input_conn_event() to handle being invoked with a NULL sk_buff pointer - something that can now happen with the above change. (3) Fix the RXRPC_SKB_MARK_SERVICE_CONN_SECURED message to carry a pointer to the connection and a ref on it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jakub Kicinski <kuba@kernel.org> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Paolo Abeni <pabeni@redhat.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Fixes: f2cce89a074e ("rxrpc: Implement a mechanism to send an event notification to a connection") Link: https://patch.msgid.link/20250203110307.7265-3-dhowells@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-04platform/x86/intel/ifs: Update documentation with image download pathJithu Joseph
The documentation previously listed the path to download In Field Scan (IFS) test images as "TBD". Update the documentation to include the correct image download location. Also move the download link to the appropriate section within the documentation. Reported-by: Anisse Astier <anisse@astier.eu> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Link: https://lore.kernel.org/r/20250131205315.1585663-1-jithu.joseph@intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-02-03net: harmonize tstats and dstatsPaolo Abeni
After the blamed commits below, some UDP tunnel use dstats for accounting. On the xmit path, all the UDP-base tunnels ends up using iptunnel_xmit_stats() for stats accounting, and the latter assumes the relevant (tunnel) network device uses tstats. The end result is some 'funny' stat report for the mentioned UDP tunnel, e.g. when no packet is actually dropped and a bunch of packets are transmitted: gnv2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue \ state UNKNOWN mode DEFAULT group default qlen 1000 link/ether ee:7d:09:87:90:ea brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 14916 23 0 15 0 0 TX: bytes packets errors dropped carrier collsns 0 1566 0 0 0 0 Address the issue ensuring the same binary layout for the overlapping fields of dstats and tstats. While this solution is a bit hackish, is smaller and with no performance pitfall compared to other alternatives i.e. supporting both dstat and tstat in iptunnel_xmit_stats() or reverting the blamed commit. With time we should possibly move all the IP-based tunnel (and virtual devices) to dstats. Fixes: c77200c07491 ("bareudp: Handle stats using NETDEV_PCPU_STAT_DSTATS.") Fixes: 6fa6de302246 ("geneve: Handle stats using NETDEV_PCPU_STAT_DSTATS.") Fixes: be226352e8dc ("vxlan: Handle stats using NETDEV_PCPU_STAT_DSTATS.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/2e1c444cf0f63ae472baff29862c4c869be17031.1738432804.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-03Merge branch 'ethtool-rss-minor-fixes-for-recent-rss-changes'Jakub Kicinski
Jakub Kicinski says: ==================== ethtool: rss: minor fixes for recent RSS changes Make sure RSS_GET messages are consistent in do and dump. Fix up a recently added safety check for RSS + queue offset. Adjust related tests so that they pass on devices which don't support RSS + queue offset. ==================== Link: https://patch.msgid.link/20250201013040.725123-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-03selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not ↵Jakub Kicinski
supported Vast majority of drivers does not support queue offset. Simply return if the rss context + queue ntuple fails. Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250201013040.725123-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-03selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigureJakub Kicinski
Commit under Fixes adds ntuple rules but never deletes them. Fixes: 29a4bc1fe961 ("selftest: extend test_rss_context_queue_reconfigure for action addition") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250201013040.725123-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-03ethtool: ntuple: fix rss + ring_cookie checkJakub Kicinski
The info.flow_type is for RXFH commands, ntuple flow_type is inside the flow spec. The check currently does nothing, as info.flow_type is 0 (or even uninitialized by user space) for ETHTOOL_SRXCLSRLINS. Fixes: 9e43ad7a1ede ("net: ethtool: only allow set_rxnfc with rss + ring_cookie if driver opts in") Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250201013040.725123-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>