summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-08ipv6: ndisc: Remove __in6_dev_get() in pndisc_{constructor,destructor}().Kuniyuki Iwashima
ipv6_dev_mc_{inc,dec}() has the same check. Let's remove __in6_dev_get() from pndisc_constructor() and pndisc_destructor(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250702230210.3115355-2-kuni1840@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'net-xsk-update-tx-queue-consumer'Jakub Kicinski
Jason Xing says: ==================== net: xsk: update tx queue consumer Patch 1 makes sure the consumer is updated at the end of generic xmit. Patch 2 adds corresponding test. ==================== Link: https://patch.msgid.link/20250703141712.33190-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08selftests/bpf: add a new test to check the consumer update caseJason Xing
The subtest sends 33 packets at one time on purpose to see if xsk exitting __xsk_generic_xmit() updates the global consumer of tx queue when reaching the max loop (max_tx_budget, 32 by default). The number 33 can avoid xskq_cons_peek_desc() updates the consumer when it's about to quit sending, to accurately check if the issue that the first patch resolves remains. The new case will not check this issue in zero copy mode. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250703141712.33190-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: xsk: update tx queue consumer immediately after transmissionJason Xing
For afxdp, the return value of sendto() syscall doesn't reflect how many descs handled in the kernel. One of use cases is that when user-space application tries to know the number of transmitted skbs and then decides if it continues to send, say, is it stopped due to max tx budget? The following formular can be used after sending to learn how many skbs/descs the kernel takes care of: tx_queue.consumers_before - tx_queue.consumers_after Prior to the current patch, in non-zc mode, the consumer of tx queue is not immediately updated at the end of each sendto syscall when error occurs, which leads to the consumer value out-of-dated from the perspective of user space. So this patch requires store operation to pass the cached value to the shared value to handle the problem. More than those explicit errors appearing in the while() loop in __xsk_generic_xmit(), there are a few possible error cases that might be neglected in the following call trace: __xsk_generic_xmit() xskq_cons_peek_desc() xskq_cons_read_desc() xskq_cons_is_valid_desc() It will also cause the premature exit in the while() loop even if not all the descs are consumed. Based on the above analysis, using @sent_frame could cover all the possible cases where it might lead to out-of-dated global state of consumer after finishing __xsk_generic_xmit(). The patch also adds a common helper __xsk_tx_release() to keep align with the zc mode usage in xsk_tx_release(). Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250703141712.33190-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'af_unix-introduce-so_inq-scm_inq'Jakub Kicinski
Kuniyuki Iwashima says: ==================== af_unix: Introduce SO_INQ & SCM_INQ. We have an application that uses almost the same code for TCP and AF_UNIX (SOCK_STREAM). The application uses TCP_INQ for TCP, but AF_UNIX doesn't have it and requires an extra syscall, ioctl(SIOCINQ) or getsockopt(SO_MEMINFO) as an alternative. Also, ioctl(SIOCINQ) for AF_UNIX SOCK_STREAM is more expensive because it needs to iterate all skb in the receive queue. This series adds a cached field for SIOCINQ to speed it up and introduce SO_INQ, the generic version of TCP_INQ to get the queue length as cmsg in each recvmsg(). ==================== Link: https://patch.msgid.link/20250702223606.1054680-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08selftest: af_unix: Add test for SO_INQ.Kuniyuki Iwashima
Let's add a simple test to check the basic functionality of SO_INQ. The test does the following: 1. Create socketpair in self->fd[] 2. Enable SO_INQ 3. Send data via self->fd[0] 4. Receive data from self->fd[1] 5. Compare the SCM_INQ cmsg with ioctl(SIOCINQ) Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08af_unix: Introduce SO_INQ.Kuniyuki Iwashima
We have an application that uses almost the same code for TCP and AF_UNIX (SOCK_STREAM). TCP can use TCP_INQ, but AF_UNIX doesn't have it and requires an extra syscall, ioctl(SIOCINQ) or getsockopt(SO_MEMINFO) as an alternative. Let's introduce the generic version of TCP_INQ. If SO_INQ is enabled, recvmsg() will put a cmsg of SCM_INQ that contains the exact value of ioctl(SIOCINQ). The cmsg is also included when msg->msg_get_inq is non-zero to make sockets io_uring-friendly. Note that SOCK_CUSTOM_SOCKOPT is flagged only for SOCK_STREAM to override setsockopt() for SOL_SOCKET. By having the flag in struct unix_sock, instead of struct sock, we can later add SO_INQ support for TCP and reuse tcp_sk(sk)->recvmsg_inq. Note also that supporting custom getsockopt() for SOL_SOCKET will need preparation for other SOCK_CUSTOM_SOCKOPT users (UDP, vsock, MPTCP). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08af_unix: Cache state->msg in unix_stream_read_generic().Kuniyuki Iwashima
In unix_stream_read_generic(), state->msg is fetched multiple times. Let's cache it in a local variable. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250702223606.1054680-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08af_unix: Use cached value for SOCK_STREAM in unix_inq_len().Kuniyuki Iwashima
Compared to TCP, ioctl(SIOCINQ) for AF_UNIX SOCK_STREAM socket is more expensive, as unix_inq_len() requires iterating through the receive queue and accumulating skb->len. Let's cache the value for SOCK_STREAM to a new field during sendmsg() and recvmsg(). The field is protected by the receive queue lock. Note that ioctl(SIOCINQ) for SOCK_DGRAM returns the length of the first skb in the queue. SOCK_SEQPACKET still requires iterating through the queue because we do not touch functions shared with unix_dgram_ops. But, if really needed, we can support it by switching __skb_try_recv_datagram() to a custom version. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08af_unix: Don't use skb_recv_datagram() in unix_stream_read_skb().Kuniyuki Iwashima
unix_stream_read_skb() calls skb_recv_datagram() with MSG_DONTWAIT, which is mostly equivalent to sock_error(sk) + skb_dequeue(). In the following patch, we will add a new field to cache the number of bytes in the receive queue. Then, we want to avoid introducing atomic ops in the fast path, so we will reuse the receive queue lock. As a preparation for the change, let's not use skb_recv_datagram() in unix_stream_read_skb(). Note that sock_error() is now moved out of the u->iolock mutex as the mutex does not synchronise the peer's close() at all. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250702223606.1054680-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08af_unix: Don't check SOCK_DEAD in unix_stream_read_skb().Kuniyuki Iwashima
unix_stream_read_skb() checks SOCK_DEAD only when the dequeued skb is OOB skb. unix_stream_read_skb() is called for a SOCK_STREAM socket in SOCKMAP when data is sent to it. The function is invoked via sk_psock_verdict_data_ready(), which is set to sk->sk_data_ready(). During sendmsg(), we check if the receiver has SOCK_DEAD, so there is no point in checking it again later in ->read_skb(). Also, unix_read_skb() for SOCK_DGRAM does not have the test either. Let's remove the SOCK_DEAD test in unix_stream_read_skb(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250702223606.1054680-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08af_unix: Don't hold unix_state_lock() in __unix_dgram_recvmsg().Kuniyuki Iwashima
When __skb_try_recv_datagram() returns NULL in __unix_dgram_recvmsg(), we hold unix_state_lock() unconditionally. This is because SOCK_SEQPACKET sk needs to return EOF in case its peer has been close()d concurrently. This behaviour totally depends on the timing of the peer's close() and reading sk->sk_shutdown, and taking the lock does not play a role. Let's drop the lock from __unix_dgram_recvmsg() and use READ_ONCE(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250702223606.1054680-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'eth-fbnic-add-firmware-logging-support'Jakub Kicinski
Lee Trager says: ==================== eth: fbnic: Add firmware logging support Firmware running on fbnic generates device logs. These logs contain useful information about the device which may or may not be related to the host. Logs are stored in a ring buffer and accessible through DebugFS. ==================== Link: https://patch.msgid.link/20250702192207.697368-1-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: fbnic: Create fw_log file in DebugFSLee Trager
Allow reading the firmware log in DebugFS by accessing the fw_log file. Buffer is read while a spinlock is acquired. Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250702192207.697368-7-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: fbnic: Enable firmware loggingLee Trager
The firmware log buffer is enabled during probe and freed during remove. Early versions of firmware do not support sending logs. Once the mailbox is up driver will enable logging when supported firmware versions are detected. Logging is disabled before the mailbox is freed. Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250702192207.697368-6-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: fbnic: Add mailbox support for firmware logsLee Trager
By default firmware will not send logs to the host. This must be explicitly enabled by the driver. The mailbox has the concept of a flag which is a u32 used as a boolean. Lack of flag defaults to a value of false. When enabling logging historical logs may be optionally requested. These are log messages generated by the NIC before the driver was loaded. The driver also sends a log version to support changing the logging format in the future. [SEND_LOGS_REQ] = { [SEND_LOGS] /* flag to request log reporting */ [SEND_LOGS_HISTORY] /* flag to request historical logs */ [SEND_LOGS_VERSION] /* u32 indicating the log format version */ } Logs may be sent to the user either one at a time, or when historical logs are requested in bulk. Firmware may not send more than 14 messages in bulk to prevent flooding the mailbox. [LOG_MSG] = { [LOG_INDEX] /* entry 0 - u64 index of log */ [LOG_MSEC] /* entry 0 - u32 timestamp of log */ [LOG_MSG] /* entry 0 - char log message up to 256 */ [LOG_LENGTH] /* u32 of remaining log items in arrays */ [LOG_INDEX_ARRAY] = { [LOG_INDEX] /* entry 1 - u64 index of log */ [LOG_INDEX] /* entry 2 - u64 index of log */ ... } [LOG_MSEC_ARRAY] = { [LOG_MSEC] /* entry 1 - u32 timestamp of log */ [LOG_MSEC] /* entry 2 - u32 timestamp of log */ ... } [LOG_MSG_ARRAY] = { [LOG_MSG] /* entry 1 - char log message up to 256 */ [LOG_MSG] /* entry 2 - char log message up to 256 */ ... } } Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250702192207.697368-5-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: fbnic: Create ring buffer for firmware logsLee Trager
When enabled, firmware may send logs messages which are specific to the device and not the host. Create a ring buffer to store these messages which are read by a user through DebugFS. Buffer access is protected by a spinlock. Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250702192207.697368-4-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: fbnic: Use FIELD_PREP to generate minimum firmware versionLee Trager
Create a new macro based on FIELD_PREP to generate easily readable minimum firmware version ints. This macro will prevent the mistake from the previous patch from happening again. Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250702192207.697368-3-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: fbnic: Fix incorrect minimum firmware versionLee Trager
The full minimum version is 0.10.6-0. The six is now correctly defined as patch and shifted appropriately. 0.10.6-0 is a preproduction version of firmware which was released over a year and a half ago. All production devices meet this requirement. Signed-off-by: Lee Trager <lee@trager.us> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250702192207.697368-2-lee@trager.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2025-07-08 The following pull-request contains common mlx5 updates for your *net-next* tree. v2: https://lore.kernel.org/1751574385-24672-1-git-send-email-tariqt@nvidia.com * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Check device memory pointer before usage net/mlx5: fs, fix RDMA TRANSPORT init cleanup flow net/mlx5: Add IFC bits for PCIe Congestion Event object net/mlx5: Small refactor for general object capabilities net/mlx5: fs, add multiple prios to RDMA TRANSPORT steering domain ==================== Link: https://patch.msgid.link/1752002102-11316-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'net-migrate-remaining-drivers-to-dedicated-_rxfh_context-ops'Jakub Kicinski
Jakub Kicinski says: ==================== net: migrate remaining drivers to dedicated _rxfh_context ops Around a year ago Ed added dedicated ops for managing RSS contexts. This significantly improved the clarity of the driver facing API. Migrate the remaining 3 drivers and remove the old way of muxing the RSS context operations via .set_rxfh(). v2: https://lore.kernel.org/20250702030606.1776293-1-kuba@kernel.org v1: https://lore.kernel.org/20250630160953.1093267-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250707184115.2285277-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: ethtool: reduce indent for _rxfh_context opsJakub Kicinski
Now that we don't have the compat code we can reduce the indent a little. No functional changes. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250707184115.2285277-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: ethtool: remove the compat code for _rxfh_context opsJakub Kicinski
All drivers are now converted to dedicated _rxfh_context ops. Remove the use of >set_rxfh() to manage additional contexts. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250707184115.2285277-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: mlx5: migrate to the *_rxfh_context opsJakub Kicinski
Convert mlx5 to dedicated RXFH ops. This is a fairly shallow conversion, TBH, most of the driver code stays as is, but we let the core allocate the context ID for the driver. mlx5e_rx_res_rss_get_rxfh() and friends are made void, since core only calls the driver for context 0. The second call is right after context creation so it must exist (tm). Tested with drivers/net/hw/rss_ctx.py on MCX6. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250707184115.2285277-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: ice: drop the dead code related to rss_contextsJakub Kicinski
ICE appears to have some odd form of rss_context use plumbed in for .get_rxfh. The .set_rxfh side does not support creating contexts, however, so this must be dead code. For at least a year now (since commit 7964e7884643 ("net: ethtool: use the tracking array for get_rxfh on custom RSS contexts")) we have not been calling .get_rxfh with a non-zero rss_context. We just get the info from the RSS XArray under dev->ethtool. Remove what must be dead code in the driver, clear the support flags. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250707184115.2285277-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08eth: otx2: migrate to the *_rxfh_context opsJakub Kicinski
otx2 only supports additional indirection tables (no separate keys etc.) so the conversion to dedicated callbacks and core-allocated context is mostly removing the code which stores the extra tables in the driver. Core already stores the indirection tables for additional contexts, and doesn't call .get for them. One subtle change here is that we'll now start with the table covering all queues, not directing all traffic to queue 0. This is what core expects if the user doesn't pass the initial indir table explicitly (there's a WARN_ON() in the core trying to make sure driver authors don't forget to populate ctx to defaults). Drivers implementing .create_rxfh_context don't have to set cap_rss_ctx_supported, so remove it. Tested-by: Geetha Sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250707184115.2285277-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08tcp: update the outdated ref draft-ietf-tcpm-rackXin Guo
As RACK-TLP was published as a standards-track RFC8985, so the outdated ref draft-ietf-tcpm-rack need to be updated. Signed-off-by: Xin Guo <guoxin0309@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250705163647.301231-1-guoxin0309@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: phy: declare package-related struct members only if CONFIG_PHY_PACKAGE ↵Heiner Kallweit
is enabled Now that we have an own config symbol for the PHY package module, we can use it to reduce size of these structs if it isn't enabled. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/f0daefa4-406a-4a06-a4f0-7e31309f82bc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: account for encap headers in qdisc pkt lenFengyuan Gong
Refine qdisc_pkt_len_init to include headers up through the inner transport header when computing header size for encapsulations. Also refine net/sched/sch_cake.c borrowed from qdisc_pkt_len_init(). Signed-off-by: Fengyuan Gong <gfengyuan@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250702160741.1204919-1-gfengyuan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'support-some-features-for-the-hibmcge-driver'Jakub Kicinski
Jijie Shao says: ==================== Support some features for the HIBMCGE driver v4: https://lore.kernel.org/20250701125446.720176-1-shaojijie@huawei.com v3: https://lore.kernel.org/20250626020613.637949-1-shaojijie@huawei.com v2: https://lore.kernel.org/20250623034129.838246-1-shaojijie@huawei.com v1: https://lore.kernel.org/20250619144423.2661528-1-shaojijie@huawei.com ==================== Link: https://patch.msgid.link/20250702125716.2875169-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: hibmcge: configure FIFO thresholds according to the MAC controller ↵Jijie Shao
documentation Configure FIFO thresholds according to the MAC controller documentation Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702125716.2875169-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: hibmcge: adjust the burst len configuration of the MAC controller to ↵Jijie Shao
improve TX performance. Adjust the burst len configuration of the MAC controller to improve TX performance. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702125716.2875169-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: hibmcge: support scenario without PHYJijie Shao
Currently, the driver uses phylib to operate PHY by default. On some boards, the PHY device is separated from the MAC device. As a result, the hibmcge driver cannot operate the PHY device. In this patch, the driver determines whether a PHY is available based on register configuration. If no PHY is available, the driver will use fixed_phy to register fake phydev. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20250702125716.2875169-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08netlink: spelling: fix appened -> appended in a commentFaisal Bukhari
Fix spelling mistake in net/netlink/af_netlink.c appened -> appended Signed-off-by: Faisal Bukhari <faisalbukhari523@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250705030841.353424-1-faisalbukhari523@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08Merge branch 'net-remove-unused-function-parameters-in-skbuff-c'Jakub Kicinski
Michal Luczaj says: ==================== net: Remove unused function parameters in skbuff.c Couple of cleanup patches to get rid of unused function parameters around skbuff.c, plus little things spotted along the way. Offshoot of my question in [1], but way more contained. Found by adding "-Wunused-parameter -Wno-error" to KBUILD_CFLAGS and grepping for specific skbuff.c warnings. [1]: https://lore.kernel.org/netdev/972af569-0c90-4585-9e1f-f2266dab6ec6@rbox.co/ v2: https://lore.kernel.org/20250626-splice-drop-unused-v2-0-3268fac1af89@rbox.co v1: https://lore.kernel.org/20250624-splice-drop-unused-v1-0-cf641a676d04@rbox.co ==================== Link: https://patch.msgid.link/20250702-splice-drop-unused-v3-0-55f68b60d2b7@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: skbuff: Drop unused @skbMichal Luczaj
Since its introduction in commit 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function"), pskb_carve_frag_list() never used the argument @skb. Drop it and adapt the only caller. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: skbuff: Drop unused @skbMichal Luczaj
Since its introduction in commit ce098da1497c ("skbuff: Introduce slab_build_skb()"), __slab_build_skb() never used the @skb argument. Remove it and adapt both callers. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: splice: Drop unused @gfpMichal Luczaj
Since its introduction in commit 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES"), skb_splice_from_iter() never used the @gfp argument. Remove it and adapt callers. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250702-splice-drop-unused-v3-2-55f68b60d2b7@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: splice: Drop unused @pipeMichal Luczaj
Since commit 41c73a0d44c9 ("net: speedup skb_splice_bits()"), __splice_segment() and spd_fill_page() do not use the @pipe argument. Drop it. While adapting the callers, move one line to enforce reverse xmas tree order. No functional change intended. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250702-splice-drop-unused-v3-1-55f68b60d2b7@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net: Use of_reserved_mem_region_to_resource{_byname}() for "memory-region"Rob Herring (Arm)
Use the newly added of_reserved_mem_region_to_resource{_byname}() functions to handle "memory-region" properties. The error handling is a bit different for mtk_wed_mcu_load_firmware(). A failed match of the "memory-region-names" would skip the entry, but then other errors in the lookup and retrieval of the address would not skip the entry. However, that distinction is not really important. Either the region is available and usable or it is not. So now, errors from of_reserved_mem_region_to_resource() are ignored so the region is simply skipped. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250703183459.2074381-1-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08gve: global: fix "for a while" typoAhelenia Ziemiańska
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/5zsbhtyox3cvbntuvhigsn42uooescbvdhrat6s3d6rczznzg5@tarta.nabijaczleweli.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08atm: lanai: fix "take a while" typoAhelenia Ziemiańska
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/mn5rh6i773csmcrpfcr6bogvv2auypz2jwjn6dap2rxousxnw5@tarta.nabijaczleweli.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net/mlx5: Fix spelling mistake "disabliing" -> "disabling"Colin Ian King
There is a spelling mistake in a NL_SET_ERR_MSG_MOD message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250703102219.1248399-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08net/handshake: Add new parameter 'HANDSHAKE_A_ACCEPT_KEYRING'Hannes Reinecke
Add a new netlink parameter 'HANDSHAKE_A_ACCEPT_KEYRING' to provide the serial number of the keyring to use. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250701144657.104401-1-hare@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: replace ADDRLABEL with dynamic debugWang Liang
ADDRLABEL only works when it was set in compilation phase. Replace it with net_dbg_ratelimited(). Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250702104417.1526138-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net/sched: acp_api: no longer acquire RTNL in tc_action_net_exit()Eric Dumazet
tc_action_net_exit() got an rtnl exclusion in commit a159d3c4b829 ("net_sched: acquire RTNL in tc_action_net_exit()") Since then, commit 16af6067392c ("net: sched: implement reference counted action release") made this RTNL exclusion obsolete for most cases. Only tcf_action_offload_del() might still require it. Move the rtnl locking into tcf_idrinfo_destroy() when an offload action is found. Most netns do not have actions, yet deleting them is adding a lot of pressure on RTNL, which is for many the most contended mutex in the kernel. We are moving to a per-netns 'rtnl', so tc_action_net_exit() will not be able to grab 'rtnl' a single time for a batch of netns. Before the patch: perf probe -a rtnl_lock perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.305 MB perf.data (25 samples) ] After the patch: perf record -e probe:rtnl_lock -a /bin/bash -c 'unshare -n "/bin/true"; sleep 1' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.304 MB perf.data (9 samples) ] Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@nvidia.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/20250702071230.1892674-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08Merge branch 'net-mctp-add-support-for-gateway-routing'Paolo Abeni
Jeremy Kerr says: ==================== net: mctp: Add support for gateway routing This series adds a gateway route type for the MCTP core, allowing non-local EIDs as the match for a route. Example setup using the mctp tools: mctp route add 9 via mctpi2c0 mctp neigh add 9 dev mctpi2c0 lladdr 0x1d mctp route add 10 gw 9 - will route packets to eid 10 through mctpi2c0, using a dest lladdr of 0x1d (ie, that of the directly-attached eid 9). The core change to support this is the introduction of a struct mctp_dst, which represents the result of a route lookup. Since this involves a bit of surgery through the routing code, we add a few tests along the way. We're introducing an ABI change in the new RTM_{NEW,GET,DEL}ROUTE netlink formats, with the support for a RTA_GATEWAY attribute. Because we need a network ID specified to fully-qualify a gateway EID, the RTA_GATEWAY attribute carries the (net, eid) tuple in full: struct mctp_fq_addr { unsigned int net; mctp_eid_t eid; } Of course, any questions, comments etc are most welcome. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> ==================== Link: https://patch.msgid.link/20250702-dev-forwarding-v5-0-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: test: Add tests for gateway routesJeremy Kerr
Add a few kunit tests for the gateway routing. Because we have multiple route types now (direct and gateway), rename mctp_test_create_route to mctp_test_create_route_direct, and add a _gateway variant too. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-14-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: add gateway routing supportJeremy Kerr
This change allows for gateway routing, where a route table entry may reference a routable endpoint (by network and EID), instead of routing directly to a netdevice. We add support for a RTM_GATEWAY attribute for netlink route updates, with an attribute format of: struct mctp_fq_addr { unsigned int net; mctp_eid_t eid; } - we need the net here to uniquely identify the target EID, as we no longer have the device reference directly (which would provide the net id in the case of direct routes). This makes route lookups recursive, as a route lookup that returns a gateway route must be resolved into a direct route (ie, to a device) eventually. We provide a limit to the route lookups, to prevent infinite loop routing. The route lookup populates a new 'nexthop' field in the dst structure, which now specifies the key for the neighbour table lookup on device output, rather than using the packet destination address directly. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-13-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-08net: mctp: allow NL parsing directly into a struct mctp_routeJeremy Kerr
The netlink route parsing functions end up setting a bunch of output variables from the rt attributes. This will get messy when the routes become more complex. So, split the rt parsing into two types: a lookup (returning route target data suitable for a route lookup, like when deleting a route) and a populate (setting fields of a struct mctp_route). In doing this, we need to separate the route allocation from mctp_route_add, so add some comments on the lifetime semantics for the latter. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250702-dev-forwarding-v5-12-1468191da8a4@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>