summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-17af_unix: Remove unix_our_peer().Kuniyuki Iwashima
unix_our_peer() is used only in unix_may_send(). Let's inline it in unix_may_send(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Clean up error paths in unix_dgram_sendmsg().Kuniyuki Iwashima
The error path is complicated in unix_dgram_sendmsg() because there are two timings when other could be non-NULL: when it's fetched from unix_peer_get() and when it's looked up by unix_find_other(). Let's move unix_peer_get() to the else branch for unix_find_other() and clean up the error paths. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Clean up SOCK_DEAD error paths in unix_dgram_sendmsg().Kuniyuki Iwashima
When other has SOCK_DEAD in unix_dgram_sendmsg(), we hold unix_state_lock() for the sender socket first. However, we do not need it for sk->sk_type. Let's move the lock down a bit. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Defer sock_put() to clean up path in unix_dgram_sendmsg().Kuniyuki Iwashima
When other has SOCK_DEAD in unix_dgram_sendmsg(), we call sock_put() for it first and then set NULL to other before jumping to the error path. This is to skip sock_put() in the error path. Let's not set NULL to other and defer the sock_put() to the error path to clean up the labels later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Split restart label in unix_dgram_sendmsg().Kuniyuki Iwashima
There are two paths jumping to the restart label in unix_dgram_sendmsg(). One requires another lookup and sk_filter(), but the other doesn't. Let's split the label to make each flow more straightforward. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Use msg->{msg_name,msg_namelen} in unix_dgram_sendmsg().Kuniyuki Iwashima
In unix_dgram_sendmsg(), we use a local variable sunaddr pointing NULL or msg->msg_name based on msg->msg_namelen. Let's remove sunaddr and simplify the usage. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Move !sunaddr case in unix_dgram_sendmsg().Kuniyuki Iwashima
When other is NULL in unix_dgram_sendmsg(), we check if sunaddr is NULL before looking up a receiver socket. There are three paths going through the check, but it's always false for 2 out of the 3 paths: the first socket lookup and the second 'goto restart'. The condition can be true for the first 'goto restart' only when SOCK_DEAD is flagged for the socket found with msg->msg_name. Let's move the check to the single appropriate path. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Set error only when needed in unix_dgram_sendmsg().Kuniyuki Iwashima
We will introduce skb drop reason for AF_UNIX, then we need to set an errno and a drop reason for each path. Let's set an error only when it's needed in unix_dgram_sendmsg(). Then, we need not (re)set 0 to err. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Clean up error paths in unix_stream_sendmsg().Kuniyuki Iwashima
If we move send_sig() to the SEND_SHUTDOWN check before the while loop, then we can reuse the same kfree_skb() after the pipe_err_free label. Let's gather the scattered kfree_skb()s in error paths. While at it, some style issues are fixed, and the pipe_err_free label is renamed to out_pipe to match other label names. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Set error only when needed in unix_stream_sendmsg().Kuniyuki Iwashima
We will introduce skb drop reason for AF_UNIX, then we need to set an errno and a drop reason for each path. Let's set an error only when it's needed in unix_stream_sendmsg(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Clean up error paths in unix_stream_connect().Kuniyuki Iwashima
The label order is weird in unix_stream_connect(), and all NULL checks are unnecessary if reordered. Let's clean up the error paths to make it easy to set a drop reason for each path. While at it, a comment with the old style is updated. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17af_unix: Set error only when needed in unix_stream_connect().Kuniyuki Iwashima
We will introduce skb drop reason for AF_UNIX, then we need to set an errno and a drop reason for each path. Let's set an error only when it's needed in unix_stream_connect(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17usb: xhci: fix ring expansion regression in 6.13-rc1Niklas Neronin
The source and destination rings were incorrectly assigned during the ring linking process. The "source" ring, which contains the new segments, was not spliced into the "destination" ring, leading to incorrect ring expansion. Fixes: fe688e500613 ("usb: xhci: refactor xhci_link_rings() to use source and destination rings") Reported-by: Jeff Chua <jeff.chua.linux@gmail.com> Closes: https://lore.kernel.org/lkml/CAAJw_ZtppNqC9XA=-WVQDr+vaAS=di7jo15CzSqONeX48H75MA@mail.gmail.com/ Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241217102122.2316814-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-17xhci: Turn NEC specific quirk for handling Stop Endpoint errors genericMathias Nyman
xHC hosts from several vendors have the same issue where endpoints start so slowly that a later queued 'Stop Endpoint' command may complete before endpoint is up and running. The 'Stop Endpoint' command fails with context state error as the endpoint still appears as stopped. See commit 42b758137601 ("usb: xhci: Limit Stop Endpoint retries") for details CC: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241217102122.2316814-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-17i915/guc: Accumulate active runtime on gt resetUmesh Nerlige Ramappa
On gt reset, if a context is running, then accumulate it's active time into the busyness counter since there will be no chance for the context to switch out and update it's run time. v2: Move comment right above the if (John) Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-4-umesh.nerlige.ramappa@intel.com (cherry picked from commit 7ed047da59cfa1acb558b95169d347acc8d85da1) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2024-12-17i915/guc: Ensure busyness counter increases motonicallyUmesh Nerlige Ramappa
Active busyness of an engine is calculated using gt timestamp and the context switch in time. While capturing the gt timestamp, it's possible that the context switches out. This race could result in an active busyness value that is greater than the actual context runtime value by a small amount. This leads to a negative delta and throws off busyness calculations for the user. If a subsequent count is smaller than the previous one, just return the previous one, since we expect the busyness to catch up. Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-3-umesh.nerlige.ramappa@intel.com (cherry picked from commit cf907f6d294217985e9dafd9985dce874e04ca37) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2024-12-17i915/guc: Reset engine utilization buffer before registrationUmesh Nerlige Ramappa
On GT reset, we store total busyness counts for all engines and re-register the utilization buffer with GuC. At that time we should reset the buffer, so that we don't get spurious busyness counts on subsequent queries. To repro this issue, run igt@perf_pmu@busy-hang followed by igt@perf_pmu@most-busy-idle-check-all for a couple iterations. Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-2-umesh.nerlige.ramappa@intel.com (cherry picked from commit abd318237fa6556c1e5225529af145ef15d5ff0d) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2024-12-17rust: net::phy fix module autoloadingFUJITA Tomonori
The alias symbol name was renamed. Adjust module_phy_driver macro to create the proper symbol name to fix module autoloading. Fixes: 054a9cd395a7 ("modpost: rename alias symbol for MODULE_DEVICE_TABLE()") Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20241212130015.238863-1-fujita.tomonori@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17x86/xen: remove hypercall pageJuergen Gross
The hypercall page is no longer needed. It can be removed, as from the Xen perspective it is optional. But, from Linux's perspective, it removes naked RET instructions that escape the speculative protections that Call Depth Tracking and/or Untrain Ret are trying to achieve. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
2024-12-17x86/xen: use new hypercall functions instead of hypercall pageJuergen Gross
Call the Xen hypervisor via the new xen_hypercall_func static-call instead of the hypercall page. This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2024-12-17x86/xen: add central hypercall functionsJuergen Gross
Add generic hypercall functions usable for all normal (i.e. not iret) hypercalls. Depending on the guest type and the processor vendor different functions need to be used due to the to be used instruction for entering the hypervisor: - PV guests need to use syscall - HVM/PVH guests on Intel need to use vmcall - HVM/PVH guests on AMD and Hygon need to use vmmcall As PVH guests need to issue hypercalls very early during boot, there is a 4th hypercall function needed for HVM/PVH which can be used on Intel and AMD processors. It will check the vendor type and then set the Intel or AMD specific function to use via static_call(). This is part of XSA-466 / CVE-2024-53241. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Co-developed-by: Peter Zijlstra <peterz@infradead.org>
2024-12-16Merge branch 'r8169-add-support-for-rtl8125d-rev-b'Jakub Kicinski
Heiner Kallweit says: ==================== r8169: add support for RTL8125D rev.b Add support for RTL8125D rev.b. Its XID is 0x689. It is basically based on the one with XID 0x688, but with different firmware file. To avoid a mess with the version numbering, adjust it first. ==================== Link: https://patch.msgid.link/15c4a9fd-a653-4b09-825d-751964832a7a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16r8169: add support for RTL8125D rev.bChunHao Lin
Add support for RTL8125D rev.b. Its XID is 0x689. It is basically based on the one with XID 0x688, but with different firmware file. Signed-off-by: ChunHao Lin <hau@realtek.com> [hkallweit1@gmail.com: rebased after adjusted version numbering] Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/75e5e9ec-d01f-43ac-b0f4-e7456baf18d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16r8169: adjust version numbering for RTL8126Heiner Kallweit
Adjust version numbering for RTL8126, so that it doesn't overlap with new RTL8125 versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/6a354364-20e9-48ad-a198-468264288757@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16net: hinic: Fix cleanup in create_rxqs/txqs()Dan Carpenter
There is a check for NULL at the start of create_txqs() and create_rxqs() which tess if "nic_dev->txqs" is non-NULL. The intention is that if the device is already open and the queues are already created then we don't create them a second time. However, the bug is that if we have an error in the create_txqs() then the pointer doesn't get set back to NULL. The NULL check at the start of the function will say that it's already open when it's not and the device can't be used. Set ->txqs back to NULL on cleanup on error. Fixes: c3e79baf1b03 ("net-next/hinic: Add logical Txq and Rxq") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0cc98faf-a0ed-4565-a55b-0fa2734bc205@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16team: Fix feature exposure when no ports are presentDaniel Borkmann
Small follow-up to align this to an equivalent behavior as the bond driver. The change in 3625920b62c3 ("teaming: fix vlan_features computing") removed the netdevice vlan_features when there is no team port attached, yet it leaves the full set of enc_features intact. Instead, leave the default features as pre 3625920b62c3, and recompute once we do have ports attached. Also, similarly as in bonding case, call the netdev_base_features() helper on the enc_features. Fixes: 3625920b62c3 ("teaming: fix vlan_features computing") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241213123657.401868-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16Merge branch 'add-support-for-so_priority-cmsg'Jakub Kicinski
Anna Emese Nyiri says: ==================== Add support for SO_PRIORITY cmsg Introduce a new helper function, `sk_set_prio_allowed`, to centralize the logic for validating priority settings. Add support for the `SO_PRIORITY` control message, enabling user-space applications to set socket priority via control messages (cmsg). ==================== Link: https://patch.msgid.link/20241213084457.45120-1-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16sock: Introduce SO_RCVPRIORITY socket optionAnna Emese Nyiri
Add new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the ancillary data returned by recvmsg(). This is analogous to the existing support for SO_RCVMARK, as implemented in commit 6fd1d51cfa253 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()"). Reviewed-by: Willem de Bruijn <willemb@google.com> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-5-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net: test SO_PRIORITY ancillary data with cmsg_senderAnna Emese Nyiri
Extend cmsg_sender.c with a new option '-Q' to send SO_PRIORITY ancillary data. cmsg_so_priority.sh script added to validate SO_PRIORITY behavior by creating VLAN device with egress QoS mapping and testing packet priorities using flower filters. Verify that packets with different priorities are correctly matched and counted by filters for multiple protocols and IP versions. Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-4-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16sock: support SO_PRIORITY cmsgAnna Emese Nyiri
The Linux socket API currently allows setting SO_PRIORITY at the socket level, applying a uniform priority to all packets sent through that socket. The exception to this is IP_TOS, when the priority value is calculated during the handling of ancillary data, as implemented in commit f02db315b8d8 ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data"). However, this is a computed value, and there is currently no mechanism to set a custom priority via control messages prior to this patch. According to this patch, if SO_PRIORITY is specified as ancillary data, the packet is sent with the priority value set through sockc->priority, overriding the socket-level values set via the traditional setsockopt() method. This is analogous to the existing support for SO_MARK, as implemented in commit c6af0c227a22 ("ip: support SO_MARK cmsg"). If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that takes precedence is the last one in the cmsg list. This patch has the side effect that raw_send_hdrinc now interprets cmsg IP_TOS. Reviewed-by: Willem de Bruijn <willemb@google.com> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-3-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16sock: Introduce sk_set_prio_allowed helper functionAnna Emese Nyiri
Simplify priority setting permissions with the 'sk_set_prio_allowed' function, centralizing the validation logic. This change is made in anticipation of a second caller in a following patch. No functional changes. Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-2-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netlink: specs: add phys-binding attr to rt_link specDonald Hunter
Add the missing phys-binding attr to the mctp-attrs in the rt_link spec. This fixes commit 580db513b4a9 ("net: mctp: Expose transport binding identifier via IFLA attribute"). Note that enum mctp_phys_binding is not currently uapi, but perhaps it should be? Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20241213112551.33557-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16chelsio/chtls: prevent potential integer overflow on 32bitDan Carpenter
The "gl->tot_len" variable is controlled by the user. It comes from process_responses(). On 32bit systems, the "gl->tot_len + sizeof(struct cpl_pass_accept_req) + sizeof(struct rss_header)" addition could have an integer wrapping bug. Use size_add() to prevent this. Fixes: a08943947873 ("crypto: chtls - Register chtls with net tls") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/c6bfb23c-2db2-4e1b-b8ab-ba3925c82ef5@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16rxrpc: Fix ability to add more data to a call once MSG_MORE deassertedDavid Howells
When userspace is adding data to an RPC call for transmission, it must pass MSG_MORE to sendmsg() if it intends to add more data in future calls to sendmsg(). Calling sendmsg() without MSG_MORE being asserted closes the transmission phase of the call (assuming sendmsg() adds all the data presented) and further attempts to add more data should be rejected. However, this is no longer the case. The change of call state that was previously the guard got bumped over to the I/O thread, which leaves a window for a repeat sendmsg() to insert more data. This previously went unnoticed, but the more recent patch that changed the structures behind the Tx queue added a warning: WARNING: CPU: 3 PID: 6639 at net/rxrpc/sendmsg.c:296 rxrpc_send_data+0x3f2/0x860 and rejected the additional data, returning error EPROTO. Fix this by adding a guard flag to the call, setting the flag when we queue the final packet and then rejecting further attempts to add data with EPROTO. Fixes: 2d689424b618 ("rxrpc: Move call state changes from sendmsg to I/O thread") Reported-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/6757fb68.050a0220.2477f.005f.GAE@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/2870480.1734037462@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16rxrpc: Disable IRQ, not BH, to take the lock for ->attend_linkDavid Howells
Use spin_lock_irq(), not spin_lock_bh() to take the lock when accessing the ->attend_link() to stop a delay in the I/O thread due to an interrupt being taken in the app thread whilst that holds the lock and vice versa. Fixes: a2ea9a907260 ("rxrpc: Use irq-disabling spinlocks between app and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/2870146.1734037095@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16Merge branch 'netdev-fix-repeated-netlink-messages-in-queue-dumps'Jakub Kicinski
Jakub Kicinski says: ==================== netdev: fix repeated netlink messages in queue dumps Fix dump continuation for queues and queue stats in the netdev family. Because we used post-increment when saving id of dumped queue next skb would re-dump the already dumped queue. ==================== Link: https://patch.msgid.link/20241213152244.3080955-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net-drv: stats: sanity check netlink dumpsJakub Kicinski
Sanity check netlink dumps, to make sure dumps don't have repeated entries or gaps in IDs. Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20241213152244.3080955-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net-drv: queues: sanity check netlink dumpsJakub Kicinski
This test already catches a netlink bug fixed by this series, but only when running on HW with many queues. Make sure the netdevsim instance created has a lot of queues, and constrain the size of the recv_buffer used by netlink. While at it test both rx and tx queues. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20241213152244.3080955-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16selftests: net: support setting recv_size in YNLJakub Kicinski
recv_size parameter allows constraining the buffer size for dumps. It's useful in testing kernel handling of dump continuation, IOW testing dumps which span multiple skbs. Let the tests set this parameter when initializing the YNL family. Keep the normal default, we don't want tests to unintentionally behave very differently than normal code. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20241213152244.3080955-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netdev: fix repeated netlink messages in queue statsJakub Kicinski
The context is supposed to record the next queue to dump, not last dumped. If the dump doesn't fit we will restart from the already-dumped queue, duplicating the message. Before this fix and with the selftest improvements later in this series we see: # ./run_kselftest.sh -t drivers/net:stats.py timeout set to 45 selftests: drivers/net: stats.py KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 45 != 44 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 45 != 44 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 45 != 44 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 45 != 44 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 103 != 100 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 103 != 100 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 102 != 100 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 102 != 100 missing queue keys not ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # Totals: pass:4 fail:1 xfail:0 xpass:0 skip:0 error:0 With the fix: # ./ksft-net-drv/run_kselftest.sh -t drivers/net:stats.py timeout set to 45 selftests: drivers/net: stats.py KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: ab63a2387cb9 ("netdev: add per-queue statistics") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241213152244.3080955-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netdev: fix repeated netlink messages in queue dumpJakub Kicinski
The context is supposed to record the next queue to dump, not last dumped. If the dump doesn't fit we will restart from the already-dumped queue, duplicating the message. Before this fix and with the selftest improvements later in this series we see: # ./run_kselftest.sh -t drivers/net:queues.py timeout set to 45 selftests: drivers/net: queues.py KTAP version 1 1..2 # Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues: # Check| ksft_eq(queues, expected) # Check failed 102 != 100 # Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues: # Check| ksft_eq(queues, expected) # Check failed 101 != 100 not ok 1 queues.get_queues ok 2 queues.addremove_queues # Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0 not ok 1 selftests: drivers/net: queues.py # exit=1 With the fix: # ./ksft-net-drv/run_kselftest.sh -t drivers/net:queues.py timeout set to 45 selftests: drivers/net: queues.py KTAP version 1 1..2 ok 1 queues.get_queues ok 2 queues.addremove_queues # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: 6b6171db7fc8 ("netdev-genl: Add netlink framework functions for queue") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241213152244.3080955-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next 2024-12-16 The following pull-request contains mlx5 IFC updates. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add device cap abs_native_port_num net/mlx5: qos: Add ifc support for cross-esw scheduling net/mlx5: Add support for new scheduling elements net/mlx5: Add ConnectX-8 device to ifc net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits ==================== Link: https://patch.msgid.link/20241216124028.973763-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16nios2: Use str_yes_no() helper in show_cpuinfo()Thorsten Blum
Remove hard-coded strings by using the str_yes_no() helper function. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2024-12-16fortify: Hide run-time copy size from value range trackingKees Cook
GCC performs value range tracking for variables as a way to provide better diagnostics. One place this is regularly seen is with warnings associated with bounds-checking, e.g. -Wstringop-overflow, -Wstringop-overread, -Warray-bounds, etc. In order to keep the signal-to-noise ratio high, warnings aren't emitted when a value range spans the entire value range representable by a given variable. For example: unsigned int len; char dst[8]; ... memcpy(dst, src, len); If len's value is unknown, it has the full "unsigned int" range of [0, UINT_MAX], and GCC's compile-time bounds checks against memcpy() will be ignored. However, when a code path has been able to narrow the range: if (len > 16) return; memcpy(dst, src, len); Then the range will be updated for the execution path. Above, len is now [0, 16] when reading memcpy(), so depending on other optimizations, we might see a -Wstringop-overflow warning like: error: '__builtin_memcpy' writing between 9 and 16 bytes into region of size 8 [-Werror=stringop-overflow] When building with CONFIG_FORTIFY_SOURCE, the fortified run-time bounds checking can appear to narrow value ranges of lengths for memcpy(), depending on how the compiler constructs the execution paths during optimization passes, due to the checks against the field sizes. For example: if (p_size_field != SIZE_MAX && p_size != p_size_field && p_size_field < size) As intentionally designed, these checks only affect the kernel warnings emitted at run-time and do not block the potentially overflowing memcpy(), so GCC thinks it needs to produce a warning about the resulting value range that might be reaching the memcpy(). We have seen this manifest a few times now, with the most recent being with cpumasks: In function ‘bitmap_copy’, inlined from ‘cpumask_copy’ at ./include/linux/cpumask.h:839:2, inlined from ‘__padata_set_cpumasks’ at kernel/padata.c:730:2: ./include/linux/fortify-string.h:114:33: error: ‘__builtin_memcpy’ reading between 257 and 536870904 bytes from a region of size 256 [-Werror=stringop-overread] 114 | #define __underlying_memcpy __builtin_memcpy | ^ ./include/linux/fortify-string.h:633:9: note: in expansion of macro ‘__underlying_memcpy’ 633 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ ./include/linux/fortify-string.h:678:26: note: in expansion of macro ‘__fortify_memcpy_chk’ 678 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ ./include/linux/bitmap.h:259:17: note: in expansion of macro ‘memcpy’ 259 | memcpy(dst, src, len); | ^~~~~~ kernel/padata.c: In function ‘__padata_set_cpumasks’: kernel/padata.c:713:48: note: source object ‘pcpumask’ of size [0, 256] 713 | cpumask_var_t pcpumask, | ~~~~~~~~~~~~~~^~~~~~~~ This warning is _not_ emitted when CONFIG_FORTIFY_SOURCE is disabled, and with the recent -fdiagnostics-details we can confirm the origin of the warning is due to FORTIFY's bounds checking: ../include/linux/bitmap.h:259:17: note: in expansion of macro 'memcpy' 259 | memcpy(dst, src, len); | ^~~~~~ '__padata_set_cpumasks': events 1-2 ../include/linux/fortify-string.h:613:36: 612 | if (p_size_field != SIZE_MAX && | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 613 | p_size != p_size_field && p_size_field < size) | ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~ | | | (1) when the condition is evaluated to false | (2) when the condition is evaluated to true '__padata_set_cpumasks': event 3 114 | #define __underlying_memcpy __builtin_memcpy | ^ | | | (3) out of array bounds here Note that the cpumask warning started appearing since bitmap functions were recently marked __always_inline in commit ed8cd2b3bd9f ("bitmap: Switch from inline to __always_inline"), which allowed GCC to gain visibility into the variables as they passed through the FORTIFY implementation. In order to silence these false positives but keep otherwise deterministic compile-time warnings intact, hide the length variable from GCC with OPTIMIZE_HIDE_VAR() before calling the builtin memcpy. Additionally add a comment about why all the macro args have copies with const storage. Reported-by: "Thomas Weißschuh" <linux@weissschuh.net> Closes: https://lore.kernel.org/all/db7190c8-d17f-4a0d-bc2f-5903c79f36c2@t-8ch.de/ Reported-by: Nilay Shroff <nilay@linux.ibm.com> Closes: https://lore.kernel.org/all/20241112124127.1666300-1-nilay@linux.ibm.com/ Tested-by: Nilay Shroff <nilay@linux.ibm.com> Acked-by: Yury Norov <yury.norov@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kees Cook <kees@kernel.org>
2024-12-16hwmon: (tmp513) Fix interpretation of values of Temperature Result and Limit ↵Murad Masimov
Registers The values returned by the driver after processing the contents of the Temperature Result and the Temperature Limit Registers do not correspond to the TMP512/TMP513 specifications. A raw register value is converted to a signed integer value by a sign extension in accordance with the algorithm provided in the specification, but due to the off-by-one error in the sign bit index, the result is incorrect. According to the TMP512 and TMP513 datasheets, the Temperature Result (08h to 0Bh) and Limit (11h to 14h) Registers are 13-bit two's complement integer values, shifted left by 3 bits. The value is scaled by 0.0625 degrees Celsius per bit. E.g., if regval = 1 1110 0111 0000 000, the output should be -25 degrees, but the driver will return +487 degrees. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 59dfa75e5d82 ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.") Signed-off-by: Murad Masimov <m.masimov@maxima.ru> Link: https://lore.kernel.org/r/20241216173648.526-4-m.masimov@maxima.ru [groeck: fixed description line length] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-12-16hwmon: (tmp513) Fix Current Register value interpretationMurad Masimov
The value returned by the driver after processing the contents of the Current Register does not correspond to the TMP512/TMP513 specifications. A raw register value is converted to a signed integer value by a sign extension in accordance with the algorithm provided in the specification, but due to the off-by-one error in the sign bit index, the result is incorrect. Moreover, negative values will be reported as large positive due to missing sign extension from u32 to long. According to the TMP512 and TMP513 datasheets, the Current Register (07h) is a 16-bit two's complement integer value. E.g., if regval = 1000 0011 0000 0000, then the value must be (-32000 * lsb), but the driver will return (33536 * lsb). Fix off-by-one bug, and also cast data->curr_lsb_ua (which is of type u32) to long to prevent incorrect cast for negative values. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 59dfa75e5d82 ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.") Signed-off-by: Murad Masimov <m.masimov@maxima.ru> Link: https://lore.kernel.org/r/20241216173648.526-3-m.masimov@maxima.ru [groeck: Fixed description line length] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-12-16hwmon: (tmp513) Fix interpretation of values of Shunt Voltage and Limit ↵Murad Masimov
Registers The values returned by the driver after processing the contents of the Shunt Voltage Register and the Shunt Limit Registers do not correspond to the TMP512/TMP513 specifications. A raw register value is converted to a signed integer value by a sign extension in accordance with the algorithm provided in the specification, but due to the off-by-one error in the sign bit index, the result is incorrect. Moreover, the PGA shift calculated with the tmp51x_get_pga_shift function is relevant only to the Shunt Voltage Register, but is also applied to the Shunt Limit Registers. According to the TMP512 and TMP513 datasheets, the Shunt Voltage Register (04h) is 13 to 16 bit two's complement integer value, depending on the PGA setting. The Shunt Positive (0Ch) and Negative (0Dh) Limit Registers are 16-bit two's complement integer values. Below are some examples: * Shunt Voltage Register If PGA = 8, and regval = 1000 0011 0000 0000, then the decimal value must be -32000, but the value calculated by the driver will be 33536. * Shunt Limit Register If regval = 1000 0011 0000 0000, then the decimal value must be -32000, but the value calculated by the driver will be 768, if PGA = 1. Fix sign bit index, and also correct misleading comment describing the tmp51x_get_pga_shift function. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 59dfa75e5d82 ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.") Signed-off-by: Murad Masimov <m.masimov@maxima.ru> Link: https://lore.kernel.org/r/20241216173648.526-2-m.masimov@maxima.ru [groeck: Fixed description and multi-line alignments] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-12-16ceph: allocate sparse_ext map only for sparse readsIlya Dryomov
If mounted with sparseread option, ceph_direct_read_write() ends up making an unnecessarily allocation for O_DIRECT writes. Fixes: 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com>
2024-12-16ceph: fix memory leak in ceph_direct_read_write()Ilya Dryomov
The bvecs array which is allocated in iter_get_bvecs_alloc() is leaked and pages remain pinned if ceph_alloc_sparse_ext_map() fails. There is no need to delay the allocation of sparse_ext map until after the bvecs array is set up, so fix this by moving sparse_ext allocation a bit earlier. Also, make a similar adjustment in __ceph_sync_read() for consistency (a leak of the same kind in __ceph_sync_read() has been addressed differently). Cc: stable@vger.kernel.org Fixes: 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com>
2024-12-16ceph: improve error handling and short/overflow-read logic in __ceph_sync_read()Alex Markuze
This patch refines the read logic in __ceph_sync_read() to ensure more predictable and efficient behavior in various edge cases. - Return early if the requested read length is zero or if the file size (`i_size`) is zero. - Initialize the index variable (`idx`) where needed and reorder some code to ensure it is always set before use. - Improve error handling by checking for negative return values earlier. - Remove redundant encrypted file checks after failures. Only attempt filesystem-level decryption if the read succeeded. - Simplify leftover calculations to correctly handle cases where the read extends beyond the end of the file or stops short. This can be hit by continuously reading a file while, on another client, we keep truncating and writing new data into it. - This resolves multiple issues caused by integer and consequent buffer overflow (`pages` array being accessed beyond `num_pages`): - https://tracker.ceph.com/issues/67524 - https://tracker.ceph.com/issues/68980 - https://tracker.ceph.com/issues/68981 Cc: stable@vger.kernel.org Fixes: 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads") Reported-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Signed-off-by: Alex Markuze <amarkuze@redhat.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>