summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-01-09mptcp: use msk_owned_by_me helperGeliang Tang
The helper msk_owned_by_me() is defined in protocol.h, so use it instead of sock_owned_by_me(). Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09Fix XFRM-I support for nested ESP tunnelsBenedict Wong
This change adds support for nested IPsec tunnels by ensuring that XFRM-I verifies existing policies before decapsulating a subsequent policies. Addtionally, this clears the secpath entries after policies are verified, ensuring that previous tunnels with no-longer-valid do not pollute subsequent policy checks. This is necessary especially for nested tunnels, as the IP addresses, protocol and ports may all change, thus not matching the previous policies. In order to ensure that packets match the relevant inbound templates, the xfrm_policy_check should be done before handing off to the inner XFRM protocol to decrypt and decapsulate. Notably, raw ESP/AH packets did not perform policy checks inherently, whereas all other encapsulated packets (UDP, TCP encapsulated) do policy checks after calling xfrm_input handling in the respective encapsulation layer. Test: Verified with additional Android Kernel Unit tests Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-07Merge tag 'rxrpc-fixes-20230107' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix race between call connection, data transmit and call disconnect Here are patches to fix an oops[1] caused by a race between call connection, initial packet transmission and call disconnection which results in something like: kernel BUG at net/rxrpc/peer_object.c:413! when the syzbot test is run. The problem is that the connection procedure is effectively split across two threads and can get expanded by taking an interrupt, thereby adding the call to the peer error distribution list *after* it has been disconnected (say by the rxrpc socket shutting down). The easiest solution is to look at the fourth set of I/O thread conversion/SACK table expansion patches that didn't get applied[2] and take from it those patches that move call connection and disconnection into the I/O thread. Moving these things into the I/O thread means that the sequencing is managed by all being done in the same thread - and the race can no longer happen. This is preferable to introducing an extra lock as adding an extra lock would make the I/O thread have to wait for the app thread in yet another place. The changes can be considered as a number of logical parts: (1) Move all of the call state changes into the I/O thread. (2) Make client connection ID space per-local endpoint so that the I/O thread doesn't need locks to access it. (3) Move actual abort generation into the I/O thread and clean it up. If sendmsg or recvmsg want to cause an abort, they have to delegate it. (4) Offload the setting up of the security context on a connection to the thread of one of the apps that's starting a call. We don't want to be doing any sort of crypto in the I/O thread. (5) Connect calls (ie. assign them to channel slots on connections) in the I/O thread. Calls are set up by sendmsg/kafs and passed to the I/O thread to connect. Connections are allocated in the I/O thread after this. (6) Disconnect calls in the I/O thread. I've also added a patch for an unrelated bug that cropped up during testing, whereby a race can occur between an incoming call and socket shutdown. Note that whilst this fixes the original syzbot bug, another bug may get triggered if this one is fixed: INFO: rcu detected stall in corrupted rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T rcu: blocking rcu_node structures (internal RCU debug): It doesn't look this should be anything to do with rxrpc, though, as I've tested an additional patch[3] that removes practically all the RCU usage from rxrpc and it still occurs. It seems likely that it is being caused by something in the tunnelling setup that the syzbot test does, but there's not enough info to go on. It also seems unlikely to be anything to do with the afs driver as the test doesn't use that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-07Merge tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: - Fix a race in the RPCSEC_GSS upcall code that causes hung RPC calls - Fix a broken coalescing test in the pNFS file layout driver - Ensure that the access cache rcu path also applies the login test - Fix up for a sparse warning * tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix up a sparse warning NFS: Judge the file access cache's timestamp in rcu path pNFS/filelayout: Fix coalescing test for single DS SUNRPC: ensure the matching upcall is in-flight upon downcall
2023-01-07rxrpc: Fix incoming call setup raceDavid Howells
An incoming call can race with rxrpc socket destruction, leading to a leaked call. This may result in an oops when the call timer eventually expires: BUG: kernel NULL pointer dereference, address: 0000000000000874 RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50 Call Trace: <IRQ> try_to_wake_up+0x59/0x550 ? __local_bh_enable_ip+0x37/0x80 ? rxrpc_poke_call+0x52/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] call_timer_fn+0x24/0x120 with a warning in the kernel log looking something like: rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)! incurred during rmmod of rxrpc. The 1061d is the call flags: RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED, IS_SERVICE, RELEASED but no DISCONNECTED flag (0x800), so it's an incoming (service) call and it's still connected. The race appears to be that: (1) rxrpc_new_incoming_call() consults the service struct, checks sk_state and allocates a call - then pauses, possibly for an interrupt. (2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer, discards the prealloc and releases all calls attached to the socket. (3) rxrpc_new_incoming_call() resumes, launching the new call, including its timer and attaching it to the socket. Fix this by read-locking local->services_lock to access the AF_RXRPC socket providing the service rather than RCU in rxrpc_new_incoming_call(). There's no real need to use RCU here as local->services_lock is only write-locked by the socket side in two places: when binding and when shutting down. Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
2023-01-06net: ipv6: rpl_iptunnel: Replace 0-length arrays with flexible arraysKees Cook
Zero-length arrays are deprecated[1]. Replace struct ipv6_rpl_sr_hdr's "segments" union of 0-length arrays with flexible arrays. Detected with GCC 13, using -fstrict-flex-arrays=3: In function 'rpl_validate_srh', inlined from 'rpl_build_state' at ../net/ipv6/rpl_iptunnel.c:96:7: ../net/ipv6/rpl_iptunnel.c:60:28: warning: array subscript <unknown> is outside array bounds of 'struct in6_addr[0]' [-Warray-bounds=] 60 | if (ipv6_addr_type(&srh->rpl_segaddr[srh->segments_left - 1]) & | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from ../include/net/rpl.h:12, from ../net/ipv6/rpl_iptunnel.c:13: ../include/uapi/linux/rpl.h: In function 'rpl_build_state': ../include/uapi/linux/rpl.h:40:33: note: while referencing 'addr' 40 | struct in6_addr addr[0]; | ^~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230105221533.never.711-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-06Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"Chuck Lever
The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next. Found via KCSAN. Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-06devlink: allow registering parameters after the instanceJakub Kicinski
It's most natural to register the instance first and then its subobjects. Now that we can use the instance lock to protect the atomicity of all init - it should also be safe. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: don't require setting features before registrationJakub Kicinski
Requiring devlink_set_features() to be run before devlink is registered is overzealous. devlink_set_features() itself is a leftover from old workarounds which were trying to prevent initiating reload before probe was complete. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: remove the registration guarantee of referencesJakub Kicinski
The objective of exposing the devlink instance locks to drivers was to let them use these locks to prevent user space from accessing the device before it's fully initialized. This is difficult because devlink_unregister() waits for all references to be released, meaning that devlink_unregister() can't itself be called under the instance lock. To avoid this issue devlink_register() was moved after subobject registration a while ago. Unfortunately the netdev paths get a hold of the devlink instances _before_ they are registered. Ideally netdev should wait for devlink init to finish (synchronizing on the instance lock). This can't work because we don't know if the instance will _ever_ be registered (in case of failures it may not). The other option of returning an error until devlink_register() is called is unappealing (user space would get a notification netdev exist but would have to wait arbitrary amount of time before accessing some of its attributes). Weaken the guarantees of the devlink references. Holding a reference will now only guarantee that the memory of the object is around. Another way of looking at it is that the reference now protects the object not its "registered" status. Use devlink instance lock to synchronize unregistration. This implies that releasing of the "main" reference of the devlink instance moves from devlink_unregister() to devlink_free(). Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: always check if the devlink instance is registeredJakub Kicinski
Always check under the instance lock whether the devlink instance is still / already registered. This is a no-op for the most part, as the unregistration path currently waits for all references. On the init path, however, we may temporarily open up a race with netdev code, if netdevs are registered before the devlink instance. This is temporary, the next change fixes it, and this commit has been split out for the ease of review. Note that in case of iterating over sub-objects which have their own lock (regions and line cards) we assume an implicit dependency between those objects existing and devlink unregistration. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: protect devlink->dev by the instance lockJakub Kicinski
devlink->dev is assumed to be always valid as long as any outstanding reference to the devlink instance exists. In prep for weakening of the references take the instance lock. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: update the code in netns move to latest helpersJakub Kicinski
devlink_pernet_pre_exit() is the only obvious place which takes the instance lock without using the devl_ helpers. Update the code and move the error print after releasing the reference (having unlock and put together feels slightly idiomatic). Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: bump the instance index directly when iteratingJakub Kicinski
xa_find_after() is designed to handle multi-index entries correctly. If a xarray has two entries one which spans indexes 0-3 and one at index 4 xa_find_after(0) will return the entry at index 4. Having to juggle the two callbacks, however, is unnecessary in case of the devlink xarray, as there is 1:1 relationship with indexes. Always use xa_find() and increment the index manually. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06tipc: fix unexpected link reset due to discovery messagesTung Nguyen
This unexpected behavior is observed: node 1 | node 2 ------ | ------ link is established | link is established reboot | link is reset up | send discovery message receive discovery message | link is established | link is established send discovery message | | receive discovery message | link is reset (unexpected) | send reset message link is reset | It is due to delayed re-discovery as described in function tipc_node_check_dest(): "this link endpoint has already reset and re-established contact with the peer, before receiving a discovery message from that node." However, commit 598411d70f85 has changed the condition for calling tipc_node_link_down() which was the acceptance of new media address. This commit fixes this by restoring the old and correct behavior. Fixes: 598411d70f85 ("tipc: make resetting of links non-atomic") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06sysctl: expose all net/core sysctls inside netnsMahesh Bandewar
All were not visible to the non-priv users inside netns. However, with 4ecb90090c84 ("sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN"), these vars are protected from getting modified. A proc with capable(CAP_NET_ADMIN) can change the values so not having them visible inside netns is just causing nuisance to process that check certain values (e.g. net.core.somaxconn) and see different behavior in root-netns vs. other-netns Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06rxrpc: Move client call connection to the I/O threadDavid Howells
Move the connection setup of client calls to the I/O thread so that a whole load of locking and barrierage can be eliminated. This necessitates the app thread waiting for connection to complete before it can begin encrypting data. This also completes the fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move the client conn cache management to the I/O threadDavid Howells
Move the management of the client connection cache to the I/O thread rather than managing it from the namespace as an aggregate across all the local endpoints within the namespace. This will allow a load of locking to be got rid of in a future patch as only the I/O thread will be looking at the this. The downside is that the total number of cached connections on the system can get higher because the limit is now per-local rather than per-netns. We can, however, keep the number of client conns in use across the entire netfs and use that to reduce the expiration time of idle connection. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Remove call->state_lockDavid Howells
All the setters of call->state are now in the I/O thread and thus the state lock is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move call state changes from recvmsg to I/O threadDavid Howells
Move the call state changes that are made in rxrpc_recvmsg() to the I/O thread. This means that, thenceforth, only the I/O thread does this and the call state lock can be removed. This requires the Rx phase to be ended when the last packet is received, not when it is processed. Since this now changes the rxrpc call state to SUCCEEDED before we've consumed all the data from it, rxrpc_kernel_check_life() mustn't say the call is dead until the recvmsg queue is empty (unless the call has failed). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move call state changes from sendmsg to I/O threadDavid Howells
Move all the call state changes that are made in rxrpc_sendmsg() to the I/O thread. This is a step towards removing the call state lock. This requires the switch to the RXRPC_CALL_CLIENT_AWAIT_REPLY and RXRPC_CALL_SERVER_SEND_REPLY states to be done when the last packet is decanted from ->tx_sendmsg to ->tx_buffer in the I/O thread, not when it is added to ->tx_sendmsg by sendmsg(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Wrap accesses to get call state to put the barrier in one placeDavid Howells
Wrap accesses to get the state of a call from outside of the I/O thread in a single place so that the barrier needed to order wrt the error code and abort code is in just that place. Also use a barrier when setting the call state and again when reading the call state such that the auxiliary completion info (error code, abort code) can be read without taking a read lock on the call state lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Split out the call state changing functions into their own fileDavid Howells
Split out the functions that change the state of an rxrpc call into their own file. The idea being to remove anything to do with changing the state of a call directly from the rxrpc sendmsg() and recvmsg() paths and have all that done in the I/O thread only, with the ultimate aim of removing the state lock entirely. Moving the code out of sendmsg.c and recvmsg.c makes that easier to manage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parametersDavid Howells
Use the information now stored in struct rxrpc_call to configure the connection bundle and thence the connection, rather than using the rxrpc_conn_parameters struct. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Offload the completion of service conn security to the I/O threadDavid Howells
Offload the completion of the challenge/response cycle on a service connection to the I/O thread. After the RESPONSE packet has been successfully decrypted and verified by the work queue, offloading the changing of the call states to the I/O thread makes iteration over the conn's channel list simpler. Do this by marking the RESPONSE skbuff and putting it onto the receive queue for the I/O thread to collect. We put it on the front of the queue as we've already received the packet for it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Make the set of connection IDs per local endpointDavid Howells
Make the set of connection IDs per local endpoint so that endpoints don't cause each other's connections to get dismissed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Tidy up abort generation infrastructureDavid Howells
Tidy up the abort generation infrastructure in the following ways: (1) Create an enum and string mapping table to list the reasons an abort might be generated in tracing. (2) Replace the 3-char string with the values from (1) in the places that use that to log the abort source. This gets rid of a memcpy() in the tracepoint. (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint and use values from (1) to indicate the trace reason. (4) Always make a call to an abort function at the point of the abort rather than stashing the values into variables and using goto to get to a place where it reported. The C optimiser will collapse the calls together as appropriate. The abort functions return a value that can be returned directly if appropriate. Note that this extends into afs also at the points where that generates an abort. To aid with this, the afs sources need to #define RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header because they don't have access to the rxrpc internal structures that some of the tracepoints make use of. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Clean up connection abortDavid Howells
Clean up connection abort, using the connection state_lock to gate access to change that state, and use an rxrpc_call_completion value to indicate the difference between local and remote aborts as these can be pasted directly into the call state. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Implement a mechanism to send an event notification to a connectionDavid Howells
Provide a means by which an event notification can be sent to a connection through such that the I/O thread can pick it up and handle it rather than doing it in a separate workqueue. This is then used to move the deferred final ACK of a call into the I/O thread rather than a separate work queue as part of the drive to do all transmission from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Only disconnect calls in the I/O threadDavid Howells
Only perform call disconnection in the I/O thread to reduce the locking requirement. This is the first part of a fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Only set/transmit aborts in the I/O threadDavid Howells
Only set the abort call completion state in the I/O thread and only transmit ABORT packets from there. rxrpc_abort_call() can then be made to actually send the packet. Further, ABORT packets should only be sent if the call has been exposed to the network (ie. at least one attempted DATA transmission has occurred for it). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Separate call retransmission from other conn eventsDavid Howells
Call the rxrpc_conn_retransmit_call() directly from rxrpc_input_packet() rather than calling it via connection event handling. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Make the local endpoint hold a ref on a connected callDavid Howells
Make the local endpoint and it's I/O thread hold a reference on a connected call until that call is disconnected. Without this, we're reliant on either the AF_RXRPC socket to hold a ref (which is dropped when the call is released) or a queued work item to hold a ref (the work item is being replaced with the I/O thread). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Stash the network namespace pointer in rxrpc_localDavid Howells
Stash the network namespace pointer in the rxrpc_local struct in addition to a pointer to the rxrpc-specific net namespace info. Use this to remove some places where the socket is passed as a parameter. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06batman-adv: Drop prandom.h includesSven Eckelmann
The commit 8032bf1233a7 ("treewide: use get_random_u32_below() instead of deprecated function") replaced the prandom.h function prandom_u32_max with the random.h function get_random_u32_below. There is no need to still include prandom.h. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2023-01-06batman-adv: Start new development cycleSimon Wunderlich
This version will contain all the (major or even only minor) changes for Linux 6.3. The version number isn't a semantic version number with major and minor information. It is just encoding the year of the expected publishing as Linux -rc1 and the number of published versions this year (starting at 0). Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2023-01-05devlink: convert remaining dumps to the by-instance schemeJakub Kicinski
Soon we'll have to check if a devlink instance is alive after locking it. Convert to the by-instance dumping scheme to make refactoring easier. Most of the subobject code no longer has to worry about any devlink locking / lifetime rules (the only ones that still do are the two subject types which stubbornly use their own locking). Both dump and do callbacks are given a devlink instance which is already locked and good-to-access (do from the .pre_doit handler, dump from the new dump indirection). Note that we'll now check presence of an op (e.g. for sb_pool_get) under the devlink instance lock, that will soon be necessary anyway, because we don't hold refs on the driver modules so the memory in which ops live may be gone for a dead instance, after upcoming locking changes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: add by-instance dump infraJakub Kicinski
Most dumpit implementations walk the devlink instances. This requires careful lock taking and reference dropping. Factor the loop out and provide just a callback to handle a single instance dump. Convert one user as an example, other users converted in the next change. Slightly inspired by ethtool netlink code. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: uniformly take the devlink instance lock in the dump loopJakub Kicinski
Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit(). This way all dumps will take the instance lock in the main iteration loop directly, making refactoring and reading the code easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: restart dump based on devlink instance ids (function)Jakub Kicinski
Use xarray id for cases of sub-objects which are iterated in a function. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: restart dump based on devlink instance ids (nested)Jakub Kicinski
Use xarray id for cases of simple sub-object iteration. We'll now use the state->instance for the devlink instances and state->idx for subobject index. Moving the definition of idx into the inner loop makes sense, so while at it also move other sub-object local variables into the loop. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: restart dump based on devlink instance ids (simple)Jakub Kicinski
xarray gives each devlink instance an id and allows us to restart walk based on that id quite neatly. This is nice both from the perspective of code brevity and from the stability of the dump (devlink instances disappearing from before the resumption point will not cause inconsistent dumps). This patch takes care of simple cases where state->idx counts devlink instances only. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: health: combine loops in dumpJakub Kicinski
Walk devlink instances only once. Dump the instance reporters and port reporters before moving to the next instance. User space should not depend on ordering of messages. This will make improving stability of the walk easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: drop the filter argument from devlinks_xa_find_getJakub Kicinski
Looks like devlinks_xa_find_get() was intended to get the mark from the @filter argument. It doesn't actually use @filter, passing DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other than registered is unlikely so drop @filter argument completely. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: remove start variables from dumpsJakub Kicinski
The start variables made the code clearer when we had to access cb->args[0] directly, as the name args doesn't explain much. Now that we use a structure to hold state this seems no longer needed. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: use an explicit structure for dump contextJakub Kicinski
Create a dump context structure instead of using cb->args as an unsigned long array. This is a pure conversion which is intended to be as much of a noop as possible. Subsequent changes will use this to simplify the code. The two non-trivial parts are: - devlink_nl_cmd_health_reporter_dump_get_dumpit() checks args[0] to see if devlink_fmsg_dumpit() has already been called (whether this is the first msg), but doesn't use the exact value, so we can drop the local variable there already - devlink_nl_cmd_region_read_dumpit() uses args[0] for address but we'll use args[1] now, shouldn't matter Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05netlink: add macro for checking dump ctx sizeJakub Kicinski
We encourage casting struct netlink_callback::ctx to a local struct (in a comment above the field). Provide a convenience macro for checking if the local struct fits into the ctx. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: split out netlink codeJakub Kicinski
Move out the netlink glue into a separate file. Leave the ops in the old file because we'd have to export a ton of functions. Going forward we should switch to split ops which will let us to put the new ops in the netlink.c file. Pure code move, no functional changes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: split out core codeJakub Kicinski
Move core code into a separate file. It's spread around the main file which makes refactoring and figuring out how devlink works harder. Move the xarray, all the most core devlink instance code out like locking, ref counting, alloc, register, etc. Leave port stuff in leftover.c, if we want to move port code it'd probably be to its own file. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: rename devlink_netdevice_event -> devlink_port_netdevice_eventJakub Kicinski
To make the upcoming change a pure(er?) code move rename devlink_netdevice_event -> devlink_port_netdevice_event. This makes it clear that it only touches ports and doesn't belong cleanly in the core. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>