summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-01-06devlink: always check if the devlink instance is registeredJakub Kicinski
Always check under the instance lock whether the devlink instance is still / already registered. This is a no-op for the most part, as the unregistration path currently waits for all references. On the init path, however, we may temporarily open up a race with netdev code, if netdevs are registered before the devlink instance. This is temporary, the next change fixes it, and this commit has been split out for the ease of review. Note that in case of iterating over sub-objects which have their own lock (regions and line cards) we assume an implicit dependency between those objects existing and devlink unregistration. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: protect devlink->dev by the instance lockJakub Kicinski
devlink->dev is assumed to be always valid as long as any outstanding reference to the devlink instance exists. In prep for weakening of the references take the instance lock. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: update the code in netns move to latest helpersJakub Kicinski
devlink_pernet_pre_exit() is the only obvious place which takes the instance lock without using the devl_ helpers. Update the code and move the error print after releasing the reference (having unlock and put together feels slightly idiomatic). Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06devlink: bump the instance index directly when iteratingJakub Kicinski
xa_find_after() is designed to handle multi-index entries correctly. If a xarray has two entries one which spans indexes 0-3 and one at index 4 xa_find_after(0) will return the entry at index 4. Having to juggle the two callbacks, however, is unnecessary in case of the devlink xarray, as there is 1:1 relationship with indexes. Always use xa_find() and increment the index manually. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06tipc: fix unexpected link reset due to discovery messagesTung Nguyen
This unexpected behavior is observed: node 1 | node 2 ------ | ------ link is established | link is established reboot | link is reset up | send discovery message receive discovery message | link is established | link is established send discovery message | | receive discovery message | link is reset (unexpected) | send reset message link is reset | It is due to delayed re-discovery as described in function tipc_node_check_dest(): "this link endpoint has already reset and re-established contact with the peer, before receiving a discovery message from that node." However, commit 598411d70f85 has changed the condition for calling tipc_node_link_down() which was the acceptance of new media address. This commit fixes this by restoring the old and correct behavior. Fixes: 598411d70f85 ("tipc: make resetting of links non-atomic") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06sysctl: expose all net/core sysctls inside netnsMahesh Bandewar
All were not visible to the non-priv users inside netns. However, with 4ecb90090c84 ("sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN"), these vars are protected from getting modified. A proc with capable(CAP_NET_ADMIN) can change the values so not having them visible inside netns is just causing nuisance to process that check certain values (e.g. net.core.somaxconn) and see different behavior in root-netns vs. other-netns Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-06rxrpc: Move client call connection to the I/O threadDavid Howells
Move the connection setup of client calls to the I/O thread so that a whole load of locking and barrierage can be eliminated. This necessitates the app thread waiting for connection to complete before it can begin encrypting data. This also completes the fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move the client conn cache management to the I/O threadDavid Howells
Move the management of the client connection cache to the I/O thread rather than managing it from the namespace as an aggregate across all the local endpoints within the namespace. This will allow a load of locking to be got rid of in a future patch as only the I/O thread will be looking at the this. The downside is that the total number of cached connections on the system can get higher because the limit is now per-local rather than per-netns. We can, however, keep the number of client conns in use across the entire netfs and use that to reduce the expiration time of idle connection. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Remove call->state_lockDavid Howells
All the setters of call->state are now in the I/O thread and thus the state lock is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move call state changes from recvmsg to I/O threadDavid Howells
Move the call state changes that are made in rxrpc_recvmsg() to the I/O thread. This means that, thenceforth, only the I/O thread does this and the call state lock can be removed. This requires the Rx phase to be ended when the last packet is received, not when it is processed. Since this now changes the rxrpc call state to SUCCEEDED before we've consumed all the data from it, rxrpc_kernel_check_life() mustn't say the call is dead until the recvmsg queue is empty (unless the call has failed). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Move call state changes from sendmsg to I/O threadDavid Howells
Move all the call state changes that are made in rxrpc_sendmsg() to the I/O thread. This is a step towards removing the call state lock. This requires the switch to the RXRPC_CALL_CLIENT_AWAIT_REPLY and RXRPC_CALL_SERVER_SEND_REPLY states to be done when the last packet is decanted from ->tx_sendmsg to ->tx_buffer in the I/O thread, not when it is added to ->tx_sendmsg by sendmsg(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Wrap accesses to get call state to put the barrier in one placeDavid Howells
Wrap accesses to get the state of a call from outside of the I/O thread in a single place so that the barrier needed to order wrt the error code and abort code is in just that place. Also use a barrier when setting the call state and again when reading the call state such that the auxiliary completion info (error code, abort code) can be read without taking a read lock on the call state lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Split out the call state changing functions into their own fileDavid Howells
Split out the functions that change the state of an rxrpc call into their own file. The idea being to remove anything to do with changing the state of a call directly from the rxrpc sendmsg() and recvmsg() paths and have all that done in the I/O thread only, with the ultimate aim of removing the state lock entirely. Moving the code out of sendmsg.c and recvmsg.c makes that easier to manage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parametersDavid Howells
Use the information now stored in struct rxrpc_call to configure the connection bundle and thence the connection, rather than using the rxrpc_conn_parameters struct. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Offload the completion of service conn security to the I/O threadDavid Howells
Offload the completion of the challenge/response cycle on a service connection to the I/O thread. After the RESPONSE packet has been successfully decrypted and verified by the work queue, offloading the changing of the call states to the I/O thread makes iteration over the conn's channel list simpler. Do this by marking the RESPONSE skbuff and putting it onto the receive queue for the I/O thread to collect. We put it on the front of the queue as we've already received the packet for it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Make the set of connection IDs per local endpointDavid Howells
Make the set of connection IDs per local endpoint so that endpoints don't cause each other's connections to get dismissed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Tidy up abort generation infrastructureDavid Howells
Tidy up the abort generation infrastructure in the following ways: (1) Create an enum and string mapping table to list the reasons an abort might be generated in tracing. (2) Replace the 3-char string with the values from (1) in the places that use that to log the abort source. This gets rid of a memcpy() in the tracepoint. (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint and use values from (1) to indicate the trace reason. (4) Always make a call to an abort function at the point of the abort rather than stashing the values into variables and using goto to get to a place where it reported. The C optimiser will collapse the calls together as appropriate. The abort functions return a value that can be returned directly if appropriate. Note that this extends into afs also at the points where that generates an abort. To aid with this, the afs sources need to #define RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header because they don't have access to the rxrpc internal structures that some of the tracepoints make use of. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Clean up connection abortDavid Howells
Clean up connection abort, using the connection state_lock to gate access to change that state, and use an rxrpc_call_completion value to indicate the difference between local and remote aborts as these can be pasted directly into the call state. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Implement a mechanism to send an event notification to a connectionDavid Howells
Provide a means by which an event notification can be sent to a connection through such that the I/O thread can pick it up and handle it rather than doing it in a separate workqueue. This is then used to move the deferred final ACK of a call into the I/O thread rather than a separate work queue as part of the drive to do all transmission from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Only disconnect calls in the I/O threadDavid Howells
Only perform call disconnection in the I/O thread to reduce the locking requirement. This is the first part of a fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Only set/transmit aborts in the I/O threadDavid Howells
Only set the abort call completion state in the I/O thread and only transmit ABORT packets from there. rxrpc_abort_call() can then be made to actually send the packet. Further, ABORT packets should only be sent if the call has been exposed to the network (ie. at least one attempted DATA transmission has occurred for it). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Separate call retransmission from other conn eventsDavid Howells
Call the rxrpc_conn_retransmit_call() directly from rxrpc_input_packet() rather than calling it via connection event handling. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Make the local endpoint hold a ref on a connected callDavid Howells
Make the local endpoint and it's I/O thread hold a reference on a connected call until that call is disconnected. Without this, we're reliant on either the AF_RXRPC socket to hold a ref (which is dropped when the call is released) or a queued work item to hold a ref (the work item is being replaced with the I/O thread). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06rxrpc: Stash the network namespace pointer in rxrpc_localDavid Howells
Stash the network namespace pointer in the rxrpc_local struct in addition to a pointer to the rxrpc-specific net namespace info. Use this to remove some places where the socket is passed as a parameter. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-01-06batman-adv: Drop prandom.h includesSven Eckelmann
The commit 8032bf1233a7 ("treewide: use get_random_u32_below() instead of deprecated function") replaced the prandom.h function prandom_u32_max with the random.h function get_random_u32_below. There is no need to still include prandom.h. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2023-01-06batman-adv: Start new development cycleSimon Wunderlich
This version will contain all the (major or even only minor) changes for Linux 6.3. The version number isn't a semantic version number with major and minor information. It is just encoding the year of the expected publishing as Linux -rc1 and the number of published versions this year (starting at 0). Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2023-01-05devlink: convert remaining dumps to the by-instance schemeJakub Kicinski
Soon we'll have to check if a devlink instance is alive after locking it. Convert to the by-instance dumping scheme to make refactoring easier. Most of the subobject code no longer has to worry about any devlink locking / lifetime rules (the only ones that still do are the two subject types which stubbornly use their own locking). Both dump and do callbacks are given a devlink instance which is already locked and good-to-access (do from the .pre_doit handler, dump from the new dump indirection). Note that we'll now check presence of an op (e.g. for sb_pool_get) under the devlink instance lock, that will soon be necessary anyway, because we don't hold refs on the driver modules so the memory in which ops live may be gone for a dead instance, after upcoming locking changes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: add by-instance dump infraJakub Kicinski
Most dumpit implementations walk the devlink instances. This requires careful lock taking and reference dropping. Factor the loop out and provide just a callback to handle a single instance dump. Convert one user as an example, other users converted in the next change. Slightly inspired by ethtool netlink code. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: uniformly take the devlink instance lock in the dump loopJakub Kicinski
Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit(). This way all dumps will take the instance lock in the main iteration loop directly, making refactoring and reading the code easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: restart dump based on devlink instance ids (function)Jakub Kicinski
Use xarray id for cases of sub-objects which are iterated in a function. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: restart dump based on devlink instance ids (nested)Jakub Kicinski
Use xarray id for cases of simple sub-object iteration. We'll now use the state->instance for the devlink instances and state->idx for subobject index. Moving the definition of idx into the inner loop makes sense, so while at it also move other sub-object local variables into the loop. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: restart dump based on devlink instance ids (simple)Jakub Kicinski
xarray gives each devlink instance an id and allows us to restart walk based on that id quite neatly. This is nice both from the perspective of code brevity and from the stability of the dump (devlink instances disappearing from before the resumption point will not cause inconsistent dumps). This patch takes care of simple cases where state->idx counts devlink instances only. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: health: combine loops in dumpJakub Kicinski
Walk devlink instances only once. Dump the instance reporters and port reporters before moving to the next instance. User space should not depend on ordering of messages. This will make improving stability of the walk easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: drop the filter argument from devlinks_xa_find_getJakub Kicinski
Looks like devlinks_xa_find_get() was intended to get the mark from the @filter argument. It doesn't actually use @filter, passing DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other than registered is unlikely so drop @filter argument completely. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: remove start variables from dumpsJakub Kicinski
The start variables made the code clearer when we had to access cb->args[0] directly, as the name args doesn't explain much. Now that we use a structure to hold state this seems no longer needed. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: use an explicit structure for dump contextJakub Kicinski
Create a dump context structure instead of using cb->args as an unsigned long array. This is a pure conversion which is intended to be as much of a noop as possible. Subsequent changes will use this to simplify the code. The two non-trivial parts are: - devlink_nl_cmd_health_reporter_dump_get_dumpit() checks args[0] to see if devlink_fmsg_dumpit() has already been called (whether this is the first msg), but doesn't use the exact value, so we can drop the local variable there already - devlink_nl_cmd_region_read_dumpit() uses args[0] for address but we'll use args[1] now, shouldn't matter Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05netlink: add macro for checking dump ctx sizeJakub Kicinski
We encourage casting struct netlink_callback::ctx to a local struct (in a comment above the field). Provide a convenience macro for checking if the local struct fits into the ctx. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: split out netlink codeJakub Kicinski
Move out the netlink glue into a separate file. Leave the ops in the old file because we'd have to export a ton of functions. Going forward we should switch to split ops which will let us to put the new ops in the netlink.c file. Pure code move, no functional changes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: split out core codeJakub Kicinski
Move core code into a separate file. It's spread around the main file which makes refactoring and figuring out how devlink works harder. Move the xarray, all the most core devlink instance code out like locking, ref counting, alloc, register, etc. Leave port stuff in leftover.c, if we want to move port code it'd probably be to its own file. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: rename devlink_netdevice_event -> devlink_port_netdevice_eventJakub Kicinski
To make the upcoming change a pure(er?) code move rename devlink_netdevice_event -> devlink_port_netdevice_event. This makes it clear that it only touches ports and doesn't belong cleanly in the core. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05devlink: move code to a dedicated directoryJakub Kicinski
The devlink code is hard to navigate with 13kLoC in one file. I really like the way Michal split the ethtool into per-command files and core. It'd probably be too much to split it all up, but we can at least separate the core parts out of the per-cmd implementations and put it in a directory so that new commands can be separate files. Move the code, subsequent commit will do a partial split. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05Merge tag 'net-6.2-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, wifi, and netfilter. Current release - regressions: - bpf: fix nullness propagation for reg to reg comparisons, avoid null-deref - inet: control sockets should not use current thread task_frag - bpf: always use maximal size for copy_array() - eth: bnxt_en: don't link netdev to a devlink port for VFs Current release - new code bugs: - rxrpc: fix a couple of potential use-after-frees - netfilter: conntrack: fix IPv6 exthdr error check - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes - eth: dsa: qca8k: various fixes for the in-band register access - eth: nfp: fix schedule in atomic context when sync mc address - eth: renesas: rswitch: fix getting mac address from device tree - mobile: ipa: use proper endpoint mask for suspend Previous releases - regressions: - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by Jiri / python tests - net: tc: don't intepret cls results when asked to drop, fix oob-access - vrf: determine the dst using the original ifindex for multicast - eth: bnxt_en: - fix XDP RX path if BPF adjusted packet length - fix HDS (header placement) and jumbo thresholds for RX packets - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf, avoid memory corruptions Previous releases - always broken: - ulp: prevent ULP without clone op from entering the LISTEN status - veth: fix race with AF_XDP exposing old or uninitialized descriptors - bpf: - pull before calling skb_postpull_rcsum() (fix checksum support and avoid a WARN()) - fix panic due to wrong pageattr of im->image (when livepatch and kretfunc coexist) - keep a reference to the mm, in case the task is dead - mptcp: fix deadlock in fastopen error path - netfilter: - nf_tables: perform type checking for existing sets - nf_tables: honor set timeout and garbage collection updates - ipset: fix hash:net,port,net hang with /0 subnet - ipset: avoid hung task warning when adding/deleting entries - selftests: net: - fix cmsg_so_mark.sh test hang on non-x86 systems - fix the arp_ndisc_evict_nocarrier test for IPv6 - usb: rndis_host: secure rndis_query check against int overflow - eth: r8169: fix dmar pte write access during suspend/resume with WOL - eth: lan966x: fix configuration of the PCS - eth: sparx5: fix reading of the MAC address - eth: qed: allow sleep in qed_mcp_trace_dump() - eth: hns3: - fix interrupts re-initialization after VF FLR - fix handling of promisc when MAC addr table gets full - refine the handling for VF heartbeat - eth: mlx5: - properly handle ingress QinQ-tagged packets on VST - fix io_eq_size and event_eq_size params validation on big endian - fix RoCE setting at HCA level if not supported at all - don't turn CQE compression on by default for IPoIB - eth: ena: - fix toeplitz initial hash key value - account for the number of XDP-processed bytes in interface stats - fix rx_copybreak value update Misc: - ethtool: harden phy stat handling against buggy drivers - docs: netdev: convert maintainer's doc from FAQ to a normal document" * tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits) caif: fix memory leak in cfctrl_linkup_request() inet: control sockets should not use current thread task_frag net/ulp: prevent ULP without clone op from entering the LISTEN status qed: allow sleep in qed_mcp_trace_dump() MAINTAINERS: Update maintainers for ptp_vmw driver usb: rndis_host: Secure rndis_query check against int overflow net: dpaa: Fix dtsec check for PCS availability octeontx2-pf: Fix lmtst ID used in aura free drivers/net/bonding/bond_3ad: return when there's no aggregator netfilter: ipset: Rework long task execution when adding/deleting entries netfilter: ipset: fix hash:net,port,net hang with /0 subnet net: sparx5: Fix reading of the MAC address vxlan: Fix memory leaks in error path net: sched: htb: fix htb_classify() kernel-doc net: sched: cbq: dont intepret cls results when asked to drop net: sched: atm: dont intepret cls results when asked to drop dt-bindings: net: marvell,orion-mdio: Fix examples dt-bindings: net: sun8i-emac: Add phy-supply property net: ipa: use proper endpoint mask for suspend selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier ...
2023-01-05rxrpc: replace zero-lenth array with DECLARE_FLEX_ARRAY() helperStephen Rothwell
0-length arrays are deprecated, and cause problems with bounds checking. Replace with a flexible array: In file included from include/linux/string.h:253, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from arch/x86/include/asm/paravirt.h:17, from arch/x86/include/asm/cpuid.h:62, from arch/x86/include/asm/processor.h:19, from arch/x86/include/asm/cpufeature.h:5, from arch/x86/include/asm/thread_info.h:53, from include/linux/thread_info.h:60, from arch/x86/include/asm/preempt.h:9, from include/linux/preempt.h:78, from include/linux/percpu.h:6, from include/linux/prandom.h:13, from include/linux/random.h:153, from include/linux/net.h:18, from net/rxrpc/output.c:10: In function 'fortify_memcpy_chk', inlined from 'rxrpc_fill_out_ack' at net/rxrpc/output.c:158:2: include/linux/fortify-string.h:520:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 520 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Link: https://lore.kernel.org/linux-next/20230105132535.0d65378f@canb.auug.org.au/ Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-afs@lists.infradead.org Cc: netdev@vger.kernel.org Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Kees Cook <keescook@chromium.org>
2023-01-05caif: fix memory leak in cfctrl_linkup_request()Zhengchao Shao
When linktype is unknown or kzalloc failed in cfctrl_linkup_request(), pkt is not released. Add release process to error path. Fixes: b482cd2053e3 ("net-caif: add CAIF core protocol stack") Fixes: 8d545c8f958f ("caif: Disconnect without waiting for response") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-04inet: control sockets should not use current thread task_fragEric Dumazet
Because ICMP handlers run from softirq contexts, they must not use current thread task_frag. Previously, all sockets allocated by inet_ctl_sock_create() would use the per-socket page fragment, with no chance of recursion. Fixes: 98123866fcf3 ("Treewide: Stop corrupting socket's task_frag") Reported-by: syzbot+bebc6f1acdf4cbb79b03@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Benjamin Coddington <bcodding@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20230103192736.454149-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-04net/ulp: prevent ULP without clone op from entering the LISTEN statusPaolo Abeni
When an ULP-enabled socket enters the LISTEN status, the listener ULP data pointer is copied inside the child/accepted sockets by sk_clone_lock(). The relevant ULP can take care of de-duplicating the context pointer via the clone() operation, but only MPTCP and SMC implement such op. Other ULPs may end-up with a double-free at socket disposal time. We can't simply clear the ULP data at clone time, as TLS replaces the socket ops with custom ones assuming a valid TLS ULP context is available. Instead completely prevent clone-less ULP sockets from entering the LISTEN status. Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Reported-by: slipper <slipper.alive@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.1672740602.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-04Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-04 We've added 45 non-merge commits during the last 21 day(s) which contain a total of 50 files changed, 1454 insertions(+), 375 deletions(-). The main changes are: 1) Fixes, improvements and refactoring of parts of BPF verifier's state equivalence checks, from Andrii Nakryiko. 2) Fix a few corner cases in libbpf's BTF-to-C converter in particular around padding handling and enums, also from Andrii Nakryiko. 3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata, from Christian Ehrig. 4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks, from Dave Marchevsky. 5) Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers, from Jiri Olsa. 6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps, from Maryam Tahhan. 7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du. 8) Bigger batch of improvements to BPF tracing code samples, from Daniel T. Lee. 9) Add LoongArch support to libbpf's bpf_tracing helper header, from Hengqi Chen. 10) Fix a libbpf compiler warning in perf_event_open_probe on arm32, from Khem Raj. 11) Optimize bpf_local_storage_elem by removing 56 bytes of padding, from Martin KaFai Lau. 12) Use pkg-config to locate libelf for resolve_btfids build, from Shen Jiamin. 13) Various libbpf improvements around API documentation and errno handling, from Xin Liu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) libbpf: Return -ENODATA for missing btf section libbpf: Add LoongArch support to bpf_tracing.h libbpf: Restore errno after pr_warn. libbpf: Added the description of some API functions libbpf: Fix invalid return address register in s390 samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro samples/bpf: Change _kern suffix to .bpf with syscall tracing program samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program samples/bpf: Use kyscall instead of kprobe in syscall tracing program bpf: rename list_head -> graph_root in field info types libbpf: fix errno is overwritten after being closed. bpf: fix regs_exact() logic in regsafe() to remap IDs correctly bpf: perform byte-by-byte comparison only when necessary in regsafe() bpf: reject non-exact register type matches in regsafe() bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule bpf: reorganize struct bpf_reg_state fields bpf: teach refsafe() to take into account ID remapping bpf: Remove unused field initialization in bpf's ctl_table selftests/bpf: Add jit probe_mem corner case tests to s390x denylist ... ==================== Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-03mac802154: Handle passive scanningMiquel Raynal
Implement the core hooks in order to provide the softMAC layer support for passive scans. Scans are requested by the user and can be aborted. Changing channels manually is prohibited during scans. The implementation uses a workqueue triggered at a certain interval depending on the symbol duration for the current channel and the duration order provided. More advanced drivers with internal scheduling capabilities might require additional care but there is none mainline yet. Received beacons during a passive scan are processed in a work queue and their result forwarded to the upper layer. Active scanning is not supported yet. Co-developed-by: David Girault <david.girault@qorvo.com> Signed-off-by: David Girault <david.girault@qorvo.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20230103165644.432209-7-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-01-03mac802154: Add MLME Tx locked helpersMiquel Raynal
These have the exact same behavior as before, except they expect the rtnl to be already taken (and will complain otherwise). This allows performing MLME transmissions from different contexts. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20230103165644.432209-6-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>