summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-12-01hsr: Synchronize sending frames to have always incremented outgoing seq nr.Sebastian Andrzej Siewior
Sending frames via the hsr (master) device requires a sequence number which is tracked in hsr_priv::sequence_nr and protected by hsr_priv::seqnr_lock. Each time a new frame is sent, it will obtain a new id and then send it via the slave devices. Each time a packet is sent (via hsr_forward_do()) the sequence number is checked via hsr_register_frame_out() to ensure that a frame is not handled twice. This make sense for the receiving side to ensure that the frame is not injected into the stack twice after it has been received from both slave ports. There is no locking to cover the sending path which means the following scenario is possible: CPU0 CPU1 hsr_dev_xmit(skb1) hsr_dev_xmit(skb2) fill_frame_info() fill_frame_info() hsr_fill_frame_info() hsr_fill_frame_info() handle_std_frame() handle_std_frame() skb1's sequence_nr = 1 skb2's sequence_nr = 2 hsr_forward_do() hsr_forward_do() hsr_register_frame_out(, 2) // okay, send) hsr_register_frame_out(, 1) // stop, lower seq duplicate Both skbs (or their struct hsr_frame_info) received an unique id. However since skb2 was sent before skb1, the higher sequence number was recorded in hsr_register_frame_out() and the late arriving skb1 was dropped and never sent. This scenario has been observed in a three node HSR setup, with node1 + node2 having ping and iperf running in parallel. From time to time ping reported a missing packet. Based on tracing that missing ping packet did not leave the system. It might be possible (didn't check) to drop the sequence number check on the sending side. But if the higher sequence number leaves on wire before the lower does and the destination receives them in that order and it will drop the packet with the lower sequence number and never inject into the stack. Therefore it seems the only way is to lock the whole path from obtaining the sequence number and sending via dev_queue_xmit() and assuming the packets leave on wire in the same order (and don't get reordered by the NIC). Cover the whole path for the master interface from obtaining the ID until after it has been forwarded via hsr_forward_skb() to ensure the skbs are sent to the NIC in the order of the assigned sequence numbers. Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01hsr: Disable netpoll.Sebastian Andrzej Siewior
The hsr device is a software device. Its net_device_ops::ndo_start_xmit() routine will process the packet and then pass the resulting skb to dev_queue_xmit(). During processing, hsr acquires a lock with spin_lock_bh() (hsr_add_node()) which needs to be promoted to the _irq() suffix in order to avoid a potential deadlock. Then there are the warnings in dev_queue_xmit() (due to local_bh_disable() with disabled interrupts) left. Instead trying to address those (there is qdisc and…) for netpoll sake, just disable netpoll on hsr. Disable netpoll on hsr and replace the _irqsave() locking with _bh(). Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01hsr: Avoid double remove of a node.Sebastian Andrzej Siewior
Due to the hashed-MAC optimisation one problem become visible: hsr_handle_sup_frame() walks over the list of available nodes and merges two node entries into one if based on the information in the supervision both MAC addresses belong to one node. The list-walk happens on a RCU protected list and delete operation happens under a lock. If the supervision arrives on both slave interfaces at the same time then this delete operation can occur simultaneously on two CPUs. The result is the first-CPU deletes the from the list and the second CPUs BUGs while attempting to dereference a poisoned list-entry. This happens more likely with the optimisation because a new node for the mac_B entry is created once a packet has been received and removed (merged) once the supervision frame has been received. Avoid removing/ cleaning up a hsr_node twice by adding a `removed' field which is set to true after the removal and checked before the removal. Fixes: f266a683a4804 ("net/hsr: Better frame dispatch") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01hsr: Add a rcu-read lock to hsr_forward_skb().Sebastian Andrzej Siewior
hsr_forward_skb() a skb and keeps information in an on-stack hsr_frame_info. hsr_get_node() assigns hsr_frame_info::node_src which is from a RCU list. This pointer is used later in hsr_forward_do(). I don't see a reason why this pointer can't vanish midway since there is no guarantee that hsr_forward_skb() is invoked from an RCU read section. Use rcu_read_lock() to protect hsr_frame_info::node_src from its assignment until it is no longer used. Fixes: f266a683a4804 ("net/hsr: Better frame dispatch") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01Revert "net: hsr: use hlist_head instead of list_head for mac addresses"Sebastian Andrzej Siewior
The hlist optimisation (which not only uses hlist_head instead of list_head but also splits hsr_priv::node_db into an array of 256 slots) does not consider the "node merge": Upon starting the hsr network (with three nodes) a packet that is sent from node1 to node3 will also be sent from node1 to node2 and then forwarded to node3. As a result node3 will receive 2 packets because it is not able to filter out the duplicate. Each packet received will create a new struct hsr_node with macaddress_A only set the MAC address it received from (the two MAC addesses from node1). At some point (early in the process) two supervision frames will be received from node1. They will be processed by hsr_handle_sup_frame() and one frame will leave early ("Node has already been merged") and does nothing. The other frame will be merged as portB and have its MAC address written to macaddress_B and the hsr_node (that was created for it as macaddress_A) will be removed. From now on HSR is able to identify a duplicate because both packets sent from one node will result in the same struct hsr_node because hsr_get_node() will find the MAC address either on macaddress_A or macaddress_B. Things get tricky with the optimisation: If sender's MAC address is saved as macaddress_A then the lookup will work as usual. If the MAC address has been merged into macaddress_B of another hsr_node then the lookup won't work because it is likely that the data structure is in another bucket. This results in creating a new struct hsr_node and not recognising a possible duplicate. A way around it would be to add another hsr_node::mac_list_B and attach it to the other bucket to ensure that this hsr_node will be looked up either via macaddress_A _or_ macaddress_B. I however prefer to revert it because it sounds like an academic problem rather than real life workload plus it adds complexity. I'm not an HSR expert with what is usual size of a network but I would guess 40 to 60 nodes. With 10.000 nodes and assuming 60us for pass-through (from node to node) then it would take almost 600ms for a packet to almost wrap around which sounds a lot. Revert the hash MAC addresses optimisation. Fixes: 4acc45db71158 ("net: hsr: use hlist_head instead of list_head for mac addresses") Cc: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01sctp: delete free member from struct sctp_sched_opsXin Long
After commit 9ed7bfc79542 ("sctp: fix memory leak in sctp_stream_outq_migrate()"), sctp_sched_set_sched() is the only place calling sched->free(), and it can actually be replaced by sched->free_sid() on each stream, and yet there's already a loop to traverse all streams in sctp_sched_set_sched(). This patch adds a function sctp_sched_free_sched() where it calls sched->free_sid() for each stream to replace sched->free() calls in sctp_sched_set_sched() and then deletes the unused free member from struct sctp_sched_ops. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/e10aac150aca2686cb0bd0570299ec716da5a5c0.1669849471.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01mptcp: add pm listener eventsGeliang Tang
This patch adds two new MPTCP netlink event types for PM listening socket create and close, named MPTCP_EVENT_LISTENER_CREATED and MPTCP_EVENT_LISTENER_CLOSED. Add a new function mptcp_event_pm_listener() to push the new events with family, port and addr to userspace. Invoke mptcp_event_pm_listener() with MPTCP_EVENT_LISTENER_CREATED in mptcp_listen() and mptcp_pm_nl_create_listen_socket(), invoke it with MPTCP_EVENT_LISTENER_CLOSED in __mptcp_close_ssk(). Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01net/tcp: Separate initialization of twskDmitry Safonov
Convert BUG_ON() to WARN_ON_ONCE() and warn as well for unlikely static key int overflow error-path. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01net/tcp: Do cleanup on tcp_md5_key_copy() failureDmitry Safonov
If the kernel was short on (atomic) memory and failed to allocate it - don't proceed to creation of request socket. Otherwise the socket would be unsigned and userspace likely doesn't expect that the TCP is not MD5-signed anymore. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destructionDmitry Safonov
To do that, separate two scenarios: - where it's the first MD5 key on the system, which means that enabling of the static key may need to sleep; - copying of an existing key from a listening socket to the request socket upon receiving a signed TCP segment, where static key was already enabled (when the key was added to the listening socket). Now the life-time of the static branch for TCP-MD5 is until: - last tcp_md5sig_info is destroyed - last socket in time-wait state with MD5 key is closed. Which means that after all sockets with TCP-MD5 keys are gone, the system gets back the performance of disabled md5-key static branch. While at here, provide static_key_fast_inc() helper that does ref counter increment in atomic fashion (without grabbing cpus_read_lock() on CONFIG_JUMP_LABEL=y). This is needed to add a new user for a static_key when the caller controls the lifetime of another user. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01net/tcp: Separate tcp_md5sig_info allocation into tcp_md5sig_info_add()Dmitry Safonov
Add a helper to allocate tcp_md5sig_info, that will help later to do/allocate things when info allocated, once per socket. Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01wifi: mac80211: fix and simplify unencrypted drop check for meshFelix Fietkau
ieee80211_drop_unencrypted is called from ieee80211_rx_h_mesh_fwding and ieee80211_frame_allowed. Since ieee80211_rx_h_mesh_fwding can forward packets for other mesh nodes and is called earlier, it needs to check the decryptions status and if the packet is using the control protocol on its own, instead of deferring to the later call from ieee80211_frame_allowed. Because of that, ieee80211_drop_unencrypted has a mesh specific check that skips over the mesh header in order to check the payload protocol. This code is invalid when called from ieee80211_frame_allowed, since that happens after the 802.11->802.3 conversion. Fix this by moving the mesh specific check directly into ieee80211_rx_h_mesh_fwding. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20221201135730.19723-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: mac80211: add support for restricting netdev features per vifFelix Fietkau
This can be used to selectively disable feature flags for checksum offload, scatter/gather or GSO by changing vif->netdev_features. Removing features from vif->netdev_features does not affect the netdev features themselves, but instead fixes up skbs in the tx path so that the offloads are not needed in the driver. Aside from making it easier to deal with vif type based hardware limitations, this also makes it possible to optimize performance on hardware without native GSO support by declaring GSO support in hw->netdev_features and removing it from vif->netdev_features. This allows mac80211 to handle GSO segmentation after the sta lookup, but before itxq enqueue, thus reducing the number of unnecessary sta lookups, as well as some other per-packet processing. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20221010094338.78070-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: mac80211: update TIM for S1G specification changesKieran Frewen
Updates to the TIM information element to match changes made in the IEEE Std 802.11ah-2020. Signed-off-by: Kieran Frewen <kieran.frewen@morsemicro.com> Co-developed-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com> Signed-off-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com> Link: https://lore.kernel.org/r/20221106221602.25714-1-gilad.itzkovitch@morsemicro.com [use skb_put_data/skb_put_u8] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: mac80211: don't parse multi-BSSID in assoc respJohannes Berg
It's not valid to have the multiple BSSID element in the association response (per 802.11 REVme D1.0), so don't try to parse it there, but only in the fallback beacon elements if needed. The other case that was parsing association requests was already changed in a previous commit. Change-Id: I659d2ef1253e079cc71c46a017044e116e31c024 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: cfg80211: use bss_from_pub() instead of container_of()Johannes Berg
There's no need to open-code container_of() when we have bss_from_pub(). Use it. Change-Id: I074723717909ba211a40e6499f0c36df0e2ba4be Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: mac80211: remove unnecessary synchronize_net()Johannes Berg
The call to ieee80211_do_stop() right after will also do synchronize_rcu() to ensure the SDATA_STATE_RUNNING bit is cleared, so we don't need to synchronize_net() here. Change-Id: Id9f9ffcf195002013e5d9fde288877d219780864 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: mac80211: Drop not needed check for NULLAlexander Wetzel
ieee80211_get_txq() can only be called with vif != NULL. Remove not needed NULL test in function. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/20221107161328.2883-1-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: cfg80211: Fix not unregister reg_pdev when load_builtin_regdb_keys() failsChen Zhongjin
In regulatory_init_db(), when it's going to return a error, reg_pdev should be unregistered. When load_builtin_regdb_keys() fails it doesn't do it and makes cfg80211 can't be reload with report: sysfs: cannot create duplicate filename '/devices/platform/regulatory.0' ... <TASK> dump_stack_lvl+0x79/0x9b sysfs_warn_dup.cold+0x1c/0x29 sysfs_create_dir_ns+0x22d/0x290 kobject_add_internal+0x247/0x800 kobject_add+0x135/0x1b0 device_add+0x389/0x1be0 platform_device_add+0x28f/0x790 platform_device_register_full+0x376/0x4b0 regulatory_init+0x9a/0x4b2 [cfg80211] cfg80211_init+0x84/0x113 [cfg80211] ... Fixes: 90a53e4432b1 ("cfg80211: implement regdb signature checking") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Link: https://lore.kernel.org/r/20221109090237.214127-1-chenzhongjin@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: cfg80211: fix comparison of BSS frequenciesJUN-KYU SHIN
If the "channel->freq_offset" comparison is omitted in cmp_bss(), BSS with different kHz units cannot be distinguished in the S1G Band. So "freq_offset" should also be included in the comparison. Signed-off-by: JUN-KYU SHIN <jk.shin@newratek.com> Link: https://lore.kernel.org/r/20221111023301.6395-1-jk.shin@newratek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01wifi: mac80211: fix maybe-unused warningÍñigo Huguet
In ieee80211_lookup_key, the variable named `local` is unused if compiled without lockdep, getting this warning: net/mac80211/cfg.c: In function ‘ieee80211_lookup_key’: net/mac80211/cfg.c:542:26: error: unused variable ‘local’ [-Werror=unused-variable] struct ieee80211_local *local = sdata->local; ^~~~~ Fix it with __maybe_unused. Fixes: 8cbf0c2ab6df ("wifi: mac80211: refactor some key code") Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Link: https://lore.kernel.org/r/20221111153622.29016-1-ihuguet@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-12-01rxrpc: Transmit ACKs at the point of generationDavid Howells
For ACKs generated inside the I/O thread, transmit the ACK at the point of generation. Where the ACK is generated outside of the I/O thread, it's offloaded to the I/O thread to transmit it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Fold __rxrpc_unuse_local() into rxrpc_unuse_local()David Howells
Fold __rxrpc_unuse_local() into rxrpc_unuse_local() as the latter is now the only user of the former. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Move the cwnd degradation after transmitting packetsDavid Howells
When we've gone for >1RTT without transmitting a packet, we should reduce the ssthresh and cut the cwnd by half (as suggested in RFC2861 sec 3.1). However, we may receive ACK packets in a batch and the first of these may cut the cwnd, preventing further transmission, and each subsequent one cuts the cwnd yet further, reducing it to the floor and killing performance. Fix this by moving the cwnd reset to after doing the transmission and resetting the base time such that we don't cut the cwnd by half again for at least another RTT. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Trace/count transmission underflows and cwnd resetsDavid Howells
Add a tracepoint to log when a cwnd reset occurs due to lack of transmission on a call. Add stat counters to count transmission underflows (ie. when we have tx window space, but sendmsg doesn't manage to keep up), cwnd resets and transmission failures. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Remove the _bh annotation from all the spinlocksDavid Howells
None of the spinlocks in rxrpc need a _bh annotation now as the RCU callback routines no longer take spinlocks and the bulk of the packet wrangling code is now run in the I/O thread, not softirq context. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Make the I/O thread take over the call and local processor workDavid Howells
Move the functions from the call->processor and local->processor work items into the domain of the I/O thread. The call event processor, now called from the I/O thread, then takes over the job of cranking the call state machine, processing incoming packets and transmitting DATA, ACK and ABORT packets. In a future patch, rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it for later transmission. The call event processor becomes purely received-skb driven. It only transmits things in response to events. We use "pokes" to queue a dummy skb to make it do things like start/resume transmitting data. Timer expiry also results in pokes. The connection event processor, becomes similar, though crypto events, such as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item to avoid doing crypto in the I/O thread. The local event processor is removed and VERSION response packets are generated directly from the packet parser. Similarly, ABORTs generated in response to protocol errors will be transmitted immediately rather than being pushed onto a queue for later transmission. Changes: ======== ver #2) - Fix a couple of introduced lock context imbalances. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Extract the peer address from an incoming packet earlierDavid Howells
Extract the peer address from an incoming packet earlier, at the beginning of rxrpc_input_packet() and thence pass a pointer to it to various functions that use it as part of the lookup rather than doing it on several separate paths. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Reduce the use of RCU in packet inputDavid Howells
Shrink the region of rxrpc_input_packet() that is covered by the RCU read lock so that it only covers the connection and call lookup. This means that the bits now outside of that can call sleepable functions such as kmalloc and sendmsg. Also take a ref on the conn or call we're going to use before we drop the RCU read lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Simplify skbuff accounting in receive pathDavid Howells
A received skbuff needs a ref when it gets put on a call data queue or conn packet queue, and rxrpc_input_packet() and co. jump through a lot of hoops to avoid double-dropping the skbuff ref so that we can avoid getting a ref when we queue the packet. Change this so that the skbuff ref is unconditionally dropped by the caller of rxrpc_input_packet(). An additional ref is then taken on the packet if it is pushed onto a queue. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Remove RCU from peer->error_targets listDavid Howells
Remove the RCU requirements from the peer's list of error targets so that the error distributor can call sleeping functions. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Move DATA transmission into call processor work itemDavid Howells
Move DATA transmission into the call processor work item. In a future patch, this will be called from the I/O thread rather than being itsown work item. This will allow DATA transmission to be driven directly by incoming ACKs, pokes and timers as those are processed. The Tx queue is also split: The queue of packets prepared by sendmsg is now places in call->tx_sendmsg and the packet dispatcher decants the packets into call->tx_buffer as space becomes available in the transmission window. This allows sendmsg to run ahead of the available space to try and prevent an underflow in transmission. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Copy client call parameters into rxrpc_call earlierDavid Howells
Copy client call parameters into rxrpc_call earlier so that that can be used to convey them to the connection code - which can then be offloaded to the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Implement a mechanism to send an event notification to a callDavid Howells
Provide a means by which an event notification can be sent to a call such that the I/O thread can process it rather than it being done in a separate workqueue. This will allow a lot of locking to be removed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Don't use sk->sk_receive_queue.lock to guard socket state changesDavid Howells
Don't use sk->sk_receive_queue.lock to guard socket state changes as the socket mutex is sufficient. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Remove call->input_lockDavid Howells
Remove call->input_lock as it was only necessary to serialise access to the state stored in the rxrpc_call struct by simultaneous softirq handlers presenting received packets. They now dump the packets in a queue and a single process-context handler now processes them. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Move error processing into the local endpoint I/O threadDavid Howells
Move the processing of error packets into the local endpoint I/O thread, leaving the handover from UDP to merely transfer them into the local endpoint queue. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Move packet reception processing into I/O threadDavid Howells
Split the packet input handler to make the softirq side just dump the received packet into the local endpoint receive queue and then call the remainder of the input function from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Create a per-local endpoint receive queue and I/O threadDavid Howells
Create a per-local receive queue to which, in a future patch, all incoming packets will be directed and an I/O thread that will process those packets and perform all transmission of packets. Destruction of the local endpoint is also moved from the local processor work item (which will be absorbed) to the thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Split the receive codeDavid Howells
Split the code that handles packet reception in softirq mode as a prelude to moving all the packet processing beyond routing to the appropriate call and setting up of a new call out into process context. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Don't hold a ref for connection workqueueDavid Howells
Currently, rxrpc gives the connection's work item a ref on the connection when it queues it - and this is called from the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run. This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix). (1) Don't give a ref to the work item. (2) Simplify handling of service connections by adding a separate active count so that the refcount isn't also used for this. (3) Connection destruction for both client and service connections can then be cleaned up by putting rxrpc_put_connection() out of line and making a tidy progression through the destruction code (offloaded to a workqueue if put from softirq or processor function context). The RCU part of the cleanup then only deals with the freeing at the end. (4) Make rxrpc_queue_conn() return immediately if it sees the active count is -1 rather then queuing the connection. (5) Make sure that the cleanup routine waits for the work item to complete. (6) Stash the rxrpc_net pointer in the conn struct so that the rcu free routine can use it, even if the local endpoint has been freed. Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function. Note the connection work item is mostly going to go away with the main event work being transferred to the I/O thread, so the wait in (6) will become obsolete. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Don't hold a ref for call timer or workqueueDavid Howells
Currently, rxrpc gives the call timer a ref on the call when it starts it and this is passed along to the workqueue by the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run. This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix). Fix this by: (1) Don't give a ref to the timer. (2) Making the expiration function not do anything if the refcount is 0. Note that this is more of an optimisation. (3) Make sure that the cleanup routine waits for timer to complete. However, this has a consequence that timer cannot give a ref to the work item. Therefore the following fixes are also necessary: (4) Don't give a ref to the work item. (5) Make the work item return asap if it sees the ref count is 0. (6) Make sure that the cleanup routine waits for the work item to complete. Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function. Note the call work item is going to go away with the work being transferred to the I/O thread, so the wait in (6) will become obsolete. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: trace: Don't use __builtin_return_address for sk_buff tracingDavid Howells
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the sk_buff tracepoint. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Trace rxrpc_bundle refcountDavid Howells
Add a tracepoint for the rxrpc_bundle refcounting. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracingDavid Howells
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_call tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracingDavid Howells
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_conn tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracingDavid Howells
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_peer tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracingDavid Howells
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_local tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Extract the code from a received ABORT packet much earlierDavid Howells
Extract the code from a received rx ABORT packet much earlier and in a single place and harmonise the responses to malformed ABORT packets. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2022-12-01rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundleDavid Howells
Remove the rxrpc_conn_parameters struct from the rxrpc_connection and rxrpc_bundle structs and emplace the members directly. These are going to get filled in from the rxrpc_call struct in future. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org