summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-12-09rxrpc: Fix initial resend timeoutDavid Howells
The constant for the initial resend timeout is in milliseconds, but the variable it's assigned to is in microseconds. Fix the constant to be in microseconds. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Fix the calculation and use of RTODavid Howells
Make the following changes to the calculation and use of RTO: (1) Fix rxrpc_resend() to use the backed-off RTO value obtained by calling rxrpc_get_rto_backoff() rather than extracting the value itself. Without this, it may retransmit packets too early. (2) The RTO value being similar to the RTT causes a lot of extraneous resends because the RTT doesn't end up taking account of clearing out of the receive queue on the server. Worse, responses to PING-ACKs are made as fast as possible and so are less than the DATA-requested-ACK RTT and so skew the RTT down. Fix this by putting a lower bound on the RTO by adding 100ms to it and limiting the lower end to 200ms. Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout") Fixes: 37473e416234 ("rxrpc: Clean up the resend algorithm") Signed-off-by: David Howells <dhowells@redhat.com> Suggested-by: Simon Wilkinson <sxw@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Display userStatus in rxrpc_rx_ack traceDavid Howells
Display the userStatus field from the Rx packet header in the rxrpc_rx_ack trace line. This is used for flow control purposes by FS.StoreData-type kafs RPC calls. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Adjust the rxrpc_rtt_rx tracepointDavid Howells
Adjust the rxrpc_rtt_rx tracepoint in the following ways: (1) Display the collected RTT sample in the rxrpc_rtt_rx trace. (2) Move the division of srtt by 8 to the TP_printk() rather doing it before invoking the trace point. (3) Display the min_rtt value. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Generate rtt_minDavid Howells
Generate rtt_min as this is required by RACK-TLP. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-27-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Don't use received skbuff timestampsDavid Howells
Don't use received skbuff timestamps, but rather set a timestamp when an ack is processed so that the time taken to get to rxrpc_input_ack() is included in the RTT. The timestamp of the latest ACK received is tracked in call->acks_latest_ts. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-26-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Store the DATA serial in the txqueue and use this in RTT calcDavid Howells
Store the serial number set on a DATA packet at the point of transmission in the rxrpc_txqueue struct and when an ACK is received, match the reference number in the ACK by trawling the txqueue rather than sharing an RTT table with ACK RTT. This can be done as part of Tx queue rotation. This means we have a lot more RTT samples available and is faster to search with all the serial numbers packed together into a few cachelines rather than being hung off different txbufs. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-25-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Use the new rxrpc_tx_queue struct to more efficiently process ACKsDavid Howells
With the change in the structure of the transmission buffer to store buffers in bunches of 32 or 64 (BITS_PER_LONG) we can place sets of per-buffer flags into the rxrpc_tx_queue struct rather than storing them in rxrpc_tx_buf, thereby vastly increasing efficiency when assessing the SACK table in an ACK packet. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-24-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Adjust names and types of congestion-related fieldsDavid Howells
Adjust some of the names of fields and constants to make them look a bit more like the TCP congestion symbol names, such as flight_size -> in_flight and congest_mode to ca_state. Move the persistent congestion-related fields from the rxrpc_ack_summary struct into the rxrpc_call struct rather than copying them out and back in again. The rxrpc_congest tracepoint can fetch them from the call struct. Rename the counters for soft acks and nacks to have an 's' on the front to reflect the softness, e.g. nr_acks -> nr_sacks. Make fields counting numbers of packets or numbers of acks u16 rather than u8 to allow for windows of up to 8192 DATA packets in flight in future. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-23-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Display stats about jumbo packets transmitted and receivedDavid Howells
In /proc/net/rxrpc/stats, display statistics about the numbers of different sizes of jumbo packets transmitted and received, showing counts for 1 subpacket (ie. a non-jumbo packet), 2 subpackets, 3, ... to 8 and then 9+. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-22-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Replace call->acks_first_seq with tracking of the hard ACK pointDavid Howells
Replace the call->acks_first_seq variable (which holds ack.firstPacket from the latest ACK packet and indicates the sequence number of the first ack slot in the SACK table) with call->acks_hard_ack which will hold the highest sequence hard ACK'd. This is 1 less than call->acks_first_seq, but it fits in the same schema as the other tracking variables which hold the sequence of a packet, not one past it. This will fix the rxrpc_congest tracepoint's calculation of SACK window size which shows one fewer than it should - and will occasionally go to -1. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-21-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: call->acks_hard_ack is now the same call->tx_bottom, so remove itDavid Howells
Now that packets are removed from the Tx queue in the rotation function rather than being cleaned up later, call->acks_hard_ack now advances in step with call->tx_bottom, so remove it. Some of the places call->acks_hard_ack is used in the rxrpc tracepoints are replaced by call->acks_first_seq instead as that's the peer's reported idea of the hard-ACK point. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-20-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Implement progressive transmission queue structDavid Howells
We need to scan the buffers in the transmission queue occasionally when processing ACKs, but the transmission queue is currently a linked list of transmission buffers which, when we eventually expand the Tx window to 8192 packets will be very slow to walk. Instead, pull the fields we need to examine a lot (last sent time, retransmitted flag) into a new struct rxrpc_txqueue and make each one hold an array of 32 or 64 packets. The transmission queue is then a list of these structs, each pointing to a contiguous set of packets. Scanning is then a lot faster as the flags and timestamps are concentrated in the CPU dcache. The transmission timestamps are stored as a number of microseconds from a base ktime to reduce memory requirements. This should be fine provided we manage to transmit an entire buffer within an hour. This will make implementing RACK-TLP [RFC8985] easier as it will be less costly to scan the transmission buffers. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-19-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Don't need barrier for ->tx_bottom and ->acks_hard_ackDavid Howells
We don't need a barrier for the ->tx_bottom value (which indicates the lowest sequence still in the transmission queue) and the ->acks_hard_ack value (which tracks the DATA packets hard-ack'd by the latest ACK packet received and thus indicates which DATA packets can now be discarded) as the app thread doesn't use either value as a reference to memory to access. Rather, the app thread merely uses these as a guide to how much space is available in the transmission queue Change the code to use READ/WRITE_ONCE() instead. Also, change rxrpc_check_tx_space() to use the same value for tx_bottom throughout. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-18-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Timestamp DATA packets before transmitting themDavid Howells
Move to setting the timestamp on DATA packets before transmitting them as part of the preparation. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-17-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Only set DF=1 on initial DATA transmissionDavid Howells
Change how the DF flag is managed on DATA transmissions. Set it on initial transmission and don't set it on retransmissions. Then remove the handling for EMSGSIZE in rxrpc_send_data_packet() and just pretend it didn't happen, leaving it to the retransmission path to retry. The path-MTU discovery using PING ACKs is then used to probe for the maximum DATA size - though notification by ICMP will be used if one is received. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Fix injection of packet lossDavid Howells
Fix the code that injects packet loss for testing to make sure call->tx_transmitted is updated. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-15-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Fix CPU time starvation in I/O threadDavid Howells
Starvation can happen in the rxrpc I/O thread because it goes back to the top of the I/O loop after it does any one thing without trying to give any other connection or call CPU time. Also, because it processes one call packet at a time, it tries to do the retransmission loop after each ACK without checking to see if there are other ACKs already in the queue that can update the SACK state. Fix this by: (1) Add a received-packet queue on each call. (2) Distribute packets from the master Rx queue to the individual call, conn and error queues and 'poking' calls to add them to the attend queue first thing in the I/O thread. (3) Go through all the attention-seeking connections and calls before going back to the top of the I/O thread. Each queue is extracted as a whole and then gone through so that new additions to insert themselves into the queue. (4) Make the call event handler go through all the packets currently on the call's rx_queue before transmitting and retransmitting DATA packets. (5) Drop the skb argument from the call event handler as this is now replaced with the rx_queue. Instead, keep track of whether we received a packet or an ACK for the tests that used to rely on that. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-14-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Add a tracepoint to show variables pertinent to jumbo packet sizeDavid Howells
Add a tracepoint to be called right before packets are transmitted for the first time that shows variable values that are pertinent to how many subpackets will be added to a jumbo DATA packet. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-13-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Prepare to be able to send jumbo DATA packetsDavid Howells
Prepare to be able to send jumbo DATA packets if the we decide to, but don't enable that yet. This will allow larger chunks of data to be sent without reducing the retryability as the subpackets in a jumbo packet can also be retransmitted individually. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-12-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Separate the packet length from the data length in rxrpc_txbufDavid Howells
Separate the packet length from the data length (txb->len) stored in the rxrpc_txbuf to make security calculations easier. Also store the allocation size as that's an upper bound on the size of the security wrapper and change a number of fields to unsigned short as the amount of data can't exceed the capacity of a UDP packet. Also, whilst we're at it, use kzalloc() for txbufs. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-11-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Implement path-MTU probing using padded PING ACKs (RFC8899)David Howells
Implement path-MTU probing (along the lines of RFC8899) by padding some of the PING ACKs we send. PING ACKs get their own individual responses quite apart from the acking of data (though, as ACKs, they fulfil that role also). The probing concentrates on packet sizes that correspond how many subpackets can be stuffed inside a jumbo packet as jumbo DATA packets are just aggregations of individual DATA packets and can be split easily for retransmission purposes. If we want to perform probing, we advertise this by setting the maximum number of jumbo subpackets to 0 in the ack trailer when we send an ACK and see if the peer is also advertising the service. This is interpreted by non-supporting Rx stacks as an indication that jumbo packets aren't supported. The MTU sizes advertised in the ACK trailer AF_RXRPC transmits are pegged at a maximum of 1444 unless pmtud is supported by both sides. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-10-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Use a large kvec[] in rxrpc_local rather than every rxrpc_txbufDavid Howells
Use a single large kvec[] in the rxrpc_local struct rather than one in every rxrpc_txbuf struct to build large packets to save on memory. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-9-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Request an ACK on impending Tx stallDavid Howells
Set the REQUEST-ACK flag on the DATA packet we're about to send if we're about to stall transmission because the app layer isn't keeping up supplying us with data to transmit. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-8-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Show stats counter for received reason-0 ACKsDavid Howells
In /proc/net/rxrpc/stats, show the stats counter for received ACKs that have the reason code set to 0 as some implementations do this. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Don't set the MORE-PACKETS rxrpc wire header flagDavid Howells
The MORE-PACKETS rxrpc header flag hasn't actually been looked at by anything since 1988 and not all implementations generate it. Change rxrpc so that it doesn't set MORE-PACKETS at all rather than setting it inconsistently. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Clean up Tx header flags generation handlingDavid Howells
Clean up the generation of the header flags when building packet headers for transmission: (1) Assemble the flags in a local variable rather than in the txb->flags. (2) Do the flags masking and JUMBO-PACKET setting in one bit of code for both the main header and the jumbo headers. (3) Generate the REQUEST-ACK flag afresh each time. There's a possibility we might want to do jumbo retransmission packets in future. (4) Pass the local flags variable to the rxrpc_tx_data tracepoint rather than the combination of the txb flags and the wire header flags (the latter belong only to the first subpacket). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Use umin() and umax() rather than min_t()/max_t() where possibleDavid Howells
Use umin() and umax() rather than min_t()/max_t() where the type specified is an unsigned type. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09rxrpc: Fix handling of received connection abortDavid Howells
Fix the handling of a connection abort that we've received. Though the abort is at the connection level, it needs propagating to the calls on that connection. Whilst the propagation bit is performed, the calls aren't then woken up to go and process their termination, and as no further input is forthcoming, they just hang. Also add some tracing for the logging of connection aborts. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09ktime: Add us_to_ktime()David Howells
Add a us_to_ktime() helper to go with ms_to_ktime() and ns_to_ktime(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Thomas Gleixner <tglx@linutronix.de> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20241204074710.990092-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-09smb3: fix compiler warning in reparse codeSteve French
utf8s_to_utf16s() specifies pwcs as a wchar_t pointer (whether big endian or little endian is passed in as an additional parm), so to remove a distracting compile warning it needs to be cast as (wchar_t *) in parse_reparse_wsl_symlink() as done by other callers. Fixes: 06a7adf318a3 ("cifs: Add support for parsing WSL-style symlinks") Reviewed-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-12-09ACPI: resource: Fix memory resource type union accessIlpo Järvinen
In acpi_decode_space() addr->info.mem.caching is checked on main level for any resource type but addr->info.mem is part of union and thus valid only if the resource type is memory range. Move the check inside the preceeding switch/case to only execute it when the union is of correct type. Fixes: fcb29bbcd540 ("ACPI: Add prefetch decoding to the address space parser") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20241202100614.20731-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-09tools/hv: reduce resource usage in hv_kvp_daemonOlaf Hering
hv_kvp_daemon uses popen(3) and system(3) as convinience helper to launch external helpers. These helpers are invoked via a temporary shell process. There is no need to keep this temporary process around while the helper runs. Replace this temporary shell with the actual helper process via 'exec'. Signed-off-by: Olaf Hering <olaf@aepfle.de> Link: https://lore.kernel.org/linux-hyperv/20241202123520.27812-1-olaf@aepfle.de/ Signed-off-by: Wei Liu <wei.liu@kernel.org>
2024-12-09tools/hv: add a .gitignore fileOlaf Hering
Remove generated files from 'git status' output after 'make -C tools/hv'. Signed-off-by: Olaf Hering <olaf@aepfle.de> Link: https://lore.kernel.org/r/20241202124107.28650-1-olaf@aepfle.de Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241202124107.28650-1-olaf@aepfle.de>
2024-12-09tools/hv: reduce resouce usage in hv_get_dns_info helperOlaf Hering
Remove the usage of cat. Replace the shell process with awk with 'exec'. Also use a generic shell because no bash specific features will be used. Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20241202120432.21115-1-olaf@aepfle.de Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241202120432.21115-1-olaf@aepfle.de>
2024-12-09hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as wellVitaly Kuznetsov
The reference implementation of hv_get_dns_info which is in the tree uses /etc/resolv.conf to get DNS servers and this does not require to know which NIC is queried. Distro specific implementations, however, may want to provide per-NIC, fine grained information. E.g. NetworkManager keeps track of DNS servers per connection. Similar to hv_get_dhcp_info, pass NIC name as a parameter to hv_get_dns_info script. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241112150401.217094-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241112150401.217094-1-vkuznets@redhat.com>
2024-12-09Drivers: hv: util: Avoid accessing a ringbuffer not initialized yetMichael Kelley
If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is fully initialized, we can hit the panic below: hv_utils: Registering HyperV Utility Driver hv_vmbus: registering driver hv_utils ... BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1 RIP: 0010:hv_pkt_iter_first+0x12/0xd0 Call Trace: ... vmbus_recvpacket hv_kvp_onchannelcallback vmbus_on_event tasklet_action_common tasklet_action handle_softirqs irq_exit_rcu sysvec_hyperv_stimer0 </IRQ> <TASK> asm_sysvec_hyperv_stimer0 ... kvp_register_done hvt_op_read vfs_read ksys_read __x64_sys_read This can happen because the KVP/VSS channel callback can be invoked even before the channel is fully opened: 1) as soon as hv_kvp_init() -> hvutil_transport_init() creates /dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and register itself to the driver by writing a message KVP_OP_REGISTER1 to the file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and reading the file for the driver's response, which is handled by hvt_op_read(), which calls hvt->on_read(), i.e. kvp_register_done(). 2) the problem with kvp_register_done() is that it can cause the channel callback to be called even before the channel is fully opened, and when the channel callback is starting to run, util_probe()-> vmbus_open() may have not initialized the ringbuffer yet, so the callback can hit the panic of NULL pointer dereference. To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in __vmbus_open(), just before the first hv_ringbuffer_init(), and then we unload and reload the driver hv_utils, and run the daemon manually within the 10 seconds. Fix the panic by reordering the steps in util_probe() so the char dev entry used by the KVP or VSS daemon is not created until after vmbus_open() has completed. This reordering prevents the race condition from happening. Reported-by: Dexuan Cui <decui@microsoft.com> Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20241106154247.2271-3-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241106154247.2271-3-mhklinux@outlook.com>
2024-12-09Drivers: hv: util: Don't force error code to ENODEV in util_probe()Michael Kelley
If the util_init function call in util_probe() returns an error code, util_probe() always return ENODEV, and the error code from the util_init function is lost. The error message output in the caller, vmbus_probe(), doesn't show the real error code. Fix this by just returning the error code from the util_init function. There doesn't seem to be a reason to force ENODEV, as other errors such as ENOMEM can already be returned from util_probe(). And the code in call_driver_probe() implies that ENODEV should mean that a matching driver wasn't found, which is not the case here. Suggested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20241106154247.2271-2-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241106154247.2271-2-mhklinux@outlook.com>
2024-12-09tools/hv: terminate fcopy daemon if read from uio failsOlaf Hering
Terminate endless loop in reading fails, to avoid flooding syslog. This happens if the state of "Guest services" integration service is changed from "enabled" to "disabled" at runtime in the VM settings. In this case pread returns EIO. Also handle an interrupted system call, and continue in this case. Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20241105081437.15689-1-olaf@aepfle.de Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241105081437.15689-1-olaf@aepfle.de>
2024-12-09drivers: hv: Convert open-coded timeouts to secs_to_jiffies()Easwar Hariharan
We have several places where timeouts are open-coded as N (seconds) * HZ, but best practice is to use the utility functions from jiffies.h. Convert the timeouts to be compliant. This doesn't fix any bugs, it's a simple code improvement. Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20241030-open-coded-timeouts-v3-2-9ba123facf88@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241030-open-coded-timeouts-v3-2-9ba123facf88@linux.microsoft.com>
2024-12-09tools: hv: change permissions of NetworkManager configuration fileOlaf Hering
Align permissions of the resulting .nmconnection file, instead of the input file from hv_kvp_daemon. To avoid the tiny time frame where the output file is world-readable, use umask instead of chmod. Fixes: 42999c904612 ("hv/hv_kvp_daemon:Support for keyfile based connection profile") Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Link: https://lore.kernel.org/r/20241016143521.3735-1-olaf@aepfle.de Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241016143521.3735-1-olaf@aepfle.de>
2024-12-09x86/hyperv: Fix hv tsc page based sched_clock for hibernationNaman Jain
read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is bigger than the variable hv_sched_clock_offset, which is cached during early boot, but depending on the timing this assumption may be false when a hibernated VM starts again (the clock counter starts from 0 again) and is resuming back (Note: hv_init_tsc_clocksource() is not called during hibernation/resume); consequently, read_hv_sched_clock_tsc() may return a negative integer (which is interpreted as a huge positive integer since the return type is u64) and new kernel messages are prefixed with huge timestamps before read_hv_sched_clock_tsc() grows big enough (which typically takes several seconds). Fix the issue by saving the Hyper-V clock counter just before the suspend, and using it to correct the hv_sched_clock_offset in resume. This makes hv tsc page based sched_clock continuous and ensures that post resume, it starts from where it left off during suspend. Override x86_platform.save_sched_clock_state and x86_platform.restore_sched_clock_state routines to correct this as soon as possible. Note: if Invariant TSC is available, the issue doesn't happen because 1) we don't register read_hv_sched_clock_tsc() for sched clock: See commit e5313f1c5404 ("clocksource/drivers/hyper-v: Rework clocksource and sched clock setup"); 2) the common x86 code adjusts TSC similarly: see __restore_processor_state() -> tsc_verify_tsc_adjust(true) and x86_platform.restore_sched_clock_state(). Cc: stable@vger.kernel.org Fixes: 1349401ff1aa ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V clocksource for hibernation") Co-developed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20240917053917.76787-1-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240917053917.76787-1-namjain@linux.microsoft.com>
2024-12-09tools: hv: Fix a complier warning in the fcopy uio daemonDexuan Cui
hv_fcopy_uio_daemon.c:436:53: warning: '%s' directive output may be truncated writing up to 14 bytes into a region of size 10 [-Wformat-truncation=] 436 | snprintf(uio_dev_path, sizeof(uio_dev_path), "/dev/%s", uio_name); Also added 'static' for the array 'desc[]'. Fixes: 82b0945ce2c2 ("tools: hv: Add new fcopy application based on uio driver") Cc: stable@vger.kernel.org # 6.10+ Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20240910004433.50254-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240910004433.50254-1-decui@microsoft.com>
2024-12-09Merge tag 'ath-next-20241209' of ↵Kalle Valo
git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath ath.git patches for v6.14 This development cycle featured multiple patchsets to ath12k to support the new 802.11be MLO feature, although the feature is still incomplete. Also in ath12k, there were other feature patches. In ath11k, support was added for QCA6698AQ. And there was the usual set of bug fixes and cleanups across most drivers, notable being the addition of "noinline_for_stack" to some functions to avoid "stack frame size" warnings when compiling with clang.
2024-12-09Merge tag 'locking_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Remove if_not_guard() as it is generating incorrect code - Fix the initialization of the fake lockdep_map for the first locked ww_mutex * tag 'locking_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: headers/cleanup.h: Remove the if_not_guard() facility locking/ww_mutex: Fix ww_mutex dummy lockdep map selftest warnings
2024-12-09Merge tag 'perf_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Borislav Petkov: - Make sure the PEBS buffer is drained before reconfiguring the hardware - Add Arrow Lake U support * tag 'perf_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/ds: Unconditionally drain PEBS DS when changing PEBS_DATA_CFG perf/x86/intel: Add Arrow Lake U support
2024-12-09Merge tag 'sched_urgent_for_v6.13_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Remove wrong enqueueing of a task for a later wakeup when a task blocks on a RT mutex - Do not setup a new deadline entity on a boosted task as that has happened already - Update preempt= kernel command line param - Prevent needless softirqd wakeups in the idle task's context - Detect the case where the idle load balancer CPU becomes busy and avoid unnecessary load balancing invocation - Remove an unnecessary load balancing need_resched() call in nohz_csd_func() - Allow for raising of SCHED_SOFTIRQ softirq type on RT but retain the warning to catch any other cases - Remove a wrong warning when a cpuset update makes the task affinity no longer a subset of the cpuset * tag 'sched_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: rtmutex: Fix wake_q logic in task_blocks_on_rt_mutex sched/deadline: Fix warning in migrate_enable for boosted tasks sched/core: Update kernel boot parameters for LAZY preempt. sched/core: Prevent wakeup of ksoftirqd during idle load balance sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy sched/core: Remove the unnecessary need_resched() check in nohz_csd_func() softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel sched: fix warning in sched_setaffinity sched/deadline: Fix replenish_dl_new_period dl_server condition
2024-12-09x86: Fix build regression with CONFIG_KEXEC_JUMP enabledDamien Le Moal
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error: ld: vmlinux.o: in function `virtual_mapped': linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc' when CONFIG_KEXEC_JUMP is enabled. This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec") which introduced a use of saved_context_gdt_desc without a declaration for it. Fix that by including asm/asm-offsets.h where saved_context_gdt_desc is defined (indirectly in include/generated/asm-offsets.h which asm/asm-offsets.h includes). Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Closes: https://lore.kernel.org/oe-kbuild-all/202411270006.ZyyzpYf8-lkp@intel.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-09futex: fix user access on powerpcLinus Torvalds
The powerpc user access code is special, and unlike other architectures distinguishes between user access for reading and writing. And commit 43a43faf5376 ("futex: improve user space accesses") messed that up. It went undetected elsewhere, but caused ppc32 to fail early during boot, because the user access had been started with user_read_access_begin(), but then finished off with just a plain "user_access_end()". Note that the address-masking user access helpers don't even have that read-vs-write distinction, so if powerpc ever wants to do address masking tricks, we'll have to do some extra work for it. [ Make sure to also do it for the EFAULT case, as pointed out by Christophe Leroy ] Reported-by: Andreas Schwab <schwab@linux-m68k.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/all/87bjxl6b0i.fsf@igel.home/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-09wifi: mwifiex: decrease timeout waiting for host sleep from 10s to 5sPin-yen Lin
In commit 52250cbee7f6 ("mwifiex: use timeout variant for wait_event_interruptible") it was noted that sometimes we seemed to miss the signal that our host sleep settings took effect. A 10 second timeout was added to the code to make sure we didn't hang forever waiting. It appears that this problem still exists and we hit the timeout sometimes for Chromebooks in the field. Recently on ChromeOS we've started setting the DPM watchdog to trip if full system suspend takes over 10 seconds. Given the timeout in the original patch, obviously we're hitting the DPM watchdog before mwifiex gets a chance to timeout. While we could increase the DPM watchdog in ChromeOS to avoid this problem, it's probably better to simply decrease the timeout. Any time we're waiting several seconds for the firmware to respond it's likely that the firmware won't ever respond. With that in mind, decrease the timeout in mwifiex from 10 seconds to 5 seconds. Suggested-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Pin-yen Lin <treapking@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20241127105709.4014302-1-treapking@chromium.org