summaryrefslogtreecommitdiff
path: root/net/tls/tls_sw.c
AgeCommit message (Collapse)Author
2019-11-19net/tls: enable sk_msg redirect to tls socket egressWillem de Bruijn
Bring back tls_sw_sendpage_locked. sk_msg redirection into a socket with TLS_TX takes the following path: tcp_bpf_sendmsg_redir tcp_bpf_push_locked tcp_bpf_push kernel_sendpage_locked sock->ops->sendpage_locked Also update the flags test in tls_sw_sendpage_locked to allow flag MSG_NO_SHARED_FRAGS. bpf_tcp_sendmsg sets this. Link: https://lore.kernel.org/netdev/CA+FuTSdaAawmZ2N8nfDDKu3XLpXBbMtcCT0q4FntDD2gn8ASUw@mail.gmail.com/T/#t Link: https://github.com/wdebruij/kerneltools/commits/icept.2 Fixes: 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through ULP") Fixes: f3de19af0f5b ("Revert \"net/tls: remove unused function tls_sw_sendpage_locked\"") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06net/tls: add a TX lockJakub Kicinski
TLS TX needs to release and re-acquire the socket lock if send buffer fills up. TLS SW TX path currently depends on only allowing one thread to enter the function by the abuse of sk_write_pending. If another writer is already waiting for memory no new ones are allowed in. This has two problems: - writers don't wake other threads up when they leave the kernel; meaning that this scheme works for single extra thread (second application thread or delayed work) because memory becoming available will send a wake up request, but as Mallesham and Pooja report with larger number of threads it leads to threads being put to sleep indefinitely; - the delayed work does not get _scheduled_ but it may _run_ when other writers are present leading to crashes as writers don't expect state to change under their feet (same records get pushed and freed multiple times); it's hard to reliably bail from the work, however, because the mere presence of a writer does not guarantee that the writer will push pending records before exiting. Ensuring wakeups always happen will make the code basically open code a mutex. Just use a mutex. The TLS HW TX path does not have any locking (not even the sk_write_pending hack), yet it uses a per-socket sg_tx_data array to push records. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com> Reported-by: Pooja Trivedi <poojatrivedi@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06net/tls: don't pay attention to sk_write_pending when pushing partial recordsJakub Kicinski
sk_write_pending being not zero does not guarantee that partial record will be pushed. If the thread waiting for memory times out the pending record may get stuck. In case of tls_device there is no path where parial record is set and writer present in the first place. Partial record is set only in tls_push_sg() and tls_push_sg() will return an error immediately. All tls_device callers of tls_push_sg() will return (and not wait for memory) if it failed. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net/tls: clean up the number of #ifdefs for CONFIG_TLS_DEVICEJakub Kicinski
TLS code has a number of #ifdefs which make the code a little harder to follow. Recent fixes removed the ifdef around the TLS_HW define, so we can switch to the often used pattern of defining tls_device functions as empty static inlines in the header when CONFIG_TLS_DEVICE=n. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22net/tls: remove sock unlock/lock around strp_done()John Fastabend
The tls close() callback currently drops the sock lock to call strp_done(). Split up the RX cleanup into stopping the strparser and releasing most resources, syncing strparser and finally freeing the context. To avoid the need for a strp_done() call on the cleanup path of device offload make sure we don't arm the strparser until we are sure init will be successful. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove close callback sock unlock/lock around TX work flushJohn Fastabend
The tls close() callback currently drops the sock lock, makes a cancel_delayed_work_sync() call, and then relocks the sock. By restructuring the code we can avoid droping lock and then reclaiming it. To simplify this we do the following, tls_sk_proto_close set_bit(CLOSING) set_bit(SCHEDULE) cancel_delay_work_sync() <- cancel workqueue lock_sock(sk) ... release_sock(sk) strp_done() Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by any workqueue items e.g. if one happens to be scheduled and run between when we set SCHEDULE bit and cancel work. Then because SCHEDULE bit is set now no new work will be scheduled. Tested with net selftests and bpf selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't arm strparser immediately in tls_set_sw_offload()Jakub Kicinski
In tls_set_device_offload_rx() we prepare the software context for RX fallback and proceed to add the connection to the device. Unfortunately, software context prep includes arming strparser so in case of a later error we have to release the socket lock to call strp_done(). In preparation for not releasing the socket lock half way through callbacks move arming strparser into a separate function. Following patches will make use of that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07net/tls: fix poll ignoring partially copied recordsJakub Kicinski
David reports that RPC applications which use epoll() occasionally get stuck, and that TLS ULP causes the kernel to not wake applications, even though read() will return data. This is indeed true. The ctx->rx_list which holds partially copied records is not consulted when deciding whether socket is readable. Note that SO_RCVLOWAT with epoll() is and has always been broken for kernel TLS. We'd need to parse all records from the TCP layer, instead of just the first one. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Honestly all the conflicts were simple overlapping changes, nothing really interesting to report. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12net: tls, correctly account for copied bytes with multiple sk_msgsJohn Fastabend
tls_sw_do_sendpage needs to return the total number of bytes sent regardless of how many sk_msgs are allocated. Unfortunately, copied (the value we return up the stack) is zero'd before each new sk_msg is allocated so we only return the copied size of the last sk_msg used. The caller (splice, etc.) of sendpage will then believe only part of its data was sent and send the missing chunks again. However, because the data actually was sent the receiver will get multiple copies of the same data. To reproduce this do multiple sendfile calls with a length close to the max record size. This will in turn call splice/sendpage, sendpage may use multiple sk_msg in this case and then returns the incorrect number of bytes. This will cause splice to resend creating duplicate data on the receiver. Andre created a C program that can easily generate this case so we will push a similar selftest for this to bpf-next shortly. The fix is to _not_ zero the copied field so that the total sent bytes is returned. Reported-by: Steinar H. Gunderson <steinar+kernel@gunderson.no> Reported-by: Andre Tomt <andre@tomt.net> Tested-by: Andre Tomt <andre@tomt.net> Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11net/tls: add kernel-driven TLS RX resyncJakub Kicinski
TLS offload device may lose sync with the TCP stream if packets arrive out of order. Drivers can currently request a resync at a specific TCP sequence number. When a record is found starting at that sequence number kernel will inform the device of the corresponding record number. This requires the device to constantly scan the stream for a known pattern (constant bytes of the header) after sync is lost. This patch adds an alternative approach which is entirely under the control of the kernel. Kernel tracks records it had to fully decrypt, even though TLS socket is in TLS_HW mode. If multiple records did not have any decrypted parts - it's a pretty strong indication that the device is out of sync. We choose the min number of fully encrypted records to be 2, which should hopefully be more than will get retransmitted at a time. After kernel decides the device is out of sync it schedules a resync request. If the TCP socket is empty the resync gets performed immediately. If socket is not empty we leave the record parser to resync when next record comes. Before resync in message parser we peek at the TCP socket and don't attempt the sync if the socket already has some of the next record queued. On resync failure (encrypted data continues to flow in) we retry with exponential backoff, up to once every 128 records (with a 16k record thats at most once every 2M of data). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11net/tls: rename handle_device_resync()Jakub Kicinski
handle_device_resync() doesn't describe the function very well. The function checks if resync should be issued upon parsing of a new record. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11net/tls: pass record number as a byte arrayJakub Kicinski
TLS offload code casts record number to a u64. The buffer should be aligned to 8 bytes, but its actually a __be64, and the rest of the TLS code treats it as big int. Make the offload callbacks take a byte array, drivers can make the choice to do the ugly cast if they want to. Prepare for copying the record number onto the stack by defining a constant for max size of the byte array. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04net/tls: don't pass version to tls_advance_record_sn()Jakub Kicinski
All callers pass prot->version as the last parameter of tls_advance_record_sn(), yet tls_advance_record_sn() itself needs a pointer to prot. Pass prot from callers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04net/tls: don't look for decrypted frames on non-offloaded socketsJakub Kicinski
If the RX config of a TLS socket is SW, there is no point iterating over the fragments and checking if frame is decrypted. It will always be fully encrypted. Note that in fully encrypted case the function doesn't actually touch any offload-related state, so it's safe to call for TLS_SW, today. Soon we will introduce code which can only be called for offloaded contexts. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26net/tls: fix no wakeup on partial readsJakub Kicinski
When tls_sw_recvmsg() partially copies a record it pops that record from ctx->recv_pkt and places it on rx_list. Next iteration of tls_sw_recvmsg() reads from rx_list via process_rx_list() before it enters the decryption loop. If there is no more records to be read tls_wait_data() will put the process on the wait queue and got to sleep. This is incorrect, because some data was already copied in process_rx_list(). In case of RPC connections process may never get woken up, because peer also simply blocks in read(). I think this may also fix a similar issue when BPF is at play, because after __tcp_bpf_recvmsg() returns some data we subtract it from len and use continue to restart the loop, but len could have just reached 0, so again we'd sleep unnecessarily. That's added by: commit d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Tested-by: David Beckett <david.beckett@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-26net/tls: fix lowat calculation if some data came from previous recordJakub Kicinski
If some of the data came from the previous record, i.e. from the rx_list it had already been decrypted, so it's not counted towards the "decrypted" variable, but the "copied" variable. Take that into account when checking lowat. When calculating lowat target we need to pass the original len. E.g. if lowat is at 80, len is 100 and we had 30 bytes on rx_list target would currently be incorrectly calculated as 70, even though we only need 50 more bytes to make up the 80. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Tested-by: David Beckett <david.beckett@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09net/tls: handle errors from padding_length()Jakub Kicinski
At the time padding_length() is called the record header is still part of the message. If malicious TLS 1.3 peer sends an all-zero record padding_length() will stop at the record header, and return full length of the data including the tail_size. Subsequent subtraction of prot->overhead_size from rxm->full_len will cause rxm->full_len to turn negative. skb accessors, however, will always catch resulting out-of-bounds operation, so in practice this fix comes down to returning the correct error code. It also fixes a set but not used warning. This code was added by commit 130b392c6cd6 ("net: tls: Add tls 1.3 support"). CC: Dave Watson <davejwatson@fb.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-20net/tls: don't leak IV and record seq when offload failsJakub Kicinski
When device refuses the offload in tls_set_device_offload_rx() it calls tls_sw_free_resources_rx() to clean up software context state. Unfortunately, tls_sw_free_resources_rx() does not free all the state tls_set_sw_offload() allocated - it leaks IV and sequence number buffers. All other code paths which lead to tls_sw_release_resources_rx() (which tls_sw_free_resources_rx() calls) free those right before the call. Avoid the leak by moving freeing of iv and rec_seq into tls_sw_release_resources_rx(). Fixes: 4799ac81e52a ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10net/tls: don't leak partially sent record in device modeJakub Kicinski
David reports that tls triggers warnings related to sk->sk_forward_alloc not being zero at destruction time: WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110 WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170 When sender fills up the write buffer and dies from SIGPIPE. This is due to the device implementation not cleaning up the partially_sent_record. This is because commit a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") moved the partial record cleanup to the SW-only path. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor comment merge conflict in mlx5. Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29net: tls: prevent false connection termination with offloadJakub Kicinski
Only decrypt_internal() performs zero copy on rx, all paths which don't hit decrypt_internal() must set zc to false, otherwise tls_sw_recvmsg() may return 0 causing the application to believe that that connection got closed. Currently this happens with device offload when new record is first read from. Fixes: d069b780e367 ("tls: Fix tls_device receive") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21net/tls: Replace kfree_skb() with consume_skb()Vakul Garg
To free the skb in normal course of processing, consume_skb() should be used. Only for failure paths, skb_free() is intended to be used. https://www.kernel.org/doc/htmldocs/networking/API-consume-skb.html Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20net/tls: Add support of AES128-CCM based ciphersVakul Garg
Added support for AES128-CCM based record encryption. AES128-CCM is similar to AES128-GCM. Both of them have same salt/iv/mac size. The notable difference between the two is that while invoking AES128-CCM operation, the salt||nonce (which is passed as IV) has to be prefixed with a hardcoded value '2'. Further, CCM implementation in kernel requires IV passed in crypto_aead_request() to be full '16' bytes. Therefore, the record structure 'struct tls_rec' has been modified to reserve '16' bytes for IV. This works for both GCM and CCM based cipher. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix tls_device receiveBoris Pismenny
Currently, the receive function fails to handle records already decrypted by the device due to the commit mentioned below. This commit advances the TLS record sequence number and prepares the context to handle the next record. Fixes: fedf201e1296 ("net: tls: Refactor control message handling on recv") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix mixing between async capable and asyncEran Ben Elisha
Today, tls_sw_recvmsg is capable of using asynchronous mode to handle application data TLS records. Moreover, it assumes that if the cipher can be handled asynchronously, then all packets will be processed asynchronously. However, this assumption is not always true. Specifically, for AES-GCM in TLS1.2, it causes data corruption, and breaks user applications. This patch fixes this problem by separating the async capability from the decryption operation result. Fixes: c0ab4732d4c6 ("net/tls: Do not use async crypto for non-data records") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03tls: Fix write space handlingBoris Pismenny
TLS device cannot use the sw context. This patch returns the original tls device write space handler and moves the sw/device specific portions to the relevant files. Also, we remove the write_space call for the tls_sw flow, because it handles partial records in its delayed tx work handler. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24tls: Return type of non-data records retrieved using MSG_PEEK in recvmsgVakul Garg
The patch enables returning 'type' in msghdr for records that are retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked from socket from getting clubbed with any other record of different type when records are subsequently dequeued from strparser. For each record, we now retain its type in sk_buff's control buffer cb[]. Inside control buffer, record's full length and offset are already stored by strparser in 'struct strp_msg'. We store record type after 'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is stored just after record dequeue. For tls1.3, the type is stored after record has been decrypted. Inside process_rx_list(), before processing a non-data record, we check that we must be able to return back the record type to the user application. If not, the decrypted records in tls context's rx_list is left there without consuming any data. Fixes: 692d7b5d1f912 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-19net/tls: Move protocol constants from cipher context to tls contextVakul Garg
Each tls context maintains two cipher contexts (one each for tx and rx directions). For each tls session, the constants such as protocol version, ciphersuite, iv size, associated data size etc are same for both the directions and need to be stored only once per tls context. Hence these are moved from 'struct cipher_context' to 'struct tls_prot_info' and stored only once in 'struct tls_context'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net/tls: Do not use async crypto for non-data recordsVakul Garg
Addition of tls1.3 support broke tls1.2 handshake when async crypto accelerator is used. This is because the record type for non-data records is not propagated to user application. Also when async decryption happens, the decryption does not stop when two different types of records get dequeued and submitted for decryption. To address it, we decrypt tls1.2 non-data records in synchronous way. We check whether the record we just processed has same type as the previous one before checking for async condition and jumping to dequeue next record. Fixes: 130b392c6cd6b ("net: tls: Add tls 1.3 support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-09net/tls: Disable async decrytion for tls1.3Vakul Garg
Function tls_sw_recvmsg() dequeues multiple records from stream parser and decrypts them. In case the decryption is done by async accelerator, the records may get submitted for decryption while the previous ones may not have been decryted yet. For tls1.3, the record type is known only after decryption. Therefore, for tls1.3, tls_sw_recvmsg() may submit records for decryption even if it gets 'handshake' records after 'data' records. These intermediate 'handshake' records may do a key updation. By the time new keys are given to ktls by userspace, it is possible that ktls has already submitted some records i(which are encrypted with new keys) for decryption using old keys. This would lead to decrypt failure. Therefore, async decryption of records should be disabled for tls1.3. Fixes: 130b392c6cd6b ("net: tls: Add tls 1.3 support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESSDave Watson
Currently we don't zerocopy if the crypto framework async bit is set. However some crypto algorithms (such as x86 AESNI) support async, but in the context of sendmsg, will never run asynchronously. Instead, check for actual EINPROGRESS return code before assuming algorithm is async. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Add tls 1.3 supportDave Watson
TLS 1.3 has minor changes from TLS 1.2 at the record layer. * Header now hardcodes the same version and application content type in the header. * The real content type is appended after the data, before encryption (or after decryption). * The IV is xored with the sequence number, instead of concatinating four bytes of IV with the explicit IV. * Zero-padding: No exlicit length is given, we search backwards from the end of the decrypted data for the first non-zero byte, which is the content type. Currently recv supports reading zero-padding, but there is no way for send to add zero padding. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Refactor control message handling on recvDave Watson
For TLS 1.3, the control message is encrypted. Handle control message checks after decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Refactor tls aad space size calculationDave Watson
TLS 1.3 has a different AAD size, use a variable in the code to make TLS 1.3 support easy. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net: tls: Support 256 bit keysDave Watson
Wire up support for 256 bit keys from the setsockopt to the crypto framework Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-01-28net: tls: Fix deadlock in free_resources txDave Watson
If there are outstanding async tx requests (when crypto returns EINPROGRESS), there is a potential deadlock: the tx work acquires the lock, while we cancel_delayed_work_sync() while holding the lock. Drop the lock while waiting for the work to complete. Fixes: a42055e8d2c30 ("Add support for async encryption of records...") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28net: tls: Save iv in tls_rec for async crypto requestsDave Watson
aead_request_set_crypt takes an iv pointer, and we change the iv soon after setting it. Some async crypto algorithms don't save the iv, so we need to save it in the tls_rec for async requests. Found by hardcoding x64 aesni to use async crypto manager (to test the async codepath), however I don't think this combination can happen in the wild. Presumably other hardware offloads will need this fix, but there have been no user reports. Fixes: a42055e8d2c30 ("Add support for async encryption of records...") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17tls: Fix recvmsg() to be able to peek across multiple recordsVakul Garg
This fixes recvmsg() to be able to peek across multiple tls records. Without this patch, the tls's selftests test case 'recv_peek_large_buf_mult_recs' fails. Each tls receive context now maintains a 'rx_list' to retain incoming skb carrying tls records. If a tls record needs to be retained e.g. for peek case or for the case when the buffer passed to recvmsg() has a length smaller than decrypted record length, then it is added to 'rx_list'. Additionally, records are added in 'rx_list' if the crypto operation runs in async mode. The records are dequeued from 'rx_list' after the decrypted data is consumed by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK flag is used in recvmsg(), then records are not consumed or removed from the 'rx_list'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net/tls: Make function tls_sw_do_sendpage staticYueHaibing
Fixes the following sparse warning: net/tls/tls_sw.c:1023:5: warning: symbol 'tls_sw_do_sendpage' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-17net/tls: remove unused function tls_sw_sendpage_lockedYueHaibing
There are no in-tree callers. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-12-21tls: Do not call sk_memcopy_from_iter with zero lengthVakul Garg
In some conditions e.g. when tls_clone_plaintext_msg() returns -ENOSPC, the number of bytes to be copied using subsequent function sk_msg_memcopy_from_iter() becomes zero. This causes function sk_msg_memcopy_from_iter() to fail which in turn causes tls_sw_sendmsg() to return failure. To prevent it, do not call sk_msg_memcopy_from_iter() when number of bytes to copy (indicated by 'try_to_copy') is zero. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20bpf: sk_msg, sock{map|hash} redirect through ULPJohn Fastabend
A sockmap program that redirects through a kTLS ULP enabled socket will not work correctly because the ULP layer is skipped. This fixes the behavior to call through the ULP layer on redirect to ensure any operations required on the data stream at the ULP layer continue to be applied. To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid calling the BPF layer on a redirected message. This is required to avoid calling the BPF layer multiple times (possibly recursively) which is not the current/expected behavior without ULPs. In the future we may add a redirect flag if users _do_ want the policy applied again but this would need to work for both ULP and non-ULP sockets and be opt-in to avoid breaking existing programs. Also to avoid polluting the flag space with an internal flag we reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with MSG_WAITFORONE. Here WAITFORONE is specific to recv path and SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing to verify is user space API is masked correctly to ensure the flag can not be set by user. (Note this needs to be true regardless because we have internal flags already in-use that user space should not be able to set). But for completeness we have two UAPI paths into sendpage, sendfile and splice. In the sendfile case the function do_sendfile() zero's flags, ./fs/read_write.c: static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, size_t count, loff_t max) { ... fl = 0; #if 0 /* * We need to debate whether we can enable this or not. The * man page documents EAGAIN return for the output at least, * and the application is arguably buggy if it doesn't expect * EAGAIN on a non-blocking file descriptor. */ if (in.file->f_flags & O_NONBLOCK) fl = SPLICE_F_NONBLOCK; #endif file_start_write(out.file); retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl); } In the splice case the pipe_to_sendpage "actor" is used which masks flags with SPLICE_F_MORE. ./fs/splice.c: static int pipe_to_sendpage(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { ... more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; ... } Confirming what we expect that internal flags are in fact internal to socket side. Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-28bpf: helper to pop data from messagesJohn Fastabend
This adds a BPF SK_MSG program helper so that we can pop data from a msg. We use this to pop metadata from a previous push data call. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-01Merge branch 'work.afs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...