From b0d8d4363e523e952254619ae24dd0dfd7ea1181 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 21 May 2019 18:57:12 -0700 Subject: Documentation: net: move device drivers docs to a submenu Some of the device drivers have really long document titles making the networking table of contents hard to look through. Place vendor drivers under a submenu. Signed-off-by: Jakub Kicinski Acked-by: Dave Watson Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller --- Documentation/networking/device_drivers/index.rst | 30 +++++++++++++++++++++++ Documentation/networking/index.rst | 14 +---------- 2 files changed, 31 insertions(+), 13 deletions(-) create mode 100644 Documentation/networking/device_drivers/index.rst diff --git a/Documentation/networking/device_drivers/index.rst b/Documentation/networking/device_drivers/index.rst new file mode 100644 index 000000000000..75fa537763a4 --- /dev/null +++ b/Documentation/networking/device_drivers/index.rst @@ -0,0 +1,30 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +Vendor Device Drivers +===================== + +Contents: + +.. toctree:: + :maxdepth: 2 + + freescale/dpaa2/index + intel/e100 + intel/e1000 + intel/e1000e + intel/fm10k + intel/igb + intel/igbvf + intel/ixgb + intel/ixgbe + intel/ixgbevf + intel/i40e + intel/iavf + intel/ice + +.. 
only:: subproject + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index f390fe3cfdfb..7a2bfad6a762 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -11,19 +11,7 @@ Contents: batman-adv can can_ucan_protocol - device_drivers/freescale/dpaa2/index - device_drivers/intel/e100 - device_drivers/intel/e1000 - device_drivers/intel/e1000e - device_drivers/intel/fm10k - device_drivers/intel/igb - device_drivers/intel/igbvf - device_drivers/intel/ixgb - device_drivers/intel/ixgbe - device_drivers/intel/ixgbevf - device_drivers/intel/i40e - device_drivers/intel/iavf - device_drivers/intel/ice + device_drivers/index dsa/index devlink-info-versions ieee802154 -- From f3c0f3c6c2013e6caa7ab9c3c6a9fb12f6832c43 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 21 May 2019 18:57:13 -0700 Subject: Documentation: tls: RSTify the ktls documentation Convert the TLS doc to RST. Use C code blocks for the code samples, and mark hyperlinks. Signed-off-by: Jakub Kicinski Acked-by: Dave Watson Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller --- Documentation/networking/index.rst | 1 + Documentation/networking/tls.rst | 213 +++++++++++++++++++++++++++++++++++++ Documentation/networking/tls.txt | 197 ---------------------------------- 3 files changed, 214 insertions(+), 197 deletions(-) create mode 100644 Documentation/networking/tls.rst delete mode 100644 Documentation/networking/tls.txt diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 7a2bfad6a762..f0f97eef091c 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -28,6 +28,7 @@ Contents: checksum-offloads segmentation-offloads scaling + tls .. 
only:: subproject diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst new file mode 100644 index 000000000000..482bd73f18a2 --- /dev/null +++ b/Documentation/networking/tls.rst @@ -0,0 +1,213 @@ +========== +Kernel TLS +========== + +Overview +======== + +Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over +TCP. TLS provides end-to-end data integrity and confidentiality. + +User interface +============== + +Creating a TLS connection +------------------------- + +First create a new TCP socket, connect it, and then set the TLS ULP. + +.. code-block:: c + + sock = socket(AF_INET, SOCK_STREAM, 0); + setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); + +Setting the TLS ULP allows us to set/get TLS socket options. Currently +only symmetric encryption is handled in the kernel. After the TLS +handshake is complete, we have all the parameters required to move the +data path to the kernel. There is a separate socket option for moving +each of the transmit and receive directions into the kernel. + +.. 
code-block:: c + + /* From linux/tls.h */ + struct tls_crypto_info { + unsigned short version; + unsigned short cipher_type; + }; + + struct tls12_crypto_info_aes_gcm_128 { + struct tls_crypto_info info; + unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE]; + unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE]; + unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE]; + unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE]; + }; + + + struct tls12_crypto_info_aes_gcm_128 crypto_info; + + crypto_info.info.version = TLS_1_2_VERSION; + crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128; + memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE); + memcpy(crypto_info.rec_seq, seq_number_write, + TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE); + memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE); + memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE); + + setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)); + +Transmit and receive are set separately, but the setup is the same, using either +TLS_TX or TLS_RX. + +Sending TLS application data +---------------------------- + +After setting the TLS_TX socket option, all application data sent over this +socket is encrypted using TLS and the parameters provided in the socket option. +For example, we can send an encrypted hello world record as follows: + +.. code-block:: c + + const char *msg = "hello world\n"; + send(sock, msg, strlen(msg), 0); + +Data passed to send() is encrypted directly from the provided userspace buffer +into the kernel's encrypted send buffer, if possible. + +The sendfile system call will send the file's data in TLS records of maximum +length (2^14 bytes). + +.. code-block:: c + + file = open(filename, O_RDONLY); + fstat(file, &stat); + sendfile(sock, file, &offset, stat.st_size); + +TLS records are created and sent after each send() call, unless +MSG_MORE is passed.
MSG_MORE delays creation of a record until +a send() call without MSG_MORE is made, or the maximum record size is reached. + +The kernel will need to allocate a buffer for the encrypted data. +This buffer is allocated at the time send() is called, such that +either the entire send() call will return -ENOMEM (or block waiting +for memory), or the encryption will always succeed. If send() returns +-ENOMEM and some data was left on the socket buffer from a previous +call using MSG_MORE, the MSG_MORE data is left on the socket buffer. + +Receiving TLS application data +------------------------------ + +After setting the TLS_RX socket option, all data returned by recv family +socket calls is decrypted using the TLS parameters provided. A full TLS record +must be received before decryption can happen. + +.. code-block:: c + + char buffer[16384]; + recv(sock, buffer, sizeof(buffer), 0); + +Received data is decrypted directly into the user buffer if it is +large enough, and no additional allocations occur. If the userspace +buffer is too small, data is decrypted in the kernel and copied to +userspace. + +``EINVAL`` is returned if the TLS version in the received message does not +match the version passed in setsockopt. + +``EMSGSIZE`` is returned if the received message is too big. + +``EBADMSG`` is returned if decryption failed for any other reason. + +Sending TLS control messages +---------------------------- + +Other than application data, TLS has control messages such as alert +messages (record type 21) and handshake messages (record type 22). +These messages can be sent over the socket by providing the TLS record type +via a CMSG. For example, the following function sends @data of @length bytes +using a record of type @record_type. + +.. 
code-block:: c + + /* send TLS control message using record_type */ + static int ktls_send_ctrl_message(int sock, unsigned char record_type, + void *data, size_t length) + { + struct msghdr msg = {0}; + int cmsg_len = sizeof(record_type); + struct cmsghdr *cmsg; + char buf[CMSG_SPACE(cmsg_len)]; + struct iovec msg_iov; /* Vector of data to send/receive into. */ + + msg.msg_control = buf; + msg.msg_controllen = sizeof(buf); + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_TLS; + cmsg->cmsg_type = TLS_SET_RECORD_TYPE; + cmsg->cmsg_len = CMSG_LEN(cmsg_len); + *CMSG_DATA(cmsg) = record_type; + msg.msg_controllen = cmsg->cmsg_len; + + msg_iov.iov_base = data; + msg_iov.iov_len = length; + msg.msg_iov = &msg_iov; + msg.msg_iovlen = 1; + + return sendmsg(sock, &msg, 0); + } + +Control message data should be provided unencrypted, and will be +encrypted by the kernel. + +Receiving TLS control messages +------------------------------ + +TLS control messages are passed in the userspace buffer, with message +type passed via cmsg. If no cmsg buffer is provided, an error is +returned if a control message is received. Data messages may be +received without a cmsg buffer set. + +.. code-block:: c + + char buffer[16384]; + char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))]; + struct msghdr msg = {0}; + msg.msg_control = cmsg_buf; + msg.msg_controllen = sizeof(cmsg_buf); + + struct iovec msg_iov; + msg_iov.iov_base = buffer; + msg_iov.iov_len = sizeof(buffer); + + msg.msg_iov = &msg_iov; + msg.msg_iovlen = 1; + + int ret = recvmsg(sock, &msg, 0 /* flags */); + + struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); + if (cmsg && cmsg->cmsg_level == SOL_TLS && + cmsg->cmsg_type == TLS_GET_RECORD_TYPE) { + int record_type = *((unsigned char *)CMSG_DATA(cmsg)); + // Do something with record_type, and control message data in + // buffer. + // + // Note that record_type may equal application data (23). + } else { + // Buffer contains application data. + } + +recv will never return data from mixed types of TLS records.
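The CMSG handling above can be exercised without a live kTLS socket. The following is a minimal sketch, not part of any kernel or libc API: the helpers `set_record_type()` and `get_record_type()` are hypothetical. The first fills in a `struct msghdr` the same way `ktls_send_ctrl_message()` does; the second reads the record type back with the `NULL` check the receive example needs. It assumes the uapi values `SOL_TLS` (282), `TLS_SET_RECORD_TYPE` (1) and `TLS_GET_RECORD_TYPE` (2), defined locally in case the installed headers lack them, and a control buffer aligned for `struct cmsghdr`.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Assumed uapi values (linux/socket.h and linux/tls.h); defined here in
 * case the installed headers do not provide them. */
#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TLS_SET_RECORD_TYPE
#define TLS_SET_RECORD_TYPE 1
#endif
#ifndef TLS_GET_RECORD_TYPE
#define TLS_GET_RECORD_TYPE 2
#endif

/* Hypothetical helper: attach a one-byte TLS record type to msg using
 * the caller's (suitably aligned) control buffer. Returns 0 on success,
 * -1 if the buffer cannot hold the control message. */
static int set_record_type(struct msghdr *msg, void *buf, size_t buflen,
                           unsigned char record_type)
{
    struct cmsghdr *cmsg;

    if (buflen < CMSG_SPACE(sizeof(record_type)))
        return -1;
    msg->msg_control = buf;
    msg->msg_controllen = buflen;
    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = SOL_TLS;
    cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
    cmsg->cmsg_len = CMSG_LEN(sizeof(record_type));
    *CMSG_DATA(cmsg) = record_type;
    /* Only report the bytes actually used, as ktls_send_ctrl_message() does. */
    msg->msg_controllen = cmsg->cmsg_len;
    return 0;
}

/* Hypothetical helper: return the record type carried by a SOL_TLS
 * control message of the given cmsg_type, or -1 if there is none. */
static int get_record_type(struct msghdr *msg, int cmsg_type)
{
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);

    if (cmsg && cmsg->cmsg_level == SOL_TLS && cmsg->cmsg_type == cmsg_type)
        return *CMSG_DATA(cmsg);
    return -1;
}
```

In real use the filled-in `msg` would be handed to `sendmsg()` on a socket with `TLS_TX` configured, or populated by `recvmsg()` and checked for `TLS_GET_RECORD_TYPE`; a `-1` from `get_record_type()` only means that no matching control message was present.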
+ +Integrating into a userspace TLS library +---------------------------------------- + +At a high level, the kernel TLS ULP is a replacement for the record +layer of a userspace TLS library. + +A patchset to OpenSSL to use ktls as the record layer is +`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_. + +`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_ +of calling send() directly after a handshake using gnutls is also available. +Since it doesn't implement a full record layer, control +messages are not supported. diff --git a/Documentation/networking/tls.txt b/Documentation/networking/tls.txt deleted file mode 100644 index 58b5ef75f1b7..000000000000 --- a/Documentation/networking/tls.txt +++ /dev/null @@ -1,197 +0,0 @@ -Overview -======== - -Transport Layer Security (TLS) is a Upper Layer Protocol (ULP) that runs over -TCP. TLS provides end-to-end data integrity and confidentiality. - -User interface -============== - -Creating a TLS connection -------------------------- - -First create a new TCP socket and set the TLS ULP. - - sock = socket(AF_INET, SOCK_STREAM, 0); - setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); - -Setting the TLS ULP allows us to set/get TLS socket options. Currently -only the symmetric encryption is handled in the kernel. After the TLS -handshake is complete, we have all the parameters required to move the -data-path to the kernel. There is a separate socket option for moving -the transmit and the receive into the kernel.
- - /* From linux/tls.h */ - struct tls_crypto_info { - unsigned short version; - unsigned short cipher_type; - }; - - struct tls12_crypto_info_aes_gcm_128 { - struct tls_crypto_info info; - unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE]; - unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE]; - unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE]; - unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE]; - }; - - - struct tls12_crypto_info_aes_gcm_128 crypto_info; - - crypto_info.info.version = TLS_1_2_VERSION; - crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128; - memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE); - memcpy(crypto_info.rec_seq, seq_number_write, - TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE); - memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE); - memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE); - - setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)); - -Transmit and receive are set separately, but the setup is the same, using either -TLS_TX or TLS_RX. - -Sending TLS application data ----------------------------- - -After setting the TLS_TX socket option all application data sent over this -socket is encrypted using TLS and the parameters provided in the socket option. -For example, we can send an encrypted hello world record as follows: - - const char *msg = "hello world\n"; - send(sock, msg, strlen(msg)); - -send() data is directly encrypted from the userspace buffer provided -to the encrypted kernel send buffer if possible. - -The sendfile system call will send the file's data over TLS records of maximum -length (2^14). - - file = open(filename, O_RDONLY); - fstat(file, &stat); - sendfile(sock, file, &offset, stat.st_size); - -TLS records are created and sent after each send() call, unless -MSG_MORE is passed. MSG_MORE will delay creation of a record until -MSG_MORE is not passed, or the maximum record size is reached. 
- -The kernel will need to allocate a buffer for the encrypted data. -This buffer is allocated at the time send() is called, such that -either the entire send() call will return -ENOMEM (or block waiting -for memory), or the encryption will always succeed. If send() returns --ENOMEM and some data was left on the socket buffer from a previous -call using MSG_MORE, the MSG_MORE data is left on the socket buffer. - -Receiving TLS application data ------------------------------- - -After setting the TLS_RX socket option, all recv family socket calls -are decrypted using TLS parameters provided. A full TLS record must -be received before decryption can happen. - - char buffer[16384]; - recv(sock, buffer, 16384); - -Received data is decrypted directly in to the user buffer if it is -large enough, and no additional allocations occur. If the userspace -buffer is too small, data is decrypted in the kernel and copied to -userspace. - -EINVAL is returned if the TLS version in the received message does not -match the version passed in setsockopt. - -EMSGSIZE is returned if the received message is too big. - -EBADMSG is returned if decryption failed for any other reason. - -Send TLS control messages -------------------------- - -Other than application data, TLS has control messages such as alert -messages (record type 21) and handshake messages (record type 22), etc. -These messages can be sent over the socket by providing the TLS record type -via a CMSG. For example the following function sends @data of @length bytes -using a record of type @record_type. - -/* send TLS control message using record_type */ - static int klts_send_ctrl_message(int sock, unsigned char record_type, - void *data, size_t length) - { - struct msghdr msg = {0}; - int cmsg_len = sizeof(record_type); - struct cmsghdr *cmsg; - char buf[CMSG_SPACE(cmsg_len)]; - struct iovec msg_iov; /* Vector of data to send/receive into. 
*/ - - msg.msg_control = buf; - msg.msg_controllen = sizeof(buf); - cmsg = CMSG_FIRSTHDR(&msg); - cmsg->cmsg_level = SOL_TLS; - cmsg->cmsg_type = TLS_SET_RECORD_TYPE; - cmsg->cmsg_len = CMSG_LEN(cmsg_len); - *CMSG_DATA(cmsg) = record_type; - msg.msg_controllen = cmsg->cmsg_len; - - msg_iov.iov_base = data; - msg_iov.iov_len = length; - msg.msg_iov = &msg_iov; - msg.msg_iovlen = 1; - - return sendmsg(sock, &msg, 0); - } - -Control message data should be provided unencrypted, and will be -encrypted by the kernel. - -Receiving TLS control messages ------------------------------- - -TLS control messages are passed in the userspace buffer, with message -type passed via cmsg. If no cmsg buffer is provided, an error is -returned if a control message is received. Data messages may be -received without a cmsg buffer set. - - char buffer[16384]; - char cmsg[CMSG_SPACE(sizeof(unsigned char))]; - struct msghdr msg = {0}; - msg.msg_control = cmsg; - msg.msg_controllen = sizeof(cmsg); - - struct iovec msg_iov; - msg_iov.iov_base = buffer; - msg_iov.iov_len = 16384; - - msg.msg_iov = &msg_iov; - msg.msg_iovlen = 1; - - int ret = recvmsg(sock, &msg, 0 /* flags */); - - struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); - if (cmsg->cmsg_level == SOL_TLS && - cmsg->cmsg_type == TLS_GET_RECORD_TYPE) { - int record_type = *((unsigned char *)CMSG_DATA(cmsg)); - // Do something with record_type, and control message data in - // buffer. - // - // Note that record_type may be == to application data (23). - } else { - // Buffer contains application data. - } - -recv will never return data from mixed types of TLS records. - -Integrating in to userspace TLS library ---------------------------------------- - -At a high level, the kernel TLS ULP is a replacement for the record -layer of a userspace TLS library. - -A patchset to OpenSSL to use ktls as the record layer is here: - -https://github.com/Mellanox/openssl/commits/tls_rx2 - -An example of calling send directly after a handshake using -gnutls. 
Since it doesn't implement a full record layer, control -messages are not supported: - -https://github.com/ktls/af_ktls-tool/commits/RX -- From f42c104f2ec94a9255a835cd4cd1bd76279d4d06 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 21 May 2019 18:57:14 -0700 Subject: Documentation: add TLS offload documentation Describe existing kernel TLS offload (added back in Linux 4.19) - the mechanism, the expected behavior and the notable corner cases. This documentation is mostly targeting hardware vendors who want to implement offload, to ensure consistency between implementations. v2: - add emphasis around TLS_SW/TLS_HW/TLS_HW_RECORD; - remove mentions of ongoing work (Boris); - split the flow of data in SW vs. HW cases in TX overview (Boris); - call out which fields are updated by the device and which are filled by the stack (Boris); - move error handling into its own section (Boris); - add more words about fallback (Boris); - note that checksum validation is required (Alexei); - note that drivers shouldn't pay attention to the TLS device features. Signed-off-by: Jakub Kicinski Acked-by: Dave Watson Acked-by: Alexei Starovoitov Acked-by: Boris Pismenny Signed-off-by: David S.
Miller --- Documentation/networking/index.rst | 1 + Documentation/networking/tls-offload-layers.svg | 1 + .../networking/tls-offload-reorder-bad.svg | 1 + .../networking/tls-offload-reorder-good.svg | 1 + Documentation/networking/tls-offload.rst | 482 +++++++++++++++++++++ Documentation/networking/tls.rst | 2 + 6 files changed, 488 insertions(+) create mode 100644 Documentation/networking/tls-offload-layers.svg create mode 100644 Documentation/networking/tls-offload-reorder-bad.svg create mode 100644 Documentation/networking/tls-offload-reorder-good.svg create mode 100644 Documentation/networking/tls-offload.rst diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index f0f97eef091c..a46fca264bee 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -29,6 +29,7 @@ Contents: segmentation-offloads scaling tls + tls-offload .. only:: subproject diff --git a/Documentation/networking/tls-offload-layers.svg b/Documentation/networking/tls-offload-layers.svg new file mode 100644 index 000000000000..cf72f05dbb21 --- /dev/null +++ b/Documentation/networking/tls-offload-layers.svg @@ -0,0 +1 @@ + diff --git a/Documentation/networking/tls-offload-reorder-bad.svg b/Documentation/networking/tls-offload-reorder-bad.svg new file mode 100644 index 000000000000..d107aaf0f71e --- /dev/null +++ b/Documentation/networking/tls-offload-reorder-bad.svg @@ -0,0 +1 @@ + diff --git a/Documentation/networking/tls-offload-reorder-good.svg b/Documentation/networking/tls-offload-reorder-good.svg new file mode 100644 index 000000000000..10e17d91f70c --- /dev/null +++ b/Documentation/networking/tls-offload-reorder-good.svg @@ -0,0 +1 @@ + diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst new file mode 100644 index 000000000000..cb85af559dff --- /dev/null +++ b/Documentation/networking/tls-offload.rst @@ -0,0 +1,482 @@ +.. 
SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +================== +Kernel TLS offload +================== + +Kernel TLS operation +==================== + +The Linux kernel provides TLS connection offload infrastructure. Once a TCP +connection is in ``ESTABLISHED`` state, user space can enable the TLS Upper +Layer Protocol (ULP) and install the cryptographic connection state. +For details regarding the user-facing interface, refer to the TLS +documentation in :ref:`Documentation/networking/tls.rst `. + +``ktls`` can operate in three modes: + + * Software crypto mode (``TLS_SW``) - the CPU handles the cryptography. + In the most basic cases only crypto operations synchronous with the CPU + can be used, but depending on the calling context the CPU may utilize + asynchronous crypto accelerators. The use of accelerators introduces extra + latency on socket reads (decryption only starts when a read syscall + is made) and additional I/O load on the system. + * Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto + on a packet-by-packet basis, provided the packets arrive in order. + This mode integrates best with the kernel stack and is described in detail + in the remaining part of this document + (``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``). + * Full TCP NIC offload mode (``TLS_HW_RECORD``) - a mode of operation where + the NIC driver and firmware replace the kernel networking stack + with their own TCP handling. It is not usable in production environments + that rely on the Linux networking stack, for example for firewalling + or for QoS and packet scheduling (``ethtool`` flag ``tls-hw-record``). + +The operation mode is selected automatically based on device configuration; +offload opt-in or opt-out on a per-connection basis is not currently supported.
+ +TX +-- + +At a high level, user write requests are turned into a scatter list; the TLS ULP +intercepts them, inserts record framing, performs encryption (in ``TLS_SW`` +mode) and then hands the modified scatter list to the TCP layer. From this +point on the TCP stack proceeds as normal. + +In ``TLS_HW`` mode the encryption is not performed in the TLS ULP. +Instead, packets reach a device driver, which marks the packets +for crypto offload based on the socket they are attached to +and sends them to the device for encryption and transmission. + +RX +-- + +On the receive side, if the device handled decryption and authentication +successfully, the driver sets the decrypted bit in the associated +:c:type:`struct sk_buff `. The packets reach the TCP stack and +are handled normally. ``ktls`` is informed when data is queued to the socket +and the ``strparser`` mechanism is used to delineate the records. Upon a read +request, records are retrieved from the socket and passed to the decryption routine. +If the device decrypted all the segments of a record, decryption is skipped; +otherwise the software path handles decryption. + +.. kernel-figure:: tls-offload-layers.svg + :alt: TLS offload layers + :align: center + :figwidth: 28em + + Layers of the kernel TLS stack + +Device configuration +==================== + +During driver initialization the driver sets the ``NETIF_F_HW_TLS_RX`` and +``NETIF_F_HW_TLS_TX`` features and installs its +:c:type:`struct tlsdev_ops ` +pointer in the :c:member:`tlsdev_ops` member of the +:c:type:`struct net_device `. + +When TLS cryptographic connection state is installed on a ``ktls`` socket +(note that it is done twice, once for the RX and once for the TX direction, +and the two are completely independent), the kernel checks if the underlying +network device is offload-capable and attempts the offload. If the offload +fails, the connection is handled entirely in software using the same mechanism +as if the offload was never tried.
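The per-direction attempt-then-fall-back behaviour can be modelled in a few lines of plain C. This is a hypothetical user-space model, not kernel code: `struct dev_model`, `struct conn_model`, `try_dev_add()` and `install_crypto_state()` are invented names, and the stubbed device refuses offload when it lacks the feature for that direction or its connection table is full. It only illustrates that each direction is decided independently and that a failed offload leaves that direction in plain `TLS_SW` mode.

```c
#include <stdbool.h>

/* Hypothetical model of per-direction mode selection; the names are
 * illustrative and do not correspond to kernel symbols. */
enum tls_mode { MODE_NONE, MODE_SW, MODE_HW };
enum tls_dir { DIR_TX, DIR_RX };

struct dev_model {
    bool tx_capable;   /* e.g. NETIF_F_HW_TLS_TX advertised */
    bool rx_capable;   /* e.g. NETIF_F_HW_TLS_RX advertised */
    int free_slots;    /* remaining entries in the device's connection table */
};

struct conn_model {
    enum tls_mode mode[2];   /* indexed by enum tls_dir */
};

/* Stand-in for the driver's tls_dev_add() callback: 0 on success. */
static int try_dev_add(struct dev_model *dev, enum tls_dir dir)
{
    bool capable = (dir == DIR_TX) ? dev->tx_capable : dev->rx_capable;

    if (!capable || dev->free_slots == 0)
        return -1;
    dev->free_slots--;
    return 0;
}

/* Install crypto state for one direction: try the device first and fall
 * back to software, as if offload had never been attempted. */
static enum tls_mode install_crypto_state(struct conn_model *c,
                                          struct dev_model *dev,
                                          enum tls_dir dir)
{
    c->mode[dir] = (try_dev_add(dev, dir) == 0) ? MODE_HW : MODE_SW;
    return c->mode[dir];
}
```

With a one-slot table, TX gets offloaded and RX quietly falls back to software, which is the same situation as the "table was full" example given under resync handling.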
+ +The offload request is performed via the :c:member:`tls_dev_add` callback of +:c:type:`struct tlsdev_ops `: + +.. code-block:: c + + int (*tls_dev_add)(struct net_device *netdev, struct sock *sk, + enum tls_offload_ctx_dir direction, + struct tls_crypto_info *crypto_info, + u32 start_offload_tcp_sn); + +``direction`` indicates whether the cryptographic information is for +the received or transmitted packets. The driver uses the ``sk`` parameter +to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6). +Cryptographic information in ``crypto_info`` includes the key, iv, salt +as well as the TLS record sequence number. ``start_offload_tcp_sn`` indicates +which TCP sequence number corresponds to the beginning of the record whose +sequence number is given in ``crypto_info``. The driver can add its state +at the end of kernel structures (see :c:member:`driver_state` members +in ``include/net/tls.h``) to avoid additional allocations and pointer +dereferences. + +TX +-- + +After TX state is installed, the stack guarantees that the first segment +of the stream will start exactly at the ``start_offload_tcp_sn`` sequence +number, simplifying TCP sequence number matching. + +TX offload being fully initialized does not imply that all segments that pass +through the driver and belong to the offloaded socket will be after +the expected sequence number and will have kernel record information. +In particular, already encrypted data may have been queued to the socket +before installing the connection state in the kernel. + +RX +-- + +In the RX direction, the local networking stack has little control over segmentation, +so the initial record's TCP sequence number may be anywhere inside the segment. + +Normal operation +================ + +At a minimum, the device maintains the following state for each connection, in +each direction: + + * crypto secrets (key, iv, salt) + * crypto processing state (partial blocks, partial authentication tag, etc.)
+ * record metadata (sequence number, processing offset and length) + * expected TCP sequence number + +There are no guarantees on record length or record segmentation. In particular, +segments may start at any point of a record and contain any number of records. +Assuming segments are received in order, the device should be able to perform +crypto operations and authentication regardless of segmentation. For this +to be possible, the device has to keep a small amount of segment-to-segment state. +This includes at least: + + * partial headers (if a segment carried only a part of the TLS header) + * partial data block + * partial authentication tag (all data has been seen but part of the + authentication tag has to be written to or read from the subsequent segment) + +Record reassembly is not necessary for TLS offload. If the packets arrive +in order, the device should be able to handle them separately and make +forward progress. + +TX +-- + +The kernel stack performs record framing, reserving space for the authentication +tag and populating all other TLS header and trailer fields. + +Both the device and the driver maintain expected TCP sequence numbers +due to the possibility of retransmissions and the lack of software fallback +once the packet reaches the device. +For segments passed in order, the driver marks the packets with +a connection identifier (note that a 5-tuple lookup is insufficient to identify +packets requiring HW offload, see the :ref:`5tuple_problems` section) +and hands them to the device. The device identifies the packet as requiring +TLS handling and confirms that the sequence number matches its expectation. +The device performs encryption and authentication of the record data. +It replaces the authentication tag and TCP checksum with correct values.
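The TX sequence check amounts to comparing each segment's starting TCP sequence number with the expected one and advancing the expectation by the payload length. Below is a hypothetical driver-side sketch: `struct tx_ctx`, `tx_ctx_init()` and `tx_classify()` are invented for illustration, and the two counters merely reuse the statistic names listed under Statistics. It assumes the continuation strategy recommended in the TX resync discussion, so after an out-of-order segment the next expected segment is the one following it. Unsigned 32-bit arithmetic makes the expected value wrap around the way TCP sequence numbers do.

```c
#include <stdint.h>

/* Hypothetical per-connection TX state kept by a driver; not kernel API. */
struct tx_ctx {
    uint32_t expected_seq;      /* next TCP sequence number the device expects */
    uint64_t tx_tls_encrypted;  /* in-order segments handed to the device */
    uint64_t tx_tls_ooo;        /* segments that were not in the expected order */
};

enum tx_verdict { TX_IN_ORDER, TX_OUT_OF_ORDER };

/* Initialise from the start_offload_tcp_sn passed to tls_dev_add(). */
static void tx_ctx_init(struct tx_ctx *ctx, uint32_t start_offload_tcp_sn)
{
    ctx->expected_seq = start_offload_tcp_sn;
    ctx->tx_tls_encrypted = 0;
    ctx->tx_tls_ooo = 0;
}

/* Classify a segment by its starting sequence number and payload length. */
static enum tx_verdict tx_classify(struct tx_ctx *ctx, uint32_t seq, uint32_t len)
{
    enum tx_verdict v;

    if (seq == ctx->expected_seq) {
        ctx->tx_tls_encrypted++;
        v = TX_IN_ORDER;
    } else {
        /* Retransmission or other out-of-order segment: the device needs
         * the preceding record data as extra context before encrypting. */
        ctx->tx_tls_ooo++;
        v = TX_OUT_OF_ORDER;
    }
    /* Continuation strategy: next expected segment follows this one.
     * Unsigned addition wraps modulo 2^32 like TCP sequence numbers. */
    ctx->expected_seq = seq + len;
    return v;
}
```

The alternative strategy mentioned later, keeping the previous stream state on the assumption that the out-of-order segment was a retransmission, would instead leave `expected_seq` untouched in the `else` branch and needs retransmission detection to be safe.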
+ +RX +-- + +Before a packet is DMAed to the host (but after the NIC's embedded switching +and packet transformation functions), the device validates the Layer 4 +checksum and performs a 5-tuple lookup to find any TLS connection the packet +may belong to (technically a 4-tuple +lookup is sufficient - IP addresses and TCP port numbers, as the protocol +is always TCP). If a connection is matched, the device confirms that the TCP sequence +number is the expected one and proceeds to TLS handling (record delineation, +decryption, authentication for each record in the packet). The device leaves +the record framing unmodified; the stack takes care of record decapsulation. +The device indicates successful handling of TLS offload in the per-packet context +(descriptor) passed to the host. + +Upon reception of a TLS offloaded packet, the driver sets +the :c:member:`decrypted` mark in :c:type:`struct sk_buff ` +corresponding to the segment. The networking stack makes sure decrypted +and non-decrypted segments do not get coalesced (e.g. by GRO or the socket layer) +and takes care of partial decryption. + +Resync handling +=============== + +In the presence of packet drops or network packet reordering, the device may lose +synchronization with the TLS stream and require a resync with the kernel's +TCP stack. + +Note that resync is only attempted for connections which were successfully +added to the device table and are in TLS_HW mode. For example, +if the table was full when cryptographic state was installed in the kernel, +such a connection will never get offloaded. Therefore, the resync request +does not carry any cryptographic connection state. + +TX +-- + +Segments transmitted from an offloaded socket can get out of sync +in similar ways to the receive side: retransmissions and local drops +are possible, though network reorders are not. + +Whenever an out-of-order segment is transmitted, the driver provides +the device with enough information to perform cryptographic operations.
+This most likely means that the part of the record preceding the current +segment has to be passed to the device as part of the packet context, +together with its TCP sequence number and TLS record number. The device +can then initialize its crypto state, process and discard the preceding +data (to be able to insert the authentication tag) and move on to handling +the actual packet. + +In this mode, depending on the implementation, the driver can either ask +for a continuation with the crypto state and the new sequence number +(the next expected segment is the one after the out-of-order one), or continue +with the previous stream state - assuming that the out-of-order segment +was just a retransmission. The former is simpler and does not require +retransmission detection; therefore it is the recommended method until +it is proven inefficient. + +RX +-- + +A small number of RX reorder events may not require a full resynchronization. +In particular, the device should not lose synchronization +when the record boundary can be recovered: + +.. kernel-figure:: tls-offload-reorder-good.svg + :alt: reorder of non-header segment + :align: center + + Reorder of non-header segment + +Green segments are successfully decrypted, blue ones are passed +as received on the wire, red stripes mark the start of new records. + +In the above case, segment 1 is received and decrypted successfully. +Segment 2 was dropped, so 3 arrives out of order. The device knows +the next record starts inside 3, based on the record length in segment 1. +Segment 3 is passed untouched because, due to the lack of data from segment 2, +the remainder of the previous record inside segment 3 cannot be handled. +The device can, however, collect the authentication algorithm's state +and partial block from the new record in segment 3 and, when 4 and 5 +arrive, continue decryption. Finally, when 2 arrives, it is completely outside +of the device's expected window, so it is passed as is without special +handling.
``ktls`` software fallback handles the decryption of the record +spanning segments 1, 2 and 3. The device did not get out of sync, +even though two segments did not get decrypted. + +Kernel synchronization may be necessary if the lost segment contained +a record header and arrived after the next record header has already passed: + +.. kernel-figure:: tls-offload-reorder-bad.svg + :alt: reorder of header segment + :align: center + + Reorder of segment with a TLS header + +In this example, segment 2 gets dropped, and it contains a record header. +The device can only detect that segment 4 also contains a TLS header +if it knows the length of the previous record from segment 2. In this case, +the device will lose synchronization with the stream. + +When the device gets out of sync and the stream reaches TCP sequence +numbers more than a max size record past the expected TCP sequence number, +the device starts scanning for a known header pattern. For example, +for TLS 1.2 and TLS 1.3, consecutive bytes of value ``0x03 0x03`` occur +in the SSL/TLS version field of the header. Once the pattern is matched, +the device continues attempting to parse headers at the expected locations +(based on the length fields at guessed locations). +Whenever the expected location does not contain a valid header, the scan +is restarted. + +When the header is matched, the device sends a confirmation request +to the kernel, asking if the guessed location is correct (whether a TLS record +really starts there), and which record sequence number the given header had. +The kernel confirms the guessed location was correct and tells the device +the record sequence number. Meanwhile, the device has been parsing +and counting all records since the just-confirmed one; it adds the number +of records it has seen to the record number provided by the kernel. +At this point the device is in sync and can resume decryption at the next +segment boundary.
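The header scan can be sketched over a flat byte buffer: look for the `0x03 0x03` version bytes, treat the preceding byte as the content type and the following two bytes as the big-endian record length, then require further headers at the locations those lengths predict, restarting one byte later whenever a predicted location does not hold a header. This is a hypothetical illustration only; a real device works on a segment stream and keeps state across packets. The function names, the accepted content types (20 to 23), the length cap of 2^14 + 2048 bytes (the TLS 1.2 ciphertext limit) and the `chain` confidence threshold are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN    5u
#define MAX_RECORD_LEN (16384u + 2048u)  /* assumed cap: TLS 1.2 ciphertext limit */

/* Hypothetical check: does buf[off..] look like a TLS 1.2/1.3 record
 * header (type, version 0x03 0x03, 16-bit length)? On success stores
 * the record length, excluding the 5-byte header, in *rec_len. */
static int looks_like_header(const uint8_t *buf, size_t len, size_t off,
                             size_t *rec_len)
{
    if (off + TLS_HDR_LEN > len)
        return 0;                                   /* header not fully visible */
    if (buf[off] < 20 || buf[off] > 23)
        return 0;                                   /* not a known content type */
    if (buf[off + 1] != 0x03 || buf[off + 2] != 0x03)
        return 0;                                   /* version bytes mismatch */
    *rec_len = ((size_t)buf[off + 3] << 8) | buf[off + 4];
    /* An AEAD-protected record is never empty and never over the cap. */
    return *rec_len > 0 && *rec_len <= MAX_RECORD_LEN;
}

/* Hypothetical resync scan: return the offset of the first candidate
 * header followed by `chain` further headers exactly where the length
 * fields predict, or -1 if no such position exists in the buffer. */
static long find_resync_candidate(const uint8_t *buf, size_t len, int chain)
{
    for (size_t start = 0; start + TLS_HDR_LEN <= len; start++) {
        size_t off = start, rec_len = 0;
        int seen = 0;

        /* Follow the chain of length fields from this guess. */
        while (seen <= chain && looks_like_header(buf, len, off, &rec_len)) {
            seen++;
            off += TLS_HDR_LEN + rec_len;
        }
        if (seen > chain)
            return (long)start;
        /* A predicted location held no valid header: restart the scan
         * one byte further on. */
    }
    return -1;
}
```

The offset returned here corresponds to the guessed location the device would then ask the kernel to confirm; with `chain` of 1 or more, a single plausible header is not enough to produce a candidate.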
+
+In a pathological case the device may latch onto a sequence of matching
+headers and never hear back from the kernel (there is no negative
+confirmation from the kernel). The implementation may choose to periodically
+restart the scan. Given how unlikely a falsely-matching stream is, however,
+periodic restart is not deemed necessary.
+
+Special care has to be taken if the confirmation request is passed
+asynchronously to the packet stream and a record may get processed
+by the kernel before the confirmation request.
+
+Error handling
+==============
+
+TX
+--
+
+Packets may be redirected or rerouted by the stack to a different
+device than the selected TLS offload device. The stack will handle
+such a condition using the :c:func:`sk_validate_xmit_skb` helper
+(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
+Offload maintains information about all records until the data is
+fully acknowledged, so if skbs reach the wrong device they can be handled
+by software fallback.
+
+Any device TLS offload handling error on the transmission side must result
+in the packet being dropped. For example, if a packet got out of order
+due to a bug in the stack or the device, reached the device and can't
+be encrypted, such a packet must be dropped.
+
+RX
+--
+
+If the device encounters any problems with TLS offload on the receive
+side it should pass the packet to the host's networking stack as it was
+received on the wire.
+
+For example, an authentication failure for any record in the segment should
+result in passing the unmodified packet to the software fallback. This means
+packets should not be modified "in place". Splitting segments to handle partial
+decryption is not advised. In other words, either all records in the packet
+have been handled successfully and authenticated, or the packet has to be
+passed to the host's stack as it was on the wire (if the device reports
+a precise error, recovering the original packet in the driver is sufficient).
+
+The Linux networking stack does not provide a way of reporting per-packet
+decryption and authentication errors; packets with errors must simply not
+have the :c:member:`decrypted` mark set.
+
+A packet should also not be handled by the TLS offload if it contains
+incorrect checksums.
+
+Performance metrics
+===================
+
+TLS offload can be characterized by the following basic metrics:
+
+ * max connection count
+ * connection installation rate
+ * connection installation latency
+ * total cryptographic performance
+
+Note that each TCP connection requires a TLS session in both directions;
+the performance may be reported treating each direction separately.
+
+Max connection count
+--------------------
+
+The number of connections the device can support can be exposed via
+the ``devlink resource`` API.
+
+Total cryptographic performance
+-------------------------------
+
+Offload performance may depend on segment and record size.
+
+Overload of the cryptographic subsystem of the device should not have
+a significant performance impact on non-offloaded streams.
+
+Statistics
+==========
+
+The following minimum set of TLS-related statistics should be reported
+by the driver:
+
+ * ``rx_tls_decrypted`` - number of successfully decrypted TLS segments
+ * ``tx_tls_encrypted`` - number of in-order TLS segments passed to the
+   device for encryption
+ * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
+   but did not arrive in the expected order
+ * ``tx_tls_drop_no_sync_data`` - number of TX packets dropped because
+   they arrived out of order and the associated record could not be found
+   (see also :ref:`pre_tls_data`)
+
+Notable corner cases, exceptions and additional requirements
+============================================================
+
+.. _5tuple_problems:
+
+5-tuple matching limitations
+----------------------------
+
+The device can only recognize received packets based on the 5-tuple
+of the socket. The current ``ktls`` implementation will not offload sockets
+routed through software interfaces such as those used for tunneling
+or virtual networking. However, many packet transformations performed
+by the networking stack (most notably any BPF logic) do not require
+any intermediate software device, therefore a 5-tuple match may
+consistently miss at the device level. In such cases the device
+should still be able to perform TX offload (encryption) and should
+fall back cleanly to software decryption (RX).
+
+Out of order
+------------
+
+Introducing extra processing in NICs should not cause packets to be
+transmitted or received out of order; for example, pure ACK packets
+should not be reordered with respect to data segments.
+
+Ingress reorder
+---------------
+
+A device is permitted to perform packet reordering for consecutive
+TCP segments (i.e. placing packets in the correct order), but any form
+of additional buffering is disallowed.
+
+Coexistence with standard networking offload features
+-----------------------------------------------------
+
+Offloaded ``ktls`` sockets should support standard TCP stack features
+transparently. Enabling device TLS offload should not cause any difference
+in packets as seen on the wire.
+
+Transport layer transparency
+----------------------------
+
+The device should not modify any packet headers for the purpose
+of simplifying TLS offload.
+
+The device should not depend on any packet headers beyond what is strictly
+necessary for TLS offload.
+
+Segment drops
+-------------
+
+Dropping packets is acceptable only in the event of catastrophic
+system errors and should never be used as an error handling mechanism
+in cases arising from normal operation. In other words, reliance
+on TCP retransmissions to handle corner cases is not acceptable.
+
+TLS device features
+-------------------
+
+Drivers should ignore changes to the TLS device feature flags.
+These flags will be acted upon accordingly by the core ``ktls`` code.
+TLS device feature flags only control adding new TLS connection
+offloads; old connections will remain active after the flags are cleared.
+
+Known bugs
+==========
+
+skb_orphan() leaks clear text
+-----------------------------
+
+Currently drivers depend on the :c:member:`sk` member of
+:c:type:`struct sk_buff <sk_buff>` to identify segments requiring
+encryption. Any operation which removes or does not preserve the socket
+association, such as :c:func:`skb_orphan` or :c:func:`skb_clone`,
+will cause the driver to miss the packets and lead to clear text leaks.
+
+Redirects leak clear text
+-------------------------
+
+In the RX direction, if a segment has already been decrypted by the device
+and it gets redirected or mirrored, clear text will be transmitted out.
+
+.. _pre_tls_data:
+
+Transmission of pre-TLS data
+----------------------------
+
+The user can enqueue some already encrypted and framed records before
+enabling ``ktls`` on the socket. Those records have to get sent as they are.
+This is straightforward to handle in the software case: such data will be
+waiting in the TCP layer and the TLS ULP won't see it. In the offloaded case,
+when a pre-queued segment reaches the transmission point it appears to be
+out of order (before the expected TCP sequence number) and the stack does
+not have any record information associated with it.
+
+However, not every segment without record information can be assumed to be
+pre-queued data, because a race condition exists between the TCP stack
+queuing a retransmission, the driver seeing the retransmission and a TCP ACK
+arriving for the retransmitted data.
diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst
index 482bd73f18a2..5bcbf75e2025 100644
--- a/Documentation/networking/tls.rst
+++ b/Documentation/networking/tls.rst
@@ -1,3 +1,5 @@
+.. _kernel_tls:
+
 ==========
 Kernel TLS
 ==========
-- 
cgit

From a6cd0d2d493ab7806b49f738b4f66362437cc09e Mon Sep 17 00:00:00 2001
From: Florian Fainelli
Date: Mon, 27 May 2019 19:06:38 -0700
Subject: Documentation: net-sysfs: Remove duplicate PHY device documentation

Both sysfs-bus-mdio and sysfs-class-net-phydev contain the same
duplicated information. There is not currently any MDIO bus specific
attribute, but there are PHY device (struct phy_device) specific
attributes. Use the more precise description from sysfs-bus-mdio and
carry that over to sysfs-class-net-phydev.

Fixes: 86f22d04dfb5 ("net: sysfs: Document PHY device sysfs attributes")
Signed-off-by: Florian Fainelli
Reviewed-by: Andrew Lunn
Signed-off-by: David S. Miller
---
 Documentation/ABI/testing/sysfs-bus-mdio         | 29 ------------------------
 Documentation/ABI/testing/sysfs-class-net-phydev | 19 +++++++++++-----
 2 files changed, 13 insertions(+), 35 deletions(-)
 delete mode 100644 Documentation/ABI/testing/sysfs-bus-mdio

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-mdio b/Documentation/ABI/testing/sysfs-bus-mdio
deleted file mode 100644
index 491baaf4285f..000000000000
--- a/Documentation/ABI/testing/sysfs-bus-mdio
+++ /dev/null
@@ -1,29 +0,0 @@
-What:		/sys/bus/mdio_bus/devices/.../phy_id
-Date:		November 2012
-KernelVersion:	3.8
-Contact:	netdev@vger.kernel.org
-Description:
-		This attribute contains the 32-bit PHY Identifier as reported
-		by the device during bus enumeration, encoded in hexadecimal.
-		This ID is used to match the device with the appropriate
-		driver.
-
-What:		/sys/bus/mdio_bus/devices/.../phy_interface
-Date:		February 2014
-KernelVersion:	3.15
-Contact:	netdev@vger.kernel.org
-Description:
-		This attribute contains the PHY interface as configured by the
-		Ethernet driver during bus enumeration, encoded in string.
-		This interface mode is used to configure the Ethernet MAC with the
-		appropriate mode for its data lines to the PHY hardware.
-
-What:		/sys/bus/mdio_bus/devices/.../phy_has_fixups
-Date:		February 2014
-KernelVersion:	3.15
-Contact:	netdev@vger.kernel.org
-Description:
-		This attribute contains the boolean value whether a given PHY
-		device has had any "fixup" workaround running on it, encoded as
-		a boolean. This information is provided to help troubleshooting
-		PHY configurations.
diff --git a/Documentation/ABI/testing/sysfs-class-net-phydev b/Documentation/ABI/testing/sysfs-class-net-phydev
index 6ebabfb27912..2a5723343aba 100644
--- a/Documentation/ABI/testing/sysfs-class-net-phydev
+++ b/Documentation/ABI/testing/sysfs-class-net-phydev
@@ -11,24 +11,31 @@ Date:		February 2014
 KernelVersion:	3.15
 Contact:	netdev@vger.kernel.org
 Description:
-		Boolean value indicating whether the PHY device has
-		any fixups registered against it (phy_register_fixup)
+		This attribute contains a boolean value indicating whether a
+		given PHY device has had any "fixup" workaround run on it.
+		This information is provided to help troubleshoot
+		PHY configurations.
 
 What:		/sys/class/mdio_bus///phy_id
 Date:		November 2012
 KernelVersion:	3.8
 Contact:	netdev@vger.kernel.org
 Description:
-		32-bit hexadecimal value corresponding to the PHY device's OUI,
-		model and revision number.
+		This attribute contains the 32-bit PHY Identifier as reported
+		by the device during bus enumeration, encoded in hexadecimal.
+		This ID is used to match the device with the appropriate
+		driver.
 
 What:		/sys/class/mdio_bus///phy_interface
 Date:		February 2014
 KernelVersion:	3.15
 Contact:	netdev@vger.kernel.org
 Description:
-		String value indicating the PHY interface, possible
-		values are:.
+		This attribute contains the PHY interface as configured by the
+		Ethernet driver during bus enumeration, encoded as a string.
+		This interface mode is used to configure the Ethernet MAC with the
+		appropriate mode for its data lines to the PHY hardware.
+		Possible values are:
 		(not available), mii, gmii, sgmii, tbi, rev-mii,
 		rmii, rgmii, rgmii-id, rgmii-rxid, rgmii-txid, rtbi, smii
 		xgmii, moca, qsgmii, trgmii, 1000base-x, 2500base-x, rxaui,
-- 
cgit