From 0efaaa86581c596f9426482c731f262d843807b6 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Mon, 15 Jun 2020 08:50:08 +0200 Subject: docs: crypto: convert asymmetric-keys.txt to ReST This file is almost compatible with ReST. Just minor changes were needed: - Adjust document and titles markups; - Adjust numbered list markups; - Add a comments markup for the Contents section; - Add markups for literal blocks. Acked-by: Jarkko Sakkinen Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/c2275ea94e0507a01b020ab66dfa824d8b1c2545.1592203650.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet --- Documentation/crypto/asymmetric-keys.rst | 424 ++++++++++++++++++++++++++++++ Documentation/crypto/asymmetric-keys.txt | 429 ------------------------------- Documentation/crypto/index.rst | 1 + 3 files changed, 425 insertions(+), 429 deletions(-) create mode 100644 Documentation/crypto/asymmetric-keys.rst delete mode 100644 Documentation/crypto/asymmetric-keys.txt (limited to 'Documentation/crypto') diff --git a/Documentation/crypto/asymmetric-keys.rst b/Documentation/crypto/asymmetric-keys.rst new file mode 100644 index 000000000000..349f44a29392 --- /dev/null +++ b/Documentation/crypto/asymmetric-keys.rst @@ -0,0 +1,424 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================= +Asymmetric / Public-key Cryptography Key Type +============================================= + +.. Contents: + + - Overview. + - Key identification. + - Accessing asymmetric keys. + - Signature verification. + - Asymmetric key subtypes. + - Instantiation data parsers. + - Keyring link restrictions. + + +Overview +======== + +The "asymmetric" key type is designed to be a container for the keys used in +public-key cryptography, without imposing any particular restrictions on the +form or mechanism of the cryptography or form of the key. + +The asymmetric key is given a subtype that defines what sort of data is +associated with the key and provides operations to describe and destroy it. +However, no requirement is made that the key data actually be stored in the +key. + +A completely in-kernel key retention and operation subtype can be defined, but +it would also be possible to provide access to cryptographic hardware (such as +a TPM) that might be used to both retain the relevant key and perform +operations using that key. In such a case, the asymmetric key would then +merely be an interface to the TPM driver. + +Also provided is the concept of a data parser. Data parsers are responsible +for extracting information from the blobs of data passed to the instantiation +function. The first data parser that recognises the blob gets to set the +subtype of the key and define the operations that can be done on that key. + +A data parser may interpret the data blob as containing the bits representing a +key, or it may interpret it as a reference to a key held somewhere else in the +system (for example, a TPM). + + +Key Identification +================== + +If a key is added with an empty name, the instantiation data parsers are given +the opportunity to pre-parse a key and to determine the description the key +should be given from the content of the key. + +This can then be used to refer to the key, either by complete match or by +partial match. The key type may also use other criteria to refer to a key. + +The asymmetric key type's match function can then perform a wider range of +comparisons than just the straightforward comparison of the description with +the criterion string: + + 1) If the criterion string is of the form "id:" then the match + function will examine a key's fingerprint to see if the hex digits given + after the "id:" match the tail. For instance:: + + keyctl search @s asymmetric id:5acc2142 + + will match a key with fingerprint:: + + 1A00 2040 7601 7889 DE11 882C 3823 04AD 5ACC 2142 + + 2) If the criterion string is of the form ":" then the + match will match the ID as in (1), but with the added restriction that + only keys of the specified subtype (e.g. tpm) will be matched. For + instance:: + + keyctl search @s asymmetric tpm:5acc2142 + +Looking in /proc/keys, the last 8 hex digits of the key fingerprint are +displayed, along with the subtype:: + + 1a39e171 I----- 1 perm 3f010000 0 0 asymmetric modsign.0: DSA 5acc2142 [] + + +Accessing Asymmetric Keys +========================= + +For general access to asymmetric keys from within the kernel, the following +inclusion is required:: + + #include + +This gives access to functions for dealing with asymmetric / public keys. +Three enums are defined there for representing public-key cryptography +algorithms:: + + enum pkey_algo + +digest algorithms used by those:: + + enum pkey_hash_algo + +and key identifier representations:: + + enum pkey_id_type + +Note that the key type representation types are required because key +identifiers from different standards aren't necessarily compatible. For +instance, PGP generates key identifiers by hashing the key data plus some +PGP-specific metadata, whereas X.509 has arbitrary certificate identifiers. + +The operations defined upon a key are: + + 1) Signature verification. + +Other operations are possible (such as encryption) with the same key data +required for verification, but not currently supported, and others +(eg. decryption and signature generation) require extra key data. + + +Signature Verification +---------------------- + +An operation is provided to perform cryptographic signature verification, using +an asymmetric key to provide or to provide access to the public key:: + + int verify_signature(const struct key *key, + const struct public_key_signature *sig); + +The caller must have already obtained the key from some source and can then use +it to check the signature. The caller must have parsed the signature and +transferred the relevant bits to the structure pointed to by sig:: + + struct public_key_signature { + u8 *digest; + u8 digest_size; + enum pkey_hash_algo pkey_hash_algo : 8; + u8 nr_mpi; + union { + MPI mpi[2]; + ... + }; + }; + +The algorithm used must be noted in sig->pkey_hash_algo, and all the MPIs that +make up the actual signature must be stored in sig->mpi[] and the count of MPIs +placed in sig->nr_mpi. + +In addition, the data must have been digested by the caller and the resulting +hash must be pointed to by sig->digest and the size of the hash be placed in +sig->digest_size. + +The function will return 0 upon success or -EKEYREJECTED if the signature +doesn't match. + +The function may also return -ENOTSUPP if an unsupported public-key algorithm +or public-key/hash algorithm combination is specified or the key doesn't +support the operation; -EBADMSG or -ERANGE if some of the parameters have weird +data; or -ENOMEM if an allocation can't be performed. -EINVAL can be returned +if the key argument is the wrong type or is incompletely set up. + + +Asymmetric Key Subtypes +======================= + +Asymmetric keys have a subtype that defines the set of operations that can be +performed on that key and that determines what data is attached as the key +payload. The payload format is entirely at the whim of the subtype. + +The subtype is selected by the key data parser and the parser must initialise +the data required for it. The asymmetric key retains a reference on the +subtype module. + +The subtype definition structure can be found in:: + + #include + +and looks like the following:: + + struct asymmetric_key_subtype { + struct module *owner; + const char *name; + + void (*describe)(const struct key *key, struct seq_file *m); + void (*destroy)(void *payload); + int (*query)(const struct kernel_pkey_params *params, + struct kernel_pkey_query *info); + int (*eds_op)(struct kernel_pkey_params *params, + const void *in, void *out); + int (*verify_signature)(const struct key *key, + const struct public_key_signature *sig); + }; + +Asymmetric keys point to this with their payload[asym_subtype] member. + +The owner and name fields should be set to the owning module and the name of +the subtype. Currently, the name is only used for print statements. + +There are a number of operations defined by the subtype: + + 1) describe(). + + Mandatory. This allows the subtype to display something in /proc/keys + against the key. For instance the name of the public key algorithm type + could be displayed. The key type will display the tail of the key + identity string after this. + + 2) destroy(). + + Mandatory. This should free the memory associated with the key. The + asymmetric key will look after freeing the fingerprint and releasing the + reference on the subtype module. + + 3) query(). + + Mandatory. This is a function for querying the capabilities of a key. + + 4) eds_op(). + + Optional. This is the entry point for the encryption, decryption and + signature creation operations (which are distinguished by the operation ID + in the parameter struct). The subtype may do anything it likes to + implement an operation, including offloading to hardware. + + 5) verify_signature(). + + Optional. This is the entry point for signature verification. The + subtype may do anything it likes to implement an operation, including + offloading to hardware. + +Instantiation Data Parsers +========================== + +The asymmetric key type doesn't generally want to store or to deal with a raw +blob of data that holds the key data. It would have to parse it and error +check it each time it wanted to use it. Further, the contents of the blob may +have various checks that can be performed on it (eg. self-signatures, validity +dates) and may contain useful data about the key (identifiers, capabilities). + +Also, the blob may represent a pointer to some hardware containing the key +rather than the key itself. + +Examples of blob formats for which parsers could be implemented include: + + - OpenPGP packet stream [RFC 4880]. + - X.509 ASN.1 stream. + - Pointer to TPM key. + - Pointer to UEFI key. + - PKCS#8 private key [RFC 5208]. + - PKCS#5 encrypted private key [RFC 2898]. + +During key instantiation each parser in the list is tried until one doesn't +return -EBADMSG. + +The parser definition structure can be found in:: + + #include + +and looks like the following:: + + struct asymmetric_key_parser { + struct module *owner; + const char *name; + + int (*parse)(struct key_preparsed_payload *prep); + }; + +The owner and name fields should be set to the owning module and the name of +the parser. + +There is currently only a single operation defined by the parser, and it is +mandatory: + + 1) parse(). + + This is called to preparse the key from the key creation and update paths. + In particular, it is called during the key creation _before_ a key is + allocated, and as such, is permitted to provide the key's description in + the case that the caller declines to do so. + + The caller passes a pointer to the following struct with all of the fields + cleared, except for data, datalen and quotalen [see + Documentation/security/keys/core.rst]:: + + struct key_preparsed_payload { + char *description; + void *payload[4]; + const void *data; + size_t datalen; + size_t quotalen; + }; + + The instantiation data is in a blob pointed to by data and is datalen in + size. The parse() function is not permitted to change these two values at + all, and shouldn't change any of the other values _unless_ they are + recognise the blob format and will not return -EBADMSG to indicate it is + not theirs. + + If the parser is happy with the blob, it should propose a description for + the key and attach it to ->description, ->payload[asym_subtype] should be + set to point to the subtype to be used, ->payload[asym_crypto] should be + set to point to the initialised data for that subtype, + ->payload[asym_key_ids] should point to one or more hex fingerprints and + quotalen should be updated to indicate how much quota this key should + account for. + + When clearing up, the data attached to ->payload[asym_key_ids] and + ->description will be kfree()'d and the data attached to + ->payload[asm_crypto] will be passed to the subtype's ->destroy() method + to be disposed of. A module reference for the subtype pointed to by + ->payload[asym_subtype] will be put. + + + If the data format is not recognised, -EBADMSG should be returned. If it + is recognised, but the key cannot for some reason be set up, some other + negative error code should be returned. On success, 0 should be returned. + + The key's fingerprint string may be partially matched upon. For a + public-key algorithm such as RSA and DSA this will likely be a printable + hex version of the key's fingerprint. + +Functions are provided to register and unregister parsers:: + + int register_asymmetric_key_parser(struct asymmetric_key_parser *parser); + void unregister_asymmetric_key_parser(struct asymmetric_key_parser *subtype); + +Parsers may not have the same name. The names are otherwise only used for +displaying in debugging messages. + + +Keyring Link Restrictions +========================= + +Keyrings created from userspace using add_key can be configured to check the +signature of the key being linked. Keys without a valid signature are not +allowed to link. + +Several restriction methods are available: + + 1) Restrict using the kernel builtin trusted keyring + + - Option string used with KEYCTL_RESTRICT_KEYRING: + - "builtin_trusted" + + The kernel builtin trusted keyring will be searched for the signing key. + If the builtin trusted keyring is not configured, all links will be + rejected. The ca_keys kernel parameter also affects which keys are used + for signature verification. + + 2) Restrict using the kernel builtin and secondary trusted keyrings + + - Option string used with KEYCTL_RESTRICT_KEYRING: + - "builtin_and_secondary_trusted" + + The kernel builtin and secondary trusted keyrings will be searched for the + signing key. If the secondary trusted keyring is not configured, this + restriction will behave like the "builtin_trusted" option. The ca_keys + kernel parameter also affects which keys are used for signature + verification. + + 3) Restrict using a separate key or keyring + + - Option string used with KEYCTL_RESTRICT_KEYRING: + - "key_or_keyring:[:chain]" + + Whenever a key link is requested, the link will only succeed if the key + being linked is signed by one of the designated keys. This key may be + specified directly by providing a serial number for one asymmetric key, or + a group of keys may be searched for the signing key by providing the + serial number for a keyring. + + When the "chain" option is provided at the end of the string, the keys + within the destination keyring will also be searched for signing keys. + This allows for verification of certificate chains by adding each + certificate in order (starting closest to the root) to a keyring. For + instance, one keyring can be populated with links to a set of root + certificates, with a separate, restricted keyring set up for each + certificate chain to be validated:: + + # Create and populate a keyring for root certificates + root_id=`keyctl add keyring root-certs "" @s` + keyctl padd asymmetric "" $root_id < root1.cert + keyctl padd asymmetric "" $root_id < root2.cert + + # Create and restrict a keyring for the certificate chain + chain_id=`keyctl add keyring chain "" @s` + keyctl restrict_keyring $chain_id asymmetric key_or_keyring:$root_id:chain + + # Attempt to add each certificate in the chain, starting with the + # certificate closest to the root. + keyctl padd asymmetric "" $chain_id < intermediateA.cert + keyctl padd asymmetric "" $chain_id < intermediateB.cert + keyctl padd asymmetric "" $chain_id < end-entity.cert + + If the final end-entity certificate is successfully added to the "chain" + keyring, we can be certain that it has a valid signing chain going back to + one of the root certificates. + + A single keyring can be used to verify a chain of signatures by + restricting the keyring after linking the root certificate:: + + # Create a keyring for the certificate chain and add the root + chain2_id=`keyctl add keyring chain2 "" @s` + keyctl padd asymmetric "" $chain2_id < root1.cert + + # Restrict the keyring that already has root1.cert linked. The cert + # will remain linked by the keyring. + keyctl restrict_keyring $chain2_id asymmetric key_or_keyring:0:chain + + # Attempt to add each certificate in the chain, starting with the + # certificate closest to the root. + keyctl padd asymmetric "" $chain2_id < intermediateA.cert + keyctl padd asymmetric "" $chain2_id < intermediateB.cert + keyctl padd asymmetric "" $chain2_id < end-entity.cert + + If the final end-entity certificate is successfully added to the "chain2" + keyring, we can be certain that there is a valid signing chain going back + to the root certificate that was added before the keyring was restricted. + + +In all of these cases, if the signing key is found the signature of the key to +be linked will be verified using the signing key. The requested key is added +to the keyring only if the signature is successfully verified. -ENOKEY is +returned if the parent certificate could not be found, or -EKEYREJECTED is +returned if the signature check fails or the key is blacklisted. Other errors +may be returned if the signature check could not be performed. diff --git a/Documentation/crypto/asymmetric-keys.txt b/Documentation/crypto/asymmetric-keys.txt deleted file mode 100644 index 8763866b11cf..000000000000 --- a/Documentation/crypto/asymmetric-keys.txt +++ /dev/null @@ -1,429 +0,0 @@ - ============================================= - ASYMMETRIC / PUBLIC-KEY CRYPTOGRAPHY KEY TYPE - ============================================= - -Contents: - - - Overview. - - Key identification. - - Accessing asymmetric keys. - - Signature verification. - - Asymmetric key subtypes. - - Instantiation data parsers. - - Keyring link restrictions. - - -======== -OVERVIEW -======== - -The "asymmetric" key type is designed to be a container for the keys used in -public-key cryptography, without imposing any particular restrictions on the -form or mechanism of the cryptography or form of the key. - -The asymmetric key is given a subtype that defines what sort of data is -associated with the key and provides operations to describe and destroy it. -However, no requirement is made that the key data actually be stored in the -key. - -A completely in-kernel key retention and operation subtype can be defined, but -it would also be possible to provide access to cryptographic hardware (such as -a TPM) that might be used to both retain the relevant key and perform -operations using that key. In such a case, the asymmetric key would then -merely be an interface to the TPM driver. - -Also provided is the concept of a data parser. Data parsers are responsible -for extracting information from the blobs of data passed to the instantiation -function. The first data parser that recognises the blob gets to set the -subtype of the key and define the operations that can be done on that key. - -A data parser may interpret the data blob as containing the bits representing a -key, or it may interpret it as a reference to a key held somewhere else in the -system (for example, a TPM). - - -================== -KEY IDENTIFICATION -================== - -If a key is added with an empty name, the instantiation data parsers are given -the opportunity to pre-parse a key and to determine the description the key -should be given from the content of the key. - -This can then be used to refer to the key, either by complete match or by -partial match. The key type may also use other criteria to refer to a key. - -The asymmetric key type's match function can then perform a wider range of -comparisons than just the straightforward comparison of the description with -the criterion string: - - (1) If the criterion string is of the form "id:" then the match - function will examine a key's fingerprint to see if the hex digits given - after the "id:" match the tail. For instance: - - keyctl search @s asymmetric id:5acc2142 - - will match a key with fingerprint: - - 1A00 2040 7601 7889 DE11 882C 3823 04AD 5ACC 2142 - - (2) If the criterion string is of the form ":" then the - match will match the ID as in (1), but with the added restriction that - only keys of the specified subtype (e.g. tpm) will be matched. For - instance: - - keyctl search @s asymmetric tpm:5acc2142 - -Looking in /proc/keys, the last 8 hex digits of the key fingerprint are -displayed, along with the subtype: - - 1a39e171 I----- 1 perm 3f010000 0 0 asymmetric modsign.0: DSA 5acc2142 [] - - -========================= -ACCESSING ASYMMETRIC KEYS -========================= - -For general access to asymmetric keys from within the kernel, the following -inclusion is required: - - #include - -This gives access to functions for dealing with asymmetric / public keys. -Three enums are defined there for representing public-key cryptography -algorithms: - - enum pkey_algo - -digest algorithms used by those: - - enum pkey_hash_algo - -and key identifier representations: - - enum pkey_id_type - -Note that the key type representation types are required because key -identifiers from different standards aren't necessarily compatible. For -instance, PGP generates key identifiers by hashing the key data plus some -PGP-specific metadata, whereas X.509 has arbitrary certificate identifiers. - -The operations defined upon a key are: - - (1) Signature verification. - -Other operations are possible (such as encryption) with the same key data -required for verification, but not currently supported, and others -(eg. decryption and signature generation) require extra key data. - - -SIGNATURE VERIFICATION ----------------------- - -An operation is provided to perform cryptographic signature verification, using -an asymmetric key to provide or to provide access to the public key. - - int verify_signature(const struct key *key, - const struct public_key_signature *sig); - -The caller must have already obtained the key from some source and can then use -it to check the signature. The caller must have parsed the signature and -transferred the relevant bits to the structure pointed to by sig. - - struct public_key_signature { - u8 *digest; - u8 digest_size; - enum pkey_hash_algo pkey_hash_algo : 8; - u8 nr_mpi; - union { - MPI mpi[2]; - ... - }; - }; - -The algorithm used must be noted in sig->pkey_hash_algo, and all the MPIs that -make up the actual signature must be stored in sig->mpi[] and the count of MPIs -placed in sig->nr_mpi. - -In addition, the data must have been digested by the caller and the resulting -hash must be pointed to by sig->digest and the size of the hash be placed in -sig->digest_size. - -The function will return 0 upon success or -EKEYREJECTED if the signature -doesn't match. - -The function may also return -ENOTSUPP if an unsupported public-key algorithm -or public-key/hash algorithm combination is specified or the key doesn't -support the operation; -EBADMSG or -ERANGE if some of the parameters have weird -data; or -ENOMEM if an allocation can't be performed. -EINVAL can be returned -if the key argument is the wrong type or is incompletely set up. - - -======================= -ASYMMETRIC KEY SUBTYPES -======================= - -Asymmetric keys have a subtype that defines the set of operations that can be -performed on that key and that determines what data is attached as the key -payload. The payload format is entirely at the whim of the subtype. - -The subtype is selected by the key data parser and the parser must initialise -the data required for it. The asymmetric key retains a reference on the -subtype module. - -The subtype definition structure can be found in: - - #include - -and looks like the following: - - struct asymmetric_key_subtype { - struct module *owner; - const char *name; - - void (*describe)(const struct key *key, struct seq_file *m); - void (*destroy)(void *payload); - int (*query)(const struct kernel_pkey_params *params, - struct kernel_pkey_query *info); - int (*eds_op)(struct kernel_pkey_params *params, - const void *in, void *out); - int (*verify_signature)(const struct key *key, - const struct public_key_signature *sig); - }; - -Asymmetric keys point to this with their payload[asym_subtype] member. - -The owner and name fields should be set to the owning module and the name of -the subtype. Currently, the name is only used for print statements. - -There are a number of operations defined by the subtype: - - (1) describe(). - - Mandatory. This allows the subtype to display something in /proc/keys - against the key. For instance the name of the public key algorithm type - could be displayed. The key type will display the tail of the key - identity string after this. - - (2) destroy(). - - Mandatory. This should free the memory associated with the key. The - asymmetric key will look after freeing the fingerprint and releasing the - reference on the subtype module. - - (3) query(). - - Mandatory. This is a function for querying the capabilities of a key. - - (4) eds_op(). - - Optional. This is the entry point for the encryption, decryption and - signature creation operations (which are distinguished by the operation ID - in the parameter struct). The subtype may do anything it likes to - implement an operation, including offloading to hardware. - - (5) verify_signature(). - - Optional. This is the entry point for signature verification. The - subtype may do anything it likes to implement an operation, including - offloading to hardware. - - -========================== -INSTANTIATION DATA PARSERS -========================== - -The asymmetric key type doesn't generally want to store or to deal with a raw -blob of data that holds the key data. It would have to parse it and error -check it each time it wanted to use it. Further, the contents of the blob may -have various checks that can be performed on it (eg. self-signatures, validity -dates) and may contain useful data about the key (identifiers, capabilities). - -Also, the blob may represent a pointer to some hardware containing the key -rather than the key itself. - -Examples of blob formats for which parsers could be implemented include: - - - OpenPGP packet stream [RFC 4880]. - - X.509 ASN.1 stream. - - Pointer to TPM key. - - Pointer to UEFI key. - - PKCS#8 private key [RFC 5208]. - - PKCS#5 encrypted private key [RFC 2898]. - -During key instantiation each parser in the list is tried until one doesn't -return -EBADMSG. - -The parser definition structure can be found in: - - #include - -and looks like the following: - - struct asymmetric_key_parser { - struct module *owner; - const char *name; - - int (*parse)(struct key_preparsed_payload *prep); - }; - -The owner and name fields should be set to the owning module and the name of -the parser. - -There is currently only a single operation defined by the parser, and it is -mandatory: - - (1) parse(). - - This is called to preparse the key from the key creation and update paths. - In particular, it is called during the key creation _before_ a key is - allocated, and as such, is permitted to provide the key's description in - the case that the caller declines to do so. - - The caller passes a pointer to the following struct with all of the fields - cleared, except for data, datalen and quotalen [see - Documentation/security/keys/core.rst]. - - struct key_preparsed_payload { - char *description; - void *payload[4]; - const void *data; - size_t datalen; - size_t quotalen; - }; - - The instantiation data is in a blob pointed to by data and is datalen in - size. The parse() function is not permitted to change these two values at - all, and shouldn't change any of the other values _unless_ they are - recognise the blob format and will not return -EBADMSG to indicate it is - not theirs. - - If the parser is happy with the blob, it should propose a description for - the key and attach it to ->description, ->payload[asym_subtype] should be - set to point to the subtype to be used, ->payload[asym_crypto] should be - set to point to the initialised data for that subtype, - ->payload[asym_key_ids] should point to one or more hex fingerprints and - quotalen should be updated to indicate how much quota this key should - account for. - - When clearing up, the data attached to ->payload[asym_key_ids] and - ->description will be kfree()'d and the data attached to - ->payload[asm_crypto] will be passed to the subtype's ->destroy() method - to be disposed of. A module reference for the subtype pointed to by - ->payload[asym_subtype] will be put. - - - If the data format is not recognised, -EBADMSG should be returned. If it - is recognised, but the key cannot for some reason be set up, some other - negative error code should be returned. On success, 0 should be returned. - - The key's fingerprint string may be partially matched upon. For a - public-key algorithm such as RSA and DSA this will likely be a printable - hex version of the key's fingerprint. - -Functions are provided to register and unregister parsers: - - int register_asymmetric_key_parser(struct asymmetric_key_parser *parser); - void unregister_asymmetric_key_parser(struct asymmetric_key_parser *subtype); - -Parsers may not have the same name. The names are otherwise only used for -displaying in debugging messages. - - -========================= -KEYRING LINK RESTRICTIONS -========================= - -Keyrings created from userspace using add_key can be configured to check the -signature of the key being linked. Keys without a valid signature are not -allowed to link. - -Several restriction methods are available: - - (1) Restrict using the kernel builtin trusted keyring - - - Option string used with KEYCTL_RESTRICT_KEYRING: - - "builtin_trusted" - - The kernel builtin trusted keyring will be searched for the signing key. - If the builtin trusted keyring is not configured, all links will be - rejected. The ca_keys kernel parameter also affects which keys are used - for signature verification. - - (2) Restrict using the kernel builtin and secondary trusted keyrings - - - Option string used with KEYCTL_RESTRICT_KEYRING: - - "builtin_and_secondary_trusted" - - The kernel builtin and secondary trusted keyrings will be searched for the - signing key. If the secondary trusted keyring is not configured, this - restriction will behave like the "builtin_trusted" option. The ca_keys - kernel parameter also affects which keys are used for signature - verification. - - (3) Restrict using a separate key or keyring - - - Option string used with KEYCTL_RESTRICT_KEYRING: - - "key_or_keyring:[:chain]" - - Whenever a key link is requested, the link will only succeed if the key - being linked is signed by one of the designated keys. This key may be - specified directly by providing a serial number for one asymmetric key, or - a group of keys may be searched for the signing key by providing the - serial number for a keyring. - - When the "chain" option is provided at the end of the string, the keys - within the destination keyring will also be searched for signing keys. - This allows for verification of certificate chains by adding each - certificate in order (starting closest to the root) to a keyring. For - instance, one keyring can be populated with links to a set of root - certificates, with a separate, restricted keyring set up for each - certificate chain to be validated: - - # Create and populate a keyring for root certificates - root_id=`keyctl add keyring root-certs "" @s` - keyctl padd asymmetric "" $root_id < root1.cert - keyctl padd asymmetric "" $root_id < root2.cert - - # Create and restrict a keyring for the certificate chain - chain_id=`keyctl add keyring chain "" @s` - keyctl restrict_keyring $chain_id asymmetric key_or_keyring:$root_id:chain - - # Attempt to add each certificate in the chain, starting with the - # certificate closest to the root. - keyctl padd asymmetric "" $chain_id < intermediateA.cert - keyctl padd asymmetric "" $chain_id < intermediateB.cert - keyctl padd asymmetric "" $chain_id < end-entity.cert - - If the final end-entity certificate is successfully added to the "chain" - keyring, we can be certain that it has a valid signing chain going back to - one of the root certificates. - - A single keyring can be used to verify a chain of signatures by - restricting the keyring after linking the root certificate: - - # Create a keyring for the certificate chain and add the root - chain2_id=`keyctl add keyring chain2 "" @s` - keyctl padd asymmetric "" $chain2_id < root1.cert - - # Restrict the keyring that already has root1.cert linked. The cert - # will remain linked by the keyring. - keyctl restrict_keyring $chain2_id asymmetric key_or_keyring:0:chain - - # Attempt to add each certificate in the chain, starting with the - # certificate closest to the root. - keyctl padd asymmetric "" $chain2_id < intermediateA.cert - keyctl padd asymmetric "" $chain2_id < intermediateB.cert - keyctl padd asymmetric "" $chain2_id < end-entity.cert - - If the final end-entity certificate is successfully added to the "chain2" - keyring, we can be certain that there is a valid signing chain going back - to the root certificate that was added before the keyring was restricted. - - -In all of these cases, if the signing key is found the signature of the key to -be linked will be verified using the signing key. The requested key is added -to the keyring only if the signature is successfully verified. -ENOKEY is -returned if the parent certificate could not be found, or -EKEYREJECTED is -returned if the signature check fails or the key is blacklisted. Other errors -may be returned if the signature check could not be performed. diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst index c4ff5d791233..2bcaf422731e 100644 --- a/Documentation/crypto/index.rst +++ b/Documentation/crypto/index.rst @@ -18,6 +18,7 @@ for cryptographic use cases, as well as programming examples. intro architecture + asymmetric-keys devel-algos userspace-if crypto_engine -- cgit From 5846551bb147d2ae4a47f36de3c4ea46892ba180 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Mon, 15 Jun 2020 08:50:09 +0200 Subject: docs: crypto: convert api-intro.txt to ReST format - Change title markups; - Mark literal blocks; - Use list markups at authors/credits; - Add blank lines when needed; - Remove trailing whitespaces. Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/c71e2c73a787ec7814db09bec3c1359779785bfa.1592203650.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet --- Documentation/crypto/api-intro.rst | 262 +++++++++++++++++++++++++++++++++++++ Documentation/crypto/api-intro.txt | 250 ----------------------------------- Documentation/crypto/index.rst | 1 + 3 files changed, 263 insertions(+), 250 deletions(-) create mode 100644 Documentation/crypto/api-intro.rst delete mode 100644 Documentation/crypto/api-intro.txt (limited to 'Documentation/crypto') diff --git a/Documentation/crypto/api-intro.rst b/Documentation/crypto/api-intro.rst new file mode 100644 index 000000000000..bcff47d42189 --- /dev/null +++ b/Documentation/crypto/api-intro.rst @@ -0,0 +1,262 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================= +Scatterlist Cryptographic API +============================= + +Introduction +============ + +The Scatterlist Crypto API takes page vectors (scatterlists) as +arguments, and works directly on pages. In some cases (e.g. ECB +mode ciphers), this will allow for pages to be encrypted in-place +with no copying. + +One of the initial goals of this design was to readily support IPsec, +so that processing can be applied to paged skb's without the need +for linearization. + + +Details +======= + +At the lowest level are algorithms, which register dynamically with the +API. + +'Transforms' are user-instantiated objects, which maintain state, handle all +of the implementation logic (e.g. manipulating page vectors) and provide an +abstraction to the underlying algorithms. However, at the user +level they are very simple. + +Conceptually, the API layering looks like this:: + + [transform api] (user interface) + [transform ops] (per-type logic glue e.g. cipher.c, compress.c) + [algorithm api] (for registering algorithms) + +The idea is to make the user interface and algorithm registration API +very simple, while hiding the core logic from both. Many good ideas +from existing APIs such as Cryptoapi and Nettle have been adapted for this. + +The API currently supports five main types of transforms: AEAD (Authenticated +Encryption with Associated Data), Block Ciphers, Ciphers, Compressors and +Hashes. + +Please note that Block Ciphers is somewhat of a misnomer. It is in fact +meant to support all ciphers including stream ciphers. The difference +between Block Ciphers and Ciphers is that the latter operates on exactly +one block while the former can operate on an arbitrary amount of data, +subject to block size requirements (i.e., non-stream ciphers can only +process multiples of blocks). + +Here's an example of how to use the API:: + + #include + #include + #include + + struct scatterlist sg[2]; + char result[128]; + struct crypto_ahash *tfm; + struct ahash_request *req; + + tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC); + if (IS_ERR(tfm)) + fail(); + + /* ... set up the scatterlists ... */ + + req = ahash_request_alloc(tfm, GFP_ATOMIC); + if (!req) + fail(); + + ahash_request_set_callback(req, 0, NULL, NULL); + ahash_request_set_crypt(req, sg, result, 2); + + if (crypto_ahash_digest(req)) + fail(); + + ahash_request_free(req); + crypto_free_ahash(tfm); + + +Many real examples are available in the regression test module (tcrypt.c). + + +Developer Notes +=============== + +Transforms may only be allocated in user context, and cryptographic +methods may only be called from softirq and user contexts. For +transforms with a setkey method it too should only be called from +user context. + +When using the API for ciphers, performance will be optimal if each +scatterlist contains data which is a multiple of the cipher's block +size (typically 8 bytes). This prevents having to do any copying +across non-aligned page fragment boundaries. + + +Adding New Algorithms +===================== + +When submitting a new algorithm for inclusion, a mandatory requirement +is that at least a few test vectors from known sources (preferably +standards) be included. + +Converting existing well known code is preferred, as it is more likely +to have been reviewed and widely tested. If submitting code from LGPL +sources, please consider changing the license to GPL (see section 3 of +the LGPL). + +Algorithms submitted must also be generally patent-free (e.g. IDEA +will not be included in the mainline until around 2011), and be based +on a recognized standard and/or have been subjected to appropriate +peer review. + +Also check for any RFCs which may relate to the use of specific algorithms, +as well as general application notes such as RFC2451 ("The ESP CBC-Mode +Cipher Algorithms"). + +It's a good idea to avoid using lots of macros and use inlined functions +instead, as gcc does a good job with inlining, while excessive use of +macros can cause compilation problems on some platforms. + +Also check the TODO list at the web site listed below to see what people +might already be working on. + + +Bugs +==== + +Send bug reports to: + linux-crypto@vger.kernel.org + +Cc: + Herbert Xu , + David S. Miller + + +Further Information +=================== + +For further patches and various updates, including the current TODO +list, see: +http://gondor.apana.org.au/~herbert/crypto/ + + +Authors +======= + +- James Morris +- David S. Miller +- Herbert Xu + + +Credits +======= + +The following people provided invaluable feedback during the development +of the API: + + - Alexey Kuznetzov + - Rusty Russell + - Herbert Valerio Riedel + - Jeff Garzik + - Michael Richardson + - Andrew Morton + - Ingo Oeser + - Christoph Hellwig + +Portions of this API were derived from the following projects: + + Kerneli Cryptoapi (http://www.kerneli.org/) + - Alexander Kjeldaas + - Herbert Valerio Riedel + - Kyle McMartin + - Jean-Luc Cooke + - David Bryson + - Clemens Fruhwirth + - Tobias Ringstrom + - Harald Welte + +and; + + Nettle (http://www.lysator.liu.se/~nisse/nettle/) + - Niels Möller + +Original developers of the crypto algorithms: + + - Dana L. How (DES) + - Andrew Tridgell and Steve French (MD4) + - Colin Plumb (MD5) + - Steve Reid (SHA1) + - Jean-Luc Cooke (SHA256, SHA384, SHA512) + - Kazunori Miyazawa / USAGI (HMAC) + - Matthew Skala (Twofish) + - Dag Arne Osvik (Serpent) + - Brian Gladman (AES) + - Kartikey Mahendra Bhatt (CAST6) + - Jon Oberheide (ARC4) + - Jouni Malinen (Michael MIC) + - NTT(Nippon Telegraph and Telephone Corporation) (Camellia) + +SHA1 algorithm contributors: + - Jean-Francois Dive + +DES algorithm contributors: + - Raimar Falke + - Gisle Sælensminde + - Niels Möller + +Blowfish algorithm contributors: + - Herbert Valerio Riedel + - Kyle McMartin + +Twofish algorithm contributors: + - Werner Koch + - Marc Mutz + +SHA256/384/512 algorithm contributors: + - Andrew McDonald + - Kyle McMartin + - Herbert Valerio Riedel + +AES algorithm contributors: + - Alexander Kjeldaas + - Herbert Valerio Riedel + - Kyle McMartin + - Adam J. Richter + - Fruhwirth Clemens (i586) + - Linus Torvalds (i586) + +CAST5 algorithm contributors: + - Kartikey Mahendra Bhatt (original developers unknown, FSF copyright). + +TEA/XTEA algorithm contributors: + - Aaron Grothe + - Michael Ringe + +Khazad algorithm contributors: + - Aaron Grothe + +Whirlpool algorithm contributors: + - Aaron Grothe + - Jean-Luc Cooke + +Anubis algorithm contributors: + - Aaron Grothe + +Tiger algorithm contributors: + - Aaron Grothe + +VIA PadLock contributors: + - Michal Ludvig + +Camellia algorithm contributors: + - NTT(Nippon Telegraph and Telephone Corporation) (Camellia) + +Generic scatterwalk code by Adam J. Richter + +Please send any credits updates or corrections to: +Herbert Xu diff --git a/Documentation/crypto/api-intro.txt b/Documentation/crypto/api-intro.txt deleted file mode 100644 index 45d943fcae5b..000000000000 --- a/Documentation/crypto/api-intro.txt +++ /dev/null @@ -1,250 +0,0 @@ - - Scatterlist Cryptographic API - -INTRODUCTION - -The Scatterlist Crypto API takes page vectors (scatterlists) as -arguments, and works directly on pages. In some cases (e.g. ECB -mode ciphers), this will allow for pages to be encrypted in-place -with no copying. - -One of the initial goals of this design was to readily support IPsec, -so that processing can be applied to paged skb's without the need -for linearization. - - -DETAILS - -At the lowest level are algorithms, which register dynamically with the -API. - -'Transforms' are user-instantiated objects, which maintain state, handle all -of the implementation logic (e.g. manipulating page vectors) and provide an -abstraction to the underlying algorithms. However, at the user -level they are very simple. - -Conceptually, the API layering looks like this: - - [transform api] (user interface) - [transform ops] (per-type logic glue e.g. cipher.c, compress.c) - [algorithm api] (for registering algorithms) - -The idea is to make the user interface and algorithm registration API -very simple, while hiding the core logic from both. Many good ideas -from existing APIs such as Cryptoapi and Nettle have been adapted for this. - -The API currently supports five main types of transforms: AEAD (Authenticated -Encryption with Associated Data), Block Ciphers, Ciphers, Compressors and -Hashes. - -Please note that Block Ciphers is somewhat of a misnomer. It is in fact -meant to support all ciphers including stream ciphers. The difference -between Block Ciphers and Ciphers is that the latter operates on exactly -one block while the former can operate on an arbitrary amount of data, -subject to block size requirements (i.e., non-stream ciphers can only -process multiples of blocks). - -Here's an example of how to use the API: - - #include - #include - #include - - struct scatterlist sg[2]; - char result[128]; - struct crypto_ahash *tfm; - struct ahash_request *req; - - tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC); - if (IS_ERR(tfm)) - fail(); - - /* ... set up the scatterlists ... */ - - req = ahash_request_alloc(tfm, GFP_ATOMIC); - if (!req) - fail(); - - ahash_request_set_callback(req, 0, NULL, NULL); - ahash_request_set_crypt(req, sg, result, 2); - - if (crypto_ahash_digest(req)) - fail(); - - ahash_request_free(req); - crypto_free_ahash(tfm); - - -Many real examples are available in the regression test module (tcrypt.c). - - -DEVELOPER NOTES - -Transforms may only be allocated in user context, and cryptographic -methods may only be called from softirq and user contexts. For -transforms with a setkey method it too should only be called from -user context. - -When using the API for ciphers, performance will be optimal if each -scatterlist contains data which is a multiple of the cipher's block -size (typically 8 bytes). This prevents having to do any copying -across non-aligned page fragment boundaries. - - -ADDING NEW ALGORITHMS - -When submitting a new algorithm for inclusion, a mandatory requirement -is that at least a few test vectors from known sources (preferably -standards) be included. - -Converting existing well known code is preferred, as it is more likely -to have been reviewed and widely tested. If submitting code from LGPL -sources, please consider changing the license to GPL (see section 3 of -the LGPL). - -Algorithms submitted must also be generally patent-free (e.g. IDEA -will not be included in the mainline until around 2011), and be based -on a recognized standard and/or have been subjected to appropriate -peer review. - -Also check for any RFCs which may relate to the use of specific algorithms, -as well as general application notes such as RFC2451 ("The ESP CBC-Mode -Cipher Algorithms"). - -It's a good idea to avoid using lots of macros and use inlined functions -instead, as gcc does a good job with inlining, while excessive use of -macros can cause compilation problems on some platforms. - -Also check the TODO list at the web site listed below to see what people -might already be working on. - - -BUGS - -Send bug reports to: -linux-crypto@vger.kernel.org -Cc: Herbert Xu , - David S. Miller - - -FURTHER INFORMATION - -For further patches and various updates, including the current TODO -list, see: -http://gondor.apana.org.au/~herbert/crypto/ - - -AUTHORS - -James Morris -David S. Miller -Herbert Xu - - -CREDITS - -The following people provided invaluable feedback during the development -of the API: - - Alexey Kuznetzov - Rusty Russell - Herbert Valerio Riedel - Jeff Garzik - Michael Richardson - Andrew Morton - Ingo Oeser - Christoph Hellwig - -Portions of this API were derived from the following projects: - - Kerneli Cryptoapi (http://www.kerneli.org/) - Alexander Kjeldaas - Herbert Valerio Riedel - Kyle McMartin - Jean-Luc Cooke - David Bryson - Clemens Fruhwirth - Tobias Ringstrom - Harald Welte - -and; - - Nettle (http://www.lysator.liu.se/~nisse/nettle/) - Niels Möller - -Original developers of the crypto algorithms: - - Dana L. How (DES) - Andrew Tridgell and Steve French (MD4) - Colin Plumb (MD5) - Steve Reid (SHA1) - Jean-Luc Cooke (SHA256, SHA384, SHA512) - Kazunori Miyazawa / USAGI (HMAC) - Matthew Skala (Twofish) - Dag Arne Osvik (Serpent) - Brian Gladman (AES) - Kartikey Mahendra Bhatt (CAST6) - Jon Oberheide (ARC4) - Jouni Malinen (Michael MIC) - NTT(Nippon Telegraph and Telephone Corporation) (Camellia) - -SHA1 algorithm contributors: - Jean-Francois Dive - -DES algorithm contributors: - Raimar Falke - Gisle Sælensminde - Niels Möller - -Blowfish algorithm contributors: - Herbert Valerio Riedel - Kyle McMartin - -Twofish algorithm contributors: - Werner Koch - Marc Mutz - -SHA256/384/512 algorithm contributors: - Andrew McDonald - Kyle McMartin - Herbert Valerio Riedel - -AES algorithm contributors: - Alexander Kjeldaas - Herbert Valerio Riedel - Kyle McMartin - Adam J. Richter - Fruhwirth Clemens (i586) - Linus Torvalds (i586) - -CAST5 algorithm contributors: - Kartikey Mahendra Bhatt (original developers unknown, FSF copyright). - -TEA/XTEA algorithm contributors: - Aaron Grothe - Michael Ringe - -Khazad algorithm contributors: - Aaron Grothe - -Whirlpool algorithm contributors: - Aaron Grothe - Jean-Luc Cooke - -Anubis algorithm contributors: - Aaron Grothe - -Tiger algorithm contributors: - Aaron Grothe - -VIA PadLock contributors: - Michal Ludvig - -Camellia algorithm contributors: - NTT(Nippon Telegraph and Telephone Corporation) (Camellia) - -Generic scatterwalk code by Adam J. Richter - -Please send any credits updates or corrections to: -Herbert Xu - diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst index 2bcaf422731e..b2eeab3c8631 100644 --- a/Documentation/crypto/index.rst +++ b/Documentation/crypto/index.rst @@ -17,6 +17,7 @@ for cryptographic use cases, as well as programming examples. :maxdepth: 2 intro + api-intro architecture asymmetric-keys devel-algos -- cgit From ddc92399cc656d4fafc80a92743f8b4473ec9b3f Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Mon, 15 Jun 2020 08:50:10 +0200 Subject: docs: crypto: convert async-tx-api.txt to ReST format - Place the txt index inside a comment; - Use title and chapter markups; - Adjust markups for numbered list; - Mark literal blocks as such; - Use tables markup. - Adjust indentation when needed. Acked-By: Vinod Koul # dmaengine Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/98977242130efe86d1200f7a167299d4c1c205c5.1592203650.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet --- Documentation/crypto/async-tx-api.rst | 270 ++++++++++++++++++++++++++++++++++ Documentation/crypto/async-tx-api.txt | 225 ---------------------------- Documentation/crypto/index.rst | 2 + 3 files changed, 272 insertions(+), 225 deletions(-) create mode 100644 Documentation/crypto/async-tx-api.rst delete mode 100644 Documentation/crypto/async-tx-api.txt (limited to 'Documentation/crypto') diff --git a/Documentation/crypto/async-tx-api.rst b/Documentation/crypto/async-tx-api.rst new file mode 100644 index 000000000000..bfc773991bdc --- /dev/null +++ b/Documentation/crypto/async-tx-api.rst @@ -0,0 +1,270 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +Asynchronous Transfers/Transforms API +===================================== + +.. Contents + + 1. INTRODUCTION + + 2 GENEALOGY + + 3 USAGE + 3.1 General format of the API + 3.2 Supported operations + 3.3 Descriptor management + 3.4 When does the operation execute? + 3.5 When does the operation complete? + 3.6 Constraints + 3.7 Example + + 4 DMAENGINE DRIVER DEVELOPER NOTES + 4.1 Conformance points + 4.2 "My application needs exclusive control of hardware channels" + + 5 SOURCE + +1. Introduction +=============== + +The async_tx API provides methods for describing a chain of asynchronous +bulk memory transfers/transforms with support for inter-transactional +dependencies. It is implemented as a dmaengine client that smooths over +the details of different hardware offload engine implementations. Code +that is written to the API can optimize for asynchronous operation and +the API will fit the chain of operations to the available offload +resources. + +2.Genealogy +=========== + +The API was initially designed to offload the memory copy and +xor-parity-calculations of the md-raid5 driver using the offload engines +present in the Intel(R) Xscale series of I/O processors. It also built +on the 'dmaengine' layer developed for offloading memory copies in the +network stack using Intel(R) I/OAT engines. The following design +features surfaced as a result: + +1. implicit synchronous path: users of the API do not need to know if + the platform they are running on has offload capabilities. The + operation will be offloaded when an engine is available and carried out + in software otherwise. +2. cross channel dependency chains: the API allows a chain of dependent + operations to be submitted, like xor->copy->xor in the raid5 case. The + API automatically handles cases where the transition from one operation + to another implies a hardware channel switch. +3. dmaengine extensions to support multiple clients and operation types + beyond 'memcpy' + +3. Usage +======== + +3.1 General format of the API +----------------------------- + +:: + + struct dma_async_tx_descriptor * + async_(, struct async_submit ctl *submit) + +3.2 Supported operations +------------------------ + +======== ==================================================================== +memcpy memory copy between a source and a destination buffer +memset fill a destination buffer with a byte value +xor xor a series of source buffers and write the result to a + destination buffer +xor_val xor a series of source buffers and set a flag if the + result is zero. The implementation attempts to prevent + writes to memory +pq generate the p+q (raid6 syndrome) from a series of source buffers +pq_val validate that a p and or q buffer are in sync with a given series of + sources +datap (raid6_datap_recov) recover a raid6 data block and the p block + from the given sources +2data (raid6_2data_recov) recover 2 raid6 data blocks from the given + sources +======== ==================================================================== + +3.3 Descriptor management +------------------------- + +The return value is non-NULL and points to a 'descriptor' when the operation +has been queued to execute asynchronously. Descriptors are recycled +resources, under control of the offload engine driver, to be reused as +operations complete. When an application needs to submit a chain of +operations it must guarantee that the descriptor is not automatically recycled +before the dependency is submitted. This requires that all descriptors be +acknowledged by the application before the offload engine driver is allowed to +recycle (or free) the descriptor. A descriptor can be acked by one of the +following methods: + +1. setting the ASYNC_TX_ACK flag if no child operations are to be submitted +2. submitting an unacknowledged descriptor as a dependency to another + async_tx call will implicitly set the acknowledged state. +3. calling async_tx_ack() on the descriptor. + +3.4 When does the operation execute? +------------------------------------ + +Operations do not immediately issue after return from the +async_ call. Offload engine drivers batch operations to +improve performance by reducing the number of mmio cycles needed to +manage the channel. Once a driver-specific threshold is met the driver +automatically issues pending operations. An application can force this +event by calling async_tx_issue_pending_all(). This operates on all +channels since the application has no knowledge of channel to operation +mapping. + +3.5 When does the operation complete? +------------------------------------- + +There are two methods for an application to learn about the completion +of an operation. + +1. Call dma_wait_for_async_tx(). This call causes the CPU to spin while + it polls for the completion of the operation. It handles dependency + chains and issuing pending operations. +2. Specify a completion callback. The callback routine runs in tasklet + context if the offload engine driver supports interrupts, or it is + called in application context if the operation is carried out + synchronously in software. The callback can be set in the call to + async_, or when the application needs to submit a chain of + unknown length it can use the async_trigger_callback() routine to set a + completion interrupt/callback at the end of the chain. + +3.6 Constraints +--------------- + +1. Calls to async_ are not permitted in IRQ context. Other + contexts are permitted provided constraint #2 is not violated. +2. Completion callback routines cannot submit new operations. This + results in recursion in the synchronous case and spin_locks being + acquired twice in the asynchronous case. + +3.7 Example +----------- + +Perform a xor->copy->xor operation where each operation depends on the +result from the previous operation:: + + void callback(void *param) + { + struct completion *cmp = param; + + complete(cmp); + } + + void run_xor_copy_xor(struct page **xor_srcs, + int xor_src_cnt, + struct page *xor_dest, + size_t xor_len, + struct page *copy_src, + struct page *copy_dest, + size_t copy_len) + { + struct dma_async_tx_descriptor *tx; + addr_conv_t addr_conv[xor_src_cnt]; + struct async_submit_ctl submit; + addr_conv_t addr_conv[NDISKS]; + struct completion cmp; + + init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL, + addr_conv); + tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit) + + submit->depend_tx = tx; + tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit); + + init_completion(&cmp); + init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx, + callback, &cmp, addr_conv); + tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit); + + async_tx_issue_pending_all(); + + wait_for_completion(&cmp); + } + +See include/linux/async_tx.h for more information on the flags. See the +ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more +implementation examples. + +4. Driver Development Notes +=========================== + +4.1 Conformance points +---------------------- + +There are a few conformance points required in dmaengine drivers to +accommodate assumptions made by applications using the async_tx API: + +1. Completion callbacks are expected to happen in tasklet context +2. dma_async_tx_descriptor fields are never manipulated in IRQ context +3. Use async_tx_run_dependencies() in the descriptor clean up path to + handle submission of dependent operations + +4.2 "My application needs exclusive control of hardware channels" +----------------------------------------------------------------- + +Primarily this requirement arises from cases where a DMA engine driver +is being used to support device-to-memory operations. A channel that is +performing these operations cannot, for many platform specific reasons, +be shared. For these cases the dma_request_channel() interface is +provided. + +The interface is:: + + struct dma_chan *dma_request_channel(dma_cap_mask_t mask, + dma_filter_fn filter_fn, + void *filter_param); + +Where dma_filter_fn is defined as:: + + typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param); + +When the optional 'filter_fn' parameter is set to NULL +dma_request_channel simply returns the first channel that satisfies the +capability mask. Otherwise, when the mask parameter is insufficient for +specifying the necessary channel, the filter_fn routine can be used to +disposition the available channels in the system. The filter_fn routine +is called once for each free channel in the system. Upon seeing a +suitable channel filter_fn returns DMA_ACK which flags that channel to +be the return value from dma_request_channel. A channel allocated via +this interface is exclusive to the caller, until dma_release_channel() +is called. + +The DMA_PRIVATE capability flag is used to tag dma devices that should +not be used by the general-purpose allocator. It can be set at +initialization time if it is known that a channel will always be +private. Alternatively, it is set when dma_request_channel() finds an +unused "public" channel. + +A couple caveats to note when implementing a driver and consumer: + +1. Once a channel has been privately allocated it will no longer be + considered by the general-purpose allocator even after a call to + dma_release_channel(). +2. Since capabilities are specified at the device level a dma_device + with multiple channels will either have all channels public, or all + channels private. + +5. Source +--------- + +include/linux/dmaengine.h: + core header file for DMA drivers and api users +drivers/dma/dmaengine.c: + offload engine channel management routines +drivers/dma/: + location for offload engine drivers +include/linux/async_tx.h: + core header file for the async_tx api +crypto/async_tx/async_tx.c: + async_tx interface to dmaengine and common code +crypto/async_tx/async_memcpy.c: + copy offload +crypto/async_tx/async_xor.c: + xor and xor zero sum offload diff --git a/Documentation/crypto/async-tx-api.txt b/Documentation/crypto/async-tx-api.txt deleted file mode 100644 index 7bf1be20d93a..000000000000 --- a/Documentation/crypto/async-tx-api.txt +++ /dev/null @@ -1,225 +0,0 @@ - Asynchronous Transfers/Transforms API - -1 INTRODUCTION - -2 GENEALOGY - -3 USAGE -3.1 General format of the API -3.2 Supported operations -3.3 Descriptor management -3.4 When does the operation execute? -3.5 When does the operation complete? -3.6 Constraints -3.7 Example - -4 DMAENGINE DRIVER DEVELOPER NOTES -4.1 Conformance points -4.2 "My application needs exclusive control of hardware channels" - -5 SOURCE - ---- - -1 INTRODUCTION - -The async_tx API provides methods for describing a chain of asynchronous -bulk memory transfers/transforms with support for inter-transactional -dependencies. It is implemented as a dmaengine client that smooths over -the details of different hardware offload engine implementations. Code -that is written to the API can optimize for asynchronous operation and -the API will fit the chain of operations to the available offload -resources. - -2 GENEALOGY - -The API was initially designed to offload the memory copy and -xor-parity-calculations of the md-raid5 driver using the offload engines -present in the Intel(R) Xscale series of I/O processors. It also built -on the 'dmaengine' layer developed for offloading memory copies in the -network stack using Intel(R) I/OAT engines. The following design -features surfaced as a result: -1/ implicit synchronous path: users of the API do not need to know if - the platform they are running on has offload capabilities. The - operation will be offloaded when an engine is available and carried out - in software otherwise. -2/ cross channel dependency chains: the API allows a chain of dependent - operations to be submitted, like xor->copy->xor in the raid5 case. The - API automatically handles cases where the transition from one operation - to another implies a hardware channel switch. -3/ dmaengine extensions to support multiple clients and operation types - beyond 'memcpy' - -3 USAGE - -3.1 General format of the API: -struct dma_async_tx_descriptor * -async_(, struct async_submit ctl *submit) - -3.2 Supported operations: -memcpy - memory copy between a source and a destination buffer -memset - fill a destination buffer with a byte value -xor - xor a series of source buffers and write the result to a - destination buffer -xor_val - xor a series of source buffers and set a flag if the - result is zero. The implementation attempts to prevent - writes to memory -pq - generate the p+q (raid6 syndrome) from a series of source buffers -pq_val - validate that a p and or q buffer are in sync with a given series of - sources -datap - (raid6_datap_recov) recover a raid6 data block and the p block - from the given sources -2data - (raid6_2data_recov) recover 2 raid6 data blocks from the given - sources - -3.3 Descriptor management: -The return value is non-NULL and points to a 'descriptor' when the operation -has been queued to execute asynchronously. Descriptors are recycled -resources, under control of the offload engine driver, to be reused as -operations complete. When an application needs to submit a chain of -operations it must guarantee that the descriptor is not automatically recycled -before the dependency is submitted. This requires that all descriptors be -acknowledged by the application before the offload engine driver is allowed to -recycle (or free) the descriptor. A descriptor can be acked by one of the -following methods: -1/ setting the ASYNC_TX_ACK flag if no child operations are to be submitted -2/ submitting an unacknowledged descriptor as a dependency to another - async_tx call will implicitly set the acknowledged state. -3/ calling async_tx_ack() on the descriptor. - -3.4 When does the operation execute? -Operations do not immediately issue after return from the -async_ call. Offload engine drivers batch operations to -improve performance by reducing the number of mmio cycles needed to -manage the channel. Once a driver-specific threshold is met the driver -automatically issues pending operations. An application can force this -event by calling async_tx_issue_pending_all(). This operates on all -channels since the application has no knowledge of channel to operation -mapping. - -3.5 When does the operation complete? -There are two methods for an application to learn about the completion -of an operation. -1/ Call dma_wait_for_async_tx(). This call causes the CPU to spin while - it polls for the completion of the operation. It handles dependency - chains and issuing pending operations. -2/ Specify a completion callback. The callback routine runs in tasklet - context if the offload engine driver supports interrupts, or it is - called in application context if the operation is carried out - synchronously in software. The callback can be set in the call to - async_, or when the application needs to submit a chain of - unknown length it can use the async_trigger_callback() routine to set a - completion interrupt/callback at the end of the chain. - -3.6 Constraints: -1/ Calls to async_ are not permitted in IRQ context. Other - contexts are permitted provided constraint #2 is not violated. -2/ Completion callback routines cannot submit new operations. This - results in recursion in the synchronous case and spin_locks being - acquired twice in the asynchronous case. - -3.7 Example: -Perform a xor->copy->xor operation where each operation depends on the -result from the previous operation: - -void callback(void *param) -{ - struct completion *cmp = param; - - complete(cmp); -} - -void run_xor_copy_xor(struct page **xor_srcs, - int xor_src_cnt, - struct page *xor_dest, - size_t xor_len, - struct page *copy_src, - struct page *copy_dest, - size_t copy_len) -{ - struct dma_async_tx_descriptor *tx; - addr_conv_t addr_conv[xor_src_cnt]; - struct async_submit_ctl submit; - addr_conv_t addr_conv[NDISKS]; - struct completion cmp; - - init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL, - addr_conv); - tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit) - - submit->depend_tx = tx; - tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit); - - init_completion(&cmp); - init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx, - callback, &cmp, addr_conv); - tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit); - - async_tx_issue_pending_all(); - - wait_for_completion(&cmp); -} - -See include/linux/async_tx.h for more information on the flags. See the -ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more -implementation examples. - -4 DRIVER DEVELOPMENT NOTES - -4.1 Conformance points: -There are a few conformance points required in dmaengine drivers to -accommodate assumptions made by applications using the async_tx API: -1/ Completion callbacks are expected to happen in tasklet context -2/ dma_async_tx_descriptor fields are never manipulated in IRQ context -3/ Use async_tx_run_dependencies() in the descriptor clean up path to - handle submission of dependent operations - -4.2 "My application needs exclusive control of hardware channels" -Primarily this requirement arises from cases where a DMA engine driver -is being used to support device-to-memory operations. A channel that is -performing these operations cannot, for many platform specific reasons, -be shared. For these cases the dma_request_channel() interface is -provided. - -The interface is: -struct dma_chan *dma_request_channel(dma_cap_mask_t mask, - dma_filter_fn filter_fn, - void *filter_param); - -Where dma_filter_fn is defined as: -typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param); - -When the optional 'filter_fn' parameter is set to NULL -dma_request_channel simply returns the first channel that satisfies the -capability mask. Otherwise, when the mask parameter is insufficient for -specifying the necessary channel, the filter_fn routine can be used to -disposition the available channels in the system. The filter_fn routine -is called once for each free channel in the system. Upon seeing a -suitable channel filter_fn returns DMA_ACK which flags that channel to -be the return value from dma_request_channel. A channel allocated via -this interface is exclusive to the caller, until dma_release_channel() -is called. - -The DMA_PRIVATE capability flag is used to tag dma devices that should -not be used by the general-purpose allocator. It can be set at -initialization time if it is known that a channel will always be -private. Alternatively, it is set when dma_request_channel() finds an -unused "public" channel. - -A couple caveats to note when implementing a driver and consumer: -1/ Once a channel has been privately allocated it will no longer be - considered by the general-purpose allocator even after a call to - dma_release_channel(). -2/ Since capabilities are specified at the device level a dma_device - with multiple channels will either have all channels public, or all - channels private. - -5 SOURCE - -include/linux/dmaengine.h: core header file for DMA drivers and api users -drivers/dma/dmaengine.c: offload engine channel management routines -drivers/dma/: location for offload engine drivers -include/linux/async_tx.h: core header file for the async_tx api -crypto/async_tx/async_tx.c: async_tx interface to dmaengine and common code -crypto/async_tx/async_memcpy.c: copy offload -crypto/async_tx/async_xor.c: xor and xor zero sum offload diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst index b2eeab3c8631..22a6870bf356 100644 --- a/Documentation/crypto/index.rst +++ b/Documentation/crypto/index.rst @@ -19,6 +19,8 @@ for cryptographic use cases, as well as programming examples. intro api-intro architecture + + async-tx-api asymmetric-keys devel-algos userspace-if -- cgit From 740369c5794b07d1645704cc98731ef22754ef80 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Mon, 15 Jun 2020 08:50:11 +0200 Subject: docs: crypto: descore-readme.txt: convert to ReST format Convert this readme file to ReST file format, preserving its contents as-is as much as possible. The only changes are: - Added chapter and title markups; - Added blank lines where needed; - Added list markups where needed; - Use a table markup; - replace markups like `foo' to ``foo``; - add one extra literal markup to avoid warnings. Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/1426be1c7758c0224418352665040220b8a31799.1592203650.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet --- Documentation/crypto/descore-readme.rst | 414 ++++++++++++++++++++++++++++++++ Documentation/crypto/descore-readme.txt | 352 --------------------------- Documentation/crypto/index.rst | 1 + 3 files changed, 415 insertions(+), 352 deletions(-) create mode 100644 Documentation/crypto/descore-readme.rst delete mode 100644 Documentation/crypto/descore-readme.txt (limited to 'Documentation/crypto') diff --git a/Documentation/crypto/descore-readme.rst b/Documentation/crypto/descore-readme.rst new file mode 100644 index 000000000000..45bd9c8babf4 --- /dev/null +++ b/Documentation/crypto/descore-readme.rst @@ -0,0 +1,414 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: + +=========================================== +Fast & Portable DES encryption & decryption +=========================================== + +.. note:: + + Below is the original README file from the descore.shar package, + converted to ReST format. + +------------------------------------------------------------------------------ + +des - fast & portable DES encryption & decryption. + +Copyright |copy| 1992 Dana L. How + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU Library General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU Library General Public License for more details. + +You should have received a copy of the GNU Library General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + +Author's address: how@isl.stanford.edu + +.. README,v 1.15 1992/05/20 00:25:32 how E + +==>> To compile after untarring/unsharring, just ``make`` <<== + +This package was designed with the following goals: + +1. Highest possible encryption/decryption PERFORMANCE. +2. PORTABILITY to any byte-addressable host with a 32bit unsigned C type +3. Plug-compatible replacement for KERBEROS's low-level routines. + +This second release includes a number of performance enhancements for +register-starved machines. My discussions with Richard Outerbridge, +71755.204@compuserve.com, sparked a number of these enhancements. + +To more rapidly understand the code in this package, inspect desSmallFips.i +(created by typing ``make``) BEFORE you tackle desCode.h. The latter is set +up in a parameterized fashion so it can easily be modified by speed-daemon +hackers in pursuit of that last microsecond. You will find it more +illuminating to inspect one specific implementation, +and then move on to the common abstract skeleton with this one in mind. + + +performance comparison to other available des code which i could +compile on a SPARCStation 1 (cc -O4, gcc -O2): + +this code (byte-order independent): + + - 30us per encryption (options: 64k tables, no IP/FP) + - 33us per encryption (options: 64k tables, FIPS standard bit ordering) + - 45us per encryption (options: 2k tables, no IP/FP) + - 48us per encryption (options: 2k tables, FIPS standard bit ordering) + - 275us to set a new key (uses 1k of key tables) + + this has the quickest encryption/decryption routines i've seen. + since i was interested in fast des filters rather than crypt(3) + and password cracking, i haven't really bothered yet to speed up + the key setting routine. also, i have no interest in re-implementing + all the other junk in the mit kerberos des library, so i've just + provided my routines with little stub interfaces so they can be + used as drop-in replacements with mit's code or any of the mit- + compatible packages below. (note that the first two timings above + are highly variable because of cache effects). + +kerberos des replacement from australia (version 1.95): + + - 53us per encryption (uses 2k of tables) + - 96us to set a new key (uses 2.25k of key tables) + + so despite the author's inclusion of some of the performance + improvements i had suggested to him, this package's + encryption/decryption is still slower on the sparc and 68000. + more specifically, 19-40% slower on the 68020 and 11-35% slower + on the sparc, depending on the compiler; + in full gory detail (ALT_ECB is a libdes variant): + + =============== ============== =============== ================= + compiler machine desCore libdes ALT_ECB slower by + =============== ============== =============== ================= + gcc 2.1 -O2 Sun 3/110 304 uS 369.5uS 461.8uS 22% + cc -O1 Sun 3/110 336 uS 436.6uS 399.3uS 19% + cc -O2 Sun 3/110 360 uS 532.4uS 505.1uS 40% + cc -O4 Sun 3/110 365 uS 532.3uS 505.3uS 38% + gcc 2.1 -O2 Sun 4/50 48 uS 53.4uS 57.5uS 11% + cc -O2 Sun 4/50 48 uS 64.6uS 64.7uS 35% + cc -O4 Sun 4/50 48 uS 64.7uS 64.9uS 35% + =============== ============== =============== ================= + + (my time measurements are not as accurate as his). + + the comments in my first release of desCore on version 1.92: + + - 68us per encryption (uses 2k of tables) + - 96us to set a new key (uses 2.25k of key tables) + + this is a very nice package which implements the most important + of the optimizations which i did in my encryption routines. + it's a bit weak on common low-level optimizations which is why + it's 39%-106% slower. because he was interested in fast crypt(3) and + password-cracking applications, he also used the same ideas to + speed up the key-setting routines with impressive results. + (at some point i may do the same in my package). he also implements + the rest of the mit des library. + + (code from eay@psych.psy.uq.oz.au via comp.sources.misc) + +fast crypt(3) package from denmark: + + the des routine here is buried inside a loop to do the + crypt function and i didn't feel like ripping it out and measuring + performance. his code takes 26 sparc instructions to compute one + des iteration; above, Quick (64k) takes 21 and Small (2k) takes 37. + he claims to use 280k of tables but the iteration calculation seems + to use only 128k. his tables and code are machine independent. + + (code from glad@daimi.aau.dk via alt.sources or comp.sources.misc) + +swedish reimplementation of Kerberos des library + + - 108us per encryption (uses 34k worth of tables) + - 134us to set a new key (uses 32k of key tables to get this speed!) + + the tables used seem to be machine-independent; + he seems to have included a lot of special case code + so that, e.g., ``long`` loads can be used instead of 4 ``char`` loads + when the machine's architecture allows it. + + (code obtained from chalmers.se:pub/des) + +crack 3.3c package from england: + + as in crypt above, the des routine is buried in a loop. it's + also very modified for crypt. his iteration code uses 16k + of tables and appears to be slow. + + (code obtained from aem@aber.ac.uk via alt.sources or comp.sources.misc) + +``highly optimized`` and tweaked Kerberos/Athena code (byte-order dependent): + + - 165us per encryption (uses 6k worth of tables) + - 478us to set a new key (uses <1k of key tables) + + so despite the comments in this code, it was possible to get + faster code AND smaller tables, as well as making the tables + machine-independent. + (code obtained from prep.ai.mit.edu) + +UC Berkeley code (depends on machine-endedness): + - 226us per encryption + - 10848us to set a new key + + table sizes are unclear, but they don't look very small + (code obtained from wuarchive.wustl.edu) + + +motivation and history +====================== + +a while ago i wanted some des routines and the routines documented on sun's +man pages either didn't exist or dumped core. i had heard of kerberos, +and knew that it used des, so i figured i'd use its routines. but once +i got it and looked at the code, it really set off a lot of pet peeves - +it was too convoluted, the code had been written without taking +advantage of the regular structure of operations such as IP, E, and FP +(i.e. the author didn't sit down and think before coding), +it was excessively slow, the author had attempted to clarify the code +by adding MORE statements to make the data movement more ``consistent`` +instead of simplifying his implementation and cutting down on all data +movement (in particular, his use of L1, R1, L2, R2), and it was full of +idiotic ``tweaks`` for particular machines which failed to deliver significant +speedups but which did obfuscate everything. so i took the test data +from his verification program and rewrote everything else. + +a while later i ran across the great crypt(3) package mentioned above. +the fact that this guy was computing 2 sboxes per table lookup rather +than one (and using a MUCH larger table in the process) emboldened me to +do the same - it was a trivial change from which i had been scared away +by the larger table size. in his case he didn't realize you don't need to keep +the working data in TWO forms, one for easy use of half the sboxes in +indexing, the other for easy use of the other half; instead you can keep +it in the form for the first half and use a simple rotate to get the other +half. this means i have (almost) half the data manipulation and half +the table size. in fairness though he might be encoding something particular +to crypt(3) in his tables - i didn't check. + +i'm glad that i implemented it the way i did, because this C version is +portable (the ifdef's are performance enhancements) and it is faster +than versions hand-written in assembly for the sparc! + + +porting notes +============= + +one thing i did not want to do was write an enormous mess +which depended on endedness and other machine quirks, +and which necessarily produced different code and different lookup tables +for different machines. see the kerberos code for an example +of what i didn't want to do; all their endedness-specific ``optimizations`` +obfuscate the code and in the end were slower than a simpler machine +independent approach. however, there are always some portability +considerations of some kind, and i have included some options +for varying numbers of register variables. +perhaps some will still regard the result as a mess! + +1) i assume everything is byte addressable, although i don't actually + depend on the byte order, and that bytes are 8 bits. + i assume word pointers can be freely cast to and from char pointers. + note that 99% of C programs make these assumptions. + i always use unsigned char's if the high bit could be set. +2) the typedef ``word`` means a 32 bit unsigned integral type. + if ``unsigned long`` is not 32 bits, change the typedef in desCore.h. + i assume sizeof(word) == 4 EVERYWHERE. + +the (worst-case) cost of my NOT doing endedness-specific optimizations +in the data loading and storing code surrounding the key iterations +is less than 12%. also, there is the added benefit that +the input and output work areas do not need to be word-aligned. + + +OPTIONAL performance optimizations +================================== + +1) you should define one of ``i386,`` ``vax,`` ``mc68000,`` or ``sparc,`` + whichever one is closest to the capabilities of your machine. + see the start of desCode.h to see exactly what this selection implies. + note that if you select the wrong one, the des code will still work; + these are just performance tweaks. +2) for those with functional ``asm`` keywords: you should change the + ROR and ROL macros to use machine rotate instructions if you have them. + this will save 2 instructions and a temporary per use, + or about 32 to 40 instructions per en/decryption. + + note that gcc is smart enough to translate the ROL/R macros into + machine rotates! + +these optimizations are all rather persnickety, yet with them you should +be able to get performance equal to assembly-coding, except that: + +1) with the lack of a bit rotate operator in C, rotates have to be synthesized + from shifts. so access to ``asm`` will speed things up if your machine + has rotates, as explained above in (3) (not necessary if you use gcc). +2) if your machine has less than 12 32-bit registers i doubt your compiler will + generate good code. + + ``i386`` tries to configure the code for a 386 by only declaring 3 registers + (it appears that gcc can use ebx, esi and edi to hold register variables). + however, if you like assembly coding, the 386 does have 7 32-bit registers, + and if you use ALL of them, use ``scaled by 8`` address modes with displacement + and other tricks, you can get reasonable routines for DesQuickCore... with + about 250 instructions apiece. For DesSmall... it will help to rearrange + des_keymap, i.e., now the sbox # is the high part of the index and + the 6 bits of data is the low part; it helps to exchange these. + + since i have no way to conveniently test it i have not provided my + shoehorned 386 version. note that with this release of desCore, gcc is able + to put everything in registers(!), and generate about 370 instructions apiece + for the DesQuickCore... routines! + +coding notes +============ + +the en/decryption routines each use 6 necessary register variables, +with 4 being actively used at once during the inner iterations. +if you don't have 4 register variables get a new machine. +up to 8 more registers are used to hold constants in some configurations. + +i assume that the use of a constant is more expensive than using a register: + +a) additionally, i have tried to put the larger constants in registers. + registering priority was by the following: + + - anything more than 12 bits (bad for RISC and CISC) + - greater than 127 in value (can't use movq or byte immediate on CISC) + - 9-127 (may not be able to use CISC shift immediate or add/sub quick), + - 1-8 were never registered, being the cheapest constants. + +b) the compiler may be too stupid to realize table and table+256 should + be assigned to different constant registers and instead repetitively + do the arithmetic, so i assign these to explicit ``m`` register variables + when possible and helpful. + +i assume that indexing is cheaper or equivalent to auto increment/decrement, +where the index is 7 bits unsigned or smaller. +this assumption is reversed for 68k and vax. + +i assume that addresses can be cheaply formed from two registers, +or from a register and a small constant. +for the 68000, the ``two registers and small offset`` form is used sparingly. +all index scaling is done explicitly - no hidden shifts by log2(sizeof). + +the code is written so that even a dumb compiler +should never need more than one hidden temporary, +increasing the chance that everything will fit in the registers. +KEEP THIS MORE SUBTLE POINT IN MIND IF YOU REWRITE ANYTHING. + +(actually, there are some code fragments now which do require two temps, +but fixing it would either break the structure of the macros or +require declaring another temporary). + + +special efficient data format +============================== + +bits are manipulated in this arrangement most of the time (S7 S5 S3 S1):: + + 003130292827xxxx242322212019xxxx161514131211xxxx080706050403xxxx + +(the x bits are still there, i'm just emphasizing where the S boxes are). +bits are rotated left 4 when computing S6 S4 S2 S0:: + + 282726252423xxxx201918171615xxxx121110090807xxxx040302010031xxxx + +the rightmost two bits are usually cleared so the lower byte can be used +as an index into an sbox mapping table. the next two x'd bits are set +to various values to access different parts of the tables. + + +how to use the routines + +datatypes: + pointer to 8 byte area of type DesData + used to hold keys and input/output blocks to des. + + pointer to 128 byte area of type DesKeys + used to hold full 768-bit key. + must be long-aligned. + +DesQuickInit() + call this before using any other routine with ``Quick`` in its name. + it generates the special 64k table these routines need. +DesQuickDone() + frees this table + +DesMethod(m, k) + m points to a 128byte block, k points to an 8 byte des key + which must have odd parity (or -1 is returned) and which must + not be a (semi-)weak key (or -2 is returned). + normally DesMethod() returns 0. + + m is filled in from k so that when one of the routines below + is called with m, the routine will act like standard des + en/decryption with the key k. if you use DesMethod, + you supply a standard 56bit key; however, if you fill in + m yourself, you will get a 768bit key - but then it won't + be standard. it's 768bits not 1024 because the least significant + two bits of each byte are not used. note that these two bits + will be set to magic constants which speed up the encryption/decryption + on some machines. and yes, each byte controls + a specific sbox during a specific iteration. + + you really shouldn't use the 768bit format directly; i should + provide a routine that converts 128 6-bit bytes (specified in + S-box mapping order or something) into the right format for you. + this would entail some byte concatenation and rotation. + +Des{Small|Quick}{Fips|Core}{Encrypt|Decrypt}(d, m, s) + performs des on the 8 bytes at s into the 8 bytes at + ``d. (d,s: char *)``. + + uses m as a 768bit key as explained above. + + the Encrypt|Decrypt choice is obvious. + + Fips|Core determines whether a completely standard FIPS initial + and final permutation is done; if not, then the data is loaded + and stored in a nonstandard bit order (FIPS w/o IP/FP). + + Fips slows down Quick by 10%, Small by 9%. + + Small|Quick determines whether you use the normal routine + or the crazy quick one which gobbles up 64k more of memory. + Small is 50% slower then Quick, but Quick needs 32 times as much + memory. Quick is included for programs that do nothing but DES, + e.g., encryption filters, etc. + + +Getting it to compile on your machine +===================================== + +there are no machine-dependencies in the code (see porting), +except perhaps the ``now()`` macro in desTest.c. +ALL generated tables are machine independent. +you should edit the Makefile with the appropriate optimization flags +for your compiler (MAX optimization). + + +Speeding up kerberos (and/or its des library) +============================================= + +note that i have included a kerberos-compatible interface in desUtil.c +through the functions des_key_sched() and des_ecb_encrypt(). +to use these with kerberos or kerberos-compatible code put desCore.a +ahead of the kerberos-compatible library on your linker's command line. +you should not need to #include desCore.h; just include the header +file provided with the kerberos library. + +Other uses +========== + +the macros in desCode.h would be very useful for putting inline des +functions in more complicated encryption routines. diff --git a/Documentation/crypto/descore-readme.txt b/Documentation/crypto/descore-readme.txt deleted file mode 100644 index 16e9e6350755..000000000000 --- a/Documentation/crypto/descore-readme.txt +++ /dev/null @@ -1,352 +0,0 @@ -Below is the original README file from the descore.shar package. ------------------------------------------------------------------------------- - -des - fast & portable DES encryption & decryption. -Copyright (C) 1992 Dana L. How - -This program is free software; you can redistribute it and/or modify -it under the terms of the GNU Library General Public License as published by -the Free Software Foundation; either version 2 of the License, or -(at your option) any later version. - -This program is distributed in the hope that it will be useful, -but WITHOUT ANY WARRANTY; without even the implied warranty of -MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -GNU Library General Public License for more details. - -You should have received a copy of the GNU Library General Public License -along with this program; if not, write to the Free Software -Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - -Author's address: how@isl.stanford.edu - -$Id: README,v 1.15 1992/05/20 00:25:32 how E $ - - -==>> To compile after untarring/unsharring, just `make' <<== - - -This package was designed with the following goals: -1. Highest possible encryption/decryption PERFORMANCE. -2. PORTABILITY to any byte-addressable host with a 32bit unsigned C type -3. Plug-compatible replacement for KERBEROS's low-level routines. - -This second release includes a number of performance enhancements for -register-starved machines. My discussions with Richard Outerbridge, -71755.204@compuserve.com, sparked a number of these enhancements. - -To more rapidly understand the code in this package, inspect desSmallFips.i -(created by typing `make') BEFORE you tackle desCode.h. The latter is set -up in a parameterized fashion so it can easily be modified by speed-daemon -hackers in pursuit of that last microsecond. You will find it more -illuminating to inspect one specific implementation, -and then move on to the common abstract skeleton with this one in mind. - - -performance comparison to other available des code which i could -compile on a SPARCStation 1 (cc -O4, gcc -O2): - -this code (byte-order independent): - 30us per encryption (options: 64k tables, no IP/FP) - 33us per encryption (options: 64k tables, FIPS standard bit ordering) - 45us per encryption (options: 2k tables, no IP/FP) - 48us per encryption (options: 2k tables, FIPS standard bit ordering) - 275us to set a new key (uses 1k of key tables) - this has the quickest encryption/decryption routines i've seen. - since i was interested in fast des filters rather than crypt(3) - and password cracking, i haven't really bothered yet to speed up - the key setting routine. also, i have no interest in re-implementing - all the other junk in the mit kerberos des library, so i've just - provided my routines with little stub interfaces so they can be - used as drop-in replacements with mit's code or any of the mit- - compatible packages below. (note that the first two timings above - are highly variable because of cache effects). - -kerberos des replacement from australia (version 1.95): - 53us per encryption (uses 2k of tables) - 96us to set a new key (uses 2.25k of key tables) - so despite the author's inclusion of some of the performance - improvements i had suggested to him, this package's - encryption/decryption is still slower on the sparc and 68000. - more specifically, 19-40% slower on the 68020 and 11-35% slower - on the sparc, depending on the compiler; - in full gory detail (ALT_ECB is a libdes variant): - compiler machine desCore libdes ALT_ECB slower by - gcc 2.1 -O2 Sun 3/110 304 uS 369.5uS 461.8uS 22% - cc -O1 Sun 3/110 336 uS 436.6uS 399.3uS 19% - cc -O2 Sun 3/110 360 uS 532.4uS 505.1uS 40% - cc -O4 Sun 3/110 365 uS 532.3uS 505.3uS 38% - gcc 2.1 -O2 Sun 4/50 48 uS 53.4uS 57.5uS 11% - cc -O2 Sun 4/50 48 uS 64.6uS 64.7uS 35% - cc -O4 Sun 4/50 48 uS 64.7uS 64.9uS 35% - (my time measurements are not as accurate as his). - the comments in my first release of desCore on version 1.92: - 68us per encryption (uses 2k of tables) - 96us to set a new key (uses 2.25k of key tables) - this is a very nice package which implements the most important - of the optimizations which i did in my encryption routines. - it's a bit weak on common low-level optimizations which is why - it's 39%-106% slower. because he was interested in fast crypt(3) and - password-cracking applications, he also used the same ideas to - speed up the key-setting routines with impressive results. - (at some point i may do the same in my package). he also implements - the rest of the mit des library. - (code from eay@psych.psy.uq.oz.au via comp.sources.misc) - -fast crypt(3) package from denmark: - the des routine here is buried inside a loop to do the - crypt function and i didn't feel like ripping it out and measuring - performance. his code takes 26 sparc instructions to compute one - des iteration; above, Quick (64k) takes 21 and Small (2k) takes 37. - he claims to use 280k of tables but the iteration calculation seems - to use only 128k. his tables and code are machine independent. - (code from glad@daimi.aau.dk via alt.sources or comp.sources.misc) - -swedish reimplementation of Kerberos des library - 108us per encryption (uses 34k worth of tables) - 134us to set a new key (uses 32k of key tables to get this speed!) - the tables used seem to be machine-independent; - he seems to have included a lot of special case code - so that, e.g., `long' loads can be used instead of 4 `char' loads - when the machine's architecture allows it. - (code obtained from chalmers.se:pub/des) - -crack 3.3c package from england: - as in crypt above, the des routine is buried in a loop. it's - also very modified for crypt. his iteration code uses 16k - of tables and appears to be slow. - (code obtained from aem@aber.ac.uk via alt.sources or comp.sources.misc) - -``highly optimized'' and tweaked Kerberos/Athena code (byte-order dependent): - 165us per encryption (uses 6k worth of tables) - 478us to set a new key (uses <1k of key tables) - so despite the comments in this code, it was possible to get - faster code AND smaller tables, as well as making the tables - machine-independent. - (code obtained from prep.ai.mit.edu) - -UC Berkeley code (depends on machine-endedness): - 226us per encryption -10848us to set a new key - table sizes are unclear, but they don't look very small - (code obtained from wuarchive.wustl.edu) - - -motivation and history - -a while ago i wanted some des routines and the routines documented on sun's -man pages either didn't exist or dumped core. i had heard of kerberos, -and knew that it used des, so i figured i'd use its routines. but once -i got it and looked at the code, it really set off a lot of pet peeves - -it was too convoluted, the code had been written without taking -advantage of the regular structure of operations such as IP, E, and FP -(i.e. the author didn't sit down and think before coding), -it was excessively slow, the author had attempted to clarify the code -by adding MORE statements to make the data movement more `consistent' -instead of simplifying his implementation and cutting down on all data -movement (in particular, his use of L1, R1, L2, R2), and it was full of -idiotic `tweaks' for particular machines which failed to deliver significant -speedups but which did obfuscate everything. so i took the test data -from his verification program and rewrote everything else. - -a while later i ran across the great crypt(3) package mentioned above. -the fact that this guy was computing 2 sboxes per table lookup rather -than one (and using a MUCH larger table in the process) emboldened me to -do the same - it was a trivial change from which i had been scared away -by the larger table size. in his case he didn't realize you don't need to keep -the working data in TWO forms, one for easy use of half the sboxes in -indexing, the other for easy use of the other half; instead you can keep -it in the form for the first half and use a simple rotate to get the other -half. this means i have (almost) half the data manipulation and half -the table size. in fairness though he might be encoding something particular -to crypt(3) in his tables - i didn't check. - -i'm glad that i implemented it the way i did, because this C version is -portable (the ifdef's are performance enhancements) and it is faster -than versions hand-written in assembly for the sparc! - - -porting notes - -one thing i did not want to do was write an enormous mess -which depended on endedness and other machine quirks, -and which necessarily produced different code and different lookup tables -for different machines. see the kerberos code for an example -of what i didn't want to do; all their endedness-specific `optimizations' -obfuscate the code and in the end were slower than a simpler machine -independent approach. however, there are always some portability -considerations of some kind, and i have included some options -for varying numbers of register variables. -perhaps some will still regard the result as a mess! - -1) i assume everything is byte addressable, although i don't actually - depend on the byte order, and that bytes are 8 bits. - i assume word pointers can be freely cast to and from char pointers. - note that 99% of C programs make these assumptions. - i always use unsigned char's if the high bit could be set. -2) the typedef `word' means a 32 bit unsigned integral type. - if `unsigned long' is not 32 bits, change the typedef in desCore.h. - i assume sizeof(word) == 4 EVERYWHERE. - -the (worst-case) cost of my NOT doing endedness-specific optimizations -in the data loading and storing code surrounding the key iterations -is less than 12%. also, there is the added benefit that -the input and output work areas do not need to be word-aligned. - - -OPTIONAL performance optimizations - -1) you should define one of `i386,' `vax,' `mc68000,' or `sparc,' - whichever one is closest to the capabilities of your machine. - see the start of desCode.h to see exactly what this selection implies. - note that if you select the wrong one, the des code will still work; - these are just performance tweaks. -2) for those with functional `asm' keywords: you should change the - ROR and ROL macros to use machine rotate instructions if you have them. - this will save 2 instructions and a temporary per use, - or about 32 to 40 instructions per en/decryption. - note that gcc is smart enough to translate the ROL/R macros into - machine rotates! - -these optimizations are all rather persnickety, yet with them you should -be able to get performance equal to assembly-coding, except that: -1) with the lack of a bit rotate operator in C, rotates have to be synthesized - from shifts. so access to `asm' will speed things up if your machine - has rotates, as explained above in (3) (not necessary if you use gcc). -2) if your machine has less than 12 32-bit registers i doubt your compiler will - generate good code. - `i386' tries to configure the code for a 386 by only declaring 3 registers - (it appears that gcc can use ebx, esi and edi to hold register variables). - however, if you like assembly coding, the 386 does have 7 32-bit registers, - and if you use ALL of them, use `scaled by 8' address modes with displacement - and other tricks, you can get reasonable routines for DesQuickCore... with - about 250 instructions apiece. For DesSmall... it will help to rearrange - des_keymap, i.e., now the sbox # is the high part of the index and - the 6 bits of data is the low part; it helps to exchange these. - since i have no way to conveniently test it i have not provided my - shoehorned 386 version. note that with this release of desCore, gcc is able - to put everything in registers(!), and generate about 370 instructions apiece - for the DesQuickCore... routines! - -coding notes - -the en/decryption routines each use 6 necessary register variables, -with 4 being actively used at once during the inner iterations. -if you don't have 4 register variables get a new machine. -up to 8 more registers are used to hold constants in some configurations. - -i assume that the use of a constant is more expensive than using a register: -a) additionally, i have tried to put the larger constants in registers. - registering priority was by the following: - anything more than 12 bits (bad for RISC and CISC) - greater than 127 in value (can't use movq or byte immediate on CISC) - 9-127 (may not be able to use CISC shift immediate or add/sub quick), - 1-8 were never registered, being the cheapest constants. -b) the compiler may be too stupid to realize table and table+256 should - be assigned to different constant registers and instead repetitively - do the arithmetic, so i assign these to explicit `m' register variables - when possible and helpful. - -i assume that indexing is cheaper or equivalent to auto increment/decrement, -where the index is 7 bits unsigned or smaller. -this assumption is reversed for 68k and vax. - -i assume that addresses can be cheaply formed from two registers, -or from a register and a small constant. -for the 68000, the `two registers and small offset' form is used sparingly. -all index scaling is done explicitly - no hidden shifts by log2(sizeof). - -the code is written so that even a dumb compiler -should never need more than one hidden temporary, -increasing the chance that everything will fit in the registers. -KEEP THIS MORE SUBTLE POINT IN MIND IF YOU REWRITE ANYTHING. -(actually, there are some code fragments now which do require two temps, -but fixing it would either break the structure of the macros or -require declaring another temporary). - - -special efficient data format - -bits are manipulated in this arrangement most of the time (S7 S5 S3 S1): - 003130292827xxxx242322212019xxxx161514131211xxxx080706050403xxxx -(the x bits are still there, i'm just emphasizing where the S boxes are). -bits are rotated left 4 when computing S6 S4 S2 S0: - 282726252423xxxx201918171615xxxx121110090807xxxx040302010031xxxx -the rightmost two bits are usually cleared so the lower byte can be used -as an index into an sbox mapping table. the next two x'd bits are set -to various values to access different parts of the tables. - - -how to use the routines - -datatypes: - pointer to 8 byte area of type DesData - used to hold keys and input/output blocks to des. - - pointer to 128 byte area of type DesKeys - used to hold full 768-bit key. - must be long-aligned. - -DesQuickInit() - call this before using any other routine with `Quick' in its name. - it generates the special 64k table these routines need. -DesQuickDone() - frees this table - -DesMethod(m, k) - m points to a 128byte block, k points to an 8 byte des key - which must have odd parity (or -1 is returned) and which must - not be a (semi-)weak key (or -2 is returned). - normally DesMethod() returns 0. - m is filled in from k so that when one of the routines below - is called with m, the routine will act like standard des - en/decryption with the key k. if you use DesMethod, - you supply a standard 56bit key; however, if you fill in - m yourself, you will get a 768bit key - but then it won't - be standard. it's 768bits not 1024 because the least significant - two bits of each byte are not used. note that these two bits - will be set to magic constants which speed up the encryption/decryption - on some machines. and yes, each byte controls - a specific sbox during a specific iteration. - you really shouldn't use the 768bit format directly; i should - provide a routine that converts 128 6-bit bytes (specified in - S-box mapping order or something) into the right format for you. - this would entail some byte concatenation and rotation. - -Des{Small|Quick}{Fips|Core}{Encrypt|Decrypt}(d, m, s) - performs des on the 8 bytes at s into the 8 bytes at d. (d,s: char *). - uses m as a 768bit key as explained above. - the Encrypt|Decrypt choice is obvious. - Fips|Core determines whether a completely standard FIPS initial - and final permutation is done; if not, then the data is loaded - and stored in a nonstandard bit order (FIPS w/o IP/FP). - Fips slows down Quick by 10%, Small by 9%. - Small|Quick determines whether you use the normal routine - or the crazy quick one which gobbles up 64k more of memory. - Small is 50% slower then Quick, but Quick needs 32 times as much - memory. Quick is included for programs that do nothing but DES, - e.g., encryption filters, etc. - - -Getting it to compile on your machine - -there are no machine-dependencies in the code (see porting), -except perhaps the `now()' macro in desTest.c. -ALL generated tables are machine independent. -you should edit the Makefile with the appropriate optimization flags -for your compiler (MAX optimization). - - -Speeding up kerberos (and/or its des library) - -note that i have included a kerberos-compatible interface in desUtil.c -through the functions des_key_sched() and des_ecb_encrypt(). -to use these with kerberos or kerberos-compatible code put desCore.a -ahead of the kerberos-compatible library on your linker's command line. -you should not need to #include desCore.h; just include the header -file provided with the kerberos library. - -Other uses - -the macros in desCode.h would be very useful for putting inline des -functions in more complicated encryption routines. diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst index 22a6870bf356..21338fa92642 100644 --- a/Documentation/crypto/index.rst +++ b/Documentation/crypto/index.rst @@ -27,3 +27,4 @@ for cryptographic use cases, as well as programming examples. crypto_engine api api-samples + descore-readme -- cgit