linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2017-02-11	crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers	Ard Biesheuvel
	The PMULL based CRC32 implementation already contains code based on the separate, optional CRC32 instructions to fallback to when operating on small quantities of data. We can expose these routines directly on systems that lack the 64x64 PMULL instructions but do implement the CRC32 ones, which makes the driver that is based solely on those CRC32 instructions redundant. So remove it. Note that this aligns arm64 with ARM, whose accelerated CRC32 driver also combines the CRC32 extension based and the PMULL based versions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Matthias Brugger <mbrugger@suse.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13	crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64	Ard Biesheuvel
	This is a reimplementation of the NEON version of the bit-sliced AES algorithm. This code is heavily based on Andy Polyakov's OpenSSL version for ARM, which is also available in the kernel. This is an alternative for the existing NEON implementation for arm64 authored by me, which suffers from poor performance due to its reliance on the pathologically slow four register variant of the tbl/tbx NEON instruction. This version is about ~30% () faster than the generic C code, but only in cases where the input can be 8x interleaved (this is a fundamental property of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR are implemented. (The significance of ECB is that it could potentially be used by other chaining modes) Measured on Cortex-A57. Note that this is still an order of magnitude slower than the implementations that use the dedicated AES instructions introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13	crypto: arm64/aes - add scalar implementation	Ard Biesheuvel
	This adds a scalar implementation of AES, based on the precomputed tables that are exposed by the generic AES code. Since rotates are cheap on arm64, this implementation only uses the 4 core tables (of 1 KB each), and avoids the prerotated ones, reducing the D-cache footprint by 75%. On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster than the generic C code. (Note that this is still >13x slower than the code that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per byte.) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13	crypto: arm64/chacha20 - implement NEON version based on SSE3 code	Ard Biesheuvel
	This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-28	Revert "crypto: arm64/ARM: NEON accelerated ChaCha20"	Herbert Xu
	This patch reverts the following commits: 8621caa0d45e731f2e9f5889ff5bb384fcd6e059 8096667273477e735b0072b11a6d617ccee45e5f I should not have applied them because they had already been obsoleted by a subsequent patch series. They also cause a build failure because of the subsequent commit 9ae433bc79f9. Fixes: 9ae433bc79f ("crypto: chacha20 - convert generic and...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-27	crypto: arm64/chacha20 - implement NEON version based on SSE3 code	Ard Biesheuvel
	This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07	crypto: arm64/crc32 - accelerated support based on x86 SSE implementation	Ard Biesheuvel
	This is a combination of the the Intel algorithm implemented using SSE and PCLMULQDQ instructions from arch/x86/crypto/crc32-pclmul_asm.S, and the new CRC32 extensions introduced for both 32-bit and 64-bit ARM in version 8 of the architecture. Two versions of the above combo are provided, one for CRC32 and one for CRC32C. The PMULL/NEON algorithm is faster, but operates on blocks of at least 64 bytes, and on multiples of 16 bytes only. For the remaining input, or for all input on systems that lack the PMULL 64x64->128 instructions, the CRC32 instructions will be used. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-12-07	crypto: arm64/crct10dif - port x86 SSE implementation to arm64	Ard Biesheuvel
	This is a transliteration of the Intel algorithm implemented using SSE and PCLMULQDQ instructions that resides in the file arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only operate on buffers that are 16 byte aligned (but of any size) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-28	crypto: arm64/sha2 - integrate OpenSSL implementations of SHA256/SHA512	Ard Biesheuvel
	This integrates both the accelerated scalar and the NEON implementations of SHA-224/256 as well as SHA-384/512 from the OpenSSL project. Relative performance compared to the respective generic C versions: \| SHA256-scalar \| SHA256-NEON* \| SHA512 \| ------------+-----------------+--------------+----------+ Cortex-A53 \| 1.63x \| 1.63x \| 2.34x \| Cortex-A57 \| 1.43x \| 1.59x \| 1.95x \| Cortex-A73 \| 1.26x \| 1.56x \| ? \| The core crypto code was authored by Andy Polyakov of the OpenSSL project, in collaboration with whom the upstream code was adapted so that this module can be built from the same version of sha512-armv8.pl. The version in this patch was taken from OpenSSL commit 32bbb62ea634 ("sha/asm/sha512-armv8.pl: fix big-endian support in __KERNEL__ case.") * The core SHA algorithm is fundamentally sequential, but there is a secondary transformation involved, called the schedule update, which can be performed independently. The NEON version of SHA-224/SHA-256 only implements this part of the algorithm using NEON instructions, the sequential part is always done using scalar instructions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-02-26	arm64: crypto: increase AES interleave to 4x	Ard Biesheuvel
	This patch increases the interleave factor for parallel AES modes to 4x. This improves performance on Cortex-A57 by ~35%. This is due to the 3-cycle latency of AES instructions on the A57's relatively deep pipeline (compared to Cortex-A53 where the AES instruction latency is only 2 cycles). At the same time, disable inline expansion of the core AES functions, as the performance benefit of this feature is negligible. Measured on AMD Seattle (using tcrypt.ko mode=500 sec=1): Baseline (2x interleave, inline expansion) ------------------------------------------ testing speed of async cbc(aes) (cbc-aes-ce) decryption test 4 (128 bit key, 8192 byte blocks): 95545 operations in 1 seconds test 14 (256 bit key, 8192 byte blocks): 68496 operations in 1 seconds This patch (4x interleave, no inline expansion) ----------------------------------------------- testing speed of async cbc(aes) (cbc-aes-ce) decryption test 4 (128 bit key, 8192 byte blocks): 124735 operations in 1 seconds test 14 (256 bit key, 8192 byte blocks): 92328 operations in 1 seconds Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-11-20	crypto: crc32 - Add ARM64 CRC32 hw accelerated module	Yazen Ghannam
	This module registers a crc32 algorithm and a crc32c algorithm that use the optional CRC32 and CRC32C instructions in ARMv8. Tested on AMD Seattle. Improvement compared to crc32c-generic algorithm: TCRYPT CRC32C speed test shows ~450% speedup. Simple dd write tests to btrfs filesystem show ~30% speedup. Signed-off-by: Yazen Ghannam <yazen.ghannam@linaro.org> Acked-by: Steve Capper <steve.capper@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-07-24	arm64/crypto: fix makefile rule for aes-glue-%.o	Andreas Schwab
	This fixes the following build failure when building with CONFIG_MODVERSIONS enabled: CC [M] arch/arm64/crypto/aes-glue-ce.o ld: cannot find arch/arm64/crypto/aes-glue-ce.o: No such file or directory make[1]: * [arch/arm64/crypto/aes-ce-blk.o] Error 1 make: * [arch/arm64/crypto] Error 2 The $(obj)/aes-glue-%.o rule only creates $(obj)/.tmp_aes-glue-ce.o, it should use if_changed_rule instead of if_changed_dep. Signed-off-by: Andreas Schwab <schwab@suse.de> [ardb: mention CONFIG_MODVERSIONS in commit log] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-14	arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions	Ard Biesheuvel
	This adds ARMv8 implementations of AES in ECB, CBC, CTR and XTS modes, both for ARMv8 with Crypto Extensions and for plain ARMv8 NEON. The Crypto Extensions version can only run on ARMv8 implementations that have support for these optional extensions. The plain NEON version is a table based yet time invariant implementation. All S-box substitutions are performed in parallel, leveraging the wide range of ARMv8's tbl/tbx instructions, and the huge NEON register file, which can comfortably hold the entire S-box and still have room to spare for doing the actual computations. The key expansion routines were borrowed from aes_generic. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-05-14	arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions	Ard Biesheuvel
	This patch adds support for the AES-CCM encryption algorithm for CPUs that have support for the AES part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-05-14	arm64/crypto: AES using ARMv8 Crypto Extensions	Ard Biesheuvel
	This patch adds support for the AES symmetric encryption algorithm for CPUs that have support for the AES part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-05-14	arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions	Ard Biesheuvel
	This is a port to ARMv8 (Crypto Extensions) of the Intel implementation of the GHASH Secure Hash (used in the Galois/Counter chaining mode). It relies on the optional PMULL/PMULL2 instruction (polynomial multiply long, what Intel call carry-less multiply). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-05-14	arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions	Ard Biesheuvel
	This patch adds support for the SHA-224 and SHA-256 Secure Hash Algorithms for CPUs that have support for the SHA-2 part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-05-14	arm64/crypto: SHA-1 using ARMv8 Crypto Extensions	Ard Biesheuvel
	This patch adds support for the SHA-1 Secure Hash Algorithm for CPUs that have support for the SHA-1 part of the ARM v8 Crypto Extensions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>