path: root/arch/arm64/crypto/aes-neon.S
Age | Commit message | Author
2019-09-09 | crypto: arm64/aes-neonbs - implement ciphertext stealing for XTS | Ard Biesheuvel
Update the AES-XTS implementation based on NEON instructions so that it can deal with inputs whose size is not a multiple of the cipher block size. This is part of the original XTS specification, but was never implemented before in the Linux kernel.

Since the bit slicing driver is only faster if it can operate on at least 7 blocks of input at the same time, let's reuse the alternate path we are adding for CTS to process any data tail whose size is not a multiple of 128 bytes.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
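As a rough illustration of the split described above, the sketch below divides a request into a part that is a multiple of 128 bytes (suitable for the bit-sliced path) and a tail for the alternate path, where the tail may end in a partial block that needs ciphertext stealing. The function name `split_xts_request` and the rule of handing one 128-byte chunk back to the tail when no full block would otherwise accompany the partial block are assumptions made for this example, not the driver's actual logic.

```python
AES_BLOCK = 16            # AES block size in bytes
BITSLICE_CHUNK = 8 * 16   # 128 bytes, i.e. eight AES blocks per bit-sliced pass


def split_xts_request(nbytes):
    """Return (bulk, tail_blocks, partial) byte counts for an XTS request.

    bulk        - bytes for the bit-sliced path (multiple of 128)
    tail_blocks - bytes of whole blocks for the alternate path
    partial     - bytes of the final partial block (needs ciphertext stealing)
    """
    if nbytes < AES_BLOCK:
        raise ValueError("XTS needs at least one full block")

    bulk = nbytes - nbytes % BITSLICE_CHUNK
    rest = nbytes - bulk
    partial = rest % AES_BLOCK
    tail_blocks = rest - partial

    # Ciphertext stealing combines the last full block with the partial one,
    # so the alternate path must receive at least one full block.
    # (Assumption of this sketch: hand a whole 128-byte chunk back to it.)
    if partial and tail_blocks == 0:
        bulk -= BITSLICE_CHUNK
        tail_blocks += BITSLICE_CHUNK

    return bulk, tail_blocks, partial
```

For example, a 300-byte request would be split into 256 bytes of bulk, 32 bytes of whole tail blocks and a 12-byte partial block.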
2019-07-26 | crypto: arm64/aes-neon - switch to shared AES Sboxes | Ard Biesheuvel
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-08 | Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 | Linus Torvalds
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 5.3:

API:
- Test shash interface directly in testmgr
- cra_driver_name is now mandatory

Algorithms:
- Replace arc4 crypto_cipher with library helper
- Implement 5 way interleave for ECB, CBC and CTR on arm64
- Add xxhash
- Add continuous self-test on noise source to drbg
- Update jitter RNG

Drivers:
- Add support for SHA204A random number generator
- Add support for 7211 in iproc-rng200
- Fix fuzz test failures in inside-secure
- Fix fuzz test failures in talitos
- Fix fuzz test failures in qat"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits)
  crypto: stm32/hash - remove interruptible condition for dma
  crypto: stm32/hash - Fix hmac issue more than 256 bytes
  crypto: stm32/crc32 - rename driver file
  crypto: amcc - remove memset after dma_alloc_coherent
  crypto: ccp - Switch to SPDX license identifiers
  crypto: ccp - Validate the the error value used to index error messages
  crypto: doc - Fix formatting of new crypto engine content
  crypto: doc - Add parameter documentation
  crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR
  crypto: arm64/aes-ce - add 5 way interleave routines
  crypto: talitos - drop icv_ool
  crypto: talitos - fix hash on SEC1.
  crypto: talitos - move struct talitos_edesc into talitos.h
  lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
  crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
  crypto: asymmetric_keys - select CRYPTO_HASH where needed
  crypto: serpent - mark __serpent_setkey_sbox noinline
  crypto: testmgr - dynamically allocate crypto_shash
  crypto: testmgr - dynamically allocate testvec_config
  crypto: talitos - eliminate unneeded 'done' functions at build time
  ...
2019-07-03 | crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR | Ard Biesheuvel
This implements 5-way interleaving for ECB, CBC decryption and CTR, resulting in a speedup of ~11% on Marvell ThunderX2, which has a very deep pipeline and therefore a high issue latency for NEON instructions operating on the same registers.

Note that XTS is left alone: implementing 5-way interleave there would either involve spilling of the calculated tweaks to the stack, or recalculating them after the encryption operation, and doing either of those would most likely penalize low end cores.

For ECB, this is not a concern at all, given that we have plenty of spare registers. For CTR and CBC decryption, we take advantage of the fact that v16 is not used by the CE version of the code (which is the only one targeted by the optimization), and so we can reshuffle the code a bit and avoid having to spill to memory (with the exception of one extra reload in the CBC routine).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
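The reason CBC decryption (unlike CBC encryption) can be interleaved is that each block's cipher call depends only on ciphertext that is already available. The sketch below illustrates that with a stride of five blocks. It is a hedged illustration only: `toy_encrypt_block` and `toy_decrypt_block` are made-up, insecure stand-ins for AES, and the batching loop models the idea rather than the assembly routine.

```python
BLOCK = 16   # block size in bytes
STRIDE = 5   # number of blocks handled per batch


def toy_encrypt_block(block, key):
    # Toy stand-in for AES (NOT secure): byte-wise addition modulo 256.
    return bytes((b + k) % 256 for b, k in zip(block, key))


def toy_decrypt_block(block, key):
    # Inverse of toy_encrypt_block.
    return bytes((b - k) % 256 for b, k in zip(block, key))


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(plaintext, key, iv):
    # Sequential by nature: every block needs the previous ciphertext block.
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_interleaved(ciphertext, key, iv):
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    out, prev = [], iv
    for i in range(0, len(blocks), STRIDE):
        batch = blocks[i:i + STRIDE]
        # These block decryptions are independent of each other, so a SIMD
        # implementation can keep up to STRIDE of them in flight at once.
        decrypted = [toy_decrypt_block(c, key) for c in batch]
        # The chaining XOR only needs the preceding ciphertext block.
        for c, d in zip(batch, decrypted):
            out.append(xor(d, prev))
            prev = c
    return b"".join(out)
```

A round trip such as `cbc_decrypt_interleaved(cbc_encrypt(pt, key, iv), key, iv)` returns `pt` for any input whose length is a multiple of 16 bytes, including lengths that leave a final batch of fewer than five blocks.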
2019-07-03 | crypto: arm64/aes-ce - add 5 way interleave routines | Ard Biesheuvel
In preparation for tweaking the accelerated AES chaining mode routines to be able to use a 5-way stride, implement the core routines to support processing 5 blocks of input at a time. While at it, drop the 2 way versions, which have been unused for a while now.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-19 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 | Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-21 | crypto: arm64/aes-blk - improve XTS mask handling | Ard Biesheuvel
The Crypto Extension instantiation of the aes-modes.S collection of skciphers uses only 15 NEON registers for the round key array, whereas the pure NEON flavor uses 16 NEON registers for the AES S-box. This means we have a spare register available that we can use to hold the XTS mask vector, removing the need to reload it at every iteration of the inner loop.

Since the pure NEON version does not permit this optimization, tweak the macros so we can factor out this functionality. Also, replace the literal load with a short sequence to compose the mask vector.

On Cortex-A53, this results in a ~4% speedup.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
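For context on what the mask vector is used for: XTS derives each block's tweak from the previous one by multiplying it by x in GF(2^128), which is a one-bit left shift with a conditional XOR of 0x87 into the low byte. The sketch below shows that update twice, once on a 128-bit integer and once on two 64-bit lanes with a per-lane mask of 0x1 and 0x87. The lane version is an assumption about how a SIMD implementation might organise the computation, written for illustration; it is not a transcription of the kernel macro.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def xts_next_tweak(tweak):
    """Multiply a 16-byte little-endian XTS tweak by x in GF(2^128)."""
    t = int.from_bytes(tweak, "little")
    carry = t >> 127                 # bit shifted out of the top
    t = (t << 1) & MASK128
    if carry:
        t ^= 0x87                    # reduce by x^128 + x^7 + x^2 + x + 1
    return t.to_bytes(16, "little")


def xts_next_tweak_lanes(tweak):
    """Same update on two 64-bit lanes, using a {0x1, 0x87} mask pair."""
    lo = int.from_bytes(tweak[:8], "little")
    hi = int.from_bytes(tweak[8:], "little")
    mask_lo, mask_hi = 0x1, 0x87     # the per-lane mask values

    # Select each lane's mask value only if that lane's top bit is set.
    carry_lo = mask_lo if lo >> 63 else 0
    carry_hi = mask_hi if hi >> 63 else 0

    # Double each lane, then apply the carry coming from the *other* lane:
    # the low lane's carry enters the high lane as bit 0, and the high
    # lane's carry wraps around as the 0x87 reduction.
    new_lo = ((lo << 1) & MASK64) ^ carry_hi
    new_hi = ((hi << 1) & MASK64) ^ carry_lo
    return new_lo.to_bytes(8, "little") + new_hi.to_bytes(8, "little")
```

Both functions agree on every input; keeping the mask pair in a register means the update needs no memory access inside the loop.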
2018-01-18 | crypto: arm64/aes-neon - move literal data to .rodata section | Ard Biesheuvel
Move the S-boxes and some other literals to the .rodata section where they are safe from being exploited by speculative execution.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03 | crypto: arm64/aes-neon-blk - tweak performance for low end cores | Ard Biesheuvel
The non-bitsliced AES implementation using NEON is highly sensitive to micro-architectural details, and, as it turns out, the Cortex-A53 on the Raspberry Pi 3 is a core that can benefit from this code, given that its scalar AES performance is abysmal (32.9 cycles per byte).

The new bitsliced AES code manages 19.8 cycles per byte on this core, but can only operate on 8 blocks at a time, which is not supported by all chaining modes. With a bit of tweaking, we can get the plain NEON code to run at 22.0 cycles per byte, making it useful for sequential modes like CBC encryption. (Like bitsliced NEON, the plain NEON implementation does not use any lookup tables, which makes it easy on the D-cache, and invulnerable to cache timing attacks.)

So tweak the plain NEON AES code to use tbl instructions rather than shl/sri pairs, and to avoid the need to reload permutation vectors or other constants from memory in every round. Also, improve the decryption performance by switching to 16x8 pmul instructions for performing the multiplications in GF(2^8).

To allow the ECB and CBC encrypt routines to be reused by the bitsliced NEON code in a subsequent patch, export them from the module.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
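The multiplications in GF(2^8) mentioned above are the ones AES decryption needs for InvMixColumns, where each byte is multiplied by the constants 9, 11, 13 and 14. A minimal scalar sketch of that arithmetic follows; the helper names `gf_mul`, `mix_column` and `inv_mix_column` are made up for this example, and the code shows the mathematics only, not how the NEON routine schedules it.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11b."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= 0x11b           # reduce when the degree reaches 8
        b >>= 1
    return result


def _circulant_mul(first_row, column):
    # Row i of the matrix is first_row rotated right by i positions.
    out = []
    for i in range(4):
        acc = 0
        for j in range(4):
            acc ^= gf_mul(first_row[(j - i) % 4], column[j])
        out.append(acc)
    return out


def mix_column(column):
    """AES MixColumns on one 4-byte column (encryption direction)."""
    return _circulant_mul([2, 3, 1, 1], column)


def inv_mix_column(column):
    """AES InvMixColumns on one 4-byte column (decryption direction)."""
    return _circulant_mul([14, 11, 13, 9], column)
```

The larger constants on the decryption side are why a dedicated polynomial multiply pays off there, while encryption can get by with a doubling step and XORs.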
2016-10-21 | crypto: arm64/aes-neon - fix for big endian | Ard Biesheuvel
The AES implementation using pure NEON instructions relies on the generic AES key schedule generation routines, which store the round keys as arrays of 32-bit quantities stored in memory using native endianness. This means we should refer to these round keys using 4x4 loads rather than 16x1 loads. In addition, the ShiftRows tables are loaded using a single scalar load, which is also affected by endianness, so emit these tables in the correct order depending on whether we are building for big endian or not.

Fixes: 49788fe2a128 ("arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
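The difference between the two kinds of load can be modelled with Python's `struct` module. In this sketch, four made-up 32-bit round key words are written to memory in each byte order; reading them back as four 32-bit elements in the matching byte order recovers the same values either way, while reading sixteen individual bytes exposes the byte order. The word values and helper names are illustrative assumptions.

```python
import struct

# Hypothetical round key words, as a key schedule would produce them.
ROUND_KEY_WORDS = [0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f]

# The same words stored in memory by a little-endian and a big-endian kernel.
le_mem = struct.pack("<4I", *ROUND_KEY_WORDS)
be_mem = struct.pack(">4I", *ROUND_KEY_WORDS)


def load_as_words(mem, byteorder):
    """Model of a load of four 32-bit elements in the CPU's own byte order."""
    fmt = ("<" if byteorder == "little" else ">") + "4I"
    return list(struct.unpack(fmt, mem))


def load_as_bytes(mem):
    """Model of a load of sixteen 8-bit elements: memory order, as is."""
    return list(mem)
```

Here `load_as_words(le_mem, "little")` and `load_as_words(be_mem, "big")` both return the original words, whereas `load_as_bytes(le_mem)` and `load_as_bytes(be_mem)` differ, which is the kind of mismatch the fix avoids.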
2014-05-14 | arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions | Ard Biesheuvel
This adds ARMv8 implementations of AES in ECB, CBC, CTR and XTS modes, both for ARMv8 with Crypto Extensions and for plain ARMv8 NEON. The Crypto Extensions version can only run on ARMv8 implementations that have support for these optional extensions.

The plain NEON version is a table based yet time invariant implementation. All S-box substitutions are performed in parallel, leveraging the wide range of ARMv8's tbl/tbx instructions, and the huge NEON register file, which can comfortably hold the entire S-box and still have room to spare for doing the actual computations.

The key expansion routines were borrowed from aes_generic.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
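One way to picture a register-resident table lookup is to emulate the semantics of `tbl` and `tbx` with a four-register (64-byte) table: `tbl` yields zero for an out-of-range index, while `tbx` leaves the destination byte unchanged. A 256-entry table can then be covered in four steps by subtracting 0x40 from the indices between lookups. The sketch below is a simplified Python model under those assumptions; the function names are invented for the example, and any 256-byte table can stand in for the S-box.

```python
def tbl(table64, indices):
    """Model of tbl with a 64-byte table: out-of-range indices give 0."""
    return [table64[i] if i < 64 else 0 for i in indices]


def tbx(dest, table64, indices):
    """Model of tbx: out-of-range indices keep the destination byte."""
    return [table64[i] if i < 64 else d for d, i in zip(dest, indices)]


def lookup256(table, indices):
    """Look up every index in a 256-byte table, 64 entries at a time."""
    quarters = [table[q:q + 64] for q in range(0, 256, 64)]

    # The first quarter uses tbl, so bytes not found yet start out as zero.
    out = tbl(quarters[0], indices)

    for quarter in quarters[1:]:
        # Shift the index window down by 64, wrapping modulo 256, so only
        # the indices that belong to this quarter fall in the range 0..63.
        indices = [(i - 0x40) & 0xff for i in indices]
        out = tbx(out, quarter, indices)
    return out
```

Because every index goes through the same four lookups regardless of its value, the sequence of operations does not depend on the data, which is the property the commit message refers to as time invariant.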