summaryrefslogtreecommitdiff
path: root/arch/arm64/crypto/aes-neon.S
diff options
context:
space:
mode:
authorArd Biesheuvel <ard.biesheuvel@linaro.org>2019-06-24 19:38:31 +0200
committerHerbert Xu <herbert@gondor.apana.org.au>2019-07-03 22:13:12 +0800
commit7367bfeb2c141fb3ddff6b09bb5dfeb739b3d245 (patch)
tree39a672738a414e61d0e2afd3d6460a8ab4afa262 /arch/arm64/crypto/aes-neon.S
parente217413964a453fc2eeb437c32deb00581cf899d (diff)
crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR
This implements 5-way interleaving for ECB, CBC decryption and CTR, resulting in a speedup of ~11% on Marvell ThunderX2, which has a very deep pipeline and therefore a high issue latency for NEON instructions operating on the same registers. Note that XTS is left alone: implementing 5-way interleave there would either involve spilling of the calculated tweaks to the stack, or recalculating them after the encryption operation, and doing either of those would most likely penalize low end cores. For ECB, this is not a concern at all, given that we have plenty of spare registers. For CTR and CBC decryption, we take advantage of the fact that v16 is not used by the CE version of the code (which is the only one targeted by the optimization), and so we can reshuffle the code a bit and avoid having to spill to memory (with the exception of one extra reload in the CBC routine) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Diffstat (limited to 'arch/arm64/crypto/aes-neon.S')
-rw-r--r--arch/arm64/crypto/aes-neon.S2
1 files changed, 2 insertions, 0 deletions
diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S
index 33bb6af309a3..8bd66a6c4749 100644
--- a/arch/arm64/crypto/aes-neon.S
+++ b/arch/arm64/crypto/aes-neon.S
@@ -15,6 +15,8 @@
#define AES_ENDPROC(func) ENDPROC(neon_ ## func)
xtsmask .req v7
+ cbciv .req v7
+ vctr .req v4
.macro xts_reload_mask, tmp
xts_load_mask \tmp