diff options
author | Eric Biggers <ebiggers@kernel.org> | 2025-07-06 16:11:00 -0700 |
---|---|---|
committer | Eric Biggers <ebiggers@kernel.org> | 2025-07-11 14:29:42 -0700 |
commit | 9f65592b7e1f24569bb6ced064df5b4319f725ce (patch) | |
tree | 52f0b37eeb16174beb4f22d2bed273d9a7d10e3d /scripts/lib/kdoc/kdoc_files.py | |
parent | 16f2c30e290e04135b70ad374fb7e1d1ed9ff5e7 (diff) |
lib/crypto: x86/poly1305: Fix performance regression on short messages
Restore the len >= 288 condition on using the AVX implementation, which
was incidentally removed by commit 318c53ae02f2 ("crypto: x86/poly1305 -
Add block-only interface"). This check took into account the overhead
in key power computation, kernel-mode "FPU", and tail handling
associated with the AVX code. Indeed, restoring this check slightly
improves performance for len < 256 as measured using poly1305_kunit on
an "AMD Ryzen AI 9 365" (Zen 5) CPU:
Length Before After
====== ========== ==========
1 30 MB/s 36 MB/s
16 516 MB/s 598 MB/s
64 1700 MB/s 1882 MB/s
127 2265 MB/s 2651 MB/s
128 2457 MB/s 2827 MB/s
200 2702 MB/s 3238 MB/s
256 3841 MB/s 3768 MB/s
511 4580 MB/s 4585 MB/s
512 5430 MB/s 5398 MB/s
1024 7268 MB/s 7305 MB/s
3173 8999 MB/s 8948 MB/s
4096 9942 MB/s 9921 MB/s
16384 10557 MB/s 10545 MB/s
While the optimal threshold for this CPU might be slightly lower than
288 (see the len == 256 case), other CPUs would need to be tested too,
and these sorts of benchmarks can underestimate the true cost of
kernel-mode "FPU". Therefore, for now just restore the 288 threshold.
Fixes: 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250706231100.176113-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Diffstat (limited to 'scripts/lib/kdoc/kdoc_files.py')
0 files changed, 0 insertions, 0 deletions