Age | Commit message (Collapse) | Author |
|
Command Queuing was enabled completely for J721e controllers which lead
to partial enablement even for Am65x. Complete CQ implementation for
AM65x by adding the irq callback.
Fixes: f545702b74f9 ("mmc: sdhci_am654: Add Support for Command Queuing Engine to J721E")
Cc: stable@vger.kernel.org
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200108143301.1929-4-faiz_abbas@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The tuning data is leftover in the buffer after tuning. This can cause
issues in future data commands, especially with CQHCI. Reset the command
and data lines after tuning to continue from a clean state.
Fixes: 41fd4caeb00b ("mmc: sdhci_am654: Add Initial Support for AM654 SDHCI driver")
Cc: stable@vger.kernel.org
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200108143301.1929-3-faiz_abbas@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The MMC/SD controllers on am65x and j721e don't in fact detect the write
protect line as inverted. No issues were detected because of this
because the sdwp line is not connected on any of the evms. Fix this by
removing the flag.
Fixes: 1accbced1c32 ("mmc: sdhci_am654: Add Support for 4 bit IP on J721E")
Cc: stable@vger.kernel.org
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200108143301.1929-2-faiz_abbas@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
After commit 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u"
threads with cpu_stop_work"), the percpu soft_lockup_hrtimer_cnt is
not used any more, so remove it and related code.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191218131720.4146aea2@xhacker.debian
|
|
This patch is to fix clock setting code for different controller
versions. Two of HW changes after vendor version 2.2 are removing
PEREN/HCKEN/IPGEN bits in system control register, and adding SD
clock stable bit in present state register. This patch cleans up
related code too.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Link: https://lore.kernel.org/r/20200108040713.38888-2-yangbo.lu@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
This patch is to fix operating in esdhc_reset() for different
controller versions, and to add bus-width restoring after data
reset for eSDHC (verdor version <= 2.2).
Also add annotation for understanding.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200108040713.38888-1-yangbo.lu@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
There is an official update for eSDHC tuning erratum A-008171.
This patch is to implement the changes,
- Affect all revisions of SoC.
- Changes for tuning window checking.
- Hardware hits a new condition that tuning succeeds although
the eSDHC might not have tuned properly for type2 SoCs
(soc_tuning_erratum_type2[] array in driver). So check
tuning window after tuning succeeds.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20191212075219.48625-2-yangbo.lu@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Convert to use a new function esdhc_tuning_window_ptr() to
get tuning window start point and end point.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20191212075219.48625-1-yangbo.lu@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Cortex-A55 erratum 1530923 allows TLB entries to be allocated as a
result of a speculative AT instruction. This may happen in the middle of
a guest world switch while the relevant VMSA configuration is in an
inconsistent state, leading to erroneous content being allocated into
TLBs.
The same workaround as is used for Cortex-A76 erratum 1165522
(WORKAROUND_SPECULATIVE_AT_VHE) can be used here. Note that this
mandates the use of VHE on affected parts.
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To match SPECULATIVE_AT_VHE let's also have a generic name for the NVHE
variant.
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Cortex-A55 is affected by a similar erratum, so rename the existing
workaround for errarum 1165522 so it can be used for both errata.
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In commit 242587616710 ("gpiolib: Add support for the irqdomain which
doesn't use irq_fwspec as arg") we have changed the return type of
gpiochip_populate_parent_fwspec_twocell/fourcell() from void to void *,
but forgot to add a return statement for these two dummy functions.
Add "return NULL" to fix the build warnings.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Link: https://lore.kernel.org/r/20200116095003.30324-1-haokexin@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Add debugfs interface to provide debugging information of devfreq device.
It contains 'devfreq_summary' entry to show the summary of registered
devfreq devices as following and the additional debugfs file will be added.
- /sys/kernel/debug/devfreq/devfreq_summary
[Detailed description of each field of 'devfreq_summary' debugfs file]
- dev_name : Device name of h/w
- dev : Device name made by devfreq core
- parent_dev : If devfreq device uses the passive governor,
show parent devfreq device name. Otherwise, show 'null'.
- governor : Devfreq governor name
- polling_ms : If devfreq device uses the simple_ondemand governor,
polling_ms is necessary for the period. (unit: millisecond)
- cur_freq_Hz : Current frequency (unit: Hz)
- min_freq_Hz : Minimum frequency (unit: Hz)
- max_freq_Hz : Maximum frequency (unit: Hz)
[For example on Exynos5422-based Odroid-XU3 board]
$ cat /sys/kernel/debug/devfreq/devfreq_summary
dev_name dev parent_dev governor polling_ms cur_freq_Hz min_freq_Hz max_freq_Hz
------------------------------ ---------- ---------- --------------- ---------- ------------ ------------ ------------
10c20000.memory-controller devfreq0 null simple_ondemand 0 165000000 165000000 825000000
soc:bus_wcore devfreq1 null simple_ondemand 50 532000000 88700000 532000000
soc:bus_noc devfreq2 devfreq1 passive 0 111000000 66600000 111000000
soc:bus_fsys_apb devfreq3 devfreq1 passive 0 222000000 111000000 222000000
soc:bus_fsys devfreq4 devfreq1 passive 0 200000000 75000000 200000000
soc:bus_fsys2 devfreq5 devfreq1 passive 0 200000000 75000000 200000000
soc:bus_mfc devfreq6 devfreq1 passive 0 333000000 83250000 333000000
soc:bus_gen devfreq7 devfreq1 passive 0 266000000 88700000 266000000
soc:bus_peri devfreq8 devfreq1 passive 0 66600000 66600000 66600000
soc:bus_g2d devfreq9 devfreq1 passive 0 333000000 83250000 333000000
soc:bus_g2d_acp devfreq10 devfreq1 passive 0 266000000 66500000 266000000
soc:bus_jpeg devfreq11 devfreq1 passive 0 300000000 75000000 300000000
soc:bus_jpeg_apb devfreq12 devfreq1 passive 0 166500000 83250000 166500000
soc:bus_disp1_fimd devfreq13 devfreq1 passive 0 200000000 120000000 200000000
soc:bus_disp1 devfreq14 devfreq1 passive 0 300000000 120000000 300000000
soc:bus_gscl_scaler devfreq15 devfreq1 passive 0 300000000 150000000 300000000
soc:bus_mscl devfreq16 devfreq1 passive 0 666000000 84000000 666000000
[lkp: Reported the build error]
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
|
|
Buffered read in fuse normally goes via:
-> generic_file_buffered_read()
-> fuse_readpages()
-> fuse_send_readpages()
->fuse_simple_request() [called since v5.4]
In the case of a read request, fuse_simple_request() will return a
non-negative bytecount on success or a negative error value. A positive
bytecount was taken to be an error and the PG_error flag set on the page.
This resulted in generic_file_buffered_read() falling back to ->readpage(),
which would repeat the read request and succeed. Because of the repeated
read succeeding the bug was not detected with regression tests or other use
cases.
The FTP module in GVFS however fails the second read due to the
non-seekable nature of FTP downloads.
Fix by checking and ignoring positive return value from
fuse_simple_request().
Reported-by: Ondrej Holy <oholy@redhat.com>
Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441
Fixes: 134831e36bbd ("fuse: convert readpages to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This fixes crackling sound during playback.
Further note: MOTU is known for reusing Product IDs for different
devices or different generations of the device (e.g. MicroBook
I/II/IIc shares a single Product ID). This patch was only tested with
M4 audio interface, but the same Product ID is also used by M2. Hope
it will work for M2 as well.
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Link: https://lore.kernel.org/r/20200115151358.56672-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-01-15
The following pull-request contains BPF updates for your *net* tree.
We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 13 files changed, 95 insertions(+), 43 deletions(-).
The main changes are:
1) Fix refcount leak for TCP time wait and request sockets for socket lookup
related BPF helpers, from Lorenz Bauer.
2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.
3) Batch of several sockmap and related TLS fixes found while operating
more complex BPF programs with Cilium and OpenSSL, from John Fastabend.
4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue()
to avoid purging data upon teardown, from Lingpeng Chen.
5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly
dump a BPF map's value with BTF, from Martin KaFai Lau.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Increment the mgmt revision due to the recently added commands.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
|
|
"AEAD" is capitalized everywhere else.
Use "an" when followed by a written or spoken vowel.
Fixes: be1eb7f78aa8fbe3 ("crypto: essiv - create wrapper template for ESSIV generation")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This branch prediction macro on the hot path can improve
small performance(about 2%) according to the test.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Reorder the input parameters of hpre_crt_para_get to
make it cleaner.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
1.Use memzero_explicit to clear key;
2.Fix some little endian writings;
3.Fix some other bugs and stuff of code style;
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
1.Fixed the bug of software tfm leakage.
2.Update HW error log message.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
authenc(hmac(sha1),cbc(aes)), authenc(hmac(sha256),cbc(aes)), and
authenc(hmac(sha512),cbc(aes)) support are added for SEC v2.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
1.Define base initiation of QP for context which can be reused.
2.Define cipher initiation for other algorithms.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
After adding branch prediction for skcipher hot path,
a little bit income of performance is gotten.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add error type parameter for call back checking inside.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
1.Adjust call back function.
2.Adjust parameter checking function.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
1.Put resource including request and resource list into
QP context structure to avoid allocate memory repeatedly.
2.Add max context queue number to void kcalloc large memory for QP context.
3.Remove the resource allocation operation.
4.Redefine resource allocation APIs to be shared by other algorithms.
5.Move resource allocation and free inner functions out of
operations 'struct sec_req_op', and they are called directly.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
1.Adjust dma map function to be reused by AEAD algorithms;
2.Update some names of internal functions and variables to
support AEAD algorithms;
3.Rename 'sec_skcipher_exit' as 'sec_skcipher_uninit';
4.Rename 'sec_get/put_queue_id' as 'sec_alloc/free_queue_id';
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Fixed some print, coding style and comments of HiSilicon SEC V2.
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Applied some advices of Marco Elver on atomic usage of Debugfs,
which is carried out by basing on Arnd Bergmann's fixing patch.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Marco Elver <elver@google.com>
Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Remove NULL check for pool variable, since in the current
code path it is guaranteed to be non-NULL.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Rename err label to err_device_unregister for better
readability.
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Currently, if tee_device_alloc() fails, then tee_device_unregister()
is a no-op. Therefore, skip the function call to tee_device_unregister() by
introducing a new goto label 'err_free_pool'.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
If there is no TEE with which the driver can communicate, then
print an error message and return.
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Remove unused variable initialization from driver code.
If enabled as a compiler option, compiler may throw warning for
unused assignments.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 757cc3e9ff1d ("tee: add AMD-TEE driver")
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When the kernel XTS implementation was extended to deal with ciphertext
stealing in commit 8083b1bf8163 ("crypto: xts - add support for ciphertext
stealing"), a check was added to reject inputs that were too short.
However, in the vmx enablement - commit 239668419349 ("crypto: vmx/xts -
use fallback for ciphertext stealing"), that check wasn't added to the
vmx implementation. This disparity leads to errors like the following:
alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep use_finup src_divs=[<flush>66.99%@+10, 33.1%@alignmask+1155]"
Return -EINVAL if asked to operate with a cryptlen smaller than the AES
block size. This brings vmx in line with the generic implementation.
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206049
Fixes: 239668419349 ("crypto: vmx/xts - use fallback for ciphertext stealing")
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[dja: commit message]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
If CRYPTO_CURVE25519 is y, CRYPTO_LIB_CURVE25519_GENERIC will be
y, but CRYPTO_LIB_CURVE25519 may be set to m, this causes build
errors:
lib/crypto/curve25519-selftest.o: In function `curve25519':
curve25519-selftest.c:(.text.unlikely+0xc): undefined reference to `curve25519_arch'
lib/crypto/curve25519-selftest.o: In function `curve25519_selftest':
curve25519-selftest.c:(.init.text+0x17e): undefined reference to `curve25519_base_arch'
This is because the curve25519 self-test code is being controlled
by the GENERIC option rather than the overall CURVE25519 option,
as is the case with blake2s. To recap, the GENERIC and ARCH options
for CURVE25519 are internal only and selected by users such as
the Crypto API, or the externally visible CURVE25519 option which
in turn is selected by wireguard. The self-test is specific to the
the external CURVE25519 option and should not be enabled by the
Crypto API.
This patch fixes this by splitting the GENERIC module from the
CURVE25519 module with the latter now containing just the self-test.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: aa127963f1ca ("crypto: lib/curve25519 - re-add selftests")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add support for the crypto engine used in i.mx8mn (i.MX 8M "Nano"),
which is very similar to the one used in i.mx8mq, i.mx8mm.
Since the clocks are identical for all members of i.MX 8M family,
simplify the SoC <--> clock array mapping table.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Tested-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Reviewed-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Some code were left in the final driver but without any use.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Removing the driver cause an oops due to the fact we clean an extra
channel.
Let's give the right index to the cleaning function.
Fixes: 06f751b61329 ("crypto: allwinner - Add sun8i-ce Crypto Engine")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Removing the driver cause an oops due to the fact we clean an extra
channel.
Let's give the right index to the cleaning function.
Fixes: 48fe583fe541 ("crypto: amlogic - Add crypto accelerator for amlogic GXL")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Removing the driver cause an oops due to the fact we clean an extra
channel.
Let's give the right index to the cleaning function.
Fixes: f08fcced6d00 ("crypto: allwinner - Add sun8i-ss cryptographic offloader")
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This appears to be some kind of copy and paste error, and is actually
dead code.
Pre: f = 0 ⇒ (f >> 32) = 0
f = (f >> 32) + le32_to_cpu(digest[0]);
Post: 0 ≤ f < 2³²
put_unaligned_le32(f, dst);
Pre: 0 ≤ f < 2³² ⇒ (f >> 32) = 0
f = (f >> 32) + le32_to_cpu(digest[1]);
Post: 0 ≤ f < 2³²
put_unaligned_le32(f, dst + 4);
Pre: 0 ≤ f < 2³² ⇒ (f >> 32) = 0
f = (f >> 32) + le32_to_cpu(digest[2]);
Post: 0 ≤ f < 2³²
put_unaligned_le32(f, dst + 8);
Pre: 0 ≤ f < 2³² ⇒ (f >> 32) = 0
f = (f >> 32) + le32_to_cpu(digest[3]);
Post: 0 ≤ f < 2³²
put_unaligned_le32(f, dst + 12);
Therefore this sequence is redundant. And Andy's code appears to handle
misalignment acceptably.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
These x86_64 vectorized implementations support AVX, AVX-2, and AVX512F.
The AVX-512F implementation is disabled on Skylake, due to throttling,
but it is quite fast on >= Cannonlake.
On the left is cycle counts on a Core i7 6700HQ using the AVX-2
codepath, comparing this implementation ("new") to the implementation in
the current crypto api ("old"). On the right are benchmarks on a Xeon
Gold 5120 using the AVX-512 codepath. The new implementation is faster
on all benchmarks.
AVX-2 AVX-512
--------- -----------
size old new size old new
---- ---- ---- ---- ---- ----
0 70 68 0 74 70
16 92 90 16 96 92
32 134 104 32 136 106
48 172 120 48 184 124
64 218 136 64 218 138
80 254 158 80 260 160
96 298 174 96 300 176
112 342 192 112 342 194
128 388 212 128 384 212
144 428 228 144 420 226
160 466 246 160 464 248
176 510 264 176 504 264
192 550 282 192 544 282
208 594 302 208 582 300
224 628 316 224 624 318
240 676 334 240 662 338
256 716 354 256 708 358
272 764 374 272 748 372
288 802 352 288 788 358
304 420 366 304 422 370
320 428 360 320 432 364
336 484 378 336 486 380
352 426 384 352 434 390
368 478 400 368 480 408
384 488 394 384 490 398
400 542 408 400 542 412
416 486 416 416 492 426
432 534 430 432 538 436
448 544 422 448 546 432
464 600 438 464 600 448
480 540 448 480 548 456
496 594 464 496 594 476
512 602 456 512 606 470
528 656 476 528 656 480
544 600 480 544 606 498
560 650 494 560 652 512
576 664 490 576 662 508
592 714 508 592 716 522
608 656 514 608 664 538
624 708 532 624 710 552
640 716 524 640 720 516
656 770 536 656 772 526
672 716 548 672 722 544
688 770 562 688 768 556
704 774 552 704 778 556
720 826 568 720 832 568
736 768 574 736 780 584
752 822 592 752 826 600
768 830 584 768 836 560
784 884 602 784 888 572
800 828 610 800 838 588
816 884 628 816 884 604
832 888 618 832 894 598
848 942 632 848 946 612
864 884 644 864 896 628
880 936 660 880 942 644
896 948 652 896 952 608
912 1000 664 912 1004 616
928 942 676 928 954 634
944 994 690 944 1000 646
960 1002 680 960 1008 646
976 1054 694 976 1062 658
992 1002 706 992 1012 674
1008 1052 720 1008 1058 690
This commit wires in the prior implementation from Andy, and makes the
following changes to be suitable for kernel land.
- Some cosmetic and structural changes, like renaming labels to
.Lname, constants, and other Linux conventions, as well as making
the code easy for us to maintain moving forward.
- CPU feature checking is done in C by the glue code.
- We avoid jumping into the middle of functions, to appease objtool,
and instead parameterize shared code.
- We maintain frame pointers so that stack traces make sense.
- We remove the dependency on the perl xlate code, which transforms
the output into things that assemblers we don't care about use.
Importantly, none of our changes affect the arithmetic or core code, but
just involve the differing environment of kernel space.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
These x86_64 vectorized implementations come from Andy Polyakov's
CRYPTOGAMS implementation, and are included here in raw form without
modification, so that subsequent commits that fix these up for the
kernel can see how it has changed.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
These two C implementations from Zinc -- a 32x32 one and a 64x64 one,
depending on the platform -- come from Andrew Moon's public domain
poly1305-donna portable code, modified for usage in the kernel. The
precomputation in the 32-bit version and the use of 64x64 multiplies in
the 64-bit version make these perform better than the code it replaces.
Moon's code is also very widespread and has received many eyeballs of
scrutiny.
There's a bit of interference between the x86 implementation, which
relies on internal details of the old scalar implementation. In the next
commit, the x86 implementation will be replaced with a faster one that
doesn't rely on this, so none of this matters much. But for now, to keep
this passing the tests, we inline the bits of the old implementation
that the x86 implementation relied on. Also, since we now support a
slightly larger key space, via the union, some offsets had to be fixed
up.
Nonce calculation was folded in with the emit function, to take
advantage of 64x64 arithmetic. However, Adiantum appeared to rely on no
nonce handling in emit, so this path was conditionalized. We also
introduced a new struct, poly1305_core_key, to represent the precise
amount of space that particular implementation uses.
Testing with kbench9000, depending on the CPU, the update function for
the 32x32 version has been improved by 4%-7%, and for the 64x64 by
19%-30%. The 32x32 gains are small, but I think there's great value in
having a parallel implementation to the 64x64 one so that the two can be
compared side-by-side as nice stand-alone units.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Merge crypto tree to pick up hisilicon patch.
|
|
This patch registers hdev->shutdown() callback and also sets
HCI_QUIRK_NON_PERSISTENT_SETUP for QCA Rome. It will power-off the BT chip
during hci down and power-on/initialize the chip again during hci up. As
wcn399x already enabled this, this patch also removed the callback register
and QUIRK setting in qca_setup() for wcn399x and uniformly do this in the
probe() routine.
Signed-off-by: Rocky Liao <rjliao@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|