path: root/arch/powerpc/lib/checksum_64.S
Age | Commit message | Author
2017-01-25 | powerpc/64: Use optimized checksum routines on little-endian | Paul Mackerras
Currently we have optimized hand-coded assembly checksum routines for big-endian 64-bit systems, but for little-endian we use the generic C routines. This modifies the optimized routines to work for little-endian. With this, we no longer need to enable CONFIG_GENERIC_CSUM. This also fixes a couple of comments in checksum_64.S so they accurately reflect what the associated instruction does.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Use the more common __BIG_ENDIAN__]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-14 | powerpc: EX_TABLE macro for exception tables | Nicholas Piggin
This macro is taken from s390, and allows more flexibility in changing the exception table format.

mpe: Put it in ppc_asm.h and only define one version using stringify_in_c(). Add some empty definitions and headers to keep the selftests happy.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
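Each EX_TABLE use pairs a potentially faulting instruction address with its fixup target in the __ex_table section. A hedged sketch of the macro's shape, assuming the s390-style layout described above; the exact directives and entry width in ppc_asm.h may differ:

	/* Illustrative sketch only, not the exact ppc_asm.h definition.
	 * stringify_in_c() yields a C string literal when included from C
	 * and the bare statement when included from assembly, so one
	 * definition serves both. */
	#define EX_TABLE(_fault, _target)			\
		stringify_in_c(.section __ex_table,"a";)	\
		stringify_in_c(.balign 8;)			\
		stringify_in_c(.llong (_fault);)		\
		stringify_in_c(.llong (_target);)		\
		stringify_in_c(.previous)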
2016-08-07 | ppc: move exports to definitions | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-15 | powerpc/lib: Clarify that adde is an instruction and we mean plural | Stewart Smith
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09 | powerpc: optimise csum_partial() call when len is constant | Christophe Leroy
csum_partial is often called for small fixed-length packets, for which it is suboptimal to use the generic csum_partial() function. For instance, in my configuration, I got:
* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and implements csum_partial() as a wrapper inline function which:
* uses csum_add() for small constant lengths that are a multiple of 16 bits
* uses ip_fast_csum() for other constant lengths that are a multiple of 32 bits
* uses __csum_partial() in all other cases

A sketch of the wrapper's dispatch logic appears below.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
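As a rough illustration of that dispatch, here is a hedged sketch: the real inline lives in arch/powerpc/include/asm/checksum.h and handles more sub-cases; csum_add(), ip_fast_csum_nofold() and __csum_partial() are the kernel helpers the message refers to, and the branch layout here is simplified.

	/* Hedged, simplified sketch of the wrapper; not the exact kernel code. */
	static inline __wsum csum_partial(const void *buff, int len, __wsum sum)
	{
		if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
			/* Small, even, constant length: this loop unrolls at
			 * compile time into a short chain of csum_add()s. */
			const u32 *p = buff;
			int i;

			for (i = 0; i < len / 4; i++)
				sum = csum_add(sum, (__force __wsum)p[i]);
			if (len & 2)	/* trailing 16-bit word, if any */
				sum = csum_add(sum, (__force __wsum)
					*(const u16 *)((const u8 *)buff + (len & ~3)));
			return sum;
		}
		if (__builtin_constant_p(len) && (len & 3) == 0)
			/* Constant multiple of 4 bytes: reuse the inlined
			 * IP-header checksum (unfolded variant). */
			return csum_add(sum, (__force __wsum)
					ip_fast_csum_nofold(buff, len >> 2));
		return __csum_partial(buff, len, sum);	/* general case */
	}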
2016-03-04 | powerpc: inline ip_fast_csum() | Christophe Leroy
In several architectures, ip_fast_csum() is inlined. There are functions like ip_send_check() which do little more than call ip_fast_csum(). Inlining ip_fast_csum() allows the compiler to optimise better.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: whitespace and cast fixes]
Signed-off-by: Scott Wood <oss@buserror.net>
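To illustrate what ip_fast_csum() computes, here is a self-contained, portable C model of the function's contract; the powerpc version is hand-tuned, so this is a reference sketch, not the kernel code:

	#include <stdint.h>

	/* Portable model of ip_fast_csum(): ones'-complement checksum of
	 * 'ihl' 32-bit words of an IP header. Illustrative only. */
	static inline uint16_t ip_fast_csum_model(const void *iph, unsigned int ihl)
	{
		const uint32_t *p = iph;
		uint64_t sum = 0;
		unsigned int i;

		for (i = 0; i < ihl; i++)
			sum += p[i];			/* 32-bit words, 64-bit accumulator */

		sum = (sum & 0xffffffff) + (sum >> 32);	/* fold 64 -> 32 bits */
		sum = (sum & 0xffffffff) + (sum >> 32);	/* fold possible carry */
		sum = (sum & 0xffff) + (sum >> 16);	/* fold 32 -> 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);	/* fold final carry */
		return (uint16_t)~sum;			/* ones' complement */
	}

A caller like ip_send_check() only zeroes the header's checksum field and stores this result, so once the function is inlined the compiler can fold the whole computation into the caller with no call overhead.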
2015-08-07 | powerpc: put csum_tcpudp_magic inline | LEROY Christophe
csum_tcpudp_magic() is only a few instructions and modifies very few registers, so it is not worth having it as a separate function and paying for the branch and the saving of volatile registers. This patch makes it inline by use of the already existing csum_tcpudp_nofold() function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
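The resulting inline is essentially a one-liner over the existing helpers. A hedged sketch of its shape, where csum_fold() and csum_tcpudp_nofold() are the existing kernel helpers the message refers to:

	/* Hedged sketch: fold the unfolded pseudo-header sum computed by
	 * the already existing csum_tcpudp_nofold() to the final 16 bits. */
	static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
						unsigned short len,
						unsigned short proto, __wsum sum)
	{
		return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
	}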
2013-10-03 | powerpc: Restore registers on error exit from csum_partial_copy_generic() | Paul E. McKenney
The csum_partial_copy_generic() function saves the PowerPC non-volatile r14, r15, and r16 registers for the main checksum-and-copy loop. Unfortunately, it fails to restore them upon error exit from this loop, which results in silent corruption of these registers in the presumably rare event of an access exception within that loop. This commit therefore restores these registers on error exit from the loop.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-03 | powerpc: Fix parameter clobber in csum_partial_copy_generic() | Paul E. McKenney
The csum_partial_copy_generic() function uses register r7 to adjust the remaining bytes to process. Unfortunately, r7 also holds a parameter, namely the address of the flag to set in case of access exceptions while reading the source buffer. Lacking a quantum implementation of PowerPC, this commit instead uses register r9 to do the adjusting, leaving r7's pointer uncorrupted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 | powerpc: Merge STK_REG/PARAM/FRAMESIZE | Michael Neuling
Merge the defines of STACKFRAMESIZE, STK_REG and STK_PARAM from different places.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-10 | powerpc: Fix usage of register macros getting ready for %r0 change | Michael Neuling
Anything that uses a constructed instruction (i.e. from ppc-opcode.h) needs to use the new R0 macro, as %r0 is not going to work. Also convert usages of macros where we are just determining an offset (usually for a load/store), like:

	std r14,STK_REG(r14)(r1)

We can't use STK_REG(r14) here, as %r14 doesn't work in the STK_REG macro, which is just calculating an offset. A sketch of the distinction follows below.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
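A hedged preprocessor sketch of why plain register-number macros are needed; the offset arithmetic below follows the shape of the kernel's ppc_asm.h definitions, but the exact frame-layout constants are an assumption here:

	/* Illustrative sketch, not the exact ppc_asm.h definitions.
	 * R14 is a plain number, so it can take part in arithmetic;
	 * %r14 is an assembler register name, so it cannot. */
	#define R14		14
	#define STK_REG(i)	(112 + ((i) - 14) * 8)	/* stack slot for rN, N >= 14 */

	/*
	 * STK_REG(%r14) would expand to (112 + (%r14 - 14) * 8): not a
	 * number, so the assembler rejects it as an offset. STK_REG(R14)
	 * expands to (112 + (14 - 14) * 8) == 112, so this assembles as
	 * intended:
	 *
	 *	std	r14,STK_REG(R14)(r1)
	 */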
2010-09-02 | powerpc: Optimise 64bit csum_partial_copy_generic and add csum_and_copy_from_user | Anton Blanchard
We use the same core loop as the new csum_partial, adding in the stores and exception handling code. To keep things simple we do all the exception fixup in csum_and_copy_from_user. This wrapper function is modelled on the generic checksum code and is careful to always calculate a complete checksum even if we only copied part of the data to userspace.

To test this I forced checksumming on over loopback and ran socklib (a simple TCP benchmark). On a POWER6 575, throughput improved by 19% with this patch. If I forced both the sender and receiver onto the same cpu (with the hope of shifting the benchmark from being cache bandwidth limited to cpu limited), adding this patch improved performance by 55%.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
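A hedged C sketch of what such a wrapper looks like, modelled on the generic checksum code as the message describes; the error-reporting protocol and helper names follow the kernel API of that era, and the body is simplified:

	/* Hedged sketch of the wrapper's shape; not the real powerpc code.
	 * On a fault while reading userspace, zero the uncopied tail so
	 * the checksum below still covers a well-defined buffer, and
	 * report the error through err_ptr. */
	static __wsum csum_and_copy_from_user_sketch(const void __user *src,
						     void *dst, int len,
						     __wsum sum, int *err_ptr)
	{
		unsigned long missing;

		missing = copy_from_user(dst, src, len);
		if (missing) {
			memset((char *)dst + len - missing, 0, missing);
			*err_ptr = -EFAULT;
		}

		/* Always checksum the complete destination buffer. */
		return csum_partial(dst, len, sum);
	}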
2010-09-02 | powerpc: Optimise 64bit csum_partial | Anton Blanchard
The main loop of csum_partial runs very slowly on recent POWER CPUs. After some analysis on both POWER6 and POWER7, I came up with the routine below. First we get the source aligned to a doubleword, ignoring any odd alignment to keep things simple. Then we do 64 bytes at a time, with an entry and exit limb of a further 64 bytes. On both POWER6 and POWER7 this should be as fast as we can go, since we are limited by the latency of the adde instructions.

To test this I forced checksumming on over loopback and ran socklib (a simple TCP benchmark). On a POWER6 575, throughput improved by 11% with this patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
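A self-contained C model of the loop structure described above: 64-bit additions whose carry-out is folded back into the running sum (the job adde does in hardware), consuming eight doublewords (64 bytes) per main-loop iteration. This is a reference model of the arithmetic, not the tuned assembly.

	#include <stddef.h>
	#include <stdint.h>

	/* Model of a carry-folding 64-bit add: what one adde contributes
	 * to the assembly's serial dependency chain. */
	static inline uint64_t add_with_carry(uint64_t sum, uint64_t v)
	{
		uint64_t r = sum + v;
		return r + (r < sum);	/* fold the carry-out back in */
	}

	/* Reference model: checksum n doublewords, 8 (64 bytes) per
	 * main-loop iteration. The assembly unrolls these eight adds so
	 * the adde latency is the only serial cost per iteration. */
	uint64_t csum_partial_model(const uint64_t *p, size_t n)
	{
		uint64_t sum = 0;
		size_t i = 0;

		for (; i + 8 <= n; i += 8) {
			sum = add_with_carry(sum, p[i + 0]);
			sum = add_with_carry(sum, p[i + 1]);
			sum = add_with_carry(sum, p[i + 2]);
			sum = add_with_carry(sum, p[i + 3]);
			sum = add_with_carry(sum, p[i + 4]);
			sum = add_with_carry(sum, p[i + 5]);
			sum = add_with_carry(sum, p[i + 6]);
			sum = add_with_carry(sum, p[i + 7]);
		}
		for (; i < n; i++)	/* exit limb: remaining doublewords */
			sum = add_with_carry(sum, p[i]);

		return sum;
	}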
2005-10-10 | powerpc: Rename files to have consistent _32/_64 suffixes | Paul Mackerras
This doesn't change any code, just renames things so we consistently have foo_32.c and foo_64.c where we have separate 32- and 64-bit versions.

Signed-off-by: Paul Mackerras <paulus@samba.org>