path: root/arch/arm64/lib/crc32.S
Age | Commit message | Author | Files | Lines
2022-01-31 | arm64: lib: accelerate crc32_be | Kevin Bracey | 1 | -14/+73
It makes no sense to leave crc32_be using the generic code while we only accelerate the little-endian ops.

Even though the big-endian form doesn't fit as smoothly into the arm64, we can speed it up and avoid hitting the D cache.

Tested on Cortex-A53. Without acceleration:

    crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
    crc32: self tests passed, processed 225944 bytes in 192240 nsec
    crc32c: CRC_LE_BITS = 64
    crc32c: self tests passed, processed 112972 bytes in 21360 nsec

With acceleration:

    crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
    crc32: self tests passed, processed 225944 bytes in 53480 nsec
    crc32c: CRC_LE_BITS = 64
    crc32c: self tests passed, processed 112972 bytes in 21480 nsec

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
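The commit message above does not spell out how the big-endian form is mapped onto hardware that is built around the little-endian (reflected) CRC. One way such a mapping can work is bit reversal: reverse the seed and each input byte, run the reflected update, then reverse the result. The sketch below shows that identity in portable C. The names `rev32`, `rev8`, `crc32_be_bitwise` and `crc32_be_via_le` are hypothetical and are not taken from the kernel source; the bit-at-a-time loops stand in for whatever the assembly actually does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the bit order of a 32-bit word. */
uint32_t rev32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Hypothetical helper: reverse the bit order of a single byte. */
unsigned char rev8(unsigned char b)
{
    return (unsigned char)(rev32(b) >> 24);
}

/* Bit-at-a-time big-endian (MSB-first) CRC32 update, no inversions. */
uint32_t crc32_be_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= (uint32_t)*p++ << 24;
        for (int i = 0; i < 8; i++)
            crc = (crc << 1) ^ ((crc & 0x80000000u) ? 0x04C11DB7u : 0);
    }
    return crc;
}

/*
 * Same result computed through the reflected (little-endian) update:
 * bit-reverse the seed and every input byte, then bit-reverse the output.
 * 0xEDB88320 is 0x04C11DB7 with its bits reversed.
 */
uint32_t crc32_be_via_le(uint32_t crc, const unsigned char *p, size_t len)
{
    uint32_t r = rev32(crc);

    while (len--) {
        r ^= rev8(*p++);
        for (int i = 0; i < 8; i++)
            r = (r >> 1) ^ ((r & 1) ? 0xEDB88320u : 0);
    }
    return rev32(r);
}
```

Both functions return the same value for any seed and buffer. The extra reversals are a plausible reading of why the big-endian form is described as fitting less smoothly.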
2020-04-28 | arm64: lib: Consistently enable crc32 extension | Mark Brown | 1 | -1/+1
Currently most of the assembly files that use architecture extensions enable them using the .arch directive but crc32.S uses .cpu instead. Move that over to .arch for consistency.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200414182843.31664-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08 | arm64: lib: Use modern annotations for assembly functions | Mark Brown | 1 | -4/+4
In an effort to clarify and simplify the annotation of assembly functions in the kernel new macros have been introduced. These replace ENTRY and ENDPROC and also add a new annotation for static functions which previously had no ENTRY equivalent. Update the annotations in the library code to the new macros.

Signed-off-by: Mark Brown <broonie@kernel.org>
[will: Use SYM_FUNC_START_WEAK_PI]
Signed-off-by: Will Deacon <will@kernel.org>
2019-06-19 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 | Thomas Gleixner | 1 | -4/+1
Based on 2 normalized pattern(s):

    this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation

    this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-30 | arm64/lib: improve CRC32 performance for deep pipelines | Ard Biesheuvel | 1 | -5/+49
Improve the performance of the crc32() asm routines by getting rid of most of the branches and small sized loads on the common path.

Instead, use a branchless code path involving overlapping 16 byte loads to process the first (length % 32) bytes, and process the remainder using a loop that processes 32 bytes at a time.

Tested using the following test program:

    #include <stdlib.h>

    extern void crc32_le(unsigned short, char const*, int);

    int main(void)
    {
      static const char buf[4096];

      srand(20181126);

      for (int i = 0; i < 100 * 1000 * 1000; i++)
        crc32_le(0, buf, rand() % 1024);

      return 0;
    }

On Cortex-A53 and Cortex-A57, the performance regresses but only very slightly. On Cortex-A72 however, the performance improves from

    $ time ./crc32

    real  0m10.149s
    user  0m10.149s
    sys   0m0.000s

to

    $ time ./crc32

    real  0m7.915s
    user  0m7.915s
    sys   0m0.000s

Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
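The head-then-blocks split described above relies on the CRC update being incremental: feeding the first (length % 32) bytes and then whole 32-byte blocks gives the same result as one pass over the buffer. The sketch below illustrates only that control flow in portable C. It does not reproduce the branchless overlapping 16-byte loads, which are specific to the assembly, and the names `crc32_le_bytes` and `crc32_le_split` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time little-endian CRC32 update, used here as a stand-in
 * for the hardware CRC32 instructions. No inversions are applied. */
uint32_t crc32_le_bytes(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
    return crc;
}

/* Process the first (len % 32) bytes, then the rest 32 bytes at a time. */
uint32_t crc32_le_split(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t head = len % 32;

    /* Odd-sized head first, so everything left is a multiple of 32. */
    crc = crc32_le_bytes(crc, p, head);
    p += head;
    len -= head;

    /* Main loop: fixed-size blocks with no per-block size checks. */
    for (; len != 0; len -= 32, p += 32)
        crc = crc32_le_bytes(crc, p, 32);

    return crc;
}
```

Handling the irregular part up front means the main loop has a single, predictable exit condition, which is the property the commit message ties to deeper pipelines.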
2018-09-10 | arm64/lib: add accelerated crc32 routines | Ard Biesheuvel | 1 | -0/+60
Unlike crc32c(), which is wired up to the crypto API internally so the optimal driver is selected based on the platform's capabilities, crc32_le() is implemented as a library function using a slice-by-8 table based C implementation. Even though few of the call sites may be bottlenecks, calling a time variant implementation with a non-negligible D-cache footprint is a bit of a waste, given that ARMv8.1 and up mandates support for the CRC32 instructions that were optional in ARMv8.0, but are already widely available, even on the Cortex-A53 based Raspberry Pi.

So implement routines that use these instructions if available, and fall back to the existing generic routines otherwise. The selection is based on alternatives patching.

Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE, this just codifies the status quo.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
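Whichever path is selected, the accelerated and generic routines have to return the same value. As a reference point, the sketch below is a minimal bit-at-a-time version of the little-endian CRC32 update. It is neither the slice-by-8 table code nor the instruction-based routine, and the name `crc32_le_bitwise` is hypothetical. Like the kernel's crc32_le(), it applies no initial or final inversion, so the caller supplies the seed and inverts the result where the protocol calls for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC-32 polynomial 0x04C11DB7. */
#define CRC32_POLY_LE 0xEDB88320u

/*
 * Bit-at-a-time little-endian CRC32 update.
 * Needs no lookup table, so it has no D-cache footprint, but it
 * runs eight shift/xor steps for every input byte.
 */
uint32_t crc32_le_bitwise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC32_POLY_LE : 0);
    }
    return crc;
}
```

For example, seeding with 0xFFFFFFFF over the ASCII string "123456789" and inverting the result gives the standard CRC-32 check value 0xCBF43926.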