summaryrefslogtreecommitdiffstats
path: root/arch/arm64/crypto/aes-neonbs-glue.c
AgeCommit message (Collapse)AuthorFilesLines
2022-02-05crypto: arm64/aes-neonbs-xts - use plain NEON for non-power-of-2 input sizesArd Biesheuvel1-16/+17
Even though the kernel's implementations of AES-XTS were updated to implement ciphertext stealing and can operate on inputs of any size larger than or equal to the AES block size, this feature is rarely used in practice. In fact, in the kernel, AES-XTS is only used to operate on 4096 or 512 byte blocks, which means that not only the ciphertext stealing is effectively dead code, the logic in the bit sliced NEON implementation to deal with fewer than 8 blocks at a time is also never used. Since the bit-sliced NEON driver already depends on the plain NEON version, which is slower but can operate on smaller data quantities more straightforwardly, let's fallback to the plain NEON implementation of XTS for any residual inputs that are not multiples of 128 bytes. This allows us to remove a lot of complicated logic that rarely gets exercised in practice. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-05crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunkArd Biesheuvel1-35/+29
Instead of processing the entire input with the 8-way bit sliced algorithm, which is sub-optimal for inputs that are not a multiple of 128 bytes in size, invoke the plain NEON version of CTR for the remainder of the input after processing the bulk using 128 byte strides. This allows us to greatly simplify the asm code that implements CTR, and get rid of all the branches and special code paths. It also gains us a couple of percent of performance. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-09-17crypto: arm64/aes-neonbs - stop using SIMD helper for skciphersArd Biesheuvel1-113/+9
Calls into the skcipher API can only occur from contexts where the SIMD unit is available, so there is no need for the SIMD helper. Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-20crypto: arm/neon - memzero_explicit aes-cbc keyTorsten Duwe1-0/+1
At function exit, do not leave the expanded key in the rk struct which got allocated on the stack. Signed-off-by: Torsten Duwe <duwe@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-11-01crypto: arm64/aes-neonbs - add return value of skcipher_walk_done() in ↵Yunfeng Ye1-1/+1
__xts_crypt() A warning is found by the static code analysis tool: "Identical condition 'err', second condition is always false" Fix this by adding return value of skcipher_walk_done(). Fixes: 67cfa5d3b721 ("crypto: arm64/aes-neonbs - implement ciphertext stealing for XTS") Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-09crypto: arm64/aes-neonbs - implement ciphertext stealing for XTSArd Biesheuvel1-14/+97
Update the AES-XTS implementation based on NEON instructions so that it can deal with inputs whose size is not a multiple of the cipher block size. This is part of the original XTS specification, but was never implemented before in the Linux kernel. Since the bit slicing driver is only faster if it can operate on at least 7 blocks of input at the same time, let's reuse the alternate path we are adding for CTS to process any data tail whose size is not a multiple of 128 bytes. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26crypto: aes - move sync ctr(aes) to AES library and generic helperArd Biesheuvel1-5/+16
In preparation of duplicating the sync ctr(aes) functionality to modules under arch/arm, move the helper function from a inline .h file to the AES library, which is already depended upon by the drivers that use this fallback. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26crypto: arm64/aes-neonbs - switch to library version of key expansion routineArd Biesheuvel1-4/+4
Switch to the new AES library that also provides an implementation of the AES key expansion routine. This removes the dependency on the generic AES cipher, allowing it to be omitted entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner1-4/+1
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-06Merge branch 'linus' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add support for AEAD in simd - Add fuzz testing to testmgr - Add panic_on_fail module parameter to testmgr - Use per-CPU struct instead multiple variables in scompress - Change verify API for akcipher Algorithms: - Convert x86 AEAD algorithms over to simd - Forbid 2-key 3DES in FIPS mode - Add EC-RDSA (GOST 34.10) algorithm Drivers: - Set output IV with ctr-aes in crypto4xx - Set output IV in rockchip - Fix potential length overflow with hashing in sun4i-ss - Fix computation error with ctr in vmx - Add SM4 protected keys support in ccree - Remove long-broken mxc-scc driver - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits) crypto: ccree - use a proper le32 type for le32 val crypto: ccree - remove set but not used variable 'du_size' crypto: ccree - Make cc_sec_disable static crypto: ccree - fix spelling mistake "protedcted" -> "protected" crypto: caam/qi2 - generate hash keys in-place crypto: caam/qi2 - fix DMA mapping of stack memory crypto: caam/qi2 - fix zero-length buffer DMA mapping crypto: stm32/cryp - update to return iv_out crypto: stm32/cryp - remove request mutex protection crypto: stm32/cryp - add weak key check for DES crypto: atmel - remove set but not used variable 'alg_name' crypto: picoxcell - Use dev_get_drvdata() crypto: crypto4xx - get rid of redundant using_sd variable crypto: crypto4xx - use sync skcipher for fallback crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues crypto: crypto4xx - fix ctr-aes missing output IV crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o crypto: ccree - handle tee fips error during power management resume crypto: ccree - add function to handle cryptocell tee fips error ...
2019-04-18crypto: arm64/aes-neonbs - don't access already-freed walk.ivEric Biggers1-0/+2
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free. xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and unconditionally accessing walk.iv has caused a real problem in other algorithms. Thus, update xts-aes-neonbs to start checking the return value of skcipher_walk_virt(). Fixes: 1abee99eafab ("crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64") Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-16arm64: HWCAP: add support for AT_HWCAP2Andrew Murray1-1/+1
As we will exhaust the first 32 bits of AT_HWCAP let's start exposing AT_HWCAP2 to userspace to give us up to 64 caps. Whilst it's possible to use the remaining 32 bits of AT_HWCAP, we prefer to expand into AT_HWCAP2 in order to provide a consistent view to userspace between ILP32 and LP64. However internal to the kernel we prefer to continue to use the full space of elf_hwcap. To reduce complexity and allow for future expansion, we now represent hwcaps in the kernel as ordinals and use a KERNEL_HWCAP_ prefix. This allows us to support automatic feature based module loading for all our hwcaps. We introduce cpu_set_feature to set hwcaps which complements the existing cpu_have_feature helper. These helpers allow us to clean up existing direct uses of elf_hwcap and reduce any future effort required to move beyond 64 caps. For convenience we also introduce cpu_{have,set}_named_feature which makes use of the cpu_feature macro to allow providing a hwcap name without a {KERNEL_}HWCAP_ prefix. Signed-off-by: Andrew Murray <andrew.murray@arm.com> [will: use const_ilog2() and tweak documentation] Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-03-22crypto: arm64 - convert to use crypto_simd_usable()Eric Biggers1-1/+1
Replace all calls to may_use_simd() in the arm64 crypto code with crypto_simd_usable(), in order to allow testing the no-SIMD code paths. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-16crypto: arm64/aes-bs - move kernel mode neon en/disable into loopArd Biesheuvel1-19/+17
When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-16crypto: arm64/aes-blk - move kernel mode neon en/disable into loopArd Biesheuvel1-8/+6
When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Note that this requires some reshuffling of the registers in the asm code, because the XTS routines can no longer rely on the registers to retain their contents between invocations. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTRArd Biesheuvel1-5/+43
Of the various chaining modes implemented by the bit sliced AES driver, only CTR is exposed as a synchronous cipher, and requires a fallback in order to remain usable once we update the kernel mode NEON handling logic to disallow nested use. So wire up the existing CTR fallback C code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04crypto: algapi - make crypto_xor() take separate dst and src argumentsArd Biesheuvel1-3/+2
There are quite a number of occurrences in the kernel of the pattern if (dst != src) memcpy(dst, src, walk.total % AES_BLOCK_SIZE); crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE); or crypto_xor(keystream, src, nbytes); memcpy(dst, keystream, nbytes); where crypto_xor() is preceded or followed by a memcpy() invocation that is only there because crypto_xor() uses its output parameter as one of the inputs. To avoid having to add new instances of this pattern in the arm64 code, which will be refactored to implement non-SIMD fallbacks, add an alternative implementation called crypto_xor_cpy(), taking separate input and output arguments. This removes the need for the separate memcpy(). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes - don't use IV buffer to return final keystream blockArd Biesheuvel1-4/+5
The arm64 bit sliced AES core code uses the IV buffer to pass the final keystream block back to the glue code if the input is not a multiple of the block size, so that the asm code does not have to deal with anything except 16 byte blocks. This is done under the assumption that the outgoing IV is meaningless anyway in this case, given that chaining is no longer possible under these circumstances. However, as it turns out, the CCM driver does expect the IV to retain a value that is equal to the original IV except for the counter value, and even interprets byte zero as a length indicator, which may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes - replace scalar fallback with plain NEON fallbackArd Biesheuvel1-10/+28
The new bitsliced NEON implementation of AES uses a fallback in two places: CBC encryption (which is strictly sequential, whereas this driver can only operate efficiently on 8 blocks at a time), and the XTS tweak generation, which involves encrypting a single AES block with a different key schedule. The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback, given that it is faster than scalar on low end cores (which is what the NEON implementations target, since high end cores have dedicated instructions for AES), and shows similar behavior in terms of D-cache footprint and sensitivity to cache timing attacks. So switch the fallback handling to the plain NEON driver. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64Ard Biesheuvel1-0/+420
This is a reimplementation of the NEON version of the bit-sliced AES algorithm. This code is heavily based on Andy Polyakov's OpenSSL version for ARM, which is also available in the kernel. This is an alternative for the existing NEON implementation for arm64 authored by me, which suffers from poor performance due to its reliance on the pathologically slow four register variant of the tbl/tbx NEON instruction. This version is about ~30% (*) faster than the generic C code, but only in cases where the input can be 8x interleaved (this is a fundamental property of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR are implemented. (The significance of ECB is that it could potentially be used by other chaining modes) * Measured on Cortex-A57. Note that this is still an order of magnitude slower than the implementations that use the dedicated AES instructions introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>