summaryrefslogtreecommitdiffstats
path: root/arch/x86/crypto
AgeCommit message (Collapse)AuthorFilesLines
2021-04-28Merge tag 'objtool-core-2021-04-28' of ↵Linus Torvalds10-110/+85
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Standardize the crypto asm code so that it looks like compiler- generated code to objtool - so that it can understand it. This enables unwinding from crypto asm code - and also fixes the last known remaining objtool warnings for LTO and more. - x86 decoder fixes: clean up and fix the decoder, and also extend it a bit - Misc fixes and cleanups * tag 'objtool-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/crypto: Enable objtool in crypto code x86/crypto/sha512-ssse3: Standardize stack alignment prologue x86/crypto/sha512-avx2: Standardize stack alignment prologue x86/crypto/sha512-avx: Standardize stack alignment prologue x86/crypto/sha256-avx2: Standardize stack alignment prologue x86/crypto/sha1_avx2: Standardize stack alignment prologue x86/crypto/sha_ni: Standardize stack alignment prologue x86/crypto/crc32c-pcl-intel: Standardize jump table x86/crypto/camellia-aesni-avx2: Unconditionally allocate stack buffer x86/crypto/aesni-intel_avx: Standardize stack alignment prologue x86/crypto/aesni-intel_avx: Fix register usage comments x86/crypto/aesni-intel_avx: Remove unused macros objtool: Support asm jump tables objtool: Parse options from OBJTOOL_ARGS objtool: Collate parse_options() users objtool: Add --backup objtool,x86: More ModRM sugar objtool,x86: Rewrite ADD/SUB/AND objtool,x86: Support %riz encodings objtool,x86: Simplify register decode ...
2021-04-26Merge tag 'x86_cleanups_for_v5.13' of ↵Linus Torvalds4-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Borislav Petkov: "Trivial cleanups and fixes all over the place" * tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove me from IDE/ATAPI section x86/pat: Do not compile stubbed functions when X86_PAT is off x86/asm: Ensure asm/proto.h can be included stand-alone x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files x86/msr: Make locally used functions static x86/cacheinfo: Remove unneeded dead-store initialization x86/process/64: Move cpu_current_top_of_stack out of TSS tools/turbostat: Unmark non-kernel-doc comment x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL() x86/fpu/math-emu: Fix function cast warning x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes x86: Fix various typos in comments, take #2 x86: Remove unusual Unicode characters from comments x86/kaslr: Return boolean values from a function returning bool x86: Fix various typos in comments x86/setup: Remove unused RESERVE_BRK_ARRAY() stacktrace: Move documentation for arch_stack_walk_reliable() to header x86: Remove duplicate TSC DEADLINE MSR definitions
2021-04-19x86/crypto: Enable objtool in crypto codeJosh Poimboeuf1-2/+0
Now that all the stack alignment prologues have been cleaned up in the crypto code, enable objtool. Among other benefits, this will allow ORC unwinding to work. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/fc2a1918c50e33e46ef0e9a5de02743f2f6e3639.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha512-ssse3: Standardize stack alignment prologueJosh Poimboeuf1-22/+19
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/6ecaaac9f3828fbb903513bf90c34a08380a8e35.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha512-avx2: Standardize stack alignment prologueJosh Poimboeuf1-22/+20
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/b1a7b29fcfc65d60a3b6e77ef75f4762a5b8488d.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha512-avx: Standardize stack alignment prologueJosh Poimboeuf1-22/+19
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/d36e9ea1c819d87fa89b3df3fa83e2a1ede18146.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha256-avx2: Standardize stack alignment prologueJosh Poimboeuf1-7/+6
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/8048e7444c49a8137f05265262b83dc50f8fb7f3.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha1_avx2: Standardize stack alignment prologueJosh Poimboeuf1-4/+4
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/fdaaf8670ed1f52f55ba9a6bbac98c1afddc1af6.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha_ni: Standardize stack alignment prologueJosh Poimboeuf1-4/+4
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/5033e1a79867dff1b18e1b4d0783c38897d3f223.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/crc32c-pcl-intel: Standardize jump tableJosh Poimboeuf1-5/+2
Simplify the jump table code so that it resembles a compiler-generated table. This enables ORC unwinding by allowing objtool to follow all the potential code paths. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/5357a039def90b8ef6b5874ef12cda008ecf18ba.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/camellia-aesni-avx2: Unconditionally allocate stack bufferJosh Poimboeuf1-3/+2
A conditional stack allocation violates traditional unwinding requirements when a single instruction can have differing stack layouts. There's no benefit in allocating the stack buffer conditionally. Just do it unconditionally. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/85ac96613ee5784b6239c18d3f68b1f3c509caa3.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/aesni-intel_avx: Standardize stack alignment prologueJosh Poimboeuf1-6/+4
Use RBP instead of R14 for saving the old stack pointer before realignment. This resembles what compilers normally do. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/02d00a0903a0959f4787e186e2a07d271e1f63d4.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/aesni-intel_avx: Fix register usage commentsJosh Poimboeuf1-5/+5
Fix register usage comments to match reality. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/8655d4513a0ed1eddec609165064153973010aa2.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/aesni-intel_avx: Remove unused macrosJosh Poimboeuf1-8/+0
These macros are no longer used; remove them. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/53f7136ea93ebdbca399959e6d2991ecb46e733e.1614182415.git.jpoimboe@redhat.com
2021-04-02crypto: poly1305 - fix poly1305_core_setkey() declarationArnd Bergmann1-3/+3
gcc-11 points out a mismatch between the declaration and the definition of poly1305_core_setkey(): lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=] 13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16]) | ~~~~~~~~~^~~~~~~~~~~ In file included from lib/crypto/poly1305-donna32.c:11: include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’} 21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key); This is harmless in principle, as the calling conventions are the same, but the more specific prototype allows better type checking in the caller. Change the declaration to match the actual function definition. The poly1305_simd_init() is a bit suspicious here, as it previously had a 32-byte argument type, but looks like it needs to take the 16-byte POLY1305_BLOCK_SIZE array instead. Fixes: 1c08a104360f ("crypto: poly1305 - add new 32 and 64-bit generic versions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-21x86: Fix various typos in comments, take #2Ingo Molnar2-2/+2
Fix another ~42 single-word typos in arch/x86/ code comments, missed a few in the first pass, in particular in .S files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-18x86: Fix various typos in commentsIngo Molnar2-4/+4
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-01-22crypto: aesni - release FPU during skcipher walk API callsArd Biesheuvel1-41/+32
Taking ownership of the FPU in kernel mode disables preemption, and this may result in excessive scheduling blackouts if the size of the data being processed on the FPU is unbounded. Given that taking and releasing the FPU is cheap these days on x86, we can limit the impact of this issue easily for skcipher implementations, by moving the FPU begin/end calls inside the skcipher walk processing loop. Considering that skcipher walks operate on at most one page at a time, doing so fully mitigates this issue. This also permits the skcipher walk logic to use non-atomic kmalloc() calls etc so we can change the 'atomic' bool argument in the calls to skcipher_walk_virt() to false as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-22crypto: aesni - replace CTR function pointer with static callArd Biesheuvel1-6/+7
Indirect calls are very expensive on x86, so use a static call to set the system-wide AES-NI/CTR asm helper. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86 - use local headers for x86 specific shared declarationsArd Biesheuvel12-8/+177
The Camellia, Serpent and Twofish related header files only contain declarations that are shared between different implementations of the respective algorithms residing under arch/x86/crypto, and none of their contents should be used elsewhere. So move the header files into the same location, and use local #includes instead. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86 - remove glue helper moduleArd Biesheuvel2-157/+0
All dependencies on the x86 glue helper module have been replaced by local instantiations of the new ECB/CBC preprocessor helper macros, so the glue helper module can be retired. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/twofish - drop dependency on glue helperArd Biesheuvel2-109/+44
Replace the glue helper dependency with implementations of ECB and CBC based on the new CPP macros, which avoid the need for indirect calls. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/cast6 - drop dependency on glue helperArd Biesheuvel1-44/+17
Replace the glue helper dependency with implementations of ECB and CBC based on the new CPP macros, which avoid the need for indirect calls. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/cast5 - drop dependency on glue helperArd Biesheuvel1-167/+17
Replace the glue helper dependency with implementations of ECB and CBC based on the new CPP macros, which avoid the need for indirect calls. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/serpent - drop dependency on glue helperArd Biesheuvel3-154/+61
Replace the glue helper dependency with implementations of ECB and CBC based on the new CPP macros, which avoid the need for indirect calls. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/camellia - drop dependency on glue helperArd Biesheuvel3-166/+67
Replace the glue helper dependency with implementations of ECB and CBC based on the new CPP macros, which avoid the need for indirect calls. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86 - add some helper macros for ECB and CBC modesArd Biesheuvel1-0/+76
The x86 glue helper module is starting to show its age: - It relies heavily on function pointers to invoke asm helper functions that operate on fixed input sizes that are relatively small. This means the performance is severely impacted by retpolines. - It goes to great lengths to amortize the cost of kernel_fpu_begin()/end() over as much work as possible, which is no longer necessary now that FPU save/restore is done lazily, and doing so may cause unbounded scheduling blackouts due to the fact that enabling the FPU in kernel mode disables preemption. - The CBC mode decryption helper makes backward strides through the input, in order to avoid a single block size memcpy() between chunks. Consuming the input in this manner is highly likely to defeat any hardware prefetchers, so it is better to go through the data linearly, and perform the extra memcpy() where needed (which is turned into direct loads and stores by the compiler anyway). Note that benchmarks won't show this effect, given that the memory they use is always cache hot. - It implements blockwise XOR in terms of le128 pointers, which imply an alignment that is not guaranteed by the API, violating the C standard. GCC does not seem to be smart enough to elide the indirect calls when the function pointers are passed as arguments to static inline helper routines modeled after the existing ones. So instead, let's create some CPP macros that encapsulate the core of the ECB and CBC processing, so we can wire them up for existing users of the glue helper module, i.e., Camellia, Serpent, Twofish and CAST6. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/blowfish - drop CTR mode implementationArd Biesheuvel1-107/+0
Blowfish in counter mode is never used in the kernel, so there is no point in keeping an accelerated implementation around. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/des - drop CTR mode implementationArd Biesheuvel1-104/+0
DES or Triple DES in counter mode is never used in the kernel, so there is no point in keeping an accelerated implementation around. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/glue-helper - drop CTR helper routinesArd Biesheuvel3-175/+0
The glue helper's CTR routines are no longer used, so drop them. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/twofish - drop CTR mode implementationArd Biesheuvel3-143/+0
Twofish in CTR mode is never used by the kernel directly, and is highly unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop the accelerated CTR mode implementation, and instead, rely on the CTR template and the bare cipher. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/cast6 - drop CTR mode implementationArd Biesheuvel2-76/+0
CAST6 in CTR mode is never used by the kernel directly, and is highly unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop the accelerated CTR mode implementation, and instead, rely on the CTR template and the bare cipher. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/cast5 - drop CTR mode implementationArd Biesheuvel1-103/+0
CAST5 in CTR mode is never used by the kernel directly, and is highly unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop the accelerated CTR mode implementation, and instead, rely on the CTR template and the bare cipher. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/serpent - drop CTR mode implementationArd Biesheuvel5-201/+0
Serpent in CTR mode is never used by the kernel directly, and is highly unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop the accelerated CTR mode implementation, and instead, rely on the CTR template and the bare cipher. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/camellia - drop CTR mode implementationArd Biesheuvel5-410/+0
Camellia in CTR mode is never used by the kernel directly, and is highly unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop the accelerated CTR mode implementation, and instead, rely on the CTR template and the bare cipher. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/glue-helper - drop XTS helper routinesArd Biesheuvel3-291/+0
The glue helper's XTS routines are no longer used, so drop them. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/twofish - switch to XTS templateArd Biesheuvel2-151/+0
Now that the XTS template can wrap accelerated ECB modes, it can be used to implement Twofish in XTS mode as well, which turns out to be at least as fast, and sometimes even faster Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/serpent- switch to XTS templateArd Biesheuvel4-283/+0
Now that the XTS template can wrap accelerated ECB modes, it can be used to implement Serpent in XTS mode as well, which turns out to be at least as fast, and sometimes even faster Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/cast6 - switch to XTS templateArd Biesheuvel2-154/+0
Now that the XTS template can wrap accelerated ECB modes, it can be used to implement CAST6 in XTS mode as well, which turns out to be at least as fast, and sometimes even faster Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: x86/camellia - switch to XTS templateArd Biesheuvel4-558/+1
Now that the XTS template can wrap accelerated ECB modes, it can be used to implement Camellia in XTS mode as well, which turns out to be at least as fast, and sometimes even faster. Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: aesni - replace function pointers with static branchesArd Biesheuvel1-44/+54
Replace the function pointers in the GCM implementation with static branches, which are based on code patching, which occurs only at module load time. This avoids the severe performance penalty caused by the use of retpolines. In order to retain the ability to switch between different versions of the implementation based on the input size on cores that support AVX and AVX2, use static branches instead of static calls. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: aesni - refactor scatterlist processingArd Biesheuvel1-83/+56
Currently, the gcm(aes-ni) driver open codes the scatterlist handling that is encapsulated by the skcipher walk API. So let's switch to that instead. Also, move the handling at the end of gcmaes_crypt_by_sg() that is dependent on whether we are encrypting or decrypting into the callers, which always do one or the other. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: aesni - clean up mapping of associated dataArd Biesheuvel1-4/+5
The gcm(aes-ni) driver is only built for x86_64, which does not make use of highmem. So testing for PageHighMem is pointless and can be omitted. While at it, replace GFP_ATOMIC with the appropriate runtime decided value based on the context. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: aesni - drop unused asm prototypesArd Biesheuvel1-67/+0
Drop some prototypes that are declared but never called. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-14crypto: aesni - prevent misaligned buffers on the stackArd Biesheuvel1-12/+16
The GCM mode driver uses 16 byte aligned buffers on the stack to pass the IV to the asm helpers, but unfortunately, the x86 port does not guarantee that the stack pointer is 16 byte aligned upon entry in the first place. Since the compiler is not aware of this, it will not emit the additional stack realignment sequence that is needed, and so the alignment is not guaranteed to be more than 8 bytes. So instead, allocate some padding on the stack, and realign the IV pointer by hand. Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-08crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helperArd Biesheuvel2-144/+356
The AES-NI driver implements XTS via the glue helper, which consumes a struct with sets of function pointers which are invoked on chunks of input data of the appropriate size, as annotated in the struct. Let's get rid of this indirection, so that we can perform direct calls to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e., provide a helper which can consume inputs of any size, provided that the penultimate, full block is passed via the last call if ciphertext stealing needs to be applied. This also allows us to enable the XTS mode for i386. Tested-by: Eric Biggers <ebiggers@google.com> # x86_64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-08crypto: x86/aes-ni-xts - use direct calls to and 4-way strideArd Biesheuvel2-56/+84
The XTS asm helper arrangement is a bit odd: the 8-way stride helper consists of back-to-back calls to the 4-way core transforms, which are called indirectly, based on a boolean that indicates whether we are performing encryption or decryption. Given how costly indirect calls are on x86, let's switch to direct calls, and given how the 8-way stride doesn't really add anything substantial, use a 4-way stride instead, and make the asm core routine deal with any multiple of 4 blocks. Since 512 byte sectors or 4 KB blocks are the typical quantities XTS operates on, increase the stride exported to the glue helper to 512 bytes as well. As a result, the number of indirect calls is reduced from 3 per 64 bytes of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup when operating on 1 KB blocks (measured on a Intel(R) Core(TM) i7-8650U CPU) Fixes: 9697fa39efd3f ("x86/retpoline/crypto: Convert crypto assembler indirect jumps") Tested-by: Eric Biggers <ebiggers@google.com> # x86_64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: blake2s - share the "shash" API boilerplate codeEric Biggers1-67/+7
Add helper functions for shash implementations of BLAKE2s to include/crypto/internal/blake2s.h, taking advantage of __blake2s_update() and __blake2s_final() that were added by the previous patch to share more code between the library and shash implementations. crypto_blake2s_setkey() and crypto_blake2s_init() are usable as shash_alg::setkey and shash_alg::init directly, while crypto_blake2s_update() and crypto_blake2s_final() take an extra 'blake2s_compress_t' function pointer parameter. This allows the implementation of the compression function to be overridden, which is the only part that optimized implementations really care about. The new functions are inline functions (similar to those in sha1_base.h, sha256_base.h, and sm3_base.h) because this avoids needing to add a new module blake2s_helpers.ko, they aren't *too* long, and this avoids indirect calls which are expensive these days. Note that they can't go in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency. Finally, use these new helper functions in the x86 implementation of BLAKE2s. (This part should be a separate patch, but unfortunately the x86 implementation used the exact same function names like "crypto_blake2s_update()", so it had to be updated at the same time.) Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: x86/blake2s - define shash_alg structs using macrosEric Biggers1-61/+23
The shash_alg structs for the four variants of BLAKE2s are identical except for the algorithm name, driver name, and digest size. So, avoid code duplication by using a macro to define these structs. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: aesni - implement support for cts(cbc(aes))Ard Biesheuvel2-1/+261
Follow the same approach as the arm64 driver for implementing a version of AES-NI in CBC mode that supports ciphertext stealing. This results in a ~2x speed increase for relatively short inputs (less than 256 bytes), which is relevant given that AES-CBC with ciphertext stealing is used for filename encryption in the fscrypt layer. For larger inputs, the speedup is still significant (~25% on decryption, ~6% on encryption) Tested-by: Eric Biggers <ebiggers@google.com> # x86_64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>