diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-03-20 18:23:21 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-03-20 18:23:21 -0700 |
commit | 26660a4046b171a752e72a1dd32153230234fe3a (patch) | |
tree | 1389db3db2130e1082250fa0ab1e8684f7e31f39 /arch/x86/crypto | |
parent | 46e595a17dcf11404f713845ecb5b06b92a94e43 (diff) | |
parent | 1bcb58a099938c33acda78b212ed67b06b3359ef (diff) | |
download | linux-26660a4046b171a752e72a1dd32153230234fe3a.tar.bz2 |
Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull 'objtool' stack frame validation from Ingo Molnar:
"This tree adds a new kernel build-time object file validation feature
(ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
It was written by and is maintained by Josh Poimboeuf.
The motivation: there's a category of hard to find kernel bugs, most
of them in assembly code (but also occasionally in C code), that
degrades the quality of kernel stack dumps/backtraces. These bugs are
hard to detect at the source code level. Such bugs result in
incorrect/incomplete backtraces most of time - but can also in some
rare cases result in crashes or other undefined behavior.
The build time correctness checking is done via the new 'objtool'
user-space utility that was written for this purpose and which is
hosted in the kernel repository in tools/objtool/. The tool's (very
simple) UI and source code design is shaped after Git and perf and
shares quite a bit of infrastructure with tools/perf (which tooling
infrastructure sharing effort got merged via perf and is already
upstream). Objtool follows the well-known kernel coding style.
Objtool does not try to check .c or .S files, it instead analyzes the
resulting .o generated machine code from first principles: it decodes
the instruction stream and interprets it. (Right now objtool supports
the x86-64 architecture.)
From tools/objtool/Documentation/stack-validation.txt:
"The kernel CONFIG_STACK_VALIDATION option enables a host tool named
objtool which runs at compile time. It has a "check" subcommand
which analyzes every .o file and ensures the validity of its stack
metadata. It enforces a set of rules on asm code and C inline
assembly code so that stack traces can be reliable.
Currently it only checks frame pointer usage, but there are plans to
add CFI validation for C files and CFI generation for asm files.
For each function, it recursively follows all possible code paths
and validates the correct frame pointer state at each instruction.
It also follows code paths involving special sections, like
.altinstructions, __jump_table, and __ex_table, which can add
alternative execution paths to a given instruction (or set of
instructions). Similarly, it knows how to follow switch statements,
for which gcc sometimes uses jump tables."
When this new kernel option is enabled (it's disabled by default), the
tool, if it finds any suspicious assembly code pattern, outputs
warnings in compiler warning format:
warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer
... so that scripts that pick up compiler warnings will notice them.
All known warnings triggered by the tool are fixed by the tree, most
of the commits in fact prepare the kernel to be warning-free. Most of
them are bugfixes or cleanups that stand on their own, but there are
also some annotations of 'special' stack frames for justified cases
such entries to JIT-ed code (BPF) or really special boot time code.
There are two other long-term motivations behind this tool as well:
- To improve the quality and reliability of kernel stack frames, so
that they can be used for optimized live patching.
- To create independent infrastructure to check the correctness of
CFI stack frames at build time. CFI debuginfo is notoriously
unreliable and we cannot use it in the kernel as-is without extra
checking done both on the kernel side and on the build side.
The quality of kernel stack frames matters to debuggability as well,
so IMO we can merge this without having to consider the live patching
or CFI debuginfo angle"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
objtool: Only print one warning per function
objtool: Add several performance improvements
tools: Copy hashtable.h into tools directory
objtool: Fix false positive warnings for functions with multiple switch statements
objtool: Rename some variables and functions
objtool: Remove superflous INIT_LIST_HEAD
objtool: Add helper macros for traversing instructions
objtool: Fix false positive warnings related to sibling calls
objtool: Compile with debugging symbols
objtool: Detect infinite recursion
objtool: Prevent infinite recursion in noreturn detection
objtool: Detect and warn if libelf is missing and don't break the build
tools: Support relative directory path for 'O='
objtool: Support CROSS_COMPILE
x86/asm/decoder: Use explicitly signed chars
objtool: Enable stack metadata validation on 64-bit x86
objtool: Add CONFIG_STACK_VALIDATION option
objtool: Add tool to perform compile-time stack metadata validation
x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
sched: Always inline context_switch()
...
Diffstat (limited to 'arch/x86/crypto')
-rw-r--r-- | arch/x86/crypto/aesni-intel_asm.S | 75 | ||||
-rw-r--r-- | arch/x86/crypto/camellia-aesni-avx-asm_64.S | 15 | ||||
-rw-r--r-- | arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 15 | ||||
-rw-r--r-- | arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 9 | ||||
-rw-r--r-- | arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 13 | ||||
-rw-r--r-- | arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 8 | ||||
-rw-r--r-- | arch/x86/crypto/ghash-clmulni-intel_asm.S | 5 | ||||
-rw-r--r-- | arch/x86/crypto/serpent-avx-x86_64-asm_64.S | 13 | ||||
-rw-r--r-- | arch/x86/crypto/serpent-avx2-asm_64.S | 13 | ||||
-rw-r--r-- | arch/x86/crypto/sha-mb/sha1_mb_mgr_flush_avx2.S | 35 | ||||
-rw-r--r-- | arch/x86/crypto/sha-mb/sha1_mb_mgr_submit_avx2.S | 36 | ||||
-rw-r--r-- | arch/x86/crypto/twofish-avx-x86_64-asm_64.S | 13 |
12 files changed, 162 insertions, 88 deletions
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index 6bd2c6c95373..383a6f84a060 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -31,6 +31,7 @@ #include <linux/linkage.h> #include <asm/inst.h> +#include <asm/frame.h> /* * The following macros are used to move an (un)aligned 16 byte value to/from @@ -1800,11 +1801,12 @@ ENDPROC(_key_expansion_256b) * unsigned int key_len) */ ENTRY(aesni_set_key) + FRAME_BEGIN #ifndef __x86_64__ pushl KEYP - movl 8(%esp), KEYP # ctx - movl 12(%esp), UKEYP # in_key - movl 16(%esp), %edx # key_len + movl (FRAME_OFFSET+8)(%esp), KEYP # ctx + movl (FRAME_OFFSET+12)(%esp), UKEYP # in_key + movl (FRAME_OFFSET+16)(%esp), %edx # key_len #endif movups (UKEYP), %xmm0 # user key (first 16 bytes) movaps %xmm0, (KEYP) @@ -1905,6 +1907,7 @@ ENTRY(aesni_set_key) #ifndef __x86_64__ popl KEYP #endif + FRAME_END ret ENDPROC(aesni_set_key) @@ -1912,12 +1915,13 @@ ENDPROC(aesni_set_key) * void aesni_enc(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src) */ ENTRY(aesni_enc) + FRAME_BEGIN #ifndef __x86_64__ pushl KEYP pushl KLEN - movl 12(%esp), KEYP - movl 16(%esp), OUTP - movl 20(%esp), INP + movl (FRAME_OFFSET+12)(%esp), KEYP # ctx + movl (FRAME_OFFSET+16)(%esp), OUTP # dst + movl (FRAME_OFFSET+20)(%esp), INP # src #endif movl 480(KEYP), KLEN # key length movups (INP), STATE # input @@ -1927,6 +1931,7 @@ ENTRY(aesni_enc) popl KLEN popl KEYP #endif + FRAME_END ret ENDPROC(aesni_enc) @@ -2101,12 +2106,13 @@ ENDPROC(_aesni_enc4) * void aesni_dec (struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src) */ ENTRY(aesni_dec) + FRAME_BEGIN #ifndef __x86_64__ pushl KEYP pushl KLEN - movl 12(%esp), KEYP - movl 16(%esp), OUTP - movl 20(%esp), INP + movl (FRAME_OFFSET+12)(%esp), KEYP # ctx + movl (FRAME_OFFSET+16)(%esp), OUTP # dst + movl (FRAME_OFFSET+20)(%esp), INP # src #endif mov 480(KEYP), KLEN # key length add $240, KEYP @@ -2117,6 +2123,7 @@ ENTRY(aesni_dec) popl KLEN popl KEYP #endif + FRAME_END ret ENDPROC(aesni_dec) @@ -2292,14 +2299,15 @@ ENDPROC(_aesni_dec4) * size_t len) */ ENTRY(aesni_ecb_enc) + FRAME_BEGIN #ifndef __x86_64__ pushl LEN pushl KEYP pushl KLEN - movl 16(%esp), KEYP - movl 20(%esp), OUTP - movl 24(%esp), INP - movl 28(%esp), LEN + movl (FRAME_OFFSET+16)(%esp), KEYP # ctx + movl (FRAME_OFFSET+20)(%esp), OUTP # dst + movl (FRAME_OFFSET+24)(%esp), INP # src + movl (FRAME_OFFSET+28)(%esp), LEN # len #endif test LEN, LEN # check length jz .Lecb_enc_ret @@ -2342,6 +2350,7 @@ ENTRY(aesni_ecb_enc) popl KEYP popl LEN #endif + FRAME_END ret ENDPROC(aesni_ecb_enc) @@ -2350,14 +2359,15 @@ ENDPROC(aesni_ecb_enc) * size_t len); */ ENTRY(aesni_ecb_dec) + FRAME_BEGIN #ifndef __x86_64__ pushl LEN pushl KEYP pushl KLEN - movl 16(%esp), KEYP - movl 20(%esp), OUTP - movl 24(%esp), INP - movl 28(%esp), LEN + movl (FRAME_OFFSET+16)(%esp), KEYP # ctx + movl (FRAME_OFFSET+20)(%esp), OUTP # dst + movl (FRAME_OFFSET+24)(%esp), INP # src + movl (FRAME_OFFSET+28)(%esp), LEN # len #endif test LEN, LEN jz .Lecb_dec_ret @@ -2401,6 +2411,7 @@ ENTRY(aesni_ecb_dec) popl KEYP popl LEN #endif + FRAME_END ret ENDPROC(aesni_ecb_dec) @@ -2409,16 +2420,17 @@ ENDPROC(aesni_ecb_dec) * size_t len, u8 *iv) */ ENTRY(aesni_cbc_enc) + FRAME_BEGIN #ifndef __x86_64__ pushl IVP pushl LEN pushl KEYP pushl KLEN - movl 20(%esp), KEYP - movl 24(%esp), OUTP - movl 28(%esp), INP - movl 32(%esp), LEN - movl 36(%esp), IVP + movl (FRAME_OFFSET+20)(%esp), KEYP # ctx + movl (FRAME_OFFSET+24)(%esp), OUTP # dst + movl (FRAME_OFFSET+28)(%esp), INP # src + movl (FRAME_OFFSET+32)(%esp), LEN # len + movl (FRAME_OFFSET+36)(%esp), IVP # iv #endif cmp $16, LEN jb .Lcbc_enc_ret @@ -2443,6 +2455,7 @@ ENTRY(aesni_cbc_enc) popl LEN popl IVP #endif + FRAME_END ret ENDPROC(aesni_cbc_enc) @@ -2451,16 +2464,17 @@ ENDPROC(aesni_cbc_enc) * size_t len, u8 *iv) */ ENTRY(aesni_cbc_dec) + FRAME_BEGIN #ifndef __x86_64__ pushl IVP pushl LEN pushl KEYP pushl KLEN - movl 20(%esp), KEYP - movl 24(%esp), OUTP - movl 28(%esp), INP - movl 32(%esp), LEN - movl 36(%esp), IVP + movl (FRAME_OFFSET+20)(%esp), KEYP # ctx + movl (FRAME_OFFSET+24)(%esp), OUTP # dst + movl (FRAME_OFFSET+28)(%esp), INP # src + movl (FRAME_OFFSET+32)(%esp), LEN # len + movl (FRAME_OFFSET+36)(%esp), IVP # iv #endif cmp $16, LEN jb .Lcbc_dec_just_ret @@ -2534,13 +2548,16 @@ ENTRY(aesni_cbc_dec) popl LEN popl IVP #endif + FRAME_END ret ENDPROC(aesni_cbc_dec) #ifdef __x86_64__ +.pushsection .rodata .align 16 .Lbswap_mask: .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 +.popsection /* * _aesni_inc_init: internal ABI @@ -2598,6 +2615,7 @@ ENDPROC(_aesni_inc) * size_t len, u8 *iv) */ ENTRY(aesni_ctr_enc) + FRAME_BEGIN cmp $16, LEN jb .Lctr_enc_just_ret mov 480(KEYP), KLEN @@ -2651,6 +2669,7 @@ ENTRY(aesni_ctr_enc) .Lctr_enc_ret: movups IV, (IVP) .Lctr_enc_just_ret: + FRAME_END ret ENDPROC(aesni_ctr_enc) @@ -2677,6 +2696,7 @@ ENDPROC(aesni_ctr_enc) * bool enc, u8 *iv) */ ENTRY(aesni_xts_crypt8) + FRAME_BEGIN cmpb $0, %cl movl $0, %ecx movl $240, %r10d @@ -2777,6 +2797,7 @@ ENTRY(aesni_xts_crypt8) pxor INC, STATE4 movdqu STATE4, 0x70(OUTP) + FRAME_END ret ENDPROC(aesni_xts_crypt8) diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S index ce71f9212409..aa9e8bd163f6 100644 --- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S @@ -16,6 +16,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> #define CAMELLIA_TABLE_BYTE_LEN 272 @@ -726,6 +727,7 @@ __camellia_enc_blk16: * %xmm0..%xmm15: 16 encrypted blocks, order swapped: * 7, 8, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 */ + FRAME_BEGIN leaq 8 * 16(%rax), %rcx; @@ -780,6 +782,7 @@ __camellia_enc_blk16: %xmm8, %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm15, (key_table)(CTX, %r8, 8), (%rax), 1 * 16(%rax)); + FRAME_END ret; .align 8 @@ -812,6 +815,7 @@ __camellia_dec_blk16: * %xmm0..%xmm15: 16 plaintext blocks, order swapped: * 7, 8, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 */ + FRAME_BEGIN leaq 8 * 16(%rax), %rcx; @@ -865,6 +869,7 @@ __camellia_dec_blk16: %xmm8, %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm15, (key_table)(CTX), (%rax), 1 * 16(%rax)); + FRAME_END ret; .align 8 @@ -890,6 +895,7 @@ ENTRY(camellia_ecb_enc_16way) * %rsi: dst (16 blocks) * %rdx: src (16 blocks) */ + FRAME_BEGIN inpack16_pre(%xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, @@ -904,6 +910,7 @@ ENTRY(camellia_ecb_enc_16way) %xmm15, %xmm14, %xmm13, %xmm12, %xmm11, %xmm10, %xmm9, %xmm8, %rsi); + FRAME_END ret; ENDPROC(camellia_ecb_enc_16way) @@ -913,6 +920,7 @@ ENTRY(camellia_ecb_dec_16way) * %rsi: dst (16 blocks) * %rdx: src (16 blocks) */ + FRAME_BEGIN cmpl $16, key_length(CTX); movl $32, %r8d; @@ -932,6 +940,7 @@ ENTRY(camellia_ecb_dec_16way) %xmm15, %xmm14, %xmm13, %xmm12, %xmm11, %xmm10, %xmm9, %xmm8, %rsi); + FRAME_END ret; ENDPROC(camellia_ecb_dec_16way) @@ -941,6 +950,7 @@ ENTRY(camellia_cbc_dec_16way) * %rsi: dst (16 blocks) * %rdx: src (16 blocks) */ + FRAME_BEGIN cmpl $16, key_length(CTX); movl $32, %r8d; @@ -981,6 +991,7 @@ ENTRY(camellia_cbc_dec_16way) %xmm15, %xmm14, %xmm13, %xmm12, %xmm11, %xmm10, %xmm9, %xmm8, %rsi); + FRAME_END ret; ENDPROC(camellia_cbc_dec_16way) @@ -997,6 +1008,7 @@ ENTRY(camellia_ctr_16way) * %rdx: src (16 blocks) * %rcx: iv (little endian, 128bit) */ + FRAME_BEGIN subq $(16 * 16), %rsp; movq %rsp, %rax; @@ -1092,6 +1104,7 @@ ENTRY(camellia_ctr_16way) %xmm15, %xmm14, %xmm13, %xmm12, %xmm11, %xmm10, %xmm9, %xmm8, %rsi); + FRAME_END ret; ENDPROC(camellia_ctr_16way) @@ -1112,6 +1125,7 @@ camellia_xts_crypt_16way: * %r8: index for input whitening key * %r9: pointer to __camellia_enc_blk16 or __camellia_dec_blk16 */ + FRAME_BEGIN subq $(16 * 16), %rsp; movq %rsp, %rax; @@ -1234,6 +1248,7 @@ camellia_xts_crypt_16way: %xmm15, %xmm14, %xmm13, %xmm12, %xmm11, %xmm10, %xmm9, %xmm8, %rsi); + FRAME_END ret; ENDPROC(camellia_xts_crypt_16way) diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S index 0e0b8863a34b..16186c18656d 100644 --- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S @@ -11,6 +11,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> #define CAMELLIA_TABLE_BYTE_LEN 272 @@ -766,6 +767,7 @@ __camellia_enc_blk32: * %ymm0..%ymm15: 32 encrypted blocks, order swapped: * 7, 8, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 */ + FRAME_BEGIN leaq 8 * 32(%rax), %rcx; @@ -820,6 +822,7 @@ __camellia_enc_blk32: %ymm8, %ymm9, %ymm10, %ymm11, %ymm12, %ymm13, %ymm14, %ymm15, (key_table)(CTX, %r8, 8), (%rax), 1 * 32(%rax)); + FRAME_END ret; .align 8 @@ -852,6 +855,7 @@ __camellia_dec_blk32: * %ymm0..%ymm15: 16 plaintext blocks, order swapped: * 7, 8, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 */ + FRAME_BEGIN leaq 8 * 32(%rax), %rcx; @@ -905,6 +909,7 @@ __camellia_dec_blk32: %ymm8, %ymm9, %ymm10, %ymm11, %ymm12, %ymm13, %ymm14, %ymm15, (key_table)(CTX), (%rax), 1 * 32(%rax)); + FRAME_END ret; .align 8 @@ -930,6 +935,7 @@ ENTRY(camellia_ecb_enc_32way) * %rsi: dst (32 blocks) * %rdx: src (32 blocks) */ + FRAME_BEGIN vzeroupper; @@ -948,6 +954,7 @@ ENTRY(camellia_ecb_enc_32way) vzeroupper; + FRAME_END ret; ENDPROC(camellia_ecb_enc_32way) @@ -957,6 +964,7 @@ ENTRY(camellia_ecb_dec_32way) * %rsi: dst (32 blocks) * %rdx: src (32 blocks) */ + FRAME_BEGIN vzeroupper; @@ -980,6 +988,7 @@ ENTRY(camellia_ecb_dec_32way) vzeroupper; + FRAME_END ret; ENDPROC(camellia_ecb_dec_32way) @@ -989,6 +998,7 @@ ENTRY(camellia_cbc_dec_32way) * %rsi: dst (32 blocks) * %rdx: src (32 blocks) */ + FRAME_BEGIN vzeroupper; @@ -1046,6 +1056,7 @@ ENTRY(camellia_cbc_dec_32way) vzeroupper; + FRAME_END ret; ENDPROC(camellia_cbc_dec_32way) @@ -1070,6 +1081,7 @@ ENTRY(camellia_ctr_32way) * %rdx: src (32 blocks) * %rcx: iv (little endian, 128bit) */ + FRAME_BEGIN vzeroupper; @@ -1184,6 +1196,7 @@ ENTRY(camellia_ctr_32way) vzeroupper; + FRAME_END ret; ENDPROC(camellia_ctr_32way) @@ -1216,6 +1229,7 @@ camellia_xts_crypt_32way: * %r8: index for input whitening key * %r9: pointer to __camellia_enc_blk32 or __camellia_dec_blk32 */ + FRAME_BEGIN vzeroupper; @@ -1349,6 +1363,7 @@ camellia_xts_crypt_32way: vzeroupper; + FRAME_END ret; ENDPROC(camellia_xts_crypt_32way) diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S index c35fd5d6ecd2..14fa1966bf01 100644 --- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S @@ -24,6 +24,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> .file "cast5-avx-x86_64-asm_64.S" @@ -365,6 +366,7 @@ ENTRY(cast5_ecb_enc_16way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN movq %rsi, %r11; @@ -388,6 +390,7 @@ ENTRY(cast5_ecb_enc_16way) vmovdqu RR4, (6*4*4)(%r11); vmovdqu RL4, (7*4*4)(%r11); + FRAME_END ret; ENDPROC(cast5_ecb_enc_16way) @@ -398,6 +401,7 @@ ENTRY(cast5_ecb_dec_16way) * %rdx: src */ + FRAME_BEGIN movq %rsi, %r11; vmovdqu (0*4*4)(%rdx), RL1; @@ -420,6 +424,7 @@ ENTRY(cast5_ecb_dec_16way) vmovdqu RR4, (6*4*4)(%r11); vmovdqu RL4, (7*4*4)(%r11); + FRAME_END ret; ENDPROC(cast5_ecb_dec_16way) @@ -429,6 +434,7 @@ ENTRY(cast5_cbc_dec_16way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN pushq %r12; @@ -469,6 +475,7 @@ ENTRY(cast5_cbc_dec_16way) popq %r12; + FRAME_END ret; ENDPROC(cast5_cbc_dec_16way) @@ -479,6 +486,7 @@ ENTRY(cast5_ctr_16way) * %rdx: src * %rcx: iv (big endian, 64bit) */ + FRAME_BEGIN pushq %r12; @@ -542,5 +550,6 @@ ENTRY(cast5_ctr_16way) popq %r12; + FRAME_END ret; ENDPROC(cast5_ctr_16way) diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S index e3531f833951..c419389889cd 100644 --- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S @@ -24,6 +24,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> #include "glue_helper-asm-avx.S" .file "cast6-avx-x86_64-asm_64.S" @@ -349,6 +350,7 @@ ENTRY(cast6_ecb_enc_8way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN movq %rsi, %r11; @@ -358,6 +360,7 @@ ENTRY(cast6_ecb_enc_8way) store_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(cast6_ecb_enc_8way) @@ -367,6 +370,7 @@ ENTRY(cast6_ecb_dec_8way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN movq %rsi, %r11; @@ -376,6 +380,7 @@ ENTRY(cast6_ecb_dec_8way) store_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(cast6_ecb_dec_8way) @@ -385,6 +390,7 @@ ENTRY(cast6_cbc_dec_8way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN pushq %r12; @@ -399,6 +405,7 @@ ENTRY(cast6_cbc_dec_8way) popq %r12; + FRAME_END ret; ENDPROC(cast6_cbc_dec_8way) @@ -409,6 +416,7 @@ ENTRY(cast6_ctr_8way) * %rdx: src * %rcx: iv (little endian, 128bit) */ + FRAME_BEGIN pushq %r12; @@ -424,6 +432,7 @@ ENTRY(cast6_ctr_8way) popq %r12; + FRAME_END ret; ENDPROC(cast6_ctr_8way) @@ -434,6 +443,7 @@ ENTRY(cast6_xts_enc_8way) * %rdx: src * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN movq %rsi, %r11; @@ -446,6 +456,7 @@ ENTRY(cast6_xts_enc_8way) /* dst <= regs xor IVs(in dst) */ store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(cast6_xts_enc_8way) @@ -456,6 +467,7 @@ ENTRY(cast6_xts_dec_8way) * %rdx: src * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN movq %rsi, %r11; @@ -468,5 +480,6 @@ ENTRY(cast6_xts_dec_8way) /* dst <= regs xor IVs(in dst) */ store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(cast6_xts_dec_8way) diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S index 4fe27e074194..dc05f010ca9b 100644 --- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S @@ -170,8 +170,8 @@ continue_block: ## branch into array lea jump_table(%rip), bufp movzxw (bufp, %rax, 2), len - offset=crc_array-jump_table - lea offset(bufp, len, 1), bufp + lea crc_array(%rip), bufp + lea (bufp, len, 1), bufp jmp *bufp ################################################################ @@ -310,7 +310,9 @@ do_return: popq %rdi popq %rbx ret +ENDPROC(crc_pcl) +.section .rodata, "a", %progbits ################################################################ ## jump table Table is 129 entries x 2 bytes each ################################################################ @@ -324,13 +326,11 @@ JMPTBL_ENTRY %i i=i+1 .endr -ENDPROC(crc_pcl) ################################################################ ## PCLMULQDQ tables ## Table is 128 entries x 2 words (8 bytes) each ################################################################ -.section .rodata, "a", %progbits .align 8 K_table: .long 0x493c7d27, 0x00000001 diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S index 5d1e0075ac24..eed55c8cca4f 100644 --- a/arch/x86/crypto/ghash-clmulni-intel_asm.S +++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S @@ -18,6 +18,7 @@ #include <linux/linkage.h> #include <asm/inst.h> +#include <asm/frame.h> .data @@ -94,6 +95,7 @@ ENDPROC(__clmul_gf128mul_ble) /* void clmul_ghash_mul(char *dst, const u128 *shash) */ ENTRY(clmul_ghash_mul) + FRAME_BEGIN movups (%rdi), DATA movups (%rsi), SHASH movaps .Lbswap_mask, BSWAP @@ -101,6 +103,7 @@ ENTRY(clmul_ghash_mul) call __clmul_gf128mul_ble PSHUFB_XMM BSWAP DATA movups DATA, (%rdi) + FRAME_END ret ENDPROC(clmul_ghash_mul) @@ -109,6 +112,7 @@ ENDPROC(clmul_ghash_mul) * const u128 *shash); */ ENTRY(clmul_ghash_update) + FRAME_BEGIN cmp $16, %rdx jb .Lupdate_just_ret # check length movaps .Lbswap_mask, BSWAP @@ -128,5 +132,6 @@ ENTRY(clmul_ghash_update) PSHUFB_XMM BSWAP DATA movups DATA, (%rdi) .Lupdate_just_ret: + FRAME_END ret ENDPROC(clmul_ghash_update) diff --git a/arch/x86/crypto/serpent-avx-x86_64-asm_64.S b/arch/x86/crypto/serpent-avx-x86_64-asm_64.S index 2f202f49872b..8be571808342 100644 --- a/arch/x86/crypto/serpent-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/serpent-avx-x86_64-asm_64.S @@ -24,6 +24,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> #include "glue_helper-asm-avx.S" .file "serpent-avx-x86_64-asm_64.S" @@ -681,6 +682,7 @@ ENTRY(serpent_ecb_enc_8way_avx) * %rsi: dst * %rdx: src */ + FRAME_BEGIN load_8way(%rdx, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); @@ -688,6 +690,7 @@ ENTRY(serpent_ecb_enc_8way_avx) store_8way(%rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(serpent_ecb_enc_8way_avx) @@ -697,6 +700,7 @@ ENTRY(serpent_ecb_dec_8way_avx) * %rsi: dst * %rdx: src */ + FRAME_BEGIN load_8way(%rdx, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); @@ -704,6 +708,7 @@ ENTRY(serpent_ecb_dec_8way_avx) store_8way(%rsi, RC1, RD1, RB1, RE1, RC2, RD2, RB2, RE2); + FRAME_END ret; ENDPROC(serpent_ecb_dec_8way_avx) @@ -713,6 +718,7 @@ ENTRY(serpent_cbc_dec_8way_avx) * %rsi: dst * %rdx: src */ + FRAME_BEGIN load_8way(%rdx, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); @@ -720,6 +726,7 @@ ENTRY(serpent_cbc_dec_8way_avx) store_cbc_8way(%rdx, %rsi, RC1, RD1, RB1, RE1, RC2, RD2, RB2, RE2); + FRAME_END ret; ENDPROC(serpent_cbc_dec_8way_avx) @@ -730,6 +737,7 @@ ENTRY(serpent_ctr_8way_avx) * %rdx: src * %rcx: iv (little endian, 128bit) */ + FRAME_BEGIN load_ctr_8way(%rcx, .Lbswap128_mask, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2, RK0, RK1, RK2); @@ -738,6 +746,7 @@ ENTRY(serpent_ctr_8way_avx) store_ctr_8way(%rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(serpent_ctr_8way_avx) @@ -748,6 +757,7 @@ ENTRY(serpent_xts_enc_8way_avx) * %rdx: src * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN /* regs <= src, dst <= IVs, regs <= regs xor IVs */ load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2, @@ -758,6 +768,7 @@ ENTRY(serpent_xts_enc_8way_avx) /* dst <= regs xor IVs(in dst) */ store_xts_8way(%rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(serpent_xts_enc_8way_avx) @@ -768,6 +779,7 @@ ENTRY(serpent_xts_dec_8way_avx) * %rdx: src * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN /* regs <= src, dst <= IVs, regs <= regs xor IVs */ load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2, @@ -778,5 +790,6 @@ ENTRY(serpent_xts_dec_8way_avx) /* dst <= regs xor IVs(in dst) */ store_xts_8way(%rsi, RC1, RD1, RB1, RE1, RC2, RD2, RB2, RE2); + FRAME_END ret; ENDPROC(serpent_xts_dec_8way_avx) diff --git a/arch/x86/crypto/serpent-avx2-asm_64.S b/arch/x86/crypto/serpent-avx2-asm_64.S index b222085cccac..97c48add33ed 100644 --- a/arch/x86/crypto/serpent-avx2-asm_64.S +++ b/arch/x86/crypto/serpent-avx2-asm_64.S @@ -15,6 +15,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> #include "glue_helper-asm-avx2.S" .file "serpent-avx2-asm_64.S" @@ -673,6 +674,7 @@ ENTRY(serpent_ecb_enc_16way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN vzeroupper; @@ -684,6 +686,7 @@ ENTRY(serpent_ecb_enc_16way) vzeroupper; + FRAME_END ret; ENDPROC(serpent_ecb_enc_16way) @@ -693,6 +696,7 @@ ENTRY(serpent_ecb_dec_16way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN vzeroupper; @@ -704,6 +708,7 @@ ENTRY(serpent_ecb_dec_16way) vzeroupper; + FRAME_END ret; ENDPROC(serpent_ecb_dec_16way) @@ -713,6 +718,7 @@ ENTRY(serpent_cbc_dec_16way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN vzeroupper; @@ -725,6 +731,7 @@ ENTRY(serpent_cbc_dec_16way) vzeroupper; + FRAME_END ret; ENDPROC(serpent_cbc_dec_16way) @@ -735,6 +742,7 @@ ENTRY(serpent_ctr_16way) * %rdx: src (16 blocks) * %rcx: iv (little endian, 128bit) */ + FRAME_BEGIN vzeroupper; @@ -748,6 +756,7 @@ ENTRY(serpent_ctr_16way) vzeroupper; + FRAME_END ret; ENDPROC(serpent_ctr_16way) @@ -758,6 +767,7 @@ ENTRY(serpent_xts_enc_16way) * %rdx: src (16 blocks) * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN vzeroupper; @@ -772,6 +782,7 @@ ENTRY(serpent_xts_enc_16way) vzeroupper; + FRAME_END ret; ENDPROC(serpent_xts_enc_16way) @@ -782,6 +793,7 @@ ENTRY(serpent_xts_dec_16way) * %rdx: src (16 blocks) * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN vzeroupper; @@ -796,5 +808,6 @@ ENTRY(serpent_xts_dec_16way) vzeroupper; + FRAME_END ret; ENDPROC(serpent_xts_dec_16way) diff --git a/arch/x86/crypto/sha-mb/sha1_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha-mb/sha1_mb_mgr_flush_avx2.S index 85c4e1cf7172..96df6a39d7e2 100644 --- a/arch/x86/crypto/sha-mb/sha1_mb_mgr_flush_avx2.S +++ b/arch/x86/crypto/sha-mb/sha1_mb_mgr_flush_avx2.S @@ -52,6 +52,7 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include <linux/linkage.h> +#include <asm/frame.h> #include "sha1_mb_mgr_datastruct.S" @@ -86,16 +87,6 @@ #define extra_blocks %arg2 #define p %arg2 - -# STACK_SPACE needs to be an odd multiple of 8 -_XMM_SAVE_SIZE = 10*16 -_GPR_SAVE_SIZE = 8*8 -_ALIGN_SIZE = 8 - -_XMM_SAVE = 0 -_GPR_SAVE = _XMM_SAVE + _XMM_SAVE_SIZE -STACK_SPACE = _GPR_SAVE + _GPR_SAVE_SIZE + _ALIGN_SIZE - .macro LABEL prefix n \prefix\n\(): .endm @@ -113,16 +104,8 @@ offset = \_offset # JOB* sha1_mb_mgr_flush_avx2(MB_MGR *state) # arg 1 : rcx : state ENTRY(sha1_mb_mgr_flush_avx2) - mov %rsp, %r10 - sub $STACK_SPACE, %rsp - and $~31, %rsp - mov %rbx, _GPR_SAVE(%rsp) - mov %r10, _GPR_SAVE+8*1(%rsp) #save rsp - mov %rbp, _GPR_SAVE+8*3(%rsp) - mov %r12, _GPR_SAVE+8*4(%rsp) - mov %r13, _GPR_SAVE+8*5(%rsp) - mov %r14, _GPR_SAVE+8*6(%rsp) - mov %r15, _GPR_SAVE+8*7(%rsp) + FRAME_BEGIN + push %rbx # If bit (32+3) is set, then all lanes are empty mov _unused_lanes(state), unused_lanes @@ -230,16 +213,8 @@ len_is_0: mov tmp2_w, offset(job_rax) return: - - mov _GPR_SAVE(%rsp), %rbx - mov _GPR_SAVE+8*1(%rsp), %r10 #saved rsp - mov _GPR_SAVE+8*3(%rsp), %rbp - mov _GPR_SAVE+8*4(%rsp), %r12 - mov _GPR_SAVE+8*5(%rsp), %r13 - mov _GPR_SAVE+8*6(%rsp), %r14 - mov _GPR_SAVE+8*7(%rsp), %r15 - mov %r10, %rsp - + pop %rbx + FRAME_END ret return_null: diff --git a/arch/x86/crypto/sha-mb/sha1_mb_mgr_submit_avx2.S b/arch/x86/crypto/sha-mb/sha1_mb_mgr_submit_avx2.S index c420d89b175f..63a0d9c8e31f 100644 --- a/arch/x86/crypto/sha-mb/sha1_mb_mgr_submit_avx2.S +++ b/arch/x86/crypto/sha-mb/sha1_mb_mgr_submit_avx2.S @@ -53,6 +53,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> #include "sha1_mb_mgr_datastruct.S" @@ -86,33 +87,21 @@ job_rax = %rax len = %rax DWORD_len = %eax -lane = %rbp -tmp3 = %rbp +lane = %r12 +tmp3 = %r12 tmp = %r9 DWORD_tmp = %r9d lane_data = %r10 -# STACK_SPACE needs to be an odd multiple of 8 -STACK_SPACE = 8*8 + 16*10 + 8 - # JOB* submit_mb_mgr_submit_avx2(MB_MGR *state, job_sha1 *job) # arg 1 : rcx : state # arg 2 : rdx : job ENTRY(sha1_mb_mgr_submit_avx2) - - mov %rsp, %r10 - sub $STACK_SPACE, %rsp - and $~31, %rsp - - mov %rbx, (%rsp) - mov %r10, 8*2(%rsp) #save old rsp - mov %rbp, 8*3(%rsp) - mov %r12, 8*4(%rsp) - mov %r13, 8*5(%rsp) - mov %r14, 8*6(%rsp) - mov %r15, 8*7(%rsp) + FRAME_BEGIN + push %rbx + push %r12 mov _unused_lanes(state), unused_lanes mov unused_lanes, lane @@ -203,16 +192,9 @@ len_is_0: movl DWORD_tmp, _result_digest+1*16(job_rax) return: - - mov (%rsp), %rbx - mov 8*2(%rsp), %r10 #save old rsp - mov 8*3(%rsp), %rbp - mov 8*4(%rsp), %r12 - mov 8*5(%rsp), %r13 - mov 8*6(%rsp), %r14 - mov 8*7(%rsp), %r15 - mov %r10, %rsp - + pop %r12 + pop %rbx + FRAME_END ret return_null: diff --git a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S index 05058134c443..dc66273e610d 100644 --- a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S @@ -24,6 +24,7 @@ */ #include <linux/linkage.h> +#include <asm/frame.h> #include "glue_helper-asm-avx.S" .file "twofish-avx-x86_64-asm_64.S" @@ -333,6 +334,7 @@ ENTRY(twofish_ecb_enc_8way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN movq %rsi, %r11; @@ -342,6 +344,7 @@ ENTRY(twofish_ecb_enc_8way) store_8way(%r11, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2); + FRAME_END ret; ENDPROC(twofish_ecb_enc_8way) @@ -351,6 +354,7 @@ ENTRY(twofish_ecb_dec_8way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN movq %rsi, %r11; @@ -360,6 +364,7 @@ ENTRY(twofish_ecb_dec_8way) store_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(twofish_ecb_dec_8way) @@ -369,6 +374,7 @@ ENTRY(twofish_cbc_dec_8way) * %rsi: dst * %rdx: src */ + FRAME_BEGIN pushq %r12; @@ -383,6 +389,7 @@ ENTRY(twofish_cbc_dec_8way) popq %r12; + FRAME_END ret; ENDPROC(twofish_cbc_dec_8way) @@ -393,6 +400,7 @@ ENTRY(twofish_ctr_8way) * %rdx: src * %rcx: iv (little endian, 128bit) */ + FRAME_BEGIN pushq %r12; @@ -408,6 +416,7 @@ ENTRY(twofish_ctr_8way) popq %r12; + FRAME_END ret; ENDPROC(twofish_ctr_8way) @@ -418,6 +427,7 @@ ENTRY(twofish_xts_enc_8way) * %rdx: src * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN movq %rsi, %r11; @@ -430,6 +440,7 @@ ENTRY(twofish_xts_enc_8way) /* dst <= regs xor IVs(in dst) */ store_xts_8way(%r11, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2); + FRAME_END ret; ENDPROC(twofish_xts_enc_8way) @@ -440,6 +451,7 @@ ENTRY(twofish_xts_dec_8way) * %rdx: src * %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸)) */ + FRAME_BEGIN movq %rsi, %r11; @@ -452,5 +464,6 @@ ENTRY(twofish_xts_dec_8way) /* dst <= regs xor IVs(in dst) */ store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2); + FRAME_END ret; ENDPROC(twofish_xts_dec_8way) |