aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/crypto
AgeCommit message (Collapse)Author
2023-03-11crypto: x86/ghash - fix unaligned access in ghash_setkey()Eric Biggers
[ Upstream commit 116db2704c193fff6d73ea6c2219625f0c9bdfc8 ] The key can be unaligned, so use the unaligned memory access helpers. Fixes: 8ceee72808d1 ("crypto: ghash-clmulni-intel - use C implementation for setkey()") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-25crypto: x86/poly1305 - Fixup SLSPeter Zijlstra
commit 7ed7aa4de9421229be6d331ed52d5cd09c99f409 upstream. Due to being a perl generated asm file, it got missed by the mass convertion script. arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_init_x86_64()+0x3a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_x86_64()+0xf2: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_emit_x86_64()+0x37: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: __poly1305_block()+0x6d: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: __poly1305_init_avx()+0x1e8: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx()+0x18a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx()+0xaf8: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_emit_avx()+0x99: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx2()+0x18a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx2()+0x776: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx512()+0x18a: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx512()+0x796: missing int3 after ret arch/x86/crypto/poly1305-x86_64-cryptogams.o: warning: objtool: poly1305_blocks_avx512()+0x10bd: missing int3 after ret Fixes: f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-25x86: Prepare asm files for straight-line-speculationPeter Zijlstra
commit f94909ceb1ed4bfdb2ada72f93236305e6d6951f upstream. Replace all ret/retq instructions with RET in preparation of making RET a macro. Since AS is case insensitive it's a big no-op without RET defined. find arch/x86/ -name \*.S | while read file do sed -i 's/\<ret[q]*\>/RET/' $file done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org [bwh: Backported to 5.10: ran the above command] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-30lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFIJason A. Donenfeld
commit d2a02e3c8bb6b347818518edff5a4b40ff52d6d8 upstream. blake2s_compress_generic is weakly aliased by blake2s_compress. The current harness for function selection uses a function pointer, which is ordinarily inlined and resolved at compile time. But when Clang's CFI is enabled, CFI still triggers when making an indirect call via a weak symbol. This seems like a bug in Clang's CFI, as though it's bucketing weak symbols and strong symbols differently. It also only seems to trigger when "full LTO" mode is used, rather than "thin LTO". [ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444) [ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1 [ 0.000000][ T0] Hardware name: MT6873 (DT) [ 0.000000][ T0] Call trace: [ 0.000000][ T0] dump_backtrace+0xfc/0x1dc [ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c [ 0.000000][ T0] panic+0x194/0x464 [ 0.000000][ T0] __cfi_check_fail+0x54/0x58 [ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0 [ 0.000000][ T0] blake2s_update+0x14c/0x178 [ 0.000000][ T0] _extract_entropy+0xf4/0x29c [ 0.000000][ T0] crng_initialize_primary+0x24/0x94 [ 0.000000][ T0] rand_initialize+0x2c/0x6c [ 0.000000][ T0] start_kernel+0x2f8/0x65c [ 0.000000][ T0] __primary_switched+0xc4/0x7be4 [ 0.000000][ T0] Rebooting in 5 seconds.. Nonetheless, the function pointer method isn't so terrific anyway, so this patch replaces it with a simple boolean, which also gets inlined away. This successfully works around the Clang bug. In general, I'm not too keen on all of the indirection involved here; it clearly does more harm than good. Hopefully the whole thing can get cleaned up down the road when lib/crypto is overhauled more comprehensively. But for now, we go with a simple bandaid. Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in") Link: https://github.com/ClangBuiltLinux/linux/issues/1567 Reported-by: Miles Chen <miles.chen@mediatek.com> Tested-by: Miles Chen <miles.chen@mediatek.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-30lib/crypto: blake2s: include as built-inJason A. Donenfeld
commit 6048fdcc5f269c7f31d774c295ce59081b36e6f9 upstream. In preparation for using blake2s in the RNG, we change the way that it is wired-in to the build system. Instead of using ifdefs to select the right symbol, we use weak symbols. And because ARM doesn't need the generic implementation, we make the generic one default only if an arch library doesn't need it already, and then have arch libraries that do need it opt-in. So that the arch libraries can remain tristate rather than bool, we then split the shash part from the glue code. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-30crypto: blake2s - share the "shash" API boilerplate codeEric Biggers
commit 8c4a93a1270ddffc7660ae43fa8030ecfe9c06d9 upstream. Add helper functions for shash implementations of BLAKE2s to include/crypto/internal/blake2s.h, taking advantage of __blake2s_update() and __blake2s_final() that were added by the previous patch to share more code between the library and shash implementations. crypto_blake2s_setkey() and crypto_blake2s_init() are usable as shash_alg::setkey and shash_alg::init directly, while crypto_blake2s_update() and crypto_blake2s_final() take an extra 'blake2s_compress_t' function pointer parameter. This allows the implementation of the compression function to be overridden, which is the only part that optimized implementations really care about. The new functions are inline functions (similar to those in sha1_base.h, sha256_base.h, and sm3_base.h) because this avoids needing to add a new module blake2s_helpers.ko, they aren't *too* long, and this avoids indirect calls which are expensive these days. Note that they can't go in blake2s_generic.ko, as that would require selecting CRYPTO_BLAKE2S from CRYPTO_BLAKE2S_X86, which would cause a recursive dependency. Finally, use these new helper functions in the x86 implementation of BLAKE2s. (This part should be a separate patch, but unfortunately the x86 implementation used the exact same function names like "crypto_blake2s_update()", so it had to be updated at the same time.) Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-30crypto: x86/blake2s - define shash_alg structs using macrosEric Biggers
commit 1aa90f4cf034ed4f016a02330820ac0551a6c13c upstream. The shash_alg structs for the four variants of BLAKE2s are identical except for the algorithm name, driver name, and digest size. So, avoid code duplication by using a macro to define these structs. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25crypto: x86/chacha20 - Avoid spurious jumps to other functionsPeter Zijlstra
[ Upstream commit 4327d168515fd8b5b92fa1efdf1d219fb6514460 ] The chacha_Nblock_xor_avx512vl() functions all have their own, identical, .LdoneN label, however in one particular spot {2,4} jump to the 8 version instead of their own. Resulting in: arch/x86/crypto/chacha-x86_64.o: warning: objtool: chacha_2block_xor_avx512vl() falls through to next function chacha_8block_xor_avx512vl() arch/x86/crypto/chacha-x86_64.o: warning: objtool: chacha_4block_xor_avx512vl() falls through to next function chacha_8block_xor_avx512vl() Make each function consistently use its own done label. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14crypto: x86/curve25519 - fix cpu feature checking logic in mod_exitHangbin Liu
[ Upstream commit 1b82435d17774f3eaab35dce239d354548aa9da2 ] In curve25519_mod_init() the curve25519_alg will be registered only when (X86_FEATURE_BMI2 && X86_FEATURE_ADX). But in curve25519_mod_exit() it still checks (X86_FEATURE_BMI2 || X86_FEATURE_ADX) when do crypto unregister. This will trigger a BUG_ON in crypto_unregister_alg() as alg->cra_refcnt is 0 if the cpu only supports one of X86_FEATURE_BMI2 and X86_FEATURE_ADX. Fixes: 07b586fe0662 ("crypto: x86/curve25519 - replace with formally verified implementation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14crypto: poly1305 - fix poly1305_core_setkey() declarationArnd Bergmann
[ Upstream commit 8d195e7a8ada68928f2aedb2c18302a4518fe68e ] gcc-11 points out a mismatch between the declaration and the definition of poly1305_core_setkey(): lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=] 13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16]) | ~~~~~~~~~^~~~~~~~~~~ In file included from lib/crypto/poly1305-donna32.c:11: include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’} 21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key); This is harmless in principle, as the calling conventions are the same, but the more specific prototype allows better type checking in the caller. Change the declaration to match the actual function definition. The poly1305_simd_init() is a bit suspicious here, as it previously had a 32-byte argument type, but looks like it needs to take the 16-byte POLY1305_BLOCK_SIZE array instead. Fixes: 1c08a104360f ("crypto: poly1305 - add new 32 and 64-bit generic versions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-20crypto: x86/aes-ni-xts - use direct calls to and 4-way strideArd Biesheuvel
[ Upstream commit 86ad60a65f29dd862a11c22bb4b5be28d6c5cef1 ] The XTS asm helper arrangement is a bit odd: the 8-way stride helper consists of back-to-back calls to the 4-way core transforms, which are called indirectly, based on a boolean that indicates whether we are performing encryption or decryption. Given how costly indirect calls are on x86, let's switch to direct calls, and given how the 8-way stride doesn't really add anything substantial, use a 4-way stride instead, and make the asm core routine deal with any multiple of 4 blocks. Since 512 byte sectors or 4 KB blocks are the typical quantities XTS operates on, increase the stride exported to the glue helper to 512 bytes as well. As a result, the number of indirect calls is reduced from 3 per 64 bytes of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup when operating on 1 KB blocks (measured on a Intel(R) Core(TM) i7-8650U CPU) Fixes: 9697fa39efd3f ("x86/retpoline/crypto: Convert crypto assembler indirect jumps") Tested-by: Eric Biggers <ebiggers@google.com> # x86_64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-20crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%regUros Bizjak
[ Upstream commit 032d049ea0f45b45c21f3f02b542aa18bc6b6428 ] CMP $0,%reg can't set overflow flag, so we can use shorter TEST %reg,%reg instruction when only zero and sign flags are checked (E,L,LE,G,GE conditions). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04crypto: aesni - prevent misaligned buffers on the stackArd Biesheuvel
commit a13ed1d15b07a04b1f74b2df61ff7a5e47f45dd8 upstream. The GCM mode driver uses 16 byte aligned buffers on the stack to pass the IV to the asm helpers, but unfortunately, the x86 port does not guarantee that the stack pointer is 16 byte aligned upon entry in the first place. Since the compiler is not aware of this, it will not emit the additional stack realignment sequence that is needed, and so the alignment is not guaranteed to be more than 8 bytes. So instead, allocate some padding on the stack, and realign the IV pointer by hand. Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-24crypto: x86/poly1305 - add back a needed assignmentEric Biggers
One of the assignments that was removed by commit 4a0c1de64bf9 ("crypto: x86/poly1305 - Remove assignments with no effect") is actually needed, since it affects the return value. This fixes the following crypto self-test failure: alg: shash: poly1305-simd test failed (wrong result) on test vector 2, cfg="init+update+final aligned buffer" Fixes: 4a0c1de64bf9 ("crypto: x86/poly1305 - Remove assignments with no effect") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-10-02crypto: x86/poly1305 - Remove assignments with no effectHerbert Xu
This patch removes a few ineffectual assignments from the function crypto_poly1305_setdctxkey. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-11crypto: poly1305-x86_64 - Use XORL r32,32Uros Bizjak
x86_64 zero extends 32bit operations, so for 64bit operands, XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids a REX prefix byte when legacy registers are used. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-11crypto: curve25519-x86_64 - Use XORL r32,32Uros Bizjak
x86_64 zero extends 32bit operations, so for 64bit operands, XORL r32,r32 is functionally equal to XORL r64,r64, but avoids a REX prefix byte when legacy registers are used. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-08-21crypto: x86/crc32c-intel - Use CRC32 mnemonicUros Bizjak
Current minimum required version of binutils is 2.23, which supports CRC32 instruction mnemonic. Replace the byte-wise specification of CRC32 with this proper mnemonic. The compiler is now able to pass memory operand to the instruction, so there is no need for a temporary register anymore. Some examples of the improvement: 12a: 48 8b 08 mov (%rax),%rcx 12d: f2 48 0f 38 f1 f1 crc32q %rcx,%rsi 133: 48 83 c0 08 add $0x8,%rax 137: 48 39 d0 cmp %rdx,%rax 13a: 75 ee jne 12a <crc32c_intel_update+0x1a> to: 125: f2 48 0f 38 f1 06 crc32q (%rsi),%rax 12b: 48 83 c6 08 add $0x8,%rsi 12f: 48 39 d6 cmp %rdx,%rsi 132: 75 f1 jne 125 <crc32c_intel_update+0x15> and: 146: 0f b6 08 movzbl (%rax),%ecx 149: f2 0f 38 f0 f1 crc32b %cl,%esi 14e: 48 83 c0 01 add $0x1,%rax 152: 48 39 d0 cmp %rdx,%rax 155: 75 ef jne 146 <crc32c_intel_update+0x36> to: 13b: f2 0f 38 f0 02 crc32b (%rdx),%eax 140: 48 83 c2 01 add $0x1,%rdx 144: 48 39 ca cmp %rcx,%rdx 147: 75 f2 jne 13b <crc32c_intel_update+0x2b> As the compiler has some more freedom w.r.t. register allocation, there is also a couple of reg-reg moves removed. There are no hidden states for CRC32 insn, so there is no need to mark assembly as volatile. v2: Introduce CRC32_INST define. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-08-20crypto: algapi - Remove skbuff.h inclusionHerbert Xu
The header file algapi.h includes skbuff.h unnecessarily since all we need is a forward declaration for struct sk_buff. This patch removes that inclusion. Unfortunately skbuff.h pulls in a lot of things and drivers over the years have come to rely on it so this patch adds a lot of missing inclusions that result from this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-31crypto: x86/curve25519 - Remove unused carry variablesHerbert Xu
The carry variables are assigned but never used, which upsets the compiler. This patch removes them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Karthikeyan Bhargavan <karthik.bhargavan@gmail.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-23crypto: x86/crc32c - fix building with clang iasArnd Bergmann
The clang integrated assembler complains about movzxw: arch/x86/crypto/crc32c-pcl-intel-asm_64.S:173:2: error: invalid instruction mnemonic 'movzxw' It seems that movzwq is the mnemonic that it expects instead, and this is what objdump prints when disassembling the file. Fixes: 6a8ce1ef3940 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-16crypto: x86 - Remove include/asm/inst.hUros Bizjak
Current minimum required version of binutils is 2.23, which supports PSHUFB, PCLMULQDQ, PEXTRD, AESKEYGENASSIST, AESIMC, AESENC, AESENCLAST, AESDEC, AESDECLAST and MOVQ instruction mnemonics. Substitute macros from include/asm/inst.h with a proper instruction mnemonics in various assmbly files from x86/crypto directory, and remove now unneeded file. The patch was tested by calculating and comparing sha256sum hashes of stripped object files before and after the patch, to be sure that executable code didn't change. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-16crypto: x86/chacha-sse3 - use unaligned loads for state arrayArd Biesheuvel
Due to the fact that the x86 port does not support allocating objects on the stack with an alignment that exceeds 8 bytes, we have a rather ugly hack in the x86 code for ChaCha to ensure that the state array is aligned to 16 bytes, allowing the SSE3 implementation of the algorithm to use aligned loads. Given that the performance benefit of using of aligned loads appears to be limited (~0.25% for 1k blocks using tcrypt on a Corei7-8650U), and the fact that this hack has leaked into generic ChaCha code, let's just remove it. Cc: Martin Willi <martin@strongswan.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Martin Willi <martin@strongswan.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-09crypto: aesni - Fix build with LLVM_IAS=1Sedat Dilek
When building with LLVM_IAS=1 means using Clang's Integrated Assembly (IAS) from LLVM/Clang >= v10.0.1-rc1+ instead of GNU/as from GNU/binutils I see the following breakage in Debian/testing AMD64: <instantiation>:15:74: error: too many positional arguments PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, ^ arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp) ^ <instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc ^ arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation GCM_ENC_DEC dec ^ <instantiation>:15:74: error: too many positional arguments PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, ^ arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp) ^ <instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc ^ arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation GCM_ENC_DEC enc Craig Topper suggested me in ClangBuiltLinux issue #1050: > I think the "too many positional arguments" is because the parser isn't able > to handle the trailing commas. > > The "unknown use of instruction mnemonic" is because the macro was named > GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with > GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the > macro instantiation, but llvm doesn't. First, I removed the trailing comma in the PRECOMPUTE line. Second, I substituted: 1. GHASH_4_ENCRYPT_4_PARALLEL_DEC -> GHASH_4_ENCRYPT_4_PARALLEL_dec 2. GHASH_4_ENCRYPT_4_PARALLEL_ENC -> GHASH_4_ENCRYPT_4_PARALLEL_enc With these changes I was able to build with LLVM_IAS=1 and boot on bare metal. I confirmed that this works with Linux-kernel v5.7.5 final. NOTE: This patch is on top of Linux v5.7 final. Thanks to Craig and especially Nick for double-checking and his comments. Suggested-by: Craig Topper <craig.topper@intel.com> Suggested-by: Craig Topper <craig.topper@gmail.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: "ClangBuiltLinux" <clang-built-linux@googlegroups.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1050 Link: https://bugs.llvm.org/show_bug.cgi?id=24494 Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-03crypto: aesni - add compatibility with IASJian Cai
Clang's integrated assembler complains "invalid reassignment of non-absolute variable 'var_ddq_add'" while assembling arch/x86/crypto/aes_ctrby8_avx-x86_64.S. It was because var_ddq_add was reassigned with non-absolute values several times, which IAS did not support. We can avoid the reassignment by replacing the uses of var_ddq_add with its definitions accordingly to have compatilibility with IAS. Link: https://github.com/ClangBuiltLinux/linux/issues/1008 Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Reported-by: Fangrui Song <maskray@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot Linux v5.7.5; clang v11.0.0-git Signed-off-by: Jian Cai <caij2003@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-06-01Merge tag 'objtool-core-2020-06-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "There are a lot of objtool changes in this cycle, all across the map: - Speed up objtool significantly, especially when there are large number of sections - Improve objtool's understanding of special instructions such as IRET, to reduce the number of annotations required - Implement 'noinstr' validation - Do baby steps for non-x86 objtool use - Simplify/fix retpoline decoding - Add vmlinux validation - Improve documentation - Fix various bugs and apply smaller cleanups" * tag 'objtool-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) objtool: Enable compilation of objtool for all architectures objtool: Move struct objtool_file into arch-independent header objtool: Exit successfully when requesting help objtool: Add check_kcov_mode() to the uaccess safelist samples/ftrace: Fix asm function ELF annotations objtool: optimize add_dead_ends for split sections objtool: use gelf_getsymshndx to handle >64k sections objtool: Allow no-op CFI ops in alternatives x86/retpoline: Fix retpoline unwind x86: Change {JMP,CALL}_NOSPEC argument x86: Simplify retpoline declaration x86/speculation: Change FILL_RETURN_BUFFER to work with objtool objtool: Add support for intra-function calls objtool: Move the IRET hack into the arch decoder objtool: Remove INSN_STACK objtool: Make handle_insn_ops() unconditional objtool: Rework allocating stack_ops on decode objtool: UNWIND_HINT_RET_OFFSET should not check registers objtool: is_fentry_call() crashes if call has no destination x86,smap: Fix smap_{save,restore}() alternatives ...
2020-06-01Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Introduce crypto_shash_tfm_digest() and use it wherever possible. - Fix use-after-free and race in crypto_spawn_alg. - Add support for parallel and batch requests to crypto_engine. Algorithms: - Update jitter RNG for SP800-90B compliance. - Always use jitter RNG as seed in drbg. Drivers: - Add Arm CryptoCell driver cctrng. - Add support for SEV-ES to the PSP driver in ccp" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits) crypto: hisilicon - fix driver compatibility issue with different versions of devices crypto: engine - do not requeue in case of fatal error crypto: cavium/nitrox - Fix a typo in a comment crypto: hisilicon/qm - change debugfs file name from qm_regs to regs crypto: hisilicon/qm - add DebugFS for xQC and xQE dump crypto: hisilicon/zip - add debugfs for Hisilicon ZIP crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC crypto: hisilicon/qm - add debugfs to the QM state machine crypto: hisilicon/qm - add debugfs for QM crypto: stm32/crc32 - protect from concurrent accesses crypto: stm32/crc32 - don't sleep in runtime pm crypto: stm32/crc32 - fix multi-instance crypto: stm32/crc32 - fix run-time self test issue. crypto: stm32/crc32 - fix ext4 chksum BUG_ON() crypto: hisilicon/zip - Use temporary sqe when doing work crypto: hisilicon - add device error report through abnormal irq crypto: hisilicon - remove codes of directly report device errors through MSI crypto: hisilicon - QM memory management optimization crypto: hisilicon - unify initial value assignment into QM ...
2020-05-18Merge tag 'v5.7-rc6' into objtool/core, to pick up fixes and resolve ↵Ingo Molnar
semantic conflict Resolve structural conflict between: 59566b0b622e: ("x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up") which introduced a new reference to 'ftrace_epilogue', and: 0298739b7983: ("x86,ftrace: Fix ftrace_regs_caller() unwind") Which renamed it to 'ftrace_caller_end'. Rename the new usage site in the merge commit. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-08crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.hEric Biggers
<linux/cryptohash.h> sounds very generic and important, like it's the header to include if you're doing cryptographic hashing in the kernel. But actually it only includes the library implementation of the SHA-1 compression function (not even the full SHA-1). This should basically never be used anymore; SHA-1 is no longer considered secure, and there are much better ways to do cryptographic hashing in the kernel. Most files that include this header don't actually need it. So in preparation for removing it, remove all these unneeded includes of it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30x86: Change {JMP,CALL}_NOSPEC argumentPeter Zijlstra
In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line versions of the retpoline magic, we need to remove the '%' from the argument, such that we can paste it onto symbol names. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org
2020-04-30crypto: arch/nhpoly1305 - process in explicit 4k chunksJason A. Donenfeld
Rather than chunking via PAGE_SIZE, this commit changes the arch implementations to chunk in explicit 4k parts, so that calculations on maximum acceptable latency don't suddenly become invalid on platforms where PAGE_SIZE isn't 4k, such as arm64. Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305") Fixes: 012c82388c03 ("crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305") Fixes: a00fa0c88774 ("crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305") Fixes: 16aae3595a9d ("crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30crypto: arch/lib - limit simd usage to 4k chunksJason A. Donenfeld
The initial Zinc patchset, after some mailing list discussion, contained code to ensure that kernel_fpu_enable would not be kept on for more than a 4k chunk, since it disables preemption. The choice of 4k isn't totally scientific, but it's not a bad guess either, and it's what's used in both the x86 poly1305, blake2s, and nhpoly1305 code already (in the form of PAGE_SIZE, which this commit corrects to be explicitly 4k for the former two). Ard did some back of the envelope calculations and found that at 5 cycles/byte (overestimate) on a 1ghz processor (pretty slow), 4k means we have a maximum preemption disabling of 20us, which Sebastian confirmed was probably a good limit. Unfortunately the chunking appears to have been left out of the final patchset that added the glue code. So, this commit adds it back in. Fixes: 84e03fa39fbe ("crypto: x86/chacha - expose SIMD ChaCha routine as library function") Fixes: b3aad5bad26a ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function") Fixes: a44a3430d71b ("crypto: arm/chacha - expose ARM ChaCha routine as library function") Fixes: d7d7b8535662 ("crypto: x86/poly1305 - wire up faster implementations for kernel") Fixes: f569ca164751 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation") Fixes: a6b803b3ddc7 ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation") Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation") Cc: Eric Biggers <ebiggers@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-09x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2Jason A. Donenfeld
Now that the kernel specifies binutils 2.23 as the minimum version, we can remove ifdefs for AVX2 and ADX throughout. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-09crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'Masahiro Yamada
poly1305-x86_64-cryptogams.S is a generated file, so it should be cleaned up by 'make clean'. Assigning it to the variable 'targets' teaches Kbuild that it is a generated file. However, this line is not evaluated when cleaning because scripts/Makefile.clean does not include include/config/auto.conf. Remove the ifneq-conditional, so this file is correctly cleaned up. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09crypto: x86 - rework configuration based on KconfigJason A. Donenfeld
Now that assembler capabilities are probed inside of Kconfig, we can set up proper Kconfig-based dependencies. We also take this opportunity to reorder the Makefile, so that items are grouped logically by primitive. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-09x86: remove always-defined CONFIG_AS_AVXMasahiro Yamada
CONFIG_AS_AVX was introduced by commit ea4d26ae24e5 ("raid5: add AVX optimized RAID5 checksumming"). We raise the minimal supported binutils version from time to time. The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum required binutils version to 2.21"). I confirmed the code in $(call as-instr,...) can be assembled by the binutils 2.21 assembler and also by LLVM integrated assembler. Remove CONFIG_AS_AVX, which is always defined. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-09x86: remove always-defined CONFIG_AS_SSSE3Masahiro Yamada
CONFIG_AS_SSSE3 was introduced by commit 75aaf4c3e6a4 ("x86/raid6: correctly check for assembler capabilities"). We raise the minimal supported binutils version from time to time. The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum required binutils version to 2.21"). I confirmed the code in $(call as-instr,...) can be assembled by the binutils 2.21 assembler and also by LLVM integrated assembler. Remove CONFIG_AS_SSSE3, which is always defined. I added ifdef CONFIG_X86 to lib/raid6/algos.c to avoid link errors on non-x86 architectures. lib/raid6/algos.c is built not only for the kernel but also for testing the library code from userspace. I added -DCONFIG_X86 to lib/raid6/test/Makefile to cator to this usecase. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Ingo Molnar <mingo@kernel.org>
2020-04-03Merge tag 'spdx-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments
2020-04-01Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix out-of-sync IVs in self-test for IPsec AEAD algorithms Algorithms: - Use formally verified implementation of x86/curve25519 Drivers: - Enhance hwrng support in caam - Use crypto_engine for skcipher/aead/rsa/hash in caam - Add Xilinx AES driver - Add uacce driver - Register zip engine to uacce in hisilicon - Add support for OCTEON TX CPT engine in marvell" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits) crypto: af_alg - bool type cosmetics crypto: arm[64]/poly1305 - add artifact to .gitignore files crypto: caam - limit single JD RNG output to maximum of 16 bytes crypto: caam - enable prediction resistance in HRWNG bus: fsl-mc: add api to retrieve mc version crypto: caam - invalidate entropy register during RNG initialization crypto: caam - check if RNG job failed crypto: caam - simplify RNG implementation crypto: caam - drop global context pointer and init_done crypto: caam - use struct hwrng's .init for initialization crypto: caam - allocate RNG instantiation descriptor with GFP_DMA crypto: ccree - remove duplicated include from cc_aead.c crypto: chelsio - remove set but not used variable 'adap' crypto: marvell - enable OcteonTX cpt options for build crypto: marvell - add the Virtual Function driver for CPT crypto: marvell - add support for OCTEON TX CPT engine crypto: marvell - create common Kconfig and Makefile for Marvell crypto: arm/neon - memzero_explicit aes-cbc key crypto: bcm - Use scnprintf() for avoiding potential buffer overflow crypto: atmel-i2c - Fix wakeup fail ...
2020-03-25Merge branch 'x86/cpu' into perf/core, to resolve conflictIngo Molnar
Conflicts: arch/x86/events/intel/uncore.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25.gitignore: add SPDX License IdentifierMasahiro Yamada
Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-24crypto: Convert to new CPU match macrosThomas Gleixner
The new macro set has a consistent namespace and uses C99 initializers instead of the grufty C89 ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lkml.kernel.org/r/20200320131510.700250889@linutronix.de
2020-03-06crypto: x86/curve25519 - leave r12 as spare registerJason A. Donenfeld
This updates to the newer register selection proved by HACL*, which leads to a more compact instruction encoding, and saves around 100 cycles. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-03-05crypto: x86/curve25519 - support assemblers with no adx supportJason A. Donenfeld
Some older version of GAS do not support the ADX instructions, similarly to how they also don't support AVX and such. This commit adds the same build-time detection mechanisms we use for AVX and others for ADX, and then makes sure that the curve25519 library dispatcher calls the right functions. Reported-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-02-13crypto: x86/curve25519 - replace with formally verified implementationJason A. Donenfeld
This comes from INRIA's HACL*/Vale. It implements the same algorithm and implementation strategy as the code it replaces, only this code has been formally verified, sans the base point multiplication, which uses code similar to prior, only it uses the formally verified field arithmetic alongside reproducable ladder generation steps. This doesn't have a pure-bmi2 version, which means haswell no longer benefits, but the increased (doubled) code complexity is not worth it for a single generation of chips that's already old. Performance-wise, this is around 1% slower on older microarchitectures, and slightly faster on newer microarchitectures, mainly 10nm ones or backports of 10nm to 14nm. This implementation is "everest" below: Xeon E5-2680 v4 (Broadwell) armfazh: 133340 cycles per call everest: 133436 cycles per call Xeon Gold 5120 (Sky Lake Server) armfazh: 112636 cycles per call everest: 113906 cycles per call Core i5-6300U (Sky Lake Client) armfazh: 116810 cycles per call everest: 117916 cycles per call Core i7-7600U (Kaby Lake) armfazh: 119523 cycles per call everest: 119040 cycles per call Core i7-8750H (Coffee Lake) armfazh: 113914 cycles per call everest: 113650 cycles per call Core i9-9880H (Coffee Lake Refresh) armfazh: 112616 cycles per call everest: 114082 cycles per call Core i3-8121U (Cannon Lake) armfazh: 113202 cycles per call everest: 111382 cycles per call Core i7-8265U (Whiskey Lake) armfazh: 127307 cycles per call everest: 127697 cycles per call Core i7-8550U (Kaby Lake Refresh) armfazh: 127522 cycles per call everest: 127083 cycles per call Xeon Platinum 8275CL (Cascade Lake) armfazh: 114380 cycles per call everest: 114656 cycles per call Achieving these kind of results with formally verified code is quite remarkable, especialy considering that performance is favorable for newer chips. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-22crypto: x86/poly1305 - emit does base conversion itselfJason A. Donenfeld
The emit code does optional base conversion itself in assembly, so we don't need to do that here. Also, neither one of these functions uses simd instructions, so checking for that doesn't make sense either. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-22crypto: x86/poly1305 - fix .gitignore typoJason A. Donenfeld
Admist the kbuild robot induced changes, the .gitignore file for the generated file wasn't updated with the non-clashing filename. This commit adjusts that. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-22crypto: x86/sha - Eliminate casts on asm implementationsKees Cook
In order to avoid CFI function prototype mismatches, this removes the casts on assembly implementations of sha1/256/512 accelerators. The safety checks from BUILD_BUG_ON() remain. Additionally, this renames various arguments for clarity, as suggested by Eric Biggers. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-16crypto: x86/poly1305 - wire up faster implementations for kernelJason A. Donenfeld
These x86_64 vectorized implementations support AVX, AVX-2, and AVX512F. The AVX-512F implementation is disabled on Skylake, due to throttling, but it is quite fast on >= Cannonlake. On the left is cycle counts on a Core i7 6700HQ using the AVX-2 codepath, comparing this implementation ("new") to the implementation in the current crypto api ("old"). On the right are benchmarks on a Xeon Gold 5120 using the AVX-512 codepath. The new implementation is faster on all benchmarks. AVX-2 AVX-512 --------- ----------- size old new size old new ---- ---- ---- ---- ---- ---- 0 70 68 0 74 70 16 92 90 16 96 92 32 134 104 32 136 106 48 172 120 48 184 124 64 218 136 64 218 138 80 254 158 80 260 160 96 298 174 96 300 176 112 342 192 112 342 194 128 388 212 128 384 212 144 428 228 144 420 226 160 466 246 160 464 248 176 510 264 176 504 264 192 550 282 192 544 282 208 594 302 208 582 300 224 628 316 224 624 318 240 676 334 240 662 338 256 716 354 256 708 358 272 764 374 272 748 372 288 802 352 288 788 358 304 420 366 304 422 370 320 428 360 320 432 364 336 484 378 336 486 380 352 426 384 352 434 390 368 478 400 368 480 408 384 488 394 384 490 398 400 542 408 400 542 412 416 486 416 416 492 426 432 534 430 432 538 436 448 544 422 448 546 432 464 600 438 464 600 448 480 540 448 480 548 456 496 594 464 496 594 476 512 602 456 512 606 470 528 656 476 528 656 480 544 600 480 544 606 498 560 650 494 560 652 512 576 664 490 576 662 508 592 714 508 592 716 522 608 656 514 608 664 538 624 708 532 624 710 552 640 716 524 640 720 516 656 770 536 656 772 526 672 716 548 672 722 544 688 770 562 688 768 556 704 774 552 704 778 556 720 826 568 720 832 568 736 768 574 736 780 584 752 822 592 752 826 600 768 830 584 768 836 560 784 884 602 784 888 572 800 828 610 800 838 588 816 884 628 816 884 604 832 888 618 832 894 598 848 942 632 848 946 612 864 884 644 864 896 628 880 936 660 880 942 644 896 948 652 896 952 608 912 1000 664 912 1004 616 928 942 676 928 954 634 944 994 690 944 1000 646 960 1002 680 960 1008 646 976 1054 694 976 1062 658 992 1002 706 992 1012 674 1008 1052 720 1008 1058 690 This commit wires in the prior implementation from Andy, and makes the following changes to be suitable for kernel land. - Some cosmetic and structural changes, like renaming labels to .Lname, constants, and other Linux conventions, as well as making the code easy for us to maintain moving forward. - CPU feature checking is done in C by the glue code. - We avoid jumping into the middle of functions, to appease objtool, and instead parameterize shared code. - We maintain frame pointers so that stack traces make sense. - We remove the dependency on the perl xlate code, which transforms the output into things that assemblers we don't care about use. Importantly, none of our changes affect the arithmetic or core code, but just involve the differing environment of kernel space. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Co-developed-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-16crypto: x86/poly1305 - import unmodified cryptogams implementationJason A. Donenfeld
These x86_64 vectorized implementations come from Andy Polyakov's CRYPTOGAMS implementation, and are included here in raw form without modification, so that subsequent commits that fix these up for the kernel can see how it has changed. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>