aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/crypto/aesni-intel_asm.S
AgeCommit message (Collapse)Author
2024-01-03arch/x86: Fix typosBjorn Helgaas
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2023-09-20crypto: aesni - Fix double word in commentsBo Liu
Remove the repeated word "if" in comments. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/aesni - Use local .L symbols for codeArd Biesheuvel
Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/aesni - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. In the GCM case, we can get rid of the oversized permutation array entirely while at it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-22x86: clean up symbol aliasingMark Rutland
Now that we have SYM_FUNC_ALIAS() and SYM_FUNC_ALIAS_WEAK(), use those to simplify the definition of function aliases across arch/x86. For clarity, where there are multiple annotations such as EXPORT_SYMBOL(), I've tried to keep annotations grouped by symbol. For example, where a function has a name and an alias which are both exported, this is organised as: SYM_FUNC_START(func) ... asm insns ... SYM_FUNC_END(func) EXPORT_SYMBOL(func) SYM_FUNC_ALIAS(alias, func) EXPORT_SYMBOL(alias) Where there are only aliases and no exports or other annotations, I have not bothered with line spacing, e.g. SYM_FUNC_START(func) ... asm insns ... SYM_FUNC_END(func) SYM_FUNC_ALIAS(alias, func) The tools/perf/ copies of memset_64.S and memset_64.S are updated likewise to avoid the build system complaining these are mismatched: | Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' | diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S | Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S' | diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220216162229.1076788-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-12-08x86: Prepare asm files for straight-line-speculationPeter Zijlstra
Replace all ret/retq instructions with RET in preparation of making RET a macro. Since AS is case insensitive it's a big no-op without RET defined. find arch/x86/ -name \*.S | while read file do sed -i 's/\<ret[q]*\>/RET/' $file done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org
2021-01-08crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helperArd Biesheuvel
The AES-NI driver implements XTS via the glue helper, which consumes a struct with sets of function pointers which are invoked on chunks of input data of the appropriate size, as annotated in the struct. Let's get rid of this indirection, so that we can perform direct calls to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e., provide a helper which can consume inputs of any size, provided that the penultimate, full block is passed via the last call if ciphertext stealing needs to be applied. This also allows us to enable the XTS mode for i386. Tested-by: Eric Biggers <ebiggers@google.com> # x86_64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-08crypto: x86/aes-ni-xts - use direct calls to and 4-way strideArd Biesheuvel
The XTS asm helper arrangement is a bit odd: the 8-way stride helper consists of back-to-back calls to the 4-way core transforms, which are called indirectly, based on a boolean that indicates whether we are performing encryption or decryption. Given how costly indirect calls are on x86, let's switch to direct calls, and given how the 8-way stride doesn't really add anything substantial, use a 4-way stride instead, and make the asm core routine deal with any multiple of 4 blocks. Since 512 byte sectors or 4 KB blocks are the typical quantities XTS operates on, increase the stride exported to the glue helper to 512 bytes as well. As a result, the number of indirect calls is reduced from 3 per 64 bytes of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup when operating on 1 KB blocks (measured on a Intel(R) Core(TM) i7-8650U CPU) Fixes: 9697fa39efd3f ("x86/retpoline/crypto: Convert crypto assembler indirect jumps") Tested-by: Eric Biggers <ebiggers@google.com> # x86_64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: aesni - implement support for cts(cbc(aes))Ard Biesheuvel
Follow the same approach as the arm64 driver for implementing a version of AES-NI in CBC mode that supports ciphertext stealing. This results in a ~2x speed increase for relatively short inputs (less than 256 bytes), which is relevant given that AES-CBC with ciphertext stealing is used for filename encryption in the fscrypt layer. For larger inputs, the speedup is still significant (~25% on decryption, ~6% on encryption) Tested-by: Eric Biggers <ebiggers@google.com> # x86_64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-04crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%regUros Bizjak
CMP $0,%reg can't set overflow flag, so we can use shorter TEST %reg,%reg instruction when only zero and sign flags are checked (E,L,LE,G,GE conditions). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-16crypto: x86 - Remove include/asm/inst.hUros Bizjak
Current minimum required version of binutils is 2.23, which supports PSHUFB, PCLMULQDQ, PEXTRD, AESKEYGENASSIST, AESIMC, AESENC, AESENCLAST, AESDEC, AESDECLAST and MOVQ instruction mnemonics. Substitute macros from include/asm/inst.h with a proper instruction mnemonics in various assmbly files from x86/crypto directory, and remove now unneeded file. The patch was tested by calculating and comparing sha256sum hashes of stripped object files before and after the patch, to be sure that executable code didn't change. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-07-09crypto: aesni - Fix build with LLVM_IAS=1Sedat Dilek
When building with LLVM_IAS=1 means using Clang's Integrated Assembly (IAS) from LLVM/Clang >= v10.0.1-rc1+ instead of GNU/as from GNU/binutils I see the following breakage in Debian/testing AMD64: <instantiation>:15:74: error: too many positional arguments PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, ^ arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp) ^ <instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc ^ arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation GCM_ENC_DEC dec ^ <instantiation>:15:74: error: too many positional arguments PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, ^ arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp) ^ <instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc ^ arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation GCM_ENC_DEC enc Craig Topper suggested me in ClangBuiltLinux issue #1050: > I think the "too many positional arguments" is because the parser isn't able > to handle the trailing commas. > > The "unknown use of instruction mnemonic" is because the macro was named > GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with > GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the > macro instantiation, but llvm doesn't. First, I removed the trailing comma in the PRECOMPUTE line. Second, I substituted: 1. GHASH_4_ENCRYPT_4_PARALLEL_DEC -> GHASH_4_ENCRYPT_4_PARALLEL_dec 2. GHASH_4_ENCRYPT_4_PARALLEL_ENC -> GHASH_4_ENCRYPT_4_PARALLEL_enc With these changes I was able to build with LLVM_IAS=1 and boot on bare metal. I confirmed that this works with Linux-kernel v5.7.5 final. NOTE: This patch is on top of Linux v5.7 final. Thanks to Craig and especially Nick for double-checking and his comments. Suggested-by: Craig Topper <craig.topper@intel.com> Suggested-by: Craig Topper <craig.topper@gmail.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: "ClangBuiltLinux" <clang-built-linux@googlegroups.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1050 Link: https://bugs.llvm.org/show_bug.cgi?id=24494 Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-04-30x86: Change {JMP,CALL}_NOSPEC argumentPeter Zijlstra
In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line versions of the retpoline magic, we need to remove the '%' from the argument, such that we can paste it onto symbol names. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org
2019-12-11crypto: x86 - Regularize glue function prototypesKees Cook
The crypto glue performed function prototype casting via macros to make indirect calls to assembly routines. Instead of performing casts at the call sites (which trips Control Flow Integrity prototype checking), switch each prototype to a common standard set of arguments which allows the removal of the existing macros. In order to keep pointer math unchanged, internal casting between u128 pointers and u8 pointers is added. Co-developed-by: João Moreira <joao.moreira@intel.com> Signed-off-by: João Moreira <joao.moreira@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-10-18x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*Jiri Slaby
These are all functions which are invoked from elsewhere, so annotate them as global using the new SYM_FUNC_START and their ENDPROC's by SYM_FUNC_END. Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the last users. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernate] Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits] Acked-by: Herbert Xu <herbert@gondor.apana.org.au> [crypto] Cc: Allison Randal <allison@lohutok.net> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andy@infradead.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Armijn Hemel <armijn@tjaldur.nl> Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Enrico Weigelt <info@metux.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: kvm ML <kvm@vger.kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: linux-arch@vger.kernel.org Cc: linux-crypto@vger.kernel.org Cc: linux-efi <linux-efi@vger.kernel.org> Cc: linux-efi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: platform-driver-x86@vger.kernel.org Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Wei Huang <wei@redhat.com> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com> Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz
2019-10-18x86/asm: Annotate aliasesJiri Slaby
_key_expansion_128 is an alias to _key_expansion_256a, __memcpy to memcpy, xen_syscall32_target to xen_sysenter_target, and so on. Annotate them all using the new SYM_FUNC_START_ALIAS, SYM_FUNC_START_LOCAL_ALIAS, and SYM_FUNC_END_ALIAS. This will make the tools generating the debuginfo happy as it avoids nesting and double symbols. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> [xen parts] Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-arch@vger.kernel.org Cc: linux-crypto@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191011115108.12392-10-jslaby@suse.cz
2019-10-18x86/asm/crypto: Annotate local functionsJiri Slaby
Use the newly added SYM_FUNC_START_LOCAL to annotate beginnings of all functions which do not have ".globl" annotation, but their endings are annotated by ENDPROC. This is needed to balance ENDPROC for tools that generate debuginfo. These function names are not prepended with ".L" as they might appear in call traces and they wouldn't be visible after such change. To be symmetric, the functions' ENDPROCs are converted to the new SYM_FUNC_END. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-arch@vger.kernel.org Cc: linux-crypto@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191011115108.12392-7-jslaby@suse.cz
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-29Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - Check for the right CPU feature bit in sm4-ce on arm64. - Fix scatterwalk WARN_ON in aes-gcm-ce on arm64. - Fix unaligned fault in aesni on x86. - Fix potential NULL pointer dereference on exit in chtls. - Fix DMA mapping direction for RSA in caam. - Fix error path return value for xts setkey in caam. - Fix address endianness when DMA unmapping in caam. - Fix sleep-in-atomic in vmx. - Fix command corruption when queue is full in cavium/nitrox. * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: cavium/nitrox - fix for command corruption in queue full case with backlog submissions. crypto: vmx - Fix sleep-in-atomic bugs crypto: arm64/aes-gcm-ce - fix scatterwalk API violation crypto: aesni - Use unaligned loads from gcm_context_data crypto: chtls - fix null dereference chtls_free_uld() crypto: arm64/sm4-ce - check for the right CPU feature bit crypto: caam - fix DMA mapping direction for RSA forms 2 & 3 crypto: caam/qi - fix error path in xts setkey crypto: caam/jr - fix descriptor DMA unmapping
2018-08-25crypto: aesni - Use unaligned loads from gcm_context_dataDave Watson
A regression was reported bisecting to 1476db2d12 "Move HashKey computation from stack to gcm_context". That diff moved HashKey computation from the stack, which was explicitly aligned in the asm, to a struct provided from the C code, depending on AESNI_ALIGN_ATTR for alignment. It appears some compilers may not align this struct correctly, resulting in a crash on the movdqa instruction when attempting to encrypt or decrypt data. Fix by using unaligned loads for the HashKeys. On modern hardware there is no perf difference between the unaligned and aligned loads. All other accesses to gcm_context_data already use unaligned loads. Reported-by: Mauro Rossi <issor.oruam@gmail.com> Fixes: 1476db2d12 ("Move HashKey computation from stack to gcm_context") Cc: <stable@vger.kernel.org> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-03x86/asm/64: Use 32-bit XOR to zero registersJan Beulich
Some Intel CPUs don't recognize 64-bit XORs as zeroing idioms. Zeroing idioms don't require execution bandwidth, as they're being taken care of in the frontend (through register renaming). Use 32-bit XORs instead. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: herbert@gondor.apana.org.au Cc: pavel@ucw.cz Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/5B39FF1A02000078001CFB54@prv1-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-22crypto: aesni - Introduce scatter/gather asm function stubsDave Watson
The asm macros are all set up now, introduce entry points. GCM_INIT and GCM_COMPLETE have arguments supplied, so that the new scatter/gather entry points don't have to take all the arguments, and only the ones they need. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add fast path for > 16 byte updateDave Watson
We can fast-path any < 16 byte read if the full message is > 16 bytes, and shift over by the appropriate amount. Usually we are reading > 16 bytes, so this should be faster than the READ_PARTIAL macro introduced in b20209c91e2 for the average case. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce partial block macroDave Watson
Before this diff, multiple calls to GCM_ENC_DEC will succeed, but only if all calls are a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update aadhash as appropriate. The data offset %r11 is also updated after the partial block. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Move HashKey computation from stack to gcm_contextDave Watson
HashKey computation only needs to happen once per scatter/gather operation, save it between calls in gcm_context struct instead of on the stack. Since the asm no longer stores anything on the stack, we can use %rsp directly, and clean up the frame save/restore macros a bit. Hashkeys actually only need to be calculated once per key and could be moved to when set_key is called, however, the current glue code falls back to generic aes code if fpu is disabled. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Move ghash_mul to GCM_COMPLETEDave Watson
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Fill in new context data structuresDave Watson
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Split AAD hash calculation to separate macroDave Watson
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce gcm_context_dataDave Watson
Introduce a gcm_context_data struct that will be used to pass context data between scatter/gather update calls. It is passed as the second argument (after crypto keys), other args are renumbered. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Merge encode and decode to GCM_ENC_DEC macroDave Watson
Make a macro for the main encode/decode routine. Only a small handful of lines differ for enc and dec. This will also become the main scatter/gather update routine. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add GCM_COMPLETE macroDave Watson
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add GCM_INIT macroDave Watson
Reduce code duplication by introducting GCM_INIT macro. This macro will also be exposed as a function for implementing scatter/gather support, since INIT only needs to be called once for the full operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Macro-ify func save/restoreDave Watson
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Merge INITIAL_BLOCKS_ENC/DECDave Watson
Use macro operations to merge implemetations of INITIAL_BLOCKS, since they differ by only a small handful of lines. Use macro counter \@ to simplify implementation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-31Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Enforce the setting of keys for keyed aead/hash/skcipher algorithms. - Add multibuf speed tests in tcrypt. Algorithms: - Improve performance of sha3-generic. - Add native sha512 support on arm64. - Add v8.2 Crypto Extentions version of sha3/sm3 on arm64. - Avoid hmac nesting by requiring underlying algorithm to be unkeyed. - Add cryptd_max_cpu_qlen module parameter to cryptd. Drivers: - Add support for EIP97 engine in inside-secure. - Add inline IPsec support to chelsio. - Add RevB core support to crypto4xx. - Fix AEAD ICV check in crypto4xx. - Add stm32 crypto driver. - Add support for BCM63xx platforms in bcm2835 and remove bcm63xx. - Add Derived Key Protocol (DKP) support in caam. - Add Samsung Exynos True RNG driver. - Add support for Exynos5250+ SoCs in exynos PRNG driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (166 commits) crypto: picoxcell - Fix error handling in spacc_probe() crypto: arm64/sha512 - fix/improve new v8.2 Crypto Extensions code crypto: arm64/sm3 - new v8.2 Crypto Extensions implementation crypto: arm64/sha3 - new v8.2 Crypto Extensions implementation crypto: testmgr - add new testcases for sha3 crypto: sha3-generic - export init/update/final routines crypto: sha3-generic - simplify code crypto: sha3-generic - rewrite KECCAK transform to help the compiler optimize crypto: sha3-generic - fixes for alignment and big endian operation crypto: aesni - handle zero length dst buffer crypto: artpec6 - remove select on non-existing CRYPTO_SHA384 hwrng: bcm2835 - Remove redundant dev_err call in bcm2835_rng_probe() crypto: stm32 - remove redundant dev_err call in stm32_cryp_probe() crypto: axis - remove unnecessary platform_get_resource() error check crypto: testmgr - test misuse of result in ahash crypto: inside-secure - make function safexcel_try_push_requests static crypto: aes-generic - fix aes-generic regression on powerpc crypto: chelsio - Fix indentation warning crypto: arm64/sha1-ce - get rid of literal pool crypto: arm64/sha2-ce - move the round constant table to .rodata section ...
2018-01-12x86/retpoline/crypto: Convert crypto assembler indirect jumpsDavid Woodhouse
Convert all indirect jumps in crypto assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-6-git-send-email-dwmw@amazon.co.uk
2017-12-28crypto: aesni - Fix out-of-bounds access of the AAD buffer in generic-gcm-aesniJunaid Shahid
The aesni_gcm_enc/dec functions can access memory after the end of the AAD buffer if the AAD length is not a multiple of 4 bytes. It didn't matter with rfc4106-gcm-aesni as in that case the AAD was always followed by the 8 byte IV, but that is no longer the case with generic-gcm-aesni. This can potentially result in accessing a page that is not mapped and thus causing the machine to crash. This patch fixes that by reading the last <16 byte block of the AAD byte-by-byte and optionally via an 8-byte load if the block was at least 8 bytes. Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen") Cc: <stable@vger.kernel.org> Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-12-28crypto: aesni - Fix out-of-bounds access of the data buffer in generic-gcm-aesniJunaid Shahid
The aesni_gcm_enc/dec functions can access memory before the start of the data buffer if the length of the data buffer is less than 16 bytes. This is because they perform the read via a single 16-byte load. This can potentially result in accessing a page that is not mapped and thus causing the machine to crash. This patch fixes that by reading the partial block byte-by-byte and optionally an via 8-byte load if the block was at least 8 bytes. Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen") Cc: <stable@vger.kernel.org> Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-05-18crypto: aesni - make non-AVX AES-GCM work with all valid auth_tag_lenSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-05-18crypto: aesni - make non-AVX AES-GCM work with any aadlenSabrina Dubroca
This is the first step to make the aesni AES-GCM implementation generic. The current code was written for rfc4106, so it handles only some specific sizes of associated data. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: x86 - make constants readonly, allow linker to merge themDenys Vlasenko
A lot of asm-optimized routines in arch/x86/crypto/ keep its constants in .data. This is wrong, they should be on .rodata. Mnay of these constants are the same in different modules. For example, 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F exists in at least half a dozen places. There is a way to let linker merge them and use just one copy. The rules are as follows: mergeable objects of different sizes should not share sections. You can't put them all in one .rodata section, they will lose "mergeability". GCC puts its mergeable constants in ".rodata.cstSIZE" sections, or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used. This patch does the same: .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16 It is important that all data in such section consists of 16-byte elements, not larger ones, and there are no implicit use of one element from another. When this is not the case, use non-mergeable section: .section .rodata[.VAR_NAME], "a", @progbits This reduces .data by ~15 kbytes: text data bss dec hex filename 11097415 2705840 2630712 16433967 fac32f vmlinux-prev.o 11112095 2690672 2630712 16433479 fac147 vmlinux.o Merged objects are visible in System.map: ffffffff81a28810 r POLY ffffffff81a28810 r POLY ffffffff81a28820 r TWOONE ffffffff81a28820 r TWOONE ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of ffffffff81a28830 r SHUF_MASK <------------- the name difference ffffffff81a28830 r SHUF_MASK ffffffff81a28830 r SHUF_MASK .. ffffffff81a28d00 r K512 <- merged three identical 640-byte tables ffffffff81a28d00 r K512 ffffffff81a28d00 r K512 Use of object names in section name suffixes is not strictly necessary, but might help if someday link stage will use garbage collection to eliminate unused sections (ld --gc-sections). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Josh Poimboeuf <jpoimboe@redhat.com> CC: Xiaodong Liu <xiaodong.liu@intel.com> CC: Megha Dey <megha.dey@intel.com> CC: linux-crypto@vger.kernel.org CC: x86@kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-02-24x86/asm/crypto: Create stack frames in crypto functionsJosh Poimboeuf
The crypto code has several callable non-leaf functions which don't honor CONFIG_FRAME_POINTER, which can result in bad stack traces. Create stack frames for them when CONFIG_FRAME_POINTER is enabled. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: David S. Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/6c20192bcf1102ae18ae5a242cabf30ce9b29895.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24x86/asm/crypto: Move .Lbswap_mask data to .rodata sectionJosh Poimboeuf
stacktool reports the following warning: stacktool: arch/x86/crypto/aesni-intel_asm.o: _aesni_inc_init(): can't find starting instruction stacktool gets confused when it tries to disassemble the following data in the .text section: .Lbswap_mask: .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 Move it to .rodata which is a more appropriate section for read-only data. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: David S. Miller <davem@davemloft.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/b6a2f3f8bda705143e127c025edb2b53c86e6eb4.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-14crypto: aesni - Add support for 192 & 256 bit keys to AESNI RFC4106Timothy McCaffrey
These patches fix the RFC4106 implementation in the aesni-intel module so it supports 192 & 256 bit keys. Since the AVX support that was added to this module also only supports 128 bit keys, and this patch only affects the SSE implementation, changes were also made to use the SSE version if key sizes other than 128 are specified. RFC4106 specifies that 192 & 256 bit keys must be supported (section 8.4). Also, this should fix Strongswan issue 341 where the aesni module needs to be unloaded if 256 bit keys are used: http://wiki.strongswan.org/issues/341 This patch has been tested with Sandy Bridge and Haswell processors. With 128 bit keys and input buffers > 512 bytes a slight performance degradation was noticed (~1%). For input buffers of less than 512 bytes there was no performance impact. Compared to 128 bit keys, 256 bit key size performance is approx. .5 cycles per byte slower on Sandy Bridge, and .37 cycles per byte slower on Haswell (vs. SSE code). This patch has also been tested with StrongSwan IPSec connections where it worked correctly. I created this diff from a git clone of crypto-2.6.git. Any questions, please feel free to contact me. Signed-off-by: Timothy McCaffrey <timothy.mccaffrey@unisys.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-13crypto: aesni_intel - fix accessing of unaligned memoryJussi Kivilinna
The new XTS code for aesni_intel uses input buffers directly as memory operands for pxor instructions, which causes crash if those buffers are not aligned to 16 bytes. Patch changes XTS code to handle unaligned memory correctly, by loading memory with movdqu instead. Reported-by: Dave Jones <davej@redhat.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: aesni_intel - add more optimized XTS mode for x86-64Jussi Kivilinna
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack usage and boost for speed. tcrypt results, with Intel i5-2450M: 256-bit key enc dec 16B 0.98x 0.99x 64B 0.64x 0.63x 256B 1.29x 1.32x 1024B 1.54x 1.58x 8192B 1.57x 1.60x 512-bit key enc dec 16B 0.98x 0.99x 64B 0.60x 0.59x 256B 1.24x 1.25x 1024B 1.39x 1.42x 8192B 1.38x 1.42x I chose not to optimize smaller than block size of 256 bytes, since XTS is practically always used with data blocks of size 512 bytes. This is why performance is reduced in tcrypt for 64 byte long blocks. Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-20crypto: aesni-intel - add ENDPROC statements for assembler functionsJussi Kivilinna
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-05-31crypto: aesni-intel - fix unaligned cbc decrypt for x86-32Mathias Krause
The 32 bit variant of cbc(aes) decrypt is using instructions requiring 128 bit aligned memory locations but fails to ensure this constraint in the code. Fix this by loading the data into intermediate registers with load unaligned instructions. This fixes reported general protection faults related to aesni. References: https://bugzilla.kernel.org/show_bug.cgi?id=43223 Reported-by: Daniel <garkein@mailueberfall.de> Cc: stable@kernel.org [v2.6.39+] Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-03-27crypto: aesni-intel - fixed problem with packets that are not multiple of ↵Tadeusz Struk
64bytes This patch fixes problem with packets that are not multiple of 64bytes. Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-03-18x86: Fix common misspellingsLucas De Marchi
They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>