aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/crypto
AgeCommit message (Collapse)Author
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 335Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin st fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 111 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157Thomas Gleixner
Based on 3 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [graeme] [gregory] [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema] [hk] [hemahk]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1105 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1334 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-06Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add support for AEAD in simd - Add fuzz testing to testmgr - Add panic_on_fail module parameter to testmgr - Use per-CPU struct instead multiple variables in scompress - Change verify API for akcipher Algorithms: - Convert x86 AEAD algorithms over to simd - Forbid 2-key 3DES in FIPS mode - Add EC-RDSA (GOST 34.10) algorithm Drivers: - Set output IV with ctr-aes in crypto4xx - Set output IV in rockchip - Fix potential length overflow with hashing in sun4i-ss - Fix computation error with ctr in vmx - Add SM4 protected keys support in ccree - Remove long-broken mxc-scc driver - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits) crypto: ccree - use a proper le32 type for le32 val crypto: ccree - remove set but not used variable 'du_size' crypto: ccree - Make cc_sec_disable static crypto: ccree - fix spelling mistake "protedcted" -> "protected" crypto: caam/qi2 - generate hash keys in-place crypto: caam/qi2 - fix DMA mapping of stack memory crypto: caam/qi2 - fix zero-length buffer DMA mapping crypto: stm32/cryp - update to return iv_out crypto: stm32/cryp - remove request mutex protection crypto: stm32/cryp - add weak key check for DES crypto: atmel - remove set but not used variable 'alg_name' crypto: picoxcell - Use dev_get_drvdata() crypto: crypto4xx - get rid of redundant using_sd variable crypto: crypto4xx - use sync skcipher for fallback crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues crypto: crypto4xx - fix ctr-aes missing output IV crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o crypto: ccree - handle tee fips error during power management resume crypto: ccree - add function to handle cryptocell tee fips error ...
2019-04-25crypto: shash - remove shash_desc::flagsEric Biggers
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op. With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_*() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep. Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all. Therefore, remove shash_desc::flags, and document that the crypto_shash_*() functions can be called from any context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-08crypto: x86/poly1305 - fix overflow during partial reductionEric Biggers
The x86_64 implementation of Poly1305 produces the wrong result on some inputs because poly1305_4block_avx2() incorrectly assumes that when partially reducing the accumulator, the bits carried from limb 'd4' to limb 'h0' fit in a 32-bit integer. This is true for poly1305-generic which processes only one block at a time. However, it's not true for the AVX2 implementation, which processes 4 blocks at a time and therefore can produce intermediate limbs about 4x larger. Fix it by making the relevant calculations use 64-bit arithmetic rather than 32-bit. Note that most of the carries already used 64-bit arithmetic, but the d4 -> h0 carry was different for some reason. To be safe I also made the same change to the corresponding SSE2 code, though that only operates on 1 or 2 blocks at a time. I don't think it's really needed for poly1305_block_sse2(), but it doesn't hurt because it's already x86_64 code. It *might* be needed for poly1305_2block_sse2(), but overflows aren't easy to reproduce there. This bug was originally detected by my patches that improve testmgr to fuzz algorithms against their generic implementation. But also add a test vector which reproduces it directly (in the AVX2 case). Fixes: b1ccc8f4b631 ("crypto: poly1305 - Add a four block AVX2 variant for x86_64") Fixes: c70f4abef07a ("crypto: poly1305 - Add a SSE2 SIMD variant for x86_64") Cc: <stable@vger.kernel.org> # v4.3+ Cc: Martin Willi <martin@strongswan.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-08crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()Eric Biggers
The ->digest() method of crct10dif-pclmul reads the current CRC value from the shash_desc context. But this value is uninitialized, causing crypto_shash_digest() to compute the wrong result. Fix it. Probably this wasn't noticed before because lib/crc-t10dif.c only uses crypto_shash_update(), not crypto_shash_digest(). Likewise, crypto_shash_digest() is not yet tested by the crypto self-tests because those only test the ahash API which only uses shash init/update/final. Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform") Cc: <stable@vger.kernel.org> # v3.11+ Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86 - convert to use crypto_simd_usable()Eric Biggers
Replace all calls to irq_fpu_usable() in the x86 crypto code with crypto_simd_usable(), in order to allow testing the no-SIMD code paths. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/morus1280 - convert to use AEAD SIMD helpersEric Biggers
Convert the x86 implementations of MORUS-1280 to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/morus640 - convert to use AEAD SIMD helpersEric Biggers
Convert the x86 implementation of MORUS-640 to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/aegis256 - convert to use AEAD SIMD helpersEric Biggers
Convert the x86 implementation of AEGIS-256 to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/aegis128l - convert to use AEAD SIMD helpersEric Biggers
Convert the x86 implementation of AEGIS-128L to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/aegis128 - convert to use AEAD SIMD helpersEric Biggers
Convert the x86 implementation of AEGIS-128 to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/aesni - convert to use AEAD SIMD helpersEric Biggers
Convert the AES-NI implementations of "gcm(aes)" and "rfc4106(gcm(aes))" to use the AEAD SIMD helpers, rather than hand-rolling the same functionality. This simplifies the code and also fixes the bug where the user-provided aead_request is modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-22crypto: x86/aesni - convert to use skcipher SIMD bulk registrationEric Biggers
Convert the AES-NI glue code to use simd_register_skciphers_compat() to create SIMD wrappers for all the internal skcipher algorithms at once, rather than wrapping each one individually. This simplifies the code. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-28crypto: x86/poly1305 - Clear key material from stack in SSE2 variantTommi Hirvola
1-block SSE2 variant of poly1305 stores variables s1..s4 containing key material on the stack. This commit adds missing zeroing of the stack memory. Benchmarks show negligible performance hit (tested on i7-3770). Signed-off-by: Tommi Hirvola <tommi@hirvola.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-08crypto: x86/aesni-gcm - fix crash on empty plaintextEric Biggers
gcmaes_crypt_by_sg() dereferences the NULL pointer returned by scatterwalk_ffwd() when encrypting an empty plaintext and the source scatterlist ends immediately after the associated data. Fix it by only fast-forwarding to the src/dst data scatterlists if the data length is nonzero. This bug is reproduced by the "rfc4543(gcm(aes))" test vectors when run with the new AEAD test manager. Fixes: e845520707f8 ("crypto: aesni - Update aesni-intel_glue to use scatter/gather") Cc: <stable@vger.kernel.org> # v4.17+ Cc: Dave Watson <davejwatson@fb.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-08crypto: x86/morus - fix handling chunked inputs and MAY_SLEEPEric Biggers
The x86 MORUS implementations all fail the improved AEAD tests because they produce the wrong result with some data layouts. The issue is that they assume that if the skcipher_walk API gives 'nbytes' not aligned to the walksize (a.k.a. walk.stride), then it is the end of the data. In fact, this can happen before the end. Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can incorrectly sleep in the skcipher_walk_*() functions while preemption has been disabled by kernel_fpu_begin(). Fix these bugs. Fixes: 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for MORUS") Cc: <stable@vger.kernel.org> # v4.18+ Cc: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-08crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEPEric Biggers
The x86 AEGIS implementations all fail the improved AEAD tests because they produce the wrong result with some data layouts. The issue is that they assume that if the skcipher_walk API gives 'nbytes' not aligned to the walksize (a.k.a. walk.stride), then it is the end of the data. In fact, this can happen before the end. Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can incorrectly sleep in the skcipher_walk_*() functions while preemption has been disabled by kernel_fpu_begin(). Fix these bugs. Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations") Cc: <stable@vger.kernel.org> # v4.18+ Cc: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-02-08crypto: x86/crct10dif-pcl - cleanup and optimizationsEric Biggers
The x86, arm, and arm64 asm implementations of crct10dif are very difficult to understand partly because many of the comments, labels, and macros are named incorrectly: the lengths mentioned are usually off by a factor of two from the actual code. Many other things are unnecessarily convoluted as well, e.g. there are many more fold constants than actually needed and some aren't fully reduced. This series therefore cleans up all these implementations to be much more maintainable. I also made some small optimizations where I saw opportunities, resulting in slightly better performance. This patch cleans up the x86 version. As part of this, I removed support for len < 16 from the x86 assembly; now the glue code falls back to the generic table-based implementation in this case. Due to the overhead of kernel_fpu_begin(), this actually significantly improves performance on these lengths. (And even if kernel_fpu_begin() were free, the generic code is still faster for about len < 11.) This removal also eliminates error-prone special cases and makes the x86, arm32, and arm64 ports of the code match more closely. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-01-18crypto: x86/aesni-gcm - make 'struct aesni_gcm_tfm_s' static constEric Biggers
Add missing static keywords to fix the following sparse warnings: arch/x86/crypto/aesni-intel_glue.c:197:24: warning: symbol 'aesni_gcm_tfm_sse' was not declared. Should it be static? arch/x86/crypto/aesni-intel_glue.c:246:24: warning: symbol 'aesni_gcm_tfm_avx_gen2' was not declared. Should it be static? arch/x86/crypto/aesni-intel_glue.c:291:24: warning: symbol 'aesni_gcm_tfm_avx_gen4' was not declared. Should it be static? I also made the affected structures 'const', and adjusted the indentation in the struct definition to not be insane. Cc: Dave Watson <davejwatson@fb.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-27Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add 1472-byte test to tcrypt for IPsec - Reintroduced crypto stats interface with numerous changes - Support incremental algorithm dumps Algorithms: - Add xchacha12/20 - Add nhpoly1305 - Add adiantum - Add streebog hash - Mark cts(cbc(aes)) as FIPS allowed Drivers: - Improve performance of arm64/chacha20 - Improve performance of x86/chacha20 - Add NEON-accelerated nhpoly1305 - Add SSE2 accelerated nhpoly1305 - Add AVX2 accelerated nhpoly1305 - Add support for 192/256-bit keys in gcmaes AVX - Add SG support in gcmaes AVX - ESN for inline IPsec tx in chcr - Add support for CryptoCell 703 in ccree - Add support for CryptoCell 713 in ccree - Add SM4 support in ccree - Add SM3 support in ccree - Add support for chacha20 in caam/qi2 - Add support for chacha20 + poly1305 in caam/jr - Add support for chacha20 + poly1305 in caam/qi2 - Add AEAD cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (130 commits) crypto: skcipher - remove remnants of internal IV generators crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS crypto: salsa20-generic - don't unnecessarily use atomic walk crypto: skcipher - add might_sleep() to skcipher_walk_virt() crypto: x86/chacha - avoid sleeping under kernel_fpu_begin() crypto: cavium/nitrox - Added AEAD cipher support crypto: mxc-scc - fix build warnings on ARM64 crypto: api - document missing stats member crypto: user - remove unused dump functions crypto: chelsio - Fix wrong error counter increments crypto: chelsio - Reset counters on cxgb4 Detach crypto: chelsio - Handle PCI shutdown event crypto: chelsio - cleanup:send addr as value in function argument crypto: chelsio - Use same value for both channel in single WR crypto: chelsio - Swap location of AAD and IV sent in WR crypto: chelsio - remove set but not used variable 'kctx_len' crypto: ux500 - Use proper enum in hash_set_dma_transfer crypto: ux500 - Use proper enum in cryp_set_dma_transfer crypto: aesni - Add scatter/gather avx stubs, and use them in C crypto: aesni - Introduce partial block macro ..
2018-12-23crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()Eric Biggers
Passing atomic=true to skcipher_walk_virt() only makes the later skcipher_walk_done() calls use atomic memory allocations, not skcipher_walk_virt() itself. Thus, we have to move it outside of the preemption-disabled region (kernel_fpu_begin()/kernel_fpu_end()). (skcipher_walk_virt() only allocates memory for certain layouts of the input scatterlist, hence why I didn't notice this earlier...) Reported-by: syzbot+9bf843c33f782d73ae7d@syzkaller.appspotmail.com Fixes: 4af78261870a ("crypto: x86/chacha20 - add XChaCha20 support") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Add scatter/gather avx stubs, and use them in CDave Watson
Add the appropriate scatter/gather stubs to the avx asm. In the C code, we can now always use crypt_by_sg, since both sse and asm code now support scatter/gather. Introduce a new struct, aesni_gcm_tfm, that is initialized on startup to point to either the SSE, AVX, or AVX2 versions of the four necessary encryption/decryption routines. GENX_OPTSIZE is still checked at the start of crypt_by_sg. The total size of the data is checked, since the additional overhead is in the init function, calculating additional HashKeys. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Introduce partial block macroDave Watson
Before this diff, multiple calls to GCM_ENC_DEC will succeed, but only if all calls are a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update aadhash as appropriate. The data offset %r11 is also updated after the partial block. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Introduce READ_PARTIAL_BLOCK macroDave Watson
Introduce READ_PARTIAL_BLOCK macro, and use it in the two existing partial block cases: AAD and the end of ENC_DEC. In particular, the ENC_DEC case should be faster, since we read by 8/4 bytes if possible. This macro will also be used to read partial blocks between enc_update and dec_update calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Move ghash_mul to GCM_COMPLETEDave Watson
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Fill in new context data structuresDave Watson
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Merge avx precompute functionsDave Watson
The precompute functions differ only by the sub-macros they call, merge them to a single macro. Later diffs add more code to fill in the gcm_context_data structure, this allows changes in a single place. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Split AAD hash calculation to separate macroDave Watson
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Add GCM_COMPLETE macroDave Watson
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - support 256 byte keys in avx asmDave Watson
Add support for 192/256-bit keys using the avx gcm/aes routines. The sse routines were previously updated in e31ac32d3b (Add support for 192 & 256 bit keys to AESNI RFC4106). Instead of adding an additional loop in the hotpath as in e31ac32d3b, this diff instead generates separate versions of the code using macros, and the entry routines choose which version once. This results in a 5% performance improvement vs. adding a loop to the hot path. This is the same strategy chosen by the intel isa-l_crypto library. The key size checks are removed from the c code where appropriate. Note that this diff depends on using gcm_context_data - 256 bit keys require 16 HashKeys + 15 expanded keys, which is larger than struct crypto_aes_ctx, so they are stored in struct gcm_context_data. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Macro-ify func save/restoreDave Watson
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Introduce gcm_context_dataDave Watson
Add the gcm_context_data structure to the avx asm routines. This will be necessary to support both 256 bit keys and scatter/gather. The pre-computed HashKeys are now stored in the gcm_context_data struct, which is expanded to hold the greater number of hashkeys necessary for avx. Loads and stores to the new struct are always done unlaligned to avoid compiler issues, see e5b954e8 "Use unaligned loads from gcm_context_data" Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-23crypto: aesni - Merge GCM_ENC_DECDave Watson
The GCM_ENC_DEC routines for AVX and AVX2 are identical, except they call separate sub-macros. Pass the macros as arguments, and merge them. This facilitates additional refactoring, by requiring changes in only one place. The GCM_ENC_DEC macro was moved above the CONFIG_AS_AVX* ifdefs, since it will be used by both AVX and AVX2. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: x86/chacha - yield the FPU occasionallyEric Biggers
To improve responsiveness, yield the FPU (temporarily re-enabling preemption) every 4 KiB encrypted/decrypted, rather than keeping preemption disabled during the entire encryption/decryption operation. Alternatively we could do this for every skcipher_walk step, but steps may be small in some cases, and yielding the FPU is expensive on x86. Suggested-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: x86/chacha - add XChaCha12 supportEric Biggers
Now that the x86_64 SIMD implementations of ChaCha20 and XChaCha20 have been refactored to support varying the number of rounds, add support for XChaCha12. This is identical to XChaCha20 except for the number of rounds, which is 12 instead of 20. This can be used by Adiantum. Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: x86/chacha20 - refactor to allow varying number of roundsEric Biggers
In preparation for adding XChaCha12 support, rename/refactor the x86_64 SIMD implementations of ChaCha20 to support different numbers of rounds. Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: x86/chacha20 - add XChaCha20 supportEric Biggers
Add an XChaCha20 implementation that is hooked up to the x86_64 SIMD implementations of ChaCha20. This can be used by Adiantum. An SSSE3 implementation of single-block HChaCha20 is also added so that XChaCha20 can use it rather than the generic implementation. This required refactoring the ChaCha permutation into its own function. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305Eric Biggers
Add a 64-bit AVX2 implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually AVX2-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-13crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305Eric Biggers
Add a 64-bit SSE2 implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually SSE2-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-12-03x86: Fix various typos in commentsIngo Molnar
Go over arch/x86/ and fix common typos in comments, and a typo in an actual function argument name. No change in functionality intended. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-29crypto: x86/chacha20 - Add a 4-block AVX-512VL variantMartin Willi
This version uses the same principle as the AVX2 version by scheduling the operations for two block pairs in parallel. It benefits from the AVX-512VL rotate instructions and the more efficient partial block handling using "vmovdqu8", resulting in a speedup of the raw block function of ~20%. Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-29crypto: x86/chacha20 - Add a 2-block AVX-512VL variantMartin Willi
This version uses the same principle as the AVX2 version. It benefits from the AVX-512VL rotate instructions and the more efficient partial block handling using "vmovdqu8", resulting in a speedup of ~20%. Unlike the AVX2 version, it is faster than the single block SSSE3 version to process a single block. Hence we engage that function for (partial) single block lengths as well. Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-29crypto: x86/chacha20 - Add a 8-block AVX-512VL variantMartin Willi
This variant is similar to the AVX2 version, but benefits from the AVX-512 rotate instructions and the additional registers, so it can operate without any data on the stack. It uses ymm registers only to avoid the massive core throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed improvement compared to the AVX2 variant for random encryption lengths. The AVX2 version uses "rep movsb" for partial block XORing via the stack. With AVX-512, the new "vmovdqu8" can do this much more efficiently. The associated "kmov" instructions to work with dynamic masks is not part of the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given that the major AVX-512VL architectures provide AVX-512BW and this extension does not affect core clocking, this seems to be no problem at least for now. Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20crypto: poly1305 - use structures for key and accumulatorEric Biggers
In preparation for exposing a low-level Poly1305 API which implements the ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305 MAC and supports block-aligned inputs only, create structures poly1305_key and poly1305_state which hold the limbs of the Poly1305 "r" key and accumulator, respectively. These structures could actually have the same type (e.g. poly1305_val), but different types are preferable, to prevent misuse. Acked-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20crypto: chacha20-generic - refactor to allow varying number of roundsEric Biggers
In preparation for adding XChaCha12 support, rename/refactor chacha20-generic to support different numbers of rounds. The justification for needing XChaCha12 support is explained in more detail in the patch "crypto: chacha - add XChaCha12 support". The only difference between ChaCha{8,12,20} are the number of rounds itself; all other parts of the algorithm are the same. Therefore, remove the "20" from all definitions, structures, functions, files, etc. that will be shared by all ChaCha versions. Also make ->setkey() store the round count in the chacha_ctx (previously chacha20_ctx). The generic code then passes the round count through to chacha_block(). There will be a ->setkey() function for each explicitly allowed round count; the encrypt/decrypt functions will be the same. I decided not to do it the opposite way (same ->setkey() function for all round counts, with different encrypt/decrypt functions) because that would have required more boilerplate code in architecture-specific implementations of ChaCha and XChaCha. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>