aboutsummaryrefslogtreecommitdiffstats
path: root/net/tls
AgeCommit message (Collapse)Author
2024-03-01tls: stop recv() if initial process_rx_list gave us non-DATASabrina Dubroca
[ Upstream commit fdfbaec5923d9359698cbb286bc0deadbb717504 ] If we have a non-DATA record on the rx_list and another record of the same type still on the queue, we will end up merging them: - process_rx_list copies the non-DATA record - we start the loop and process the first available record since it's of the same type - we break out of the loop since the record was not DATA Just check the record type and jump to the end in case process_rx_list did some work. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/bd31449e43bd4b6ff546f5c51cf958c31c511deb.1708007371.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01tls: rx: drop pointless else after gotoJakub Kicinski
[ Upstream commit d5123edd10cf9d324fcb88e276bdc7375f3c5321 ] Pointless else branch after goto makes the code harder to refactor down the line. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01tls: rx: jump to a more appropriate labelJakub Kicinski
[ Upstream commit bfc06e1aaa130b86a81ce3c41ec71a2f5e191690 ] 'recv_end:' checks num_async and decrypted, and is then followed by the 'end' label. Since we know that decrypted and num_async are 0 at the start we can jump to 'end'. Move the init of decrypted and num_async to let the compiler catch if I'm wrong. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01mptcp: fix lockless access in subflow ULP diagPaolo Abeni
commit b8adb69a7d29c2d33eb327bca66476fb6066516b upstream. Since the introduction of the subflow ULP diag interface, the dump callback accessed all the subflow data with lockless. We need either to annotate all the read and write operation accordingly, or acquire the subflow socket lock. Let's do latter, even if slower, to avoid a diffstat havoc. Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-15net: tls, update curr on splice as wellJohn Fastabend
commit c5a595000e2677e865a39f249c056bc05d6e55fd upstream. The curr pointer must also be updated on the splice similar to how we do this for other copy types. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-19net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()Liu Jian
[ Upstream commit cfaa80c91f6f99b9342b6557f0f0e1143e434066 ] I got the below warning when do fuzzing test: BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470 Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9 CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE Hardware name: linux,dummy-virt (DT) Workqueue: pencrypt_parallel padata_parallel_worker Call trace: dump_backtrace+0x0/0x420 show_stack+0x34/0x44 dump_stack+0x1d0/0x248 __kasan_report+0x138/0x140 kasan_report+0x44/0x6c __asan_load4+0x94/0xd0 scatterwalk_copychunks+0x320/0x470 skcipher_next_slow+0x14c/0x290 skcipher_walk_next+0x2fc/0x480 skcipher_walk_first+0x9c/0x110 skcipher_walk_aead_common+0x380/0x440 skcipher_walk_aead_encrypt+0x54/0x70 ccm_encrypt+0x13c/0x4d0 crypto_aead_encrypt+0x7c/0xfc pcrypt_aead_enc+0x28/0x84 padata_parallel_worker+0xd0/0x2dc process_one_work+0x49c/0xbdc worker_thread+0x124/0x880 kthread+0x210/0x260 ret_from_fork+0x10/0x18 This is because the value of rec_seq of tls_crypto_info configured by the user program is too large, for example, 0xffffffffffffff. In addition, TLS is asynchronously accelerated. When tls_do_encryption() returns -EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow, skmsg is released before the asynchronous encryption process ends. As a result, the UAF problem occurs during the asynchronous processing of the encryption module. If the operation is asynchronous and the encryption module returns EINPROGRESS, do not free the record information. Fixes: 635d93981786 ("net/tls: free record only on encryption error") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30net: deal with most data-races in sk_wait_event()Eric Dumazet
[ Upstream commit d0ac89f6f9879fae316c155de77b5173b3e2c9c9 ] __condition is evaluated twice in sk_wait_event() macro. First invocation is lockless, and reads can race with writes, as spotted by syzbot. BUG: KCSAN: data-race in sk_stream_wait_connect / tcp_disconnect write to 0xffff88812d83d6a0 of 4 bytes by task 9065 on cpu 1: tcp_disconnect+0x2cd/0xdb0 inet_shutdown+0x19e/0x1f0 net/ipv4/af_inet.c:911 __sys_shutdown_sock net/socket.c:2343 [inline] __sys_shutdown net/socket.c:2355 [inline] __do_sys_shutdown net/socket.c:2363 [inline] __se_sys_shutdown+0xf8/0x140 net/socket.c:2361 __x64_sys_shutdown+0x31/0x40 net/socket.c:2361 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff88812d83d6a0 of 4 bytes by task 9040 on cpu 0: sk_stream_wait_connect+0x1de/0x3a0 net/core/stream.c:75 tcp_sendmsg_locked+0x2e4/0x2120 net/ipv4/tcp.c:1266 tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1484 inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:651 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] __sys_sendto+0x246/0x300 net/socket.c:2142 __do_sys_sendto net/socket.c:2154 [inline] __se_sys_sendto net/socket.c:2150 [inline] __x64_sys_sendto+0x78/0x90 net/socket.c:2150 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00000000 -> 0x00000068 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05net: tls: fix possible race condition between do_tls_getsockopt_conf() and ↵Hangyu Hua
do_tls_setsockopt_conf() commit 49c47cc21b5b7a3d8deb18fc57b0aa2ab1286962 upstream. ctx->crypto_send.info is not protected by lock_sock in do_tls_getsockopt_conf(). A race condition between do_tls_getsockopt_conf() and error paths of do_tls_setsockopt_conf() may lead to a use-after-free or null-deref. More discussion: https://lore.kernel.org/all/Y/ht6gQL+u6fj3dG@hog/ Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20230228023344.9623-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Meena Shanmugam <meenashanmugam@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11net: tls: avoid hanging tasks on the tx_lockJakub Kicinski
commit f3221361dc85d4de22586ce8441ec2c67b454f5d upstream. syzbot sent a hung task report and Eric explains that adversarial receiver may keep RWIN at 0 for a long time, so we are not guaranteed to make forward progress. Thread which took tx_lock and went to sleep may not release tx_lock for hours. Use interruptible sleep where possible and reschedule the work if it can't take the lock. Testing: existing selftest passes Reported-by: syzbot+9c0268252b8ef967c62e@syzkaller.appspotmail.com Fixes: 79ffe6087e91 ("net/tls: add a TX lock") Link: https://lore.kernel.org/all/000000000000e412e905f5b46201@google.com/ Cc: stable@vger.kernel.org # wait 4 weeks Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230301002857.2101894-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03net/tls: Remove the context from the list in tls_device_downMaxim Mikityanskiy
commit f6336724a4d4220c89a4ec38bca84b03b178b1a3 upstream. tls_device_down takes a reference on all contexts it's going to move to the degraded state (software fallback). If sk_destruct runs afterwards, it can reduce the reference counter back to 1 and return early without destroying the context. Then tls_device_down will release the reference it took and call tls_device_free_ctx. However, the context will still stay in tls_device_down_list forever. The list will contain an item, memory for which is released, making a memory corruption possible. Fix the above bug by properly removing the context from all lists before any call to tls_device_free_ctx. Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-29net/tls: Fix race in TLS device down flowTariq Toukan
[ Upstream commit f08d8c1bb97c48f24a82afaa2fd8c140f8d3da8b ] Socket destruction flow and tls_device_down function sync against each other using tls_device_lock and the context refcount, to guarantee the device resources are freed via tls_dev_del() by the end of tls_device_down. In the following unfortunate flow, this won't happen: - refcount is decreased to zero in tls_device_sk_destruct. - tls_device_down starts, skips the context as refcount is zero, going all the way until it flushes the gc work, and returns without freeing the device resources. - only then, tls_device_queue_ctx_destruction is called, queues the gc work and frees the context's device resources. Solve it by decreasing the refcount in the socket's destruction flow under the tls_device_lock, for perfect synchronization. This does not slow down the common likely destructor flow, in which both the refcount is decreased and the spinlock is acquired, anyway. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21net/tls: Check for errors in tls_device_initTariq Toukan
[ Upstream commit 3d8c51b25a235e283e37750943bbf356ef187230 ] Add missing error checks in tls_device_init. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Reported-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-29Revert "net/tls: fix tls_sk_proto_close executed repeatedly"Jakub Kicinski
[ Upstream commit 1b205d948fbb06a7613d87dcea0ff5fd8a08ed91 ] This reverts commit 69135c572d1f84261a6de2a1268513a7e71753e2. This commit was just papering over the issue, ULP should not get ->update() called with its own sk_prot. Each ULP would need to add this check. Fixes: 69135c572d1f ("net/tls: fix tls_sk_proto_close executed repeatedly") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20220620191353.1184629-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-29net/tls: fix tls_sk_proto_close executed repeatedlyZiyang Xuan
[ Upstream commit 69135c572d1f84261a6de2a1268513a7e71753e2 ] After setting the sock ktls, update ctx->sk_proto to sock->sk_prot by tls_update(), so now ctx->sk_proto->close is tls_sk_proto_close(). When close the sock, tls_sk_proto_close() is called for sock->sk_prot->close is tls_sk_proto_close(). But ctx->sk_proto->close() will be executed later in tls_sk_proto_close(). Thus tls_sk_proto_close() executed repeatedly occurred. That will trigger the following bug. ================================================================= KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306 Call Trace: <TASK> tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x18/0x20 net/socket.c:1365 Updating a proto which is same with sock->sk_prot is incorrect. Add proto and sock->sk_prot equality check at the head of tls_update() to fix it. Fixes: 95fa145479fb ("bpf: sockmap/tls, close can race with map free") Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18tls: Fix context leak on tls_device_downMaxim Mikityanskiy
[ Upstream commit 3740651bf7e200109dd42d5b2fb22226b26f960a ] The commit cited below claims to fix a use-after-free condition after tls_device_down. Apparently, the description wasn't fully accurate. The context stayed alive, but ctx->netdev became NULL, and the offload was torn down without a proper fallback, so a bug was present, but a different kind of bug. Due to misunderstanding of the issue, the original patch dropped the refcount_dec_and_test line for the context to avoid the alleged premature deallocation. That line has to be restored, because it matches the refcount_inc_not_zero from the same function, otherwise the contexts that survived tls_device_down are leaked. This patch fixes the described issue by restoring refcount_dec_and_test. After this change, there is no leak anymore, and the fallback to software kTLS still works. Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tls: Skip tls_append_frag on zero copy sizeMaxim Mikityanskiy
[ Upstream commit a0df71948e9548de819a6f1da68f5f1742258a52 ] Calling tls_append_frag when max_open_record_len == record->len might add an empty fragment to the TLS record if the call happens to be on the page boundary. Normally tls_append_frag coalesces the zero-sized fragment to the previous one, but not if it's on page boundary. If a resync happens then, the mlx5 driver posts dump WQEs in tx_post_resync_dump, and the empty fragment may become a data segment with byte_count == 0, which will confuse the NIC and lead to a CQE error. This commit fixes the described issue by skipping tls_append_frag on zero size to avoid adding empty fragments. The fix is not in the driver, because an empty fragment is hardly the desired behavior. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13net/tls: fix slab-out-of-bounds bug in decrypt_internalZiyang Xuan
[ Upstream commit 9381fe8c849cfbe50245ac01fc077554f6eaa0e2 ] The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in tls_set_sw_offload(). The return value of crypto_aead_ivsize() for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes memory space will trigger slab-out-of-bounds bug as following: ================================================================== BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls] Read of size 16 at addr ffff888114e84e60 by task tls/10911 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db ? decrypt_internal+0x385/0xc40 [tls] kasan_report+0xab/0x120 ? decrypt_internal+0x385/0xc40 [tls] kasan_check_range+0xf9/0x1e0 memcpy+0x20/0x60 decrypt_internal+0x385/0xc40 [tls] ? tls_get_rec+0x2e0/0x2e0 [tls] ? process_rx_list+0x1a5/0x420 [tls] ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls] decrypt_skb_update+0x9d/0x400 [tls] tls_sw_recvmsg+0x3c8/0xb50 [tls] Allocated by task 10911: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 tls_set_sw_offload+0x2eb/0xa20 [tls] tls_setsockopt+0x68c/0x700 [tls] __sys_setsockopt+0xfe/0x1b0 Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size when memcpy() iv value in TLS_1_3_VERSION scenario. Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08net/tls: Fix authentication failure in CCM modeTianjia Zhang
commit 5961060692f8b17cd2080620a3d27b95d2ae05ca upstream. When the TLS cipher suite uses CCM mode, including AES CCM and SM4 CCM, the first byte of the B0 block is flags, and the real IV starts from the second byte. The XOR operation of the IV and rec_seq should be skip this byte, that is, add the iv_offset. Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Cc: Vakul Garg <vakul.garg@nxp.com> Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01tls: fix replacing proto_opsJakub Kicinski
[ Upstream commit f3911f73f51d1534f4db70b516cc1fcb6be05bae ] We replace proto_ops whenever TLS is configured for RX. But our replacement also overrides sendpage_locked, which will crash unless TX is also configured. Similarly we plug both of those in for TLS_HW (NIC crypto offload) even tho TLS_HW has a completely different implementation for TX. Last but not least we always plug in something based on inet_stream_ops even though a few of the callbacks differ for IPv6 (getname, release, bind). Use a callback building method similar to what we do for struct proto. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Fixes: d4ffb02dee2f ("net/tls: enable sk_msg redirect to tls socket egress") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01tls: splice_read: fix record type checkJakub Kicinski
[ Upstream commit 520493f66f6822551aef2879cd40207074fe6980 ] We don't support splicing control records. TLS 1.3 changes moved the record type check into the decrypt if(). The skb may already be decrypted and still be an alert. Note that decrypt_skb_update() is idempotent and updates ctx->decrypted so the if() is pointless. Reorder the check for decryption errors with the content type check while touching them. This part is not really a bug, because if decryption failed in TLS 1.3 content type will be DATA, and for TLS 1.2 it will be correct. Nevertheless its strange to touch output before checking if the function has failed. Fixes: fedf201e1296 ("net: tls: Refactor control message handling on recv") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-02net/tls: Fix flipped sign in async_wait.err assignmentDaniel Jordan
commit 1d9d6fd21ad4a28b16ed9ee5432ae738b9dc58aa upstream. sk->sk_err contains a positive number, yet async_wait.err wants the opposite. Fix the missed sign flip, which Jakub caught by inspection. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-02net/tls: Fix flipped sign in tls_err_abort() callsDaniel Jordan
commit da353fac65fede6b8b4cfe207f0d9408e3121105 upstream. sk->sk_err appears to expect a positive value, a convention that ktls doesn't always follow and that leads to memory corruption in other code. For instance, [kworker] tls_encrypt_done(..., err=<negative error from crypto request>) tls_err_abort(.., err) sk->sk_err = err; [task] splice_from_pipe_feed ... tls_sw_do_sendpage if (sk->sk_err) { ret = -sk->sk_err; // ret is positive splice_from_pipe_feed (continued) ret = actor(...) // ret is still positive and interpreted as bytes // written, resulting in underflow of buf->len and // sd->len, leading to huge buf->offset and bogus // addresses computed in later calls to actor() Fix all tls_err_abort() callers to pass a negative error code consistently and centralize the error-prone sign flip there, throwing in a warning to catch future misuse and uninlining the function so it really does only warn once. Cc: stable@vger.kernel.org Fixes: c46234ebb4d1e ("tls: RX path for ktls") Reported-by: syzbot+b187b77c8474f9648fae@syzkaller.appspotmail.com Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14tls: prevent oversized sendfile() hangs by ignoring MSG_MOREJakub Kicinski
[ Upstream commit d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a ] We got multiple reports that multi_chunk_sendfile test case from tls selftest fails. This was sort of expected, as the original fix was never applied (see it in the first Link:). The test in question uses sendfile() with count larger than the size of the underlying file. This will make splice set MSG_MORE on all sendpage calls, meaning TLS will never close and flush the last partial record. Eric seem to have addressed a similar problem in commit 35f9c09fe9c7 ("tcp: tcp_sendpages() should call tcp_push() once") by introducing MSG_SENDPAGE_NOTLAST. Unlike MSG_MORE MSG_SENDPAGE_NOTLAST is not set on the last call of a "pipefull" of data (PIPE_DEF_BUFFERS == 16, so every 16 pages or whenever we run out of data). Having a break every 16 pages should be fine, TLS can pack exactly 4 pages into a record, so for aligned reads there should be no difference, unaligned may see one extra record per sendpage(). Sticking to TCP semantics seems preferable to modifying splice, but we can revisit it if real life scenarios show a regression. Reported-by: Vadim Fedorenko <vfedorenko@novek.ru> Reported-by: Seth Forshee <seth.forshee@canonical.com> Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10net/tls: Fix use-after-free after the TLS device goes down and upMaxim Mikityanskiy
[ Upstream commit c55dcdd435aa6c6ad6ccac0a4c636d010ee367a4 ] When a netdev with active TLS offload goes down, tls_device_down is called to stop the offload and tear down the TLS context. However, the socket stays alive, and it still points to the TLS context, which is now deallocated. If a netdev goes up, while the connection is still active, and the data flow resumes after a number of TCP retransmissions, it will lead to a use-after-free of the TLS context. This commit addresses this bug by keeping the context alive until its normal destruction, and implements the necessary fallbacks, so that the connection can resume in software (non-offloaded) kTLS mode. On the TX side tls_sw_fallback is used to encrypt all packets. The RX side already has all the necessary fallbacks, because receiving non-decrypted packets is supported. The thing needed on the RX side is to block resync requests, which are normally produced after receiving non-decrypted packets. The necessary synchronization is implemented for a graceful teardown: first the fallbacks are deployed, then the driver resources are released (it used to be possible to have a tls_dev_resync after tls_dev_del). A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback mode. It's used to skip the RX resync logic completely, as it becomes useless, and some objects may be released (for example, resync_async, which is allocated and freed by the driver). Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10net/tls: Replace TLS_RX_SYNC_RUNNING with RCUMaxim Mikityanskiy
[ Upstream commit 05fc8b6cbd4f979a6f25759c4a17dd5f657f7ecd ] RCU synchronization is guaranteed to finish in finite time, unlike a busy loop that polls a flag. This patch is a preparation for the bugfix in the next patch, where the same synchronize_net() call will also be used to sync with the TX datapath. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAITJim Ma
[ Upstream commit 974271e5ed45cfe4daddbeb16224a2156918530e ] In tls_sw_splice_read, checkout MSG_* is inappropriate, should use SPLICE_*, update tls_wait_data to accept nonblock arguments instead of flags for recvmsg and splice. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Jim Ma <majinjing3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23net: fix proc_fs init handling in af_packet and tlsYonatan Linik
[ Upstream commit a268e0f2455c32653140775662b40c2b1f1b2efa ] proc_fs was used, in af_packet, without a surrounding #ifdef, although there is no hard dependency on proc_fs. That caused the initialization of the af_packet module to fail when CONFIG_PROC_FS=n. Specifically, proc_create_net() was used in af_packet.c, and when it fails, packet_net_init() returns -ENOMEM. It will always fail when the kernel is compiled without proc_fs, because, proc_create_net() for example always returns NULL. The calling order that starts in af_packet.c is as follows: packet_init() register_pernet_subsys() register_pernet_operations() __register_pernet_operations() ops_init() ops->init() (packet_net_ops.init=packet_net_init()) proc_create_net() It worked in the past because register_pernet_subsys()'s return value wasn't checked before this Commit 36096f2f4fa0 ("packet: Fix error path in packet_init."). It always returned an error, but was not checked before, so everything was working even when CONFIG_PROC_FS=n. The fix here is simply to add the necessary #ifdef. This also fixes a similar error in tls_proc.c, that was found by Jakub Kicinski. Fixes: d26b698dd3cd ("net/tls: add skeleton of MIB statistics") Fixes: 36096f2f4fa0 ("packet: Fix error path in packet_init") Signed-off-by: Yonatan Linik <yonatanlinik@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-25net/tls: Protect from calling tls_dev_del for TLS RX twiceMaxim Mikityanskiy
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20net/tls: missing received data after fast remote closeVadim Fedorenko
In case when tcp socket received FIN after some data and the parser haven't started before reading data caller will receive an empty buffer. This behavior differs from plain TCP socket and leads to special treating in user-space. The flow that triggers the race is simple. Server sends small amount of data right after the connection is configured to use TLS and closes the connection. In this case receiver sees TLS Handshake data, configures TLS socket right after Change Cipher Spec record. While the configuration is in process, TCP socket receives small Application Data record, Encrypted Alert record and FIN packet. So the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with SK_DONE bit set. The received data is not parsed upon arrival and is never sent to user-space. Patch unpauses parser directly if we have unparsed data in tcp receive queue. Fixes: fcf4793e278e ("tls: check RCV_SHUTDOWN in tls_wait_data") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605801588-12236-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17net/tls: Fix wrong record sn in async mode of device resyncTariq Toukan
In async_resync mode, we log the TCP seq of records until the async request is completed. Later, in case one of the logged seqs matches the resync request, we return it, together with its record serial number. Before this fix, we mistakenly returned the serial number of the current record instead. Fixes: ed9b7646b06a ("net/tls: Add asynchronous resync") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16net/tls: fix corrupted data in recvmsgVadim Fedorenko
If tcp socket has more data than Encrypted Handshake Message then tls_sw_recvmsg will try to decrypt next record instead of returning full control message to userspace as mentioned in comment. The next message - usually Application Data - gets corrupted because it uses zero copy for decryption that's why the data is not stored in skb for next iteration. Revert check to not decrypt next record if current is not Application Data. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605413760-21153-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Minor conflicts in net/mptcp/protocol.h and tools/testing/selftests/net/Makefile. In both cases code was added on both sides in the same place so just keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-13net/tls: use semicolons rather than commas to separate statementsJulia Lawall
Replace commas with semicolons. Commas introduce unnecessary variability in the code structure and are hard to see. What is done is essentially described by the following Coccinelle semantic patch (http://coccinelle.lip6.fr/): // <smpl> @@ expression e1,e2; @@ e1 -, +; e2 ... when any // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/1602412498-32025-6-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09net/tls: sendfile fails with ktls offloadRohit Maheshwari
At first when sendpage gets called, if there is more data, 'more' in tls_push_data() gets set which later sets pending_open_record_frags, but when there is no more data in file left, and last time tls_push_data() gets called, pending_open_record_frags doesn't get reset. And later when 2 bytes of encrypted alert comes as sendmsg, it first checks for pending_open_record_frags, and since this is set, it creates a record with 0 data bytes to encrypt, meaning record length is prepend_size + tag_size only, which causes problem. We should set/reset pending_open_record_frags based on more bit. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net/tls: race causes kernel panicRohit Maheshwari
BUG: kernel NULL pointer dereference, address: 00000000000000b8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S 5.9.0-rc3+ #1 Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018 Workqueue: events tx_work_handler [tls] RIP: 0010:tx_work_handler+0x1b/0x70 [tls] Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b 6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83 e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00 RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122 RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200 RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665 R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700 R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38 FS: 0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: process_one_work+0x1a7/0x370 worker_thread+0x30/0x370 ? process_one_work+0x370/0x370 kthread+0x114/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 tls_sw_release_resources_tx() waits for encrypt_pending, which can have race, so we need similar changes as in commit 0cada33241d9de205522e3858b18e506ca5cce2c here as well. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01net/tls: Implement getsockopt SOL_TLS TLS_RXYutaro Hayakawa
Implement the getsockopt SOL_TLS TLS_RX which is currently missing. The primary usecase is to use it in conjunction with TCP_REPAIR to checkpoint/restore the TLS record layer state. TLS connection state usually exists on the user space library. So basically we can easily extract it from there, but when the TLS connections are delegated to the kTLS, it is not the case. We need to have a way to extract the TLS state from the kernel for both of TX and RX side. The new TLS_RX getsockopt copies the crypto_info to user in the same way as TLS_TX does. We have described use cases in our research work in Netdev 0x14 Transport Workshop [1]. Also, there is an TLS implementation called tlse [2] which supports TLS connection migration. They have support of kTLS and their code shows that they are expecting the future support of this option. [1] https://speakerdeck.com/yutarohayakawa/prism-proxies-without-the-pain [2] https://github.com/eduardsui/tlse Signed-off-by: Yutaro Hayakawa <yhayakawa3720@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-11net/tls: Fix kmap usageIra Weiny
When MSG_OOB is specified to tls_device_sendpage() the mapped page is never unmapped. Hold off mapping the page until after the flags are checked and the page is actually needed. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07net/tls: allow MSG_CMSG_COMPAT in sendmsgRouven Czerwinski
Trying to use ktls on a system with 32-bit userspace and 64-bit kernel results in a EOPNOTSUPP message during sendmsg: setsockopt(3, SOL_TLS, TLS_TX, …, 40) = 0 sendmsg(3, …, msg_flags=0}, 0) = -1 EOPNOTSUPP (Operation not supported) The tls_sw implementation does strict flag checking and does not allow the MSG_CMSG_COMPAT flag, which is set if the message comes in through the compat syscall. This patch adds MSG_CMSG_COMPAT to the flag check to allow the usage of the TLS SW implementation on systems using the compat syscall path. Note that the same check is present in the sendmsg path for the TLS device implementation, however the flag hasn't been added there for lack of testing hardware. Signed-off-by: Rouven Czerwinski <r.czerwinski@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
2020-07-28net: remove sockptr_advanceChristoph Hellwig
sockptr_advance never properly worked. Replace it with _offset variants of copy_from_sockptr and copy_to_sockptr. Fixes: ba423fdaa589 ("net: add a new sockptr_t type") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: pass a sockptr_t into ->setsockoptChristoph Hellwig
Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-16treewide: Remove uninitialized_var() usageKees Cook
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-27net/tls: Add asynchronous resyncBoris Pismenny
This patch adds support for asynchronous resynchronization in tls_device. Async resync follows two distinct stages: 1. The NIC driver indicates that it would like to resync on some TLS record within the received packet (P), but the driver does not know (yet) which of the TLS records within the packet. At this stage, the NIC driver will query the device to find the exact TCP sequence for resync (tcpsn), however, the driver does not wait for the device to provide the response. 2. Eventually, the device responds, and the driver provides the tcpsn within the resync packet to KTLS. Now, KTLS can check the tcpsn against any processed TLS records within packet P, and also against any record that is processed in the future within packet P. The asynchronous resync path simplifies the device driver, as it can save bits on the packet completion (32-bit TCP sequence), and pass this information on an asynchronous command instead. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-27Revert "net/tls: Add force_resync for driver resync"Boris Pismenny
This reverts commit b3ae2459f89773adcbf16fef4b68deaaa3be1929. Revert the force resync API. Not in use. To be replaced by a better async resync API downstream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-10Merge branch 'rwonce/rework' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux Pull READ/WRITE_ONCE rework from Will Deacon: "This the READ_ONCE rework I've been working on for a while, which bumps the minimum GCC version and improves code-gen on arm64 when stack protector is enabled" [ Side note: I'm _really_ tempted to raise the minimum gcc version to 4.9, so that we can just say that we require _Generic() support. That would allow us to more cleanly handle a lot of the cases where we depend on very complex macros with 'sizeof' or __builtin_choose_expr() with __builtin_types_compatible_p() etc. This branch has a workaround for sparse not handling _Generic(), either, but that was already fixed in the sparse development branch, so it's really just gcc-4.9 that we'd require. - Linus ] * 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux: compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse compiler_types.h: Optimize __unqual_scalar_typeof compilation time compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long) compiler-types.h: Include naked type in __pick_integer_type() match READ_ONCE: Fix comment describing 2x32-bit atomicity gcov: Remove old GCC 3.4 support arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros READ_ONCE: Drop pointer qualifiers when reading from scalar types READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE() arm64: csum: Disable KASAN for do_csum() fault_inject: Don't rely on "return value" from WRITE_ONCE() net: tls: Avoid assigning 'const' pointer to non-const pointer netfilter: Avoid assigning 'const' pointer to non-const pointer compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
2020-06-01bpf: Fix running sk_skb program types with ktlsJohn Fastabend
KTLS uses a stream parser to collect TLS messages and send them to the upper layer tls receive handler. This ensures the tls receiver has a full TLS header to parse when it is run. However, when a socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS is enabled we end up with two stream parsers running on the same socket. The result is both try to run on the same socket. First the KTLS stream parser runs and calls read_sock() which will tcp_read_sock which in turn calls tcp_rcv_skb(). This dequeues the skb from the sk_receive_queue. When this is done KTLS code then data_ready() callback which because we stacked KTLS on top of the bpf stream verdict program has been replaced with sk_psock_start_strp(). This will in turn kick the stream parser again and eventually do the same thing KTLS did above calling into tcp_rcv_skb() and dequeuing a skb from the sk_receive_queue. At this point the data stream is broke. Part of the stream was handled by the KTLS side some other bytes may have been handled by the BPF side. Generally this results in either missing data or more likely a "Bad Message" complaint from the kTLS receive handler as the BPF program steals some bytes meant to be in a TLS header and/or the TLS header length is no longer correct. We've already broke the idealized model where we can stack ULPs in any order with generic callbacks on the TX side to handle this. So in this patch we do the same thing but for RX side. We add a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict program is running and add a tls_sw_has_ctx_rx() helper so BPF side can learn there is a TLS ULP on the socket. Then on BPF side we omit calling our stream parser to avoid breaking the data stream for the KTLS receiver. Then on the KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS receiver is done with the packet but before it posts the msg to userspace. This gives us symmetry between the TX and RX halfs and IMO makes it usable again. On the TX side we process packets in this order BPF -> TLS -> TCP and on the receive side in the reverse order TCP -> TLS -> BPF. Discovered while testing OpenSSL 3.0 Alpha2.0 release. Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net/tls: Add force_resync for driver resyncTariq Toukan
This patch adds a field to the tls rx offload context which enables drivers to force a send_resync call. This field can be used by drivers to request a resync at the next possible tls record. It is beneficial for hardware that provides the resync sequence number asynchronously. In such cases, the packet that triggered the resync does not contain the information required for a resync. Instead, the driver requests resync for all the following TLS record until the asynchronous notification with the resync request TCP sequence arrives. A following series for mlx5e ConnectX-6DX TLS RX offload support will use this mechanism. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>