aboutsummaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2022-10-29net: sched: delete duplicate cleanup of backlog and qlenZhengchao Shao
[ Upstream commit c19d893fbf3f2f8fa864ae39652c7fee939edde2 ] qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog _after_ calling qdisc->ops->reset. There is no need to clear them again in the specific reset function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 2a3fc78210b9 ("net: sched: sfb: fix null pointer access issue when sfb_init() fails") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29net: sched: cake: fix null pointer access issue when cake_init() failsZhengchao Shao
[ Upstream commit 51f9a8921ceacd7bf0d3f47fa867a64988ba1dcb ] When the default qdisc is cake, if the qdisc of dev_queue fails to be inited during mqprio_init(), cake_reset() is invoked to clear resources. In this case, the tins is NULL, and it will cause gpf issue. The process is as follows: qdisc_create_dflt() cake_init() q->tins = kvcalloc(...) --->failed, q->tins is NULL ... qdisc_put() ... cake_reset() ... cake_dequeue_one() b = &q->tins[...] --->q->tins is NULL The following is the Call Trace information: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:cake_dequeue_one+0xc9/0x3c0 Call Trace: <TASK> cake_reset+0xb1/0x140 qdisc_reset+0xed/0x6f0 qdisc_destroy+0x82/0x4c0 qdisc_put+0x9e/0xb0 qdisc_create_dflt+0x2c3/0x4a0 mqprio_init+0xa71/0x1760 qdisc_create+0x3eb/0x1000 tc_modify_qdisc+0x408/0x1720 rtnetlink_rcv_msg+0x38e/0xac0 netlink_rcv_skb+0x12d/0x3a0 netlink_unicast+0x4a2/0x740 netlink_sendmsg+0x826/0xcc0 sock_sendmsg+0xc5/0x100 ____sys_sendmsg+0x583/0x690 ___sys_sendmsg+0xe8/0x160 __sys_sendmsg+0xbf/0x160 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f89e5122d04 </TASK> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirementsPablo Neira Ayuso
[ Upstream commit 96df8360dbb435cc69f7c3c8db44bf8b1c24cd7b ] Otherwise EINVAL is bogusly reported to userspace when deleting a set element. NFTA_SET_ELEM_KEY_END does not need to be set in case of: - insertion: if not present, start key is used as end key. - deletion: only start key needs to be specified, end key is ignored. Hence, relax the sanity check. Fixes: 88cccd908d51 ("netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.Guillaume Nault
[ Upstream commit 1fcc064b305a1aadeff0d4bff961094d27660acd ] Currently netfilter's rpfilter and fib modules implicitely initialise ->flowic_uid with 0. This is normally the root UID. However, this isn't the case in user namespaces, where user ID 0 is mapped to a different kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get the root UID of the user namespace, thus keeping the same behaviour whether or not we're running in a user namepspace. Note, this is similar to commit 8bcfd0925ef1 ("ipv4: add missing initialization for flowi4_uid"), which fixed the rp_filter sysctl. Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29netfilter: rpfilter/fib: Populate flowic_l3mdev fieldPhil Sutter
[ Upstream commit acc641ab95b66b813c1ce856c377a2bbe71e7f52 ] Use the introduced field for correct operation with VRF devices instead of conditionally overwriting flowic_oif. This is a partial revert of commit b575b24b8eee3 ("netfilter: Fix rpfilter dropping vrf packets by mistake"), implementing a simpler solution. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Stable-dep-of: 1fcc064b305a ("netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29net: hsr: avoid possible NULL deref in skb_clone()Eric Dumazet
[ Upstream commit d8b57135fd9ffe9a5b445350a686442a531c5339 ] syzbot got a crash [1] in skb_clone(), caused by a bug in hsr_get_untagged_frame(). When/if create_stripped_skb_hsr() returns NULL, we must not attempt to call skb_clone(). While we are at it, replace a WARN_ONCE() by netdev_warn_once(). [1] general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f] CPU: 1 PID: 754 Comm: syz-executor.0 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 RIP: 0010:skb_clone+0x108/0x3c0 net/core/skbuff.c:1641 Code: 93 02 00 00 49 83 7c 24 28 00 0f 85 e9 00 00 00 e8 5d 4a 29 fa 4c 8d 75 7e 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 04 02 4c 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 9e 01 00 00 RSP: 0018:ffffc90003ccf4e0 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: ffffc90003ccf5f8 RCX: ffffc9000c24b000 RDX: 000000000000000f RSI: ffffffff8751cb13 RDI: 0000000000000000 RBP: 0000000000000000 R08: 00000000000000f0 R09: 0000000000000140 R10: fffffbfff181d972 R11: 0000000000000000 R12: ffff888161fc3640 R13: 0000000000000a20 R14: 000000000000007e R15: ffffffff8dc5f620 FS: 00007feb621e4700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007feb621e3ff8 CR3: 00000001643a9000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> hsr_get_untagged_frame+0x4e/0x610 net/hsr/hsr_forward.c:164 hsr_forward_do net/hsr/hsr_forward.c:461 [inline] hsr_forward_skb+0xcca/0x1d50 net/hsr/hsr_forward.c:623 hsr_handle_frame+0x588/0x7c0 net/hsr/hsr_slave.c:69 __netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379 __netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483 __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599 netif_receive_skb_internal net/core/dev.c:5685 [inline] netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744 tun_rx_batched+0x4ab/0x7a0 drivers/net/tun.c:1544 tun_get_user+0x2686/0x3a00 drivers/net/tun.c:1995 tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2025 call_write_iter include/linux/fs.h:2187 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9e9/0xdd0 fs/read_write.c:584 ksys_write+0x127/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: f266a683a480 ("net/hsr: Better frame dispatch") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221017165928.2150130-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failedZhengchao Shao
[ Upstream commit 1ca695207ed2271ecbf8ee6c641970f621c157cc ] If the initialization fails in calling addrconf_init_net(), devconf_all is the pointer that has been released. Then ip6mr_sk_done() is called to release the net, accessing devconf->mc_forwarding directly causes invalid pointer access. The process is as follows: setup_net() ops_init() addrconf_init_net() all = kmemdup(...) ---> alloc "all" ... net->ipv6.devconf_all = all; __addrconf_sysctl_register() ---> failed ... kfree(all); ---> ipv6.devconf_all invalid ... ops_exit_list() ... ip6mr_sk_done() devconf = net->ipv6.devconf_all; //devconf is invalid pointer if (!devconf || !atomic_read(&devconf->mc_forwarding)) The following is the Call Trace information: BUG: KASAN: use-after-free in ip6mr_sk_done+0x112/0x3a0 Read of size 4 at addr ffff888075508e88 by task ip/14554 Call Trace: <TASK> dump_stack_lvl+0x8e/0xd1 print_report+0x155/0x454 kasan_report+0xba/0x1f0 kasan_check_range+0x35/0x1b0 ip6mr_sk_done+0x112/0x3a0 rawv6_close+0x48/0x70 inet_release+0x109/0x230 inet6_release+0x4c/0x70 sock_release+0x87/0x1b0 igmp6_net_exit+0x6b/0x170 ops_exit_list+0xb0/0x170 setup_net+0x7ac/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f7963322547 </TASK> Allocated by task 14554: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0xa1/0xb0 __kmalloc_node_track_caller+0x4a/0xb0 kmemdup+0x28/0x60 addrconf_init_net+0x1be/0x840 ops_init+0xa5/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 14554: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x155/0x1b0 slab_free_freelist_hook+0x11b/0x220 __kmem_cache_free+0xa4/0x360 addrconf_init_net+0x623/0x840 ops_init+0xa5/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 7d9b1b578d67 ("ip6mr: fix use-after-free in ip6mr_sk_done()") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221017080331.16878-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29udp: Update reuse->has_conns under reuseport_lock.Kuniyuki Iwashima
[ Upstream commit 69421bf98482d089e50799f45e48b25ce4a8d154 ] When we call connect() for a UDP socket in a reuseport group, we have to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel could select a unconnected socket wrongly for packets sent to the connected socket. However, the current way to set has_conns is illegal and possible to trigger that problem. reuseport_has_conns() changes has_conns under rcu_read_lock(), which upgrades the RCU reader to the updater. Then, it must do the update under the updater's lock, reuseport_lock, but it doesn't for now. For this reason, there is a race below where we fail to set has_conns resulting in the wrong socket selection. To avoid the race, let's split the reader and updater with proper locking. cpu1 cpu2 +----+ +----+ __ip[46]_datagram_connect() reuseport_grow() . . |- reuseport_has_conns(sk, true) |- more_reuse = __reuseport_alloc(more_socks_size) | . | | |- rcu_read_lock() | |- reuse = rcu_dereference(sk->sk_reuseport_cb) | | | | | /* reuse->has_conns == 0 here */ | | |- more_reuse->has_conns = reuse->has_conns | |- reuse->has_conns = 1 | /* more_reuse->has_conns SHOULD BE 1 HERE */ | | | | | |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, | | | more_reuse) | `- rcu_read_unlock() `- kfree_rcu(reuse, rcu) | |- sk->sk_state = TCP_ESTABLISHED Note the likely(reuse) in reuseport_has_conns_set() is always true, but we put the test there for ease of review. [0] For the record, usually, sk_reuseport_cb is changed under lock_sock(). The only exception is reuseport_grow() & TCP reqsk migration case. 1) shutdown() TCP listener, which is moved into the latter part of reuse->socks[] to migrate reqsk. 2) New listen() overflows reuse->socks[] and call reuseport_grow(). 3) reuse->max_socks overflows u16 with the new listener. 4) reuseport_grow() pops the old shutdown()ed listener from the array and update its sk->sk_reuseport_cb as NULL without lock_sock(). shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(), but, reuseport_has_conns_set() is called only for UDP under lock_sock(), so likely(reuse) never be false in reuseport_has_conns_set(). [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/ Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29skmsg: pass gfp argument to alloc_sk_msg()Eric Dumazet
[ Upstream commit 2d1f274b95c6e4ba6a813b3b8e7a1a38d54a0a08 ] syzbot found that alloc_sk_msg() could be called from a non sleepable context. sk_psock_verdict_recv() uses rcu_read_lock() protection. We need the callers to pass a gfp_t argument to avoid issues. syzbot report was: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3613, name: syz-executor414 preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 INFO: lockdep is turned off. CPU: 0 PID: 3613 Comm: syz-executor414 Not tainted 6.0.0-syzkaller-09589-g55be6084c8e0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106 __might_resched+0x538/0x6a0 kernel/sched/core.c:9877 might_alloc include/linux/sched/mm.h:274 [inline] slab_pre_alloc_hook mm/slab.h:700 [inline] slab_alloc_node mm/slub.c:3162 [inline] slab_alloc mm/slub.c:3256 [inline] kmem_cache_alloc_trace+0x59/0x310 mm/slub.c:3287 kmalloc include/linux/slab.h:600 [inline] kzalloc include/linux/slab.h:733 [inline] alloc_sk_msg net/core/skmsg.c:507 [inline] sk_psock_skb_ingress_self+0x5c/0x330 net/core/skmsg.c:600 sk_psock_verdict_apply+0x395/0x440 net/core/skmsg.c:1014 sk_psock_verdict_recv+0x34d/0x560 net/core/skmsg.c:1201 tcp_read_skb+0x4a1/0x790 net/ipv4/tcp.c:1770 tcp_rcv_established+0x129d/0x1a10 net/ipv4/tcp_input.c:5971 tcp_v4_do_rcv+0x479/0xac0 net/ipv4/tcp_ipv4.c:1681 sk_backlog_rcv include/net/sock.h:1109 [inline] __release_sock+0x1d8/0x4c0 net/core/sock.c:2906 release_sock+0x5d/0x1c0 net/core/sock.c:3462 tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1483 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] __sys_sendto+0x46d/0x5f0 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xda/0xf0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 43312915b5ba ("skmsg: Get rid of unncessary memset()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29net/smc: Fix an error code in smc_lgr_create()Dan Carpenter
[ Upstream commit bdee15e8c58b450ad736a2b62ef8c7a12548b704 ] If smc_wr_alloc_lgr_mem() fails then return an error code. Don't return success. Fixes: 8799e310fb3f ("net/smc: add v2 support to the work request layer") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29net/atm: fix proc_mpc_write incorrect return valueXiaobo Liu
[ Upstream commit d8bde3bf7f82dac5fc68a62c2816793a12cafa2a ] Then the input contains '\0' or '\n', proc_mpc_write has read them, so the return value needs +1. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xiaobo Liu <cppcoffee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29tls: strp: make sure the TCP skbs do not have overlapping dataJakub Kicinski
[ Upstream commit 0d87bbd39d7fd1135ab9eca672d760470f6508e8 ] TLS tries to get away with using the TCP input queue directly. This does not work if there is duplicated data (multiple skbs holding bytes for the same seq number range due to retransmits). Check for this condition and fall back to copy mode, it should be rare. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29tipc: fix an information leak in tipc_topsrv_kern_subscrAlexander Potapenko
[ Upstream commit 777ecaabd614d47c482a5c9031579e66da13989a ] Use a 8-byte write to initialize sub.usr_handle in tipc_topsrv_kern_subscr(), otherwise four bytes remain uninitialized when issuing setsockopt(..., SOL_TIPC, ...). This resulted in an infoleak reported by KMSAN when the packet was received: ===================================================== BUG: KMSAN: kernel-infoleak in copyout+0xbc/0x100 lib/iov_iter.c:169 instrument_copy_to_user ./include/linux/instrumented.h:121 copyout+0xbc/0x100 lib/iov_iter.c:169 _copy_to_iter+0x5c0/0x20a0 lib/iov_iter.c:527 copy_to_iter ./include/linux/uio.h:176 simple_copy_to_iter+0x64/0xa0 net/core/datagram.c:513 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419 skb_copy_datagram_iter+0x58/0x200 net/core/datagram.c:527 skb_copy_datagram_msg ./include/linux/skbuff.h:3903 packet_recvmsg+0x521/0x1e70 net/packet/af_packet.c:3469 ____sys_recvmsg+0x2c4/0x810 net/socket.c:? ___sys_recvmsg+0x217/0x840 net/socket.c:2743 __sys_recvmsg net/socket.c:2773 __do_sys_recvmsg net/socket.c:2783 __se_sys_recvmsg net/socket.c:2780 __x64_sys_recvmsg+0x364/0x540 net/socket.c:2780 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 ... Uninit was stored to memory at: tipc_sub_subscribe+0x42d/0xb50 net/tipc/subscr.c:156 tipc_conn_rcv_sub+0x246/0x620 net/tipc/topsrv.c:375 tipc_topsrv_kern_subscr+0x2e8/0x400 net/tipc/topsrv.c:579 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 tipc_sk_join+0x2a8/0x770 net/tipc/socket.c:3084 tipc_setsockopt+0xae5/0xe40 net/tipc/socket.c:3201 __sys_setsockopt+0x87f/0xdc0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 __se_sys_setsockopt net/socket.c:2260 __x64_sys_setsockopt+0xe0/0x160 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 Local variable sub created at: tipc_topsrv_kern_subscr+0x57/0x400 net/tipc/topsrv.c:562 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 Bytes 84-87 of 88 are uninitialized Memory access of size 88 starts at ffff88801ed57cd0 Data copied to user address 0000000020000400 ... ===================================================== Signed-off-by: Alexander Potapenko <glider@google.com> Fixes: 026321c6d056a5 ("tipc: rename tipc_server to tipc_topsrv") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29tipc: Fix recognition of trial periodMark Tomlinson
[ Upstream commit 28be7ca4fcfd69a2d52aaa331adbf9dbe91f9e6e ] The trial period exists until jiffies is after addr_trial_end. But as jiffies will eventually overflow, just using time_after will eventually give incorrect results. As the node address is set once the trial period ends, this can be used to know that we are not in the trial period. Fixes: e415577f57f4 ("tipc: correct discovery message handling during address trial period") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26net: flag sockets supporting msghdr originated zerocopyPavel Begunkov
commit e993ffe3da4bcddea0536b03be1031bf35cd8d85 upstream. We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr provided ubuf_info. Add a new flag into the struct socket flags fields. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-24Merge tag 'v6.0.2' into v6.0/standard/baseBruce Ashfield
This is the 6.0.2 stable release # gpg: Signature made Sat 15 Oct 2022 02:03:10 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-10-24Merge tag 'v6.0.1' into v6.0/standard/baseBruce Ashfield
This is the 6.0.1 stable release # gpg: Signature made Wed 12 Oct 2022 03:39:10 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-10-21net/ieee802154: don't warn zero-sized raw_sendmsg()Tetsuo Handa
[ Upstream commit b12e924a2f5b960373459c8f8a514f887adf5cac ] syzbot is hitting skb_assert_len() warning at __dev_queue_xmit() [1], for PF_IEEE802154 socket's zero-sized raw_sendmsg() request is hitting __dev_queue_xmit() with skb->len == 0. Since PF_IEEE802154 socket's zero-sized raw_sendmsg() request was able to return 0, don't call __dev_queue_xmit() if packet length is 0. ---------- #include <sys/socket.h> #include <netinet/in.h> int main(int argc, char *argv[]) { struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = htonl(INADDR_LOOPBACK) }; struct iovec iov = { }; struct msghdr hdr = { .msg_name = &addr, .msg_namelen = sizeof(addr), .msg_iov = &iov, .msg_iovlen = 1 }; sendmsg(socket(PF_IEEE802154, SOCK_RAW, 0), &hdr, 0); return 0; } ---------- Note that this might be a sign that commit fd1894224407c484 ("bpf: Don't redirect packets with invalid pkt_len") should be reverted, for skb->len == 0 was acceptable for at least PF_IEEE802154 socket. Link: https://syzkaller.appspot.com/bug?extid=5ea725c25d06fb9114c4 [1] Reported-by: syzbot <syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com> Fixes: fd1894224407c484 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20221005014750.3685555-2-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Revert "net/ieee802154: reject zero-sized raw_sendmsg()"Alexander Aring
[ Upstream commit 2eb2756f6c9e9621e022d78321ce40a62c4520b5 ] This reverts commit 3a4d061c699bd3eedc80dc97a4b2a2e1af83c6f5. There is a v2 which does return zero if zero length is given. Signed-off-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20221005014750.3685555-1-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21net: sched: cls_u32: Avoid memcpy() false-positive warningKees Cook
[ Upstream commit 7cba18332e3635aaae60e4e7d4e52849de50d91b ] To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), use unsafe_memcpy(), as the sizing, bounds-checking, and allocation are all very tightly coupled here. This silences the false-positive reported by syzbot: memcpy: detected field-spanning write (size 80) of single field "&n->sel" at net/sched/cls_u32.c:1043 (size 16) [1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Reported-by: syzbot+a2c4601efc75848ba321@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/000000000000a96c0b05e97f0444@google.com/ Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220927153700.3071688-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: L2CAP: Fix user-after-freeLuiz Augusto von Dentz
[ Upstream commit 35fcbc4243aad7e7d020b7c1dfb14bb888b20a4f ] This uses l2cap_chan_hold_unless_zero() after calling __l2cap_get_chan_blah() to prevent the following trace: Bluetooth: l2cap_core.c:static void l2cap_chan_destroy(struct kref *kref) Bluetooth: chan 0000000023c4974d Bluetooth: parent 00000000ae861c08 ================================================================== BUG: KASAN: use-after-free in __mutex_waiter_is_first kernel/locking/mutex.c:191 [inline] BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:671 [inline] BUG: KASAN: use-after-free in __mutex_lock+0x278/0x400 kernel/locking/mutex.c:729 Read of size 8 at addr ffff888006a49b08 by task kworker/u3:2/389 Link: https://lore.kernel.org/lkml/20220622082716.478486-1-lee.jones@linaro.org Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memoryLiu Jian
[ Upstream commit 3f8ef65af927db247418d4e1db49164d7a158fc5 ] Fixes the below NULL pointer dereference: [...] [ 14.471200] Call Trace: [ 14.471562] <TASK> [ 14.471882] lock_acquire+0x245/0x2e0 [ 14.472416] ? remove_wait_queue+0x12/0x50 [ 14.473014] ? _raw_spin_lock_irqsave+0x17/0x50 [ 14.473681] _raw_spin_lock_irqsave+0x3d/0x50 [ 14.474318] ? remove_wait_queue+0x12/0x50 [ 14.474907] remove_wait_queue+0x12/0x50 [ 14.475480] sk_stream_wait_memory+0x20d/0x340 [ 14.476127] ? do_wait_intr_irq+0x80/0x80 [ 14.476704] do_tcp_sendpages+0x287/0x600 [ 14.477283] tcp_bpf_push+0xab/0x260 [ 14.477817] tcp_bpf_sendmsg_redir+0x297/0x500 [ 14.478461] ? __local_bh_enable_ip+0x77/0xe0 [ 14.479096] tcp_bpf_send_verdict+0x105/0x470 [ 14.479729] tcp_bpf_sendmsg+0x318/0x4f0 [ 14.480311] sock_sendmsg+0x2d/0x40 [ 14.480822] ____sys_sendmsg+0x1b4/0x1c0 [ 14.481390] ? copy_msghdr_from_user+0x62/0x80 [ 14.482048] ___sys_sendmsg+0x78/0xb0 [ 14.482580] ? vmf_insert_pfn_prot+0x91/0x150 [ 14.483215] ? __do_fault+0x2a/0x1a0 [ 14.483738] ? do_fault+0x15e/0x5d0 [ 14.484246] ? __handle_mm_fault+0x56b/0x1040 [ 14.484874] ? lock_is_held_type+0xdf/0x130 [ 14.485474] ? find_held_lock+0x2d/0x90 [ 14.486046] ? __sys_sendmsg+0x41/0x70 [ 14.486587] __sys_sendmsg+0x41/0x70 [ 14.487105] ? intel_pmu_drain_pebs_core+0x350/0x350 [ 14.487822] do_syscall_64+0x34/0x80 [ 14.488345] entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] The test scenario has the following flow: thread1 thread2 ----------- --------------- tcp_bpf_sendmsg tcp_bpf_send_verdict tcp_bpf_sendmsg_redir sock_close tcp_bpf_push_locked __sock_release tcp_bpf_push //inet_release do_tcp_sendpages sock->ops->release sk_stream_wait_memory // tcp_close sk_wait_event sk->sk_prot->close release_sock(__sk); *** lock_sock(sk); __tcp_close sock_orphan(sk) sk->sk_wq = NULL release_sock **** lock_sock(__sk); remove_wait_queue(sk_sleep(sk), &wait); sk_sleep(sk) //NULL pointer dereference &rcu_dereference_raw(sk->sk_wq)->wait While waiting for memory in thread1, the socket is released with its wait queue because thread2 has closed it. This caused by tcp_bpf_send_verdict didn't increase the f_count of psock->sk_redir->sk_socket->file in thread1. We should check if SOCK_DEAD flag is set on wakeup in sk_stream_wait_memory before accessing the wait queue. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20220823133755.314697-2-liujian56@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21can: bcm: check the result of can_send() in bcm_can_tx()Ziyang Xuan
[ Upstream commit 3fd7bfd28cfd68ae80a2fe92ea1615722cc2ee6e ] If can_send() fail, it should not update frames_abs counter in bcm_can_tx(). Add the result check for can_send() in bcm_can_tx(). Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/all/9851878e74d6d37aee2f1ee76d68361a46f89458.1663206163.git.william.xuanziyang@huawei.com Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: hci_event: Make sure ISO events don't affect non-ISO connectionsLuiz Augusto von Dentz
[ Upstream commit ed680f925aea76ac666f34d9923cb40558f4e97b ] ISO events (CIS/BIS) shall only be relevant for connection with link type of ISO_LINK, otherwise the controller is probably buggy or it is the result of fuzzer tools such as syzkaller. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: hci_sysfs: Fix attempting to call device_add multiple timesLuiz Augusto von Dentz
[ Upstream commit 448a496f760664d3e2e79466aa1787e6abc922b5 ] device_add shall not be called multiple times as stated in its documentation: 'Do not call this routine or device_register() more than once for any device structure' Syzkaller reports a bug as follows [1]: ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:33! invalid opcode: 0000 [#1] PREEMPT SMP KASAN [...] Call Trace: <TASK> __list_add include/linux/list.h:69 [inline] list_add_tail include/linux/list.h:102 [inline] kobj_kset_join lib/kobject.c:164 [inline] kobject_add_internal+0x18f/0x8f0 lib/kobject.c:214 kobject_add_varg lib/kobject.c:358 [inline] kobject_add+0x150/0x1c0 lib/kobject.c:410 device_add+0x368/0x1e90 drivers/base/core.c:3452 hci_conn_add_sysfs+0x9b/0x1b0 net/bluetooth/hci_sysfs.c:53 hci_le_cis_estabilished_evt+0x57c/0xae0 net/bluetooth/hci_event.c:6799 hci_le_meta_evt+0x2b8/0x510 net/bluetooth/hci_event.c:7110 hci_event_func net/bluetooth/hci_event.c:7440 [inline] hci_event_packet+0x63d/0xfd0 net/bluetooth/hci_event.c:7495 hci_rx_work+0xae7/0x1230 net/bluetooth/hci_core.c:4007 process_one_work+0x991/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> Link: https://syzkaller.appspot.com/bug?id=da3246e2d33afdb92d66bc166a0934c5b146404a Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Hawkins Jiawei <yin31149@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()Tetsuo Handa
[ Upstream commit 2d2cb3066f2c90cd8ca540b36ba7a55e7f2406e0 ] syzbot is reporting cancel_delayed_work() without INIT_DELAYED_WORK() at l2cap_chan_del() [1], for CONF_NOT_COMPLETE flag (which meant to prevent l2cap_chan_del() from calling cancel_delayed_work()) is cleared by timer which fires before l2cap_chan_del() is called by closing file descriptor created by socket(AF_BLUETOOTH, SOCK_STREAM, BTPROTO_L2CAP). l2cap_bredr_sig_cmd(L2CAP_CONF_REQ) and l2cap_bredr_sig_cmd(L2CAP_CONF_RSP) are calling l2cap_ertm_init(chan), and they call l2cap_chan_ready() (which clears CONF_NOT_COMPLETE flag) only when l2cap_ertm_init(chan) succeeded. l2cap_sock_init() does not call l2cap_ertm_init(chan), and it instead sets CONF_NOT_COMPLETE flag by calling l2cap_chan_set_defaults(). However, when connect() is requested, "command 0x0409 tx timeout" happens after 2 seconds from connect() request, and CONF_NOT_COMPLETE flag is cleared after 4 seconds from connect() request, for l2cap_conn_start() from l2cap_info_timeout() callback scheduled by schedule_delayed_work(&conn->info_timer, L2CAP_INFO_TIMEOUT); in l2cap_connect() is calling l2cap_chan_ready(). Fix this problem by initializing delayed works used by L2CAP_MODE_ERTM mode as soon as l2cap_chan_create() allocates a channel, like I did in commit be8597239379f0f5 ("Bluetooth: initialize skb_queue_head at l2cap_chan_create()"). Link: https://syzkaller.appspot.com/bug?extid=83672956c7aa6af698b3 [1] Reported-by: syzbot <syzbot+83672956c7aa6af698b3@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21xfrm: Update ipcomp_scratches with NULL when freedKhalid Masum
[ Upstream commit 8a04d2fc700f717104bfb95b0f6694e448a4537f ] Currently if ipcomp_alloc_scratches() fails to allocate memory ipcomp_scratches holds obsolete address. So when we try to free the percpu scratches using ipcomp_free_scratches() it tries to vfree non existent vm area. Described below: static void * __percpu *ipcomp_alloc_scratches(void) { ... scratches = alloc_percpu(void *); if (!scratches) return NULL; ipcomp_scratches does not know about this allocation failure. Therefore holding the old obsolete address. ... } So when we free, static void ipcomp_free_scratches(void) { ... scratches = ipcomp_scratches; Assigning obsolete address from ipcomp_scratches if (!scratches) return; for_each_possible_cpu(i) vfree(*per_cpu_ptr(scratches, i)); Trying to free non existent page, causing warning: trying to vfree existent vm area. ... } Fix this breakage by updating ipcomp_scrtches with NULL when scratches is freed Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com Tested-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21net-next: Fix IP_UNICAST_IF option behavior for connected socketsRichard Gobert
[ Upstream commit 0e4d354762cefd3e16b4cff8988ff276e45effc4 ] The IP_UNICAST_IF socket option is used to set the outgoing interface for outbound packets. The IP_UNICAST_IF socket option was added as it was needed by the Wine project, since no other existing option (SO_BINDTODEVICE socket option, IP_PKTINFO socket option or the bind function) provided the needed characteristics needed by the IP_UNICAST_IF socket option. [1] The IP_UNICAST_IF socket option works well for unconnected sockets, that is, the interface specified by the IP_UNICAST_IF socket option is taken into consideration in the route lookup process when a packet is being sent. However, for connected sockets, the outbound interface is chosen when connecting the socket, and in the route lookup process which is done when a packet is being sent, the interface specified by the IP_UNICAST_IF socket option is being ignored. This inconsistent behavior was reported and discussed in an issue opened on systemd's GitHub project [2]. Also, a bug report was submitted in the kernel's bugzilla [3]. To understand the problem in more detail, we can look at what happens for UDP packets over IPv4 (The same analysis was done separately in the referenced systemd issue). When a UDP packet is sent the udp_sendmsg function gets called and the following happens: 1. The oif member of the struct ipcm_cookie ipc (which stores the output interface of the packet) is initialized by the ipcm_init_sk function to inet->sk.sk_bound_dev_if (the device set by the SO_BINDTODEVICE socket option). 2. If the IP_PKTINFO socket option was set, the oif member gets overridden by the call to the ip_cmsg_send function. 3. If no output interface was selected yet, the interface specified by the IP_UNICAST_IF socket option is used. 4. If the socket is connected and no destination address is specified in the send function, the struct ipcm_cookie ipc is not taken into consideration and the cached route, that was calculated in the connect function is being used. Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken into consideration. This patch corrects the behavior of the IP_UNICAST_IF socket option for connect()ed sockets by taking into consideration the IP_UNICAST_IF sockopt when connecting the socket. In order to avoid reconnecting the socket, this option is still ignored when applied on an already connected socket until connect() is called again by the Richard Gobert. Change the __ip4_datagram_connect function, which is called during socket connection, to take into consideration the interface set by the IP_UNICAST_IF socket option, in a similar way to what is done in the udp_sendmsg function. [1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/ [2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018 [3] https://bugzilla.kernel.org/show_bug.cgi?id=210255 Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220829111554.GA1771@debian Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21wifi: mac80211: accept STA changes without link changesJohannes Berg
[ Upstream commit b303835dabe0340f932ebb4e260d2229f79b0684 ] If there's no link ID, then check that there are no changes to the link, and if so accept them, unless a new link is created. While at it, reject creating a new link without an address. This fixes authorizing an MLD (peer) that has no link 0. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21tcp: annotate data-race around tcp_md5sig_pool_populatedEric Dumazet
[ Upstream commit aacd467c0a576e5e44d2de4205855dc0fe43f6fb ] tcp_md5sig_pool_populated can be read while another thread changes its value. The race has no consequence because allocations are protected with tcp_md5sig_mutex. This patch adds READ_ONCE() and WRITE_ONCE() to document the race and silence KCSAN. Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21openvswitch: Fix overreporting of drops in dropwatchMike Pattrick
[ Upstream commit c21ab2afa2c64896a7f0e3cbc6845ec63dcfad2e ] Currently queue_userspace_packet will call kfree_skb for all frames, whether or not an error occurred. This can result in a single dropped frame being reported as multiple drops in dropwatch. This functions caller may also call kfree_skb in case of an error. This patch will consume the skbs instead and allow caller's to use kfree_skb. Signed-off-by: Mike Pattrick <mkp@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109957 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21openvswitch: Fix double reporting of drops in dropwatchMike Pattrick
[ Upstream commit 1100248a5c5ccd57059eb8d02ec077e839a23826 ] Frames sent to userspace can be reported as dropped in ovs_dp_process_packet, however, if they are dropped in the netlink code then netlink_attachskb will report the same frame as dropped. This patch checks for error codes which indicate that the frame has already been freed. Signed-off-by: Mike Pattrick <mkp@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21once: add DO_ONCE_SLOW() for sleepable contextsEric Dumazet
[ Upstream commit 62c07983bef9d3e78e71189441e1a470f0d1e653 ] Christophe Leroy reported a ~80ms latency spike happening at first TCP connect() time. This is because __inet_hash_connect() uses get_random_once() to populate a perturbation table which became quite big after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") get_random_once() uses DO_ONCE(), which block hard irqs for the duration of the operation. This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock for operations where we prefer to stay in process context. Then __inet_hash_connect() can use get_random_slow_once() to populate its perturbation table. Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time") Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21net/ieee802154: reject zero-sized raw_sendmsg()Tetsuo Handa
[ Upstream commit 3a4d061c699bd3eedc80dc97a4b2a2e1af83c6f5 ] syzbot is hitting skb_assert_len() warning at raw_sendmsg() for ieee802154 socket. What commit dc633700f00f726e ("net/af_packet: check len when min_header_len equals to 0") does also applies to ieee802154 socket. Link: https://syzkaller.appspot.com/bug?extid=5ea725c25d06fb9114c4 Reported-by: syzbot <syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com> Fixes: fd1894224407c484 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21af_unix: Fix memory leaks of the whole sk due to OOB skb.Kuniyuki Iwashima
[ Upstream commit 7a62ed61367b8fd01bae1e18e30602c25060d824 ] syzbot reported a sequence of memory leaks, and one of them indicated we failed to free a whole sk: unreferenced object 0xffff8880126e0000 (size 1088): comm "syz-executor419", pid 326, jiffies 4294773607 (age 12.609s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 7d 00 00 00 00 00 00 00 ........}....... 01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<000000006fefe750>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:1970 [<0000000074006db5>] sk_alloc+0x3b/0x800 net/core/sock.c:2029 [<00000000728cd434>] unix_create1+0xaf/0x920 net/unix/af_unix.c:928 [<00000000a279a139>] unix_create+0x113/0x1d0 net/unix/af_unix.c:997 [<0000000068259812>] __sock_create+0x2ab/0x550 net/socket.c:1516 [<00000000da1521e1>] sock_create net/socket.c:1566 [inline] [<00000000da1521e1>] __sys_socketpair+0x1a8/0x550 net/socket.c:1698 [<000000007ab259e1>] __do_sys_socketpair net/socket.c:1751 [inline] [<000000007ab259e1>] __se_sys_socketpair net/socket.c:1748 [inline] [<000000007ab259e1>] __x64_sys_socketpair+0x97/0x100 net/socket.c:1748 [<000000007dedddc1>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<000000007dedddc1>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<000000009456679f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd We can reproduce this issue by creating two AF_UNIX SOCK_STREAM sockets, send()ing an OOB skb to each other, and close()ing them without consuming the OOB skbs. int skpair[2]; socketpair(AF_UNIX, SOCK_STREAM, 0, skpair); send(skpair[0], "x", 1, MSG_OOB); send(skpair[1], "x", 1, MSG_OOB); close(skpair[0]); close(skpair[1]); Currently, we free an OOB skb in unix_sock_destructor() which is called via __sk_free(), but it's too late because the receiver's unix_sk(sk)->oob_skb is accounted against the sender's sk->sk_wmem_alloc and __sk_free() is called only when sk->sk_wmem_alloc is 0. In the repro sequences, we do not consume the OOB skb, so both two sk's sock_put() never reach __sk_free() due to the positive sk->sk_wmem_alloc. Then, no one can consume the OOB skb nor call __sk_free(), and we finally leak the two whole sk. Thus, we must free the unconsumed OOB skb earlier when close()ing the socket. Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21net: rds: don't hold sock lock when cancelling work from ↵Tetsuo Handa
rds_tcp_reset_callbacks() [ Upstream commit a91b750fd6629354460282bbf5146c01b05c4859 ] syzbot is reporting lockdep warning at rds_tcp_reset_callbacks() [1], for commit ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()") added cancel_delayed_work_sync() into a section protected by lock_sock() without realizing that rds_send_xmit() might call lock_sock(). We don't need to protect cancel_delayed_work_sync() using lock_sock(), for even if rds_{send,recv}_worker() re-queued this work while __flush_work() from cancel_delayed_work_sync() was waiting for this work to complete, retried rds_{send,recv}_worker() is no-op due to the absence of RDS_CONN_UP bit. Link: https://syzkaller.appspot.com/bug?extid=78c55c7bc6f66e53dce2 [1] Reported-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com> Co-developed-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+78c55c7bc6f66e53dce2@syzkaller.appspotmail.com> Fixes: ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: hci_sync: Fix not indicating power stateLuiz Augusto von Dentz
[ Upstream commit 6abf0dae8c3c927f54e62c46faf8aba580ba0d04 ] When setting power state using legacy/non-mgmt API (e.g hcitool hci0 up) the likes of mgmt_set_powered_complete won't be called causing clients of the MGMT API to not be notified of the change of the state. Fixes: cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limitedNeal Cardwell
[ Upstream commit f4ce91ce12a7c6ead19b128ffa8cff6e3ded2a14 ] This commit fixes a bug in the tracking of max_packets_out and is_cwnd_limited. This bug can cause the connection to fail to remember that is_cwnd_limited is true, causing the connection to fail to grow cwnd when it should, causing throughput to be lower than it should be. The following event sequence is an example that triggers the bug: (a) The connection is cwnd_limited, but packets_out is not at its peak due to TSO deferral deciding not to send another skb yet. In such cases the connection can advance max_packets_seq and set tp->is_cwnd_limited to true and max_packets_out to a small number. (b) Then later in the round trip the connection is pacing-limited (not cwnd-limited), and packets_out is larger. In such cases the connection would raise max_packets_out to a bigger number but (unexpectedly) flip tp->is_cwnd_limited from true to false. This commit fixes that bug. One straightforward fix would be to separately track (a) the next window after max_packets_out reaches a maximum, and (b) the next window after tp->is_cwnd_limited is set to true. But this would require consuming an extra u32 sequence number. Instead, to save space we track only the most important information. Specifically, we track the strongest available signal of the degree to which the cwnd is fully utilized: (1) If the connection is cwnd-limited then we remember that fact for the current window. (2) If the connection not cwnd-limited then we track the maximum number of outstanding packets in the current window. In particular, note that the new logic cannot trigger the buggy (a)/(b) sequence above because with the new logic a condition where tp->packets_out > tp->max_packets_out can only trigger an update of tp->is_cwnd_limited if tp->is_cwnd_limited is false. This first showed up in a testing of a BBRv2 dev branch, but this buggy behavior highlighted a general issue with the tcp_cwnd_validate() logic that can cause cwnd to fail to increase at the proper rate for any TCP congestion control, including Reno or CUBIC. Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21sctp: handle the error returned from sctp_auth_asoc_init_active_keyXin Long
[ Upstream commit 022152aaebe116a25c39818a07e175a8cd3c1e11 ] When it returns an error from sctp_auth_asoc_init_active_key(), the active_key is actually not updated. The old sh_key will be freeed while it's still used as active key in asoc. Then an use-after-free will be triggered when sending patckets, as found by syzbot: sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112 sctp_set_owner_w net/sctp/socket.c:132 [inline] sctp_sendmsg_to_asoc+0xbd5/0x1a20 net/sctp/socket.c:1863 sctp_sendmsg+0x1053/0x1d50 net/sctp/socket.c:2025 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 This patch is to fix it by not replacing the sh_key when it returns errors from sctp_auth_asoc_init_active_key() in sctp_auth_set_key(). For sctp_auth_set_active_key(), old active_key_id will be set back to asoc->active_key_id when the same thing happens. Fixes: 58acd1009226 ("sctp: update active_key for asoc when old key is being replaced") Reported-by: syzbot+a236dd8e9622ed8954a3@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21vhost/vsock: Use kvmalloc/kvfree for larger packets.Junichi Uekawa
[ Upstream commit 0e3f72931fc47bb81686020cc643cde5d9cd0bb8 ] When copying a large file over sftp over vsock, data size is usually 32kB, and kmalloc seems to fail to try to allocate 32 32kB regions. vhost-5837: page allocation failure: order:4, mode:0x24040c0 Call Trace: [<ffffffffb6a0df64>] dump_stack+0x97/0xdb [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138 [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8 [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19 [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7 [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock] [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost] [<ffffffffb683ddce>] kthread+0xfd/0x105 [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost] [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80 [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 Work around by doing kvmalloc instead. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Junichi Uekawa <uekawa@chromium.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220928064538.667678-1-uekawa@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: Prevent double register of suspendAbhishek Pandit-Subedi
[ Upstream commit 4b8af331bb4d4cc8bb91c284b11b98dd1e265185 ] Suspend notifier should only be registered and unregistered once per hdev. Simplify this by only registering during driver registration and simply exiting early when HCI_USER_CHANNEL is set. Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 359ee4f834f5 (Bluetooth: Unregister suspend with userchannel) Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21netfilter: nft_fib: Fix for rpath check with VRF devicesPhil Sutter
[ Upstream commit 2a8a7c0eaa8747c16aa4a48d573aa920d5c00a5c ] Analogous to commit b575b24b8eee3 ("netfilter: Fix rpfilter dropping vrf packets by mistake") but for nftables fib expression: Add special treatment of VRF devices so that typical reverse path filtering via 'fib saddr . iif oif' expression works as expected. Fixes: f6d0cbcf09c50 ("netfilter: nf_tables: add fib expression") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21xfrm: Reinject transport-mode packets through workqueueLiu Jian
[ Upstream commit 4f4920669d21e1060b7243e5118dc3b71ced1276 ] The following warning is displayed when the tcp6-multi-diffip11 stress test case of the LTP test suite is tested: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ns-tcpserver:48198] CPU: 0 PID: 48198 Comm: ns-tcpserver Kdump: loaded Not tainted 6.0.0-rc6+ #39 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : des3_ede_encrypt+0x27c/0x460 [libdes] lr : 0x3f sp : ffff80000ceaa1b0 x29: ffff80000ceaa1b0 x28: ffff0000df056100 x27: ffff0000e51e5280 x26: ffff80004df75030 x25: ffff0000e51e4600 x24: 000000000000003b x23: 0000000000802080 x22: 000000000000003d x21: 0000000000000038 x20: 0000000080000020 x19: 000000000000000a x18: 0000000000000033 x17: ffff0000e51e4780 x16: ffff80004e2d1448 x15: ffff80004e2d1248 x14: ffff0000e51e4680 x13: ffff80004e2d1348 x12: ffff80004e2d1548 x11: ffff80004e2d1848 x10: ffff80004e2d1648 x9 : ffff80004e2d1748 x8 : ffff80004e2d1948 x7 : 000000000bcaf83d x6 : 000000000000001b x5 : ffff80004e2d1048 x4 : 00000000761bf3bf x3 : 000000007f1dd0a3 x2 : ffff0000e51e4780 x1 : ffff0000e3b9a2f8 x0 : 00000000db44e872 Call trace: des3_ede_encrypt+0x27c/0x460 [libdes] crypto_des3_ede_encrypt+0x1c/0x30 [des_generic] crypto_cbc_encrypt+0x148/0x190 crypto_skcipher_encrypt+0x2c/0x40 crypto_authenc_encrypt+0xc8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_rcv_established+0x3c8/0x720 tcp_v6_do_rcv+0xdc/0x4a0 tcp_v6_rcv+0xc24/0xcb0 ip6_protocol_deliver_rcu+0xf0/0x574 ip6_input_finish+0x48/0x7c ip6_input+0x48/0xc0 ip6_rcv_finish+0x80/0x9c xfrm_trans_reinject+0xb0/0xf4 tasklet_action_common.constprop.0+0xf8/0x134 tasklet_action+0x30/0x3c __do_softirq+0x128/0x368 do_softirq+0xb4/0xc0 __local_bh_enable_ip+0xb0/0xb4 put_cpu_fpsimd_context+0x40/0x70 kernel_neon_end+0x20/0x40 sha1_base_do_update.constprop.0.isra.0+0x11c/0x140 [sha1_ce] sha1_ce_finup+0x94/0x110 [sha1_ce] crypto_shash_finup+0x34/0xc0 hmac_finup+0x48/0xe0 crypto_shash_finup+0x34/0xc0 shash_digest_unaligned+0x74/0x90 crypto_shash_digest+0x4c/0x9c shash_ahash_digest+0xc8/0xf0 shash_async_digest+0x28/0x34 crypto_ahash_digest+0x48/0xcc crypto_authenc_genicv+0x88/0xcc [authenc] crypto_authenc_encrypt+0xd8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_push+0xb4/0x14c tcp_sendmsg_locked+0x71c/0xb64 tcp_sendmsg+0x40/0x6c inet6_sendmsg+0x4c/0x80 sock_sendmsg+0x5c/0x6c __sys_sendto+0x128/0x15c __arm64_sys_sendto+0x30/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x170/0x194 do_el0_svc+0x38/0x4c el0_svc+0x28/0xe0 el0t_64_sync_handler+0xbc/0x13c el0t_64_sync+0x180/0x184 Get softirq info by bcc tool: ./softirqs -NT 10 Tracing soft irq event time... Hit Ctrl-C to end. 15:34:34 SOFTIRQ TOTAL_nsecs block 158990 timer 20030920 sched 46577080 net_rx 676746820 tasklet 9906067650 15:34:45 SOFTIRQ TOTAL_nsecs block 86100 sched 38849790 net_rx 676532470 timer 1163848790 tasklet 9409019620 15:34:55 SOFTIRQ TOTAL_nsecs sched 58078450 net_rx 475156720 timer 533832410 tasklet 9431333300 The tasklet software interrupt takes too much time. Therefore, the xfrm_trans_reinject executor is changed from tasklet to workqueue. Add add spin lock to protect the queue. This reduces the processing flow of the tcp_sendmsg function in this scenario. Fixes: acf568ee859f0 ("xfrm: Reinject transport-mode packets through tasklet") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: hci_core: Fix not handling link timeouts propertlyLuiz Augusto von Dentz
[ Upstream commit 116523c8fac05d1d26f748fee7919a4ec5df67ea ] Change that introduced the use of __check_timeout did not account for link types properly, it always assumes ACL_LINK is used thus causing hdev->acl_last_tx to be used even in case of LE_LINK and then again uses ACL_LINK with hci_link_tx_to. To fix this __check_timeout now takes the link type as parameter and then procedure to use the right last_tx based on the link type and pass it to hci_link_tx_to. Fixes: 1b1d29e51499 ("Bluetooth: Make use of __check_timeout on hci_sched_le") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: David Beinder <david@beinder.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21skmsg: Schedule psock work if the cached skb exists on the psockLiu Jian
[ Upstream commit bec217197b412d74168c6a42fc0f76d0cc9cad00 ] In sk_psock_backlog function, for ingress direction skb, if no new data packet arrives after the skb is cached, the cached skb does not have a chance to be added to the receive queue of psock. As a result, the cached skb cannot be received by the upper-layer application. Fix this by reschedule the psock work to dispose the cached skb in sk_msg_recvmsg function. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220907071311.60534-1-liujian56@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21flow_dissector: Do not count vlan tags inside tunnel payloadQingqing Yang
[ Upstream commit 9f87eb4246994e32a4e4ea88476b20ab3b412840 ] We've met the problem that when there is a vlan tag inside GRE encapsulation, the match of num_of_vlans fails. It is caused by the vlan tag inside GRE payload has been counted into num_of_vlans, which is not expected. One example packet is like this: Ethernet II, Src: Broadcom_68:56:07 (00:10:18:68:56:07) Dst: Broadcom_68:56:08 (00:10:18:68:56:08) 802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100 Internet Protocol Version 4, Src: 192.168.1.4, Dst: 192.168.1.200 Generic Routing Encapsulation (Transparent Ethernet bridging) Ethernet II, Src: Broadcom_68:58:07 (00:10:18:68:58:07) Dst: Broadcom_68:58:08 (00:10:18:68:58:08) 802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 200 ... It should match the (num_of_vlans 1) rule, but it matches the (num_of_vlans 2) rule. The vlan tags inside the GRE or other tunnel encapsulated payload should not be taken into num_of_vlans. The fix is to stop counting the vlan number when the encapsulation bit is set. Fixes: 34951fcf26c5 ("flow_dissector: Add number of vlan tags dissector") Signed-off-by: Qingqing Yang <qingqing.yang@broadcom.com> Reviewed-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Link: https://lore.kernel.org/r/20220919074808.136640-1-qingqing.yang@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21netfilter: conntrack: revisit the gc initial rescheduling biasAntoine Tenart
[ Upstream commit 2aa192757005f130b2dd3547dda6e462e761199f ] The previous commit changed the way the rescheduling delay is computed which has a side effect: the bias is now represented as much as the other entries in the rescheduling delay which makes the logic to kick in only with very large sets, as the initial interval is very large (INT_MAX). Revisit the GC initial bias to allow more frequent GC for smaller sets while still avoiding wakeups when a machine is mostly idle. We're moving from a large initial value to pretending we have 100 entries expiring at the upper bound. This way only a few entries having a small timeout won't impact much the rescheduling delay and non-idle machines will have enough entries to lower the delay when needed. This also improves readability as the initial bias is now linked to what is computed instead of being an arbitrary large value. Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning") Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21netfilter: conntrack: fix the gc rescheduling delayAntoine Tenart
[ Upstream commit 95eabdd207024312876d0ebed90b4c977e050e85 ] Commit 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning") changed the eviction rescheduling to the use average expiry of scanned entries (within 1-60s) by doing: for (...) { expires = clamp(nf_ct_expires(tmp), ...); next_run += expires; next_run /= 2; } The issue is the above will make the average ('next_run' here) more dependent on the last expiration values than the firsts (for sets > 2). Depending on the expiration values used to compute the average, the result can be quite different than what's expected. To fix this we can do the following: for (...) { expires = clamp(nf_ct_expires(tmp), ...); next_run += (expires - next_run) / ++count; } Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning") Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: RFCOMM: Fix possible deadlock on socket shutdown/releaseLuiz Augusto von Dentz
[ Upstream commit 812e92b824c1db16c9519f8624d48a9901a0d38f ] Due to change to switch to use lock_sock inside rfcomm_sk_state_change the socket shutdown/release procedure can cause a deadlock: rfcomm_sock_shutdown(): lock_sock(); __rfcomm_sock_close(): rfcomm_dlc_close(): __rfcomm_dlc_close(): rfcomm_dlc_lock(); rfcomm_sk_state_change(): lock_sock(); To fix this when the call __rfcomm_sock_close is now done without holding the lock_sock since rfcomm_dlc_lock exists to protect the dlc data there is no need to use lock_sock in that code path. Link: https://lore.kernel.org/all/CAD+dNTsbuU4w+Y_P7o+VEN7BYCAbZuwZx2+tH+OTzCdcZF82YA@mail.gmail.com/ Fixes: b7ce436a5d79 ("Bluetooth: switch to lock_sock in RFCOMM") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21Bluetooth: avoid hci_dev_test_and_set_flag() in mgmt_init_hdev()Tetsuo Handa
[ Upstream commit f74ca25d6d6629ffd4fd80a1a73037253b57d06b ] syzbot is again reporting attempt to cancel uninitialized work at mgmt_index_removed() [1], for setting of HCI_MGMT flag from mgmt_init_hdev() from hci_mgmt_cmd() from hci_sock_sendmsg() can race with testing of HCI_MGMT flag from mgmt_index_removed() from hci_sock_bind() due to lack of serialization via hci_dev_lock(). Since mgmt_init_hdev() is called with mgmt_chan_list_lock held, we can safely split hci_dev_test_and_set_flag() into hci_dev_test_flag() and hci_dev_set_flag(). Thus, in order to close this race, set HCI_MGMT flag after INIT_DELAYED_WORK() completed. This is a local fix based on mgmt_chan_list_lock. Lack of serialization via hci_dev_lock() might be causing different race conditions somewhere else. But a global fix based on hci_dev_lock() should deserve a future patch. Link: https://syzkaller.appspot.com/bug?extid=844c7bf1b1aa4119c5de Reported-by: syzbot+844c7bf1b1aa4119c5de@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 3f2893d3c142986a ("Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed()") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>