summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2021-10-06netfilter: ipset: Fix oversized kvmalloc() callsJozsef Kadlecsik
commit 7bbc3d385bd813077acaf0e6fdb2a86a901f5382 upstream. The commit commit 7661809d493b426e979f39ab512e3adf41fbcc69 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 14 09:45:49 2021 -0700 mm: don't allow oversized kvmalloc() calls limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the same limit in ipset. Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06net: udp: annotate data race around udp_sk(sk)->corkflagEric Dumazet
commit a9f5970767d11eadc805d5283f202612c7ba1f59 upstream. up->corkflag field can be read or written without any lock. Annotate accesses to avoid possible syzbot/KCSAN reports. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06af_unix: fix races in sk_peer_pid and sk_peer_cred accessesEric Dumazet
[ Upstream commit 35306eb23814444bd4021f8a1c3047d3cb0c8b2b ] Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred. In order to fix this issue, this patch adds a new spinlock that needs to be used whenever these fields are read or written. Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently reading sk->sk_peer_pid which makes no sense, as this field is only possibly set by AF_UNIX sockets. We will have to clean this in a separate patch. This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback" or implementing what was truly expected. Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootbXin Long
[ Upstream commit f7e745f8e94492a8ac0b0a26e25f2b19d342918f ] We should always check if skb_header_pointer's return is NULL before using it, otherwise it may cause null-ptr-deref, as syzbot reported: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:sctp_rcv_ootb net/sctp/input.c:705 [inline] RIP: 0010:sctp_rcv+0x1d84/0x3220 net/sctp/input.c:196 Call Trace: <IRQ> sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1109 ip6_protocol_deliver_rcu+0x2e9/0x1ca0 net/ipv6/ip6_input.c:422 ip6_input_finish+0x62/0x170 net/ipv6/ip6_input.c:463 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472 dst_input include/net/dst.h:460 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ipv6_rcv+0x28c/0x3c0 net/ipv6/ip6_input.c:297 Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize") Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotapLorenzo Bianconi
[ Upstream commit 13cb6d826e0ac0d144b0d48191ff1a111d32f0c6 ] Limit max values for vht mcs and nss in ieee80211_parse_tx_radiotap routine in order to fix the following warning reported by syzbot: WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_rate_set_vht include/net/mac80211.h:989 [inline] WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244 Modules linked in: CPU: 0 PID: 10717 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ieee80211_rate_set_vht include/net/mac80211.h:989 [inline] RIP: 0010:ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244 RSP: 0018:ffffc9000186f3e8 EFLAGS: 00010216 RAX: 0000000000000618 RBX: ffff88804ef76500 RCX: ffffc900143a5000 RDX: 0000000000040000 RSI: ffffffff888f478e RDI: 0000000000000003 RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000100 R10: ffffffff888f46f9 R11: 0000000000000000 R12: 00000000fffffff8 R13: ffff88804ef7653c R14: 0000000000000001 R15: 0000000000000004 FS: 00007fbf5718f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2de23000 CR3: 000000006a671000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: ieee80211_monitor_select_queue+0xa6/0x250 net/mac80211/iface.c:740 netdev_core_pick_tx+0x169/0x2e0 net/core/dev.c:4089 __dev_queue_xmit+0x6f9/0x3710 net/core/dev.c:4165 __bpf_tx_skb net/core/filter.c:2114 [inline] __bpf_redirect_no_mac net/core/filter.c:2139 [inline] __bpf_redirect+0x5ba/0xd20 net/core/filter.c:2162 ____bpf_clone_redirect net/core/filter.c:2429 [inline] bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2401 bpf_prog_eeb6f53a69e5c6a2+0x59/0x234 bpf_dispatcher_nop_func include/linux/bpf.h:717 [inline] __bpf_prog_run include/linux/filter.h:624 [inline] bpf_prog_run include/linux/filter.h:631 [inline] bpf_test_run+0x381/0xa30 net/bpf/test_run.c:119 bpf_prog_test_run_skb+0xb84/0x1ee0 net/bpf/test_run.c:663 bpf_prog_test_run kernel/bpf/syscall.c:3307 [inline] __sys_bpf+0x2137/0x5df0 kernel/bpf/syscall.c:4605 __do_sys_bpf kernel/bpf/syscall.c:4691 [inline] __se_sys_bpf kernel/bpf/syscall.c:4689 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Reported-by: syzbot+0196ac871673f0c20f68@syzkaller.appspotmail.com Fixes: 646e76bb5daf4 ("mac80211: parse VHT info in injected frames") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c26c3f02dcb38ab63b2f2534cb463d95ee81bb13.1632141760.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mac80211: Fix ieee80211_amsdu_aggregate frag_tail bugChih-Kang Chang
[ Upstream commit fe94bac626d9c1c5bc98ab32707be8a9d7f8adba ] In ieee80211_amsdu_aggregate() set a pointer frag_tail point to the end of skb_shinfo(head)->frag_list, and use it to bind other skb in the end of this function. But when execute ieee80211_amsdu_aggregate() ->ieee80211_amsdu_realloc_pad()->pskb_expand_head(), the address of skb_shinfo(head)->frag_list will be changed. However, the ieee80211_amsdu_aggregate() not update frag_tail after call pskb_expand_head(). That will cause the second skb can't bind to the head skb appropriately.So we update the address of frag_tail to fix it. Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support") Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com> Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20210830073240.12736-1-pkshih@realtek.com [reword comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06ipvs: check that ip_vs_conn_tab_bits is between 8 and 20Andrea Claudi
[ Upstream commit 69e73dbfda14fbfe748d3812da1244cce2928dcb ] ip_vs_conn_tab_bits may be provided by the user through the conn_tab_bits module parameter. If this value is greater than 31, or less than 0, the shift operator used to derive tab_size causes undefined behaviour. Fix this checking ip_vs_conn_tab_bits value to be in the range specified in ipvs Kconfig. If not, simply use default value. Fixes: 6f7edb4881bf ("IPVS: Allow boot time change of hash size") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-06mac80211: fix use-after-free in CCMP/GCMP RXJohannes Berg
commit 94513069eb549737bcfc3d988d6ed4da948a2de8 upstream. When PN checking is done in mac80211, for fragmentation we need to copy the PN to the RX struct so we can later use it to do a comparison, since commit bf30ca922a0c ("mac80211: check defrag PN against current frame"). Unfortunately, in that commit I used the 'hdr' variable without it being necessarily valid, so use-after-free could occur if it was necessary to reallocate (parts of) the frame. Fix this by reloading the variable after the code that results in the reallocations, if any. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=214401. Cc: stable@vger.kernel.org Fixes: bf30ca922a0c ("mac80211: check defrag PN against current frame") Link: https://lore.kernel.org/r/20210927115838.12b9ac6bb233.I1d066acd5408a662c3b6e828122cd314fcb28cdb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06tcp: adjust rto_base in retransmits_timed_out()Eric Dumazet
commit 3256a2d6ab1f71f9a1bd2d7f6f18eb8108c48d17 upstream. The cited commit exposed an old retransmits_timed_out() bug which assumed it could call tcp_model_timeout() with TCP_RTO_MIN as rto_base for all states. But flows in SYN_SENT or SYN_RECV state uses a different RTO base (1 sec instead of 200 ms, unless BPF choses another value) This caused a reduction of SYN retransmits from 6 to 4 with the default /proc/sys/net/ipv4/tcp_syn_retries value. Fixes: a41e8a88b06e ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Marek Majkowski <marek@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06tcp: create a helper to model exponential backoffYuchung Cheng
commit 01a523b071618abbc634d1958229fe3bd2dfa5fa upstream. Create a helper to model TCP exponential backoff for the next patch. This is pure refactor w no behavior change. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06tcp: always set retrans_stamp on recoveryYuchung Cheng
commit 7ae189759cc48cf8b54beebff566e9fd2d4e7d7c upstream. Previously TCP socket's retrans_stamp is not set if the retransmission has failed to send. As a result if a socket is experiencing local issues to retransmit packets, determining when to abort a socket is complicated w/o knowning the starting time of the recovery since retrans_stamp may remain zero. This complication causes sub-optimal behavior that TCP may use the latest, instead of the first, retransmission time to compute the elapsed time of a stalling connection due to local issues. Then TCP may disrecard TCP retries settings and keep retrying until it finally succeed: not a good idea when the local host is already strained. The simple fix is to always timestamp the start of a recovery. It's worth noting that retrans_stamp is also used to compare echo timestamp values to detect spurious recovery. This patch does not break that because retrans_stamp is still later than when the original packet was sent. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06tcp: address problems caused by EDT misshapsEric Dumazet
commit 9efdda4e3abed13f0903b7b6e4d4c2102019440a upstream. When a qdisc setup including pacing FQ is dismantled and recreated, some TCP packets are sent earlier than instructed by TCP stack. TCP can be fooled when ACK comes back, because the following operation can return a negative value. tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr; Some paths in TCP stack were not dealing properly with this, this patch addresses four of them. Fixes: ab408b6dc744 ("tcp: switch tcp and sch_fq to new earliest departure time model") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-06net/smc: add missing error check in smc_clc_prfx_set()Karsten Graul
[ Upstream commit 6c90731980655280ea07ce4b21eb97457bf86286 ] Coverity stumbled over a missing error check in smc_clc_prfx_set(): *** CID 1475954: Error handling issues (CHECKED_RETURN) /net/smc/smc_clc.c: 233 in smc_clc_prfx_set() >>> CID 1475954: Error handling issues (CHECKED_RETURN) >>> Calling "kernel_getsockname" without checking return value (as is done elsewhere 8 out of 10 times). 233 kernel_getsockname(clcsock, (struct sockaddr *)&addrs); Add the return code check in smc_clc_prfx_set(). Fixes: c246d942eabc ("net/smc: restructure netinfo for CLC proposal msgs") Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-269p/trans_virtio: Remove sysfs file on probe failureXie Yongji
commit f997ea3b7afc108eb9761f321b57de2d089c7c48 upstream. This ensures we don't leak the sysfs file if we failed to allocate chan->vc_wq during probe. Link: http://lkml.kernel.org/r/20210517083557.172-1-xieyongji@bytedance.com Fixes: 86c8437383ac ("net/9p: Add sysfs mount_tag file for virtio 9P device") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26sctp: add param size validation for SCTP_PARAM_SET_PRIMARYMarcelo Ricardo Leitner
commit ef6c8d6ccf0c1dccdda092ebe8782777cd7803c9 upstream. When SCTP handles an INIT chunk, it calls for example: sctp_sf_do_5_1B_init sctp_verify_init sctp_verify_param sctp_process_init sctp_process_param handling of SCTP_PARAM_SET_PRIMARY sctp_verify_init() wasn't doing proper size validation and neither the later handling, allowing it to work over the chunk itself, possibly being uninitialized memory. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26sctp: validate chunk size in __rcv_asconf_lookupMarcelo Ricardo Leitner
commit b6ffe7671b24689c09faa5675dd58f93758a97ae upstream. In one of the fallbacks that SCTP has for identifying an association for an incoming packet, it looks for AddIp chunk (from ASCONF) and take a peek. Thing is, at this stage nothing was validating that the chunk actually had enough content for that, allowing the peek to happen over uninitialized memory. Similar check already exists in actual asconf handling in sctp_verify_asconf(). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22ip_gre: validate csum_start only on pullWillem de Bruijn
[ Upstream commit 8a0ed250f911da31a2aef52101bc707846a800ff ] The GRE tunnel device can pull existing outer headers in ipge_xmit. This is a rare path, apparently unique to this device. The below commit ensured that pulling does not move skb->data beyond csum_start. But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and thus csum_start is irrelevant. Refine to exclude this. At the same time simplify and strengthen the test. Simplify, by moving the check next to the offending pull, making it more self documenting and removing an unnecessary branch from other code paths. Strengthen, by also ensuring that the transport header is correct and therefore the inner headers will be after skb_reset_inner_headers. The transport header is set to csum_start in skb_partial_csum_set. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") Reported-by: Ido Schimmel <idosch@idosch.org> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22fq_codel: reject silly quantum parametersEric Dumazet
[ Upstream commit c7c5e6ff533fe1f9afef7d2fa46678987a1335a7 ] syzbot found that forcing a big quantum attribute would crash hosts fast, essentially using this: tc qd replace dev eth0 root fq_codel quantum 4294967295 This is because fq_codel_dequeue() would have to loop ~2^31 times in : if (flow->deficit <= 0) { flow->deficit += q->quantum; list_move_tail(&flow->flowchain, &q->old_flows); goto begin; } SFQ max quantum is 2^19 (half a megabyte) Lets adopt a max quantum of one megabyte for FQ_CODEL. Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22netfilter: socket: icmp6: fix use-after-scopeBenjamin Hesmans
[ Upstream commit 730affed24bffcd1eebd5903171960f5ff9f1f22 ] Bug reported by KASAN: BUG: KASAN: use-after-scope in inet6_ehashfn (net/ipv6/inet6_hashtables.c:40) Call Trace: (...) inet6_ehashfn (net/ipv6/inet6_hashtables.c:40) (...) nf_sk_lookup_slow_v6 (net/ipv6/netfilter/nf_socket_ipv6.c:91 net/ipv6/netfilter/nf_socket_ipv6.c:146) It seems that this bug has already been fixed by Eric Dumazet in the past in: commit 78296c97ca1f ("netfilter: xt_socket: fix a stack corruption bug") But a variant of the same issue has been introduced in commit d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match") `daddr` and `saddr` potentially hold a reference to ipv6_var that is no longer in scope when the call to `nf_socket_get_sock_v6` is made. Fixes: d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match") Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()zhenggy
commit 4f884f3962767877d7aabbc1ec124d2c307a4257 upstream. Commit 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") may directly retrans a multiple segments TSO/GSO packet without split, Since this commit, we can no longer assume that a retransmitted packet is a single segment. This patch fixes the tp->undo_retrans accounting in tcp_sacktag_one() that use the actual segments(pcount) of the retransmitted packet. Before that commit (10d3be569243), the assumption underlying the tp->undo_retrans-- seems correct. Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: zhenggy <zhenggy@chinatelecom.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setupVladimir Oltean
commit 6a52e73368038f47f6618623d75061dc263b26ae upstream. DSA supports connecting to a phy-handle, and has a fallback to a non-OF based method of connecting to an internal PHY on the switch's own MDIO bus, if no phy-handle and no fixed-link nodes were present. The -ENODEV error code from the first attempt (phylink_of_phy_connect) is what triggers the second attempt (phylink_connect_phy). However, when the first attempt returns a different error code than -ENODEV, this results in an unbalance of calls to phylink_create and phylink_destroy by the time we exit the function. The phylink instance has leaked. There are many other error codes that can be returned by phylink_of_phy_connect. For example, phylink_validate returns -EINVAL. So this is a practical issue too. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net/af_unix: fix a data-race in unix_dgram_pollEric Dumazet
commit 04f08eb44b5011493d77b602fdec29ff0f5c6cd5 upstream. syzbot reported another data-race in af_unix [1] Lets change __skb_insert() to use WRITE_ONCE() when changing skb head qlen. Also, change unix_dgram_poll() to use lockless version of unix_recvq_full() It is verry possible we can switch all/most unix_recvq_full() to the lockless version, this will be done in a future kernel version. [1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1 BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0: __skb_insert include/linux/skbuff.h:1938 [inline] __skb_queue_before include/linux/skbuff.h:2043 [inline] __skb_queue_tail include/linux/skbuff.h:2076 [inline] skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264 unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2532 __do_sys_sendmmsg net/socket.c:2561 [inline] __se_sys_sendmmsg net/socket.c:2558 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1: skb_queue_len include/linux/skbuff.h:1869 [inline] unix_recvq_full net/unix/af_unix.c:194 [inline] unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777 sock_poll+0x23e/0x260 net/socket.c:1288 vfs_poll include/linux/poll.h:90 [inline] ep_item_poll fs/eventpoll.c:846 [inline] ep_send_events fs/eventpoll.c:1683 [inline] ep_poll fs/eventpoll.c:1798 [inline] do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226 __do_sys_epoll_wait fs/eventpoll.c:2238 [inline] __se_sys_epoll_wait fs/eventpoll.c:2233 [inline] __x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x0000001b -> 0x00000001 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 86b18aaa2b5b ("skbuff: fix a data race in skb_queue_len()") Cc: Qian Cai <cai@lca.pw> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22tipc: increase timeout in tipc_sk_enqueue()Hoang Le
commit f4bb62e64c88c93060c051195d3bbba804e56945 upstream. In tipc_sk_enqueue() we use hardcoded 2 jiffies to extract socket buffer from generic queue to particular socket. The 2 jiffies is too short in case there are other high priority tasks get CPU cycles for multiple jiffies update. As result, no buffer could be enqueued to particular socket. To solve this, we switch to use constant timeout 20msecs. Then, the function will be expired between 2 jiffies (CONFIG_100HZ) and 20 jiffies (CONFIG_1000HZ). Fixes: c637c1035534 ("tipc: resolve race problem at unicast message reception") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net/l2tp: Fix reference count leak in l2tp_udp_recv_coreXiyu Yang
commit 9b6ff7eb666415e1558f1ba8a742f5db6a9954de upstream. The reference count leak issue may take place in an error handling path. If both conditions of tunnel->version == L2TP_HDR_VER_3 and the return value of l2tp_v3_ensure_opt_in_linear is nonzero, the function would directly jump to label invalid, without decrementing the reference count of the l2tp_session object session increased earlier by l2tp_tunnel_get_session(). This may result in refcount leaks. Fix this issue by decrease the reference count before jumping to the label invalid. Fixes: 4522a70db7aa ("l2tp: fix reading optional fields of L2TPv3") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22dccp: don't duplicate ccid when cloning dccp sockLin, Zhenpeng
commit d9ea761fdd197351890418acd462c51f241014a7 upstream. Commit 2677d2067731 ("dccp: don't free ccid2_hc_tx_sock ...") fixed a UAF but reintroduced CVE-2017-6074. When the sock is cloned, two dccps_hc_tx_ccid will reference to the same ccid. So one can free the ccid object twice from two socks after cloning. This issue was found by "Hadar Manor" as well and assigned with CVE-2020-16119, which was fixed in Ubuntu's kernel. So here I port the patch from Ubuntu to fix it. The patch prevents cloned socks from referencing the same ccid. Fixes: 2677d2067731410 ("dccp: don't free ccid2_hc_tx_sock ...") Signed-off-by: Zhenpeng Lin <zplin@psu.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net-caif: avoid user-triggerable WARN_ON(1)Eric Dumazet
commit 550ac9c1aaaaf51fd42e20d461f0b1cdbd55b3d2 upstream. syszbot triggers this warning, which looks something we can easily prevent. If we initialize priv->list_field in chnl_net_init(), then always use list_del_init(), we can remove robust_list_del() completely. WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 robust_list_del net/caif/chnl_net.c:67 [inline] WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375 Modules linked in: CPU: 0 PID: 3233 Comm: syz-executor.3 Not tainted 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:robust_list_del net/caif/chnl_net.c:67 [inline] RIP: 0010:chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375 Code: 89 eb e8 3a a3 ba f8 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 bf 01 00 00 48 81 fb 00 14 4e 8d 48 8b 2b 75 d0 e8 17 a3 ba f8 <0f> 0b 5b 5d 41 5c 41 5d e9 0a a3 ba f8 4c 89 e3 e8 02 a3 ba f8 4c RSP: 0018:ffffc90009067248 EFLAGS: 00010202 RAX: 0000000000008780 RBX: ffffffff8d4e1400 RCX: ffffc9000fd34000 RDX: 0000000000040000 RSI: ffffffff88bb6e49 RDI: 0000000000000003 RBP: ffff88802cd9ee08 R08: 0000000000000000 R09: ffffffff8d0e6647 R10: ffffffff88bb6dc2 R11: 0000000000000000 R12: ffff88803791ae08 R13: dffffc0000000000 R14: 00000000e600ffce R15: ffff888073ed3480 FS: 00007fed10fa0700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2c322000 CR3: 00000000164a6000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: register_netdevice+0xadf/0x1500 net/core/dev.c:10347 ipcaif_newlink+0x4c/0x260 net/caif/chnl_net.c:468 __rtnl_newlink+0x106d/0x1750 net/core/rtnetlink.c:3458 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 __sys_sendto+0x21c/0x320 net/socket.c:2036 __do_sys_sendto net/socket.c:2048 [inline] __se_sys_sendto net/socket.c:2044 [inline] __x64_sys_sendto+0xdd/0x1b0 net/socket.c:2044 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: cc36a070b590 ("net-caif: add CAIF netdevice") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22tipc: fix an use-after-free issue in tipc_recvmsgXin Long
commit cc19862ffe454a5b632ca202e5a51bfec9f89fd2 upstream. syzbot reported an use-after-free crash: BUG: KASAN: use-after-free in tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979 Call Trace: tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979 sock_recvmsg_nosec net/socket.c:943 [inline] sock_recvmsg net/socket.c:961 [inline] sock_recvmsg+0xca/0x110 net/socket.c:957 tipc_conn_rcv_from_sock+0x162/0x2f0 net/tipc/topsrv.c:398 tipc_conn_recv_work+0xeb/0x190 net/tipc/topsrv.c:421 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422 As Hoang pointed out, it was caused by skb_cb->bytes_read still accessed after calling tsk_advance_rx_queue() to free the skb in tipc_recvmsg(). This patch is to fix it by accessing skb_cb->bytes_read earlier than calling tsk_advance_rx_queue(). Fixes: f4919ff59c28 ("tipc: keep the skb in rcv queue until the whole data is read") Reported-by: syzbot+e6741b97d5552f97c24d@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net: fix NULL pointer reference in cipso_v4_doi_free王贇
[ Upstream commit 733c99ee8be9a1410287cdbb943887365e83b2d6 ] In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc failed, we sometime observe panic: BUG: kernel NULL pointer dereference, address: ... RIP: 0010:cipso_v4_doi_free+0x3a/0x80 ... Call Trace: netlbl_cipsov4_add_std+0xf4/0x8c0 netlbl_cipsov4_add+0x13f/0x1b0 genl_family_rcv_msg_doit.isra.15+0x132/0x170 genl_rcv_msg+0x125/0x240 This is because in cipso_v4_doi_free() there is no check on 'doi_def->map.std' when 'doi_def->type' equal 1, which is possibe, since netlbl_cipsov4_add_std() haven't initialize it before alloc 'doi_def->map.std'. This patch just add the check to prevent panic happen for similar cases. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22rpc: fix gss_svc_init cleanup on failureJ. Bruce Fields
[ Upstream commit 5a4753446253a427c0ff1e433b9c4933e5af207c ] The failure case here should be rare, but it's obviously wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQDLuke Hsiao
[ Upstream commit e3faa49bcecdfcc80e94dd75709d6acb1a5d89f6 ] Since the original TFO server code was implemented in commit 168a8f58059a22feb9e9a2dcc1b8053dbbbc12ef ("tcp: TCP Fast Open Server - main code path") the TFO server code has supported the sysctl bit flag TFO_SERVER_COOKIE_NOT_REQD. Currently, when the TFO_SERVER_ENABLE and TFO_SERVER_COOKIE_NOT_REQD sysctl bit flags are set, a server connection will accept a SYN with N bytes of data (N > 0) that has no TFO cookie, create a new fast open connection, process the incoming data in the SYN, and make the connection ready for accepting. After accepting, the connection is ready for read()/recvmsg() to read the N bytes of data in the SYN, ready for write()/sendmsg() calls and data transmissions to transmit data. This commit changes an edge case in this feature by changing this behavior to apply to (N >= 0) bytes of data in the SYN rather than only (N > 0) bytes of data in the SYN. Now, a server will accept a data-less SYN without a TFO cookie if TFO_SERVER_COOKIE_NOT_REQD is set. Caveat! While this enables a new kind of TFO (data-less empty-cookie SYN), some firewall rules setup may not work if they assume such packets are not legit TFOs and will filter them. Signed-off-by: Luke Hsiao <lukehsiao@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210816205105.2533289-1-luke.w.hsiao@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: Fix handling of LE Enhanced Connection CompleteLuiz Augusto von Dentz
[ Upstream commit cafae4cd625502f65d1798659c1aa9b62d38cc56 ] LE Enhanced Connection Complete contains the Local RPA used in the connection which must be used when set otherwise there could problems when pairing since the address used by the remote stack could be the Local RPA: BLUETOOTH CORE SPECIFICATION Version 5.2 | Vol 4, Part E page 2396 'Resolvable Private Address being used by the local device for this connection. This is only valid when the Own_Address_Type (from the HCI_LE_Create_Connection, HCI_LE_Set_Advertising_Parameters, HCI_LE_Set_Extended_Advertising_Parameters, or HCI_LE_Extended_Create_Connection commands) is set to 0x02 or 0x03, and the Controller generated a resolvable private address for the local device using a non-zero local IRK. For other Own_Address_Type values, the Controller shall return all zeros.' Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: avoid circular locks in sco_sock_connectDesmond Cheong Zhi Xi
[ Upstream commit 734bc5ff783115aa3164f4e9dd5967ae78e0a8ab ] In a future patch, calls to bh_lock_sock in sco.c should be replaced by lock_sock now that none of the functions are run in IRQ context. However, doing so results in a circular locking dependency: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc4-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.2/14867 is trying to acquire lock: ffff88803e3c1120 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1613 [inline] ffff88803e3c1120 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at: sco_conn_del+0x12a/0x2a0 net/bluetooth/sco.c:191 but task is already holding lock: ffffffff8d2dc7c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_disconn_cfm include/net/bluetooth/hci_core.h:1497 [inline] ffffffff8d2dc7c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_conn_hash_flush+0xda/0x260 net/bluetooth/hci_conn.c:1608 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (hci_cb_list_lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:959 [inline] __mutex_lock+0x12a/0x10a0 kernel/locking/mutex.c:1104 hci_connect_cfm include/net/bluetooth/hci_core.h:1482 [inline] hci_remote_features_evt net/bluetooth/hci_event.c:3263 [inline] hci_event_packet+0x2f4d/0x7c50 net/bluetooth/hci_event.c:6240 hci_rx_work+0x4f8/0xd30 net/bluetooth/hci_core.c:5122 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422 kthread+0x3e5/0x4d0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 -> #1 (&hdev->lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:959 [inline] __mutex_lock+0x12a/0x10a0 kernel/locking/mutex.c:1104 sco_connect net/bluetooth/sco.c:245 [inline] sco_sock_connect+0x227/0xa10 net/bluetooth/sco.c:601 __sys_connect_file+0x155/0x1a0 net/socket.c:1879 __sys_connect+0x161/0x190 net/socket.c:1896 __do_sys_connect net/socket.c:1906 [inline] __se_sys_connect net/socket.c:1903 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1903 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3051 [inline] check_prevs_add kernel/locking/lockdep.c:3174 [inline] validate_chain kernel/locking/lockdep.c:3789 [inline] __lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 lock_sock_nested+0xca/0x120 net/core/sock.c:3170 lock_sock include/net/sock.h:1613 [inline] sco_conn_del+0x12a/0x2a0 net/bluetooth/sco.c:191 sco_disconn_cfm+0x71/0xb0 net/bluetooth/sco.c:1202 hci_disconn_cfm include/net/bluetooth/hci_core.h:1500 [inline] hci_conn_hash_flush+0x127/0x260 net/bluetooth/hci_conn.c:1608 hci_dev_do_close+0x528/0x1130 net/bluetooth/hci_core.c:1778 hci_unregister_dev+0x1c0/0x5a0 net/bluetooth/hci_core.c:4015 vhci_release+0x70/0xe0 drivers/bluetooth/hci_vhci.c:340 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xbd4/0x2a60 kernel/exit.c:825 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x47f/0x2160 kernel/signal.c:2808 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302 ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:288 other info that might help us debug this: Chain exists of: sk_lock-AF_BLUETOOTH-BTPROTO_SCO --> &hdev->lock --> hci_cb_list_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hci_cb_list_lock); lock(&hdev->lock); lock(hci_cb_list_lock); lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO); *** DEADLOCK *** The issue is that the lock hierarchy should go from &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO. For example, one such call trace is: hci_dev_do_close(): hci_dev_lock(); hci_conn_hash_flush(): hci_disconn_cfm(): mutex_lock(&hci_cb_list_lock); sco_disconn_cfm(): sco_conn_del(): lock_sock(sk); However, in sco_sock_connect, we call lock_sock before calling hci_dev_lock inside sco_connect, thus inverting the lock hierarchy. We fix this by pulling the call to hci_dev_lock out from sco_connect. Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: schedule SCO timeouts with delayed_workDesmond Cheong Zhi Xi
[ Upstream commit ba316be1b6a00db7126ed9a39f9bee434a508043 ] struct sock.sk_timer should be used as a sock cleanup timer. However, SCO uses it to implement sock timeouts. This causes issues because struct sock.sk_timer's callback is run in an IRQ context, and the timer callback function sco_sock_timeout takes a spin lock on the socket. However, other functions such as sco_conn_del and sco_conn_ready take the spin lock with interrupts enabled. This inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} lock usage could lead to deadlocks as reported by Syzbot [1]: CPU0 ---- lock(slock-AF_BLUETOOTH-BTPROTO_SCO); <Interrupt> lock(slock-AF_BLUETOOTH-BTPROTO_SCO); To fix this, we use delayed work to implement SCO sock timouts instead. This allows us to avoid taking the spin lock on the socket in an IRQ context, and corrects the misuse of struct sock.sk_timer. As a note, cancel_delayed_work is used instead of cancel_delayed_work_sync in sco_sock_set_timer and sco_sock_clear_timer to avoid a deadlock. In the future, the call to bh_lock_sock inside sco_sock_timeout should be changed to lock_sock to synchronize with other functions using lock_sock. However, since sco_sock_set_timer and sco_sock_clear_timer are sometimes called under the locked socket (in sco_connect and __sco_sock_close), cancel_delayed_work_sync might cause them to sleep until an sco_sock_timeout that has started finishes running. But sco_sock_timeout would also sleep until it can grab the lock_sock. Using cancel_delayed_work is fine because sco_sock_timeout does not change from run to run, hence there is no functional difference between: 1. waiting for a timeout to finish running before scheduling another timeout 2. scheduling another timeout while a timeout is running. Link: https://syzkaller.appspot.com/bug?id=9089d89de0502e120f234ca0fc8a703f7368b31e [1] Reported-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com Tested-by: syzbot+2f6d7c28bb4bf7e82060@syzkaller.appspotmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: skip invalid hci_sync_conn_complete_evtDesmond Cheong Zhi Xi
[ Upstream commit 92fe24a7db751b80925214ede43f8d2be792ea7b ] Syzbot reported a corrupted list in kobject_add_internal [1]. This happens when multiple HCI_EV_SYNC_CONN_COMPLETE event packets with status 0 are sent for the same HCI connection. This causes us to register the device more than once which corrupts the kset list. As this is forbidden behavior, we add a check for whether we're trying to process the same HCI_EV_SYNC_CONN_COMPLETE event multiple times for one connection. If that's the case, the event is invalid, so we report an error that the device is misbehaving, and ignore the packet. Link: https://syzkaller.appspot.com/bug?extid=66264bf2fd0476be7e6c [1] Reported-by: syzbot+66264bf2fd0476be7e6c@syzkaller.appspotmail.com Tested-by: syzbot+66264bf2fd0476be7e6c@syzkaller.appspotmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22flow_dissector: Fix out-of-bounds warningsGustavo A. R. Silva
[ Upstream commit 323e0cb473e2a8706ff162b6b4f4fa16023c9ba7 ] Fix the following out-of-bounds warnings: net/core/flow_dissector.c: In function '__skb_flow_dissect': >> net/core/flow_dissector.c:1104:4: warning: 'memcpy' offset [24, 39] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'struct in6_addr' at offset 8 [-Warray-bounds] 1104 | memcpy(&key_addrs->v6addrs, &iph->saddr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1105 | sizeof(key_addrs->v6addrs)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/linux/ipv6.h:5, from net/core/flow_dissector.c:6: include/uapi/linux/ipv6.h:133:18: note: subobject 'saddr' declared here 133 | struct in6_addr saddr; | ^~~~~ >> net/core/flow_dissector.c:1059:4: warning: 'memcpy' offset [16, 19] from the object at '<unknown>' is out of the bounds of referenced subobject 'saddr' with type 'unsigned int' at offset 12 [-Warray-bounds] 1059 | memcpy(&key_addrs->v4addrs, &iph->saddr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1060 | sizeof(key_addrs->v4addrs)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/linux/ip.h:17, from net/core/flow_dissector.c:5: include/uapi/linux/ip.h:103:9: note: subobject 'saddr' declared here 103 | __be32 saddr; | ^~~~~ The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). So, the compiler legitimately complains about it. As these are just a couple of members, fix this by copying each one of them in separate calls to memcpy(). This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/d5ae2e65-1f18-2577-246f-bada7eee6ccd@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()Gustavo A. R. Silva
[ Upstream commit 6321c7acb82872ef6576c520b0e178eaad3a25c0 ] Fix the following out-of-bounds warning: In function 'ip_copy_addrs', inlined from '__ip_queue_xmit' at net/ipv4/ip_output.c:517:2: net/ipv4/ip_output.c:449:2: warning: 'memcpy' offset [40, 43] from the object at 'fl' is out of the bounds of referenced subobject 'saddr' with type 'unsigned int' at offset 36 [-Warray-bounds] 449 | memcpy(&iph->saddr, &fl4->saddr, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 450 | sizeof(fl4->saddr) + sizeof(fl4->daddr)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of &iph->saddr and &fl4->saddr. As these are just a couple of struct members, fix this by using direct assignments, instead of memcpy(). This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/d5ae2e65-1f18-2577-246f-bada7eee6ccd@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22netlink: Deal with ESRCH error in nlmsg_notify()Yajun Deng
[ Upstream commit fef773fc8110d8124c73a5e6610f89e52814637d ] Yonghong Song report: The bpf selftest tc_bpf failed with latest bpf-next. The following is the command to run and the result: $ ./test_progs -n 132 [ 40.947571] bpf_testmod: loading out-of-tree module taints kernel. test_tc_bpf:PASS:test_tc_bpf__open_and_load 0 nsec test_tc_bpf:PASS:bpf_tc_hook_create(BPF_TC_INGRESS) 0 nsec test_tc_bpf:PASS:bpf_tc_hook_create invalid hook.attach_point 0 nsec test_tc_bpf_basic:PASS:bpf_obj_get_info_by_fd 0 nsec test_tc_bpf_basic:PASS:bpf_tc_attach 0 nsec test_tc_bpf_basic:PASS:handle set 0 nsec test_tc_bpf_basic:PASS:priority set 0 nsec test_tc_bpf_basic:PASS:prog_id set 0 nsec test_tc_bpf_basic:PASS:bpf_tc_attach replace mode 0 nsec test_tc_bpf_basic:PASS:bpf_tc_query 0 nsec test_tc_bpf_basic:PASS:handle set 0 nsec test_tc_bpf_basic:PASS:priority set 0 nsec test_tc_bpf_basic:PASS:prog_id set 0 nsec libbpf: Kernel error message: Failed to send filter delete notification test_tc_bpf_basic:FAIL:bpf_tc_detach unexpected error: -3 (errno 3) test_tc_bpf:FAIL:test_tc_internal ingress unexpected error: -3 (errno 3) The failure seems due to the commit cfdf0d9ae75b ("rtnetlink: use nlmsg_notify() in rtnetlink_send()") Deal with ESRCH error in nlmsg_notify() even the report variable is zero. Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Link: https://lore.kernel.org/r/20210719051816.11762-1-yajun.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22tipc: keep the skb in rcv queue until the whole data is readXin Long
[ Upstream commit f4919ff59c2828064b4156e3c3600a169909bcf4 ] Currently, when userspace reads a datagram with a buffer that is smaller than this datagram, the data will be truncated and only part of it can be received by users. It doesn't seem right that users don't know the datagram size and have to use a huge buffer to read it to avoid the truncation. This patch to fix it by keeping the skb in rcv queue until the whole data is read by users. Only the last msg of the datagram will be marked with MSG_EOR, just as TCP/SCTP does. Note that this will work as above only when MSG_EOR is set in the flags parameter of recvmsg(), so that it won't break any old user applications. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-229p/xen: Fix end of loop tests for list_for_each_entryHarshvardhan Jha
commit 732b33d0dbf17e9483f0b50385bf606f724f50a2 upstream. This patch addresses the following problems: - priv can never be NULL, so this part of the check is useless - if the loop ran through the whole list, priv->client is invalid and it is more appropriate and sufficient to check for the end of list_for_each_entry loop condition. Link: http://lkml.kernel.org/r/20210727000709.225032-1-harshvardhan.jha@oracle.com Signed-off-by: Harshvardhan Jha <harshvardhan.jha@oracle.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Tested-by: Stefano Stabellini <sstabellini@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22netns: protect netns ID lookups with RCUGuillaume Nault
commit 2dce224f469f060b9998a5a869151ef83c08ce77 upstream. __peernet2id() can be protected by RCU as it only calls idr_for_each(), which is RCU-safe, and never modifies the nsid table. rtnl_net_dumpid() can also do lockless lookups. It does two nested idr_for_each() calls on nsid tables (one direct call and one indirect call because of rtnl_net_dumpid_one() calling __peernet2id()). The netnsid tables are never updated. Therefore it is safe to not take the nsid_lock and run within an RCU-critical section instead. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
2021-09-22ipv4: fix endianness issue in inet_rtm_getroute_build_skb()Eric Dumazet
[ Upstream commit 92548b0ee220e000d81c27ac9a80e0ede895a881 ] The UDP length field should be in network order. This removes the following sparse error: net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types) net/ipv4/route.c:3173:27: expected restricted __be16 [usertype] len net/ipv4/route.c:3173:27: got unsigned long Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failedXiyu Yang
[ Upstream commit c66070125837900163b81a03063ddd657a7e9bfb ] The reference counting issue happens in one exception handling path of cbq_change_class(). When failing to get tcf_block, the function forgets to decrease the refcount of "rtab" increased by qdisc_put_rtab(), causing a refcount leak. Fix this issue by jumping to "failure" label when get tcf_block failed. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22ipv4: make exception cache less predictibleEric Dumazet
[ Upstream commit 67d6d681e15b578c1725bad8ad079e05d1c48a8e ] Even after commit 6457378fe796 ("ipv4: use siphash instead of Jenkins in fnhe_hashfun()"), an attacker can still use brute force to learn some secrets from a victim linux host. One way to defeat these attacks is to make the max depth of the hash table bucket a random value. Before this patch, each bucket of the hash table used to store exceptions could contain 6 items under attack. After the patch, each bucket would contains a random number of items, between 6 and 10. The attacker can no longer infer secrets. This is slightly increasing memory size used by the hash table, by 50% in average, we do not expect this to be a problem. This patch is more complex than the prior one (IPv6 equivalent), because IPv4 was reusing the oldest entry. Since we need to be able to evict more than one entry per update_or_create_fnhe() call, I had to replace fnhe_oldest() with fnhe_remove_oldest(). Also note that we will queue extra kfree_rcu() calls under stress, which hopefully wont be a too big issue. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: add timeout sanity check to hci_inquiryPavel Skripkin
[ Upstream commit f41a4b2b5eb7872109723dab8ae1603bdd9d9ec1 ] Syzbot hit "task hung" bug in hci_req_sync(). The problem was in unreasonable huge inquiry timeout passed from userspace. Fix it by adding sanity check for timeout value to hci_inquiry(). Since hci_inquiry() is the only user of hci_req_sync() with user controlled timeout value, it makes sense to check timeout value in hci_inquiry() and don't touch hci_req_sync(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+be2baed593ea56c6a84c@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22mac80211: Fix insufficient headroom issue for AMSDUChih-Kang Chang
[ Upstream commit f50d2ff8f016b79a2ff4acd5943a1eda40c545d4 ] ieee80211_amsdu_realloc_pad() fails to account for extra_tx_headroom, the original reserved headroom might be eaten. Add the necessary extra_tx_headroom. Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support") Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20210816085128.10931-2-pkshih@realtek.com [fix indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: Move shutdown callback before flushing tx and rx queueKai-Heng Feng
[ Upstream commit 0ea53674d07fb6db2dd7a7ec2fdc85a12eb246c2 ] Commit 0ea9fd001a14 ("Bluetooth: Shutdown controller after workqueues are flushed or cancelled") introduced a regression that makes mtkbtsdio driver stops working: [ 36.593956] Bluetooth: hci0: Firmware already downloaded [ 46.814613] Bluetooth: hci0: Execution of wmt command timed out [ 46.814619] Bluetooth: hci0: Failed to send wmt func ctrl (-110) The shutdown callback depends on the result of hdev->rx_work, so we should call it before flushing rx_work: -> btmtksdio_shutdown() -> mtk_hci_wmt_sync() -> __hci_cmd_send() -> wait for BTMTKSDIO_TX_WAIT_VND_EVT gets cleared -> btmtksdio_recv_event() -> hci_recv_frame() -> queue_work(hdev->workqueue, &hdev->rx_work) -> clears BTMTKSDIO_TX_WAIT_VND_EVT So move the shutdown callback before flushing TX/RX queue to resolve the issue. Reported-and-tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Guenter Roeck <linux@roeck-us.net> Fixes: 0ea9fd001a14 ("Bluetooth: Shutdown controller after workqueues are flushed or cancelled") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: fix repeated calls to sco_sock_killDesmond Cheong Zhi Xi
[ Upstream commit e1dee2c1de2b4dd00eb44004a4bda6326ed07b59 ] In commit 4e1a720d0312 ("Bluetooth: avoid killing an already killed socket"), a check was added to sco_sock_kill to skip killing a socket if the SOCK_DEAD flag was set. This was done after a trace for a use-after-free bug showed that the same sock pointer was being killed twice. Unfortunately, this check prevents sco_sock_kill from running on any socket. sco_sock_kill kills a socket only if it's zapped and orphaned, however sock_orphan announces that the socket is dead before detaching it. i.e., orphaned sockets have the SOCK_DEAD flag set. To fix this, we remove the check for SOCK_DEAD, and avoid repeated calls to sco_sock_kill by removing incorrect calls in: 1. sco_sock_timeout. The socket should not be killed on timeout as further processing is expected to be done. For example, sco_sock_connect sets the timer then waits for the socket to be connected or for an error to be returned. 2. sco_conn_del. This function should clean up resources for the connection, but the socket itself should be cleaned up in sco_sock_release. 3. sco_sock_close. Calls to sco_sock_close in sco_sock_cleanup_listen and sco_sock_release are followed by sco_sock_kill. Hence the duplicated call should be removed. Fixes: 4e1a720d0312 ("Bluetooth: avoid killing an already killed socket") Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflowColin Ian King
[ Upstream commit 713baf3dae8f45dc8ada4ed2f5fdcbf94a5c274d ] An earlier commit replaced using batostr to using %pMR sprintf for the construction of session->name. Static analysis detected that this new method can use a total of 21 characters (including the trailing '\0') so we need to increase the BTNAMSIZ from 18 to 21 to fix potential buffer overflows. Addresses-Coverity: ("Out-of-bounds write") Fixes: fcb73338ed53 ("Bluetooth: Use %pMR in sprintf/seq_printf instead of batostr") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22net: cipso: fix warnings in netlbl_cipsov4_add_stdPavel Skripkin
[ Upstream commit 8ca34a13f7f9b3fa2c464160ffe8cc1a72088204 ] Syzbot reported warning in netlbl_cipsov4_add(). The problem was in too big doi_def->map.std->lvl.local_size passed to kcalloc(). Since this value comes from userpace there is no need to warn if value is not correct. The same problem may occur with other kcalloc() calls in this function, so, I've added __GFP_NOWARN flag to all kcalloc() calls there. Reported-and-tested-by: syzbot+cdd51ee2e6b0b2e18c0d@syzkaller.appspotmail.com Fixes: 96cb8e3313c7 ("[NetLabel]: CIPSOv4 and Unlabeled packet integration") Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22tcp: seq_file: Avoid skipping sk during tcp_seek_last_posMartin KaFai Lau
[ Upstream commit 525e2f9fd0229eb10cb460a9e6d978257f24804e ] st->bucket stores the current bucket number. st->offset stores the offset within this bucket that is the sk to be seq_show(). Thus, st->offset only makes sense within the same st->bucket. These two variables are an optimization for the common no-lseek case. When resuming the seq_file iteration (i.e. seq_start()), tcp_seek_last_pos() tries to continue from the st->offset at bucket st->bucket. However, it is possible that the bucket pointed by st->bucket has changed and st->offset may end up skipping the whole st->bucket without finding a sk. In this case, tcp_seek_last_pos() currently continues to satisfy the offset condition in the next (and incorrect) bucket. Instead, regardless of the offset value, the first sk of the next bucket should be returned. Thus, "bucket == st->bucket" check is added to tcp_seek_last_pos(). The chance of hitting this is small and the issue is a decade old, so targeting for the next tree. Fixes: a8b690f98baf ("tcp: Fix slowness in read /proc/net/tcp") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200541.1033917-1-kafai@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org>