summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2020-12-30lwt: Disable BH too in run_lwt_bpf()Dongdong Wang
[ Upstream commit d9054a1ff585ba01029584ab730efc794603d68f ] The per-cpu bpf_redirect_info is shared among all skb_do_redirect() and BPF redirect helpers. Callers on RX path are all in BH context, disabling preemption is not sufficient to prevent BH interruption. In production, we observed strange packet drops because of the race condition between LWT xmit and TC ingress, and we verified this issue is fixed after we disable BH. Although this bug was technically introduced from the beginning, that is commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure"), at that time call_rcu() had to be call_rcu_bh() to match the RCU context. So this patch may not work well before RCU flavor consolidation has been completed around v5.0. Update the comments above the code too, as call_rcu() is now BH friendly. Signed-off-by: Dongdong Wang <wangdongdong.6@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/bpf/20201205075946.497763-1-xiyou.wangcong@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30cfg80211: initialize rekey_dataSara Sharon
[ Upstream commit f495acd8851d7b345e5f0e521b2645b1e1f928a0 ] In case we have old supplicant, the akm field is uninitialized. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20201129172929.930f0ab7ebee.Ic546e384efab3f4a89f318eafddc3eb7d556aecb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30mac80211: don't set set TDLS STA bandwidth wider than possibleJohannes Berg
[ Upstream commit f65607cdbc6b0da356ef5a22552ddd9313cf87a0 ] When we set up a TDLS station, we set sta->sta.bandwidth solely based on the capabilities, because the "what's the current bandwidth" check is bypassed and only applied for other types of stations. This leads to the unfortunate scenario that the sta->sta.bandwidth is 160 MHz if both stations support it, but we never actually configure this bandwidth unless the AP is already using 160 MHz; even for wider bandwidth support we only go up to 80 MHz (at least right now.) For iwlwifi, this can also lead to firmware asserts, telling us that we've configured the TX rates for a higher bandwidth than is actually available due to the PHY configuration. For non-TDLS, we check against the interface's requested bandwidth, but we explicitly skip this check for TDLS to cope with the wider BW case. Change this to (a) still limit to the TDLS peer's own chandef, which gets factored into the overall PHY configuration we request from the driver, and (b) limit it to when the TDLS peer is authorized, because it's only factored into the channel context in this case. Fixes: 504871e602d9 ("mac80211: fix bandwidth computation for TDLS peers") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20201206145305.fcc7d29c4590.I11f77e9e25ddf871a3c8d5604650c763e2c5887a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30SUNRPC: xprt_load_transport() needs to support the netid "rdma6"Trond Myklebust
[ Upstream commit d5aa6b22e2258f05317313ecc02efbb988ed6d38 ] According to RFC5666, the correct netid for an IPv6 addressed RDMA transport is "rdma6", which we've supported as a mount option since Linux-4.7. The problem is when we try to load the module "xprtrdma6", that will fail, since there is no modulealias of that name. Fixes: 181342c5ebe8 ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30Bluetooth: Fix null pointer dereference in hci_event_packet()Anmol Karn
[ Upstream commit 6dfccd13db2ff2b709ef60a50163925d477549aa ] AMP_MGR is getting derefernced in hci_phy_link_complete_evt(), when called from hci_event_packet() and there is a possibility, that hcon->amp_mgr may not be found when accessing after initialization of hcon. - net/bluetooth/hci_event.c:4945 The bug seems to get triggered in this line: bredr_hcon = hcon->amp_mgr->l2cap_conn->hcon; Fix it by adding a NULL check for the hcon->amp_mgr before checking the ev-status. Fixes: d5e911928bd8 ("Bluetooth: AMP: Process Physical Link Complete evt") Reported-and-tested-by: syzbot+0bef568258653cff272f@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=0bef568258653cff272f Signed-off-by: Anmol Karn <anmol.karan123@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30Bluetooth: Fix slab-out-of-bounds read in hci_le_direct_adv_report_evt()Peilin Ye
commit f7e0e8b2f1b0a09b527885babda3e912ba820798 upstream. `num_reports` is not being properly checked. A malformed event packet with a large `num_reports` number makes hci_le_direct_adv_report_evt() read out of bounds. Fix it. Cc: stable@vger.kernel.org Fixes: 2f010b55884e ("Bluetooth: Add support for handling LE Direct Advertising Report events") Reported-and-tested-by: syzbot+24ebd650e20bd263ca01@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=24ebd650e20bd263ca01 Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30netfilter: x_tables: Switch synchronization to RCUSubash Abhinov Kasiviswanathan
[ Upstream commit cc00bcaa589914096edef7fb87ca5cee4a166b5c ] When running concurrent iptables rules replacement with data, the per CPU sequence count is checked after the assignment of the new information. The sequence count is used to synchronize with the packet path without the use of any explicit locking. If there are any packets in the packet path using the table information, the sequence count is incremented to an odd value and is incremented to an even after the packet process completion. The new table value assignment is followed by a write memory barrier so every CPU should see the latest value. If the packet path has started with the old table information, the sequence counter will be odd and the iptables replacement will wait till the sequence count is even prior to freeing the old table info. However, this assumes that the new table information assignment and the memory barrier is actually executed prior to the counter check in the replacement thread. If CPU decides to execute the assignment later as there is no user of the table information prior to the sequence check, the packet path in another CPU may use the old table information. The replacement thread would then free the table information under it leading to a use after free in the packet processing context- Unable to handle kernel NULL pointer dereference at virtual address 000000000000008e pc : ip6t_do_table+0x5d0/0x89c lr : ip6t_do_table+0x5b8/0x89c ip6t_do_table+0x5d0/0x89c ip6table_filter_hook+0x24/0x30 nf_hook_slow+0x84/0x120 ip6_input+0x74/0xe0 ip6_rcv_finish+0x7c/0x128 ipv6_rcv+0xac/0xe4 __netif_receive_skb+0x84/0x17c process_backlog+0x15c/0x1b8 napi_poll+0x88/0x284 net_rx_action+0xbc/0x23c __do_softirq+0x20c/0x48c This could be fixed by forcing instruction order after the new table information assignment or by switching to RCU for the synchronization. Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore") Reported-by: Sean Tranchetti <stranche@codeaurora.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30xsk: Fix xsk_poll()'s return typeLuc Van Oostenryck
[ Upstream commit 5d946c5abbaf68083fa6a41824dd79e1f06286d8 ] xsk_poll() is defined as returning 'unsigned int' but the .poll method is declared as returning '__poll_t', a bitwise type. Fix this by using the proper return type and using the EPOLL constants instead of the POLL ones, as required for __poll_t. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20191120001042.30830-1-luc.vanoostenryck@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30net: bridge: vlan: fix error return code in __vlan_add()Zhang Changzhong
[ Upstream commit ee4f52a8de2c6f78b01f10b4c330867d88c1653a ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: f8ed289fab84 ("bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30tcp: fix cwnd-limited bug for TSO deferral where we send nothingNeal Cardwell
[ Upstream commit 299bcb55ecd1412f6df606e9dc0912d55610029e ] When cwnd is not a multiple of the TSO skb size of N*MSS, we can get into persistent scenarios where we have the following sequence: (1) ACK for full-sized skb of N*MSS arrives -> tcp_write_xmit() transmit full-sized skb with N*MSS -> move pacing release time forward -> exit tcp_write_xmit() because pacing time is in the future (2) TSQ callback or TCP internal pacing timer fires -> try to transmit next skb, but TSO deferral finds remainder of available cwnd is not big enough to trigger an immediate send now, so we defer sending until the next ACK. (3) repeat... So we can get into a case where we never mark ourselves as cwnd-limited for many seconds at a time, even with bulk/infinite-backlog senders, because: o In case (1) above, every time in tcp_write_xmit() we have enough cwnd to send a full-sized skb, we are not fully using the cwnd (because cwnd is not a multiple of the TSO skb size). So every time we send data, we are not cwnd limited, and so in the cwnd-limited tracking code in tcp_cwnd_validate() we mark ourselves as not cwnd-limited. o In case (2) above, every time in tcp_write_xmit() that we try to transmit the "remainder" of the cwnd but defer, we set the local variable is_cwnd_limited to true, but we do not send any packets, so sent_pkts is zero, so we don't call the cwnd-limited logic to update tp->is_cwnd_limited. Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler") Reported-by: Ingemar Johansson <ingemar.s.johansson@ericsson.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30tcp: select sane initial rcvq_space.space for big MSSEric Dumazet
[ Upstream commit 72d05c00d7ecda85df29abd046da7e41cc071c17 ] Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS. This is no longer the case, and Hazem Mohamed Abuelfotoh reported that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames. Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit of rcvq_space.space computation, while it can select later a smaller value for tp->rcv_ssthresh and tp->window_clamp. ss -temoi on receiver would show : skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596 This means that TCP can not increase its window in tcp_grow_window(), and that DRS can never kick. Fix this by making sure that rcvq_space.space is not bigger than number of bytes that can be held in TCP receive queue. People unable/unwilling to change their kernel can work around this issue by selecting a bigger tcp_rmem[1] value as in : echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem Based on an initial report and patch from Hazem Mohamed Abuelfotoh https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/ Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner") Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30mac80211: mesh: fix mesh_pathtbl_init() error pathEric Dumazet
[ Upstream commit 905b2032fa424f253d9126271439cc1db2b01130 ] If tbl_mpp can not be allocated, we call mesh_table_free(tbl_path) while tbl_path rhashtable has not yet been initialized, which causes panics. Simply factorize the rhashtable_init() call into mesh_table_alloc() WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __flush_work kernel/workqueue.c:3040 [inline] WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136 Modules linked in: CPU: 1 PID: 8474 Comm: syz-executor663 Not tainted 5.10.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__flush_work kernel/workqueue.c:3040 [inline] RIP: 0010:__cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136 Code: 5d c3 e8 bf ae 29 00 0f 0b e9 f0 fd ff ff e8 b3 ae 29 00 0f 0b 43 80 3c 3e 00 0f 85 31 ff ff ff e9 34 ff ff ff e8 9c ae 29 00 <0f> 0b e9 dc fe ff ff 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7d fd ff RSP: 0018:ffffc9000165f5a0 EFLAGS: 00010293 RAX: ffffffff814b7064 RBX: 0000000000000001 RCX: ffff888021c80000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888024039ca0 R08: dffffc0000000000 R09: fffffbfff1dd3e64 R10: fffffbfff1dd3e64 R11: 0000000000000000 R12: 1ffff920002cbebd R13: ffff888024039c88 R14: 1ffff11004807391 R15: dffffc0000000000 FS: 0000000001347880(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 000000002cc0a000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rhashtable_free_and_destroy+0x25/0x9c0 lib/rhashtable.c:1137 mesh_table_free net/mac80211/mesh_pathtbl.c:69 [inline] mesh_pathtbl_init+0x287/0x2e0 net/mac80211/mesh_pathtbl.c:785 ieee80211_mesh_init_sdata+0x2ee/0x530 net/mac80211/mesh.c:1591 ieee80211_setup_sdata+0x733/0xc40 net/mac80211/iface.c:1569 ieee80211_if_add+0xd5c/0x1cd0 net/mac80211/iface.c:1987 ieee80211_add_iface+0x59/0x130 net/mac80211/cfg.c:125 rdev_add_virtual_intf net/wireless/rdev-ops.h:45 [inline] nl80211_new_interface+0x563/0xb40 net/wireless/nl80211.c:3855 genl_family_rcv_msg_doit net/netlink/genetlink.c:739 [inline] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0xe4e/0x1280 net/netlink/genetlink.c:800 netlink_rcv_skb+0x190/0x3a0 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x780/0x930 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x9a8/0xd40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0x519/0x800 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x2b1/0x360 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 60854fd94573 ("mac80211: mesh: convert path table to rhashtable") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20201204162428.2583119-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-11netfilter: nf_tables: avoid false-postive lockdep splatFlorian Westphal
commit c0700dfa2cae44c033ed97dade8a2679c7d22a9d upstream. There are reports wrt lockdep splat in nftables, e.g.: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 31416 at net/netfilter/nf_tables_api.c:622 lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables] ... These are caused by an earlier, unrelated bug such as a n ABBA deadlock in a different subsystem. In such an event, lockdep is disabled and lockdep_is_held returns true unconditionally. This then causes the WARN() in nf_tables. Make the WARN conditional on lockdep still active to avoid this. Fixes: f102d66b335a417 ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/linux-kselftest/CA+G9fYvFUpODs+NkSYcnwKnXm62tmP=ksLeBPmB+KFrB2rvCtQ@mail.gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08net/x25: prevent a couple of overflowsDan Carpenter
[ Upstream commit 6ee50c8e262a0f0693dad264c3c99e30e6442a56 ] The .x25_addr[] address comes from the user and is not necessarily NUL terminated. This leads to a couple problems. The first problem is that the strlen() in x25_bind() can read beyond the end of the buffer. The second problem is more subtle and could result in memory corruption. The call tree is: x25_connect() --> x25_write_internal() --> x25_addr_aton() The .x25_addr[] buffers are copied to the "addresses" buffer from x25_write_internal() so it will lead to stack corruption. Verify that the strings are NUL terminated and return -EINVAL if they are not. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: a9288525d2ae ("X25: Dont let x25_bind use addresses containing characters") Reported-by: "kiyin(尹亮)" <kiyin@tencent.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/X8ZeAKm8FnFpN//B@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08net: ip6_gre: set dev->hard_header_len when using header_opsAntoine Tenart
[ Upstream commit 832ba596494b2c9eac7760259eff2d8b7dcad0ee ] syzkaller managed to crash the kernel using an NBMA ip6gre interface. I could reproduce it creating an NBMA ip6gre interface and forwarding traffic to it: skbuff: skb_under_panic: text:ffffffff8250e927 len:148 put:44 head:ffff8c03c7a33 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:109! Call Trace: skb_push+0x10/0x10 ip6gre_header+0x47/0x1b0 neigh_connected_output+0xae/0xf0 ip6gre tunnel provides its own header_ops->create, and sets it conditionally when initializing the tunnel in NBMA mode. When header_ops->create is used, dev->hard_header_len should reflect the length of the header created. Otherwise, when not used, dev->needed_headroom should be used. Fixes: eb95f52fc72d ("net: ipv6_gre: Fix GRO to work on IPv6 over GRE tap") Cc: Maria Pasechnik <mariap@mellanox.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20201130161911.464106-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08ipv4: Fix tos mask in inet_rtm_getroute()Guillaume Nault
[ Upstream commit 1ebf179037cb46c19da3a9c1e2ca16e7a754b75e ] When inet_rtm_getroute() was converted to use the RCU variants of ip_route_input() and ip_route_output_key(), the TOS parameters stopped being masked with IPTOS_RT_MASK before doing the route lookup. As a result, "ip route get" can return a different route than what would be used when sending real packets. For example: $ ip route add 192.0.2.11/32 dev eth0 $ ip route add unreachable 192.0.2.11/32 tos 2 $ ip route get 192.0.2.11 tos 2 RTNETLINK answers: No route to host But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would actually be routed using the first route: $ ping -c 1 -Q 2 192.0.2.11 PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data. 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms --- 192.0.2.11 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to return results consistent with real route lookups. Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/b2d237d08317ca55926add9654a48409ac1b8f5b.1606412894.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08netfilter: bridge: reset skb->pkt_type after NF_INET_POST_ROUTING traversalAntoine Tenart
[ Upstream commit 44f64f23bae2f0fad25503bc7ab86cd08d04cd47 ] Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the hooks as, while it's an expected value for a bridge, routing expects PACKET_HOST. The change is undone later on after hook traversal. This can be seen with pairs of functions updating skb>pkt_type and then reverting it to its original value: For hook NF_INET_PRE_ROUTING: setup_pre_routing / br_nf_pre_routing_finish For hook NF_INET_FORWARD: br_nf_forward_ip / br_nf_forward_finish But the third case where netfilter does this, for hook NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing but never reverted. A comment says: /* We assume any code from br_dev_queue_push_xmit onwards doesn't care * about the value of skb->pkt_type. */ But when having a tunnel (say vxlan) attached to a bridge we have the following call trace: br_nf_pre_routing br_nf_pre_routing_ipv6 br_nf_pre_routing_finish br_nf_forward_ip br_nf_forward_finish br_nf_post_routing <- pkt_type is updated to PACKET_HOST br_nf_dev_queue_xmit <- but not reverted to its original value vxlan_xmit vxlan_xmit_one skb_tunnel_check_pmtu <- a check on pkt_type is performed In this specific case, this creates issues such as when an ICMPv6 PTB should be sent back. When CONFIG_BRIDGE_NETFILTER is enabled, the PTB isn't sent (as skb_tunnel_check_pmtu checks if pkt_type is PACKET_HOST and returns early). If the comment is right and no one cares about the value of skb->pkt_type after br_dev_queue_push_xmit (which isn't true), resetting it to its original value should be safe. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20201123174902.622102-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08net/tls: Protect from calling tls_dev_del for TLS RX twiceMaxim Mikityanskiy
[ Upstream commit 025cc2fb6a4e84e9a0552c0017dcd1c24b7ac7da ] tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after calling tls_dev_del if TLX TX offload is also enabled. Clearing tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a time frame when tls_device_down may get called and call tls_dev_del for RX one extra time, confusing the driver, which may lead to a crash. This patch corrects this racy behavior by adding a flag to prevent tls_device_down from calling tls_dev_del the second time. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_controlAlexander Duyck
[ Upstream commit 55472017a4219ca965a957584affdb17549ae4a4 ] When setting congestion control via a BPF program it is seen that the SYN/ACK for packets within a given flow will not include the ECT0 flag. A bit of simple printk debugging shows that when this is configured without BPF we will see the value INET_ECN_xmit value initialized in tcp_assign_congestion_control however when we configure this via BPF the socket is in the closed state and as such it isn't configured, and I do not see it being initialized when we transition the socket into the listen state. The result of this is that the ECT0 bit is configured based on whatever the default state is for the socket. Any easy way to reproduce this is to monitor the following with tcpdump: tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca Without this patch the SYN/ACK will follow whatever the default is. If dctcp all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0 will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK bit matches the value seen on the other packets in the given stream. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08sock: set sk_err to ee_errno on dequeue from errqWillem de Bruijn
[ Upstream commit 985f7337421a811cb354ca93882f943c8335a6f5 ] When setting sk_err, set it to ee_errno, not ee_origin. Commit f5f99309fa74 ("sock: do not set sk_err in sock_dequeue_err_skb") disabled updating sk_err on errq dequeue, which is correct for most error types (origins): - sk->sk_err = err; Commit 38b257938ac6 ("sock: reset sk_err when the error queue is empty") reenabled the behavior for IMCP origins, which do require it: + if (icmp_next) + sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin; But read from ee_errno. Fixes: 38b257938ac6 ("sock: reset sk_err when the error queue is empty") Reported-by: Ayush Ranjan <ayushranjan@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20201126151220.2819322-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08rose: Fix Null pointer dereference in rose_send_frame()Anmol Karn
[ Upstream commit 3b3fd068c56e3fbea30090859216a368398e39bf ] rose_send_frame() dereferences `neigh->dev` when called from rose_transmit_clear_request(), and the first occurrence of the `neigh` is in rose_loopback_timer() as `rose_loopback_neigh`, and it is initialized in rose_add_loopback_neigh() as NULL. i.e when `rose_loopback_neigh` used in rose_loopback_timer() its `->dev` was still NULL and rose_loopback_timer() was calling rose_rx_call_request() without checking for NULL. - net/rose/rose_link.c This bug seems to get triggered in this line: rose_call = (ax25_address *)neigh->dev->dev_addr; Fix it by adding NULL checking for `rose_loopback_neigh->dev` in rose_loopback_timer(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Jakub Kicinski <kuba@kernel.org> Reported-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com Tested-by: syzbot+a1c743815982d9496393@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=9d2a7ca8c7f2e4b682c97578dfa3f236258300b3 Signed-off-by: Anmol Karn <anmol.karan123@gmail.com> Link: https://lore.kernel.org/r/20201119191043.28813-1-anmol.karan123@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08net/tls: missing received data after fast remote closeVadim Fedorenko
[ Upstream commit 20ffc7adf53a5fd3d19751fbff7895bcca66686e ] In case when tcp socket received FIN after some data and the parser haven't started before reading data caller will receive an empty buffer. This behavior differs from plain TCP socket and leads to special treating in user-space. The flow that triggers the race is simple. Server sends small amount of data right after the connection is configured to use TLS and closes the connection. In this case receiver sees TLS Handshake data, configures TLS socket right after Change Cipher Spec record. While the configuration is in process, TCP socket receives small Application Data record, Encrypted Alert record and FIN packet. So the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with SK_DONE bit set. The received data is not parsed upon arrival and is never sent to user-space. Patch unpauses parser directly if we have unparsed data in tcp receive queue. Fixes: fcf4793e278e ("tls: check RCV_SHUTDOWN in tls_wait_data") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605801588-12236-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08net/af_iucv: set correct sk_protocol for child socketsJulian Wiedmann
[ Upstream commit c5dab0941fcdc9664eb0ec0d4d51433216d91336 ] Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*), instead of the PF_IUCV protocol that the parent was created with in iucv_sock_create(). We're currently not using sk->sk_protocol ourselves, so this shouldn't have much impact (except eg. getting the output in skb_dump() right). Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Link: https://lore.kernel.org/r/20201120100657.34407-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_initWang Hai
[ Upstream commit e255e11e66da8281e337e4e352956e8a4999fca4 ] kmemleak report a memory leak as follows: unreferenced object 0xffff8880059c6a00 (size 64): comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s) hex dump (first 32 bytes): 20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00 ............... 1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00 ................ backtrace: [<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0 [<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170 [<000000006a9ca9d4>] ops_init+0xa8/0x3c0 [<000000002da57bf2>] setup_net+0x2de/0x7e0 [<000000004e52d573>] copy_net_ns+0x27d/0x530 [<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30 [<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0 [<0000000030653721>] ksys_unshare+0x3a4/0x780 [<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40 [<0000000031a10c08>] do_syscall_64+0x33/0x40 [<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 We should free all rules when we catch an error in ip6addrlbl_net_init(). otherwise a memory leak will occur. Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201124071728.8385-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-02batman-adv: set .owner to THIS_MODULETaehee Yoo
[ Upstream commit 14a2e551faea53d45bc11629a9dac88f88950ca7 ] If THIS_MODULE is not set, the module would be removed while debugfs is being used. It eventually makes kernel panic. Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24mac80211: free sta in sta_info_insert_finish() on errorsJohannes Berg
commit 7bc40aedf24d31d8bea80e1161e996ef4299fb10 upstream. If sta_info_insert_finish() fails, we currently keep the station around and free it only in the caller, but there's only one such caller and it always frees it immediately. As syzbot found, another consequence of this split is that we can put things that sleep only into __cleanup_single_sta() and not in sta_info_free(), but this is the only place that requires such of sta_info_free() now. Change this to free the station in sta_info_insert_finish(), in which case we can still sleep. This will also let us unify the cleanup code later. Cc: stable@vger.kernel.org Fixes: dcd479e10a05 ("mac80211: always wind down STA state") Reported-by: syzbot+32c6c38c4812d22f2f0b@syzkaller.appspotmail.com Reported-by: syzbot+4c81fe92e372d26c4246@syzkaller.appspotmail.com Reported-by: syzbot+6a7fe9faf0d1d61bc24a@syzkaller.appspotmail.com Reported-by: syzbot+abed06851c5ffe010921@syzkaller.appspotmail.com Reported-by: syzbot+b7aeb9318541a1c709f1@syzkaller.appspotmail.com Reported-by: syzbot+d5a9416c6cafe53b5dd0@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20201112112201.ee6b397b9453.I9c31d667a0ea2151441cc64ed6613d36c18a48e0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24mac80211: minstrel: fix tx status processing corner caseFelix Fietkau
commit b2911a84396f72149dce310a3b64d8948212c1b3 upstream. Some drivers fill the status rate list without setting the rate index after the final rate to -1. minstrel_ht already deals with this, but minstrel doesn't, which causes it to get stuck at the lowest rate on these drivers. Fix this by checking the count as well. Cc: stable@vger.kernel.org Fixes: cccf129f820e ("mac80211: add the 'minstrel' rate control algorithm") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201111183359.43528-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24mac80211: minstrel: remove deferred sampling codeFelix Fietkau
commit 4fe40b8e1566dad04c87fbf299049a1d0d4bd58d upstream. Deferring sampling attempts to the second stage has some bad interactions with drivers that process the rate table in hardware and use the probe flag to indicate probing packets (e.g. most mt76 drivers). On affected drivers it can lead to probing not working at all. If the link conditions turn worse, it might not be such a good idea to do a lot of sampling for lower rates in this case. Fix this by simply skipping the sample attempt instead of deferring it, but keep the checks that would allow it to be sampled if it was skipped too often, but only if it has less than 95% success probability. Also ensure that IEEE80211_TX_CTL_RATE_CTRL_PROBE is set for all probing packets. Cc: stable@vger.kernel.org Fixes: cccf129f820e ("mac80211: add the 'minstrel' rate control algorithm") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201111183359.43528-2-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24can: af_can: prevent potential access of uninitialized member in canfd_rcv()Anant Thazhemadam
[ Upstream commit 9aa9379d8f868e91719333a7f063ccccc0579acc ] In canfd_rcv(), cfd->len is uninitialized when skb->len = 0, and this uninitialized cfd->len is accessed nonetheless by pr_warn_once(). Fix this uninitialized variable access by checking cfd->len's validity condition (cfd->len > CANFD_MAX_DLEN) separately after the skb->len's condition is checked, and appropriately modify the log messages that are generated as well. In case either of the required conditions fail, the skb is freed and NET_RX_DROP is returned, same as before. Fixes: d4689846881d ("can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once") Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Link: https://lore.kernel.org/r/20201103213906.24219-3-anant.thazhemadam@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24can: af_can: prevent potential access of uninitialized member in can_rcv()Anant Thazhemadam
[ Upstream commit c8c958a58fc67f353289986850a0edf553435702 ] In can_rcv(), cfd->len is uninitialized when skb->len = 0, and this uninitialized cfd->len is accessed nonetheless by pr_warn_once(). Fix this uninitialized variable access by checking cfd->len's validity condition (cfd->len > CAN_MAX_DLEN) separately after the skb->len's condition is checked, and appropriately modify the log messages that are generated as well. In case either of the required conditions fail, the skb is freed and NET_RX_DROP is returned, same as before. Fixes: 8cb68751c115 ("can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once") Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Link: https://lore.kernel.org/r/20201103213906.24219-2-anant.thazhemadam@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24net/ncsi: Fix netlink registrationJoel Stanley
[ Upstream commit 1922a46b8c18cb09d33e06a6cc2e43844ac1b9d0 ] If a user unbinds and re-binds a NC-SI aware driver the kernel will attempt to register the netlink interface at runtime. The structure is marked __ro_after_init so registration fails spectacularly at this point. # echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/unbind # echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/bind ftgmac100 1e660000.ethernet: Read MAC address 52:54:00:12:34:56 from chip ftgmac100 1e660000.ethernet: Using NCSI interface 8<--- cut here --- Unable to handle kernel paging request at virtual address 80a8f858 pgd = 8c768dd6 [80a8f858] *pgd=80a0841e(bad) Internal error: Oops: 80d [#1] SMP ARM CPU: 0 PID: 116 Comm: sh Not tainted 5.10.0-rc3-next-20201111-00003-gdd25b227ec1e #51 Hardware name: Generic DT based system PC is at genl_register_family+0x1f8/0x6d4 LR is at 0xff26ffff pc : [<8073f930>] lr : [<ff26ffff>] psr: 20000153 sp : 8553bc80 ip : 81406244 fp : 8553bd04 r10: 8085d12c r9 : 80a8f73c r8 : 85739000 r7 : 00000017 r6 : 80a8f860 r5 : 80c8ab98 r4 : 80a8f858 r3 : 00000000 r2 : 00000000 r1 : 81406130 r0 : 00000017 Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none Control: 00c5387d Table: 85524008 DAC: 00000051 Process sh (pid: 116, stack limit = 0x1f1988d6) ... Backtrace: [<8073f738>] (genl_register_family) from [<80860ac0>] (ncsi_init_netlink+0x20/0x48) r10:8085d12c r9:80c8fb0c r8:85739000 r7:00000000 r6:81218000 r5:85739000 r4:8121c000 [<80860aa0>] (ncsi_init_netlink) from [<8085d740>] (ncsi_register_dev+0x1b0/0x210) r5:8121c400 r4:8121c000 [<8085d590>] (ncsi_register_dev) from [<805a8060>] (ftgmac100_probe+0x6e0/0x778) r10:00000004 r9:80950228 r8:8115bc10 r7:8115ab00 r6:9eae2c24 r5:813b6f88 r4:85739000 [<805a7980>] (ftgmac100_probe) from [<805355ec>] (platform_drv_probe+0x58/0xa8) r9:80c76bb0 r8:00000000 r7:80cd4974 r6:80c76bb0 r5:8115bc10 r4:00000000 [<80535594>] (platform_drv_probe) from [<80532d58>] (really_probe+0x204/0x514) r7:80cd4974 r6:00000000 r5:80cd4868 r4:8115bc10 Jakub pointed out that ncsi_register_dev is obviously broken, because there is only one family so it would never work if there was more than one ncsi netdev. Fix the crash by registering the netlink family once on boot, and drop the code to unregister it. Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family") Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Link: https://lore.kernel.org/r/20201112061210.914621-1-joel@jms.id.au Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24tcp: only postpone PROBE_RTT if RTT is < current min_rtt estimateRyan Sharpelletti
[ Upstream commit 1b9e2a8c99a5c021041bfb2d512dc3ed92a94ffd ] During loss recovery, retransmitted packets are forced to use TCP timestamps to calculate the RTT samples, which have a millisecond granularity. BBR is designed using a microsecond granularity. As a result, multiple RTT samples could be truncated to the same RTT value during loss recovery. This is problematic, as BBR will not enter PROBE_RTT if the RTT sample is <= the current min_rtt sample, meaning that if there are persistent losses, PROBE_RTT will constantly be pushed off and potentially never re-entered. This patch makes sure that BBR enters PROBE_RTT by checking if RTT sample is < the current min_rtt sample, rather than <=. The Netflix transport/TCP team discovered this bug in the Linux TCP BBR code during lab tests. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Ryan Sharpelletti <sharpelletti@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20201116174412.1433277-1-sharpelletti.kdev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24sctp: change to hold/put transport for proto_unreach_timerXin Long
[ Upstream commit 057a10fa1f73d745c8e69aa54ab147715f5630ae ] A call trace was found in Hangbin's Codenomicon testing with debug kernel: [ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp] [ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0 [ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0 [ 2616.191533] Call Trace: [ 2616.194265] <IRQ> [ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0 [ 2616.207336] slab_free_freelist_hook+0xeb/0x140 [ 2616.220971] kfree+0xd6/0x2c0 [ 2616.224293] rcu_do_batch+0x3bd/0xc70 [ 2616.243096] rcu_core+0x8b9/0xd00 [ 2616.256065] __do_softirq+0x23d/0xacd [ 2616.260166] irq_exit+0x236/0x2a0 [ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620 [ 2616.269138] apic_timer_interrupt+0xf/0x20 [ 2616.273711] </IRQ> This is because it holds asoc when transport->proto_unreach_timer starts and puts asoc when the timer stops, and without holding transport the transport could be freed when the timer is still running. So fix it by holding/putting transport instead for proto_unreach_timer in transport, just like other timers in transport. v1->v2: - Also use sctp_transport_put() for the "out_unlock:" path in sctp_generate_proto_unreach_event(), as Marcelo noticed. Fixes: 50b5d6ad6382 ("sctp: Fix a race between ICMP protocol unreachable and connect()") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24net: x25: Increase refcnt of "struct x25_neigh" in x25_rx_call_requestXie He
[ Upstream commit 4ee18c179e5e815fa5575e0d2db0c05795a804ee ] The x25_disconnect function in x25_subr.c would decrease the refcount of "x25->neighbour" (struct x25_neigh) and reset this pointer to NULL. However, the x25_rx_call_request function in af_x25.c, which is called when we receive a connection request, does not increase the refcount when it assigns the pointer. Fix this issue by increasing the refcount of "struct x25_neigh" in x25_rx_call_request. This patch fixes frequent kernel crashes when using AF_X25 sockets. Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201112103506.5875-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24netlabel: fix an uninitialized warning in netlbl_unlabel_staticlist()Paul Moore
[ Upstream commit 1ba86d4366e023d96df3dbe415eea7f1dc08c303 ] Static checking revealed that a previous fix to netlbl_unlabel_staticlist() leaves a stack variable uninitialized, this patches fixes that. Fixes: 866358ec331f ("netlabel: fix our progress tracking in netlbl_unlabel_staticlist()") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/160530304068.15651.18355773009751195447.stgit@sifl Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24netlabel: fix our progress tracking in netlbl_unlabel_staticlist()Paul Moore
[ Upstream commit 866358ec331f8faa394995fb4b511af1db0247c8 ] The current NetLabel code doesn't correctly keep track of the netlink dump state in some cases, in particular when multiple interfaces with large configurations are loaded. The problem manifests itself by not reporting the full configuration to userspace, even though it is loaded and active in the kernel. This patch fixes this by ensuring that the dump state is properly reset when necessary inside the netlbl_unlabel_staticlist() function. Fixes: 8cc44579d1bd ("NetLabel: Introduce static network labels for unlabeled connections") Signed-off-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/160484450633.3752.16512718263560813473.stgit@sifl Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24net: Have netpoll bring-up DSA management interfaceFlorian Fainelli
[ Upstream commit 1532b9778478577152201adbafa7738b1e844868 ] DSA network devices rely on having their DSA management interface up and running otherwise their ndo_open() will return -ENETDOWN. Without doing this it would not be possible to use DSA devices as netconsole when configured on the command line. These devices also do not utilize the upper/lower linking so the check about the netpoll device having upper is not going to be a problem. The solution adopted here is identical to the one done for net/ipv4/ipconfig.c with 728c02089a0e ("net: ipv4: handle DSA enabled master network devices"), with the network namespace scope being restricted to that of the process configuring netpoll. Fixes: 04ff53f96a93 ("net: dsa: Add netconsole support") Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24net: bridge: add missing counters to ndo_get_stats64 callbackHeiner Kallweit
[ Upstream commit 7a30ecc9237681bb125cbd30eee92bef7e86293d ] In br_forward.c and br_input.c fields dev->stats.tx_dropped and dev->stats.multicast are populated, but they are ignored in ndo_get_stats64. Fixes: 28172739f0a2 ("net: fix 64 bit counters on 32 bit arches") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24inet_diag: Fix error path to cancel the meseage in inet_req_diag_fill()Wang Hai
[ Upstream commit e33de7c5317e2827b2ba6fd120a505e9eb727b05 ] nlmsg_cancel() needs to be called in the error path of inet_req_diag_fill to cancel the message. Fixes: d545caca827b ("net: inet: diag: expose the socket mark to privileged processes.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201116082018.16496-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24devlink: Add missing genlmsg_cancel() in devlink_nl_sb_port_pool_fill()Wang Hai
[ Upstream commit 849920c703392957f94023f77ec89ca6cf119d43 ] If sb_occ_port_pool_get() failed in devlink_nl_sb_port_pool_fill(), msg should be canceled by genlmsg_cancel(). Fixes: df38dafd2559 ("devlink: implement shared buffer occupancy monitoring interface") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201113111622.11040-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24ah6: fix error return code in ah6_input()Zhang Changzhong
[ Upstream commit a5ebcbdf34b65fcc07f38eaf2d60563b42619a59 ] Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1605581105-35295-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22can: proc: can_remove_proc(): silence remove_proc_entry warningZhang Changzhong
commit 3accbfdc36130282f5ae9e6eecfdf820169fedce upstream. If can_init_proc() fail to create /proc/net/can directory, can_remove_proc() will trigger a warning: WARNING: CPU: 6 PID: 7133 at fs/proc/generic.c:672 remove_proc_entry+0x17b0 Kernel panic - not syncing: panic_on_warn set ... Fix to return early from can_remove_proc() if can proc_dir does not exists. Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1594709090-3203-1-git-send-email-zhangchangzhong@huawei.com Fixes: 8e8cda6d737d ("can: initial support for network namespaces") Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22mac80211: always wind down STA stateJohannes Berg
commit dcd479e10a0510522a5d88b29b8f79ea3467d501 upstream. When (for example) an IBSS station is pre-moved to AUTHORIZED before it's inserted, and then the insertion fails, we don't clean up the fast RX/TX states that might already have been created, since we don't go through all the state transitions again on the way down. Do that, if it hasn't been done already, when the station is freed. I considered only freeing the fast TX/RX state there, but we might add more state so it's more robust to wind down the state properly. Note that we warn if the station was ever inserted, it should have been properly cleaned up in that case, and the driver will probably not like things happening out of order. Reported-by: syzbot+2e293dbd67de2836ba42@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20201009141710.7223b322a955.I95bd08b9ad0e039c034927cce0b75beea38e059b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18net: sch_generic: fix the missing new qdisc assignment bugYunsheng Lin
When commit 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc") is backported to stable kernel, one assignment is missing, which causes two problems reported by Joakim and Vishwanath, see [1] and [2]. So add the assignment back to fix it. 1. https://www.spinics.net/lists/netdev/msg693916.html 2. https://www.spinics.net/lists/netdev/msg695131.html Fixes: 749cc0b0c7f3 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18tipc: fix memory leak in tipc_topsrv_start()Wang Hai
[ Upstream commit fa6882c63621821f73cc806f291208e1c6ea6187 ] kmemleak report a memory leak as follows: unreferenced object 0xffff88810a596800 (size 512): comm "ip", pid 21558, jiffies 4297568990 (age 112.120s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 00 83 60 b0 ff ff ff ff ..........`..... backtrace: [<0000000022bbe21f>] tipc_topsrv_init_net+0x1f3/0xa70 [<00000000fe15ddf7>] ops_init+0xa8/0x3c0 [<00000000138af6f2>] setup_net+0x2de/0x7e0 [<000000008c6807a3>] copy_net_ns+0x27d/0x530 [<000000006b21adbd>] create_new_namespaces+0x382/0xa30 [<00000000bb169746>] unshare_nsproxy_namespaces+0xa1/0x1d0 [<00000000fe2e42bc>] ksys_unshare+0x39c/0x780 [<0000000009ba3b19>] __x64_sys_unshare+0x2d/0x40 [<00000000614ad866>] do_syscall_64+0x56/0xa0 [<00000000a1b5ca3c>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 'srv' is malloced in tipc_topsrv_start() but not free before leaving from the error handling cases. We need to free it. Fixes: 5c45ab24ac77 ("tipc: make struct tipc_server private for server.c") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201109140913.47370-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18net/x25: Fix null-ptr-deref in x25_connectMartin Schiller
[ Upstream commit 361182308766a265b6c521879b34302617a8c209 ] This fixes a regression for blocking connects introduced by commit 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect"). The x25->neighbour is already set to "NULL" by x25_disconnect() now, while a blocking connect is waiting in x25_wait_for_connection_establishment(). Therefore x25->neighbour must not be accessed here again and x25->state is also already set to X25_STATE_0 by x25_disconnect(). Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reviewed-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18net: Update window_clamp if SOCK_RCVBUF is setMao Wenan
[ Upstream commit 909172a149749242990a6e64cb55d55460d4e417 ] When net.ipv4.tcp_syncookies=1 and syn flood is happened, cookie_v4_check or cookie_v6_check tries to redo what tcp_v4_send_synack or tcp_v6_send_synack did, rsk_window_clamp will be changed if SOCK_RCVBUF is set, which will make rcv_wscale is different, the client still operates with initial window scale and can overshot granted window, the client use the initial scale but local server use new scale to advertise window value, and session work abnormally. Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt") Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18net/af_iucv: fix null pointer dereference on shutdownUrsula Braun
[ Upstream commit 4031eeafa71eaf22ae40a15606a134ae86345daf ] syzbot reported the following KASAN finding: BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 Read of size 2 at addr 000000000000021e by task syz-executor907/519 CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0 Hardware name: IBM 3906 M04 701 (KVM/Linux) Call Trace: [<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline] [<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135 [<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline] [<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118 [<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383 [<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline] [<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534 [<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 [<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457 [<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204 [<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline] [<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210 [<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 There is nothing to shutdown if a connection has never been established. Besides that iucv->hs_dev is not yet initialized if a socket is in IUCV_OPEN state and iucv->path is not yet initialized if socket is in IUCV_BOUND state. So, just skip the shutdown calls for a socket in these states. Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support") Fixes: 82492a355fac ("af_iucv: add shutdown for HS transport") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> [jwi: correct one Fixes tag] Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18IPv6: Set SIT tunnel hard_header_len to zeroOliver Herms
[ Upstream commit 8ef9ba4d666614497a057d09b0a6eafc1e34eadf ] Due to the legacy usage of hard_header_len for SIT tunnels while already using infrastructure from net/ipv4/ip_tunnel.c the calculation of the path MTU in tnl_update_pmtu is incorrect. This leads to unnecessary creation of MTU exceptions for any flow going over a SIT tunnel. As SIT tunnels do not have a header themsevles other than their transport (L3, L2) headers we're leaving hard_header_len set to zero as tnl_update_pmtu is already taking care of the transport headers sizes. This will also help avoiding unnecessary IPv6 GC runs and spinlock contention seen when using SIT tunnels and for more than net.ipv6.route.gc_thresh flows. Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18cfg80211: regulatory: Fix inconsistent format argumentYe Bin
[ Upstream commit db18d20d1cb0fde16d518fb5ccd38679f174bc04 ] Fix follow warning: [net/wireless/reg.c:3619]: (warning) %d in format string (no. 2) requires 'int' but the argument type is 'unsigned int'. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20201009070215.63695-1-yebin10@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>