aboutsummaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2022-05-25Merge tag 'v5.4.196' into v5.4/standard/baseBruce Ashfield
This is the 5.4.196 stable release Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com> # gpg: Signature made Wed 25 May 2022 03:14:43 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key # Conflicts: # fs/file_table.c
2022-05-25Merge tag 'v5.4.195' into v5.4/standard/baseBruce Ashfield
This is the 5.4.195 stable release # gpg: Signature made Wed 18 May 2022 03:47:48 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-25Merge tag 'v5.4.194' into v5.4/standard/baseBruce Ashfield
This is the 5.4.194 stable release # gpg: Signature made Sun 15 May 2022 01:55:08 PM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-25Merge tag 'v5.4.193' into v5.4/standard/baseBruce Ashfield
This is the 5.4.193 stable release # gpg: Signature made Thu 12 May 2022 06:24:08 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-25mac80211: fix rx reordering with non explicit / psmp ack policyFelix Fietkau
[ Upstream commit 5e469ed9764d4722c59562da13120bd2dc6834c5 ] When the QoS ack policy was set to non explicit / psmp ack, frames are treated as not being part of a BA session, which causes extra latency on reordering. Fix this by only bypassing reordering for packets with no-ack policy Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220420105038.36443-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.Andrew Lunn
[ Upstream commit fbb3abdf2223cd0dfc07de85fe5a43ba7f435bdf ] It is possible to stack bridges on top of each other. Consider the following which makes use of an Ethernet switch: br1 / \ / \ / \ br0.11 wlan0 | br0 / | \ p1 p2 p3 br0 is offloaded to the switch. Above br0 is a vlan interface, for vlan 11. This vlan interface is then a slave of br1. br1 also has a wireless interface as a slave. This setup trunks wireless lan traffic over the copper network inside a VLAN. A frame received on p1 which is passed up to the bridge has the skb->offload_fwd_mark flag set to true, indicating that the switch has dealt with forwarding the frame out ports p2 and p3 as needed. This flag instructs the software bridge it does not need to pass the frame back down again. However, the flag is not getting reset when the frame is passed upwards. As a result br1 sees the flag, wrongly interprets it, and fails to forward the frame to wlan0. When passing a frame upwards, clear the flag. This is the Rx equivalent of br_switchdev_frame_unmark() in br_dev_xmit(). Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25net: af_key: add check for pfkey_broadcast in function pfkey_processJiasheng Jiang
[ Upstream commit 4dc2a5a8f6754492180741facf2a8787f2c415d7 ] If skb_clone() returns null pointer, pfkey_broadcast() will return error. Therefore, it should be better to check the return value of pfkey_broadcast() and return error if fails. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25NFC: nci: fix sleep in atomic context bugs caused by nci_skb_allocDuoming Zhou
[ Upstream commit 23dd4581350d4ffa23d58976ec46408f8f4c1e16 ] There are sleep in atomic context bugs when the request to secure element of st-nci is timeout. The root cause is that nci_skb_alloc with GFP_KERNEL parameter is called in st_nci_se_wt_timeout which is a timer handler. The call paths that could trigger bugs are shown below: (interrupt context 1) st_nci_se_wt_timeout nci_hci_send_event nci_hci_send_data nci_skb_alloc(..., GFP_KERNEL) //may sleep (interrupt context 2) st_nci_se_wt_timeout nci_hci_send_event nci_hci_send_data nci_send_data nci_queue_tx_data_frags nci_skb_alloc(..., GFP_KERNEL) //may sleep This patch changes allocation mode of nci_skb_alloc from GFP_KERNEL to GFP_ATOMIC in order to prevent atomic context sleeping. The GFP_ATOMIC flag makes memory allocation operation could be used in atomic context. Fixes: ed06aeefdac3 ("nfc: st-nci: Rename st21nfcb to st-nci") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220517012530.75714-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25net/sched: act_pedit: sanitize shift argument before usagePaolo Abeni
[ Upstream commit 4d42d54a7d6aa6d29221d3fd4f2ae9503e94f011 ] syzbot was able to trigger an Out-of-Bound on the pedit action: UBSAN: shift-out-of-bounds in net/sched/act_pedit.c:238:43 shift exponent 1400735974 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 3606 Comm: syz-executor151 Not tainted 5.18.0-rc5-syzkaller-00165-g810c2f0a3f86 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 ubsan_epilogue+0xb/0x50 lib/ubsan.c:151 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x187 lib/ubsan.c:322 tcf_pedit_init.cold+0x1a/0x1f net/sched/act_pedit.c:238 tcf_action_init_1+0x414/0x690 net/sched/act_api.c:1367 tcf_action_init+0x530/0x8d0 net/sched/act_api.c:1432 tcf_action_add+0xf9/0x480 net/sched/act_api.c:1956 tc_ctl_action+0x346/0x470 net/sched/act_api.c:2015 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5993 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e2/0x800 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fe36e9e1b59 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffef796fe88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe36e9e1b59 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003 RBP: 00007fe36e9a5d00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe36e9a5d90 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The 'shift' field is not validated, and any value above 31 will trigger out-of-bounds. The issue predates the git history, but syzbot was able to trigger it only after the commit mentioned in the fixes tag, and this change only applies on top of such commit. Address the issue bounding the 'shift' value to the maximum allowed by the relevant operator. Reported-and-tested-by: syzbot+8ed8fc4c57e9dcf23ca6@syzkaller.appspotmail.com Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()Meena Shanmugam
From: Trond Myklebust <trond.myklebust@hammerspace.com> commit f00432063db1a0db484e85193eccc6845435b80e upstream. We must ensure that all sockets are closed before we call xprt_free() and release the reference to the net namespace. The problem is that calling fput() will defer closing the socket until delayed_fput() gets called. Let's fix the situation by allowing rpciod and the transport teardown code (which runs on the system wq) to call __fput_sync(), and directly close the socket. Reported-by: Felix Fu <foyjog@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()") Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket Cc: stable@vger.kernel.org # 5.1.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> [meenashanmugam: Fix merge conflict in xprt_connect] Signed-off-by: Meena Shanmugam <meenashanmugam@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25SUNRPC: Don't call connect() more than once on a TCP socketMeena Shanmugam
From: Trond Myklebust <trond.myklebust@hammerspace.com> commit 89f42494f92f448747bd8a7ab1ae8b5d5520577d upstream. Avoid socket state races due to repeated calls to ->connect() using the same socket. If connect() returns 0 due to the connection having completed, but we are in fact in a closing state, then we may leave the XPRT_CONNECTING flag set on the transport. Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de> Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> [meenashanmugam: Fix merge conflict in xs_tcp_setup_socket] Signed-off-by: Meena Shanmugam <meenashanmugam@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25SUNRPC: Prevent immediate close+reconnectMeena Shanmugam
From: Trond Myklebust <trond.myklebust@hammerspace.com> commit 3be232f11a3cc9b0ef0795e39fa11bdb8e422a06 upstream. If we have already set up the socket and are waiting for it to connect, then don't immediately close and retry. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Meena Shanmugam <meenashanmugam@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25SUNRPC: Clean up scheduling of autocloseMeena Shanmugam
From: Trond Myklebust <trond.myklebust@hammerspace.com> commit e26d9972720e2484f44cdd94ca4e31cc372ed2ed upstream. Consolidate duplicated code in xprt_force_disconnect() and xprt_conditional_disconnect(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Meena Shanmugam <meenashanmugam@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-18ping: fix address binding wrt vrfNicolas Dichtel
commit e1a7ac6f3ba6e157adcd0ca94d92a401f1943f56 upstream. When ping_group_range is updated, 'ping' uses the DGRAM ICMP socket, instead of an IP raw socket. In this case, 'ping' is unable to bind its socket to a local address owned by a vrflite. Before the patch: $ sysctl -w net.ipv4.ping_group_range='0 2147483647' $ ip link add blue type vrf table 10 $ ip link add foo type dummy $ ip link set foo master blue $ ip link set foo up $ ip addr add 192.168.1.1/24 dev foo $ ip addr add 2001::1/64 dev foo $ ip vrf exec blue ping -c1 -I 192.168.1.1 192.168.1.2 ping: bind: Cannot assign requested address $ ip vrf exec blue ping6 -c1 -I 2001::1 2001::2 ping6: bind icmp socket: Cannot assign requested address CC: stable@vger.kernel.org Fixes: 1b69c6d0ae90 ("net: Introduce L3 Master device abstraction") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-18tcp: resalt the secret every 10 secondsEric Dumazet
[ Upstream commit 4dfa9b438ee34caca4e6a4e5e961641807367f6f ] In order to limit the ability for an observer to recognize the source ports sequence used to contact a set of destinations, we should periodically shuffle the secret. 10 seconds looks effective enough without causing particular issues. Cc: Moshe Kol <moshe.kol@mail.huji.ac.il> Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il> Cc: Amit Klein <aksecurity@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pendingGuangguan Wang
[ Upstream commit f3c46e41b32b6266cf60b0985c61748f53bf1c61 ] Non blocking sendmsg will return -EAGAIN when any signal pending and no send space left, while non blocking recvmsg return -EINTR when signal pending and no data received. This may makes confused. As TCP returns -EAGAIN in the conditions described above. Align the behavior of smc with TCP. Fixes: 846e344eb722 ("net/smc: add receive timeout check") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20220512030820.73848-1-guangguan.wang@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18net/sched: act_pedit: really ensure the skb is writablePaolo Abeni
[ Upstream commit 8b796475fd7882663a870456466a4fb315cc1bd6 ] Currently pedit tries to ensure that the accessed skb offset is writable via skb_unclone(). The action potentially allows touching any skb bytes, so it may end-up modifying shared data. The above causes some sporadic MPTCP self-test failures, due to this code: tc -n $ns2 filter add dev ns2eth$i egress \ protocol ip prio 1000 \ handle 42 fw \ action pedit munge offset 148 u8 invert \ pipe csum tcp \ index 100 The above modifies a data byte outside the skb head and the skb is a cloned one, carrying a TCP output packet. This change addresses the issue by keeping track of a rough over-estimate highest skb offset accessed by the action and ensuring such offset is really writable. Note that this may cause performance regressions in some scenarios, but hopefully pedit is not in the critical path. Fixes: db2c24175d14 ("act_pedit: access skb->data safely") Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/1fcf78e6679d0a287dd61bb0f04730ce33b3255d.1652194627.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18netlink: do not reset transport header in netlink_recvmsg()Eric Dumazet
[ Upstream commit d5076fe4049cadef1f040eda4aaa001bb5424225 ] netlink_recvmsg() does not need to change transport header. If transport header was needed, it should have been reset by the producer (netlink_dump()), not the consumer(s). The following trace probably happened when multiple threads were using MSG_PEEK. BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg write to 0xffff88811e9f15b2 of 2 bytes by task 32012 on cpu 1: skb_reset_transport_header include/linux/skbuff.h:2760 [inline] netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] __sys_recvfrom+0x204/0x2c0 net/socket.c:2097 __do_sys_recvfrom net/socket.c:2115 [inline] __se_sys_recvfrom net/socket.c:2111 [inline] __x64_sys_recvfrom+0x74/0x90 net/socket.c:2111 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae write to 0xffff88811e9f15b2 of 2 bytes by task 32005 on cpu 0: skb_reset_transport_header include/linux/skbuff.h:2760 [inline] netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978 ____sys_recvmsg+0x162/0x2f0 ___sys_recvmsg net/socket.c:2674 [inline] __sys_recvmsg+0x209/0x3f0 net/socket.c:2704 __do_sys_recvmsg net/socket.c:2714 [inline] __se_sys_recvmsg net/socket.c:2711 [inline] __x64_sys_recvmsg+0x42/0x50 net/socket.c:2711 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff -> 0x0000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 32005 Comm: syz-executor.4 Not tainted 5.18.0-rc1-syzkaller-00328-ge1f700ebd6be-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220505161946.2867638-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18ipv4: drop dst in multicast routing pathLokesh Dhoundiyal
[ Upstream commit 9e6c6d17d1d6a3f1515ce399f9a011629ec79aa0 ] kmemleak reports the following when routing multicast traffic over an ipsec tunnel. Kmemleak output: unreferenced object 0x8000000044bebb00 (size 256): comm "softirq", pid 0, jiffies 4294985356 (age 126.810s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 80 00 00 00 05 13 74 80 ..............t. 80 00 00 00 04 9b bf f9 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f83947e0>] __kmalloc+0x1e8/0x300 [<00000000b7ed8dca>] metadata_dst_alloc+0x24/0x58 [<0000000081d32c20>] __ipgre_rcv+0x100/0x2b8 [<00000000824f6cf1>] gre_rcv+0x178/0x540 [<00000000ccd4e162>] gre_rcv+0x7c/0xd8 [<00000000c024b148>] ip_protocol_deliver_rcu+0x124/0x350 [<000000006a483377>] ip_local_deliver_finish+0x54/0x68 [<00000000d9271b3a>] ip_local_deliver+0x128/0x168 [<00000000bd4968ae>] xfrm_trans_reinject+0xb8/0xf8 [<0000000071672a19>] tasklet_action_common.isra.16+0xc4/0x1b0 [<0000000062e9c336>] __do_softirq+0x1fc/0x3e0 [<00000000013d7914>] irq_exit+0xc4/0xe0 [<00000000a4d73e90>] plat_irq_dispatch+0x7c/0x108 [<000000000751eb8e>] handle_int+0x16c/0x178 [<000000001668023b>] _raw_spin_unlock_irqrestore+0x1c/0x28 The metadata dst is leaked when ip_route_input_mc() updates the dst for the skb. Commit f38a9eb1f77b ("dst: Metadata destinations") correctly handled dropping the dst in ip_route_input_slow() but missed the multicast case which is handled by ip_route_input_mc(). Drop the dst in ip_route_input_mc() avoiding the leak. Fixes: f38a9eb1f77b ("dst: Metadata destinations") Signed-off-by: Lokesh Dhoundiyal <lokesh.dhoundiyal@alliedtelesis.co.nz> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220505020017.3111846-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18mac80211: Reset MBSSID parameters upon connectionManikanta Pubbisetty
[ Upstream commit 86af062f40a73bf63321694e6bf637144f0383fe ] Currently MBSSID parameters in struct ieee80211_bss_conf are not reset upon connection. This could be problematic with some drivers in a scenario where the device first connects to a non-transmit BSS and then connects to a transmit BSS of a Multi BSS AP. The MBSSID parameters which are set after connecting to a non-transmit BSS will not be reset and the same parameters will be passed on to the driver during the subsequent connection to a transmit BSS of a Multi BSS AP. For example, firmware running on the ath11k device uses the Multi BSS data for tracking the beacon of a non-transmit BSS and reports the driver when there is a beacon miss. If we do not reset the MBSSID parameters during the subsequent connection to a transmit BSS, then the driver would have wrong MBSSID data and FW would be looking for an incorrect BSSID in the MBSSID beacon of a Multi BSS AP and reports beacon loss leading to an unstable connection. Reset the MBSSID parameters upon every connection to solve this problem. Fixes: 78ac51f81532 ("mac80211: support multi-bssid") Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com> Link: https://lore.kernel.org/r/20220428052744.27040-1-quic_mpubbise@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-18batman-adv: Don't skb_split skbuffs with frag_listSven Eckelmann
[ Upstream commit a063f2fba3fa633a599253b62561051ac185fa99 ] The receiving interface might have used GRO to receive more fragments than MAX_SKB_FRAGS fragments. In this case, these will not be stored in skb_shinfo(skb)->frags but merged into the frag list. batman-adv relies on the function skb_split to split packets up into multiple smaller packets which are not larger than the MTU on the outgoing interface. But this function cannot handle frag_list entries and is only operating on skb_shinfo(skb)->frags. If it is still trying to split such an skb and xmit'ing it on an interface without support for NETIF_F_FRAGLIST, then validate_xmit_skb() will try to linearize it. But this fails due to inconsistent information. And __pskb_pull_tail will trigger a BUG_ON after skb_copy_bits() returns an error. In case of entries in frag_list, just linearize the skb before operating on it with skb_split(). Reported-by: Felix Kaechele <felix@kaechele.ca> Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Sven Eckelmann <sven@narfation.org> Tested-by: Felix Kaechele <felix@kaechele.ca> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-15Bluetooth: Fix the creation of hdev->nameItay Iellin
commit 103a2f3255a95991252f8f13375c3a96a75011cd upstream. Set a size limit of 8 bytes of the written buffer to "hdev->name" including the terminating null byte, as the size of "hdev->name" is 8 bytes. If an id value which is greater than 9999 is allocated, then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)" function call would lead to a truncation of the id value in decimal notation. Set an explicit maximum id parameter in the id allocation function call. The id allocation function defines the maximum allocated id value as the maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined as 10000. Signed-off-by: Itay Iellin <ieitayie@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12tcp: make sure treq->af_specific is initializedEric Dumazet
commit ba5a4fdd63ae0c575707030db0b634b160baddd7 upstream. syzbot complained about a recent change in TCP stack, hitting a NULL pointer [1] tcp request sockets have an af_specific pointer, which was used before the blamed change only for SYNACK generation in non SYNCOOKIE mode. tcp requests sockets momentarily created when third packet coming from client in SYNCOOKIE mode were not using treq->af_specific. Make sure this field is populated, in the same way normal TCP requests sockets do in tcp_conn_request(). [1] TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters. general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe4807f9 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534 Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48 RSP: 0018:ffffc90000de0588 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100 RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008 RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0 R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267 tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207 cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline] tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486 tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725 ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422 ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519 process_backlog+0x3a0/0x7c0 net/core/dev.c:5847 __napi_poll+0xb3/0x6e0 net/core/dev.c:6413 napi_poll net/core/dev.c:6480 [inline] net_rx_action+0x8ec/0xc60 net/core/dev.c:6567 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097 Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net> [fruggeri: Account for backport conflicts from 35b2c3211609 and 6fc8c827dd4f] Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12net: ipv6: ensure we call ipv6_mc_down() at most oncej.nixdorf@avm.de
commit 9995b408f17ff8c7f11bc725c8aa225ba3a63b1c upstream. There are two reasons for addrconf_notify() to be called with NETDEV_DOWN: either the network device is actually going down, or IPv6 was disabled on the interface. If either of them stays down while the other is toggled, we repeatedly call the code for NETDEV_DOWN, including ipv6_mc_down(), while never calling the corresponding ipv6_mc_up() in between. This will cause a new entry in idev->mc_tomb to be allocated for each multicast group the interface is subscribed to, which in turn leaks one struct ifmcaddr6 per nontrivial multicast group the interface is subscribed to. The following reproducer will leak at least $n objects: ip addr add ff2e::4242/32 dev eth0 autojoin sysctl -w net.ipv6.conf.eth0.disable_ipv6=1 for i in $(seq 1 $n); do ip link set up eth0; ip link set down eth0 done Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2) can also be used to create a nontrivial idev->mc_list, which will the leak objects with the right up-down-sequence. Based on both sources for NETDEV_DOWN events the interface IPv6 state should be considered: - not ready if the network interface is not ready OR IPv6 is disabled for it - ready if the network interface is ready AND IPv6 is enabled for it The functions ipv6_mc_up() and ipv6_down() should only be run when this state changes. Implement this by remembering when the IPv6 state is ready, and only run ipv6_mc_down() if it actually changed from ready to not ready. The other direction (not ready -> ready) already works correctly, as: - the interface notification triggered codepath for NETDEV_UP / NETDEV_CHANGE returns early if ipv6 is disabled, and - the disable_ipv6=0 triggered codepath skips fully initializing the interface as long as addrconf_link_ready(dev) returns false - calling ipv6_mc_up() repeatedly does not leak anything Fixes: 3ce62a84d53c ("ipv6: exit early in addrconf_notify() if IPv6 is disabled") Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net> [jnixdorf: context updated for bpo to v4.19/v5.4] Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()Eric Dumazet
commit dba5bdd57bea587ea4f0b79b03c71135f84a7e8b upstream. syzbot reported an UAF in ip_mc_sf_allow() [1] Whenever RCU protected list replaces an object, the pointer to the new object needs to be updated _before_ the call to kfree_rcu() or call_rcu() Because kfree_rcu(ptr, rcu) got support for NULL ptr only recently in commit 12edff045bc6 ("rcu: Make kfree_rcu() ignore NULL pointers"), I chose to use the conditional to make sure stable backports won't miss this detail. if (psl) kfree_rcu(psl, rcu); net/ipv6/mcast.c has similar issues, addressed in a separate patch. [1] BUG: KASAN: use-after-free in ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655 Read of size 4 at addr ffff88807d37b904 by task syz-executor.5/908 CPU: 0 PID: 908 Comm: syz-executor.5 Not tainted 5.18.0-rc4-syzkaller-00064-g8f4dd16603ce #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313 print_report mm/kasan/report.c:429 [inline] kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491 ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655 raw_v4_input net/ipv4/raw.c:190 [inline] raw_local_deliver+0x4d1/0xbe0 net/ipv4/raw.c:218 ip_protocol_deliver_rcu+0xcf/0xb30 net/ipv4/ip_input.c:193 ip_local_deliver_finish+0x2ee/0x4c0 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish+0x1cb/0x2f0 net/ipv4/ip_input.c:437 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_rcv+0xaa/0xd0 net/ipv4/ip_input.c:556 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519 netif_receive_skb_internal net/core/dev.c:5605 [inline] netif_receive_skb+0x13e/0x8e0 net/core/dev.c:5664 tun_rx_batched.isra.0+0x460/0x720 drivers/net/tun.c:1534 tun_get_user+0x28b7/0x3e30 drivers/net/tun.c:1985 tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2015 call_write_iter include/linux/fs.h:2050 [inline] new_sync_write+0x38a/0x560 fs/read_write.c:504 vfs_write+0x7c0/0xac0 fs/read_write.c:591 ksys_write+0x127/0x250 fs/read_write.c:644 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f3f12c3bbff Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48 RSP: 002b:00007f3f13ea9130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f3f12d9bf60 RCX: 00007f3f12c3bbff RDX: 0000000000000036 RSI: 0000000020002ac0 RDI: 00000000000000c8 RBP: 00007f3f12ce308d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000036 R11: 0000000000000293 R12: 0000000000000000 R13: 00007fffb68dd79f R14: 00007f3f13ea9300 R15: 0000000000022000 </TASK> Allocated by task 908: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:436 [inline] ____kasan_kmalloc mm/kasan/common.c:515 [inline] ____kasan_kmalloc mm/kasan/common.c:474 [inline] __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524 kasan_kmalloc include/linux/kasan.h:234 [inline] __do_kmalloc mm/slab.c:3710 [inline] __kmalloc+0x209/0x4d0 mm/slab.c:3719 kmalloc include/linux/slab.h:586 [inline] sock_kmalloc net/core/sock.c:2501 [inline] sock_kmalloc+0xb5/0x100 net/core/sock.c:2492 ip_mc_source+0xba2/0x1100 net/ipv4/igmp.c:2392 do_ip_setsockopt net/ipv4/ip_sockglue.c:1296 [inline] ip_setsockopt+0x2312/0x3ab0 net/ipv4/ip_sockglue.c:1432 raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861 __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 753: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free+0x13d/0x180 mm/kasan/common.c:328 kasan_slab_free include/linux/kasan.h:200 [inline] __cache_free mm/slab.c:3439 [inline] kmem_cache_free_bulk+0x69/0x460 mm/slab.c:3774 kfree_bulk include/linux/slab.h:437 [inline] kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3318 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 Last potentially related work creation: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348 kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3595 ip_mc_msfilter+0x712/0xb60 net/ipv4/igmp.c:2510 do_ip_setsockopt net/ipv4/ip_sockglue.c:1257 [inline] ip_setsockopt+0x32e1/0x3ab0 net/ipv4/ip_sockglue.c:1432 raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861 __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348 call_rcu+0x99/0x790 kernel/rcu/tree.c:3074 mpls_dev_notify+0x552/0x8a0 net/mpls/af_mpls.c:1656 notifier_call_chain+0xb5/0x200 kernel/notifier.c:84 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1938 call_netdevice_notifiers_extack net/core/dev.c:1976 [inline] call_netdevice_notifiers net/core/dev.c:1990 [inline] unregister_netdevice_many+0x92e/0x1890 net/core/dev.c:10751 default_device_exit_batch+0x449/0x590 net/core/dev.c:11245 ops_exit_list+0x125/0x170 net/core/net_namespace.c:167 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 The buggy address belongs to the object at ffff88807d37b900 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 4 bytes inside of 64-byte region [ffff88807d37b900, ffff88807d37b940) The buggy address belongs to the physical page: page:ffffea0001f4dec0 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88807d37b180 pfn:0x7d37b flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffff888010c41340 ffffea0001c795c8 ffff888010c40200 raw: ffff88807d37b180 ffff88807d37b000 000000010000001f 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x342040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_HARDWALL|__GFP_THISNODE), pid 2963, tgid 2963 (udevd), ts 139732238007, free_ts 139730893262 prep_new_page mm/page_alloc.c:2441 [inline] get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408 __alloc_pages_node include/linux/gfp.h:587 [inline] kmem_getpages mm/slab.c:1378 [inline] cache_grow_begin+0x75/0x350 mm/slab.c:2584 cache_alloc_refill+0x27f/0x380 mm/slab.c:2957 ____cache_alloc mm/slab.c:3040 [inline] ____cache_alloc mm/slab.c:3023 [inline] __do_cache_alloc mm/slab.c:3267 [inline] slab_alloc mm/slab.c:3309 [inline] __do_kmalloc mm/slab.c:3708 [inline] __kmalloc+0x3b3/0x4d0 mm/slab.c:3719 kmalloc include/linux/slab.h:586 [inline] kzalloc include/linux/slab.h:714 [inline] tomoyo_encode2.part.0+0xe9/0x3a0 security/tomoyo/realpath.c:45 tomoyo_encode2 security/tomoyo/realpath.c:31 [inline] tomoyo_encode+0x28/0x50 security/tomoyo/realpath.c:80 tomoyo_realpath_from_path+0x186/0x620 security/tomoyo/realpath.c:288 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x21b/0x400 security/tomoyo/file.c:822 security_inode_getattr+0xcf/0x140 security/security.c:1350 vfs_getattr fs/stat.c:157 [inline] vfs_statx+0x16a/0x390 fs/stat.c:232 vfs_fstatat+0x8c/0xb0 fs/stat.c:255 __do_sys_newfstatat+0x91/0x110 fs/stat.c:425 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1356 [inline] free_pcp_prepare+0x549/0xd20 mm/page_alloc.c:1406 free_unref_page_prepare mm/page_alloc.c:3328 [inline] free_unref_page+0x19/0x6a0 mm/page_alloc.c:3423 __vunmap+0x85d/0xd30 mm/vmalloc.c:2667 __vfree+0x3c/0xd0 mm/vmalloc.c:2715 vfree+0x5a/0x90 mm/vmalloc.c:2746 __do_replace+0x16b/0x890 net/ipv6/netfilter/ip6_tables.c:1117 do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline] do_ip6t_set_ctl+0x90d/0xb90 net/ipv6/netfilter/ip6_tables.c:1639 nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101 ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1026 tcp_setsockopt+0x136/0x2520 net/ipv4/tcp.c:3696 __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Memory state around the buggy address: ffff88807d37b800: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc ffff88807d37b880: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc >ffff88807d37b900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff88807d37b980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff88807d37ba00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc Fixes: c85bb41e9318 ("igmp: fix ip_mc_sf_allow race [v5]") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Flavio Leitner <fbl@sysclose.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12NFC: netlink: fix sleep in atomic bug when firmware download timeoutDuoming Zhou
commit 4071bf121d59944d5cd2238de0642f3d7995a997 upstream. There are sleep in atomic bug that could cause kernel panic during firmware download process. The root cause is that nlmsg_new with GFP_KERNEL parameter is called in fw_dnld_timeout which is a timer handler. The call trace is shown below: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265 Call Trace: kmem_cache_alloc_node __alloc_skb nfc_genl_fw_download_done call_timer_fn __run_timers.part.0 run_timer_softirq __do_softirq ... The nlmsg_new with GFP_KERNEL parameter may sleep during memory allocation process, and the timer handler is run as the result of a "software interrupt" that should not call any other function that could sleep. This patch changes allocation mode of netlink message from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic bug. The GFP_ATOMIC flag makes memory allocation operation could be used in atomic context. Fixes: 9674da8759df ("NFC: Add firmware upload netlink command") Fixes: 9ea7187c53f6 ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220504055847.38026-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12nfc: replace improper check device_is_registered() in netlink related functionsDuoming Zhou
commit da5c0f119203ad9728920456a0f52a6d850c01cd upstream. The device_is_registered() in nfc core is used to check whether nfc device is registered in netlink related functions such as nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered() is protected by device_lock, there is still a race condition between device_del() and device_is_registered(). The root cause is that kobject_del() in device_del() is not protected by device_lock. (cleanup task) | (netlink task) | nfc_unregister_device | nfc_fw_download device_del | device_lock ... | if (!device_is_registered)//(1) kobject_del//(2) | ... ... | device_unlock The device_is_registered() returns the value of state_in_sysfs and the state_in_sysfs is set to zero in kobject_del(). If we pass check in position (1), then set zero in position (2). As a result, the check in position (1) is useless. This patch uses bool variable instead of device_is_registered() to judge whether the nfc device is registered, which is well synchronized. Fixes: 3e256b8f8dfa ("NFC: add nfc subsystem core") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-12Revert "SUNRPC: attempt AF_LOCAL connect on setup"Trond Myklebust
commit a3d0562d4dc039bca39445e1cddde7951662e17d upstream. This reverts commit 7073ea8799a8cf73db60270986f14e4aae20fa80. We must not try to connect the socket while the transport is under construction, because the mechanisms to safely tear it down are not in place. As the code stands, we end up leaking the sockets on a connection error. Reported-by: wanghai (M) <wanghai38@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09Merge tag 'v5.4.192' into v5.4/standard/baseBruce Ashfield
This is the 5.4.192 stable release # gpg: Signature made Mon 09 May 2022 03:03:50 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-09Merge tag 'v5.4.191' into v5.4/standard/baseBruce Ashfield
This is the 5.4.191 stable release # gpg: Signature made Wed 27 Apr 2022 07:50:55 AM EDT # gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E # gpg: Can't check signature: No public key
2022-05-09netfilter: nft_socket: only do sk lookups when indev is availableFlorian Westphal
commit 743b83f15d4069ea57c3e40996bf4a1077e0cdc1 upstream. Check if the incoming interface is available and NFT_BREAK in case neither skb->sk nor input device are set. Because nf_sk_lookup_slow*() assume packet headers are in the 'in' direction, use in postrouting is not going to yield a meaningful result. Same is true for the forward chain, so restrict the use to prerouting, input and output. Use in output work if a socket is already attached to the skb. Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching") Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit()Peilin Ye
[ Upstream commit f40c064e933d7787ca7411b699504d7a2664c1f5 ] Do not update tunnel->tun_hlen in data plane code. Use a local variable instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit(). Co-developed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tcp: fix F-RTO may not work correctly when receiving DSACKPengcheng Yang
[ Upstream commit d9157f6806d1499e173770df1f1b234763de5c79 ] Currently DSACK is regarded as a dupack, which may cause F-RTO to incorrectly enter "loss was real" when receiving DSACK. Packetdrill to demonstrate: // Enable F-RTO and TLP 0 `sysctl -q net.ipv4.tcp_frto=2` 0 `sysctl -q net.ipv4.tcp_early_retrans=3` 0 `sysctl -q net.ipv4.tcp_congestion_control=cubic` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // RTT 10ms, RTO 210ms +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 2 data segments +0 write(4, ..., 2000) = 2000 +0 > P. 1:2001(2000) ack 1 // TLP +.022 > P. 1001:2001(1000) ack 1 // Continue to send 8 data segments +0 write(4, ..., 10000) = 10000 +0 > P. 2001:10001(8000) ack 1 // RTO +.188 > . 1:1001(1000) ack 1 // The original data is acked and new data is sent(F-RTO step 2.b) +0 < . 1:1(0) ack 2001 win 257 +0 > P. 10001:12001(2000) ack 1 // D-SACK caused by TLP is regarded as a dupack, this results in // the incorrect judgment of "loss was real"(F-RTO step 3.a) +.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop> // Never-retransmitted data(3001:4001) are acked and // expect to switch to open state(F-RTO step 3.b) +0 < . 1:1(0) ack 4001 win 257 +0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }% Fixes: e33099f96d99 ("tcp: implement RFC5682 F-RTO") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tls: Skip tls_append_frag on zero copy sizeMaxim Mikityanskiy
[ Upstream commit a0df71948e9548de819a6f1da68f5f1742258a52 ] Calling tls_append_frag when max_open_record_len == record->len might add an empty fragment to the TLS record if the call happens to be on the page boundary. Normally tls_append_frag coalesces the zero-sized fragment to the previous one, but not if it's on page boundary. If a resync happens then, the mlx5 driver posts dump WQEs in tx_post_resync_dump, and the empty fragment may become a data segment with byte_count == 0, which will confuse the NIC and lead to a CQE error. This commit fixes the described issue by skipping tls_append_frag on zero size to avoid adding empty fragments. The fix is not in the driver, because an empty fragment is hardly the desired behavior. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWATEric Dumazet
[ Upstream commit 4bfe744ff1644fbc0a991a2677dc874475dd6776 ] I had this bug sitting for too long in my pile, it is time to fix it. Thanks to Doug Porter for reminding me of it! We had various attempts in the past, including commit 0cbe6a8f089e ("tcp: remove SOCK_QUEUE_SHRUNK"), but the issue is that TCP stack currently only generates EPOLLOUT from input path, when tp->snd_una has advanced and skb(s) cleaned from rtx queue. If a flow has a big RTT, and/or receives SACKs, it is possible that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0 and no more data can be sent until tp->snd_una finally advances. What is needed is to also check if POLLOUT needs to be generated whenever tp->snd_nxt is advanced, from output path. This bug triggers more often after an idle period, as we do not receive ACK for at least one RTT. tcp_notsent_lowat could be a fraction of what CWND and pacing rate would allow to send during this RTT. In a followup patch, I will remove the bogus call to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED) from tcp_check_space(). Fact that we have decided to generate an EPOLLOUT does not mean the application has immediately refilled the transmit queue. This optimistic call might have been the reason the bug seemed not too serious. Tested: 200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2] $ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat $ cat bench_rr.sh SUM=0 for i in {1..10} do V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"` echo $V SUM=$(($SUM + $V)) done echo SUM=$SUM Before patch: $ bench_rr.sh 130000000 80000000 140000000 140000000 140000000 140000000 130000000 40000000 90000000 110000000 SUM=1140000000 After patch: $ bench_rr.sh 430000000 590000000 530000000 450000000 450000000 350000000 450000000 490000000 480000000 460000000 SUM=4680000000 # This is 410 % of the value before patch. Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Doug Porter <dsp@fb.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09ip_gre: Make o_seqno start from 0 in native modePeilin Ye
[ Upstream commit ff827beb706ed719c766acf36449801ded0c17fc ] For GRE and GRETAP devices, currently o_seqno starts from 1 in native mode. According to RFC 2890 2.2., "The first datagram is sent with a sequence number of 0." Fix it. It is worth mentioning that o_seqno already starts from 0 in collect_md mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to gre_build_header() before getting incremented. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09net/smc: sync err code when tcp connection was refusedliuyacan
[ Upstream commit 4e2e65e2e56c6ceb4ea1719360080c0af083229e ] In the current implementation, when TCP initiates a connection to an unavailable [ip,port], ECONNREFUSED will be stored in the TCP socket, but SMC will not. However, some apps (like curl) use getsockopt(,,SO_ERROR,,) to get the error information, which makes them miss the error message and behave strangely. Fixes: 50717a37db03 ("net/smc: nonblocking connect rework") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09sctp: check asoc strreset_chunk in sctp_generate_reconf_eventXin Long
[ Upstream commit 165e3e17fe8fe6a8aab319bc6e631a2e23b9a857 ] A null pointer reference issue can be triggered when the response of a stream reconf request arrives after the timer is triggered, such as: send Incoming SSN Reset Request ---> CPU0: reconf timer is triggered, go to the handler code before hold sk lock <--- reply with Outgoing SSN Reset Request CPU1: process Outgoing SSN Reset Request, and set asoc->strreset_chunk to NULL CPU0: continue the handler code, hold sk lock, and try to hold asoc->strreset_chunk, crash! In Ying Xu's testing, the call trace is: [ ] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ ] RIP: 0010:sctp_chunk_hold+0xe/0x40 [sctp] [ ] Call Trace: [ ] <IRQ> [ ] sctp_sf_send_reconf+0x2c/0x100 [sctp] [ ] sctp_do_sm+0xa4/0x220 [sctp] [ ] sctp_generate_reconf_event+0xbd/0xe0 [sctp] [ ] call_timer_fn+0x26/0x130 This patch is to fix it by returning from the timer handler if asoc strreset_chunk is already set to NULL. Fixes: 7b9438de0cd4 ("sctp: add stream reconf timer") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tcp: ensure to use the most recently sent skb when filling the rate samplePengcheng Yang
[ Upstream commit b253a0680ceadc5d7b4acca7aa2d870326cad8ad ] If an ACK (s)acks multiple skbs, we favor the information from the most recently sent skb by choosing the skb with the highest prior_delivered count. But in the interval between receiving ACKs, we send multiple skbs with the same prior_delivered, because the tp->delivered only changes when we receive an ACK. We used RACK's solution, copying tcp_rack_sent_after() as tcp_skb_sent_after() helper to determine "which packet was sent last?". Later, we will use tcp_skb_sent_after() instead in RACK. Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Cc: Paolo Abeni <pabeni@redhat.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1650422081-22153-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09tcp: md5: incorrect tcp_header_len for incoming connectionsFrancesco Ruggeri
[ Upstream commit 5b0b9e4c2c895227c8852488b3f09839233bba54 ] In tcp_create_openreq_child we adjust tcp_header_len for md5 using the remote address in newsk. But that address is still 0 in newsk at this point, and it is only set later by the callers (tcp_v[46]_syn_recv_sock). Use the address from the request socket instead. Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220421005026.686A45EC01F2@us226.sjc.aristanetworks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hookEyal Birger
[ Upstream commit b02d196c44ead1a5949729be9ff08fe781c3e48a ] xmit_check_hhlen() observes the dst for getting the device hard header length to make sure a modified packet can fit. When a helper which changes the dst - such as bpf_skb_set_tunnel_key() - is called as part of the xmit program the accessed dst is no longer valid. This leads to the following splat: BUG: kernel NULL pointer dereference, address: 00000000000000de #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 798 Comm: ping Not tainted 5.18.0-rc2+ #103 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 RIP: 0010:bpf_xmit+0xfb/0x17f Code: c6 c0 4d cd 8e 48 c7 c7 7d 33 f0 8e e8 42 09 fb ff 48 8b 45 58 48 8b 95 c8 00 00 00 48 2b 95 c0 00 00 00 48 83 e0 fe 48 8b 00 <0f> b7 80 de 00 00 00 39 c2 73 22 29 d0 b9 20 0a 00 00 31 d2 48 89 RSP: 0018:ffffb148c0bc7b98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000240008 RCX: 0000000000000000 RDX: 0000000000000010 RSI: 00000000ffffffea RDI: 00000000ffffffff RBP: ffff922a828a4e00 R08: ffffffff8f1350e8 R09: 00000000ffffdfff R10: ffffffff8f055100 R11: ffffffff8f105100 R12: 0000000000000000 R13: ffff922a828a4e00 R14: 0000000000000040 R15: 0000000000000000 FS: 00007f414e8f0080(0000) GS:ffff922afdc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000de CR3: 0000000002d80006 CR4: 0000000000370ef0 Call Trace: <TASK> lwtunnel_xmit.cold+0x71/0xc8 ip_finish_output2+0x279/0x520 ? __ip_finish_output.part.0+0x21/0x130 Fix by fetching the device hard header length before running the BPF code. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220420165219.1755407-1-eyal.birger@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09ipvs: correctly print the memory size of ip_vs_conn_tabPengcheng Yang
[ Upstream commit eba1a872cb73314280d5448d934935b23e30b7ca ] The memory size of ip_vs_conn_tab changed after we use hlist instead of list. Fixes: 731109e78415 ("ipvs: use hlist instead of list") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27openvswitch: fix OOB access in reserve_sfa_size()Paolo Valerio
commit cefa91b2332d7009bc0be5d951d6cbbf349f90f8 upstream. Given a sufficiently large number of actions, while copying and reserving memory for a new action of a new flow, if next_offset is greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE bytes increasing actions_len by req_size. This can then lead to an OOB write access, especially when further actions need to be copied. Fix it by rearranging the flow action size check. KASAN splat below: ================================================================== BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch] Write of size 65360 at addr ffff888147e4001c by task handler15/836 CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27 ... Call Trace: <TASK> dump_stack_lvl+0x45/0x5a print_report.cold+0x5e/0x5db ? __lock_text_start+0x8/0x8 ? reserve_sfa_size+0x1ba/0x380 [openvswitch] kasan_report+0xb5/0x130 ? reserve_sfa_size+0x1ba/0x380 [openvswitch] kasan_check_range+0xf5/0x1d0 memcpy+0x39/0x60 reserve_sfa_size+0x1ba/0x380 [openvswitch] __add_action+0x24/0x120 [openvswitch] ovs_nla_add_action+0xe/0x20 [openvswitch] ovs_ct_copy_action+0x29d/0x1130 [openvswitch] ? __kernel_text_address+0xe/0x30 ? unwind_get_return_address+0x56/0xa0 ? create_prof_cpu_mask+0x20/0x20 ? ovs_ct_verify+0xf0/0xf0 [openvswitch] ? prep_compound_page+0x198/0x2a0 ? __kasan_check_byte+0x10/0x40 ? kasan_unpoison+0x40/0x70 ? ksize+0x44/0x60 ? reserve_sfa_size+0x75/0x380 [openvswitch] __ovs_nla_copy_actions+0xc26/0x2070 [openvswitch] ? __zone_watermark_ok+0x420/0x420 ? validate_set.constprop.0+0xc90/0xc90 [openvswitch] ? __alloc_pages+0x1a9/0x3e0 ? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0 ? unwind_next_frame+0x991/0x1e40 ? __mod_node_page_state+0x99/0x120 ? __mod_lruvec_page_state+0x2e3/0x470 ? __kasan_kmalloc_large+0x90/0xe0 ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch] ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch] ... Cc: stable@vger.kernel.org Fixes: f28cd2af22a0 ("openvswitch: fix flow actions reallocation") Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27netlink: reset network and mac headers in netlink_dump()Eric Dumazet
[ Upstream commit 99c07327ae11e24886d552dddbe4537bfca2765d ] netlink_dump() is allocating an skb, reserves space in it but forgets to reset network header. This allows a BPF program, invoked later from sk_filter() to access uninitialized kernel memory from the reserved space. Theorically mac header reset could be omitted, because it is set to a special initial value. bpf_internal_load_pointer_neg_helper calls skb_mac_header() without checking skb_mac_header_was_set(). Relying on skb->len not being too big seems fragile. We also could add a sanity check in bpf_internal_load_pointer_neg_helper() to avoid surprises in the future. syzbot report was: BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637 ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637 __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796 bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline] __bpf_prog_run include/linux/filter.h:626 [inline] bpf_prog_run include/linux/filter.h:633 [inline] __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756 bpf_prog_run_save_cb include/linux/filter.h:770 [inline] sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150 sk_filter include/linux/filter.h:905 [inline] netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_read_iter+0x5a9/0x630 net/socket.c:1039 do_iter_readv_writev+0xa7f/0xc70 do_iter_read+0x52c/0x14c0 fs/read_write.c:786 vfs_readv fs/read_write.c:906 [inline] do_readv+0x432/0x800 fs/read_write.c:943 __do_sys_readv fs/read_write.c:1034 [inline] __se_sys_readv fs/read_write.c:1031 [inline] __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was stored to memory at: ___bpf_prog_run+0x96c/0xb420 kernel/bpf/core.c:1558 __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796 bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline] __bpf_prog_run include/linux/filter.h:626 [inline] bpf_prog_run include/linux/filter.h:633 [inline] __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756 bpf_prog_run_save_cb include/linux/filter.h:770 [inline] sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150 sk_filter include/linux/filter.h:905 [inline] netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_read_iter+0x5a9/0x630 net/socket.c:1039 do_iter_readv_writev+0xa7f/0xc70 do_iter_read+0x52c/0x14c0 fs/read_write.c:786 vfs_readv fs/read_write.c:906 [inline] do_readv+0x432/0x800 fs/read_write.c:943 __do_sys_readv fs/read_write.c:1034 [inline] __se_sys_readv fs/read_write.c:1031 [inline] __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3244 [inline] __kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972 kmalloc_reserve net/core/skbuff.c:354 [inline] __alloc_skb+0x545/0xf90 net/core/skbuff.c:426 alloc_skb include/linux/skbuff.h:1158 [inline] netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_read_iter+0x5a9/0x630 net/socket.c:1039 do_iter_readv_writev+0xa7f/0xc70 do_iter_read+0x52c/0x14c0 fs/read_write.c:786 vfs_readv fs/read_write.c:906 [inline] do_readv+0x432/0x800 fs/read_write.c:943 __do_sys_readv fs/read_write.c:1034 [inline] __se_sys_readv fs/read_write.c:1031 [inline] __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x44/0xae CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: db65a3aaf29e ("netlink: Trim skb to alloc size to avoid MSG_TRUNC") Fixes: 9063e21fb026 ("netlink: autosize skb lengthes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using ↵David Ahern
netdev_master_upper_dev_get_rcu [ Upstream commit 83daab06252ee5d0e1f4373ff28b79304945fc19 ] Next patch uses l3mdev_master_upper_ifindex_by_index_rcu which throws a splat with debug kernels: [13783.087570] ------------[ cut here ]------------ [13783.093974] RTNL: assertion failed at net/core/dev.c (6702) [13783.100761] WARNING: CPU: 3 PID: 51132 at net/core/dev.c:6702 netdev_master_upper_dev_get+0x16a/0x1a0 [13783.184226] CPU: 3 PID: 51132 Comm: kworker/3:3 Not tainted 5.17.0-custom-100090-g6f963aafb1cc #682 [13783.194788] Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017 [13783.204755] Workqueue: mld mld_ifc_work [ipv6] [13783.210338] RIP: 0010:netdev_master_upper_dev_get+0x16a/0x1a0 [13783.217209] Code: 0f 85 e3 fe ff ff e8 65 ac ec fe ba 2e 1a 00 00 48 c7 c6 60 6f 38 83 48 c7 c7 c0 70 38 83 c6 05 5e b5 d7 01 01 e8 c6 29 52 00 <0f> 0b e9 b8 fe ff ff e8 5a 6c 35 ff e9 1c ff ff ff 48 89 ef e8 7d [13783.238659] RSP: 0018:ffffc9000b37f5a8 EFLAGS: 00010286 [13783.244995] RAX: 0000000000000000 RBX: ffff88812ee5c000 RCX: 0000000000000000 [13783.253379] RDX: ffff88811ce09d40 RSI: ffffffff812d0fcd RDI: fffff5200166fea7 [13783.261769] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8882375f4287 [13783.270138] R10: ffffed1046ebe850 R11: 0000000000000001 R12: dffffc0000000000 [13783.278510] R13: 0000000000000275 R14: ffffc9000b37f688 R15: ffff8881273b4af8 [13783.286870] FS: 0000000000000000(0000) GS:ffff888237400000(0000) knlGS:0000000000000000 [13783.296352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [13783.303177] CR2: 00007ff25fc9b2e8 CR3: 0000000174d23000 CR4: 00000000001006e0 [13783.311546] Call Trace: [13783.314660] <TASK> [13783.317553] l3mdev_master_upper_ifindex_by_index_rcu+0x43/0xe0 ... Change l3mdev_master_upper_ifindex_by_index_rcu to use netdev_master_upper_dev_get_rcu. Fixes: 6a6d6681ac1a ("l3mdev: add function to retreive upper master") Signed-off-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@kernel.org> Cc: Alexis Bauvin <abauvin@scaleway.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27net/sched: cls_u32: fix possible leak in u32_init_knode()Eric Dumazet
[ Upstream commit ec5b0f605b105457f257f2870acad4a5d463984b ] While investigating a related syzbot report, I found that whenever call to tcf_exts_init() from u32_init_knode() is failing, we end up with an elevated refcount on ht->refcnt To avoid that, only increase the refcount after all possible errors have been evaluated. Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27net/packet: fix packet_sock xmit return value checkingHangbin Liu
[ Upstream commit 29e8e659f984be00d75ec5fef4e37c88def72712 ] packet_sock xmit could be dev_queue_xmit, which also returns negative errors. So only checking positive errors is not enough, or userspace sendmsg may return success while packet is not send out. Move the net_xmit_errno() assignment in the braces as checkpatch.pl said do not use assignment in if condition. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27net/smc: Fix sock leak when release after smc_shutdown()Tony Lu
[ Upstream commit 1a74e99323746353bba11562a2f2d0aa8102f402 ] Since commit e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback"), for a fallback connection, __smc_release() does not call sock_put() if its state is already SMC_CLOSED. When calling smc_shutdown() after falling back, its state is set to SMC_CLOSED but does not call sock_put(), so this patch calls it. Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com Fixes: e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27rxrpc: Restore removed timer deletionDavid Howells
[ Upstream commit ee3b0826b4764f6c13ad6db67495c5a1c38e9025 ] A recent patch[1] from Eric Dumazet flipped the order in which the keepalive timer and the keepalive worker were cancelled in order to fix a syzbot reported issue[2]. Unfortunately, this enables the mirror image bug whereby the timer races with rxrpc_exit_net(), restarting the worker after it has been cancelled: CPU 1 CPU 2 =============== ===================== if (rxnet->live) <INTERRUPT> rxnet->live = false; cancel_work_sync(&rxnet->peer_keepalive_work); rxrpc_queue_work(&rxnet->peer_keepalive_work); del_timer_sync(&rxnet->peer_keepalive_timer); Fix this by restoring the removed del_timer_sync() so that we try to remove the timer twice. If the timer runs again, it should see ->live == false and not restart the worker. Fixes: 1946014ca3b1 ("rxrpc: fix a race in rxrpc_exit_net()") Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Dumazet <edumazet@google.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1] Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2] Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27tcp: Fix potential use-after-free due to double kfree()Kuniyuki Iwashima
commit c89dffc70b340780e5b933832d8c3e045ef3791e upstream. Receiving ACK with a valid SYN cookie, cookie_v4_check() allocates struct request_sock and then can allocate inet_rsk(req)->ireq_opt. After that, tcp_v4_syn_recv_sock() allocates struct sock and copies ireq_opt to inet_sk(sk)->inet_opt. Normally, tcp_v4_syn_recv_sock() inserts the full socket into ehash and sets NULL to ireq_opt. Otherwise, tcp_v4_syn_recv_sock() has to reset inet_opt by NULL and free the full socket. The commit 01770a1661657 ("tcp: fix race condition when creating child sockets from syncookies") added a new path, in which more than one cores create full sockets for the same SYN cookie. Currently, the core which loses the race frees the full socket without resetting inet_opt, resulting in that both sock_put() and reqsk_put() call kfree() for the same memory: sock_put sk_free __sk_free sk_destruct __sk_destruct sk->sk_destruct/inet_sock_destruct kfree(rcu_dereference_protected(inet->inet_opt, 1)); reqsk_put reqsk_free __reqsk_free req->rsk_ops->destructor/tcp_v4_reqsk_destructor kfree(rcu_dereference_protected(inet_rsk(req)->ireq_opt, 1)); Calling kmalloc() between the double kfree() can lead to use-after-free, so this patch fixes it by setting NULL to inet_opt before sock_put(). As a side note, this kind of issue does not happen for IPv6. This is because tcp_v6_syn_recv_sock() clones both ipv6_opt and pktopts which correspond to ireq_opt in IPv4. Fixes: 01770a166165 ("tcp: fix race condition when creating child sockets from syncookies") CC: Ricardo Dias <rdias@singlestore.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210118055920.82516-1-kuniyu@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>