summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2021-10-07netfilter: nf_tables: Fix oversized kvmalloc() callsPablo Neira Ayuso
commit 45928afe94a094bcda9af858b96673d59bc4a0e9 upstream. The commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") limits the max allocatable memory via kvmalloc() to MAX_INT. Reported-by: syzbot+cd43695a64bcd21b8596@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07netfilter: conntrack: serialize hash resizes and cleanupsEric Dumazet
commit e9edc188fc76499b0b9bd60364084037f6d03773 upstream. Syzbot was able to trigger the following warning [1] No repro found by syzbot yet but I was able to trigger similar issue by having 2 scripts running in parallel, changing conntrack hash sizes, and: for j in `seq 1 1000` ; do unshare -n /bin/true >/dev/null ; done It would take more than 5 minutes for net_namespace structures to be cleaned up. This is because nf_ct_iterate_cleanup() has to restart everytime a resize happened. By adding a mutex, we can serialize hash resizes and cleanups and also make get_next_corpse() faster by skipping over empty buckets. Even without resizes in the picture, this patch considerably speeds up network namespace dismantles. [1] INFO: task syz-executor.0:8312 can't die for more than 144 seconds. task:syz-executor.0 state:R running task stack:25672 pid: 8312 ppid: 6573 flags:0x00004006 Call Trace: context_switch kernel/sched/core.c:4955 [inline] __schedule+0x940/0x26f0 kernel/sched/core.c:6236 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6408 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35 __local_bh_enable_ip+0x109/0x120 kernel/softirq.c:390 local_bh_enable include/linux/bottom_half.h:32 [inline] get_next_corpse net/netfilter/nf_conntrack_core.c:2252 [inline] nf_ct_iterate_cleanup+0x15a/0x450 net/netfilter/nf_conntrack_core.c:2275 nf_conntrack_cleanup_net_list+0x14c/0x4f0 net/netfilter/nf_conntrack_core.c:2469 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:171 setup_net+0x639/0xa30 net/core/net_namespace.c:349 copy_net_ns+0x319/0x760 net/core/net_namespace.c:470 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:226 ksys_unshare+0x445/0x920 kernel/fork.c:3128 __do_sys_unshare kernel/fork.c:3202 [inline] __se_sys_unshare kernel/fork.c:3200 [inline] __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3200 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f63da68e739 RSP: 002b:00007f63d7c05188 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f63da792f80 RCX: 00007f63da68e739 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000 RBP: 00007f63da6e8cc4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f63da792f80 R13: 00007fff50b75d3f R14: 00007f63d7c05300 R15: 0000000000022000 Showing all locks held in the system: 1 lock held by khungtaskd/27: #0: ffffffff8b980020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446 2 locks held by kworker/u4:2/153: #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2268 #1: ffffc9000140fdb0 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2272 1 lock held by systemd-udevd/2970: 1 lock held by in:imklog/6258: #0: ffff88807f970ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990 3 locks held by kworker/1:6/8158: 1 lock held by syz-executor.0/8312: 2 locks held by kworker/u4:13/9320: 1 lock held by syz-executor.5/10178: 1 lock held by syz-executor.4/10217: Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07netfilter: ipset: Fix oversized kvmalloc() callsJozsef Kadlecsik
commit 7bbc3d385bd813077acaf0e6fdb2a86a901f5382 upstream. The commit commit 7661809d493b426e979f39ab512e3adf41fbcc69 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 14 09:45:49 2021 -0700 mm: don't allow oversized kvmalloc() calls limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the same limit in ipset. Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07net: udp: annotate data race around udp_sk(sk)->corkflagEric Dumazet
commit a9f5970767d11eadc805d5283f202612c7ba1f59 upstream. up->corkflag field can be read or written without any lock. Annotate accesses to avoid possible syzbot/KCSAN reports. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07af_unix: fix races in sk_peer_pid and sk_peer_cred accessesEric Dumazet
[ Upstream commit 35306eb23814444bd4021f8a1c3047d3cb0c8b2b ] Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred. In order to fix this issue, this patch adds a new spinlock that needs to be used whenever these fields are read or written. Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently reading sk->sk_peer_pid which makes no sense, as this field is only possibly set by AF_UNIX sockets. We will have to clean this in a separate patch. This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback" or implementing what was truly expected. Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07net: sched: flower: protect fl_walk() with rcuVlad Buslov
[ Upstream commit d5ef190693a7d76c5c192d108e8dec48307b46ee ] Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul() also removed rcu protection of individual filters which causes following use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain rcu read lock while iterating and taking the filter reference and temporary release the lock while calling arg->fn() callback that can sleep. KASAN trace: [ 352.773640] ================================================================== [ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower] [ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987 [ 352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2 [ 352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 352.781022] Call Trace: [ 352.781573] dump_stack_lvl+0x46/0x5a [ 352.782332] print_address_description.constprop.0+0x1f/0x140 [ 352.783400] ? fl_walk+0x159/0x240 [cls_flower] [ 352.784292] ? fl_walk+0x159/0x240 [cls_flower] [ 352.785138] kasan_report.cold+0x83/0xdf [ 352.785851] ? fl_walk+0x159/0x240 [cls_flower] [ 352.786587] kasan_check_range+0x145/0x1a0 [ 352.787337] fl_walk+0x159/0x240 [cls_flower] [ 352.788163] ? fl_put+0x10/0x10 [cls_flower] [ 352.789007] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.790102] tcf_chain_dump+0x231/0x450 [ 352.790878] ? tcf_chain_tp_delete_empty+0x170/0x170 [ 352.791833] ? __might_sleep+0x2e/0xc0 [ 352.792594] ? tfilter_notify+0x170/0x170 [ 352.793400] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.794477] tc_dump_tfilter+0x385/0x4b0 [ 352.795262] ? tc_new_tfilter+0x1180/0x1180 [ 352.796103] ? __mod_node_page_state+0x1f/0xc0 [ 352.796974] ? __build_skb_around+0x10e/0x130 [ 352.797826] netlink_dump+0x2c0/0x560 [ 352.798563] ? netlink_getsockopt+0x430/0x430 [ 352.799433] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.800542] __netlink_dump_start+0x356/0x440 [ 352.801397] rtnetlink_rcv_msg+0x3ff/0x550 [ 352.802190] ? tc_new_tfilter+0x1180/0x1180 [ 352.802872] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.803668] ? tc_new_tfilter+0x1180/0x1180 [ 352.804344] ? _copy_from_iter_nocache+0x800/0x800 [ 352.805202] ? kasan_set_track+0x1c/0x30 [ 352.805900] netlink_rcv_skb+0xc6/0x1f0 [ 352.806587] ? rht_deferred_worker+0x6b0/0x6b0 [ 352.807455] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.808324] ? netlink_ack+0x4d0/0x4d0 [ 352.809086] ? netlink_deliver_tap+0x62/0x3d0 [ 352.809951] netlink_unicast+0x353/0x480 [ 352.810744] ? netlink_attachskb+0x430/0x430 [ 352.811586] ? __alloc_skb+0xd7/0x200 [ 352.812349] netlink_sendmsg+0x396/0x680 [ 352.813132] ? netlink_unicast+0x480/0x480 [ 352.813952] ? __import_iovec+0x192/0x210 [ 352.814759] ? netlink_unicast+0x480/0x480 [ 352.815580] sock_sendmsg+0x6c/0x80 [ 352.816299] ____sys_sendmsg+0x3a5/0x3c0 [ 352.817096] ? kernel_sendmsg+0x30/0x30 [ 352.817873] ? __ia32_sys_recvmmsg+0x150/0x150 [ 352.818753] ___sys_sendmsg+0xd8/0x140 [ 352.819518] ? sendmsg_copy_msghdr+0x110/0x110 [ 352.820402] ? ___sys_recvmsg+0xf4/0x1a0 [ 352.821110] ? __copy_msghdr_from_user+0x260/0x260 [ 352.821934] ? _raw_spin_lock+0x81/0xd0 [ 352.822680] ? __handle_mm_fault+0xef3/0x1b20 [ 352.823549] ? rb_insert_color+0x2a/0x270 [ 352.824373] ? copy_page_range+0x16b0/0x16b0 [ 352.825209] ? perf_event_update_userpage+0x2d0/0x2d0 [ 352.826190] ? __fget_light+0xd9/0xf0 [ 352.826941] __sys_sendmsg+0xb3/0x130 [ 352.827613] ? __sys_sendmsg_sock+0x20/0x20 [ 352.828377] ? do_user_addr_fault+0x2c5/0x8a0 [ 352.829184] ? fpregs_assert_state_consistent+0x52/0x60 [ 352.830001] ? exit_to_user_mode_prepare+0x32/0x160 [ 352.830845] do_syscall_64+0x35/0x80 [ 352.831445] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.832331] RIP: 0033:0x7f7bee973c17 [ 352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17 [ 352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003 [ 352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40 [ 352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f [ 352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088 [ 352.843784] Allocated by task 2960: [ 352.844451] kasan_save_stack+0x1b/0x40 [ 352.845173] __kasan_kmalloc+0x7c/0x90 [ 352.845873] fl_change+0x282/0x22db [cls_flower] [ 352.846696] tc_new_tfilter+0x6cf/0x1180 [ 352.847493] rtnetlink_rcv_msg+0x471/0x550 [ 352.848323] netlink_rcv_skb+0xc6/0x1f0 [ 352.849097] netlink_unicast+0x353/0x480 [ 352.849886] netlink_sendmsg+0x396/0x680 [ 352.850678] sock_sendmsg+0x6c/0x80 [ 352.851398] ____sys_sendmsg+0x3a5/0x3c0 [ 352.852202] ___sys_sendmsg+0xd8/0x140 [ 352.852967] __sys_sendmsg+0xb3/0x130 [ 352.853718] do_syscall_64+0x35/0x80 [ 352.854457] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.855830] Freed by task 7: [ 352.856421] kasan_save_stack+0x1b/0x40 [ 352.857139] kasan_set_track+0x1c/0x30 [ 352.857854] kasan_set_free_info+0x20/0x30 [ 352.858609] __kasan_slab_free+0xed/0x130 [ 352.859348] kfree+0xa7/0x3c0 [ 352.859951] process_one_work+0x44d/0x780 [ 352.860685] worker_thread+0x2e2/0x7e0 [ 352.861390] kthread+0x1f4/0x220 [ 352.862022] ret_from_fork+0x1f/0x30 [ 352.862955] Last potentially related work creation: [ 352.863758] kasan_save_stack+0x1b/0x40 [ 352.864378] kasan_record_aux_stack+0xab/0xc0 [ 352.865028] insert_work+0x30/0x160 [ 352.865617] __queue_work+0x351/0x670 [ 352.866261] rcu_work_rcufn+0x30/0x40 [ 352.866917] rcu_core+0x3b2/0xdb0 [ 352.867561] __do_softirq+0xf6/0x386 [ 352.868708] Second to last potentially related work creation: [ 352.869779] kasan_save_stack+0x1b/0x40 [ 352.870560] kasan_record_aux_stack+0xab/0xc0 [ 352.871426] call_rcu+0x5f/0x5c0 [ 352.872108] queue_rcu_work+0x44/0x50 [ 352.872855] __fl_put+0x17c/0x240 [cls_flower] [ 352.873733] fl_delete+0xc7/0x100 [cls_flower] [ 352.874607] tc_del_tfilter+0x510/0xb30 [ 352.886085] rtnetlink_rcv_msg+0x471/0x550 [ 352.886875] netlink_rcv_skb+0xc6/0x1f0 [ 352.887636] netlink_unicast+0x353/0x480 [ 352.888285] netlink_sendmsg+0x396/0x680 [ 352.888942] sock_sendmsg+0x6c/0x80 [ 352.889583] ____sys_sendmsg+0x3a5/0x3c0 [ 352.890311] ___sys_sendmsg+0xd8/0x140 [ 352.891019] __sys_sendmsg+0xb3/0x130 [ 352.891716] do_syscall_64+0x35/0x80 [ 352.892395] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.893666] The buggy address belongs to the object at ffff8881c8251000 which belongs to the cache kmalloc-2k of size 2048 [ 352.895696] The buggy address is located 1152 bytes inside of 2048-byte region [ffff8881c8251000, ffff8881c8251800) [ 352.897640] The buggy address belongs to the page: [ 352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250 [ 352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0 [ 352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [ 352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00 [ 352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 [ 352.905861] page dumped because: kasan: bad access detected [ 352.907323] Memory state around the buggy address: [ 352.908218] ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.909471] ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.912012] ^ [ 352.912642] ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.913919] ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.915185] ================================================================== Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07net: ipv4: Fix rtnexthop len when RTA_FLOW is presentXiao Liang
[ Upstream commit 597aa16c782496bf74c5dc3b45ff472ade6cee64 ] Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop() to get the length of rtnexthop correct. Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07mptcp: allow changing the 'backup' bit when no sockets are openDavide Caratti
[ Upstream commit 3f4a08909e2c740f8045efc74c4cf82eeaae3e36 ] current Linux refuses to change the 'backup' bit of MPTCP endpoints, i.e. using MPTCP_PM_CMD_SET_FLAGS, unless it finds (at least) one subflow that matches the endpoint address. There is no reason for that, so we can just ignore the return value of mptcp_nl_addr_backup(). In this way, endpoints can reconfigure their 'backup' flag even if no MPTCP sockets are open (or more generally, in case the MP_PRIO message is not sent out). Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07mptcp: don't return sockets in foreign netnsFlorian Westphal
[ Upstream commit ea1300b9df7c8e8b65695a08b8f6aaf4b25fec9c ] mptcp_token_get_sock() may return a mptcp socket that is in a different net namespace than the socket that received the token value. The mptcp syncookie code path had an explicit check for this, this moves the test into mptcp_token_get_sock() function. Eventually token.c should be converted to pernet storage, but such change is not suitable for net tree. Fixes: 2c5ebd001d4f0 ("mptcp: refactor token container") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootbXin Long
[ Upstream commit f7e745f8e94492a8ac0b0a26e25f2b19d342918f ] We should always check if skb_header_pointer's return is NULL before using it, otherwise it may cause null-ptr-deref, as syzbot reported: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:sctp_rcv_ootb net/sctp/input.c:705 [inline] RIP: 0010:sctp_rcv+0x1d84/0x3220 net/sctp/input.c:196 Call Trace: <IRQ> sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1109 ip6_protocol_deliver_rcu+0x2e9/0x1ca0 net/ipv6/ip6_input.c:422 ip6_input_finish+0x62/0x170 net/ipv6/ip6_input.c:463 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472 dst_input include/net/dst.h:460 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ipv6_rcv+0x28c/0x3c0 net/ipv6/ip6_input.c:297 Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize") Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07mac80211: mesh: fix potentially unaligned accessJohannes Berg
[ Upstream commit b9731062ce8afd35cf723bf3a8ad55d208f915a5 ] The pointer here points directly into the frame, so the access is potentially unaligned. Use get_unaligned_le16 to avoid that. Fixes: 3f52b7e328c5 ("mac80211: mesh power save basics") Link: https://lore.kernel.org/r/20210920154009.3110ff75be0c.Ib6a2ff9e9cc9bc6fca50fce631ec1ce725cc926b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotapLorenzo Bianconi
[ Upstream commit 13cb6d826e0ac0d144b0d48191ff1a111d32f0c6 ] Limit max values for vht mcs and nss in ieee80211_parse_tx_radiotap routine in order to fix the following warning reported by syzbot: WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_rate_set_vht include/net/mac80211.h:989 [inline] WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244 Modules linked in: CPU: 0 PID: 10717 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ieee80211_rate_set_vht include/net/mac80211.h:989 [inline] RIP: 0010:ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244 RSP: 0018:ffffc9000186f3e8 EFLAGS: 00010216 RAX: 0000000000000618 RBX: ffff88804ef76500 RCX: ffffc900143a5000 RDX: 0000000000040000 RSI: ffffffff888f478e RDI: 0000000000000003 RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000100 R10: ffffffff888f46f9 R11: 0000000000000000 R12: 00000000fffffff8 R13: ffff88804ef7653c R14: 0000000000000001 R15: 0000000000000004 FS: 00007fbf5718f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2de23000 CR3: 000000006a671000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: ieee80211_monitor_select_queue+0xa6/0x250 net/mac80211/iface.c:740 netdev_core_pick_tx+0x169/0x2e0 net/core/dev.c:4089 __dev_queue_xmit+0x6f9/0x3710 net/core/dev.c:4165 __bpf_tx_skb net/core/filter.c:2114 [inline] __bpf_redirect_no_mac net/core/filter.c:2139 [inline] __bpf_redirect+0x5ba/0xd20 net/core/filter.c:2162 ____bpf_clone_redirect net/core/filter.c:2429 [inline] bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2401 bpf_prog_eeb6f53a69e5c6a2+0x59/0x234 bpf_dispatcher_nop_func include/linux/bpf.h:717 [inline] __bpf_prog_run include/linux/filter.h:624 [inline] bpf_prog_run include/linux/filter.h:631 [inline] bpf_test_run+0x381/0xa30 net/bpf/test_run.c:119 bpf_prog_test_run_skb+0xb84/0x1ee0 net/bpf/test_run.c:663 bpf_prog_test_run kernel/bpf/syscall.c:3307 [inline] __sys_bpf+0x2137/0x5df0 kernel/bpf/syscall.c:4605 __do_sys_bpf kernel/bpf/syscall.c:4691 [inline] __se_sys_bpf kernel/bpf/syscall.c:4689 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Reported-by: syzbot+0196ac871673f0c20f68@syzkaller.appspotmail.com Fixes: 646e76bb5daf4 ("mac80211: parse VHT info in injected frames") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c26c3f02dcb38ab63b2f2534cb463d95ee81bb13.1632141760.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07mac80211: Fix ieee80211_amsdu_aggregate frag_tail bugChih-Kang Chang
[ Upstream commit fe94bac626d9c1c5bc98ab32707be8a9d7f8adba ] In ieee80211_amsdu_aggregate() set a pointer frag_tail point to the end of skb_shinfo(head)->frag_list, and use it to bind other skb in the end of this function. But when execute ieee80211_amsdu_aggregate() ->ieee80211_amsdu_realloc_pad()->pskb_expand_head(), the address of skb_shinfo(head)->frag_list will be changed. However, the ieee80211_amsdu_aggregate() not update frag_tail after call pskb_expand_head(). That will cause the second skb can't bind to the head skb appropriately.So we update the address of frag_tail to fix it. Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support") Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com> Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20210830073240.12736-1-pkshih@realtek.com [reword comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07Revert "mac80211: do not use low data rates for data frames with no ack flag"Felix Fietkau
[ Upstream commit 98d46b021f6ee246c7a73f9d490d4cddb4511a3b ] This reverts commit d333322361e7 ("mac80211: do not use low data rates for data frames with no ack flag"). Returning false early in rate_control_send_low breaks sending broadcast packets, since rate control will not select a rate for it. Before re-introducing a fixed version of this patch, we should probably also make some changes to rate control to be more conservative in selecting rates for no-ack packets and also prevent using probing rates on them, since we won't get any feedback. Fixes: d333322361e7 ("mac80211: do not use low data rates for data frames with no ack flag") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210906083559.9109-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07netfilter: log: work around missing softdep backend moduleFlorian Westphal
[ Upstream commit b53deef054e58fe4f37c66211b8ece9f8fc1aa13 ] iptables/nftables has two types of log modules: 1. backend, e.g. nf_log_syslog, which implement the functionality 2. frontend, e.g. xt_LOG or nft_log, which call the functionality provided by backend based on nf_tables or xtables rule set. Problem is that the request_module() call to load the backed in nf_logger_find_get() might happen with nftables transaction mutex held in case the call path is via nf_tables/nft_compat. This can cause deadlocks (see 'Fixes' tags for details). The chosen solution as to let modprobe deal with this by adding 'pre: ' soft dep tag to xt_LOG (to load the syslog backend) and xt_NFLOG (to load nflog backend). Eric reports that this breaks on systems with older modprobe that doesn't support softdeps. Another, similar issue occurs when someone either insmods xt_(NF)LOG directly or unloads the backend module (possible if no log frontend is in use): because the frontend module is already loaded, modprobe is not invoked again so the softdep isn't evaluated. Add a workaround: If nf_logger_find_get() returns -ENOENT and call is not via nft_compat, load the backend explicitly and try again. Else, let nft_compat ask for deferred request_module via nf_tables infra. Softdeps are kept in-place, so with newer modprobe the dependencies are resolved from userspace. Fixes: cefa31a9d461 ("netfilter: nft_log: perform module load from nf_tables") Fixes: a38b5b56d6f4 ("netfilter: nf_log: add module softdeps") Reported-and-tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07netfilter: nf_tables: unlink table before deleting itFlorian Westphal
[ Upstream commit a499b03bf36b0c2e3b958a381d828678ab0ffc5e ] syzbot reports following UAF: BUG: KASAN: use-after-free in memcmp+0x18f/0x1c0 lib/string.c:955 nla_strcmp+0xf2/0x130 lib/nlattr.c:836 nft_table_lookup.part.0+0x1a2/0x460 net/netfilter/nf_tables_api.c:570 nft_table_lookup net/netfilter/nf_tables_api.c:4064 [inline] nf_tables_getset+0x1b3/0x860 net/netfilter/nf_tables_api.c:4064 nfnetlink_rcv_msg+0x659/0x13f0 net/netfilter/nfnetlink.c:285 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 Problem is that all get operations are lockless, so the commit_mutex held by nft_rcv_nl_event() isn't enough to stop a parallel GET request from doing read-accesses to the table object even after synchronize_rcu(). To avoid this, unlink the table first and store the table objects in on-stack scratch space. Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership") Reported-and-tested-by: syzbot+f31660cf279b0557160c@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07ipvs: check that ip_vs_conn_tab_bits is between 8 and 20Andrea Claudi
[ Upstream commit 69e73dbfda14fbfe748d3812da1244cce2928dcb ] ip_vs_conn_tab_bits may be provided by the user through the conn_tab_bits module parameter. If this value is greater than 31, or less than 0, the shift operator used to derive tab_size causes undefined behaviour. Fix this checking ip_vs_conn_tab_bits value to be in the range specified in ipvs Kconfig. If not, simply use default value. Fixes: 6f7edb4881bf ("IPVS: Allow boot time change of hash size") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07mac80211: fix use-after-free in CCMP/GCMP RXJohannes Berg
commit 94513069eb549737bcfc3d988d6ed4da948a2de8 upstream. When PN checking is done in mac80211, for fragmentation we need to copy the PN to the RX struct so we can later use it to do a comparison, since commit bf30ca922a0c ("mac80211: check defrag PN against current frame"). Unfortunately, in that commit I used the 'hdr' variable without it being necessarily valid, so use-after-free could occur if it was necessary to reallocate (parts of) the frame. Fix this by reloading the variable after the code that results in the reallocations, if any. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=214401. Cc: stable@vger.kernel.org Fixes: bf30ca922a0c ("mac80211: check defrag PN against current frame") Link: https://lore.kernel.org/r/20210927115838.12b9ac6bb233.I1d066acd5408a662c3b6e828122cd314fcb28cdb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-30ipv6: delay fib6_sernum increase in fib6_addzhang kai
[ Upstream commit e87b5052271e39d62337ade531992b7e5d8c2cfa ] only increase fib6_sernum in net namespace after add fib6_info successfully. Signed-off-by: zhang kai <zhangkaiheb@126.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30nexthop: Fix memory leaks in nexthop notification chain listenersIdo Schimmel
[ Upstream commit 3106a0847525befe3e22fc723909d1b21eb0d520 ] syzkaller discovered memory leaks [1] that can be reduced to the following commands: # ip nexthop add id 1 blackhole # devlink dev reload pci/0000:06:00.0 As part of the reload flow, mlxsw will unregister its netdevs and then unregister from the nexthop notification chain. Before unregistering from the notification chain, mlxsw will receive delete notifications for nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw will not receive notifications for nexthops using netdevs that are not dismantled as part of the reload flow. For example, the blackhole nexthop above that internally uses the loopback netdev as its nexthop device. One way to fix this problem is to have listeners flush their nexthop tables after unregistering from the notification chain. This is error-prone as evident by this patch and also not symmetric with the registration path where a listener receives a dump of all the existing nexthops. Therefore, fix this problem by replaying delete notifications for the listener being unregistered. This is symmetric to the registration path and also consistent with the netdev notification chain. The above means that unregister_nexthop_notifier(), like register_nexthop_notifier(), will have to take RTNL in order to iterate over the existing nexthops and that any callers of the function cannot hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN driver. To avoid a deadlock, change the latter to unregister its nexthop listener without holding RTNL, making it symmetric to the registration path. [1] unreferenced object 0xffff88806173d600 (size 512): comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s) hex dump (first 32 bytes): 41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff A..`......sa.... 08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00 ..sa............ backtrace: [<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522 [<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline] [<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline] [<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231 [<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline] [<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239 [<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83 [<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline] [<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306 [<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244 [<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline] [<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline] [<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913 [<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572 [<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504 [<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590 [<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340 [<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929 [<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline] [<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline] [<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409 [<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463 [<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492 [<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline] [<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline] [<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499 Fixes: 2a014b200bbd ("mlxsw: spectrum_router: Add support for nexthop objects") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30mptcp: ensure tx skbs always have the MPTCP extPaolo Abeni
[ Upstream commit 977d293e23b48a1129830d7968605f61c4af71a0 ] Due to signed/unsigned comparison, the expression: info->size_goal - skb->len > 0 evaluates to true when the size goal is smaller than the skb size. That results in lack of tx cache refill, so that the skb allocated by the core TCP code lacks the required MPTCP skb extensions. Due to the above, syzbot is able to trigger the following WARN_ON(): WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366 Modules linked in: CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366 Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9 RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216 RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000 RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003 RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280 R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0 FS: 00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547 mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003 release_sock+0xb4/0x1b0 net/core/sock.c:3206 sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145 mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749 inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_write_iter+0x2a0/0x3e0 net/socket.c:1057 call_write_iter include/linux/fs.h:2163 [inline] new_sync_write+0x40b/0x640 fs/read_write.c:507 vfs_write+0x7cf/0xae0 fs/read_write.c:594 ksys_write+0x1ee/0x250 fs/read_write.c:647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9 RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038 R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000 Fix the issue rewriting the relevant expression to avoid sign-related problems - note: size_goal is always >= 0. Additionally, ensure that the skb in the tx cache always carries the relevant extension. Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com Fixes: 1094c6fe7280 ("mptcp: fix possible divide by zero") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net: dsa: don't allocate the slave_mii_bus using devresVladimir Oltean
[ Upstream commit 5135e96a3dd2f4555ae6981c3155a62bcf3227f6 ] The Linux device model permits both the ->shutdown and ->remove driver methods to get called during a shutdown procedure. Example: a DSA switch which sits on an SPI bus, and the SPI bus driver calls this on its ->shutdown method: spi_unregister_controller -> device_for_each_child(&ctlr->dev, NULL, __unregister); -> spi_unregister_device(to_spi_device(dev)); -> device_del(&spi->dev); So this is a simple pattern which can theoretically appear on any bus, although the only other buses on which I've been able to find it are I2C: i2c_del_adapter -> device_for_each_child(&adap->dev, NULL, __unregister_client); -> i2c_unregister_device(client); -> device_unregister(&client->dev); The implication of this pattern is that devices on these buses can be unregistered after having been shut down. The drivers for these devices might choose to return early either from ->remove or ->shutdown if the other callback has already run once, and they might choose that the ->shutdown method should only perform a subset of the teardown done by ->remove (to avoid unnecessary delays when rebooting). So in other words, the device driver may choose on ->remove to not do anything (therefore to not unregister an MDIO bus it has registered on ->probe), because this ->remove is actually triggered by the device_shutdown path, and its ->shutdown method has already run and done the minimally required cleanup. This used to be fine until the blamed commit, but now, the following BUG_ON triggers: void mdiobus_free(struct mii_bus *bus) { /* For compatibility with error handling in drivers. */ if (bus->state == MDIOBUS_ALLOCATED) { kfree(bus); return; } BUG_ON(bus->state != MDIOBUS_UNREGISTERED); bus->state = MDIOBUS_RELEASED; put_device(&bus->dev); } In other words, there is an attempt to free an MDIO bus which was not unregistered. The attempt to free it comes from the devres release callbacks of the SPI device, which are executed after the device is unregistered. I'm not saying that the fact that MDIO buses allocated using devres would automatically get unregistered wasn't strange. I'm just saying that the commit didn't care about auditing existing call paths in the kernel, and now, the following code sequences are potentially buggy: (a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device located on a bus that unregisters its children on shutdown. After the blamed patch, either both the alloc and the register should use devres, or none should. (b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no mdiobus_unregister at all in the remove path. After the blamed patch, nobody unregisters the MDIO bus anymore, so this is even more buggy than the previous case which needs a specific bus configuration to be seen, this one is an unconditional bug. In this case, DSA falls into category (a), it tries to be helpful and registers an MDIO bus on behalf of the switch, which might be on such a bus. I've no idea why it does it under devres. It does this on probe: if (!ds->slave_mii_bus && ds->ops->phy_read) alloc and register mdio bus and this on remove: if (ds->slave_mii_bus && ds->ops->phy_read) unregister mdio bus I _could_ imagine using devres because the condition used on remove is different than the condition used on probe. So strictly speaking, DSA cannot determine whether the ds->slave_mii_bus it sees on remove is the ds->slave_mii_bus that _it_ has allocated on probe. Using devres would have solved that problem. But nonetheless, the existing code already proceeds to unregister the MDIO bus, even though it might be unregistering an MDIO bus it has never registered. So I can only guess that no driver that implements ds->ops->phy_read also allocates and registers ds->slave_mii_bus itself. So in that case, if unregistering is fine, freeing must be fine too. Stop using devres and free the MDIO bus manually. This will make devres stop attempting to free a still registered MDIO bus on ->shutdown. Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()") Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net: dsa: fix dsa_tree_setup error pathVladimir Oltean
[ Upstream commit e5845aa0eadda3d8a950eb8845c1396827131f30 ] Since the blamed commit, dsa_tree_teardown_switches() was split into two smaller functions, dsa_tree_teardown_switches and dsa_tree_teardown_ports. However, the error path of dsa_tree_setup stopped calling dsa_tree_teardown_ports. Fixes: a57d8c217aad ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net/smc: fix 'workqueue leaked lock' in smc_conn_abort_workKarsten Graul
[ Upstream commit a18cee4791b1123d0a6579a7c89f4b87e48abe03 ] The abort_work is scheduled when a connection was detected to be out-of-sync after a link failure. The work calls smc_conn_kill(), which calls smc_close_active_abort() and that might end up calling smc_close_cancel_work(). smc_close_cancel_work() cancels any pending close_work and tx_work but needs to release the sock_lock before and acquires the sock_lock again afterwards. So when the sock_lock was NOT acquired before then it may be held after the abort_work completes. Thats why the sock_lock is acquired before the call to smc_conn_kill() in __smc_lgr_terminate(), but this is missing in smc_conn_abort_work(). Fix that by acquiring the sock_lock first and release it after the call to smc_conn_kill(). Fixes: b286a0651e44 ("net/smc: handle incoming CDC validation message") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net/smc: add missing error check in smc_clc_prfx_set()Karsten Graul
[ Upstream commit 6c90731980655280ea07ce4b21eb97457bf86286 ] Coverity stumbled over a missing error check in smc_clc_prfx_set(): *** CID 1475954: Error handling issues (CHECKED_RETURN) /net/smc/smc_clc.c: 233 in smc_clc_prfx_set() >>> CID 1475954: Error handling issues (CHECKED_RETURN) >>> Calling "kernel_getsockname" without checking return value (as is done elsewhere 8 out of 10 times). 233 kernel_getsockname(clcsock, (struct sockaddr *)&addrs); Add the return code check in smc_clc_prfx_set(). Fixes: c246d942eabc ("net/smc: restructure netinfo for CLC proposal msgs") Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30napi: fix race inside napi_enableXuan Zhuo
[ Upstream commit 3765996e4f0b8a755cab215a08df744490c76052 ] The process will cause napi.state to contain NAPI_STATE_SCHED and not in the poll_list, which will cause napi_disable() to get stuck. The prefix "NAPI_STATE_" is removed in the figure below, and NAPI_STATE_HASHED is ignored in napi.state. CPU0 | CPU1 | napi.state =============================================================================== napi_disable() | | SCHED | NPSVC napi_enable() | | { | | smp_mb__before_atomic(); | | clear_bit(SCHED, &n->state); | | NPSVC | napi_schedule_prep() | SCHED | NPSVC | napi_poll() | | napi_complete_done() | | { | | if (n->state & (NPSVC | | (1) | _BUSY_POLL))) | | return false; | | ................ | | } | SCHED | NPSVC | | clear_bit(NPSVC, &n->state); | | SCHED } | | | | napi_schedule_prep() | | SCHED | MISSED (2) (1) Here return direct. Because of NAPI_STATE_NPSVC exists. (2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list Since NAPI_STATE_SCHED already exists and napi is not in the sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always exist. 1. This will cause this queue to no longer receive packets. 2. If you encounter napi_disable under the protection of rtnl_lock, it will cause the entire rtnl_lock to be locked, affecting the overall system. This patch uses cmpxchg to implement napi_enable(), which ensures that there will be no race due to the separation of clear two bits. Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30net: dsa: tear down devlink port regions when tearing down the devlink port ↵Vladimir Oltean
on error [ Upstream commit fd292c189a979838622d5e03e15fa688c81dd50b ] Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal") decided it was fine to ignore errors on certain ports that fail to probe, and go on with the ports that do probe fine. Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port") noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get called, and devlink notices after a timeout of 3600 seconds and prints a WARN_ON. So it went ahead to unregister the devlink port. And because there exists an UNUSED port flavour, we actually re-register the devlink port as UNUSED. Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to DSA") added devlink port regions, which are set up by the driver and not by DSA. When we trigger the devlink port deregistration and reregistration as unused, devlink now prints another WARN_ON, from here: devlink_port_unregister: WARN_ON(!list_empty(&devlink_port->region_list)); So the port still has regions, which makes sense, because they were set up by the driver, and the driver doesn't know we're unregistering the devlink port. Somebody needs to tear them down, and optionally (actually it would be nice, to be consistent) set them up again for the new devlink port. But DSA's layering stays in our way quite badly here. The options I've considered are: 1. Introduce a function in devlink to just change a port's type and flavour. No dice, devlink keeps a lot of state, it really wants the port to not be registered when you set its parameters, so changing anything can only be done by destroying what we currently have and recreating it. 2. Make DSA cache the parameters passed to dsa_devlink_port_region_create, and the region returned, keep those in a list, then when the devlink port unregister needs to take place, the existing devlink regions are destroyed by DSA, and we replay the creation of new regions using the cached parameters. Problem: mv88e6xxx keeps the region pointers in chip->ports[port].region, and these will remain stale after DSA frees them. There are many things DSA can do, but updating mv88e6xxx's private pointers is not one of them. 3. Just let the driver do it (i.e. introduce a very specific method called ds->ops->port_reinit_as_unused, which unregisters its devlink port devlink regions, then the old devlink port, then registers the new one, then the devlink port regions for it). While it does work, as opposed to the others, it's pretty horrible from an API perspective and we can do better. 4. Introduce a new pair of methods, ->port_setup and ->port_teardown, which in the case of mv88e6xxx must register and unregister the devlink port regions. Call these 2 methods when the port must be reinitialized as unused. Naturally, I went for the 4th approach. Fixes: 08156ba430b4 ("net: dsa: Add devlink port regions support to DSA") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30nexthop: Fix division by zero while replacing a resilient groupIdo Schimmel
commit 563f23b002534176f49524b5ca0e1d94d8906c40 upstream. The resilient nexthop group torture tests in fib_nexthop.sh exposed a possible division by zero while replacing a resilient group [1]. The division by zero occurs when the data path sees a resilient nexthop group with zero buckets. The tests replace a resilient nexthop group in a loop while traffic is forwarded through it. The tests do not specify the number of buckets while performing the replacement, resulting in the kernel allocating a stub resilient table (i.e, 'struct nh_res_table') with zero buckets. This table should never be visible to the data path, but the old nexthop group (i.e., 'oldg') might still be used by the data path when the stub table is assigned to it. Fix this by only assigning the stub table to the old nexthop group after making sure the group is no longer used by the data path. Tested with fib_nexthops.sh: Tests passed: 222 Tests failed: 0 [1] divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:nexthop_select_path+0x2d2/0x1a80 [...] Call Trace: fib_select_multipath+0x79b/0x1530 fib_select_path+0x8fb/0x1c10 ip_route_output_key_hash_rcu+0x1198/0x2da0 ip_route_output_key_hash+0x190/0x340 ip_route_output_flow+0x21/0x120 raw_sendmsg+0x91d/0x2e10 inet_sendmsg+0x9e/0xe0 __sys_sendto+0x23d/0x360 __x64_sys_sendto+0xe1/0x1b0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: stable@vger.kernel.org Fixes: 283a72a5599e ("nexthop: Add implementation of resilient next-hop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26SUNRPC: don't pause on incomplete allocationNeilBrown
[ Upstream commit e38b3f20059426a0adbde014ff71071739ab5226 ] alloc_pages_bulk_array() attempts to allocate at least one page based on the provided pages, and then opportunistically allocates more if that can be done without dropping the spinlock. So if it returns fewer than requested, that could just mean that it needed to drop the lock. In that case, try again immediately. Only pause for a time if no progress could be made. Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com> Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org> Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator") Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Mel Gorman <mgorman@suse.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-269p/trans_virtio: Remove sysfs file on probe failureXie Yongji
commit f997ea3b7afc108eb9761f321b57de2d089c7c48 upstream. This ensures we don't leak the sysfs file if we failed to allocate chan->vc_wq during probe. Link: http://lkml.kernel.org/r/20210517083557.172-1-xieyongji@bytedance.com Fixes: 86c8437383ac ("net/9p: Add sysfs mount_tag file for virtio 9P device") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22ip6_gre: Revert "ip6_gre: add validation for csum_start"Willem de Bruijn
[ Upstream commit fe63339ef36bfb1cc12962015a9b254170eea057 ] This reverts commit 9cf448c200ba9935baa94e7a0964598ce947db9d. This commit was added for equivalence with a similar fix to ip_gre. That fix proved to have a bug. Upon closer inspection, ip6_gre is not susceptible to the original bug. So revert the unnecessary extra check. In short, ipgre_xmit calls skb_pull to remove ipv4 headers previously inserted by dev_hard_header. ip6gre_tunnel_xmit does not. Link: https://lore.kernel.org/netdev/CA+FuTSe+vJgTVLc9SojGuN-f9YQ+xWLPKE_S4f=f+w+_P2hgUg@mail.gmail.com/#t Fixes: 9cf448c200ba ("ip6_gre: add validation for csum_start") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22ip_gre: validate csum_start only on pullWillem de Bruijn
[ Upstream commit 8a0ed250f911da31a2aef52101bc707846a800ff ] The GRE tunnel device can pull existing outer headers in ipge_xmit. This is a rare path, apparently unique to this device. The below commit ensured that pulling does not move skb->data beyond csum_start. But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and thus csum_start is irrelevant. Refine to exclude this. At the same time simplify and strengthen the test. Simplify, by moving the check next to the offending pull, making it more self documenting and removing an unnecessary branch from other code paths. Strengthen, by also ensuring that the transport header is correct and therefore the inner headers will be after skb_reset_inner_headers. The transport header is set to csum_start in skb_partial_csum_set. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") Reported-by: Ido Schimmel <idosch@idosch.org> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22fq_codel: reject silly quantum parametersEric Dumazet
[ Upstream commit c7c5e6ff533fe1f9afef7d2fa46678987a1335a7 ] syzbot found that forcing a big quantum attribute would crash hosts fast, essentially using this: tc qd replace dev eth0 root fq_codel quantum 4294967295 This is because fq_codel_dequeue() would have to loop ~2^31 times in : if (flow->deficit <= 0) { flow->deficit += q->quantum; list_move_tail(&flow->flowchain, &q->old_flows); goto begin; } SFQ max quantum is 2^19 (half a megabyte) Lets adopt a max quantum of one megabyte for FQ_CODEL. Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22netfilter: socket: icmp6: fix use-after-scopeBenjamin Hesmans
[ Upstream commit 730affed24bffcd1eebd5903171960f5ff9f1f22 ] Bug reported by KASAN: BUG: KASAN: use-after-scope in inet6_ehashfn (net/ipv6/inet6_hashtables.c:40) Call Trace: (...) inet6_ehashfn (net/ipv6/inet6_hashtables.c:40) (...) nf_sk_lookup_slow_v6 (net/ipv6/netfilter/nf_socket_ipv6.c:91 net/ipv6/netfilter/nf_socket_ipv6.c:146) It seems that this bug has already been fixed by Eric Dumazet in the past in: commit 78296c97ca1f ("netfilter: xt_socket: fix a stack corruption bug") But a variant of the same issue has been introduced in commit d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match") `daddr` and `saddr` potentially hold a reference to ipv6_var that is no longer in scope when the call to `nf_socket_get_sock_v6` is made. Fixes: d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match") Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22mptcp: Only send extra TCP acks in eligible socket statesMat Martineau
[ Upstream commit 340fa6667a696338e707cd5531a9631093d1be29 ] Recent changes exposed a bug where specifically-timed requests to the path manager netlink API could trigger a divide-by-zero in __tcp_select_window(), as syzkaller does: divide error: 0000 [#1] SMP KASAN NOPTI CPU: 0 PID: 9667 Comm: syz-executor.0 Not tainted 5.14.0-rc6+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:__tcp_select_window+0x509/0xa60 net/ipv4/tcp_output.c:3016 Code: 44 89 ff e8 c9 29 e9 fd 45 39 e7 0f 8d 20 ff ff ff e8 db 28 e9 fd 44 89 e3 e9 13 ff ff ff e8 ce 28 e9 fd 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 b7 28 e9 fd 44 89 f1 48 89 ea RSP: 0018:ffff888031ccf020 EFLAGS: 00010216 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000040000 RDX: 0000000000000000 RSI: ffff88811532c080 RDI: 0000000000000002 RBP: 0000000000000000 R08: ffffffff835807c2 R09: 0000000000000000 R10: 0000000000000004 R11: ffffed1020b92441 R12: 0000000000000000 R13: 1ffff11006399e08 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fa4c8344700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2f424000 CR3: 000000003e4e2003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: tcp_select_window net/ipv4/tcp_output.c:264 [inline] __tcp_transmit_skb+0xc00/0x37a0 net/ipv4/tcp_output.c:1351 __tcp_send_ack.part.0+0x3ec/0x760 net/ipv4/tcp_output.c:3972 __tcp_send_ack net/ipv4/tcp_output.c:3978 [inline] tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3978 mptcp_pm_nl_addr_send_ack+0x1ab/0x380 net/mptcp/pm_netlink.c:654 mptcp_pm_remove_addr+0x161/0x200 net/mptcp/pm.c:58 mptcp_nl_remove_id_zero_address+0x197/0x460 net/mptcp/pm_netlink.c:1328 mptcp_nl_cmd_del_addr+0x98b/0xd40 net/mptcp/pm_netlink.c:1359 genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792 netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0x14e/0x190 net/socket.c:724 ____sys_sendmsg+0x709/0x870 net/socket.c:2403 ___sys_sendmsg+0xff/0x170 net/socket.c:2457 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae mptcp_pm_nl_addr_send_ack() was attempting to send a TCP ACK on the first subflow in the MPTCP socket's connection list without validating that the subflow was in a suitable connection state. To address this, always validate subflow state when sending extra ACKs on subflows for address advertisement or subflow priority change. Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/229 Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22mptcp: fix possible divide by zeroPaolo Abeni
[ Upstream commit 1094c6fe7280e17e0e87934add5ad2585e990def ] Florian noted that if mptcp_alloc_tx_skb() allocation fails in __mptcp_push_pending(), we can end-up invoking mptcp_push_release()/tcp_push() with a zero mss, causing a divide by 0 error. This change addresses the issue refactoring the skb allocation code checking if skb collapsing will happen for sure and doing the skb allocation only after such check. Skb allocation will now happen only after the call to tcp_send_mss() which correctly initializes mss_now. As side bonuses we now fill the skb tx cache only when needed, and this also clean-up a bit the output path. v1 -> v2: - use lockdep_assert_held_once() - Jakub - fix indentation - Jakub Reported-by: Florian Westphal <fw@strlen.de> Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22net: dsa: tag_rtl4_a: Fix egress tagsLinus Walleij
[ Upstream commit 0e90dfa7a8d817db755c7b5d89d77b9c485e4180 ] I noticed that only port 0 worked on the RTL8366RB since we started to use custom tags. It turns out that the format of egress custom tags is actually different from ingress custom tags. While the lower bits just contain the port number in ingress tags, egress tags need to indicate destination port by setting the bit for the corresponding port. It was working on port 0 because port 0 added 0x00 as port number in the lower bits, and if you do this the packet appears at all ports, including the intended port. Ooops. Fix this and all ports work again. Use the define for shifting the "type A" into place while we're at it. Tested on the D-Link DIR-685 by sending traffic to each of the ports in turn. It works. Fixes: 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags") Cc: DENG Qingfang <dqfext@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutexPavel Skripkin
[ Upstream commit e3245a7b7b34bd2e97f744fd79463add6e9d41f4 ] Syzbot hit use-after-free in nf_tables_dump_sets. The problem was in missing lock protection for nft_ct_pcpu_template_refcnt. Before commit f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") all transactions were serialized by global mutex, but then global mutex was changed to local per netnamespace commit_mutex. This change causes use-after-free bug, when 2 netnamespaces concurently changing nft_ct_pcpu_template_refcnt without proper locking. Fix it by adding nft_ct_pcpu_mutex and protect all nft_ct_pcpu_template_refcnt changes with it. Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-and-tested-by: syzbot+649e339fa6658ee623d3@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6Ryoga Saito
[ Upstream commit 9aca491e0dccf8a9d84a5b478e5eee3c6ea7803b ] This patch fixes kernel NULL pointer dereference when creating nexthop which is bound with SRv6 decapsulation. In the creation of nexthop, __seg6_end_dt_vrf_build is called. __seg6_end_dt_vrf_build expects fc_lninfo in fib6_config is set correctly, but it isn't set in nh_create_ipv6, which causes kernel crash. Here is steps to reproduce kernel crash: 1. modprobe vrf 2. ip -6 nexthop add encap seg6local action End.DT4 vrftable 1 dev eth0 We got the following message: [ 901.370336] BUG: kernel NULL pointer dereference, address: 0000000000000ba0 [ 901.371658] #PF: supervisor read access in kernel mode [ 901.372672] #PF: error_code(0x0000) - not-present page [ 901.373672] PGD 0 P4D 0 [ 901.374248] Oops: 0000 [#1] SMP PTI [ 901.374944] CPU: 0 PID: 8593 Comm: ip Not tainted 5.14-051400-generic #202108310811-Ubuntu [ 901.376404] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module_el8.2.0+320+13f867d7 04/01/2014 [ 901.377907] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf] [ 901.379182] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00 [ 901.382652] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282 [ 901.383746] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8 [ 901.385436] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000 [ 901.386924] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40 [ 901.388537] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000 [ 901.389937] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b [ 901.391226] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000 [ 901.392737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 901.393803] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0 [ 901.395122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 901.396496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 901.397833] PKRU: 55555554 [ 901.398578] Call Trace: [ 901.399144] l3mdev_ifindex_lookup_by_table_id+0x3b/0x70 [ 901.400179] __seg6_end_dt_vrf_build+0x34/0xd0 [ 901.401067] seg6_end_dt4_build+0x16/0x20 [ 901.401904] seg6_local_build_state+0x271/0x430 [ 901.402797] lwtunnel_build_state+0x81/0x130 [ 901.403645] fib_nh_common_init+0x82/0x100 [ 901.404465] ? sock_def_readable+0x4b/0x80 [ 901.405285] fib6_nh_init+0x115/0x7c0 [ 901.406033] nh_create_ipv6.isra.0+0xe1/0x140 [ 901.406932] rtm_new_nexthop+0x3b7/0xeb0 [ 901.407828] rtnetlink_rcv_msg+0x152/0x3a0 [ 901.408663] ? rtnl_calcit.isra.0+0x130/0x130 [ 901.409535] netlink_rcv_skb+0x55/0x100 [ 901.410319] rtnetlink_rcv+0x15/0x20 [ 901.411026] netlink_unicast+0x1a8/0x250 [ 901.411813] netlink_sendmsg+0x238/0x470 [ 901.412602] ? _copy_from_user+0x2b/0x60 [ 901.413394] sock_sendmsg+0x65/0x70 [ 901.414112] ____sys_sendmsg+0x218/0x290 [ 901.414929] ? copy_msghdr_from_user+0x5c/0x90 [ 901.415814] ___sys_sendmsg+0x81/0xc0 [ 901.416559] ? fsnotify_destroy_marks+0x27/0xf0 [ 901.417447] ? call_rcu+0xa4/0x230 [ 901.418153] ? kmem_cache_free+0x23f/0x410 [ 901.418972] ? dentry_free+0x37/0x70 [ 901.419705] ? mntput_no_expire+0x4c/0x260 [ 901.420574] __sys_sendmsg+0x62/0xb0 [ 901.421297] __x64_sys_sendmsg+0x1f/0x30 [ 901.422057] do_syscall_64+0x5c/0xc0 [ 901.422756] ? syscall_exit_to_user_mode+0x27/0x50 [ 901.423675] ? __x64_sys_close+0x12/0x40 [ 901.424462] ? do_syscall_64+0x69/0xc0 [ 901.425219] ? irqentry_exit_to_user_mode+0x9/0x20 [ 901.426149] ? irqentry_exit+0x19/0x30 [ 901.426901] ? exc_page_fault+0x89/0x160 [ 901.427709] ? asm_exc_page_fault+0x8/0x30 [ 901.428536] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 901.429514] RIP: 0033:0x7fe493945747 [ 901.430248] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 901.433549] RSP: 002b:00007ffe9932cf68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 901.434981] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe493945747 [ 901.436303] RDX: 0000000000000000 RSI: 00007ffe9932cfe0 RDI: 0000000000000003 [ 901.437607] RBP: 00000000613053f7 R08: 0000000000000001 R09: 00007ffe9932d07c [ 901.438990] R10: 000055f4a903a010 R11: 0000000000000246 R12: 0000000000000001 [ 901.440340] R13: 0000000000000001 R14: 000055f4a802b163 R15: 000055f4a8042020 [ 901.441630] Modules linked in: vrf nls_utf8 isofs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit rapl input_leds joydev serio_raw qemu_fw_cfg mac_hid sch_fq_codel drm virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd virtio_net net_failover cryptd psmouse virtio_blk failover i2c_piix4 pata_acpi floppy [ 901.450808] CR2: 0000000000000ba0 [ 901.451514] ---[ end trace c27b934b99ade304 ]--- [ 901.452403] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf] [ 901.453626] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00 [ 901.456910] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282 [ 901.457912] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8 [ 901.459238] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000 [ 901.460552] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40 [ 901.461882] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000 [ 901.463208] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b [ 901.464529] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000 [ 901.466058] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 901.467189] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0 [ 901.468515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 901.469858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 901.471139] PKRU: 55555554 Signed-off-by: Ryoga Saito <contact@proelbtn.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22net: dsa: flush switchdev workqueue before tearing down CPU/DSA portsVladimir Oltean
[ Upstream commit a57d8c217aadac75530b8e7ffb3a3e1b7bfd0330 ] Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error messages appear: mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2 (and similarly for other ports) What happens is that DSA has a policy "even if there are bugs, let's at least not leak memory" and dsa_port_teardown() clears the dp->fdbs and dp->mdbs lists, which are supposed to be empty. But deleting that cleanup code, the warnings go away. => the FDB and MDB lists (used for refcounting on shared ports, aka CPU and DSA ports) will eventually be empty, but are not empty by the time we tear down those ports. Aka we are deleting them too soon. The addresses that DSA complains about are host-trapped addresses: the local addresses of the ports, and the MAC address of the bridge device. The problem is that offloading those entries happens from a deferred work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this races with the teardown of the CPU and DSA ports where the refcounting is kept. In fact, not only it races, but fundamentally speaking, if we iterate through the port list linearly, we might end up tearing down the shared ports even before we delete a DSA user port which has a bridge upper. So as it turns out, we need to first tear down the user ports (and the unused ones, for no better place of doing that), then the shared ports (the CPU and DSA ports). In between, we need to ensure that all work items scheduled by our switchdev handlers (which only run for user ports, hence the reason why we tear them down first) have finished. Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22udp_tunnel: Fix udp_tunnel_nic work-queue typeAya Levin
commit e50e711351bdc656a8e6ca1022b4293cae8dcd59 upstream. Turn udp_tunnel_nic work-queue to an ordered work-queue. This queue holds the UDP-tunnel configuration commands of the different netdevs. When the netdevs are functions of the same NIC the order of execution may be crucial. Problem example: NIC with 2 PFs, both PFs declare offload quota of up to 3 UDP-ports. $ifconfig eth2 1.1.1.1/16 up $ip link add eth2_19503 type vxlan id 5049 remote 1.1.1.2 dev eth2 dstport 19053 $ip link set dev eth2_19503 up $ip link add eth2_19504 type vxlan id 5049 remote 1.1.1.3 dev eth2 dstport 19054 $ip link set dev eth2_19504 up $ip link add eth2_19505 type vxlan id 5049 remote 1.1.1.4 dev eth2 dstport 19055 $ip link set dev eth2_19505 up $ip link add eth2_19506 type vxlan id 5049 remote 1.1.1.5 dev eth2 dstport 19056 $ip link set dev eth2_19506 up NIC RX port offload infrastructure offloads the first 3 UDP-ports (on all devices which sets NETIF_F_RX_UDP_TUNNEL_PORT feature) and not UDP-port 19056. So both PFs gets this offload configuration. $ip link set dev eth2_19504 down This triggers udp-tunnel-core to remove the UDP-port 19504 from offload-ports-list and offload UDP-port 19056 instead. In this scenario it is important that the UDP-port of 19504 will be removed from both PFs before trying to add UDP-port 19056. The NIC can stop offloading a UDP-port only when all references are removed. Otherwise the NIC may report exceeding of the offload quota. Fixes: cc4e3835eff4 ("udp_tunnel: add central NIC RX port offload infrastructure") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()zhenggy
commit 4f884f3962767877d7aabbc1ec124d2c307a4257 upstream. Commit 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") may directly retrans a multiple segments TSO/GSO packet without split, Since this commit, we can no longer assume that a retransmitted packet is a single segment. This patch fixes the tp->undo_retrans accounting in tcp_sacktag_one() that use the actual segments(pcount) of the retransmitted packet. Before that commit (10d3be569243), the assumption underlying the tp->undo_retrans-- seems correct. Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: zhenggy <zhenggy@chinatelecom.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setupVladimir Oltean
commit 6a52e73368038f47f6618623d75061dc263b26ae upstream. DSA supports connecting to a phy-handle, and has a fallback to a non-OF based method of connecting to an internal PHY on the switch's own MDIO bus, if no phy-handle and no fixed-link nodes were present. The -ENODEV error code from the first attempt (phylink_of_phy_connect) is what triggers the second attempt (phylink_connect_phy). However, when the first attempt returns a different error code than -ENODEV, this results in an unbalance of calls to phylink_create and phylink_destroy by the time we exit the function. The phylink instance has leaked. There are many other error codes that can be returned by phylink_of_phy_connect. For example, phylink_validate returns -EINVAL. So this is a practical issue too. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net/af_unix: fix a data-race in unix_dgram_pollEric Dumazet
commit 04f08eb44b5011493d77b602fdec29ff0f5c6cd5 upstream. syzbot reported another data-race in af_unix [1] Lets change __skb_insert() to use WRITE_ONCE() when changing skb head qlen. Also, change unix_dgram_poll() to use lockless version of unix_recvq_full() It is verry possible we can switch all/most unix_recvq_full() to the lockless version, this will be done in a future kernel version. [1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1 BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0: __skb_insert include/linux/skbuff.h:1938 [inline] __skb_queue_before include/linux/skbuff.h:2043 [inline] __skb_queue_tail include/linux/skbuff.h:2076 [inline] skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264 unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2532 __do_sys_sendmmsg net/socket.c:2561 [inline] __se_sys_sendmmsg net/socket.c:2558 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1: skb_queue_len include/linux/skbuff.h:1869 [inline] unix_recvq_full net/unix/af_unix.c:194 [inline] unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777 sock_poll+0x23e/0x260 net/socket.c:1288 vfs_poll include/linux/poll.h:90 [inline] ep_item_poll fs/eventpoll.c:846 [inline] ep_send_events fs/eventpoll.c:1683 [inline] ep_poll fs/eventpoll.c:1798 [inline] do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226 __do_sys_epoll_wait fs/eventpoll.c:2238 [inline] __se_sys_epoll_wait fs/eventpoll.c:2233 [inline] __x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x0000001b -> 0x00000001 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 86b18aaa2b5b ("skbuff: fix a data race in skb_queue_len()") Cc: Qian Cai <cai@lca.pw> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22tipc: increase timeout in tipc_sk_enqueue()Hoang Le
commit f4bb62e64c88c93060c051195d3bbba804e56945 upstream. In tipc_sk_enqueue() we use hardcoded 2 jiffies to extract socket buffer from generic queue to particular socket. The 2 jiffies is too short in case there are other high priority tasks get CPU cycles for multiple jiffies update. As result, no buffer could be enqueued to particular socket. To solve this, we switch to use constant timeout 20msecs. Then, the function will be expired between 2 jiffies (CONFIG_100HZ) and 20 jiffies (CONFIG_1000HZ). Fixes: c637c1035534 ("tipc: resolve race problem at unicast message reception") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net/l2tp: Fix reference count leak in l2tp_udp_recv_coreXiyu Yang
commit 9b6ff7eb666415e1558f1ba8a742f5db6a9954de upstream. The reference count leak issue may take place in an error handling path. If both conditions of tunnel->version == L2TP_HDR_VER_3 and the return value of l2tp_v3_ensure_opt_in_linear is nonzero, the function would directly jump to label invalid, without decrementing the reference count of the l2tp_session object session increased earlier by l2tp_tunnel_get_session(). This may result in refcount leaks. Fix this issue by decrease the reference count before jumping to the label invalid. Fixes: 4522a70db7aa ("l2tp: fix reading optional fields of L2TPv3") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22dccp: don't duplicate ccid when cloning dccp sockLin, Zhenpeng
commit d9ea761fdd197351890418acd462c51f241014a7 upstream. Commit 2677d2067731 ("dccp: don't free ccid2_hc_tx_sock ...") fixed a UAF but reintroduced CVE-2017-6074. When the sock is cloned, two dccps_hc_tx_ccid will reference to the same ccid. So one can free the ccid object twice from two socks after cloning. This issue was found by "Hadar Manor" as well and assigned with CVE-2020-16119, which was fixed in Ubuntu's kernel. So here I port the patch from Ubuntu to fix it. The patch prevents cloned socks from referencing the same ccid. Fixes: 2677d2067731410 ("dccp: don't free ccid2_hc_tx_sock ...") Signed-off-by: Zhenpeng Lin <zplin@psu.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net-caif: avoid user-triggerable WARN_ON(1)Eric Dumazet
commit 550ac9c1aaaaf51fd42e20d461f0b1cdbd55b3d2 upstream. syszbot triggers this warning, which looks something we can easily prevent. If we initialize priv->list_field in chnl_net_init(), then always use list_del_init(), we can remove robust_list_del() completely. WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 robust_list_del net/caif/chnl_net.c:67 [inline] WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375 Modules linked in: CPU: 0 PID: 3233 Comm: syz-executor.3 Not tainted 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:robust_list_del net/caif/chnl_net.c:67 [inline] RIP: 0010:chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375 Code: 89 eb e8 3a a3 ba f8 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 bf 01 00 00 48 81 fb 00 14 4e 8d 48 8b 2b 75 d0 e8 17 a3 ba f8 <0f> 0b 5b 5d 41 5c 41 5d e9 0a a3 ba f8 4c 89 e3 e8 02 a3 ba f8 4c RSP: 0018:ffffc90009067248 EFLAGS: 00010202 RAX: 0000000000008780 RBX: ffffffff8d4e1400 RCX: ffffc9000fd34000 RDX: 0000000000040000 RSI: ffffffff88bb6e49 RDI: 0000000000000003 RBP: ffff88802cd9ee08 R08: 0000000000000000 R09: ffffffff8d0e6647 R10: ffffffff88bb6dc2 R11: 0000000000000000 R12: ffff88803791ae08 R13: dffffc0000000000 R14: 00000000e600ffce R15: ffff888073ed3480 FS: 00007fed10fa0700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2c322000 CR3: 00000000164a6000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: register_netdevice+0xadf/0x1500 net/core/dev.c:10347 ipcaif_newlink+0x4c/0x260 net/caif/chnl_net.c:468 __rtnl_newlink+0x106d/0x1750 net/core/rtnetlink.c:3458 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 __sys_sendto+0x21c/0x320 net/socket.c:2036 __do_sys_sendto net/socket.c:2048 [inline] __se_sys_sendto net/socket.c:2044 [inline] __x64_sys_sendto+0xdd/0x1b0 net/socket.c:2044 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: cc36a070b590 ("net-caif: add CAIF netdevice") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22net: remove the unnecessary check in cipso_v4_doi_free王贇
commit 9756e44fd4d283ebcc94df353642f322428b73de upstream. The commit 733c99ee8be9 ("net: fix NULL pointer reference in cipso_v4_doi_free") was merged by a mistake, this patch try to cleanup the mess. And we already have the commit e842cb60e8ac ("net: fix NULL pointer reference in cipso_v4_doi_free") which fixed the root cause of the issue mentioned in it's description. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22ethtool: Fix rxnfc copy to user buffer overflowSaeed Mahameed
commit 9b29a161ef38040f000dcf9ccf78e34495edfd55 upstream. In the cited commit, copy_to_user() got called with the wrong pointer, instead of passing the actual buffer ptr to copy from, a pointer to the pointer got passed, which causes a buffer overflow calltrace to pop up when executing "ethtool -x ethX". Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed to the function, instead of a pointer to it. This fixes below call trace: [ 15.533533] ------------[ cut here ]------------ [ 15.539007] Buffer overflow detected (8 < 192)! [ 15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20 [ 15.549308] Modules linked in: [ 15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058 [ 15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 15.558378] RIP: 0010:copy_overflow+0x15/0x20 [ 15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 [ 15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286 [ 15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000 [ 15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff [ 15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff [ 15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000 [ 15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000 [ 15.573584] FS: 00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000 [ 15.575639] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0 [ 15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 15.582441] Call Trace: [ 15.582970] ethtool_rxnfc_copy_to_user+0x30/0x46 [ 15.583815] ethtool_get_rxnfc.cold+0x23/0x2b [ 15.584584] dev_ethtool+0x29c/0x25f0 [ 15.585286] ? security_netlbl_sid_to_secattr+0x77/0xd0 [ 15.586728] ? do_set_pte+0xc4/0x110 [ 15.587349] ? _raw_spin_unlock+0x18/0x30 [ 15.588118] ? __might_sleep+0x49/0x80 [ 15.588956] dev_ioctl+0x2c1/0x490 [ 15.589616] sock_ioctl+0x18e/0x330 [ 15.591143] __x64_sys_ioctl+0x41c/0x990 [ 15.591823] ? irqentry_exit_to_user_mode+0x9/0x20 [ 15.592657] ? irqentry_exit+0x33/0x40 [ 15.593308] ? exc_page_fault+0x32f/0x770 [ 15.593877] ? exit_to_user_mode_prepare+0x3c/0x130 [ 15.594775] do_syscall_64+0x35/0x80 [ 15.595397] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 15.596037] RIP: 0033:0x7f5381226d4b [ 15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48 [ 15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b [ 15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003 [ 15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001 [ 15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000 [ 15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000 [ 15.605042] ---[ end trace 325cf185e2795048 ]--- Fixes: dd98d2895de6 ("ethtool: improve compat ioctl handling") Reported-by: Shannon Nelson <snelson@pensando.io> CC: Arnd Bergmann <arnd@arndb.de> CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Tested-by: Shannon Nelson <snelson@pensando.io> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>