Age | Commit message (Collapse) | Author |
|
commit 45928afe94a094bcda9af858b96673d59bc4a0e9 upstream.
The commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
limits the max allocatable memory via kvmalloc() to MAX_INT.
Reported-by: syzbot+cd43695a64bcd21b8596@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e9edc188fc76499b0b9bd60364084037f6d03773 upstream.
Syzbot was able to trigger the following warning [1]
No repro found by syzbot yet but I was able to trigger similar issue
by having 2 scripts running in parallel, changing conntrack hash sizes,
and:
for j in `seq 1 1000` ; do unshare -n /bin/true >/dev/null ; done
It would take more than 5 minutes for net_namespace structures
to be cleaned up.
This is because nf_ct_iterate_cleanup() has to restart everytime
a resize happened.
By adding a mutex, we can serialize hash resizes and cleanups
and also make get_next_corpse() faster by skipping over empty
buckets.
Even without resizes in the picture, this patch considerably
speeds up network namespace dismantles.
[1]
INFO: task syz-executor.0:8312 can't die for more than 144 seconds.
task:syz-executor.0 state:R running task stack:25672 pid: 8312 ppid: 6573 flags:0x00004006
Call Trace:
context_switch kernel/sched/core.c:4955 [inline]
__schedule+0x940/0x26f0 kernel/sched/core.c:6236
preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6408
preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
__local_bh_enable_ip+0x109/0x120 kernel/softirq.c:390
local_bh_enable include/linux/bottom_half.h:32 [inline]
get_next_corpse net/netfilter/nf_conntrack_core.c:2252 [inline]
nf_ct_iterate_cleanup+0x15a/0x450 net/netfilter/nf_conntrack_core.c:2275
nf_conntrack_cleanup_net_list+0x14c/0x4f0 net/netfilter/nf_conntrack_core.c:2469
ops_exit_list+0x10d/0x160 net/core/net_namespace.c:171
setup_net+0x639/0xa30 net/core/net_namespace.c:349
copy_net_ns+0x319/0x760 net/core/net_namespace.c:470
create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:226
ksys_unshare+0x445/0x920 kernel/fork.c:3128
__do_sys_unshare kernel/fork.c:3202 [inline]
__se_sys_unshare kernel/fork.c:3200 [inline]
__x64_sys_unshare+0x2d/0x40 kernel/fork.c:3200
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f63da68e739
RSP: 002b:00007f63d7c05188 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f63da792f80 RCX: 00007f63da68e739
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000
RBP: 00007f63da6e8cc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f63da792f80
R13: 00007fff50b75d3f R14: 00007f63d7c05300 R15: 0000000000022000
Showing all locks held in the system:
1 lock held by khungtaskd/27:
#0: ffffffff8b980020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
2 locks held by kworker/u4:2/153:
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
#0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2268
#1: ffffc9000140fdb0 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2272
1 lock held by systemd-udevd/2970:
1 lock held by in:imklog/6258:
#0: ffff88807f970ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990
3 locks held by kworker/1:6/8158:
1 lock held by syz-executor.0/8312:
2 locks held by kworker/u4:13/9320:
1 lock held by syz-executor.5/10178:
1 lock held by syz-executor.4/10217:
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7bbc3d385bd813077acaf0e6fdb2a86a901f5382 upstream.
The commit
commit 7661809d493b426e979f39ab512e3adf41fbcc69
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed Jul 14 09:45:49 2021 -0700
mm: don't allow oversized kvmalloc() calls
limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the
same limit in ipset.
Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com
Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com
Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a9f5970767d11eadc805d5283f202612c7ba1f59 upstream.
up->corkflag field can be read or written without any lock.
Annotate accesses to avoid possible syzbot/KCSAN reports.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 35306eb23814444bd4021f8a1c3047d3cb0c8b2b ]
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.
Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.
Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d5ef190693a7d76c5c192d108e8dec48307b46ee ]
Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul()
also removed rcu protection of individual filters which causes following
use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain
rcu read lock while iterating and taking the filter reference and temporary
release the lock while calling arg->fn() callback that can sleep.
KASAN trace:
[ 352.773640] ==================================================================
[ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower]
[ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987
[ 352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2
[ 352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 352.781022] Call Trace:
[ 352.781573] dump_stack_lvl+0x46/0x5a
[ 352.782332] print_address_description.constprop.0+0x1f/0x140
[ 352.783400] ? fl_walk+0x159/0x240 [cls_flower]
[ 352.784292] ? fl_walk+0x159/0x240 [cls_flower]
[ 352.785138] kasan_report.cold+0x83/0xdf
[ 352.785851] ? fl_walk+0x159/0x240 [cls_flower]
[ 352.786587] kasan_check_range+0x145/0x1a0
[ 352.787337] fl_walk+0x159/0x240 [cls_flower]
[ 352.788163] ? fl_put+0x10/0x10 [cls_flower]
[ 352.789007] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[ 352.790102] tcf_chain_dump+0x231/0x450
[ 352.790878] ? tcf_chain_tp_delete_empty+0x170/0x170
[ 352.791833] ? __might_sleep+0x2e/0xc0
[ 352.792594] ? tfilter_notify+0x170/0x170
[ 352.793400] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[ 352.794477] tc_dump_tfilter+0x385/0x4b0
[ 352.795262] ? tc_new_tfilter+0x1180/0x1180
[ 352.796103] ? __mod_node_page_state+0x1f/0xc0
[ 352.796974] ? __build_skb_around+0x10e/0x130
[ 352.797826] netlink_dump+0x2c0/0x560
[ 352.798563] ? netlink_getsockopt+0x430/0x430
[ 352.799433] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
[ 352.800542] __netlink_dump_start+0x356/0x440
[ 352.801397] rtnetlink_rcv_msg+0x3ff/0x550
[ 352.802190] ? tc_new_tfilter+0x1180/0x1180
[ 352.802872] ? rtnl_calcit.isra.0+0x1f0/0x1f0
[ 352.803668] ? tc_new_tfilter+0x1180/0x1180
[ 352.804344] ? _copy_from_iter_nocache+0x800/0x800
[ 352.805202] ? kasan_set_track+0x1c/0x30
[ 352.805900] netlink_rcv_skb+0xc6/0x1f0
[ 352.806587] ? rht_deferred_worker+0x6b0/0x6b0
[ 352.807455] ? rtnl_calcit.isra.0+0x1f0/0x1f0
[ 352.808324] ? netlink_ack+0x4d0/0x4d0
[ 352.809086] ? netlink_deliver_tap+0x62/0x3d0
[ 352.809951] netlink_unicast+0x353/0x480
[ 352.810744] ? netlink_attachskb+0x430/0x430
[ 352.811586] ? __alloc_skb+0xd7/0x200
[ 352.812349] netlink_sendmsg+0x396/0x680
[ 352.813132] ? netlink_unicast+0x480/0x480
[ 352.813952] ? __import_iovec+0x192/0x210
[ 352.814759] ? netlink_unicast+0x480/0x480
[ 352.815580] sock_sendmsg+0x6c/0x80
[ 352.816299] ____sys_sendmsg+0x3a5/0x3c0
[ 352.817096] ? kernel_sendmsg+0x30/0x30
[ 352.817873] ? __ia32_sys_recvmmsg+0x150/0x150
[ 352.818753] ___sys_sendmsg+0xd8/0x140
[ 352.819518] ? sendmsg_copy_msghdr+0x110/0x110
[ 352.820402] ? ___sys_recvmsg+0xf4/0x1a0
[ 352.821110] ? __copy_msghdr_from_user+0x260/0x260
[ 352.821934] ? _raw_spin_lock+0x81/0xd0
[ 352.822680] ? __handle_mm_fault+0xef3/0x1b20
[ 352.823549] ? rb_insert_color+0x2a/0x270
[ 352.824373] ? copy_page_range+0x16b0/0x16b0
[ 352.825209] ? perf_event_update_userpage+0x2d0/0x2d0
[ 352.826190] ? __fget_light+0xd9/0xf0
[ 352.826941] __sys_sendmsg+0xb3/0x130
[ 352.827613] ? __sys_sendmsg_sock+0x20/0x20
[ 352.828377] ? do_user_addr_fault+0x2c5/0x8a0
[ 352.829184] ? fpregs_assert_state_consistent+0x52/0x60
[ 352.830001] ? exit_to_user_mode_prepare+0x32/0x160
[ 352.830845] do_syscall_64+0x35/0x80
[ 352.831445] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 352.832331] RIP: 0033:0x7f7bee973c17
[ 352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17
[ 352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003
[ 352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40
[ 352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f
[ 352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088
[ 352.843784] Allocated by task 2960:
[ 352.844451] kasan_save_stack+0x1b/0x40
[ 352.845173] __kasan_kmalloc+0x7c/0x90
[ 352.845873] fl_change+0x282/0x22db [cls_flower]
[ 352.846696] tc_new_tfilter+0x6cf/0x1180
[ 352.847493] rtnetlink_rcv_msg+0x471/0x550
[ 352.848323] netlink_rcv_skb+0xc6/0x1f0
[ 352.849097] netlink_unicast+0x353/0x480
[ 352.849886] netlink_sendmsg+0x396/0x680
[ 352.850678] sock_sendmsg+0x6c/0x80
[ 352.851398] ____sys_sendmsg+0x3a5/0x3c0
[ 352.852202] ___sys_sendmsg+0xd8/0x140
[ 352.852967] __sys_sendmsg+0xb3/0x130
[ 352.853718] do_syscall_64+0x35/0x80
[ 352.854457] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 352.855830] Freed by task 7:
[ 352.856421] kasan_save_stack+0x1b/0x40
[ 352.857139] kasan_set_track+0x1c/0x30
[ 352.857854] kasan_set_free_info+0x20/0x30
[ 352.858609] __kasan_slab_free+0xed/0x130
[ 352.859348] kfree+0xa7/0x3c0
[ 352.859951] process_one_work+0x44d/0x780
[ 352.860685] worker_thread+0x2e2/0x7e0
[ 352.861390] kthread+0x1f4/0x220
[ 352.862022] ret_from_fork+0x1f/0x30
[ 352.862955] Last potentially related work creation:
[ 352.863758] kasan_save_stack+0x1b/0x40
[ 352.864378] kasan_record_aux_stack+0xab/0xc0
[ 352.865028] insert_work+0x30/0x160
[ 352.865617] __queue_work+0x351/0x670
[ 352.866261] rcu_work_rcufn+0x30/0x40
[ 352.866917] rcu_core+0x3b2/0xdb0
[ 352.867561] __do_softirq+0xf6/0x386
[ 352.868708] Second to last potentially related work creation:
[ 352.869779] kasan_save_stack+0x1b/0x40
[ 352.870560] kasan_record_aux_stack+0xab/0xc0
[ 352.871426] call_rcu+0x5f/0x5c0
[ 352.872108] queue_rcu_work+0x44/0x50
[ 352.872855] __fl_put+0x17c/0x240 [cls_flower]
[ 352.873733] fl_delete+0xc7/0x100 [cls_flower]
[ 352.874607] tc_del_tfilter+0x510/0xb30
[ 352.886085] rtnetlink_rcv_msg+0x471/0x550
[ 352.886875] netlink_rcv_skb+0xc6/0x1f0
[ 352.887636] netlink_unicast+0x353/0x480
[ 352.888285] netlink_sendmsg+0x396/0x680
[ 352.888942] sock_sendmsg+0x6c/0x80
[ 352.889583] ____sys_sendmsg+0x3a5/0x3c0
[ 352.890311] ___sys_sendmsg+0xd8/0x140
[ 352.891019] __sys_sendmsg+0xb3/0x130
[ 352.891716] do_syscall_64+0x35/0x80
[ 352.892395] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 352.893666] The buggy address belongs to the object at ffff8881c8251000
which belongs to the cache kmalloc-2k of size 2048
[ 352.895696] The buggy address is located 1152 bytes inside of
2048-byte region [ffff8881c8251000, ffff8881c8251800)
[ 352.897640] The buggy address belongs to the page:
[ 352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250
[ 352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0
[ 352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00
[ 352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[ 352.905861] page dumped because: kasan: bad access detected
[ 352.907323] Memory state around the buggy address:
[ 352.908218] ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 352.909471] ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 352.912012] ^
[ 352.912642] ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 352.913919] ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 352.915185] ==================================================================
Fixes: d39d714969cd ("idr: introduce idr_for_each_entry_continue_ul()")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 597aa16c782496bf74c5dc3b45ff472ade6cee64 ]
Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop()
to get the length of rtnexthop correct.
Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info")
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3f4a08909e2c740f8045efc74c4cf82eeaae3e36 ]
current Linux refuses to change the 'backup' bit of MPTCP endpoints, i.e.
using MPTCP_PM_CMD_SET_FLAGS, unless it finds (at least) one subflow that
matches the endpoint address. There is no reason for that, so we can just
ignore the return value of mptcp_nl_addr_backup(). In this way, endpoints
can reconfigure their 'backup' flag even if no MPTCP sockets are open (or
more generally, in case the MP_PRIO message is not sent out).
Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ea1300b9df7c8e8b65695a08b8f6aaf4b25fec9c ]
mptcp_token_get_sock() may return a mptcp socket that is in
a different net namespace than the socket that received the token value.
The mptcp syncookie code path had an explicit check for this,
this moves the test into mptcp_token_get_sock() function.
Eventually token.c should be converted to pernet storage, but
such change is not suitable for net tree.
Fixes: 2c5ebd001d4f0 ("mptcp: refactor token container")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f7e745f8e94492a8ac0b0a26e25f2b19d342918f ]
We should always check if skb_header_pointer's return is NULL before
using it, otherwise it may cause null-ptr-deref, as syzbot reported:
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:sctp_rcv_ootb net/sctp/input.c:705 [inline]
RIP: 0010:sctp_rcv+0x1d84/0x3220 net/sctp/input.c:196
Call Trace:
<IRQ>
sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1109
ip6_protocol_deliver_rcu+0x2e9/0x1ca0 net/ipv6/ip6_input.c:422
ip6_input_finish+0x62/0x170 net/ipv6/ip6_input.c:463
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472
dst_input include/net/dst.h:460 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ipv6_rcv+0x28c/0x3c0 net/ipv6/ip6_input.c:297
Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize")
Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b9731062ce8afd35cf723bf3a8ad55d208f915a5 ]
The pointer here points directly into the frame, so the
access is potentially unaligned. Use get_unaligned_le16
to avoid that.
Fixes: 3f52b7e328c5 ("mac80211: mesh power save basics")
Link: https://lore.kernel.org/r/20210920154009.3110ff75be0c.Ib6a2ff9e9cc9bc6fca50fce631ec1ce725cc926b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 13cb6d826e0ac0d144b0d48191ff1a111d32f0c6 ]
Limit max values for vht mcs and nss in ieee80211_parse_tx_radiotap
routine in order to fix the following warning reported by syzbot:
WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
Modules linked in:
CPU: 0 PID: 10717 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
RIP: 0010:ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
RSP: 0018:ffffc9000186f3e8 EFLAGS: 00010216
RAX: 0000000000000618 RBX: ffff88804ef76500 RCX: ffffc900143a5000
RDX: 0000000000040000 RSI: ffffffff888f478e RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000100
R10: ffffffff888f46f9 R11: 0000000000000000 R12: 00000000fffffff8
R13: ffff88804ef7653c R14: 0000000000000001 R15: 0000000000000004
FS: 00007fbf5718f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2de23000 CR3: 000000006a671000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
ieee80211_monitor_select_queue+0xa6/0x250 net/mac80211/iface.c:740
netdev_core_pick_tx+0x169/0x2e0 net/core/dev.c:4089
__dev_queue_xmit+0x6f9/0x3710 net/core/dev.c:4165
__bpf_tx_skb net/core/filter.c:2114 [inline]
__bpf_redirect_no_mac net/core/filter.c:2139 [inline]
__bpf_redirect+0x5ba/0xd20 net/core/filter.c:2162
____bpf_clone_redirect net/core/filter.c:2429 [inline]
bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2401
bpf_prog_eeb6f53a69e5c6a2+0x59/0x234
bpf_dispatcher_nop_func include/linux/bpf.h:717 [inline]
__bpf_prog_run include/linux/filter.h:624 [inline]
bpf_prog_run include/linux/filter.h:631 [inline]
bpf_test_run+0x381/0xa30 net/bpf/test_run.c:119
bpf_prog_test_run_skb+0xb84/0x1ee0 net/bpf/test_run.c:663
bpf_prog_test_run kernel/bpf/syscall.c:3307 [inline]
__sys_bpf+0x2137/0x5df0 kernel/bpf/syscall.c:4605
__do_sys_bpf kernel/bpf/syscall.c:4691 [inline]
__se_sys_bpf kernel/bpf/syscall.c:4689 [inline]
__x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Reported-by: syzbot+0196ac871673f0c20f68@syzkaller.appspotmail.com
Fixes: 646e76bb5daf4 ("mac80211: parse VHT info in injected frames")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c26c3f02dcb38ab63b2f2534cb463d95ee81bb13.1632141760.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fe94bac626d9c1c5bc98ab32707be8a9d7f8adba ]
In ieee80211_amsdu_aggregate() set a pointer frag_tail point to the
end of skb_shinfo(head)->frag_list, and use it to bind other skb in
the end of this function. But when execute ieee80211_amsdu_aggregate()
->ieee80211_amsdu_realloc_pad()->pskb_expand_head(), the address of
skb_shinfo(head)->frag_list will be changed. However, the
ieee80211_amsdu_aggregate() not update frag_tail after call
pskb_expand_head(). That will cause the second skb can't bind to the
head skb appropriately.So we update the address of frag_tail to fix it.
Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20210830073240.12736-1-pkshih@realtek.com
[reword comment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 98d46b021f6ee246c7a73f9d490d4cddb4511a3b ]
This reverts commit d333322361e7 ("mac80211: do not use low data rates for
data frames with no ack flag").
Returning false early in rate_control_send_low breaks sending broadcast
packets, since rate control will not select a rate for it.
Before re-introducing a fixed version of this patch, we should probably also
make some changes to rate control to be more conservative in selecting rates
for no-ack packets and also prevent using probing rates on them, since we won't
get any feedback.
Fixes: d333322361e7 ("mac80211: do not use low data rates for data frames with no ack flag")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210906083559.9109-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b53deef054e58fe4f37c66211b8ece9f8fc1aa13 ]
iptables/nftables has two types of log modules:
1. backend, e.g. nf_log_syslog, which implement the functionality
2. frontend, e.g. xt_LOG or nft_log, which call the functionality
provided by backend based on nf_tables or xtables rule set.
Problem is that the request_module() call to load the backed in
nf_logger_find_get() might happen with nftables transaction mutex held
in case the call path is via nf_tables/nft_compat.
This can cause deadlocks (see 'Fixes' tags for details).
The chosen solution as to let modprobe deal with this by adding 'pre: '
soft dep tag to xt_LOG (to load the syslog backend) and xt_NFLOG (to
load nflog backend).
Eric reports that this breaks on systems with older modprobe that
doesn't support softdeps.
Another, similar issue occurs when someone either insmods xt_(NF)LOG
directly or unloads the backend module (possible if no log frontend
is in use): because the frontend module is already loaded, modprobe is
not invoked again so the softdep isn't evaluated.
Add a workaround: If nf_logger_find_get() returns -ENOENT and call
is not via nft_compat, load the backend explicitly and try again.
Else, let nft_compat ask for deferred request_module via nf_tables
infra.
Softdeps are kept in-place, so with newer modprobe the dependencies
are resolved from userspace.
Fixes: cefa31a9d461 ("netfilter: nft_log: perform module load from nf_tables")
Fixes: a38b5b56d6f4 ("netfilter: nf_log: add module softdeps")
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a499b03bf36b0c2e3b958a381d828678ab0ffc5e ]
syzbot reports following UAF:
BUG: KASAN: use-after-free in memcmp+0x18f/0x1c0 lib/string.c:955
nla_strcmp+0xf2/0x130 lib/nlattr.c:836
nft_table_lookup.part.0+0x1a2/0x460 net/netfilter/nf_tables_api.c:570
nft_table_lookup net/netfilter/nf_tables_api.c:4064 [inline]
nf_tables_getset+0x1b3/0x860 net/netfilter/nf_tables_api.c:4064
nfnetlink_rcv_msg+0x659/0x13f0 net/netfilter/nfnetlink.c:285
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
Problem is that all get operations are lockless, so the commit_mutex
held by nft_rcv_nl_event() isn't enough to stop a parallel GET request
from doing read-accesses to the table object even after synchronize_rcu().
To avoid this, unlink the table first and store the table objects in
on-stack scratch space.
Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Reported-and-tested-by: syzbot+f31660cf279b0557160c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 69e73dbfda14fbfe748d3812da1244cce2928dcb ]
ip_vs_conn_tab_bits may be provided by the user through the
conn_tab_bits module parameter. If this value is greater than 31, or
less than 0, the shift operator used to derive tab_size causes undefined
behaviour.
Fix this checking ip_vs_conn_tab_bits value to be in the range specified
in ipvs Kconfig. If not, simply use default value.
Fixes: 6f7edb4881bf ("IPVS: Allow boot time change of hash size")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 94513069eb549737bcfc3d988d6ed4da948a2de8 upstream.
When PN checking is done in mac80211, for fragmentation we need
to copy the PN to the RX struct so we can later use it to do a
comparison, since commit bf30ca922a0c ("mac80211: check defrag
PN against current frame").
Unfortunately, in that commit I used the 'hdr' variable without
it being necessarily valid, so use-after-free could occur if it
was necessary to reallocate (parts of) the frame.
Fix this by reloading the variable after the code that results
in the reallocations, if any.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=214401.
Cc: stable@vger.kernel.org
Fixes: bf30ca922a0c ("mac80211: check defrag PN against current frame")
Link: https://lore.kernel.org/r/20210927115838.12b9ac6bb233.I1d066acd5408a662c3b6e828122cd314fcb28cdb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e87b5052271e39d62337ade531992b7e5d8c2cfa ]
only increase fib6_sernum in net namespace after add fib6_info
successfully.
Signed-off-by: zhang kai <zhangkaiheb@126.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3106a0847525befe3e22fc723909d1b21eb0d520 ]
syzkaller discovered memory leaks [1] that can be reduced to the
following commands:
# ip nexthop add id 1 blackhole
# devlink dev reload pci/0000:06:00.0
As part of the reload flow, mlxsw will unregister its netdevs and then
unregister from the nexthop notification chain. Before unregistering
from the notification chain, mlxsw will receive delete notifications for
nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw
will not receive notifications for nexthops using netdevs that are not
dismantled as part of the reload flow. For example, the blackhole
nexthop above that internally uses the loopback netdev as its nexthop
device.
One way to fix this problem is to have listeners flush their nexthop
tables after unregistering from the notification chain. This is
error-prone as evident by this patch and also not symmetric with the
registration path where a listener receives a dump of all the existing
nexthops.
Therefore, fix this problem by replaying delete notifications for the
listener being unregistered. This is symmetric to the registration path
and also consistent with the netdev notification chain.
The above means that unregister_nexthop_notifier(), like
register_nexthop_notifier(), will have to take RTNL in order to iterate
over the existing nexthops and that any callers of the function cannot
hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN
driver. To avoid a deadlock, change the latter to unregister its nexthop
listener without holding RTNL, making it symmetric to the registration
path.
[1]
unreferenced object 0xffff88806173d600 (size 512):
comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s)
hex dump (first 32 bytes):
41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff A..`......sa....
08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00 ..sa............
backtrace:
[<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522
[<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline]
[<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline]
[<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231
[<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline]
[<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline]
[<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline]
[<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline]
[<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239
[<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83
[<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline]
[<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306
[<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244
[<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline]
[<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline]
[<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913
[<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572
[<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504
[<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590
[<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
[<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340
[<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929
[<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline]
[<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline]
[<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409
[<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463
[<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492
[<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline]
[<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline]
[<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499
Fixes: 2a014b200bbd ("mlxsw: spectrum_router: Add support for nexthop objects")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 977d293e23b48a1129830d7968605f61c4af71a0 ]
Due to signed/unsigned comparison, the expression:
info->size_goal - skb->len > 0
evaluates to true when the size goal is smaller than the
skb size. That results in lack of tx cache refill, so that
the skb allocated by the core TCP code lacks the required
MPTCP skb extensions.
Due to the above, syzbot is able to trigger the following WARN_ON():
WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
Modules linked in:
CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
FS: 00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
release_sock+0xb4/0x1b0 net/core/sock.c:3206
sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
call_write_iter include/linux/fs.h:2163 [inline]
new_sync_write+0x40b/0x640 fs/read_write.c:507
vfs_write+0x7cf/0xae0 fs/read_write.c:594
ksys_write+0x1ee/0x250 fs/read_write.c:647
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000
Fix the issue rewriting the relevant expression to avoid
sign-related problems - note: size_goal is always >= 0.
Additionally, ensure that the skb in the tx cache always carries
the relevant extension.
Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
Fixes: 1094c6fe7280 ("mptcp: fix possible divide by zero")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5135e96a3dd2f4555ae6981c3155a62bcf3227f6 ]
The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:
spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
-> spi_unregister_device(to_spi_device(dev));
-> device_del(&spi->dev);
So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:
i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
-> i2c_unregister_device(client);
-> device_unregister(&client->dev);
The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).
So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.
This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:
void mdiobus_free(struct mii_bus *bus)
{
/* For compatibility with error handling in drivers. */
if (bus->state == MDIOBUS_ALLOCATED) {
kfree(bus);
return;
}
BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
bus->state = MDIOBUS_RELEASED;
put_device(&bus->dev);
}
In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.
I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:
(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
located on a bus that unregisters its children on shutdown. After
the blamed patch, either both the alloc and the register should use
devres, or none should.
(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
mdiobus_unregister at all in the remove path. After the blamed
patch, nobody unregisters the MDIO bus anymore, so this is even more
buggy than the previous case which needs a specific bus
configuration to be seen, this one is an unconditional bug.
In this case, DSA falls into category (a), it tries to be helpful and
registers an MDIO bus on behalf of the switch, which might be on such a
bus. I've no idea why it does it under devres.
It does this on probe:
if (!ds->slave_mii_bus && ds->ops->phy_read)
alloc and register mdio bus
and this on remove:
if (ds->slave_mii_bus && ds->ops->phy_read)
unregister mdio bus
I _could_ imagine using devres because the condition used on remove is
different than the condition used on probe. So strictly speaking, DSA
cannot determine whether the ds->slave_mii_bus it sees on remove is the
ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
have solved that problem. But nonetheless, the existing code already
proceeds to unregister the MDIO bus, even though it might be
unregistering an MDIO bus it has never registered. So I can only guess
that no driver that implements ds->ops->phy_read also allocates and
registers ds->slave_mii_bus itself.
So in that case, if unregistering is fine, freeing must be fine too.
Stop using devres and free the MDIO bus manually. This will make devres
stop attempting to free a still registered MDIO bus on ->shutdown.
Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e5845aa0eadda3d8a950eb8845c1396827131f30 ]
Since the blamed commit, dsa_tree_teardown_switches() was split into two
smaller functions, dsa_tree_teardown_switches and dsa_tree_teardown_ports.
However, the error path of dsa_tree_setup stopped calling dsa_tree_teardown_ports.
Fixes: a57d8c217aad ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a18cee4791b1123d0a6579a7c89f4b87e48abe03 ]
The abort_work is scheduled when a connection was detected to be
out-of-sync after a link failure. The work calls smc_conn_kill(),
which calls smc_close_active_abort() and that might end up calling
smc_close_cancel_work().
smc_close_cancel_work() cancels any pending close_work and tx_work but
needs to release the sock_lock before and acquires the sock_lock again
afterwards. So when the sock_lock was NOT acquired before then it may
be held after the abort_work completes. Thats why the sock_lock is
acquired before the call to smc_conn_kill() in __smc_lgr_terminate(),
but this is missing in smc_conn_abort_work().
Fix that by acquiring the sock_lock first and release it after the
call to smc_conn_kill().
Fixes: b286a0651e44 ("net/smc: handle incoming CDC validation message")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6c90731980655280ea07ce4b21eb97457bf86286 ]
Coverity stumbled over a missing error check in smc_clc_prfx_set():
*** CID 1475954: Error handling issues (CHECKED_RETURN)
/net/smc/smc_clc.c: 233 in smc_clc_prfx_set()
>>> CID 1475954: Error handling issues (CHECKED_RETURN)
>>> Calling "kernel_getsockname" without checking return value (as is done elsewhere 8 out of 10 times).
233 kernel_getsockname(clcsock, (struct sockaddr *)&addrs);
Add the return code check in smc_clc_prfx_set().
Fixes: c246d942eabc ("net/smc: restructure netinfo for CLC proposal msgs")
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3765996e4f0b8a755cab215a08df744490c76052 ]
The process will cause napi.state to contain NAPI_STATE_SCHED and
not in the poll_list, which will cause napi_disable() to get stuck.
The prefix "NAPI_STATE_" is removed in the figure below, and
NAPI_STATE_HASHED is ignored in napi.state.
CPU0 | CPU1 | napi.state
===============================================================================
napi_disable() | | SCHED | NPSVC
napi_enable() | |
{ | |
smp_mb__before_atomic(); | |
clear_bit(SCHED, &n->state); | | NPSVC
| napi_schedule_prep() | SCHED | NPSVC
| napi_poll() |
| napi_complete_done() |
| { |
| if (n->state & (NPSVC | | (1)
| _BUSY_POLL))) |
| return false; |
| ................ |
| } | SCHED | NPSVC
| |
clear_bit(NPSVC, &n->state); | | SCHED
} | |
| |
napi_schedule_prep() | | SCHED | MISSED (2)
(1) Here return direct. Because of NAPI_STATE_NPSVC exists.
(2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list
Since NAPI_STATE_SCHED already exists and napi is not in the
sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always
exist.
1. This will cause this queue to no longer receive packets.
2. If you encounter napi_disable under the protection of rtnl_lock, it
will cause the entire rtnl_lock to be locked, affecting the overall
system.
This patch uses cmpxchg to implement napi_enable(), which ensures that
there will be no race due to the separation of clear two bits.
Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
on error
[ Upstream commit fd292c189a979838622d5e03e15fa688c81dd50b ]
Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")
decided it was fine to ignore errors on certain ports that fail to
probe, and go on with the ports that do probe fine.
Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port")
noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get
called, and devlink notices after a timeout of 3600 seconds and prints a
WARN_ON. So it went ahead to unregister the devlink port. And because
there exists an UNUSED port flavour, we actually re-register the devlink
port as UNUSED.
Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to
DSA") added devlink port regions, which are set up by the driver and not
by DSA.
When we trigger the devlink port deregistration and reregistration as
unused, devlink now prints another WARN_ON, from here:
devlink_port_unregister:
WARN_ON(!list_empty(&devlink_port->region_list));
So the port still has regions, which makes sense, because they were set
up by the driver, and the driver doesn't know we're unregistering the
devlink port.
Somebody needs to tear them down, and optionally (actually it would be
nice, to be consistent) set them up again for the new devlink port.
But DSA's layering stays in our way quite badly here.
The options I've considered are:
1. Introduce a function in devlink to just change a port's type and
flavour. No dice, devlink keeps a lot of state, it really wants the
port to not be registered when you set its parameters, so changing
anything can only be done by destroying what we currently have and
recreating it.
2. Make DSA cache the parameters passed to dsa_devlink_port_region_create,
and the region returned, keep those in a list, then when the devlink
port unregister needs to take place, the existing devlink regions are
destroyed by DSA, and we replay the creation of new regions using the
cached parameters. Problem: mv88e6xxx keeps the region pointers in
chip->ports[port].region, and these will remain stale after DSA frees
them. There are many things DSA can do, but updating mv88e6xxx's
private pointers is not one of them.
3. Just let the driver do it (i.e. introduce a very specific method
called ds->ops->port_reinit_as_unused, which unregisters its devlink
port devlink regions, then the old devlink port, then registers the
new one, then the devlink port regions for it). While it does work,
as opposed to the others, it's pretty horrible from an API
perspective and we can do better.
4. Introduce a new pair of methods, ->port_setup and ->port_teardown,
which in the case of mv88e6xxx must register and unregister the
devlink port regions. Call these 2 methods when the port must be
reinitialized as unused.
Naturally, I went for the 4th approach.
Fixes: 08156ba430b4 ("net: dsa: Add devlink port regions support to DSA")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 563f23b002534176f49524b5ca0e1d94d8906c40 upstream.
The resilient nexthop group torture tests in fib_nexthop.sh exposed a
possible division by zero while replacing a resilient group [1]. The
division by zero occurs when the data path sees a resilient nexthop
group with zero buckets.
The tests replace a resilient nexthop group in a loop while traffic is
forwarded through it. The tests do not specify the number of buckets
while performing the replacement, resulting in the kernel allocating a
stub resilient table (i.e, 'struct nh_res_table') with zero buckets.
This table should never be visible to the data path, but the old nexthop
group (i.e., 'oldg') might still be used by the data path when the stub
table is assigned to it.
Fix this by only assigning the stub table to the old nexthop group after
making sure the group is no longer used by the data path.
Tested with fib_nexthops.sh:
Tests passed: 222
Tests failed: 0
[1]
divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
RIP: 0010:nexthop_select_path+0x2d2/0x1a80
[...]
Call Trace:
fib_select_multipath+0x79b/0x1530
fib_select_path+0x8fb/0x1c10
ip_route_output_key_hash_rcu+0x1198/0x2da0
ip_route_output_key_hash+0x190/0x340
ip_route_output_flow+0x21/0x120
raw_sendmsg+0x91d/0x2e10
inet_sendmsg+0x9e/0xe0
__sys_sendto+0x23d/0x360
__x64_sys_sendto+0xe1/0x1b0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: stable@vger.kernel.org
Fixes: 283a72a5599e ("nexthop: Add implementation of resilient next-hop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e38b3f20059426a0adbde014ff71071739ab5226 ]
alloc_pages_bulk_array() attempts to allocate at least one page based on
the provided pages, and then opportunistically allocates more if that
can be done without dropping the spinlock.
So if it returns fewer than requested, that could just mean that it
needed to drop the lock. In that case, try again immediately.
Only pause for a time if no progress could be made.
Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com>
Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org>
Fixes: f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Mel Gorman <mgorman@suse.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f997ea3b7afc108eb9761f321b57de2d089c7c48 upstream.
This ensures we don't leak the sysfs file if we failed to
allocate chan->vc_wq during probe.
Link: http://lkml.kernel.org/r/20210517083557.172-1-xieyongji@bytedance.com
Fixes: 86c8437383ac ("net/9p: Add sysfs mount_tag file for virtio 9P device")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fe63339ef36bfb1cc12962015a9b254170eea057 ]
This reverts commit 9cf448c200ba9935baa94e7a0964598ce947db9d.
This commit was added for equivalence with a similar fix to ip_gre.
That fix proved to have a bug. Upon closer inspection, ip6_gre is not
susceptible to the original bug.
So revert the unnecessary extra check.
In short, ipgre_xmit calls skb_pull to remove ipv4 headers previously
inserted by dev_hard_header. ip6gre_tunnel_xmit does not.
Link: https://lore.kernel.org/netdev/CA+FuTSe+vJgTVLc9SojGuN-f9YQ+xWLPKE_S4f=f+w+_P2hgUg@mail.gmail.com/#t
Fixes: 9cf448c200ba ("ip6_gre: add validation for csum_start")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8a0ed250f911da31a2aef52101bc707846a800ff ]
The GRE tunnel device can pull existing outer headers in ipge_xmit.
This is a rare path, apparently unique to this device. The below
commit ensured that pulling does not move skb->data beyond csum_start.
But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and
thus csum_start is irrelevant.
Refine to exclude this. At the same time simplify and strengthen the
test.
Simplify, by moving the check next to the offending pull, making it
more self documenting and removing an unnecessary branch from other
code paths.
Strengthen, by also ensuring that the transport header is correct and
therefore the inner headers will be after skb_reset_inner_headers.
The transport header is set to csum_start in skb_partial_csum_set.
Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start")
Reported-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c7c5e6ff533fe1f9afef7d2fa46678987a1335a7 ]
syzbot found that forcing a big quantum attribute would crash hosts fast,
essentially using this:
tc qd replace dev eth0 root fq_codel quantum 4294967295
This is because fq_codel_dequeue() would have to loop
~2^31 times in :
if (flow->deficit <= 0) {
flow->deficit += q->quantum;
list_move_tail(&flow->flowchain, &q->old_flows);
goto begin;
}
SFQ max quantum is 2^19 (half a megabyte)
Lets adopt a max quantum of one megabyte for FQ_CODEL.
Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 730affed24bffcd1eebd5903171960f5ff9f1f22 ]
Bug reported by KASAN:
BUG: KASAN: use-after-scope in inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
Call Trace:
(...)
inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
(...)
nf_sk_lookup_slow_v6 (net/ipv6/netfilter/nf_socket_ipv6.c:91
net/ipv6/netfilter/nf_socket_ipv6.c:146)
It seems that this bug has already been fixed by Eric Dumazet in the
past in:
commit 78296c97ca1f ("netfilter: xt_socket: fix a stack corruption bug")
But a variant of the same issue has been introduced in
commit d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")
`daddr` and `saddr` potentially hold a reference to ipv6_var that is no
longer in scope when the call to `nf_socket_get_sock_v6` is made.
Fixes: d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 340fa6667a696338e707cd5531a9631093d1be29 ]
Recent changes exposed a bug where specifically-timed requests to the
path manager netlink API could trigger a divide-by-zero in
__tcp_select_window(), as syzkaller does:
divide error: 0000 [#1] SMP KASAN NOPTI
CPU: 0 PID: 9667 Comm: syz-executor.0 Not tainted 5.14.0-rc6+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:__tcp_select_window+0x509/0xa60 net/ipv4/tcp_output.c:3016
Code: 44 89 ff e8 c9 29 e9 fd 45 39 e7 0f 8d 20 ff ff ff e8 db 28 e9 fd 44 89 e3 e9 13 ff ff ff e8 ce 28 e9 fd 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 b7 28 e9 fd 44 89 f1 48 89 ea
RSP: 0018:ffff888031ccf020 EFLAGS: 00010216
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000040000
RDX: 0000000000000000 RSI: ffff88811532c080 RDI: 0000000000000002
RBP: 0000000000000000 R08: ffffffff835807c2 R09: 0000000000000000
R10: 0000000000000004 R11: ffffed1020b92441 R12: 0000000000000000
R13: 1ffff11006399e08 R14: 0000000000000000 R15: 0000000000000000
FS: 00007fa4c8344700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f424000 CR3: 000000003e4e2003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
tcp_select_window net/ipv4/tcp_output.c:264 [inline]
__tcp_transmit_skb+0xc00/0x37a0 net/ipv4/tcp_output.c:1351
__tcp_send_ack.part.0+0x3ec/0x760 net/ipv4/tcp_output.c:3972
__tcp_send_ack net/ipv4/tcp_output.c:3978 [inline]
tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3978
mptcp_pm_nl_addr_send_ack+0x1ab/0x380 net/mptcp/pm_netlink.c:654
mptcp_pm_remove_addr+0x161/0x200 net/mptcp/pm.c:58
mptcp_nl_remove_id_zero_address+0x197/0x460 net/mptcp/pm_netlink.c:1328
mptcp_nl_cmd_del_addr+0x98b/0xd40 net/mptcp/pm_netlink.c:1359
genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0x14e/0x190 net/socket.c:724
____sys_sendmsg+0x709/0x870 net/socket.c:2403
___sys_sendmsg+0xff/0x170 net/socket.c:2457
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
mptcp_pm_nl_addr_send_ack() was attempting to send a TCP ACK on the
first subflow in the MPTCP socket's connection list without validating
that the subflow was in a suitable connection state. To address this,
always validate subflow state when sending extra ACKs on subflows
for address advertisement or subflow priority change.
Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/229
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1094c6fe7280e17e0e87934add5ad2585e990def ]
Florian noted that if mptcp_alloc_tx_skb() allocation fails
in __mptcp_push_pending(), we can end-up invoking
mptcp_push_release()/tcp_push() with a zero mss, causing
a divide by 0 error.
This change addresses the issue refactoring the skb allocation
code checking if skb collapsing will happen for sure and doing
the skb allocation only after such check. Skb allocation will
now happen only after the call to tcp_send_mss() which
correctly initializes mss_now.
As side bonuses we now fill the skb tx cache only when needed,
and this also clean-up a bit the output path.
v1 -> v2:
- use lockdep_assert_held_once() - Jakub
- fix indentation - Jakub
Reported-by: Florian Westphal <fw@strlen.de>
Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0e90dfa7a8d817db755c7b5d89d77b9c485e4180 ]
I noticed that only port 0 worked on the RTL8366RB since we
started to use custom tags.
It turns out that the format of egress custom tags is actually
different from ingress custom tags. While the lower bits just
contain the port number in ingress tags, egress tags need to
indicate destination port by setting the bit for the
corresponding port.
It was working on port 0 because port 0 added 0x00 as port
number in the lower bits, and if you do this the packet appears
at all ports, including the intended port. Ooops.
Fix this and all ports work again. Use the define for shifting
the "type A" into place while we're at it.
Tested on the D-Link DIR-685 by sending traffic to each of
the ports in turn. It works.
Fixes: 86dd9868b878 ("net: dsa: tag_rtl4_a: Support also egress tags")
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e3245a7b7b34bd2e97f744fd79463add6e9d41f4 ]
Syzbot hit use-after-free in nf_tables_dump_sets. The problem was in
missing lock protection for nft_ct_pcpu_template_refcnt.
Before commit f102d66b335a ("netfilter: nf_tables: use dedicated
mutex to guard transactions") all transactions were serialized by global
mutex, but then global mutex was changed to local per netnamespace
commit_mutex.
This change causes use-after-free bug, when 2 netnamespaces concurently
changing nft_ct_pcpu_template_refcnt without proper locking. Fix it by
adding nft_ct_pcpu_mutex and protect all nft_ct_pcpu_template_refcnt
changes with it.
Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-and-tested-by: syzbot+649e339fa6658ee623d3@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9aca491e0dccf8a9d84a5b478e5eee3c6ea7803b ]
This patch fixes kernel NULL pointer dereference when creating nexthop
which is bound with SRv6 decapsulation. In the creation of nexthop,
__seg6_end_dt_vrf_build is called. __seg6_end_dt_vrf_build expects
fc_lninfo in fib6_config is set correctly, but it isn't set in
nh_create_ipv6, which causes kernel crash.
Here is steps to reproduce kernel crash:
1. modprobe vrf
2. ip -6 nexthop add encap seg6local action End.DT4 vrftable 1 dev eth0
We got the following message:
[ 901.370336] BUG: kernel NULL pointer dereference, address: 0000000000000ba0
[ 901.371658] #PF: supervisor read access in kernel mode
[ 901.372672] #PF: error_code(0x0000) - not-present page
[ 901.373672] PGD 0 P4D 0
[ 901.374248] Oops: 0000 [#1] SMP PTI
[ 901.374944] CPU: 0 PID: 8593 Comm: ip Not tainted 5.14-051400-generic #202108310811-Ubuntu
[ 901.376404] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module_el8.2.0+320+13f867d7 04/01/2014
[ 901.377907] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf]
[ 901.379182] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00
[ 901.382652] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282
[ 901.383746] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8
[ 901.385436] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
[ 901.386924] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40
[ 901.388537] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000
[ 901.389937] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b
[ 901.391226] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000
[ 901.392737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 901.393803] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0
[ 901.395122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 901.396496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 901.397833] PKRU: 55555554
[ 901.398578] Call Trace:
[ 901.399144] l3mdev_ifindex_lookup_by_table_id+0x3b/0x70
[ 901.400179] __seg6_end_dt_vrf_build+0x34/0xd0
[ 901.401067] seg6_end_dt4_build+0x16/0x20
[ 901.401904] seg6_local_build_state+0x271/0x430
[ 901.402797] lwtunnel_build_state+0x81/0x130
[ 901.403645] fib_nh_common_init+0x82/0x100
[ 901.404465] ? sock_def_readable+0x4b/0x80
[ 901.405285] fib6_nh_init+0x115/0x7c0
[ 901.406033] nh_create_ipv6.isra.0+0xe1/0x140
[ 901.406932] rtm_new_nexthop+0x3b7/0xeb0
[ 901.407828] rtnetlink_rcv_msg+0x152/0x3a0
[ 901.408663] ? rtnl_calcit.isra.0+0x130/0x130
[ 901.409535] netlink_rcv_skb+0x55/0x100
[ 901.410319] rtnetlink_rcv+0x15/0x20
[ 901.411026] netlink_unicast+0x1a8/0x250
[ 901.411813] netlink_sendmsg+0x238/0x470
[ 901.412602] ? _copy_from_user+0x2b/0x60
[ 901.413394] sock_sendmsg+0x65/0x70
[ 901.414112] ____sys_sendmsg+0x218/0x290
[ 901.414929] ? copy_msghdr_from_user+0x5c/0x90
[ 901.415814] ___sys_sendmsg+0x81/0xc0
[ 901.416559] ? fsnotify_destroy_marks+0x27/0xf0
[ 901.417447] ? call_rcu+0xa4/0x230
[ 901.418153] ? kmem_cache_free+0x23f/0x410
[ 901.418972] ? dentry_free+0x37/0x70
[ 901.419705] ? mntput_no_expire+0x4c/0x260
[ 901.420574] __sys_sendmsg+0x62/0xb0
[ 901.421297] __x64_sys_sendmsg+0x1f/0x30
[ 901.422057] do_syscall_64+0x5c/0xc0
[ 901.422756] ? syscall_exit_to_user_mode+0x27/0x50
[ 901.423675] ? __x64_sys_close+0x12/0x40
[ 901.424462] ? do_syscall_64+0x69/0xc0
[ 901.425219] ? irqentry_exit_to_user_mode+0x9/0x20
[ 901.426149] ? irqentry_exit+0x19/0x30
[ 901.426901] ? exc_page_fault+0x89/0x160
[ 901.427709] ? asm_exc_page_fault+0x8/0x30
[ 901.428536] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 901.429514] RIP: 0033:0x7fe493945747
[ 901.430248] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 901.433549] RSP: 002b:00007ffe9932cf68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 901.434981] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe493945747
[ 901.436303] RDX: 0000000000000000 RSI: 00007ffe9932cfe0 RDI: 0000000000000003
[ 901.437607] RBP: 00000000613053f7 R08: 0000000000000001 R09: 00007ffe9932d07c
[ 901.438990] R10: 000055f4a903a010 R11: 0000000000000246 R12: 0000000000000001
[ 901.440340] R13: 0000000000000001 R14: 000055f4a802b163 R15: 000055f4a8042020
[ 901.441630] Modules linked in: vrf nls_utf8 isofs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit rapl input_leds joydev serio_raw qemu_fw_cfg mac_hid sch_fq_codel drm virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd virtio_net net_failover cryptd psmouse virtio_blk failover i2c_piix4 pata_acpi floppy
[ 901.450808] CR2: 0000000000000ba0
[ 901.451514] ---[ end trace c27b934b99ade304 ]---
[ 901.452403] RIP: 0010:vrf_ifindex_lookup_by_table_id+0x19/0x90 [vrf]
[ 901.453626] Code: c1 e9 72 ff ff ff e8 96 49 01 c2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 89 f5 41 54 53 8b 05 47 4c 00 00 <48> 8b 97 a0 0b 00 00 48 8b 1c c2 e8 57 27 53 c1 4c 8d a3 88 00 00
[ 901.456910] RSP: 0018:ffffbf2d02043590 EFLAGS: 00010282
[ 901.457912] RAX: 000000000000000b RBX: ffff990808255e70 RCX: ffffbf2d02043aa8
[ 901.459238] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
[ 901.460552] RBP: ffffbf2d020435b0 R08: 00000000000000c0 R09: ffff990808255e40
[ 901.461882] R10: ffffffff83b08c90 R11: 0000000000000009 R12: 0000000000000000
[ 901.463208] R13: 0000000000000001 R14: 0000000000000000 R15: 000000000000000b
[ 901.464529] FS: 00007fe49381f740(0000) GS:ffff99087dc00000(0000) knlGS:0000000000000000
[ 901.466058] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 901.467189] CR2: 0000000000000ba0 CR3: 000000000e3e8003 CR4: 0000000000770ef0
[ 901.468515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 901.469858] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 901.471139] PKRU: 55555554
Signed-off-by: Ryoga Saito <contact@proelbtn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a57d8c217aadac75530b8e7ffb3a3e1b7bfd0330 ]
Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
messages appear:
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2
(and similarly for other ports)
What happens is that DSA has a policy "even if there are bugs, let's at
least not leak memory" and dsa_port_teardown() clears the dp->fdbs and
dp->mdbs lists, which are supposed to be empty.
But deleting that cleanup code, the warnings go away.
=> the FDB and MDB lists (used for refcounting on shared ports, aka CPU
and DSA ports) will eventually be empty, but are not empty by the time
we tear down those ports. Aka we are deleting them too soon.
The addresses that DSA complains about are host-trapped addresses: the
local addresses of the ports, and the MAC address of the bridge device.
The problem is that offloading those entries happens from a deferred
work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
races with the teardown of the CPU and DSA ports where the refcounting
is kept.
In fact, not only it races, but fundamentally speaking, if we iterate
through the port list linearly, we might end up tearing down the shared
ports even before we delete a DSA user port which has a bridge upper.
So as it turns out, we need to first tear down the user ports (and the
unused ones, for no better place of doing that), then the shared ports
(the CPU and DSA ports). In between, we need to ensure that all work
items scheduled by our switchdev handlers (which only run for user
ports, hence the reason why we tear them down first) have finished.
Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit e50e711351bdc656a8e6ca1022b4293cae8dcd59 upstream.
Turn udp_tunnel_nic work-queue to an ordered work-queue. This queue
holds the UDP-tunnel configuration commands of the different netdevs.
When the netdevs are functions of the same NIC the order of
execution may be crucial.
Problem example:
NIC with 2 PFs, both PFs declare offload quota of up to 3 UDP-ports.
$ifconfig eth2 1.1.1.1/16 up
$ip link add eth2_19503 type vxlan id 5049 remote 1.1.1.2 dev eth2 dstport 19053
$ip link set dev eth2_19503 up
$ip link add eth2_19504 type vxlan id 5049 remote 1.1.1.3 dev eth2 dstport 19054
$ip link set dev eth2_19504 up
$ip link add eth2_19505 type vxlan id 5049 remote 1.1.1.4 dev eth2 dstport 19055
$ip link set dev eth2_19505 up
$ip link add eth2_19506 type vxlan id 5049 remote 1.1.1.5 dev eth2 dstport 19056
$ip link set dev eth2_19506 up
NIC RX port offload infrastructure offloads the first 3 UDP-ports (on
all devices which sets NETIF_F_RX_UDP_TUNNEL_PORT feature) and not
UDP-port 19056. So both PFs gets this offload configuration.
$ip link set dev eth2_19504 down
This triggers udp-tunnel-core to remove the UDP-port 19504 from
offload-ports-list and offload UDP-port 19056 instead.
In this scenario it is important that the UDP-port of 19504 will be
removed from both PFs before trying to add UDP-port 19056. The NIC can
stop offloading a UDP-port only when all references are removed.
Otherwise the NIC may report exceeding of the offload quota.
Fixes: cc4e3835eff4 ("udp_tunnel: add central NIC RX port offload infrastructure")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4f884f3962767877d7aabbc1ec124d2c307a4257 upstream.
Commit 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit
time") may directly retrans a multiple segments TSO/GSO packet without
split, Since this commit, we can no longer assume that a retransmitted
packet is a single segment.
This patch fixes the tp->undo_retrans accounting in tcp_sacktag_one()
that use the actual segments(pcount) of the retransmitted packet.
Before that commit (10d3be569243), the assumption underlying the
tp->undo_retrans-- seems correct.
Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: zhenggy <zhenggy@chinatelecom.cn>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6a52e73368038f47f6618623d75061dc263b26ae upstream.
DSA supports connecting to a phy-handle, and has a fallback to a non-OF
based method of connecting to an internal PHY on the switch's own MDIO
bus, if no phy-handle and no fixed-link nodes were present.
The -ENODEV error code from the first attempt (phylink_of_phy_connect)
is what triggers the second attempt (phylink_connect_phy).
However, when the first attempt returns a different error code than
-ENODEV, this results in an unbalance of calls to phylink_create and
phylink_destroy by the time we exit the function. The phylink instance
has leaked.
There are many other error codes that can be returned by
phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
So this is a practical issue too.
Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20210914134331.2303380-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 04f08eb44b5011493d77b602fdec29ff0f5c6cd5 upstream.
syzbot reported another data-race in af_unix [1]
Lets change __skb_insert() to use WRITE_ONCE() when changing
skb head qlen.
Also, change unix_dgram_poll() to use lockless version
of unix_recvq_full()
It is verry possible we can switch all/most unix_recvq_full()
to the lockless version, this will be done in a future kernel version.
[1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1
BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll
write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0:
__skb_insert include/linux/skbuff.h:1938 [inline]
__skb_queue_before include/linux/skbuff.h:2043 [inline]
__skb_queue_tail include/linux/skbuff.h:2076 [inline]
skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264
unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850
sock_sendmsg_nosec net/socket.c:703 [inline]
sock_sendmsg net/socket.c:723 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
___sys_sendmsg net/socket.c:2446 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2532
__do_sys_sendmmsg net/socket.c:2561 [inline]
__se_sys_sendmmsg net/socket.c:2558 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1:
skb_queue_len include/linux/skbuff.h:1869 [inline]
unix_recvq_full net/unix/af_unix.c:194 [inline]
unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777
sock_poll+0x23e/0x260 net/socket.c:1288
vfs_poll include/linux/poll.h:90 [inline]
ep_item_poll fs/eventpoll.c:846 [inline]
ep_send_events fs/eventpoll.c:1683 [inline]
ep_poll fs/eventpoll.c:1798 [inline]
do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226
__do_sys_epoll_wait fs/eventpoll.c:2238 [inline]
__se_sys_epoll_wait fs/eventpoll.c:2233 [inline]
__x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000001b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 86b18aaa2b5b ("skbuff: fix a data race in skb_queue_len()")
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f4bb62e64c88c93060c051195d3bbba804e56945 upstream.
In tipc_sk_enqueue() we use hardcoded 2 jiffies to extract
socket buffer from generic queue to particular socket.
The 2 jiffies is too short in case there are other high priority
tasks get CPU cycles for multiple jiffies update. As result, no
buffer could be enqueued to particular socket.
To solve this, we switch to use constant timeout 20msecs.
Then, the function will be expired between 2 jiffies (CONFIG_100HZ)
and 20 jiffies (CONFIG_1000HZ).
Fixes: c637c1035534 ("tipc: resolve race problem at unicast message reception")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9b6ff7eb666415e1558f1ba8a742f5db6a9954de upstream.
The reference count leak issue may take place in an error handling
path. If both conditions of tunnel->version == L2TP_HDR_VER_3 and the
return value of l2tp_v3_ensure_opt_in_linear is nonzero, the function
would directly jump to label invalid, without decrementing the reference
count of the l2tp_session object session increased earlier by
l2tp_tunnel_get_session(). This may result in refcount leaks.
Fix this issue by decrease the reference count before jumping to the
label invalid.
Fixes: 4522a70db7aa ("l2tp: fix reading optional fields of L2TPv3")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d9ea761fdd197351890418acd462c51f241014a7 upstream.
Commit 2677d2067731 ("dccp: don't free ccid2_hc_tx_sock ...") fixed
a UAF but reintroduced CVE-2017-6074.
When the sock is cloned, two dccps_hc_tx_ccid will reference to the
same ccid. So one can free the ccid object twice from two socks after
cloning.
This issue was found by "Hadar Manor" as well and assigned with
CVE-2020-16119, which was fixed in Ubuntu's kernel. So here I port
the patch from Ubuntu to fix it.
The patch prevents cloned socks from referencing the same ccid.
Fixes: 2677d2067731410 ("dccp: don't free ccid2_hc_tx_sock ...")
Signed-off-by: Zhenpeng Lin <zplin@psu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 550ac9c1aaaaf51fd42e20d461f0b1cdbd55b3d2 upstream.
syszbot triggers this warning, which looks something
we can easily prevent.
If we initialize priv->list_field in chnl_net_init(),
then always use list_del_init(), we can remove robust_list_del()
completely.
WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 robust_list_del net/caif/chnl_net.c:67 [inline]
WARNING: CPU: 0 PID: 3233 at net/caif/chnl_net.c:67 chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375
Modules linked in:
CPU: 0 PID: 3233 Comm: syz-executor.3 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:robust_list_del net/caif/chnl_net.c:67 [inline]
RIP: 0010:chnl_net_uninit+0xc9/0x2e0 net/caif/chnl_net.c:375
Code: 89 eb e8 3a a3 ba f8 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 bf 01 00 00 48 81 fb 00 14 4e 8d 48 8b 2b 75 d0 e8 17 a3 ba f8 <0f> 0b 5b 5d 41 5c 41 5d e9 0a a3 ba f8 4c 89 e3 e8 02 a3 ba f8 4c
RSP: 0018:ffffc90009067248 EFLAGS: 00010202
RAX: 0000000000008780 RBX: ffffffff8d4e1400 RCX: ffffc9000fd34000
RDX: 0000000000040000 RSI: ffffffff88bb6e49 RDI: 0000000000000003
RBP: ffff88802cd9ee08 R08: 0000000000000000 R09: ffffffff8d0e6647
R10: ffffffff88bb6dc2 R11: 0000000000000000 R12: ffff88803791ae08
R13: dffffc0000000000 R14: 00000000e600ffce R15: ffff888073ed3480
FS: 00007fed10fa0700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2c322000 CR3: 00000000164a6000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
register_netdevice+0xadf/0x1500 net/core/dev.c:10347
ipcaif_newlink+0x4c/0x260 net/caif/chnl_net.c:468
__rtnl_newlink+0x106d/0x1750 net/core/rtnetlink.c:3458
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
__sys_sendto+0x21c/0x320 net/socket.c:2036
__do_sys_sendto net/socket.c:2048 [inline]
__se_sys_sendto net/socket.c:2044 [inline]
__x64_sys_sendto+0xdd/0x1b0 net/socket.c:2044
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: cc36a070b590 ("net-caif: add CAIF netdevice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9756e44fd4d283ebcc94df353642f322428b73de upstream.
The commit 733c99ee8be9 ("net: fix NULL pointer reference in
cipso_v4_doi_free") was merged by a mistake, this patch try
to cleanup the mess.
And we already have the commit e842cb60e8ac ("net: fix NULL
pointer reference in cipso_v4_doi_free") which fixed the root
cause of the issue mentioned in it's description.
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9b29a161ef38040f000dcf9ccf78e34495edfd55 upstream.
In the cited commit, copy_to_user() got called with the wrong pointer,
instead of passing the actual buffer ptr to copy from, a pointer to
the pointer got passed, which causes a buffer overflow calltrace to pop
up when executing "ethtool -x ethX".
Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed
to the function, instead of a pointer to it.
This fixes below call trace:
[ 15.533533] ------------[ cut here ]------------
[ 15.539007] Buffer overflow detected (8 < 192)!
[ 15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20
[ 15.549308] Modules linked in:
[ 15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058
[ 15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 15.558378] RIP: 0010:copy_overflow+0x15/0x20
[ 15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55
[ 15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286
[ 15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
[ 15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff
[ 15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff
[ 15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000
[ 15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000
[ 15.573584] FS: 00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000
[ 15.575639] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0
[ 15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 15.582441] Call Trace:
[ 15.582970] ethtool_rxnfc_copy_to_user+0x30/0x46
[ 15.583815] ethtool_get_rxnfc.cold+0x23/0x2b
[ 15.584584] dev_ethtool+0x29c/0x25f0
[ 15.585286] ? security_netlbl_sid_to_secattr+0x77/0xd0
[ 15.586728] ? do_set_pte+0xc4/0x110
[ 15.587349] ? _raw_spin_unlock+0x18/0x30
[ 15.588118] ? __might_sleep+0x49/0x80
[ 15.588956] dev_ioctl+0x2c1/0x490
[ 15.589616] sock_ioctl+0x18e/0x330
[ 15.591143] __x64_sys_ioctl+0x41c/0x990
[ 15.591823] ? irqentry_exit_to_user_mode+0x9/0x20
[ 15.592657] ? irqentry_exit+0x33/0x40
[ 15.593308] ? exc_page_fault+0x32f/0x770
[ 15.593877] ? exit_to_user_mode_prepare+0x3c/0x130
[ 15.594775] do_syscall_64+0x35/0x80
[ 15.595397] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 15.596037] RIP: 0033:0x7f5381226d4b
[ 15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48
[ 15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b
[ 15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003
[ 15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001
[ 15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000
[ 15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000
[ 15.605042] ---[ end trace 325cf185e2795048 ]---
Fixes: dd98d2895de6 ("ethtool: improve compat ioctl handling")
Reported-by: Shannon Nelson <snelson@pensando.io>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tested-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|