Age | Commit message (Collapse) | Author |
|
unload/load driver
commit 05a7f4a8dff19999ca8a83a35ff4782689de7bfc upstream.
The change of namespaces during devlink reload calls to driver unload
before it accesses devlink parameters. The commands below causes to
use-after-free bug when trying to get flow steering mode.
* ip netns add n1
* devlink dev reload pci/0000:00:09.0 netns n1
==================================================================
BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
Read of size 4 at addr ffff888009d04308 by task devlink/275
CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x93/0xc2
print_address_description.constprop.0+0x18/0x140
? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
kasan_report.cold+0x7c/0xd8
? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
devlink_nl_param_fill+0x1c8/0xe80
? __free_pages_ok+0x37a/0x8a0
? devlink_flash_update_timeout_notify+0xd0/0xd0
? lock_acquire+0x1a9/0x6d0
? fs_reclaim_acquire+0xb7/0x160
? lock_is_held_type+0x98/0x110
? 0xffffffff81000000
? lock_release+0x1f9/0x6c0
? fs_reclaim_release+0xa1/0xf0
? lock_downgrade+0x6d0/0x6d0
? lock_is_held_type+0x98/0x110
? lock_is_held_type+0x98/0x110
? memset+0x20/0x40
? __build_skb_around+0x1f8/0x2b0
devlink_param_notify+0x6d/0x180
devlink_reload+0x1c3/0x520
? devlink_remote_reload_actions_performed+0x30/0x30
? mutex_trylock+0x24b/0x2d0
? devlink_nl_cmd_reload+0x62b/0x1070
devlink_nl_cmd_reload+0x66d/0x1070
? devlink_reload+0x520/0x520
? devlink_get_from_attrs+0x1bc/0x260
? devlink_nl_pre_doit+0x64/0x4d0
genl_family_rcv_msg_doit+0x1e9/0x2f0
? mutex_lock_io_nested+0x1130/0x1130
? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
? security_capable+0x51/0x90
genl_rcv_msg+0x27f/0x4a0
? genl_get_cmd+0x3c0/0x3c0
? lock_acquire+0x1a9/0x6d0
? devlink_reload+0x520/0x520
? lock_release+0x6c0/0x6c0
netlink_rcv_skb+0x11d/0x340
? genl_get_cmd+0x3c0/0x3c0
? netlink_ack+0x9f0/0x9f0
? lock_release+0x1f9/0x6c0
genl_rcv+0x24/0x40
netlink_unicast+0x433/0x700
? netlink_attachskb+0x730/0x730
? _copy_from_iter_full+0x178/0x650
? __alloc_skb+0x113/0x2b0
netlink_sendmsg+0x6f1/0xbd0
? netlink_unicast+0x700/0x700
? lock_is_held_type+0x98/0x110
? netlink_unicast+0x700/0x700
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x193/0x240
? __x64_sys_getpeername+0xb0/0xb0
? do_sys_openat2+0x10b/0x370
? __up_read+0x1a1/0x7b0
? do_user_addr_fault+0x219/0xdc0
? __x64_sys_openat+0x120/0x1d0
? __x64_sys_open+0x1a0/0x1a0
__x64_sys_sendto+0xdd/0x1b0
? syscall_enter_from_user_mode+0x1d/0x50
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc69d0af14a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a
RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003
RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 146:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x99/0xc0
mlx5_init_fs+0xf0/0x1c50 [mlx5_core]
mlx5_load+0xd2/0x180 [mlx5_core]
mlx5_init_one+0x2f6/0x450 [mlx5_core]
probe_one+0x47d/0x6e0 [mlx5_core]
pci_device_probe+0x2a0/0x4a0
really_probe+0x20a/0xc90
driver_probe_device+0xd8/0x380
device_driver_attach+0x1df/0x250
__driver_attach+0xff/0x240
bus_for_each_dev+0x11e/0x1a0
bus_add_driver+0x309/0x570
driver_register+0x1ee/0x380
0xffffffffa06b8062
do_one_initcall+0xd5/0x410
do_init_module+0x1c8/0x760
load_module+0x6d8b/0x9650
__do_sys_finit_module+0x118/0x1b0
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 275:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0x102/0x140
slab_free_freelist_hook+0x74/0x1b0
kfree+0xd7/0x2a0
mlx5_unload+0x16/0xb0 [mlx5_core]
mlx5_unload_one+0xae/0x120 [mlx5_core]
mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core]
devlink_reload+0x141/0x520
devlink_nl_cmd_reload+0x66d/0x1070
genl_family_rcv_msg_doit+0x1e9/0x2f0
genl_rcv_msg+0x27f/0x4a0
netlink_rcv_skb+0x11d/0x340
genl_rcv+0x24/0x40
netlink_unicast+0x433/0x700
netlink_sendmsg+0x6f1/0xbd0
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x193/0x240
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff888009d04300
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 8 bytes inside of
128-byte region [ffff888009d04300, ffff888009d04380)
The buggy address belongs to the page:
page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04
head:0000000086a64ecc order:1 compound_mapcount:0
flags: 0x4000000000010200(slab|head)
raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0
raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The right solution to devlink reload is to notify about deletion of
parameters, unload driver, change net namespaces, load driver and notify
about addition of parameters.
Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload")
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 92548b0ee220e000d81c27ac9a80e0ede895a881 ]
The UDP length field should be in network order.
This removes the following sparse error:
net/ipv4/route.c:3173:27: warning: incorrect type in assignment (different base types)
net/ipv4/route.c:3173:27: expected restricted __be16 [usertype] len
net/ipv4/route.c:3173:27: got unsigned long
Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c66070125837900163b81a03063ddd657a7e9bfb ]
The reference counting issue happens in one exception handling path of
cbq_change_class(). When failing to get tcf_block, the function forgets
to decrease the refcount of "rtab" increased by qdisc_put_rtab(),
causing a refcount leak.
Fix this issue by jumping to "failure" label when get tcf_block failed.
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ca49bfd90a9dde175d2929dc1544b54841e33804 ]
In HTB offload mode, qdiscs of leaf classes are grafted to netdev
queues. sch_htb expects the dev_queue field of these qdiscs to point to
the corresponding queues. However, qdisc creation may fail, and in that
case noop_qdisc is used instead. Its dev_queue doesn't point to the
right queue, so sch_htb can lose track of used netdev queues, which will
cause internal inconsistencies.
This commit fixes this bug by keeping track of the netdev queue inside
struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
new field, the two values are synced on writes, and WARNs are added to
assert equality of the two values.
The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
the driver used to pass the old and new queue IDs to sch_htb. Now that
there is a new field (offload_queue) in struct htb_class that needs to
be updated on this operation, the driver will pass the old class ID to
sch_htb instead (it already knows the new class ID).
Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aaa8e4922c887ff47ad66ef918193682bccc1905 ]
These checks are still not strict enough. The main problem is that if
"cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is
guaranteed to be 4 but we need to be at least 16 bytes. In fact, we
can reject everything smaller than sizeof(*pkt) which is 20 bytes.
Also I don't like the ALIGN(size, 4). It's better to just insist that
data is needs to be aligned at the start.
Fixes: 0baa99ee353c ("net: qrtr: Allow non-immediate node routing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 67d6d681e15b578c1725bad8ad079e05d1c48a8e ]
Even after commit 6457378fe796 ("ipv4: use siphash instead of Jenkins in
fnhe_hashfun()"), an attacker can still use brute force to learn
some secrets from a victim linux host.
One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.
Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.
After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.
This is slightly increasing memory size used by the hash table,
by 50% in average, we do not expect this to be a problem.
This patch is more complex than the prior one (IPv6 equivalent),
because IPv4 was reusing the oldest entry.
Since we need to be able to evict more than one entry per
update_or_create_fnhe() call, I had to replace
fnhe_oldest() with fnhe_remove_oldest().
Also note that we will queue extra kfree_rcu() calls under stress,
which hopefully wont be a too big issue.
Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a00df2caffed3883c341d5685f830434312e4a43 ]
Even after commit 4785305c05b2 ("ipv6: use siphash in rt6_exception_hash()"),
an attacker can still use brute force to learn some secrets from a victim
linux host.
One way to defeat these attacks is to make the max depth of the hash
table bucket a random value.
Before this patch, each bucket of the hash table used to store exceptions
could contain 6 items under attack.
After the patch, each bucket would contains a random number of items,
between 6 and 10. The attacker can no longer infer secrets.
This is slightly increasing memory size used by the hash table,
we do not expect this to be a problem.
Following patch is dealing with the same issue in IPv4.
Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f41a4b2b5eb7872109723dab8ae1603bdd9d9ec1 ]
Syzbot hit "task hung" bug in hci_req_sync(). The problem was in
unreasonable huge inquiry timeout passed from userspace.
Fix it by adding sanity check for timeout value to hci_inquiry().
Since hci_inquiry() is the only user of hci_req_sync() with user
controlled timeout value, it makes sense to check timeout value in
hci_inquiry() and don't touch hci_req_sync().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+be2baed593ea56c6a84c@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5c11720767f70d34357d00a15ba5a0ad052c40fe ]
Some paths through svc_process() leave rqst->rq_procinfo set to
NULL, which triggers a crash if tracing happens to be enabled.
Fixes: 89ff87494c6e ("SUNRPC: Display RPC procedure names instead of proc numbers")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f50d2ff8f016b79a2ff4acd5943a1eda40c545d4 ]
ieee80211_amsdu_realloc_pad() fails to account for extra_tx_headroom,
the original reserved headroom might be eaten. Add the necessary
extra_tx_headroom.
Fixes: 6e0456b54545 ("mac80211: add A-MSDU tx support")
Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20210816085128.10931-2-pkshih@realtek.com
[fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0ea53674d07fb6db2dd7a7ec2fdc85a12eb246c2 ]
Commit 0ea9fd001a14 ("Bluetooth: Shutdown controller after workqueues
are flushed or cancelled") introduced a regression that makes mtkbtsdio
driver stops working:
[ 36.593956] Bluetooth: hci0: Firmware already downloaded
[ 46.814613] Bluetooth: hci0: Execution of wmt command timed out
[ 46.814619] Bluetooth: hci0: Failed to send wmt func ctrl (-110)
The shutdown callback depends on the result of hdev->rx_work, so we
should call it before flushing rx_work:
-> btmtksdio_shutdown()
-> mtk_hci_wmt_sync()
-> __hci_cmd_send()
-> wait for BTMTKSDIO_TX_WAIT_VND_EVT gets cleared
-> btmtksdio_recv_event()
-> hci_recv_frame()
-> queue_work(hdev->workqueue, &hdev->rx_work)
-> clears BTMTKSDIO_TX_WAIT_VND_EVT
So move the shutdown callback before flushing TX/RX queue to resolve the
issue.
Reported-and-tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Fixes: 0ea9fd001a14 ("Bluetooth: Shutdown controller after workqueues are flushed or cancelled")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ed43fbac717882165a2a4bd64f7b1f56f7467bb7 ]
The { 0 } doesn't clear all fields in the struct, but tells to the
compiler to set all fields to zero and doesn't touch any sub-fields
if they exists.
The {} is an empty initialiser that instructs to fully initialize whole
struct including sub-fields, which is error-prone for future
devlink_flash_notify extensions.
Fixes: 6700acc5f1fe ("devlink: collect flash notify params into a struct")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4a11174d6dbd0bde6d5a1d6efb0d70f58811db55 ]
The address "&sband->iftype_data[i]" points to an array at the end of
struct. It can't be NULL and so the check can be removed.
Fixes: bac2fd3d7534 ("mac80211: remove use of ieee80211_get_he_sta_cap()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YNmgHi7Rh3SISdog@mwanda
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e1dee2c1de2b4dd00eb44004a4bda6326ed07b59 ]
In commit 4e1a720d0312 ("Bluetooth: avoid killing an already killed
socket"), a check was added to sco_sock_kill to skip killing a socket
if the SOCK_DEAD flag was set.
This was done after a trace for a use-after-free bug showed that the
same sock pointer was being killed twice.
Unfortunately, this check prevents sco_sock_kill from running on any
socket. sco_sock_kill kills a socket only if it's zapped and orphaned,
however sock_orphan announces that the socket is dead before detaching
it. i.e., orphaned sockets have the SOCK_DEAD flag set.
To fix this, we remove the check for SOCK_DEAD, and avoid repeated
calls to sco_sock_kill by removing incorrect calls in:
1. sco_sock_timeout. The socket should not be killed on timeout as
further processing is expected to be done. For example,
sco_sock_connect sets the timer then waits for the socket to be
connected or for an error to be returned.
2. sco_conn_del. This function should clean up resources for the
connection, but the socket itself should be cleaned up in
sco_sock_release.
3. sco_sock_close. Calls to sco_sock_close in sco_sock_cleanup_listen
and sco_sock_release are followed by sco_sock_kill. Hence the
duplicated call should be removed.
Fixes: 4e1a720d0312 ("Bluetooth: avoid killing an already killed socket")
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
querier
[ Upstream commit c73c57081b3d59aa99093fbedced32ea02620cd3 ]
Commit 08cc83cc7fd8 ("net: dsa: add support for BRIDGE_MROUTER
attribute") added an option for users to turn off multicast flooding
towards the CPU if they turn off the IGMP querier on a bridge which
already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router).
And commit a8b659e7ff75 ("net: dsa: act as passthrough for bridge port flags")
simply papered over that issue, because it moved the decision to flood
the CPU with multicast (or not) from the DSA core down to individual drivers,
instead of taking a more radical position then.
The truth is that disabling multicast flooding to the CPU is simply
something we are not prepared to do now, if at all. Some reasons:
- ICMP6 neighbor solicitation messages are unregistered multicast
packets as far as the bridge is concerned. So if we stop flooding
multicast, the outside world cannot ping the bridge device's IPv6
link-local address.
- There might be foreign interfaces bridged with our DSA switch ports
(sending a packet towards the host does not necessarily equal
termination, but maybe software forwarding). So if there is no one
interested in that multicast traffic in the local network stack, that
doesn't mean nobody is.
- PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as
the bridge is concerned. This should reach the CPU port.
- The switch driver might not do FDB partitioning. And since we don't
even bother to do more fine-grained flood disabling (such as "disable
flooding _from_port_N_ towards the CPU port" as opposed to "disable
flooding _from_any_port_ towards the CPU port"), this breaks standalone
ports, or even multiple bridges where one has an IGMP querier and one
doesn't.
Reverting the logic makes all of the above work.
Fixes: a8b659e7ff75 ("net: dsa: act as passthrough for bridge port flags")
Fixes: 08cc83cc7fd8 ("net: dsa: add support for BRIDGE_MROUTER attribute")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7df4e7449489d82cee6813dccbb4ae4f3f26ef7b ]
Qingfang points out that when a bridge with the default settings is
created and a port joins it:
ip link add br0 type bridge
ip link set swp0 master br0
DSA calls br_multicast_router() on the bridge to see if the br0 device
is a multicast router port, and if it is, it enables multicast flooding
to the CPU port, otherwise it disables it.
If we look through the multicast_router_show() sysfs or at the
IFLA_BR_MCAST_ROUTER netlink attribute, we see that the default mrouter
attribute for the bridge device is "1" (MDB_RTR_TYPE_TEMP_QUERY).
However, br_multicast_router() will return "0" (MDB_RTR_TYPE_DISABLED),
because an mrouter port in the MDB_RTR_TYPE_TEMP_QUERY state may not be
actually _active_ until it receives an actual IGMP query. So, the
br_multicast_router() function should really have been called
br_multicast_router_active() perhaps.
When/if an IGMP query is received, the bridge device will transition via
br_multicast_mark_router() into the active state until the
ip4_mc_router_timer expires after an multicast_querier_interval.
Of course, this does not happen if the bridge is created with an
mcast_router attribute of "2" (MDB_RTR_TYPE_PERM).
The point is that in lack of any IGMP query messages, and in the default
bridge configuration, unregistered multicast packets will not be able to
reach the CPU port through flooding, and this breaks many use cases
(most obviously, IPv6 ND, with its ICMP6 neighbor solicitation multicast
messages).
Leave the multicast flooding setting towards the CPU port down to a driver
level decision.
Fixes: 010e269f91be ("net: dsa: sync up switchdev objects and port attributes when joining the bridge")
Reported-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 713baf3dae8f45dc8ada4ed2f5fdcbf94a5c274d ]
An earlier commit replaced using batostr to using %pMR sprintf for the
construction of session->name. Static analysis detected that this new
method can use a total of 21 characters (including the trailing '\0')
so we need to increase the BTNAMSIZ from 18 to 21 to fix potential
buffer overflows.
Addresses-Coverity: ("Out-of-bounds write")
Fixes: fcb73338ed53 ("Bluetooth: Use %pMR in sprintf/seq_printf instead of batostr")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
module if PTP is enabled
[ Upstream commit f8b17a0bd96065e4511858689916bb729dbb881b ]
TX timestamps are sent by SJA1110 as Ethernet packets containing
metadata, so they are received by the tagging driver but must be
processed by the switch driver - the one that is stateful since it
keeps the TX timestamp queue.
This means that there is an sja1110_process_meta_tstamp() symbol
exported by the switch driver which is called by the tagging driver.
There is a shim definition for that function when the switch driver is
not compiled, which does nothing, but that shim is not effective when
the tagging protocol driver is built-in and the switch driver is a
module, because built-in code cannot call symbols exported by modules.
So add an optional dependency between the tagger and the switch driver,
if PTP support is enabled in the switch driver. If PTP is not enabled,
sja1110_process_meta_tstamp() will translate into the shim "do nothing
with these meta frames" function.
Fixes: 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8b6e638b4be2ad77f61fb93b4e1776c6ccc2edab ]
Upcoming patches will add tag_8021q related logic to switch.c and
port.c, in order to allow it to make use of cross-chip notifiers.
In addition, a struct dsa_8021q_context *ctx pointer will be added to
struct dsa_switch.
It seems fairly low-reward to #ifdef the *ctx from struct dsa_switch and
to provide shim implementations of the entire tag_8021q.c calling
surface (not even clear what to do about the tag_8021q cross-chip
notifiers to avoid compiling them). The runtime overhead for switches
which don't use tag_8021q is fairly small because all helpers will check
for ds->tag_8021q_ctx being a NULL pointer and stop there.
So let's make it part of dsa_core.o.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a25fca4d3c18766b6f7a3c95fa8faec23ef464c5 ]
This patch fixes the MGMT add_advertising command repsones with the
wrong opcode when it is trying to return the not supported error.
Fixes: cbbdfa6f33198 ("Bluetooth: Enable controller RPA resolution using Experimental feature")
Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8ca34a13f7f9b3fa2c464160ffe8cc1a72088204 ]
Syzbot reported warning in netlbl_cipsov4_add(). The
problem was in too big doi_def->map.std->lvl.local_size
passed to kcalloc(). Since this value comes from userpace there is
no need to warn if value is not correct.
The same problem may occur with other kcalloc() calls in
this function, so, I've added __GFP_NOWARN flag to all
kcalloc() calls there.
Reported-and-tested-by: syzbot+cdd51ee2e6b0b2e18c0d@syzkaller.appspotmail.com
Fixes: 96cb8e3313c7 ("[NetLabel]: CIPSOv4 and Unlabeled packet integration")
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 525e2f9fd0229eb10cb460a9e6d978257f24804e ]
st->bucket stores the current bucket number.
st->offset stores the offset within this bucket that is the sk to be
seq_show(). Thus, st->offset only makes sense within the same
st->bucket.
These two variables are an optimization for the common no-lseek case.
When resuming the seq_file iteration (i.e. seq_start()),
tcp_seek_last_pos() tries to continue from the st->offset
at bucket st->bucket.
However, it is possible that the bucket pointed by st->bucket
has changed and st->offset may end up skipping the whole st->bucket
without finding a sk. In this case, tcp_seek_last_pos() currently
continues to satisfy the offset condition in the next (and incorrect)
bucket. Instead, regardless of the offset value, the first sk of the
next bucket should be returned. Thus, "bucket == st->bucket" check is
added to tcp_seek_last_pos().
The chance of hitting this is small and the issue is a decade old,
so targeting for the next tree.
Fixes: a8b690f98baf ("tcp: Fix slowness in read /proc/net/tcp")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210701200541.1033917-1-kafai@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9af417610b6142e826fd1ee8ba7ff3e9a2133a5a ]
The bounds check of id is off-by-one and the comparison should
be >= rather >. Currently the WARN_ON_ONCE check does not stop
the out of range indexing of &ldev->ctx.table[id] so also add
a return path if the bounds are out of range.
Addresses-Coverity: ("Illegal address computation").
Fixes: 5609c185f24d ("6lowpan: iphc: add support for stateful compression")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 59da0b38bc2ea570ede23a3332ecb3e7574ce6b2 ]
Smatch complains that some of these struct members are not initialized
leading to a stack information disclosure:
net/bluetooth/sco.c:778 sco_conn_defer_accept() warn:
check that 'cp.retrans_effort' doesn't leak information
This seems like a valid warning. I've added a default case to fix
this issue.
Fixes: 2f69a82acf6f ("Bluetooth: Use voice setting in deferred SCO connection request")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 23d2b94043ca8835bd1e67749020e839f396a1c2 upstream.
I got below panic when doing fuzz test:
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G B 5.14.0-rc1-00195-gcff5c4254439-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x7a/0x9b
panic+0x2cd/0x5af
end_report.cold+0x5a/0x5a
kasan_report+0xec/0x110
ip_check_mc_rcu+0x556/0x5d0
__mkroute_output+0x895/0x1740
ip_route_output_key_hash_rcu+0x2d0/0x1050
ip_route_output_key_hash+0x182/0x2e0
ip_route_output_flow+0x28/0x130
udp_sendmsg+0x165d/0x2280
udpv6_sendmsg+0x121e/0x24f0
inet6_sendmsg+0xf7/0x140
sock_sendmsg+0xe9/0x180
____sys_sendmsg+0x2b8/0x7a0
___sys_sendmsg+0xf0/0x160
__sys_sendmmsg+0x17e/0x3c0
__x64_sys_sendmmsg+0x9e/0x100
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x462eb9
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8
48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3df5af1c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9
RDX: 0000000000000312 RSI: 0000000020001700 RDI: 0000000000000007
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3df5af26bc
R13: 00000000004c372d R14: 0000000000700b10 R15: 00000000ffffffff
It is one use-after-free in ip_check_mc_rcu.
In ip_mc_del_src, the ip_sf_list of pmc has been freed under pmc->lock protection.
But access to ip_sf_list in ip_check_mc_rcu is not protected by the lock.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d0efb16294d145d157432feda83877ae9d7cdf37 upstream.
A common implementation of isatty(3) involves calling a ioctl passing
a dummy struct argument and checking whether the syscall failed --
bionic and glibc use TCGETS (passing a struct termios), and musl uses
TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
copy sizeof(struct ifreq) bytes of data from the argument and return
-EFAULT if that fails. The result is that the isatty implementations
may return a non-POSIX-compliant value in errno in the case where part
of the dummy struct argument is inaccessible, as both struct termios
and struct winsize are smaller than struct ifreq (at least on arm64).
Although there is usually enough stack space following the argument
on the stack that this did not present a practical problem up to now,
with MTE stack instrumentation it's more likely for the copy to fail,
as the memory following the struct may have a different tag.
Fix the problem by adding an early check for whether the ioctl is a
valid socket ioctl, and return -ENOTTY if it isn't.
Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers")
Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull nfsd fix from Bruce Fields:
"This is a one-liner fix for a serious bug that can cause the server to
become unresponsive to a client, so I think it's worth the last-minute
inclusion for 5.14"
* tag 'nfsd-5.14-1' of git://linux-nfs.org/~bfields/linux:
SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...
|
|
This reverts commit ce78ffa3ef1681065ba451cfd545da6126f5ca88.
Wren and Nicolas reported that ath11k was failing to initialise QCA6390
Wi-Fi 6 device with error:
qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22
Commit ce78ffa3ef16 ("net: really fix the build..."), introduced in
v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
devices are broken, but I only tested QCA6390. Let's revert the broken
commit so that ath11k works again.
Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc
failed, we sometime observe panic:
BUG: kernel NULL pointer dereference, address:
...
RIP: 0010:cipso_v4_doi_free+0x3a/0x80
...
Call Trace:
netlbl_cipsov4_add_std+0xf4/0x8c0
netlbl_cipsov4_add+0x13f/0x1b0
genl_family_rcv_msg_doit.isra.15+0x132/0x170
genl_rcv_msg+0x125/0x240
This is because in cipso_v4_doi_free() there is no check
on 'doi_def->map.std' when 'doi_def->type' equal 1, which
is possibe, since netlbl_cipsov4_add_std() haven't initialize
it before alloc 'doi_def->map.std'.
This patch just add the check to prevent panic happen for similar
cases.
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently when device is moved between network namespaces using
RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID,
IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and
target namespace already has device with same name, userspace will get
EINVAL what is confusing and makes debugging harder.
Fix it so that userspace gets more appropriate EEXIST instead what makes
debugging much easier.
Before:
# ./ifname.sh
+ ip netns add ns0
+ ip netns exec ns0 ip link add l0 type dummy
+ ip netns exec ns0 ip link show l0
8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
+ ip link add l0 type dummy
+ ip link show l0
10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
+ ip link set l0 netns ns0
RTNETLINK answers: Invalid argument
After:
# ./ifname.sh
+ ip netns add ns0
+ ip netns exec ns0 ip link add l0 type dummy
+ ip netns exec ns0 ip link show l0
8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
+ ip link add l0 type dummy
+ ip link show l0
10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
+ ip link set l0 netns ns0
RTNETLINK answers: File exists
The problem is that do_setlink() passes its `char *ifname` argument,
that it gets from a caller, to __dev_change_net_namespace() as is (as
`const char *pat`), but semantics of ifname and pat can be different.
For example, __rtnl_newlink() does this:
net/core/rtnetlink.c
3270 char ifname[IFNAMSIZ];
...
3286 if (tb[IFLA_IFNAME])
3287 nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
3288 else
3289 ifname[0] = '\0';
...
3364 if (dev) {
...
3394 return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
3395 }
, i.e. do_setlink() gets ifname pointer that is always valid no matter
if user specified IFLA_IFNAME or not and then do_setlink() passes this
ifname pointer as is to __dev_change_net_namespace() as pat argument.
But the pat (pattern) in __dev_change_net_namespace() is used as:
net/core/dev.c
11198 err = -EEXIST;
11199 if (__dev_get_by_name(net, dev->name)) {
11200 /* We get here if we can't use the current device name */
11201 if (!pat)
11202 goto out;
11203 err = dev_get_valid_name(net, dev, pat);
11204 if (err < 0)
11205 goto out;
11206 }
As the result the `goto out` path on line 11202 is neven taken and
instead of returning EEXIST defined on line 11198,
__dev_change_net_namespace() returns an error from dev_get_valid_name()
and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier.
Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A group of security researchers brought to our attention
the weakness of hash function used in fnhe_hashfun().
Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.
Also remove the inline keyword, this really is distracting.
Fixes: d546c621542d ("ipv4: harden fnhe_hashfun()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A group of security researchers brought to our attention
the weakness of hash function used in rt6_exception_hash()
Lets use siphash instead of Jenkins Hash, to considerably
reduce security risks.
Following patch deals with IPv4.
Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the attempt to reserve a slot fails, we currently leak the XPT_BUSY
flag on the socket. Among other things, this make it impossible to close
the socket.
Fixes: 82011c80b3ec ("SUNRPC: Move svc_xprt_received() call sites")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
While running kselftests, Hangbin observed that sch_ets.sh often crashes,
and splats like the following one are seen in the output of 'dmesg':
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 159f12067 P4D 159f12067 PUD 159f13067 PMD 0
Oops: 0000 [#1] SMP NOPTI
CPU: 2 PID: 921 Comm: tc Not tainted 5.14.0-rc6+ #458
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: 0010:__list_del_entry_valid+0x2d/0x50
Code: 48 8b 57 08 48 b9 00 01 00 00 00 00 ad de 48 39 c8 0f 84 ac 6e 5b 00 48 b9 22 01 00 00 00 00 ad de 48 39 ca 0f 84 cf 6e 5b 00 <48> 8b 32 48 39 fe 0f 85 af 6e 5b 00 48 8b 50 08 48 39 f2 0f 85 94
RSP: 0018:ffffb2da005c3890 EFLAGS: 00010217
RAX: 0000000000000000 RBX: ffff9073ba23f800 RCX: dead000000000122
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9073ba23fbc8
RBP: ffff9073ba23f890 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: dead000000000100
R13: ffff9073ba23fb00 R14: 0000000000000002 R15: 0000000000000002
FS: 00007f93e5564e40(0000) GS:ffff9073bba00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000014ad34000 CR4: 0000000000350ee0
Call Trace:
ets_qdisc_reset+0x6e/0x100 [sch_ets]
qdisc_reset+0x49/0x1d0
tbf_reset+0x15/0x60 [sch_tbf]
qdisc_reset+0x49/0x1d0
dev_reset_queue.constprop.42+0x2f/0x90
dev_deactivate_many+0x1d3/0x3d0
dev_deactivate+0x56/0x90
qdisc_graft+0x47e/0x5a0
tc_get_qdisc+0x1db/0x3e0
rtnetlink_rcv_msg+0x164/0x4c0
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x1a5/0x280
netlink_sendmsg+0x242/0x480
sock_sendmsg+0x5b/0x60
____sys_sendmsg+0x1f2/0x260
___sys_sendmsg+0x7c/0xc0
__sys_sendmsg+0x57/0xa0
do_syscall_64+0x3a/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f93e44b8338
Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
RSP: 002b:00007ffc0db737a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000061255c06 RCX: 00007f93e44b8338
RDX: 0000000000000000 RSI: 00007ffc0db73810 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000687880 R14: 0000000000000000 R15: 0000000000000000
Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev i2c_i801 pcspkr i2c_smbus lpc_ich virtio_balloon ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000
When the change() function decreases the value of 'nstrict', we must take
into account that packets might be already enqueued on a class that flips
from 'strict' to 'quantum': otherwise that class will not be added to the
bandwidth-sharing list. Then, a call to ets_qdisc_reset() will attempt to
do list_del(&alist) with 'alist' filled with zero, hence the NULL pointer
dereference.
For classes flipping from 'strict' to 'quantum', initialize an empty list
and eventually add it to the bandwidth-sharing list, if there are packets
already enqueued. In this way, the kernel will:
a) prevent crashing as described above.
b) avoid retaining the backlog packets (for an arbitrarily long time) in
case no packet is enqueued after a change from 'strict' to 'quantum'.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
correct comments in set and get fn_sernum
Signed-off-by: zhang kai <zhangkaiheb@126.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Validate csum_start in gre_handle_offloads before we call _gre_xmit so
that we do not crash later when the csum_start value is used in the
lco_csum function call.
This patch deals with ipv6 code.
Fixes: Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common
GRE functions")
Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com
Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Validate csum_start in gre_handle_offloads before we call _gre_xmit so
that we do not crash later when the csum_start value is used in the
lco_csum function call.
This patch deals with ipv4 code.
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com
Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This check was incomplete, did not consider size is 0:
if (len != ALIGN(size, 4) + hdrlen)
goto err;
if size from qrtr_hdr is 0, the result of ALIGN(size, 4)
will be 0, In case of len == hdrlen and size == 0
in header this check won't fail and
if (cb->type == QRTR_TYPE_NEW_SERVER) {
/* Remote node endpoint can bridge other distant nodes */
const struct qrtr_ctrl_pkt *pkt = data + hdrlen;
qrtr_node_assign(node, le32_to_cpu(pkt->server.node));
}
will also read out of bound from data, which is hdrlen allocated block.
Fixes: 194ccc88297a ("net: qrtr: Support decoding incoming v2 packets")
Fixes: ad9d24c9429e ("net: qrtr: fix OOB Read in qrtr_endpoint_post")
Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If directly after an MP_CAPABLE 3WHS, the client receives an ADD_ADDR
with HMAC from the server, it is enough to switch to a "fully
established" mode because it has received more MPTCP options.
It was then OK to enable the "fully_established" flag on the MPTCP
socket. Still, best to check if the ADD_ADDR looks valid by looking if
it contains an HMAC (no 'echo' bit). If an ADD_ADDR echo is received
while we are not in "fully established" mode, it is strange and then
we should not switch to this mode now.
But that is not enough. On one hand, the path-manager has be notified
the state has changed. On the other hand, the "fully_established" flag
on the subflow socket should be turned on as well not to re-send the
MP_CAPABLE 3rd ACK content with the next ACK.
Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The endpoint cleanup path is prone to a memory leak, as reported
by syzkaller:
BUG: memory leak
unreferenced object 0xffff88810680ea00 (size 64):
comm "syz-executor.6", pid 6191, jiffies 4295756280 (age 24.138s)
hex dump (first 32 bytes):
58 75 7d 3c 80 88 ff ff 22 01 00 00 00 00 ad de Xu}<....".......
01 00 02 00 00 00 00 00 ac 1e 00 07 00 00 00 00 ................
backtrace:
[<0000000072a9f72a>] kmalloc include/linux/slab.h:591 [inline]
[<0000000072a9f72a>] mptcp_nl_cmd_add_addr+0x287/0x9f0 net/mptcp/pm_netlink.c:1170
[<00000000f6e931bf>] genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
[<00000000f1504a2c>] genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
[<00000000f1504a2c>] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
[<0000000097e76f6a>] netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
[<00000000ceefa2b8>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
[<000000008ff91aec>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
[<000000008ff91aec>] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
[<0000000041682c35>] netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
[<00000000df3aa8e7>] sock_sendmsg_nosec net/socket.c:704 [inline]
[<00000000df3aa8e7>] sock_sendmsg+0x14e/0x190 net/socket.c:724
[<000000002154c54c>] ____sys_sendmsg+0x709/0x870 net/socket.c:2403
[<000000001aab01d7>] ___sys_sendmsg+0xff/0x170 net/socket.c:2457
[<00000000fa3b1446>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
[<00000000db2ee9c7>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<00000000db2ee9c7>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<000000005873517d>] entry_SYSCALL_64_after_hwframe+0x44/0xae
We should not require an allocation to cleanup stuff.
Rework the code a bit so that the additional RCU work is no more needed.
Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Function "dma_map_sg" is entitled to merge adjacent entries
and return a value smaller than what was passed as "nents".
Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len")
rather than the original "nents" parameter ("sg_len").
This old RDS bug was exposed and reliably causes kernel panics
(using RDMA operations "rds-stress -D") on x86_64 starting with:
commit c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Simply put: Linux 5.11 and later.
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs
doesn't clear skb->tstamp. We encountered a problem with linux
version 5.4.56 and ovs version 2.14.1, and packets failed to
dequeue from qdisc when fq qdisc was attached to ovs port.
Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com>
Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When adding support for using the skb->hash value as the flow hash in CAKE,
I accidentally introduced a logic error that broke the host-only isolation
modes of CAKE (srchost and dsthost keywords). Specifically, the flow_hash
variable should stay initialised to 0 in cake_hash() in pure host-based
hashing mode. Add a check for this before using the skb->hash value as
flow_hash.
Fixes: b0c19ed6088a ("sch_cake: Take advantage of skb->hash where appropriate")
Reported-by: Pete Heist <pete@heistp.net>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ilan's change to move locking around accidentally lost the
wiphy_lock() during some porting, add it back.
Fixes: 45daaa131841 ("mac80211: Properly WARN on HW scan before restart")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20210817121210.47bdb177064f.Ib1ef79440cd27f318c028ddfc0c642406917f512@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
__tipc_sendmsg() is called to send SYN packet by either tipc_sendmsg()
or tipc_connect(). The difference is in tipc_connect(), it will call
tipc_wait_for_connect() after __tipc_sendmsg() to wait until connecting
is done. So there's no need to wait in __tipc_sendmsg() for this case.
This patch is to fix it by calling tipc_wait_for_connect() only when dlen
is not 0 in __tipc_sendmsg(), which means it's called by tipc_connect().
Note this also fixes the failure in tipcutils/test/ptts/:
# ./tipcTS &
# ./tipcTC 9
(hang)
Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
ieee802154 for net 2021-08-12
Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller
reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw.
* tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
net: Fix memory leak in ieee802154_raw_deliver
ieee802154: hwsim: fix GPF in hwsim_new_edge_nl
ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi
====================
Link: https://lore.kernel.org/r/20210812183912.1663996-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's a potential deadlock case when remove the vsock device or
process the RESET event:
vsock_for_each_connected_socket:
spin_lock_bh(&vsock_table_lock) ----------- (1)
...
virtio_vsock_reset_sock:
lock_sock(sk) --------------------- (2)
...
spin_unlock_bh(&vsock_table_lock)
lock_sock() may do initiative schedule when the 'sk' is owned by
other thread at the same time, we would receivce a warning message
that "scheduling while atomic".
Even worse, if the next task (selected by the scheduler) try to
release a 'sk', it need to request vsock_table_lock and the deadlock
occur, cause the system into softlockup state.
Call trace:
queued_spin_lock_slowpath
vsock_remove_bound
vsock_remove_sock
virtio_transport_release
__vsock_release
vsock_release
__sock_release
sock_close
__fput
____fput
So we should not require sk_lock in this case, just like the behavior
in vhost_vsock or vmci.
Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko")
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 0efea3c649f0 because of:
- The returning -ENOBUF error is fine on socket buffer allocation.
- There is side effect in the calling path
tipc_node_xmit()->tipc_link_xmit() when checking error code returning.
Fixes: 0efea3c649f0 ("tipc: Return the correct errno code")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some arches support cmpxchg() on 4-byte and 8-byte only.
Increase mr_ifc_count width to 32bit to fix this problem.
Fixes: 4a2b285e7e10 ("net: igmp: fix data-race in igmp_ifc_timer_expire()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently if BBR congestion control is initialized after more than 2B
packets have been delivered, depending on the phase of the
tp->delivered counter the tracking of BBR round trips can get stuck.
The bug arises because if tp->delivered is between 2^31 and 2^32 at
the time the BBR congestion control module is initialized, then the
initialization of bbr->next_rtt_delivered to 0 will cause the logic to
believe that the end of the round trip is still billions of packets in
the future. More specifically, the following check will fail
repeatedly:
!before(rs->prior_delivered, bbr->next_rtt_delivered)
and thus the connection will take up to 2B packets delivered before
that check will pass and the connection will set:
bbr->round_start = 1;
This could cause many mechanisms in BBR to fail to trigger, for
example bbr_check_full_bw_reached() would likely never exit STARTUP.
This bug is 5 years old and has not been observed, and as a practical
matter this would likely rarely trigger, since it would require
transferring at least 2B packets, or likely more than 3 terabytes of
data, before switching congestion control algorithms to BBR.
This patch is a stable candidate for kernels as far back as v4.9,
when tcp_bbr.c was added.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Kevin Yang <yyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|