summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2021-10-27bpf, test, cgroup: Use sk_{alloc,free} for test casesDaniel Borkmann
commit 435b08ec0094ac1e128afe6cfd0d9311a8c617a7 upstream. BPF test infra has some hacks in place which kzalloc() a socket and perform minimum init via sock_net_set() and sock_init_data(). As a result, the sk's skcd->cgroup is NULL since it didn't go through proper initialization as it would have been the case from sk_alloc(). Rather than re-adding a NULL test in sock_cgroup_ptr() just for this, use sk_{alloc,free}() pair for the test socket. The latter also allows to get rid of the bpf_sk_storage_free() special case. Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode") Fixes: b7a1848e8398 ("bpf: add BPF_PROG_TEST_RUN support for flow dissector") Fixes: 2cb494a36c98 ("bpf: add tests for direct packet access from CGROUP_SKB") Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27net: bridge: mcast: use multicast_membership_interval for IGMPv3Nikolay Aleksandrov
commit fac3cb82a54a4b7c49c932f96ef196cf5774344c upstream. When I added IGMPv3 support I decided to follow the RFC for computing the GMI dynamically: " 8.4. Group Membership Interval The Group Membership Interval is the amount of time that must pass before a multicast router decides there are no more members of a group or a particular source on a network. This value MUST be ((the Robustness Variable) times (the Query Interval)) plus (one Query Response Interval)." But that actually is inconsistent with how the bridge used to compute it for IGMPv2, where it was user-configurable that has a correct default value but it is up to user-space to maintain it. This would make it consistent with the other timer values which are also maintained correct by the user instead of being dynamically computed. It also changes back to the previous user-expected GMI behaviour for IGMPv3 queries which were supported before IGMPv3 was added. Note that to properly compute it dynamically we would need to add support for "Robustness Variable" which is currently missing. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: 0436862e417e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27netfilter: Kconfig: use 'default y' instead of 'm' for bool config optionVegard Nossum
commit 77076934afdcd46516caf18ed88b2f88025c9ddb upstream. This option, NF_CONNTRACK_SECMARK, is a bool, so it can never be 'm'. Fixes: 33b8e77605620 ("[NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27nfc: nci: fix the UAF of rf_conn_info objectLin Ma
commit 1b1499a817c90fd1ce9453a2c98d2a01cca0e775 upstream. The nci_core_conn_close_rsp_packet() function will release the conn_info with given conn_id. However, it needs to set the rf_conn_info to NULL to prevent other routines like nci_rf_intf_activated_ntf_packet() to trigger the UAF. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27can: j1939: j1939_xtp_rx_rts_session_new(): abort TP less than 9 bytesZhang Changzhong
commit a4fbe70c5cb746441d56b28cf88161d9e0e25378 upstream. The receiver should abort TP if 'total message size' in TP.CM_RTS and TP.CM_BAM is less than 9 or greater than 1785 [1], but currently the j1939 stack only checks the upper bound and the receiver will accept the following broadcast message: vcan1 18ECFF00 [8] 20 08 00 02 FF 00 23 01 vcan1 18EBFF00 [8] 01 00 00 00 00 00 00 00 vcan1 18EBFF00 [8] 02 00 FF FF FF FF FF FF This patch adds check for the lower bound and abort illegal TP. [1] SAE-J1939-82 A.3.4 Row 2 and A.3.6 Row 6. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/1634203601-3460-1-git-send-email-zhangchangzhong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27can: j1939: j1939_xtp_rx_dat_one(): cancel session if receive TP.DT with ↵Zhang Changzhong
error length commit 379743985ab6cfe2cbd32067cf4ed497baca6d06 upstream. According to SAE-J1939-21, the data length of TP.DT must be 8 bytes, so cancel session when receive unexpected TP.DT message. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/1632972800-45091-1-git-send-email-zhangchangzhong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27can: j1939: j1939_netdev_start(): fix UAF for rx_kref of j1939_privZiyang Xuan
commit d9d52a3ebd284882f5562c88e55991add5d01586 upstream. It will trigger UAF for rx_kref of j1939_priv as following. cpu0 cpu1 j1939_sk_bind(socket0, ndev0, ...) j1939_netdev_start j1939_sk_bind(socket1, ndev0, ...) j1939_netdev_start j1939_priv_set j1939_priv_get_by_ndev_locked j1939_jsk_add ..... j1939_netdev_stop kref_put_lock(&priv->rx_kref, ...) kref_get(&priv->rx_kref, ...) REFCOUNT_WARN("addition on 0;...") ==================================================== refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 20874 at lib/refcount.c:25 refcount_warn_saturate+0x169/0x1e0 RIP: 0010:refcount_warn_saturate+0x169/0x1e0 Call Trace: j1939_netdev_start+0x68b/0x920 j1939_sk_bind+0x426/0xeb0 ? security_socket_bind+0x83/0xb0 The rx_kref's kref_get() and kref_put() should use j1939_netdev_lock to protect. Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/20210926104757.2021540-1-william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Reported-by: syzbot+85d9878b19c94f9019ad@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27can: j1939: j1939_tp_rxtimer(): fix errant alert in j1939_tp_rxtimerZiyang Xuan
commit b504a884f6b5a77dac7d580ffa08e482f70d1a30 upstream. When the session state is J1939_SESSION_DONE, j1939_tp_rxtimer() will give an alert "rx timeout, send abort", but do nothing actually. Move the alert into session active judgment condition, it is more reasonable. One of the scenarios is that j1939_tp_rxtimer() execute followed by j1939_xtp_rx_abort_one(). After j1939_xtp_rx_abort_one(), the session state is J1939_SESSION_DONE, then j1939_tp_rxtimer() give an alert. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/20210906094219.95924-1-william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()Ziyang Xuan
commit 43a08c3bdac4cb42eff8fe5e2278bffe0c5c3daa upstream. When isotp_sendmsg() concurrent, tx.state of all TX processes can be ISOTP_IDLE. The conditions so->tx.state != ISOTP_IDLE and wq_has_sleeper(&so->wait) can not protect TX buffer from being accessed by multiple TX processes. We can use cmpxchg() to try to modify tx.state to ISOTP_SENDING firstly. If the modification of the previous process succeed, the later process must wait tx.state to ISOTP_IDLE firstly. Thus, we can ensure TX buffer is accessed by only one process at the same time. And we should also restore the original tx.state at the subsequent error processes. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/c2517874fbdf4188585cf9ddf67a8fa74d5dbde5.1633764159.git.william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27can: isotp: isotp_sendmsg(): add result check for wait_event_interruptible()Ziyang Xuan
commit 9acf636215a6ce9362fe618e7da4913b8bfe84c8 upstream. Using wait_event_interruptible() to wait for complete transmission, but do not check the result of wait_event_interruptible() which can be interrupted. It will result in TX buffer has multiple accessors and the later process interferes with the previous process. Following is one of the problems reported by syzbot. ============================================================= WARNING: CPU: 0 PID: 0 at net/can/isotp.c:840 isotp_tx_timer_handler+0x2e0/0x4c0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc7+ #68 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:isotp_tx_timer_handler+0x2e0/0x4c0 Call Trace: <IRQ> ? isotp_setsockopt+0x390/0x390 __hrtimer_run_queues+0xb8/0x610 hrtimer_run_softirq+0x91/0xd0 ? rcu_read_lock_sched_held+0x4d/0x80 __do_softirq+0xe8/0x553 irq_exit_rcu+0xf8/0x100 sysvec_apic_timer_interrupt+0x9e/0xc0 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 Add result check for wait_event_interruptible() in isotp_sendmsg() to avoid multiple accessers for tx buffer. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/10ca695732c9dd267c76a3c30f37aefe1ff7e32f.1633764159.git.william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Reported-by: syzbot+78bab6958a614b0c80b9@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX pathMarc Kleine-Budde
commit d674a8f123b4096d85955c7eaabec688f29724c9 upstream. When the a large chunk of data send and the receiver does not send a Flow Control frame back in time, the sendmsg() does not return a error code, but the number of bytes sent corresponding to the size of the packet. If a timeout occurs the isotp_tx_timer_handler() is fired, sets sk->sk_err and calls the sk->sk_error_report() function. It was wrongly expected that the error would be propagated to user space in every case. For isotp_sendmsg() blocking on wait_event_interruptible() this is not the case. This patch fixes the problem by checking if sk->sk_err is set and returning the error to user space. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/hartkopp/can-isotp/issues/42 Link: https://github.com/hartkopp/can-isotp/pull/43 Link: https://lore.kernel.org/all/20210507091839.1366379-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Reported-by: Sottas Guillaume (LMB) <Guillaume.Sottas@liebherr.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()'Christophe JAILLET
[ Upstream commit ba69fd9101f20a6d05a96ab743341d4e7b1a2178 ] If we return before the end of the 'for_each_child_of_node()' iterator, the reference taken on 'port' must be released. Add the missing 'of_node_put()' calls. Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27net/sched: act_ct: Fix byte count on fragmented packetsPaul Blakey
[ Upstream commit 2dc4e9e88cfcc38454d52b01ed3422238c134003 ] First fragmented packets (frag offset = 0) byte len is zeroed when stolen by ip_defrag(). And since act_ct update the stats only afterwards (at end of execute), bytes aren't correctly accounted for such packets. To fix this, move stats update to start of action execute. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27ipv6: When forwarding count rx stats on the orig netdevStephen Suryaputra
[ Upstream commit 0857d6f8c759d95f89d0436f86cdfd189ef99f20 ] Commit bdb7cc643fc9 ("ipv6: Count interface receive statistics on the ingress netdev") does not work when ip6_forward() executes on the skbs with vrf-enslaved netdev. Use IP6CB(skb)->iif to get to the right one. Add a selftest script to verify. Fixes: bdb7cc643fc9 ("ipv6: Count interface receive statistics on the ingress netdev") Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211014130845.410602-1-ssuryaextr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27tcp: md5: Fix overlap between vrf and non-vrf keysLeonard Crestez
[ Upstream commit 86f1e3a8489f6a0232c1f3bc2bdb379f5ccdecec ] With net.ipv4.tcp_l3mdev_accept=1 it is possible for a listen socket to accept connection from the same client address in different VRFs. It is also possible to set different MD5 keys for these clients which differ only in the tcpm_l3index field. This appears to work when distinguishing between different VRFs but not between non-VRF and VRF connections. In particular: * tcp_md5_do_lookup_exact will match a non-vrf key against a vrf key. This means that adding a key with l3index != 0 after a key with l3index == 0 will cause the earlier key to be deleted. Both keys can be present if the non-vrf key is added later. * _tcp_md5_do_lookup can match a non-vrf key before a vrf key. This casues failures if the passwords differ. Fix this by making tcp_md5_do_lookup_exact perform an actual exact comparison on l3index and by making __tcp_md5_do_lookup perfer vrf-bound keys above other considerations like prefixlen. Fixes: dea53bb80e07 ("tcp: Add l3index to tcp_md5sig_key and md5 functions") Signed-off-by: Leonard Crestez <cdleonard@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27netfilter: ipvs: make global sysctl readonly in non-init netnsAntoine Tenart
[ Upstream commit 174c376278949c44aad89c514a6b5db6cee8db59 ] Because the data pointer of net/ipv4/vs/debug_level is not updated per netns, it must be marked as read-only in non-init netns. Fixes: c6d2d445d8de ("IPVS: netns, final patch enabling network name space.") Signed-off-by: Antoine Tenart <atenart@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27netfilter: ip6t_rt: fix rt0_hdr parsing in rt_mt6Xin Long
[ Upstream commit a482c5e00a9b5a194085bcd372ac36141028becb ] In rt_mt6(), when it's a nonlinear skb, the 1st skb_header_pointer() only copies sizeof(struct ipv6_rt_hdr) to _route that rh points to. The access by ((const struct rt0_hdr *)rh)->reserved will overflow the buffer. So this access should be moved below the 2nd call to skb_header_pointer(). Besides, after the 2nd skb_header_pointer(), its return value should also be checked, othersize, *rp may cause null-pointer-ref. v1->v2: - clean up some old debugging log. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27netfilter: nf_tables: skip netdev events generated on netns removalFlorian Westphal
[ Upstream commit 68a3765c659f809dcaac20030853a054646eb739 ] syzbot reported following (harmless) WARN: WARNING: CPU: 1 PID: 2648 at net/netfilter/core.c:468 nft_netdev_unregister_hooks net/netfilter/nf_tables_api.c:230 [inline] nf_tables_unregister_hook include/net/netfilter/nf_tables.h:1090 [inline] __nft_release_basechain+0x138/0x640 net/netfilter/nf_tables_api.c:9524 nft_netdev_event net/netfilter/nft_chain_filter.c:351 [inline] nf_tables_netdev_event+0x521/0x8a0 net/netfilter/nft_chain_filter.c:382 reproducer: unshare -n bash -c 'ip link add br0 type bridge; nft add table netdev t ; \ nft add chain netdev t ingress \{ type filter hook ingress device "br0" \ priority 0\; policy drop\; \}' Problem is that when netns device exit hooks create the UNREGISTER event, the .pre_exit hook for nf_tables core has already removed the base hook. Notifier attempts to do this again. The need to do base hook unregister unconditionally was needed in the past, because notifier was last stage where reg->dev dereference was safe. Now that nf_tables does the hook removal in .pre_exit, this isn't needed anymore. Reported-and-tested-by: syzbot+154bd5be532a63aa778b@syzkaller.appspotmail.com Fixes: 767d1216bff825 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-27netfilter: xt_IDLETIMER: fix panic that occurs when timer_type has garbage valueJuhee Kang
[ Upstream commit 902c0b1887522a099aa4e1e6b4b476c2fe5dd13e ] Currently, when the rule related to IDLETIMER is added, idletimer_tg timer structure is initialized by kmalloc on executing idletimer_tg_create function. However, in this process timer->timer_type is not defined to a specific value. Thus, timer->timer_type has garbage value and it occurs kernel panic. So, this commit fixes the panic by initializing timer->timer_type using kzalloc instead of kmalloc. Test commands: # iptables -A OUTPUT -j IDLETIMER --timeout 1 --label test $ cat /sys/class/xt_idletimer/timers/test Killed Splat looks like: BUG: KASAN: user-memory-access in alarm_expires_remaining+0x49/0x70 Read of size 8 at addr 0000002e8c7bc4c8 by task cat/917 CPU: 12 PID: 917 Comm: cat Not tainted 5.14.0+ #3 79940a339f71eb14fc81aee1757a20d5bf13eb0e Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: dump_stack_lvl+0x6e/0x9c kasan_report.cold+0x112/0x117 ? alarm_expires_remaining+0x49/0x70 __asan_load8+0x86/0xb0 alarm_expires_remaining+0x49/0x70 idletimer_tg_show+0xe5/0x19b [xt_IDLETIMER 11219304af9316a21bee5ba9d58f76a6b9bccc6d] dev_attr_show+0x3c/0x60 sysfs_kf_seq_show+0x11d/0x1f0 ? device_remove_bin_file+0x20/0x20 kernfs_seq_show+0xa4/0xb0 seq_read_iter+0x29c/0x750 kernfs_fop_read_iter+0x25a/0x2c0 ? __fsnotify_parent+0x3d1/0x570 ? iov_iter_init+0x70/0x90 new_sync_read+0x2a7/0x3d0 ? __x64_sys_llseek+0x230/0x230 ? rw_verify_area+0x81/0x150 vfs_read+0x17b/0x240 ksys_read+0xd9/0x180 ? vfs_write+0x460/0x460 ? do_syscall_64+0x16/0xc0 ? lockdep_hardirqs_on+0x79/0x120 __x64_sys_read+0x43/0x50 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f0cdc819142 Code: c0 e9 c2 fe ff ff 50 48 8d 3d 3a ca 0a 00 e8 f5 19 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 RSP: 002b:00007fff28eee5b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f0cdc819142 RDX: 0000000000020000 RSI: 00007f0cdc032000 RDI: 0000000000000003 RBP: 00007f0cdc032000 R08: 00007f0cdc031010 R09: 0000000000000000 R10: 0000000000000022 R11: 0000000000000246 R12: 00005607e9ee31f0 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 Fixes: 68983a354a65 ("netfilter: xtables: Add snapshot of hardidletimer target") Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-20mptcp: fix possible stall on recvmsg()Paolo Abeni
commit 612f71d7328c14369924384ad2170aae2a6abd92 upstream. recvmsg() can enter an infinite loop if the caller provides the MSG_WAITALL, the data present in the receive queue is not sufficient to fulfill the request, and no more data is received by the peer. When the above happens, mptcp_wait_data() will always return with no wait, as the MPTCP_DATA_READY flag checked by such function is set and never cleared in such code path. Leveraging the above syzbot was able to trigger an RCU stall: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1 (t=10500 jiffies g=13089 q=109) rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:28696 pid: 14 ppid: 2 flags:0x00004000 Call Trace: context_switch kernel/sched/core.c:4955 [inline] __schedule+0x940/0x26f0 kernel/sched/core.c:6236 schedule+0xd3/0x270 kernel/sched/core.c:6315 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline] RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline] RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline] RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline] RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline] RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189 Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00 RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283 RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870 RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877 R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000 R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000 FS: 0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 S: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: instrument_atomic_read_write include/linux/instrumented.h:101 [inline] test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline] mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016 release_sock+0xb4/0x1b0 net/core/sock.c:3204 mptcp_wait_data net/mptcp/protocol.c:1770 [inline] mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080 inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659 sock_recvmsg_nosec net/socket.c:944 [inline] ____sys_recvmsg+0x527/0x600 net/socket.c:2626 ___sys_recvmsg+0x127/0x200 net/socket.c:2670 do_recvmmsg+0x24d/0x6d0 net/socket.c:2764 __sys_recvmmsg net/socket.c:2843 [inline] __do_sys_recvmmsg net/socket.c:2866 [inline] __se_sys_recvmmsg net/socket.c:2859 [inline] __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fc200d2dc39 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39 RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003 R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003 Fix the issue by replacing the MPTCP_DATA_READY bit with direct inspection of the msk receive queue. Reported-and-tested-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com Fixes: 7a6a6cbc3e59 ("mptcp: recvmsg() can drain data from multiple subflow") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20mqprio: Correct stats in mqprio_dump_class_stats().Sebastian Andrzej Siewior
commit 14132690860e4d06aa3e1c4d7d8e9866ba7756dd upstream. Introduction of lockless subqueues broke the class statistics. Before the change stats were accumulated in `bstats' and `qstats' on the stack which was then copied to struct gnet_dump. After the change the `bstats' and `qstats' are initialized to 0 and never updated, yet still fed to gnet_dump. The code updates the global qdisc->cpu_bstats and qdisc->cpu_qstats instead, clobbering them. Most likely a copy-paste error from the code in mqprio_dump(). __gnet_stats_copy_basic() and __gnet_stats_copy_queue() accumulate the values for per-CPU case but for global stats they overwrite the value, so only stats from the last loop iteration / tc end up in sch->[bq]stats. Use the on-stack [bq]stats variables again and add the stats manually in the global case. Fixes: ce679e8df7ed2 ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> https://lore.kernel.org/all/20211007175000.2334713-2-bigeasy@linutronix.de/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20NFC: digital: fix possible memory leak in digital_in_send_sdd_req()Ziyang Xuan
commit 291c932fc3692e4d211a445ba8aa35663831bac7 upstream. 'skb' is allocated in digital_in_send_sdd_req(), but not free when digital_in_send_cmd() failed, which will cause memory leak. Fix it by freeing 'skb' if digital_in_send_cmd() return failed. Fixes: 2c66daecc409 ("NFC Digital: Add NFC-A technology support") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()Ziyang Xuan
commit 58e7dcc9ca29c14e44267a4d0ea61e3229124907 upstream. 'params' is allocated in digital_tg_listen_mdaa(), but not free when digital_send_cmd() failed, which will cause memory leak. Fix it by freeing 'params' if digital_send_cmd() return failed. Fixes: 1c7a4c24fbfd ("NFC Digital: Add target NFC-DEP support") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20nfc: fix error handling of nfc_proto_register()Ziyang Xuan
commit 0911ab31896f0e908540746414a77dd63912748d upstream. When nfc proto id is using, nfc_proto_register() return -EBUSY error code, but forgot to unregister proto. Fix it by adding proto_unregister() in the error handling case. Fixes: c7fe3b52c128 ("NFC: add NFC socket family") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211013034932.2833737-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20net: dsa: fix spurious error message when unoffloaded port leaves bridgeAlvin Šipraga
commit 43a4b4dbd48c9006ef64df3a12acf33bdfe11c61 upstream. Flip the sign of a return value check, thereby suppressing the following spurious error: port 2 failed to notify DSA_NOTIFIER_BRIDGE_LEAVE: -EOPNOTSUPP ... which is emitted when removing an unoffloaded DSA switch port from a bridge. Fixes: d371b7c92d19 ("net: dsa: Unset vlan_filtering when ports leave the bridge") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20211012112730.3429157-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20net/smc: improved fix wait on already cleared linkKarsten Graul
commit 95f7f3e7dc6bd2e735cb5de11734ea2222b1e05a upstream. Commit 8f3d65c16679 ("net/smc: fix wait on already cleared link") introduced link refcounting to avoid waits on already cleared links. This patch extents and improves the refcounting to cover all remaining possible cases for this kind of error situation. Fixes: 15e1b99aadfb ("net/smc: no WR buffer wait for terminating link group") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20sctp: account stream padding length for reconf chunkEiichi Tsukata
commit a2d859e3fc97e79d907761550dbc03ff1b36479c upstream. sctp_make_strreset_req() makes repeated calls to sctp_addto_chunk() which will automatically account for padding on each call. inreq and outreq are already 4 bytes aligned, but the payload is not and doing SCTP_PAD4(a + b) (which _sctp_make_chunk() did implicitly here) is different from SCTP_PAD4(a) + SCTP_PAD4(b) and not enough. It led to possible attempt to use more buffer than it was allocated and triggered a BUG_ON. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Greg KH <gregkh@linuxfoundation.org> Fixes: cc16f00f6529 ("sctp: add support for generating stream reconf ssn reset request chunk") Reported-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/b97c1f8b0c7ff79ac4ed206fc2c49d3612e0850c.1634156849.git.mleitner@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-17mac80211: check return value of rhashtable_initMichelleJin
[ Upstream commit 111461d573741c17eafad029ac93474fa9adcce0 ] When rhashtable_init() fails, it returns -EINVAL. However, since error return value of rhashtable_init is not checked, it can cause use of uninitialized pointers. So, fix unhandled errors of rhashtable_init. Signed-off-by: MichelleJin <shjy180909@gmail.com> Link: https://lore.kernel.org/r/20210927033457.1020967-4-shjy180909@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17net: prevent user from passing illegal stab size王贇
[ Upstream commit b193e15ac69d56f35e1d8e2b5d16cbd47764d053 ] We observed below report when playing with netlink sock: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10 shift exponent 249 is too large for 32-bit type CPU: 0 PID: 685 Comm: a.out Not tainted Call Trace: dump_stack_lvl+0x8d/0xcf ubsan_epilogue+0xa/0x4e __ubsan_handle_shift_out_of_bounds+0x161/0x182 __qdisc_calculate_pkt_len+0xf0/0x190 __dev_queue_xmit+0x2ed/0x15b0 it seems like kernel won't check the stab log value passing from user, and will use the insane value later to calculate pkt_len. This patch just add a check on the size/cell_log to avoid insane calculation. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17mac80211: Drop frames from invalid MAC address in ad-hoc modeYueHaibing
[ Upstream commit a6555f844549cd190eb060daef595f94d3de1582 ] WARNING: CPU: 1 PID: 9 at net/mac80211/sta_info.c:554 sta_info_insert_rcu+0x121/0x12a0 Modules linked in: CPU: 1 PID: 9 Comm: kworker/u8:1 Not tainted 5.14.0-rc7+ #253 Workqueue: phy3 ieee80211_iface_work RIP: 0010:sta_info_insert_rcu+0x121/0x12a0 ... Call Trace: ieee80211_ibss_finish_sta+0xbc/0x170 ieee80211_ibss_work+0x13f/0x7d0 ieee80211_iface_work+0x37a/0x500 process_one_work+0x357/0x850 worker_thread+0x41/0x4d0 If an Ad-Hoc node receives packets with invalid source MAC address, it hits a WARN_ON in sta_info_insert_check(), this can spam the log. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20210827144230.39944-1-yuehaibing@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17netfilter: nf_nat_masquerade: defer conntrack walk to work queueFlorian Westphal
[ Upstream commit 7970a19b71044bf4dc2c1becc200275bdf1884d4 ] The ipv4 and device notifiers are called with RTNL mutex held. The table walk can take some time, better not block other RTNL users. 'ip a' has been reported to block for up to 20 seconds when conntrack table has many entries and device down events are frequent (e.g., PPP). Reported-and-tested-by: Martin Zaharinov <micron10@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17netfilter: nf_nat_masquerade: make async masq_inet6_event handling genericFlorian Westphal
[ Upstream commit 30db406923b9285a9bac06a6af5e74bd6d0f1d06 ] masq_inet6_event is called asynchronously from system work queue, because the inet6 notifier is atomic and nf_iterate_cleanup can sleep. The ipv4 and device notifiers call nf_iterate_cleanup directly. This is legal, but these notifiers are called with RTNL mutex held. A large conntrack table with many devices coming and going will have severe impact on the system usability, with 'ip a' blocking for several seconds. This change places the defer code into a helper and makes it more generic so ipv4 and ifdown notifiers can be converted to defer the cleanup walk as well in a follow patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17netfilter: ip6_tables: zero-initialize fragment offsetJeremy Sowden
[ Upstream commit 310e2d43c3ad429c1fba4b175806cf1f55ed73a6 ] ip6tables only sets the `IP6T_F_PROTO` flag on a rule if a protocol is specified (`-p tcp`, for example). However, if the flag is not set, `ip6_packet_match` doesn't call `ipv6_find_hdr` for the skb, in which case the fragment offset is left uninitialized and a garbage value is passed to each matcher. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13dsa: tag_dsa: Fix mask for trunked packetsAndrew Lunn
commit b44d52a50bc6f191f0ae03f65de8401f3ef039b3 upstream. A packet received on a trunk will have bit 2 set in Forward DSA tagged frame. Bit 1 can be either 0 or 1 and is otherwise undefined and bit 0 indicates the frame CFI. Masking with 7 thus results in frames as being identified as being from a trunk when in fact they are not. Fix the mask to just look at bit 2. Fixes: 5b60dadb71db ("net: dsa: tag_dsa: Support reception of packets from LAG devices") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-13net: prefer socket bound to interface when not in VRFMike Manning
[ Upstream commit 8d6c414cd2fb74aa6812e9bfec6178f8246c4f3a ] The commit 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF") modified compute_score() so that a device match is always made, not just in the case of an l3mdev skb, then increments the score also for unbound sockets. This ensures that sockets bound to an l3mdev are never selected when not in a VRF. But as unbound and bound sockets are now scored equally, this results in the last opened socket being selected if there are matches in the default VRF for an unbound socket and a socket bound to a dev that is not an l3mdev. However, handling prior to this commit was to always select the bound socket in this case. Reinstate this handling by incrementing the score only for bound sockets. The required isolation due to choosing between an unbound socket and a socket bound to an l3mdev remains in place due to the device match always being made. The same approach is taken for compute_score() for stream sockets. Fixes: 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF") Fixes: e78190581aff ("net: ensure unbound stream socket to be chosen when not in a VRF") Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/cf0a8523-b362-1edf-ee78-eef63cbbb428@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13rtnetlink: fix if_nlmsg_stats_size() under estimationEric Dumazet
[ Upstream commit d34367991933d28bd7331f67a759be9a8c474014 ] rtnl_fill_statsinfo() is filling skb with one mandatory if_stats_msg structure. nlmsg_put(skb, pid, seq, type, sizeof(struct if_stats_msg), flags); But if_nlmsg_stats_size() never considered the needed storage. This bug did not show up because alloc_skb(X) allocates skb with extra tailroom, because of added alignments. This could very well be changed in the future to have deterministic behavior. Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump link stats") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13netlink: annotate data races around nlk->boundEric Dumazet
[ Upstream commit 7707a4d01a648e4c655101a469c956cb11273655 ] While existing code is correct, KCSAN is reporting a data-race in netlink_insert / netlink_sendmsg [1] It is correct to read nlk->bound without a lock, as netlink_autobind() will acquire all needed locks. [1] BUG: KCSAN: data-race in netlink_insert / netlink_sendmsg write to 0xffff8881031c8b30 of 1 bytes by task 18752 on cpu 0: netlink_insert+0x5cc/0x7f0 net/netlink/af_netlink.c:597 netlink_autobind+0xa9/0x150 net/netlink/af_netlink.c:842 netlink_sendmsg+0x479/0x7c0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmsg+0x1ed/0x270 net/socket.c:2475 __do_sys_sendmsg net/socket.c:2484 [inline] __se_sys_sendmsg net/socket.c:2482 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff8881031c8b30 of 1 bytes by task 18751 on cpu 1: netlink_sendmsg+0x270/0x7c0 net/netlink/af_netlink.c:1891 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] __sys_sendto+0x2a8/0x370 net/socket.c:2019 __do_sys_sendto net/socket.c:2031 [inline] __se_sys_sendto net/socket.c:2027 [inline] __x64_sys_sendto+0x74/0x90 net/socket.c:2027 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00 -> 0x01 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 18751 Comm: syz-executor.0 Not tainted 5.14.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: da314c9923fe ("netlink: Replace rhash_portid with bound") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net/sched: sch_taprio: properly cancel timer from taprio_destroy()Eric Dumazet
[ Upstream commit a56d447f196fa9973c568f54c0d76d5391c3b0c0 ] There is a comment in qdisc_create() about us not calling ops->reset() in some cases. err_out4: /* * Any broken qdiscs that would require a ops->reset() here? * The qdisc was never in action so it shouldn't be necessary. */ As taprio sets a timer before actually receiving a packet, we need to cancel it from ops->destroy, just in case ops->reset has not been called. syzbot reported: ODEBUG: free active (active state 0) object type: hrtimer hint: advance_sched+0x0/0x9a0 arch/x86/include/asm/atomic64_64.h:22 WARNING: CPU: 0 PID: 8441 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 8441 Comm: syz-executor813 Not tainted 5.14.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd e0 d3 e3 89 4c 89 ee 48 c7 c7 e0 c7 e3 89 e8 5b 86 11 05 <0f> 0b 83 05 85 03 92 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000130f330 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff88802baeb880 RSI: ffffffff815d87b5 RDI: fffff52000261e58 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815d25ee R11: 0000000000000000 R12: ffffffff898dd020 R13: ffffffff89e3ce20 R14: ffffffff81653630 R15: dffffc0000000000 FS: 0000000000f0d300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffb64b3e000 CR3: 0000000036557000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __debug_check_no_obj_freed lib/debugobjects.c:987 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1018 slab_free_hook mm/slub.c:1603 [inline] slab_free_freelist_hook+0x171/0x240 mm/slub.c:1653 slab_free mm/slub.c:3213 [inline] kfree+0xe4/0x540 mm/slub.c:4267 qdisc_create+0xbcf/0x1320 net/sched/sch_api.c:1299 tc_modify_qdisc+0x4c8/0x1a60 net/sched/sch_api.c:1663 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2403 ___sys_sendmsg+0xf3/0x170 net/socket.c:2457 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 Fixes: 44d4775ca518 ("net/sched: sch_taprio: reset child qdiscs before freeing them") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Davide Caratti <dcaratti@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net: bridge: fix under estimation in br_get_linkxstats_size()Eric Dumazet
[ Upstream commit 0854a0513321cf70bea5fa483ebcaa983cc7c62e ] Commit de1799667b00 ("net: bridge: add STP xstats") added an additional nla_reserve_64bit() in br_fill_linkxstats(), but forgot to update br_get_linkxstats_size() accordingly. This can trigger the following in rtnl_stats_get() WARN_ON(err == -EMSGSIZE); Fixes: de1799667b00 ("net: bridge: add STP xstats") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size()Eric Dumazet
[ Upstream commit dbe0b88064494b7bb6a9b2aa7e085b14a3112d44 ] bridge_fill_linkxstats() is using nla_reserve_64bit(). We must use nla_total_size_64bit() instead of nla_total_size() for corresponding data structure. Fixes: 1080ab95e3c7 ("net: bridge: add support for IGMP/MLD stats and export them via netlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: Vivien Didelot <vivien.didelot@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notificationPablo Neira Ayuso
[ Upstream commit 6fb721cf781808ee2ca5e737fb0592cc68de3381 ] Include the NLM_F_CREATE and NLM_F_EXCL flags in netlink event notifications, otherwise userspace cannot distiguish between create and add commands. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13net_sched: fix NULL deref in fifo_set_limit()Eric Dumazet
[ Upstream commit 560ee196fe9e5037e5015e2cdb14b3aecb1cd7dc ] syzbot reported another NULL deref in fifo_set_limit() [1] I could repro the issue with : unshare -n tc qd add dev lo root handle 1:0 tbf limit 200000 burst 70000 rate 100Mbit tc qd replace dev lo parent 1:0 pfifo_fast tc qd change dev lo root handle 1:0 tbf limit 300000 burst 70000 rate 100Mbit pfifo_fast does not have a change() operation. Make fifo_set_limit() more robust about this. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 1cf99067 P4D 1cf99067 PUD 7ca49067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14443 Comm: syz-executor959 Not tainted 5.15.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffffc9000e2f7310 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffffffff8d6ecc00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff888024c27910 RDI: ffff888071e34000 RBP: ffff888071e34000 R08: 0000000000000001 R09: ffffffff8fcfb947 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888024c27910 R13: ffff888071e34018 R14: 0000000000000000 R15: ffff88801ef74800 FS: 00007f321d897700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000000722c3000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: fifo_set_limit net/sched/sch_fifo.c:242 [inline] fifo_set_limit+0x198/0x210 net/sched/sch_fifo.c:227 tbf_change+0x6ec/0x16d0 net/sched/sch_tbf.c:418 qdisc_change net/sched/sch_api.c:1332 [inline] tc_modify_qdisc+0xd9a/0x1a60 net/sched/sch_api.c:1634 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: fb0305ce1b03 ("net-sched: consolidate default fifo qdisc setup") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210930212239.3430364-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13netfilter: nf_tables: reverse order in rule replacement expansionPablo Neira Ayuso
[ Upstream commit 2c964c558641a3bddaee5719c9e6d8805f777812 ] Deactivate old rule first, then append the new rule, so rule replacement notification via netlink first reports the deletion of the old rule with handle X in first place, then it adds the new rule (reusing the handle X of the replaced old rule). Note that the abort path releases the transaction that has been created by nft_delrule() on error. Fixes: ca08987885a1 ("netfilter: nf_tables: deactivate expressions in rule replecement routine") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13netfilter: nf_tables: add position handle in event notificationPablo Neira Ayuso
[ Upstream commit e189ae161dd784aa5d454b0832f818cacc0e131b ] Add position handle to allow to identify the rule location from netlink events. Otherwise, userspace cannot incrementally update a userspace cache through monitoring events. Skip handle dump if the rule has been either inserted (at the beginning of the ruleset) or appended (at the end of the ruleset), the NLM_F_APPEND netlink flag is sufficient in these two cases. Handle NLM_F_REPLACE as NLM_F_APPEND since the rule replacement expansion appends it after the specified rule handle. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1Florian Westphal
[ Upstream commit 339031bafe6b281cf2dcb8364217288b9fdab555 ] This is a revert of 7b1957b049 ("netfilter: nf_defrag_ipv4: use net_generic infra") and a partial revert of 8b0adbe3e3 ("netfilter: nf_defrag_ipv6: use net_generic infra"). If conntrack is builtin and kernel is booted with: nf_conntrack.enable_hooks=1 .... kernel will fail to boot due to a NULL deref in nf_defrag_ipv4_enable(): Its called before the ipv4 defrag initcall is made, so net_generic() returns NULL. To resolve this, move the user refcount back to struct net so calls to those functions are possible even before their initcalls have run. Fixes: 7b1957b04956 ("netfilter: nf_defrag_ipv4: use net_generic infra") Fixes: 8b0adbe3e38d ("netfilter: nf_defrag_ipv6: use net_generic infra"). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-13SUNRPC: fix sign error causing rpcsec_gss dropsJ. Bruce Fields
commit 2ba5acfb34957e8a7fe47cd78c77ca88e9cc2b03 upstream. If sd_max is unsigned, then sd_max - GSS_SEQ_WIN is a very large number whenever sd_max is less than GSS_SEQ_WIN, and the comparison: seq_num <= sd->sd_max - GSS_SEQ_WIN in gss_check_seq_num is pretty much always true, even when that's clearly not what was intended. This was causing pynfs to hang when using krb5, because pynfs uses zero as the initial gss sequence number. That's perfectly legal, but this logic error causes knfsd to drop the rpc in that case. Out-of-order sequence IDs in the first GSS_SEQ_WIN (128) calls will also cause this. Fixes: 10b9d99a3dbb ("SUNRPC: Augment server-side rpcgss tracepoints") Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07netfilter: nf_tables: Fix oversized kvmalloc() callsPablo Neira Ayuso
commit 45928afe94a094bcda9af858b96673d59bc4a0e9 upstream. The commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") limits the max allocatable memory via kvmalloc() to MAX_INT. Reported-by: syzbot+cd43695a64bcd21b8596@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07netfilter: conntrack: serialize hash resizes and cleanupsEric Dumazet
commit e9edc188fc76499b0b9bd60364084037f6d03773 upstream. Syzbot was able to trigger the following warning [1] No repro found by syzbot yet but I was able to trigger similar issue by having 2 scripts running in parallel, changing conntrack hash sizes, and: for j in `seq 1 1000` ; do unshare -n /bin/true >/dev/null ; done It would take more than 5 minutes for net_namespace structures to be cleaned up. This is because nf_ct_iterate_cleanup() has to restart everytime a resize happened. By adding a mutex, we can serialize hash resizes and cleanups and also make get_next_corpse() faster by skipping over empty buckets. Even without resizes in the picture, this patch considerably speeds up network namespace dismantles. [1] INFO: task syz-executor.0:8312 can't die for more than 144 seconds. task:syz-executor.0 state:R running task stack:25672 pid: 8312 ppid: 6573 flags:0x00004006 Call Trace: context_switch kernel/sched/core.c:4955 [inline] __schedule+0x940/0x26f0 kernel/sched/core.c:6236 preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:6408 preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35 __local_bh_enable_ip+0x109/0x120 kernel/softirq.c:390 local_bh_enable include/linux/bottom_half.h:32 [inline] get_next_corpse net/netfilter/nf_conntrack_core.c:2252 [inline] nf_ct_iterate_cleanup+0x15a/0x450 net/netfilter/nf_conntrack_core.c:2275 nf_conntrack_cleanup_net_list+0x14c/0x4f0 net/netfilter/nf_conntrack_core.c:2469 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:171 setup_net+0x639/0xa30 net/core/net_namespace.c:349 copy_net_ns+0x319/0x760 net/core/net_namespace.c:470 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:226 ksys_unshare+0x445/0x920 kernel/fork.c:3128 __do_sys_unshare kernel/fork.c:3202 [inline] __se_sys_unshare kernel/fork.c:3200 [inline] __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3200 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f63da68e739 RSP: 002b:00007f63d7c05188 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f63da792f80 RCX: 00007f63da68e739 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000 RBP: 00007f63da6e8cc4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f63da792f80 R13: 00007fff50b75d3f R14: 00007f63d7c05300 R15: 0000000000022000 Showing all locks held in the system: 1 lock held by khungtaskd/27: #0: ffffffff8b980020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446 2 locks held by kworker/u4:2/153: #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline] #0: ffff888010c69138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2268 #1: ffffc9000140fdb0 ((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2272 1 lock held by systemd-udevd/2970: 1 lock held by in:imklog/6258: #0: ffff88807f970ff0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990 3 locks held by kworker/1:6/8158: 1 lock held by syz-executor.0/8312: 2 locks held by kworker/u4:13/9320: 1 lock held by syz-executor.5/10178: 1 lock held by syz-executor.4/10217: Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07netfilter: ipset: Fix oversized kvmalloc() callsJozsef Kadlecsik
commit 7bbc3d385bd813077acaf0e6fdb2a86a901f5382 upstream. The commit commit 7661809d493b426e979f39ab512e3adf41fbcc69 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 14 09:45:49 2021 -0700 mm: don't allow oversized kvmalloc() calls limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the same limit in ipset. Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-07net: udp: annotate data race around udp_sk(sk)->corkflagEric Dumazet
commit a9f5970767d11eadc805d5283f202612c7ba1f59 upstream. up->corkflag field can be read or written without any lock. Annotate accesses to avoid possible syzbot/KCSAN reports. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>