summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2020-04-16net: Fix skb->csum update in inet_proto_csum_replace16().Praveen Chaudhary
commit 189c9b1e94539b11c80636bc13e9cf47529e7bba upstream. skb->csum is updated incorrectly, when manipulation for NF_NAT_MANIP_SRC\DST is done on IPV6 packet. Fix: There is no need to update skb->csum in inet_proto_csum_replace16(), because update in two fields a.) IPv6 src/dst address and b.) L4 header checksum cancels each other for skb->csum calculation. Whereas inet_proto_csum_replace4 function needs to update skb->csum, because update in 3 fields a.) IPv4 src/dst address, b.) IPv4 Header checksum and c.) L4 header checksum results in same diff as L4 Header checksum for skb->csum calculation. [ pablo@netfilter.org: a few comestic documentation edits ] Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Signed-off-by: Andy Stracner <astracner@linkedin.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16netfilter: conntrack: sctp: use distinct states for new SCTP connectionsJiri Wiesner
commit ab658b9fa7a2c467f79eac8b53ea308b8f98113d upstream. The netlink notifications triggered by the INIT and INIT_ACK chunks for a tracked SCTP association do not include protocol information for the corresponding connection - SCTP state and verification tags for the original and reply direction are missing. Since the connection tracking implementation allows user space programs to receive notifications about a connection and then create a new connection based on the values received in a notification, it makes sense that INIT and INIT_ACK notifications should contain the SCTP state and verification tags available at the time when a notification is sent. The missing verification tags cause a newly created netfilter connection to fail to verify the tags of SCTP packets when this connection has been created from the values previously received in an INIT or INIT_ACK notification. A PROTOINFO event is cached in sctp_packet() when the state of a connection changes. The CLOSED and COOKIE_WAIT state will be used for connections that have seen an INIT and INIT_ACK chunk, respectively. The distinct states will cause a connection state change in sctp_packet(). Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16xfrm: interface: do not confirm neighbor when do pmtu updateXu Wang
commit 8aaea2b0428b6aad7c7e22d3fddc31a78bb1d724 upstream. When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end, we should not call dst_confirm_neigh() as there is no two-way communication. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16xfrm interface: fix packet tx through bpf_redirect()Nicolas Dichtel
commit f042365dbffea98fb8148c98c700402e8d099f02 upstream. With an ebpf program that redirects packets through a xfrm interface, packets are dropped because no dst is attached to skb. This could also be reproduced with an AF_PACKET socket, with the following python script (xfrm1 is a xfrm interface): import socket send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0) # scapy # p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request') # raw(p) req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00' send_s.sendto(req, ('xfrm1', 0x800, 0, 0)) It was also not possible to send an ip packet through an AF_PACKET socket because a LL header was expected. Let's remove those LL header constraints. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16vti[6]: fix packet tx through bpf_redirect()Nicolas Dichtel
commit 95224166a9032ff5d08fca633d37113078ce7d01 upstream. With an ebpf program that redirects packets through a vti[6] interface, the packets are dropped because no dst is attached. This could also be reproduced with an AF_PACKET socket, with the following python script (vti1 is an ip_vti interface): import socket send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0) # scapy # p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request') # raw(p) req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00' send_s.sendto(req, ('vti1', 0x800, 0, 0)) Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16wireless: wext: avoid gcc -O3 warningArnd Bergmann
commit e16119655c9e6c4aa5767cd971baa9c491f41b13 upstream. After the introduction of CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, the wext code produces a bogus warning: In function 'iw_handler_get_iwstats', inlined from 'ioctl_standard_call' at net/wireless/wext-core.c:1015:9, inlined from 'wireless_process_ioctl' at net/wireless/wext-core.c:935:10, inlined from 'wext_ioctl_dispatch.part.8' at net/wireless/wext-core.c:986:8, inlined from 'wext_handle_ioctl': net/wireless/wext-core.c:671:3: error: argument 1 null where non-null expected [-Werror=nonnull] memcpy(extra, stats, sizeof(struct iw_statistics)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/include/asm/string.h:5, net/wireless/wext-core.c: In function 'wext_handle_ioctl': arch/x86/include/asm/string_64.h:14:14: note: in a call to function 'memcpy' declared here The problem is that ioctl_standard_call() sometimes calls the handler with a NULL argument that would cause a problem for iw_handler_get_iwstats. However, iw_handler_get_iwstats never actually gets called that way. Marking that function as noinline avoids the warning and leads to slightly smaller object code as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107200741.3588770-1-arnd@arndb.de Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16mac80211: Fix TKIP replay protection immediately after key setupJouni Malinen
commit 6f601265215a421f425ba3a4850a35861d024643 upstream. TKIP replay protection was skipped for the very first frame received after a new key is configured. While this is potentially needed to avoid dropping a frame in some cases, this does leave a window for replay attacks with group-addressed frames at the station side. Any earlier frame sent by the AP using the same key would be accepted as a valid frame and the internal RSC would then be updated to the TSC from that frame. This would allow multiple previously transmitted group-addressed frames to be replayed until the next valid new group-addressed frame from the AP is received by the station. Fix this by limiting the no-replay-protection exception to apply only for the case where TSC=0, i.e., when this is for the very first frame protected using the new key, and the local RSC had not been set to a higher value when configuring the key (which may happen with GTK). Signed-off-by: Jouni Malinen <j@w1.fi> Link: https://lore.kernel.org/r/20200107153545.10934-1-j@w1.fi Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16cfg80211: Fix radar event during another phy CACOrr Mazor
commit 26ec17a1dc5ecdd8d91aba63ead6f8b5ad5dea0d upstream. In case a radar event of CAC_FINISHED or RADAR_DETECTED happens during another phy is during CAC we might need to cancel that CAC. If we got a radar in a channel that another phy is now doing CAC on then the CAC should be canceled there. If, for example, 2 phys doing CAC on the same channels, or on comptable channels, once on of them will finish his CAC the other might need to cancel his CAC, since it is no longer relevant. To fix that the commit adds an callback and implement it in mac80211 to end CAC. This commit also adds a call to said callback if after a radar event we see the CAC is no longer relevant Signed-off-by: Orr Mazor <Orr.Mazor@tandemg.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com [slightly reformat/reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16wireless: fix enabling channel 12 for custom regulatory domainGanapathi Bhat
commit c4b9d655e445a8be0bff624aedea190606b5ebbc upstream. Commit e33e2241e272 ("Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels"") fixed a broken regulatory (leaving channel 12 open for AP where not permitted). Apply a similar fix to custom regulatory domain processing. Signed-off-by: Cathy Luo <xiaohua.luo@nxp.com> Signed-off-by: Ganapathi Bhat <ganapathi.bhat@nxp.com> Link: https://lore.kernel.org/r/1576836859-8945-1-git-send-email-ganapathi.bhat@nxp.com [reword commit message, fix coding style, add a comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16mac80211: mesh: restrict airtime metric to peered established plinksMarkus Theil
commit 02a614499600af836137c3fbc4404cd96365fff2 upstream. The following warning is triggered every time an unestablished mesh peer gets dumped. Checks if a peer link is established before retrieving the airtime link metric. [ 9563.022567] WARNING: CPU: 0 PID: 6287 at net/mac80211/mesh_hwmp.c:345 airtime_link_metric_get+0xa2/0xb0 [mac80211] [ 9563.022697] Hardware name: PC Engines apu2/apu2, BIOS v4.10.0.3 [ 9563.022756] RIP: 0010:airtime_link_metric_get+0xa2/0xb0 [mac80211] [ 9563.022838] Call Trace: [ 9563.022897] sta_set_sinfo+0x936/0xa10 [mac80211] [ 9563.022964] ieee80211_dump_station+0x6d/0x90 [mac80211] [ 9563.023062] nl80211_dump_station+0x154/0x2a0 [cfg80211] [ 9563.023120] netlink_dump+0x17b/0x370 [ 9563.023130] netlink_recvmsg+0x2a4/0x480 [ 9563.023140] ____sys_recvmsg+0xa6/0x160 [ 9563.023154] ___sys_recvmsg+0x93/0xe0 [ 9563.023169] __sys_recvmsg+0x7e/0xd0 [ 9563.023210] do_syscall_64+0x4e/0x140 [ 9563.023217] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20191203180644.70653-1-markus.theil@tu-ilmenau.de [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16Bluetooth: Fix race condition in hci_release_sock()Dan Carpenter
commit 11eb85ec42dc8c7a7ec519b90ccf2eeae9409de8 upstream. Syzbot managed to trigger a use after free "KASAN: use-after-free Write in hci_sock_bind". I have reviewed the code manually and one possibly cause I have found is that we are not holding lock_sock(sk) when we do the hci_dev_put(hdev) in hci_sock_release(). My theory is that the bind and the release are racing against each other which results in this use after free. Reported-by: syzbot+eba992608adf3d796bcc@syzkaller.appspotmail.com Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16Bluetooth: Allow combination of BDADDR_PROPERTY and INVALID_BDADDR quirksMarcel Holtmann
commit 7fdf6c6a0d0e032aac2aa4537a23af1e04a397ce upstream. When utilizing BDADDR_PROPERTY and INVALID_BDADDR quirks together it results in an unconfigured controller even if the bootloader provides a valid address. Fix this by allowing a bootloader provided address to mark the controller as configured. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net_sched: walk through all child classes in tc_bind_tclass()Cong Wang
commit 760d228e322e99cdf6d81b4b60a268b8f13cf67a upstream. In a complex TC class hierarchy like this: tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \ avpkt 1000 cell 8 tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \ rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \ avpkt 1000 bounded tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 80 0xffff flowid 1:3 tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 25 0xffff flowid 1:4 tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \ rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \ rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 where filters are installed on qdisc 1:0, so we can't merely search from class 1:1 when creating class 1:3 and class 1:4. We have to walk through all the child classes of the direct parent qdisc. Otherwise we would miss filters those need reverse binding. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net_sched: fix ops->bind_class() implementationsCong Wang
commit 2e24cd755552350b94a7617617c6877b8cbcb701 upstream. The current implementations of ops->bind_class() are merely searching for classid and updating class in the struct tcf_result, without invoking either of cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This breaks the design of them as qdisc's like cbq use them to count filters too. This is why syzbot triggered the warning in cbq_destroy_class(). In order to fix this, we have to call cl_ops->bind_tcf() and cl_ops->unbind_tcf() like the filter binding path. This patch does so by refactoring out two helper functions __tcf_bind_filter() and __tcf_unbind_filter(), which are lockless and accept a Qdisc pointer, then teaching each implementation to call them correctly. Note, we merely pass the Qdisc pointer as an opaque pointer to each filter, they only need to pass it down to the helper functions without understanding it at all. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net_sched: ematch: reject invalid TCF_EM_SIMPLEEric Dumazet
commit 55cd9f67f1e45de8517cdaab985fb8e56c0bc1d8 upstream. It is possible for malicious userspace to set TCF_EM_SIMPLE bit even for matches that should not have this bit set. This can fool two places using tcf_em_is_simple() 1) tcf_em_tree_destroy() -> memory leak of em->data if ops->destroy() is NULL 2) tcf_em_tree_dump() wrongly report/leak 4 low-order bytes of a kernel pointer. BUG: memory leak unreferenced object 0xffff888121850a40 (size 32): comm "syz-executor927", pid 7193, jiffies 4294941655 (age 19.840s) hex dump (first 32 bytes): 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f67036ea>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000f67036ea>] slab_post_alloc_hook mm/slab.h:586 [inline] [<00000000f67036ea>] slab_alloc mm/slab.c:3320 [inline] [<00000000f67036ea>] __do_kmalloc mm/slab.c:3654 [inline] [<00000000f67036ea>] __kmalloc_track_caller+0x165/0x300 mm/slab.c:3671 [<00000000fab0cc8e>] kmemdup+0x27/0x60 mm/util.c:127 [<00000000d9992e0a>] kmemdup include/linux/string.h:453 [inline] [<00000000d9992e0a>] em_nbyte_change+0x5b/0x90 net/sched/em_nbyte.c:32 [<000000007e04f711>] tcf_em_validate net/sched/ematch.c:241 [inline] [<000000007e04f711>] tcf_em_tree_validate net/sched/ematch.c:359 [inline] [<000000007e04f711>] tcf_em_tree_validate+0x332/0x46f net/sched/ematch.c:300 [<000000007a769204>] basic_set_parms net/sched/cls_basic.c:157 [inline] [<000000007a769204>] basic_change+0x1d7/0x5f0 net/sched/cls_basic.c:219 [<00000000e57a5997>] tc_new_tfilter+0x566/0xf70 net/sched/cls_api.c:2104 [<0000000074b68559>] rtnetlink_rcv_msg+0x3b2/0x4b0 net/core/rtnetlink.c:5415 [<00000000b7fe53fb>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477 [<00000000e83a40d0>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 [<00000000d62ba933>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] [<00000000d62ba933>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328 [<0000000088070f72>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917 [<00000000f70b15ea>] sock_sendmsg_nosec net/socket.c:639 [inline] [<00000000f70b15ea>] sock_sendmsg+0x54/0x70 net/socket.c:659 [<00000000ef95a9be>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330 [<00000000b650f1ab>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384 [<0000000055bfa74a>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417 [<000000002abac183>] __do_sys_sendmsg net/socket.c:2426 [inline] [<000000002abac183>] __se_sys_sendmsg net/socket.c:2424 [inline] [<000000002abac183>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+03c4738ed29d5d366ddf@syzkaller.appspotmail.com Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net/x25: fix nonblocking connectMartin Schiller
commit e21dba7a4df4d93da237da65a096084b4f2e87b4 upstream. This patch fixes 2 issues in x25_connect(): 1. It makes absolutely no sense to reset the neighbour and the connection state after a (successful) nonblocking call of x25_connect. This prevents any connection from being established, since the response (call accept) cannot be processed. 2. Any further calls to x25_connect() while a call is pending should simply return, instead of creating new Call Request (on different logical channels). This patch should also fix the "KASAN: null-ptr-deref Write in x25_connect" and "BUG: unable to handle kernel NULL pointer dereference in x25_connect" bugs reported by syzbot. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reported-by: syzbot+429c200ffc8772bfe070@syzkaller.appspotmail.com Reported-by: syzbot+eec0c87f31a7c3b66f7b@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16netfilter: nf_tables: add __nft_chain_type_get()Pablo Neira Ayuso
commit 826035498ec14b77b62a44f0cb6b94d45530db6f upstream. This new helper function validates that unknown family and chain type coming from userspace do not trigger an out-of-bound array access. Bail out in case __nft_chain_type_get() returns NULL from nft_chain_parse_hook(). Fixes: 9370761c56b6 ("netfilter: nf_tables: convert built-in tables/chains to chain types") Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16netfilter: ipset: use bitmap infrastructure completelyKadlecsik József
commit 32c72165dbd0e246e69d16a3ad348a4851afd415 upstream. The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them. Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16xfrm: support output_mark for offload ESP packetsUlrich Weber
commit 4e4362d2bf2a49ff44dbbc9585207977ca3d71d0 upstream. Commit 9b42c1f179a6 ("xfrm: Extend the output_mark") added output_mark support but missed ESP offload support. xfrm_smark_get() is not called within xfrm_input() for packets coming from esp4_gro_receive() or esp6_gro_receive(). Therefore call xfrm_smark_get() directly within these functions. Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16netfilter: nft_osf: add missing check for DREG attributeFlorian Westphal
commit 7eaecf7963c1c8f62d62c6a8e7c439b0e7f2d365 upstream. syzbot reports just another NULL deref crash because of missing test for presence of the attribute. Reported-by: syzbot+cf23983d697c26c34f60@syzkaller.appspotmail.com Fixes: b96af92d6eaf9fadd ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16ipv4: Detect rollover in specific fib table dumpDavid Ahern
commit 9827c0634e461703abf81e8cc8b7adf5da5886d0 upstream. Sven-Haegar reported looping on fib dumps when 255.255.255.255 route has been added to a table. The looping is caused by the key rolling over from FFFFFFFF to 0. When dumping a specific table only, we need a means to detect when the table dump is done. The key and count saved to cb args are both 0 only at the start of the table dump. If key is 0 and count > 0, then we are in the rollover case. Detect and return to avoid looping. This only affects dumps of a specific table; for dumps of all tables (the case prior to the change in the Fixes tag) inet_dump_fib moved the entry counter to the next table and reset the cb args used by fib_table_dump and fn_trie_dump_leaf, so the rollover ffffffff back to 0 did not cause looping with the dumps. Fixes: effe67926624 ("net: Enable kernel side filtering of route dumps") Reported-by: Sven-Haegar Koch <haegar@sdinet.de> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16fou: Fix IPv6 netlink policyKristian Evensen
commit bb48eb9b12a95db9d679025927269d4adda6dbd1 upstream. When submitting v2 of "fou: Support binding FoU socket" (1713cb37bf67), I accidentally sent the wrong version of the patch and one fix was missing. In the initial version of the patch, as well as the version 2 that I submitted, I incorrectly used ".type" for the two V6-attributes. The correct is to use ".len". Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: 1713cb37bf67 ("fou: Support binding FoU socket") Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16tcp_bbr: improve arithmetic division in bbr_update_bw()Wen Yang
commit 5b2f1f3070b6447b76174ea8bfb7390dc6253ebd upstream. do_div() does a 64-by-32 division. Use div64_long() instead of it if the divisor is long, to avoid truncation to 32-bit. And as a nice side effect also cleans up the function a bit. Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16Revert "udp: do rmem bulk free even if the rx sk queue is empty"Paolo Abeni
commit d39ca2590d10712f412add7a88e1dd467a7246f4 upstream. This reverts commit 0d4a6608f68c7532dcbfec2ea1150c9761767d03. Williem reported that after commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty") the memory allocated by an almost idle system with many UDP sockets can grow a lot. For stable kernel keep the solution as simple as possible and revert the offending commit. Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Diagnosed-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net-sysfs: Fix reference count leakJouni Hogander
commit cb626bf566eb4433318d35681286c494f04fedcc upstream. Netdev_register_kobject is calling device_initialize. In case of error reference taken by device_initialize is not given up. Drivers are supposed to call free_netdev in case of error. In non-error case the last reference is given up there and device release sequence is triggered. In error case this reference is kept and the release sequence is never started. Fix this by setting reg_state as NETREG_UNREGISTERED if registering fails. This is the rootcause for couple of memory leaks reported by Syzkaller: BUG: memory leak unreferenced object 0xffff8880675ca008 (size 256): comm "netdev_register", pid 281, jiffies 4294696663 (age 6.808s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000058ca4711>] kmem_cache_alloc_trace+0x167/0x280 [<000000002340019b>] device_add+0x882/0x1750 [<000000001d588c3a>] netdev_register_kobject+0x128/0x380 [<0000000011ef5535>] register_netdevice+0xa1b/0xf00 [<000000007fcf1c99>] __tun_chr_ioctl+0x20d5/0x3dd0 [<000000006a5b7b2b>] tun_chr_ioctl+0x2f/0x40 [<00000000f30f834a>] do_vfs_ioctl+0x1c7/0x1510 [<00000000fba062ea>] ksys_ioctl+0x99/0xb0 [<00000000b1c1b8d2>] __x64_sys_ioctl+0x78/0xb0 [<00000000984cabb9>] do_syscall_64+0x16f/0x580 [<000000000bde033d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<00000000e6ca2d9f>] 0xffffffffffffffff BUG: memory leak unreferenced object 0xffff8880668ba588 (size 8): comm "kobject_set_nam", pid 286, jiffies 4294725297 (age 9.871s) hex dump (first 8 bytes): 6e 72 30 00 cc be df 2b nr0....+ backtrace: [<00000000a322332a>] __kmalloc_track_caller+0x16e/0x290 [<00000000236fd26b>] kstrdup+0x3e/0x70 [<00000000dd4a2815>] kstrdup_const+0x3e/0x50 [<0000000049a377fc>] kvasprintf_const+0x10e/0x160 [<00000000627fc711>] kobject_set_name_vargs+0x5b/0x140 [<0000000019eeab06>] dev_set_name+0xc0/0xf0 [<0000000069cb12bc>] netdev_register_kobject+0xc8/0x320 [<00000000f2e83732>] register_netdevice+0xa1b/0xf00 [<000000009e1f57cc>] __tun_chr_ioctl+0x20d5/0x3dd0 [<000000009c560784>] tun_chr_ioctl+0x2f/0x40 [<000000000d759e02>] do_vfs_ioctl+0x1c7/0x1510 [<00000000351d7c31>] ksys_ioctl+0x99/0xb0 [<000000008390040a>] __x64_sys_ioctl+0x78/0xb0 [<0000000052d196b7>] do_syscall_64+0x16f/0x580 [<0000000019af9236>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<00000000bc384531>] 0xffffffffffffffff v3 -> v4: Set reg_state to NETREG_UNREGISTERED if registering fails v2 -> v3: * Replaced BUG_ON with WARN_ON in free_netdev and netdev_release v1 -> v2: * Relying on driver calling free_netdev rather than calling put_device directly in error path Reported-by: syzbot+ad8ca40ecd77896d51e2@syzkaller.appspotmail.com Cc: David Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net_sched: fix datalen for ematchCong Wang
commit 61678d28d4a45ef376f5d02a839cc37509ae9281 upstream. syzbot reported an out-of-bound access in em_nbyte. As initially analyzed by Eric, this is because em_nbyte sets its own em->datalen in em_nbyte_change() other than the one specified by user, but this value gets overwritten later by its caller tcf_em_validate(). We should leave em->datalen untouched to respect their choices. I audit all the in-tree ematch users, all of those implement ->change() set em->datalen, so we can just avoid setting it twice in this case. Reported-and-tested-by: syzbot+5af9a90dad568aa9f611@syzkaller.appspotmail.com Reported-by: syzbot+2f07903a5b05e7f36410@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()Eric Dumazet
commit d836f5c69d87473ff65c06a6123e5b2cf5e56f5b upstream. rtnl_create_link() needs to apply dev->min_mtu and dev->max_mtu checks that we apply in do_setlink() Otherwise malicious users can crash the kernel, for example after an integer overflow : BUG: KASAN: use-after-free in memset include/linux/string.h:365 [inline] BUG: KASAN: use-after-free in __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 Write of size 32 at addr ffff88819f20b9c0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x134/0x1a0 mm/kasan/generic.c:192 memset+0x24/0x40 mm/kasan/common.c:108 memset include/linux/string.h:365 [inline] __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 alloc_skb include/linux/skbuff.h:1049 [inline] alloc_skb_with_frags+0x93/0x590 net/core/skbuff.c:5664 sock_alloc_send_pskb+0x7ad/0x920 net/core/sock.c:2242 sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2259 mld_newpack+0x1d7/0x7f0 net/ipv6/mcast.c:1609 add_grhead.isra.0+0x299/0x370 net/ipv6/mcast.c:1713 add_grec+0x7db/0x10b0 net/ipv6/mcast.c:1844 mld_send_cr net/ipv6/mcast.c:1970 [inline] mld_ifc_timer_expire+0x3d3/0x950 net/ipv6/mcast.c:2477 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x6c3/0x1790 kernel/time/timer.c:1786 __do_softirq+0x262/0x98c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x19b/0x1e0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: 98 6b ea f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 44 1c 60 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 34 1c 60 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 4e 5d 9a f9 e8 79 RSP: 0018:ffffffff89807ce8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff13266ae RBX: ffffffff8987a1c0 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffffffff8987aa54 RBP: ffffffff89807d18 R08: ffffffff8987a1c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff8a799980 R14: 0000000000000000 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:690 default_idle_call+0x84/0xb0 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:361 rest_init+0x23b/0x371 init/main.c:451 arch_call_rest_init+0xe/0x1b start_kernel+0x904/0x943 init/main.c:784 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242 The buggy address belongs to the page: page:ffffea00067c82c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 raw: 057ffe0000000000 ffffea00067c82c8 ffffea00067c82c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88819f20b880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20b900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88819f20b980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88819f20ba00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20ba80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net, ip_tunnel: fix namespaces moveWilliam Dauchy
commit d0f418516022c32ecceaf4275423e5bd3f8743a9 upstream. in the same manner as commit 690afc165bb3 ("net: ip6_gre: fix moving ip6gre between namespaces"), fix namespace moving as it was broken since commit 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata."). Indeed, the ip6_gre commit removed the local flag for collect_md condition, so there is no reason to keep it for ip_gre/ip_tunnel. this patch will fix both ip_tunnel and ip_gre modules. Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: William Dauchy <w.dauchy@criteo.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net, ip6_tunnel: fix namespaces moveWilliam Dauchy
commit 5311a69aaca30fa849c3cc46fb25f75727fb72d0 upstream. in the same manner as commit d0f418516022 ("net, ip_tunnel: fix namespaces move"), fix namespace moving as it was broken since commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnel"), but for ipv6 this time; there is no reason to keep it for ip6_tunnel. Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnel") Signed-off-by: William Dauchy <w.dauchy@criteo.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16net: ip6_gre: fix moving ip6gre between namespacesNiko Kortstrom
commit 690afc165bb314354667f67157c1a1aea7dc797a upstream. Support for moving IPv4 GRE tunnels between namespaces was added in commit b57708add314 ("gre: add x-netns support"). The respective change for IPv6 tunnels, commit 22f08069e8b4 ("ip6gre: add x-netns support") did not drop NETIF_F_NETNS_LOCAL flag so moving them from one netns to another is still denied in IPv6 case. Drop NETIF_F_NETNS_LOCAL flag from ip6gre tunnels to allow moving ip6gre tunnel endpoints between network namespaces. Signed-off-by: Niko Kortstrom <niko.kortstrom@nokia.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-16ipv6: sr: remove SKB_GSO_IPXIP6 on End.D* actionsYuki Taguchi
commit 62ebaeaedee7591c257543d040677a60e35c7aec upstream. After LRO/GRO is applied, SRv6 encapsulated packets have SKB_GSO_IPXIP6 feature flag, and this flag must be removed right after decapulation procedure. Currently, SKB_GSO_IPXIP6 flag is not removed on End.D* actions, which creates inconsistent packet state, that is, a normal TCP/IP packets have the SKB_GSO_IPXIP6 flag. This behavior can cause unexpected fallback to GSO on routing to netdevices that do not support SKB_GSO_IPXIP6. For example, on inter-VRF forwarding, decapsulated packets separated into small packets by GSO because VRF devices do not support TSO for packets with SKB_GSO_IPXIP6 flag, and this degrades forwarding performance. This patch removes encapsulation related GSO flags from the skb right after the End.D* action is applied. Fixes: d7a669dd2f8b ("ipv6: sr: add helper functions for seg6local") Signed-off-by: Yuki Taguchi <tagyounit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12packet: fix data-race in fanout_flow_is_huge()Eric Dumazet
commit b756ad928d98e5ef0b74af7546a6a31a8dadde00 upstream. KCSAN reported the following data-race [1] Adding a couple of READ_ONCE()/WRITE_ONCE() should silence it. Since the report hinted about multiple cpus using the history concurrently, I added a test avoiding writing on it if the victim slot already contains the desired value. [1] BUG: KCSAN: data-race in fanout_demux_rollover / fanout_demux_rollover read to 0xffff8880b01786cc of 4 bytes by task 18921 on cpu 1: fanout_flow_is_huge net/packet/af_packet.c:1303 [inline] fanout_demux_rollover+0x33e/0x3f0 net/packet/af_packet.c:1353 packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453 deliver_skb net/core/dev.c:1888 [inline] dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958 xmit_one net/core/dev.c:3195 [inline] dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215 __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792 dev_queue_xmit+0x21/0x30 net/core/dev.c:3825 neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530 neigh_output include/net/neighbour.h:511 [inline] ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116 __ip6_finish_output net/ipv6/ip6_output.c:142 [inline] __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127 ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175 dst_output include/net/dst.h:436 [inline] ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179 ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795 udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173 udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471 inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311 __sys_sendmmsg+0x123/0x350 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg net/socket.c:2439 [inline] __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 write to 0xffff8880b01786cc of 4 bytes by task 18922 on cpu 0: fanout_flow_is_huge net/packet/af_packet.c:1306 [inline] fanout_demux_rollover+0x3a4/0x3f0 net/packet/af_packet.c:1353 packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453 deliver_skb net/core/dev.c:1888 [inline] dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958 xmit_one net/core/dev.c:3195 [inline] dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215 __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792 dev_queue_xmit+0x21/0x30 net/core/dev.c:3825 neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530 neigh_output include/net/neighbour.h:511 [inline] ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116 __ip6_finish_output net/ipv6/ip6_output.c:142 [inline] __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127 ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175 dst_output include/net/dst.h:436 [inline] ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179 ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795 udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173 udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471 inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311 __sys_sendmmsg+0x123/0x350 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg net/socket.c:2439 [inline] __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 18922 Comm: syz-executor.3 Not tainted 5.4.0-rc6+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 3b3a5b0aab5b ("packet: rollover huge flows before small flows") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12net: neigh: use long type to store jiffies deltaEric Dumazet
commit 9d027e3a83f39b819e908e4e09084277a2e45e95 upstream. A difference of two unsigned long needs long storage. Fixes: c7fb64db001f ("[NETLINK]: Neighbour table configuration and statistics via rtnetlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12tipc: fix wrong timeout input for tipc_wait_for_cond()Tung Nguyen
commit 12db3c8083fcab4270866a88191933f2d9f24f89 upstream. In function __tipc_shutdown(), the timeout value passed to tipc_wait_for_cond() is not jiffies. This commit fixes it by converting that value from milliseconds to jiffies. Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion") Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12tipc: fix wrong socket reference counter after tipc_sk_timeout() returnsTung Nguyen
commit 91a4a3eb433e4d786420c41f3c08d1d16c605962 upstream. When tipc_sk_timeout() is executed but user space is grabbing ownership, this function rearms itself and returns. However, the socket reference counter is not reduced. This causes potential unexpected behavior. This commit fixes it by calling sock_put() before tipc_sk_timeout() returns in the above-mentioned case. Fixes: afe8792fec69 ("tipc: refactor function tipc_sk_timeout()") Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12tipc: fix potential memory leak in __tipc_sendmsg()Tung Nguyen
commit 2fe97a578d7bad3116a89dc8a6692a51e6fc1d9c upstream. When initiating a connection message to a server side, the connection message is cloned and added to the socket write queue. However, if the cloning is failed, only the socket write queue is purged. It causes memory leak because the original connection message is not freed. This commit fixes it by purging the list of connection message when it cannot be cloned. Fixes: 6787927475e5 ("tipc: buffer overflow handling in listener socket") Reported-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12tipc: update mon's self addr when node addr generatedHoang Le
commit 46cb01eeeb86fca6afe24dda1167b0cb95424e29 upstream. In commit 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values"), the 32-bit node address only generated after one second trial period expired. However the self's addr in struct tipc_monitor do not update according to node address generated. This lead to it is always zero as initial value. As result, sorting algorithm using this value does not work as expected, neither neighbor monitoring framework. In this commit, we add a fix to update self's addr when 32-bit node address generated. Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12SUNRPC: Fix backchannel latency metricsChuck Lever
commit 8729aaba74626c4ebce3abf1b9e96bb62d2958ca upstream. I noticed that for callback requests, the reported backlog latency is always zero, and the rtt value is crazy big. The problem was that rqst->rq_xtime is never set for backchannel requests. Fixes: 78215759e20d ("SUNRPC: Make RTT measurement more ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12SUNRPC: Fix svcauth_gss_proxy_init()Chuck Lever
commit 5866efa8cbfbadf3905072798e96652faf02dbe8 upstream. gss_read_proxy_verf() assumes things about the XDR buffer containing the RPC Call that are not true for buffers generated by svc_rdma_recv(). RDMA's buffers look more like what the upper layer generates for sending: head is a kmalloc'd buffer; it does not point to a page whose contents are contiguous with the first page in the buffers' page array. The result is that ACCEPT_SEC_CONTEXT via RPC/RDMA has stopped working on Linux NFS servers that use gssproxy. This does not affect clients that use only TCP to send their ACCEPT_SEC_CONTEXT operation (that's all Linux clients). Other clients, like Solaris NFS clients, send ACCEPT_SEC_CONTEXT on the same transport as they send all other NFS operations. Such clients can send ACCEPT_SEC_CONTEXT via RPC/RDMA. I thought I had found every direct reference in the server RPC code to the rqstp->rq_pages field. Bug found at the 2019 Westford NFS bake-a-thon. Fixes: 3316f0631139 ("svcrdma: Persistently allocate and DMA- ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12cfg80211: check for set_wiphy_paramsJohannes Berg
commit 24953de0a5e31dcca7e82c8a3c79abc2dfe8fb6e upstream. Check if set_wiphy_params is assigned and return an error if not, some drivers (e.g. virt_wifi where syzbot reported it) don't have it. Reported-by: syzbot+e8a797964a4180eb57d5@syzkaller.appspotmail.com Reported-by: syzbot+34b582cf32c1db008f8e@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200113125358.ac07f276efff.Ibd85ee1b12e47b9efb00a2adc5cd3fac50da791a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12tcp: refine rule to allow EPOLLOUT generation under mem pressureEric Dumazet
commit 216808c6ba6d00169fd2aa928ec3c0e63bef254f upstream. At the time commit ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty") was added to the kernel, we still had a single write queue, combining rtx and write queues. Once we moved the rtx queue into a separate rb-tree, testing if sk_write_queue is empty has been suboptimal. Indeed, if we have packets in the rtx queue, we probably want to delay the EPOLLOUT generation at the time incoming packets will free them, making room, but more importantly avoiding flooding application with EPOLLOUT events. Solution is to use tcp_rtx_and_write_queues_empty() helper. Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12net: dsa: tag_gswip: fix typo in tagger nameAlexander Lobakin
commit ad32205470919c8e04cdd33e0613bdba50c2376d upstream. The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in 875138f81d71a ("dsa: Move tagger name into its ops structure") while moving tagger names to their structures. Fixes: 875138f81d71a ("dsa: Move tagger name into its ops structure") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12tcp: fix marked lost packets not being retransmittedPengcheng Yang
commit e176b1ba476cf36f723cfcc7a9e57f3cb47dec70 upstream. When the packet pointed to by retransmit_skb_hint is unlinked by ACK, retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). If packet loss is detected at this time, retransmit_skb_hint will be set to point to the current packet loss in tcp_verify_retransmit_hint(), then the packets that were previously marked lost but not retransmitted due to the restriction of cwnd will be skipped and cannot be retransmitted. To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can be reset only after all marked lost packets are retransmitted (retrans_out >= lost_out), otherwise we need to traverse from tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). Packetdrill to demonstrate: // Disable RACK and set max_reordering to keep things simple 0 `sysctl -q net.ipv4.tcp_recovery=0` +0 `sysctl -q net.ipv4.tcp_max_reordering=3` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 8 data segments +0 write(4, ..., 8000) = 8000 +0 > P. 1:8001(8000) ack 1 // Enter recovery and 1:3001 is marked lost +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 +0 > . 1:1001(1000) ack 1 // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> // Now retransmit_skb_hint points to 4001:5001 which is now marked lost // BUG: 2001:3001 was not retransmitted +0 > . 2001:3001(1000) ack 1 Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12net/sched: act_ife: initalize ife->metalist earlierEric Dumazet
commit 44c23d71599f81a1c7fe8389e0319822dd50c37c upstream. It seems better to init ife->metalist earlier in tcf_ife_init() to avoid the following crash : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline] RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431 Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000 RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8 R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000 FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119 __tcf_action_put+0xfa/0x130 net/sched/act_api.c:135 __tcf_idr_release net/sched/act_api.c:165 [inline] __tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145 tcf_idr_release include/net/act_api.h:171 [inline] tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616 tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944 tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000 tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410 tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 ____sys_sendmsg+0x753/0x880 net/socket.c:2330 ___sys_sendmsg+0x100/0x170 net/socket.c:2384 __sys_sendmsg+0x105/0x1d0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg net/socket.c:2424 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12net: dsa: tag_qca: fix doubled Tx statisticsAlexander Lobakin
commit bd5874da57edd001b35cf28ae737779498c16a56 upstream. DSA subsystem takes care of netdev statistics since commit 4ed70ce9f01c ("net: dsa: Refactor transmit path to eliminate duplication"), so any accounting inside tagger callbacks is redundant and can lead to messing up the stats. This bug is present in Qualcomm tagger since day 0. Fixes: cafdc45c949b ("net-next: dsa: add Qualcomm tag RX/TX handler") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12batman-adv: Fix DAT candidate selection on little endian systemsSven Eckelmann
commit 4cc4a1708903f404d2ca0dfde30e71e052c6cbc9 upstream. The distributed arp table is using a DHT to store and retrieve MAC address information for an IP address. This is done using unicast messages to selected peers. The potential peers are looked up using the IP address and the VID. While the IP address is always stored in big endian byte order, this is not the case of the VID. It can (depending on the host system) either be big endian or little endian. The host must therefore always convert it to big endian to ensure that all devices calculate the same peers for the same lookup data. Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12net: bpf: Don't leak time wait and request socketsLorenz Bauer
commit 2e012c74823629d9db27963c79caa3f5b2010746 upstream. It's possible to leak time wait and request sockets via the following BPF pseudo code:   sk = bpf_skc_lookup_tcp(...) if (sk) bpf_sk_release(sk) If sk->sk_state is TCP_NEW_SYN_RECV or TCP_TIME_WAIT the refcount taken by bpf_skc_lookup_tcp is not undone by bpf_sk_release. This is because sk_flags is re-used for other data in both kinds of sockets. The check !sock_flag(sk, SOCK_RCU_FREE) therefore returns a bogus result. Check that sk_flags is valid by calling sk_fullsock. Skip checking SOCK_RCU_FREE if we already know that sk is not a full socket. Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper") Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110132336.26099-1-lmb@cloudflare.com Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12netfilter: nft_meta: skip EAGAIN if nft_meta_bridge is not a modulePablo Neira Ayuso
commit 78e21eb699203f32f8c524b01fb7363125cf9d68 upstream. If it is a module, request this module. Otherwise, if it is compiled built-in or not selected, skip this. Fixes: 0ef1efd1354d ("netfilter: nf_tables: force module load in case select_ops() returns -EAGAIN") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12netfilter: nf_tables: fix infinite loop when expr is not availableFlorian Westphal
commit 1d305ba40eb8081ff21eeb8ca6ba5c70fd920934 upstream. nft will loop forever if the kernel doesn't support an expression: 1. nft_expr_type_get() appends the family specific name to the module list. 2. -EAGAIN is returned to nfnetlink, nfnetlink calls abort path. 3. abort path sets ->done to true and calls request_module for the expression. 4. nfnetlink replays the batch, we end up in nft_expr_type_get() again. 5. nft_expr_type_get attempts to append family-specific name. This one already exists on the list, so we continue 6. nft_expr_type_get adds the generic expression name to the module list. -EAGAIN is returned, nfnetlink calls abort path. 7. abort path encounters the family-specific expression which has 'done' set, so it gets removed. 8. abort path requests the generic expression name, sets done to true. 9. batch is replayed. If the expression could not be loaded, then we will end up back at 1), because the family-specific name got removed and the cycle starts again. Note that userspace can SIGKILL the nft process to stop the cycle, but the desired behaviour is to return an error after the generic expr name fails to load the expression. Fixes: eb014de4fd418 ("netfilter: nf_tables: autoload modules from the abort path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-04-12netfilter: nf_tables: autoload modules from the abort pathPablo Neira Ayuso
commit eb014de4fd418de1a277913cba244e47274fe392 upstream. This patch introduces a list of pending module requests. This new module list is composed of nft_module_request objects that contain the module name and one status field that tells if the module has been already loaded (the 'done' field). In the first pass, from the preparation phase, the netlink command finds that a module is missing on this list. Then, a module request is allocated and added to this list and nft_request_module() returns -EAGAIN. This triggers the abort path with the autoload parameter set on from nfnetlink, request_module() is called and the module request enters the 'done' state. Since the mutex is released when loading modules from the abort phase, the module list is zapped so this is iteration occurs over a local list. Therefore, the request_module() calls happen when object lists are in consistent state (after fulling aborting the transaction) and the commit list is empty. On the second pass, the netlink command will find that it already tried to load the module, so it does not request it again and nft_request_module() returns 0. Then, there is a look up to find the object that the command was missing. If the module was successfully loaded, the command proceeds normally since it finds the missing object in place, otherwise -ENOENT is reported to userspace. This patch also updates nfnetlink to include the reason to enter the abort phase, which is required for this new autoload module rationale. Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module") Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>