aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6/ip6_vti.c
AgeCommit message (Collapse)Author
2023-08-30ip6_vti: fix slab-use-after-free in decode_session6Zhengchao Shao
[ Upstream commit 9fd41f1ba638938c9a1195d09bc6fa3be2712f25 ] When ipv6_vti device is set to the qdisc of the sfb type, the cb field of the sent skb may be modified during enqueuing. Then, slab-use-after-free may occur when ipv6_vti device sends IPv6 packets. The stack information is as follows: BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890 Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0xd9/0x150 print_address_description.constprop.0+0x2c/0x3c0 kasan_report+0x11d/0x130 decode_session6+0x103f/0x1890 __xfrm_decode_session+0x54/0xb0 vti6_tnl_xmit+0x3e6/0x1ee0 dev_hard_start_xmit+0x187/0x700 sch_direct_xmit+0x1a3/0xc30 __qdisc_run+0x510/0x17a0 __dev_queue_xmit+0x2215/0x3b10 neigh_connected_output+0x3c2/0x550 ip6_finish_output2+0x55a/0x1550 ip6_finish_output+0x6b9/0x1270 ip6_output+0x1f1/0x540 ndisc_send_skb+0xa63/0x1890 ndisc_send_rs+0x132/0x6f0 addrconf_rs_timer+0x3f1/0x870 call_timer_fn+0x1a0/0x580 expire_timers+0x29b/0x4b0 run_timer_softirq+0x326/0x910 __do_softirq+0x1d4/0x905 irq_exit_rcu+0xb7/0x120 sysvec_apic_timer_interrupt+0x97/0xc0 </IRQ> Allocated by task 9176: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x7f/0x90 kmem_cache_alloc_node+0x1cd/0x410 kmalloc_reserve+0x165/0x270 __alloc_skb+0x129/0x330 netlink_sendmsg+0x9b1/0xe30 sock_sendmsg+0xde/0x190 ____sys_sendmsg+0x739/0x920 ___sys_sendmsg+0x110/0x1b0 __sys_sendmsg+0xf7/0x1c0 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 9176: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x40 ____kasan_slab_free+0x160/0x1c0 slab_free_freelist_hook+0x11b/0x220 kmem_cache_free+0xf0/0x490 skb_free_head+0x17f/0x1b0 skb_release_data+0x59c/0x850 consume_skb+0xd2/0x170 netlink_unicast+0x54f/0x7f0 netlink_sendmsg+0x926/0xe30 sock_sendmsg+0xde/0x190 ____sys_sendmsg+0x739/0x920 ___sys_sendmsg+0x110/0x1b0 __sys_sendmsg+0xf7/0x1c0 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff88802e08ed00 which belongs to the cache skbuff_small_head of size 640 The buggy address is located 194 bytes inside of freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80) As commit f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") showed, xfrm_decode_session was originally intended only for the receive path. IP6CB(skb)->nhoff is not set during transmission. Therefore, set the cb field in the skb to 0 before sending packets. Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-11ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivateWilliam Zhao
[ Upstream commit c1833c3964d5bd8c163bd4e01736a38bc473cb8a ] The "__ip6_tnl_parm" struct was left uninitialized causing an invalid load of random data when the "__ip6_tnl_parm" struct was used elsewhere. As an example, in the function "ip6_tnl_xmit_ctl()", it tries to access the "collect_md" member. With "__ip6_tnl_parm" being uninitialized and containing random data, the UBSAN detected that "collect_md" held a non-boolean value. The UBSAN issue is as follows: =============================================================== UBSAN: invalid-load in net/ipv6/ip6_tunnel.c:1025:14 load of value 30 is not a valid value for type '_Bool' CPU: 1 PID: 228 Comm: kworker/1:3 Not tainted 5.16.0-rc4+ #8 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work Call Trace: <TASK> dump_stack_lvl+0x44/0x57 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value+0x66/0x70 ? __cpuhp_setup_state+0x1d3/0x210 ip6_tnl_xmit_ctl.cold.52+0x2c/0x6f [ip6_tunnel] vti6_tnl_xmit+0x79c/0x1e96 [ip6_vti] ? lock_is_held_type+0xd9/0x130 ? vti6_rcv+0x100/0x100 [ip6_vti] ? lock_is_held_type+0xd9/0x130 ? rcu_read_lock_bh_held+0xc0/0xc0 ? lock_acquired+0x262/0xb10 dev_hard_start_xmit+0x1e6/0x820 __dev_queue_xmit+0x2079/0x3340 ? mark_lock.part.52+0xf7/0x1050 ? netdev_core_pick_tx+0x290/0x290 ? kvm_clock_read+0x14/0x30 ? kvm_sched_clock_read+0x5/0x10 ? sched_clock_cpu+0x15/0x200 ? find_held_lock+0x3a/0x1c0 ? lock_release+0x42f/0xc90 ? lock_downgrade+0x6b0/0x6b0 ? mark_held_locks+0xb7/0x120 ? neigh_connected_output+0x31f/0x470 ? lockdep_hardirqs_on+0x79/0x100 ? neigh_connected_output+0x31f/0x470 ? ip6_finish_output2+0x9b0/0x1d90 ? rcu_read_lock_bh_held+0x62/0xc0 ? ip6_finish_output2+0x9b0/0x1d90 ip6_finish_output2+0x9b0/0x1d90 ? ip6_append_data+0x330/0x330 ? ip6_mtu+0x166/0x370 ? __ip6_finish_output+0x1ad/0xfb0 ? nf_hook_slow+0xa6/0x170 ip6_output+0x1fb/0x710 ? nf_hook.constprop.32+0x317/0x430 ? ip6_finish_output+0x180/0x180 ? __ip6_finish_output+0xfb0/0xfb0 ? lock_is_held_type+0xd9/0x130 ndisc_send_skb+0xb33/0x1590 ? __sk_mem_raise_allocated+0x11cf/0x1560 ? dst_output+0x4a0/0x4a0 ? ndisc_send_rs+0x432/0x610 addrconf_dad_completed+0x30c/0xbb0 ? addrconf_rs_timer+0x650/0x650 ? addrconf_dad_work+0x73c/0x10e0 addrconf_dad_work+0x73c/0x10e0 ? addrconf_dad_completed+0xbb0/0xbb0 ? rcu_read_lock_sched_held+0xaf/0xe0 ? rcu_read_lock_bh_held+0xc0/0xc0 process_one_work+0x97b/0x1740 ? pwq_dec_nr_in_flight+0x270/0x270 worker_thread+0x87/0xbf0 ? process_one_work+0x1740/0x1740 kthread+0x3ac/0x490 ? set_kthread_struct+0x100/0x100 ret_from_fork+0x22/0x30 </TASK> =============================================================== The solution is to initialize "__ip6_tnl_parm" struct to zeros in the "vti6_siocdevprivate()" function. Signed-off-by: William Zhao <wizhao@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22ipv6: remove extra dev_hold() for fallback tunnelsEric Dumazet
commit 0d7a7b2014b1a499a0fe24c9f3063d7856b5aaaf upstream. My previous commits added a dev_hold() in tunnels ndo_init(), but forgot to remove it from special functions setting up fallback tunnels. Fallback tunnels do call their respective ndo_init() This leads to various reports like : unregister_netdevice: waiting for ip6gre0 to become free. Usage count = 2 Fixes: 48bb5697269a ("ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 6289a98f0817 ("sit: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 40cb881b5aaa ("ip6_vti: proper dev_{hold|put} in ndo_[un]init methods") Fixes: 7f700334be9a ("ip6_gre: proper dev_{hold|put} in ndo_[un]init methods") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22ip6_vti: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet
[ Upstream commit 40cb881b5aaa0b69a7d93dec8440d5c62dae299f ] After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. Therefore, we need to move dev_hold() call from vti6_tnl_create2() to vti6_dev_init_gen() [1] WARNING: CPU: 0 PID: 15951 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 15951 Comm: syz-executor.3 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc90001eaef28 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520003d5dd7 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff88801bb1c568 R13: ffff88801f69e800 R14: 00000000ffffffff R15: ffff888050889d40 FS: 00007fc79314e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1c1ff47108 CR3: 0000000020fd5000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] vti6_dev_uninit+0x31a/0x360 net/ipv6/ip6_vti.c:297 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 vti6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_vti.c:190 vti6_newlink+0x9d/0xd0 net/ipv6/ip6_vti.c:1020 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x331/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmmsg+0x195/0x470 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2516 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-02vti6: Fix memory leak of skb if input policy check failsTorsten Hilbrich
commit 2a9de3af21aa8c31cd68b0b39330d69f8c1e59df upstream. The vti6_rcv function performs some tests on the retrieved tunnel including checking the IP protocol, the XFRM input policy, the source and destination address. In all but one places the skb is released in the error case. When the input policy check fails the network packet is leaked. Using the same goto-label discard in this case to fix this problem. Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02vti[6]: fix packet tx through bpf_redirect() in XinY casesNicolas Dichtel
commit f1ed10264ed6b66b9cd5e8461cffce69be482356 upstream. I forgot the 4in6/6in4 cases in my previous patch. Let's fix them. Fixes: 95224166a903 ("vti[6]: fix packet tx through bpf_redirect()") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-05vti[6]: fix packet tx through bpf_redirect()Nicolas Dichtel
[ Upstream commit 95224166a9032ff5d08fca633d37113078ce7d01 ] With an ebpf program that redirects packets through a vti[6] interface, the packets are dropped because no dst is attached. This could also be reproduced with an AF_PACKET socket, with the following python script (vti1 is an ip_vti interface): import socket send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0) # scapy # p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request') # raw(p) req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00' send_s.sendto(req, ('vti1', 0x800, 0, 0)) Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04vti: do not confirm neighbor when do pmtu updateHangbin Liu
[ Upstream commit 8247a79efa2f28b44329f363272550c1738377de ] When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end, we should not call dst_confirm_neigh() as there is no two-way communication. Although vti and vti6 are immune to this problem because they are IFF_NOARP interfaces, as Guillaume pointed. There is still no sense to confirm neighbour here. v5: Update commit description. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09ip: validate header length on virtual device xmitWillem de Bruijn
[ Upstream commit cb9f1b783850b14cbd7f87d061d784a666dfba1f ] KMSAN detected read beyond end of buffer in vti and sit devices when passing truncated packets with PF_PACKET. The issue affects additional ip tunnel devices. Extend commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the inner header") and commit ccfec9e5cb2d ("ip_tunnel: be careful when accessing the inner header"). Move the check to a separate helper and call at the start of each ndo_start_xmit function in net/ipv4 and net/ipv6. Minor changes: - convert dev_kfree_skb to kfree_skb on error path, as dev_kfree_skb calls consume_skb which is not for error paths. - use pskb_network_may_pull even though that is pedantic here, as the same as pskb_may_pull for devices without llheaders. - do not cache ipv6 hdrs if used only once (unsafe across pskb_may_pull, was more relevant to earlier patch) Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09ipv6: tunnels: fix two use-after-freeEric Dumazet
[ Upstream commit cbb49697d5512ce9e61b45ce75d3ee43d7ea5524 ] xfrm6_policy_check() might have re-allocated skb->head, we need to reload ipv6 header pointer. sysbot reported : BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40 Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304 CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40 ipv6_addr_type include/net/ipv6.h:403 [inline] ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443 IPVS: ftp: loaded support on port[0] = 21 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537 dst_input include/net/dst.h:450 [inline] ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:289 [inline] ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083 process_backlog+0x24e/0x7a0 net/core/dev.c:5923 napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412 __do_softirq+0x308/0xb7e kernel/softirq.c:292 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027 </IRQ> do_softirq.part.14+0x126/0x160 kernel/softirq.c:337 do_softirq+0x19/0x20 kernel/softirq.c:340 netif_rx_ni+0x521/0x860 net/core/dev.c:4569 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:278 [inline] ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171 dst_output include/net/dst.h:444 [inline] ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline] rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945 kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>' kobject: 'queues' (0000000089e6eea2): kobject_uevent_env inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop! sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 sock_write_iter+0x35e/0x5c0 net/socket.c:900 call_write_iter include/linux/fs.h:1857 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6b8/0x9f0 fs/read_write.c:487 kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues' kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env vfs_write+0x1fc/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0' __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues' entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457669 Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669 RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003 kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4 R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff Allocated by task 1304: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 __do_kmalloc_node mm/slab.c:3684 [inline] __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140 __alloc_skb+0x155/0x760 net/core/skbuff.c:208 kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0' alloc_skb include/linux/skbuff.h:1011 [inline] __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116 __sys_sendmsg+0x11d/0x280 net/socket.c:2154 __do_sys_sendmsg net/socket.c:2163 [inline] __se_sys_sendmsg net/socket.c:2161 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices' Freed by task 1304: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kfree+0xcf/0x230 mm/slab.c:3817 skb_free_head+0x93/0xb0 net/core/skbuff.c:553 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896 pskb_may_pull include/linux/skbuff.h:2188 [inline] _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272 kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline] xfrm_policy_check include/net/xfrm.h:1175 [inline] xfrm6_policy_check include/net/xfrm.h:1185 [inline] vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537 dst_input include/net/dst.h:450 [inline] ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:289 [inline] ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083 process_backlog+0x24e/0x7a0 net/core/dev.c:5923 kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0' napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412 __do_softirq+0x308/0xb7e kernel/softirq.c:292 The buggy address belongs to the object at ffff888191b8cac0 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 176 bytes inside of 512-byte region [ffff888191b8cac0, ffff888191b8ccc0) The buggy address belongs to the page: page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940 raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>' Memory state around the buggy address: ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path") Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-29vti6: remove !skb->ignore_df check from vti6_xmit()Alexey Kodanev
Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting on xmit") '!skb->ignore_df' check was always true because the function skb_scrub_packet() was called before it, resetting ignore_df to zero. In the commit, skb_scrub_packet() was moved below, and now this check can be false for the packet, e.g. when sending it in the two fragments, this prevents successful PMTU updates in such case. The next attempts to send the packet lead to the same tx error. Moreover, vti6 initial MTU value relies on PMTU adjustments. This issue can be reproduced with the following LTP test script: udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000 Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-19ip6_vti: fix a null pointer deference when destroy vti6 tunnelHaishuang Yan
If load ip6_vti module and create a network namespace when set fb_tunnels_only_for_init_net to 1, then exit the namespace will cause following crash: [ 6601.677036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 6601.679057] PGD 8000000425eca067 P4D 8000000425eca067 PUD 424292067 PMD 0 [ 6601.680483] Oops: 0000 [#1] SMP PTI [ 6601.681223] CPU: 7 PID: 93 Comm: kworker/u16:1 Kdump: loaded Tainted: G E 4.18.0+ #3 [ 6601.683153] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014 [ 6601.684919] Workqueue: netns cleanup_net [ 6601.685742] RIP: 0010:vti6_exit_batch_net+0x87/0xd0 [ip6_vti] [ 6601.686932] Code: 7b 08 48 89 e6 e8 b9 ea d3 dd 48 8b 1b 48 85 db 75 ec 48 83 c5 08 48 81 fd 00 01 00 00 75 d5 49 8b 84 24 08 01 00 00 48 89 e6 <48> 8b 78 08 e8 90 ea d3 dd 49 8b 45 28 49 39 c6 4c 8d 68 d8 75 a1 [ 6601.690735] RSP: 0018:ffffa897c2737de0 EFLAGS: 00010246 [ 6601.691846] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200 [ 6601.693324] RDX: 0000000000000015 RSI: ffffa897c2737de0 RDI: ffffffff9f2ea9e0 [ 6601.694824] RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000000 [ 6601.696314] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8dc323c07e00 [ 6601.697812] R13: ffff8dc324a63100 R14: ffffa897c2737e30 R15: ffffa897c2737e30 [ 6601.699345] FS: 0000000000000000(0000) GS:ffff8dc33fdc0000(0000) knlGS:0000000000000000 [ 6601.701068] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6601.702282] CR2: 0000000000000008 CR3: 0000000424966002 CR4: 00000000001606e0 [ 6601.703791] Call Trace: [ 6601.704329] cleanup_net+0x1b4/0x2c0 [ 6601.705268] process_one_work+0x16c/0x370 [ 6601.706145] worker_thread+0x49/0x3e0 [ 6601.706942] kthread+0xf8/0x130 [ 6601.707626] ? rescuer_thread+0x340/0x340 [ 6601.708476] ? kthread_bind+0x10/0x10 [ 6601.709266] ret_from_fork+0x35/0x40 Reproduce: modprobe ip6_vti echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net unshare -n exit This because ip6n->tnls_wc[0] point to fallback device in default, but in non-default namespace, ip6n->tnls_wc[0] will be NULL, so add the NULL check comparatively. Fixes: e2948e5af8ee ("ip6_vti: fix creating fallback tunnel device for vti6") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-19ip6_vti: fix creating fallback tunnel device for vti6Haishuang Yan
When set fb_tunnels_only_for_init_net to 1, don't create fallback tunnel device for vti6 when a new namespace is created. Tested: [root@builder2 ~]# modprobe ip6_tunnel [root@builder2 ~]# modprobe ip6_vti [root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net [root@builder2 ~]# unshare -n [root@builder2 ~]# ip link 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-18ip6_vti: simplify stats handling in vti6_xmitHaishuang Yan
Same as ip_vti, use iptunnel_xmit_stats to updates stats in tunnel xmit code path. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-07vti6: fix PMTU caching and reporting on xmitEyal Birger
When setting the skb->dst before doing the MTU check, the route PMTU caching and reporting is done on the new dst which is about to be released. Instead, PMTU handling should be done using the original dst. This is aligned with IPv4 VTI. Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The bpf syscall and selftests conflicts were trivial overlapping changes. The r8169 change involved moving the added mdelay from 'net' into a different function. A TLS close bug fix overlapped with the splitting of the TLS state into separate TX and RX parts. I just expanded the tests in the bug fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf == X". Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01change the comment of vti6_ioctlSun Lianwen
The comment of vti6_ioctl() is wrong. which use vti6_tnl_ioctl instead of vti6_ioctl. Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-04-27vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 tooStefano Brivio
A vti6 interface can carry IPv4 as well, so it makes no sense to enforce a minimum MTU of IPV6_MIN_MTU. If the user sets an MTU below IPV6_MIN_MTU, IPv6 will be disabled on the interface, courtesy of addrconf_notify(). Reported-by: Xin Long <lucien.xin@gmail.com> Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking") Fixes: c6741fbed6dc ("vti6: Properly adjust vti6 MTU from MTU of lower device") Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-04-05vti6: better validate user provided tunnel namesEric Dumazet
Use valid_name() to make sure user does not provide illegal device name. Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19vti6: Fix dev->max_mtu settingStefano Brivio
We shouldn't allow a tunnel to have IP_MAX_MTU as MTU, because another IPv6 header is going on top of our packets. Without this patch, we might end up building packets bigger than IP_MAX_MTU. Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-19vti6: Keep set MTU on link creation or change, validate itStefano Brivio
In vti6_link_config(), if MTU is already given on link creation or change, validate and use it instead of recomputing it. To do that, we need to propagate the knowledge that MTU was set by userspace all the way down to vti6_link_config(). To keep this simple, vti6_dev_init() sets the new 'keep_mtu' argument of vti6_link_config() to true: on initialization, we don't have convenient access to netlink attributes there, but we will anyway check whether dev->mtu is set in vti6_link_config(). If it's non-zero, it was set to the value of the IFLA_MTU attribute during creation. Otherwise, determine a reasonable value. Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-19vti6: Properly adjust vti6 MTU from MTU of lower deviceStefano Brivio
If a lower device is found, we don't need to subtract LL_MAX_HEADER to calculate our MTU: just use its MTU, the link layer headers are already taken into account by it. If the lower device is not found, start from ETH_DATA_LEN instead, and only in this case subtract a worst-case LL_MAX_HEADER. We then need to subtract our additional IPv6 header from the calculation. While at it, note that vti6 doesn't have a hardware header, so it doesn't need to set dev->hard_header_len. And as vti6_link_config() now always sets the MTU, there's no need to set a default value in vti6_dev_setup(). This makes the behaviour consistent with IPv4 vti, after commit a32452366b72 ("vti4: Don't count header length twice."), which was accidentally reverted by merge commit f895f0cfbb77 ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec"). While commit 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device") improved on the original situation, this was still not ideal. As reported in that commit message itself, if we start from an underlying veth MTU of 9000, we end up with an MTU of 8832, that is, 9000 - LL_MAX_HEADER - sizeof(ipv6hdr). This should simply be 8880, or 9000 - sizeof(ipv6hdr) instead: we found the lower device (veth) and we know we don't have any additional link layer header, so there's no need to subtract an hypothetical worst-case number. Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-04net/ipv6: Pass skb to route lookupDavid Ahern
IPv6 does path selection for multipath routes deep in the lookup functions. The next patch adds L4 hash option and needs the skb for the forward path. To get the skb to the relevant FIB lookup functions it needs to go through the fib rules layer, so add a lookup_data argument to the fib_lookup_arg struct. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27net: Convert vti6_net_opsKirill Tkhai
These pernet_operations are similar to ip6_tnl_net_ops. Exit method unregisters all net vti6 tunnels, and it looks like another pernet_operations are not interested in foreign net vti6 list. Init method registers netdevice. So, it's possible to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net: don't call update_pmtu unconditionallyNicolas Dichtel
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to: "BUG: unable to handle kernel NULL pointer dereference at (null)" Let's add a helper to check if update_pmtu is available before calling it. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") CC: Roman Kapl <code@rkapl.cz> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-20ip6_vti: adjust vti mtu according to mtu of lower deviceAlexey Kodanev
LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams over ip6_vti that require fragmentation and the underlying device has an MTU smaller than 1500 plus some extra space for headers. This happens because ip6_vti, by default, sets MTU to ETH_DATA_LEN and not updating it depending on a destination address or link parameter. Further attempts to send UDP packets may succeed because pmtu gets updated on ICMPV6_PKT_TOOBIG in vti6_err(). In case the lower device has larger MTU size, e.g. 9000, ip6_vti works but not using the possible maximum size, output packets have 1500 limit. The above cases require manual MTU setup after ip6_vti creation. However ip_vti already updates MTU based on lower device with ip_tunnel_bind_dev(). Here is the example when the lower device MTU is set to 9000: # ip a sh ltp_ns_veth2 ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ... inet 10.0.0.2/24 scope global ltp_ns_veth2 inet6 fd00::2/64 scope global # ip li add vti6 type vti6 local fd00::2 remote fd00::1 # ip li show vti6 vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ... link/tunnel6 fd00::2 peer fd00::1 After the patch: # ip li add vti6 type vti6 local fd00::2 remote fd00::1 # ip li show vti6 vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ... link/tunnel6 fd00::2 peer fd00::1 Reported-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Just simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmitAlexey Kodanev
When running LTP IPsec tests, KASan might report: BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti] Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0 ... Call Trace: <IRQ> dump_stack+0x63/0x89 print_address_description+0x7c/0x290 kasan_report+0x28d/0x370 ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti] __asan_report_load4_noabort+0x19/0x20 vti_tunnel_xmit+0xeee/0xff0 [ip_vti] ? vti_init_net+0x190/0x190 [ip_vti] ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 dev_hard_start_xmit+0x147/0x510 ? icmp_echo.part.24+0x1f0/0x210 __dev_queue_xmit+0x1394/0x1c60 ... Freed by task 0: save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x70/0xc0 kmem_cache_free+0x81/0x1e0 kfree_skbmem+0xb1/0xe0 kfree_skb+0x75/0x170 kfree_skb_list+0x3e/0x60 __dev_queue_xmit+0x1298/0x1c60 dev_queue_xmit+0x10/0x20 neigh_resolve_output+0x3a8/0x740 ip_finish_output2+0x5c0/0xe70 ip_finish_output+0x4ba/0x680 ip_output+0x1c1/0x3a0 xfrm_output_resume+0xc65/0x13d0 xfrm_output+0x1e4/0x380 xfrm4_output_finish+0x5c/0x70 Can be fixed if we get skb->len before dst_output(). Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code") Fixes: 22e1b23dafa8 ("vti6: Support inter address family tunneling.") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19ipv6: speedup ipv6 tunnels dismantleEric Dumazet
Implement exit_batch() method to dismantle more devices per round. (rtnl_lock() ... unregister_netdevice_many() ... rtnl_unlock()) Tested: $ cat add_del_unshare.sh for i in `seq 1 40` do (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & done wait ; grep net_namespace /proc/slabinfo Before patch : $ time ./add_del_unshare.sh net_namespace 110 267 5504 1 2 : tunables 8 4 0 : slabdata 110 267 0 real 3m25.292s user 0m0.644s sys 0m40.153s After patch: $ time ./add_del_unshare.sh net_namespace 126 282 5504 1 2 : tunables 8 4 0 : slabdata 126 282 0 real 1m38.965s user 0m0.688s sys 0m37.017s Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18vti: revert flush x-netns xfrm cache when vti interface is removedFlorian Westphal
flow cache is removed in next commit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.validateMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.changelinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ip6_tunnel: Allow policy-based routing through tunnelsCraig Gallek
This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-03-06 1) Fix lockdep splat on xfrm policy subsystem initialization. From Florian Westphal. 2) When using socket policies on IPv4-mapped IPv6 addresses, we access the flow informations of the wrong address family what leads to an out of bounds access. Fix this by using the family we get with the dst_entry, like we do it for the standard policy lookup. 3) vti6 can report a PMTU below IPV6_MIN_MTU. Fix this by adding a check for that before sending a ICMPV6_PKT_TOOBIG message. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-24vti6: return GRE_KEY for vti6David Forster
Align vti6 with vti by returning GRE_KEY flag. This enables iproute2 to display tunnel keys on "ip -6 tunnel show" Signed-off-by: David Forster <dforster@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-16vti6: Don't report path MTU below IPV6_MIN_MTU.Steffen Klassert
In vti6_xmit(), the check for IPV6_MIN_MTU before we send a ICMPV6_PKT_TOOBIG message is missing. So we might report a PMTU below 1280. Fix this by adding the required check. Fixes: ccd740cbc6e ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-27net/ipv6: support more tunnel interfaces for EUI64 link-local generationFelix Jia
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06vti6: fix device register to report IFLA_INFO_KINDDavid Forster
vti6 interface is registered before the rtnl_link_ops block is attached. As a result the resulting RTM_NEWLINK is missing IFLA_INFO_KIND. Re-order attachment of rtnl_link_ops block to fix. Signed-off-by: Dave Forster <dforster@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18netns: make struct pernet_operations::id unsigned intAlexey Dobriyan
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-04net: inet: Support UID-based routing in IP protocols.Lorenzo Colitti
- Use the UID in routing lookups made by protocol connect() and sendmsg() functions. - Make sure that routing lookups triggered by incoming packets (e.g., Path MTU discovery) take the UID of the socket into account. - For packets not associated with a userspace socket, (e.g., ping replies) use UID 0 inside the user namespace corresponding to the network namespace the socket belongs to. This allows all namespaces to apply routing and iptables rules to kernel-originated traffic in that namespaces by matching UID 0. This is better than using the UID of the kernel socket that is sending the traffic, because the UID of kernel sockets created at namespace creation time (e.g., the per-processor ICMP and TCP sockets) is the UID of the user that created the socket, which might not be mapped in the namespace. Tested: compiles allnoconfig, allyesconfig, allmodconfig Tested: https://android-review.googlesource.com/253302 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20ipv4/6: use core net MTU range checkingJarod Wilson
ipv4/ip_tunnel: - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen - preserve all ndo_change_mtu checks for now to prevent regressions ipv6/ip6_tunnel: - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - preserve all ndo_change_mtu checks for now to prevent regressions ipv6/ip6_vti: - min_mtu = 1280, max_mtu = 65535 - remove redundant vti6_change_mtu ipv6/sit: - min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen - remove redundant ipip6_tunnel_change_mtu CC: netdev@vger.kernel.org CC: "David S. Miller" <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-11vti6: flush x-netns xfrm cache when vti interface is removedNicolas Dichtel
This is the same fix than commit a5d0dc810abf ("vti: flush x-netns xfrm cache when vti interface is removed") This patch fixes a refcnt problem when a x-netns vti6 interface is removed: unregister_netdevice: waiting for vti6_test to become free. Usage count = 1 Here is a script to reproduce the problem: ip link set dev ntfp2 up ip addr add dev ntfp2 2001::1/64 ip link add vti6_test type vti6 local 2001::1 remote 2001::2 key 1 ip netns add secure ip link set vti6_test netns secure ip netns exec secure ip link set vti6_test up ip netns exec secure ip link s lo up ip netns exec secure ip addr add dev vti6_test 2003::1/64 ip -6 xfrm policy add dir out tmpl src 2001::1 dst 2001::2 proto esp \ mode tunnel mark 1 ip -6 xfrm policy add dir in tmpl src 2001::2 dst 2001::1 proto esp \ mode tunnel mark 1 ip xfrm state add src 2001::1 dst 2001::2 proto esp spi 1 mode tunnel \ enc des3_ede 0x112233445566778811223344556677881122334455667788 mark 1 ip xfrm state add src 2001::2 dst 2001::1 proto esp spi 1 mode tunnel \ enc des3_ede 0x112233445566778811223344556677881122334455667788 mark 1 ip netns exec secure ping6 -c 4 2003::2 ip netns del secure CC: Lance Richardson <lrichard@redhat.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2016-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-09-21vti6: fix input pathNicolas Dichtel
Since commit 1625f4529957, vti6 is broken, all input packets are dropped (LINUX_MIB_XFRMINNOSTATES is incremented). XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in xfrm6_rcv_spi(). A new function xfrm6_rcv_tnl() that enables to pass a value to xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function is used in several handlers). CC: Alexey Kodanev <alexey.kodanev@oracle.com> Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>