aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6
AgeCommit message (Collapse)Author
2020-10-01ipv6_route_seq_next should increase position indexVasily Averin
[ Upstream commit 4fc427e0515811250647d44de38d87d7b0e0790f ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-26net: ipv6: fix kconfig dependency warning for IPV6_SEG6_HMACNecip Fazil Yildiran
[ Upstream commit db7cd91a4be15e1485d6b58c6afc8761c59c4efb ] When IPV6_SEG6_HMAC is enabled and CRYPTO is disabled, it results in the following Kbuild warning: WARNING: unmet direct dependencies detected for CRYPTO_HMAC Depends on [n]: CRYPTO [=n] Selected by [y]: - IPV6_SEG6_HMAC [=y] && NET [=y] && INET [=y] && IPV6 [=y] WARNING: unmet direct dependencies detected for CRYPTO_SHA1 Depends on [n]: CRYPTO [=n] Selected by [y]: - IPV6_SEG6_HMAC [=y] && NET [=y] && INET [=y] && IPV6 [=y] WARNING: unmet direct dependencies detected for CRYPTO_SHA256 Depends on [n]: CRYPTO [=n] Selected by [y]: - IPV6_SEG6_HMAC [=y] && NET [=y] && INET [=y] && IPV6 [=y] The reason is that IPV6_SEG6_HMAC selects CRYPTO_HMAC, CRYPTO_SHA1, and CRYPTO_SHA256 without depending on or selecting CRYPTO while those configs are subordinate to CRYPTO. Honor the kconfig menu hierarchy to remove kconfig dependency warnings. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-26ipv6: avoid lockdep issue in fib6_del()Eric Dumazet
[ Upstream commit 843d926b003ea692468c8cc5bea1f9f58dfa8c75 ] syzbot reported twice a lockdep issue in fib6_del() [1] which I think is caused by net->ipv6.fib6_null_entry having a NULL fib6_table pointer. fib6_del() already checks for fib6_null_entry special case, we only need to return earlier. Bug seems to occur very rarely, I have thus chosen a 'bug origin' that makes backports not too complex. [1] WARNING: suspicious RCU usage 5.9.0-rc4-syzkaller #0 Not tainted ----------------------------- net/ipv6/ip6_fib.c:1996 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by syz-executor.5/8095: #0: ffffffff8a7ea708 (rtnl_mutex){+.+.}-{3:3}, at: ppp_release+0x178/0x240 drivers/net/ppp/ppp_generic.c:401 #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: spin_trylock_bh include/linux/spinlock.h:414 [inline] #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: fib6_run_gc+0x21b/0x2d0 net/ipv6/ip6_fib.c:2312 #2: ffffffff89bd6a40 (rcu_read_lock){....}-{1:2}, at: __fib6_clean_all+0x0/0x290 net/ipv6/ip6_fib.c:2613 #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline] #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: __fib6_clean_all+0x107/0x290 net/ipv6/ip6_fib.c:2245 stack backtrace: CPU: 1 PID: 8095 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 fib6_del+0x12b4/0x1630 net/ipv6/ip6_fib.c:1996 fib6_clean_node+0x39b/0x570 net/ipv6/ip6_fib.c:2180 fib6_walk_continue+0x4aa/0x8e0 net/ipv6/ip6_fib.c:2102 fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2150 fib6_clean_tree+0xdb/0x120 net/ipv6/ip6_fib.c:2230 __fib6_clean_all+0x120/0x290 net/ipv6/ip6_fib.c:2246 fib6_clean_all net/ipv6/ip6_fib.c:2257 [inline] fib6_run_gc+0x113/0x2d0 net/ipv6/ip6_fib.c:2320 ndisc_netdev_event+0x217/0x350 net/ipv6/ndisc.c:1805 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] dev_close_many+0x30b/0x650 net/core/dev.c:1634 rollback_registered_many+0x3a8/0x1210 net/core/dev.c:9261 rollback_registered net/core/dev.c:9329 [inline] unregister_netdevice_queue+0x2dd/0x570 net/core/dev.c:10410 unregister_netdevice include/linux/netdevice.h:2774 [inline] ppp_release+0x216/0x240 drivers/net/ppp/ppp_generic.c:403 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 421842edeaf6 ("net/ipv6: Add fib6_null_entry") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-12ipv6: Fix sysctl max for fib_multipath_hash_policyIdo Schimmel
[ Upstream commit 05d4487197b2b71d5363623c28924fd58c71c0b6 ] Cited commit added the possible value of '2', but it cannot be set. Fix it by adjusting the maximum value to '2'. This is consistent with the corresponding IPv4 sysctl. Before: # sysctl -w net.ipv6.fib_multipath_hash_policy=2 sysctl: setting key "net.ipv6.fib_multipath_hash_policy": Invalid argument net.ipv6.fib_multipath_hash_policy = 2 # sysctl net.ipv6.fib_multipath_hash_policy net.ipv6.fib_multipath_hash_policy = 0 After: # sysctl -w net.ipv6.fib_multipath_hash_policy=2 net.ipv6.fib_multipath_hash_policy = 2 # sysctl net.ipv6.fib_multipath_hash_policy net.ipv6.fib_multipath_hash_policy = 2 Fixes: d8f74f0975d8 ("ipv6: Support multipath hashing on inner IP pkts") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03netfilter: avoid ipv6 -> nf_defrag_ipv6 module dependencyFlorian Westphal
[ Upstream commit 2404b73c3f1a5f15726c6ecd226b56f6f992767f ] nf_ct_frag6_gather is part of nf_defrag_ipv6.ko, not ipv6 core. The current use of the netfilter ipv6 stub indirections causes a module dependency between ipv6 and nf_defrag_ipv6. This prevents nf_defrag_ipv6 module from being removed because ipv6 can't be unloaded. Remove the indirection and always use a direct call. This creates a depency from nf_conntrack_bridge to nf_defrag_ipv6 instead: modinfo nf_conntrack depends: nf_conntrack,nf_defrag_ipv6,bridge .. and nf_conntrack already depends on nf_defrag_ipv6 anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPYMark Tomlinson
[ Upstream commit 272502fcb7cda01ab07fc2fcff82d1d2f73d43cc ] When receiving an IPv4 packet inside an IPv6 GRE packet, and the IP6_TNL_F_RCV_DSCP_COPY flag is set on the tunnel, the IPv4 header would get corrupted. This is due to the common ip6_tnl_rcv() function assuming that the inner header is always IPv6. This patch checks the tunnel protocol for IPv4 inner packets, but still defaults to IPv6. Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11ipv6: Fix nexthop refcnt leak when creating ipv6 route infoXiyu Yang
[ Upstream commit 706ec919164622ff5ce822065472d0f30a9e9dd2 ] ip6_route_info_create() invokes nexthop_get(), which increases the refcount of the "nh". When ip6_route_info_create() returns, local variable "nh" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of ip6_route_info_create(). When nexthops can not be used with source routing, the function forgets to decrease the refcnt increased by nexthop_get(), causing a refcnt leak. Fix this issue by pulling up the error source routing handling when nexthops can not be used with source routing. Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11ipv6: fix memory leaks on IPV6_ADDRFORM pathCong Wang
[ Upstream commit 8c0de6e96c9794cb523a516c465991a70245da1c ] IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket to IPv4, particularly struct ipv6_ac_socklist. Similar to struct ipv6_mc_socklist, we should just close it on this path. This bug can be easily reproduced with the following C program: #include <stdio.h> #include <string.h> #include <sys/types.h> #include <sys/socket.h> #include <arpa/inet.h> int main() { int s, value; struct sockaddr_in6 addr; struct ipv6_mreq m6; s = socket(AF_INET6, SOCK_DGRAM, 0); addr.sin6_family = AF_INET6; addr.sin6_port = htons(5000); inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr); connect(s, (struct sockaddr *)&addr, sizeof(addr)); inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr); m6.ipv6mr_interface = 5; setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6)); value = AF_INET; setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value)); close(s); return 0; } Reported-by: ch3332xr@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-31udp: Improve load balancing for SO_REUSEPORT.Kuniyuki Iwashima
[ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ] Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group. Then reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets have the same score, hence only does the first unconnected socket in udp_hslot always receive all packets sent to unconnected sockets. So, the result of reuseport_select_sock() should be used for load balancing. The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock(). index | connected | reciprocal_scale | result --------------------------------------------- 0 | no | 20% | 40% 1 | no | 20% | 20% 2 | yes | 20% | 0% 3 | no | 20% | 40% 4 | yes | 20% | 0% If most of the sockets are connected, this can be a problem, but it still works better than now. Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-31net: udp: Fix wrong clean up for IS_UDPLITE macroMiaohe Lin
[ Upstream commit b0a422772fec29811e293c7c0e6f991c0fd9241d ] We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is checked. Fixes: b2bf1e2659b1 ("[UDP]: Clean up for IS_UDPLITE macro") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-31ip6_gre: fix null-ptr-deref in ip6gre_init_net()Wei Yongjun
[ Upstream commit 46ef5b89ec0ecf290d74c4aee844f063933c4da4 ] KASAN report null-ptr-deref error when register_netdev() failed: KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7] CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ #12 Call Trace: ip6gre_init_net+0x4ab/0x580 ? ip6gre_tunnel_uninit+0x3f0/0x3f0 ops_init+0xa8/0x3c0 setup_net+0x2de/0x7e0 ? rcu_read_lock_bh_held+0xb0/0xb0 ? ops_init+0x3c0/0x3c0 ? kasan_unpoison_shadow+0x33/0x40 ? __kasan_kmalloc.constprop.0+0xc2/0xd0 copy_net_ns+0x27d/0x530 create_new_namespaces+0x382/0xa30 unshare_nsproxy_namespaces+0xa1/0x1d0 ksys_unshare+0x39c/0x780 ? walk_process_tree+0x2a0/0x2a0 ? trace_hardirqs_on+0x4a/0x1b0 ? _raw_spin_unlock_irq+0x1f/0x30 ? syscall_trace_enter+0x1a7/0x330 ? do_syscall_64+0x1c/0xa0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x56/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ip6gre_tunnel_uninit() has set 'ign->fb_tunnel_dev' to NULL, later access to ign->fb_tunnel_dev cause null-ptr-deref. Fix it by saving 'ign->fb_tunnel_dev' to local variable ndev. Fixes: dafabb6590cb ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22ip: Fix SO_MARK in RST, ACK and ICMP packetsWillem de Bruijn
[ Upstream commit 0da7536fb47f51df89ccfcb1fa09f249d9accec5 ] When no full socket is available, skbs are sent over a per-netns control socket. Its sk_mark is temporarily adjusted to match that of the real (request or timewait) socket or to reflect an incoming skb, so that the outgoing skb inherits this in __ip_make_skb. Introduction of the socket cookie mark field broke this. Now the skb is set through the cookie and cork: <caller> # init sockc.mark from sk_mark or cmsg ip_append_data ip_setup_cork # convert sockc.mark to cork mark ip_push_pending_frames ip_finish_skb __ip_make_skb # set skb->mark to cork mark But I missed these special control sockets. Update all callers of __ip(6)_make_skb that were originally missed. For IPv6, the same two icmp(v6) paths are affected. The third case is not, as commit 92e55f412cff ("tcp: don't annotate mark on control socket from tcp_v6_send_response()") replaced the ctl_sk->sk_mark with passing the mark field directly as a function argument. That commit predates the commit that introduced the bug. Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg") Signed-off-by: Willem de Bruijn <willemb@google.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22ipv6: Fix use of anycast address with loopbackDavid Ahern
[ Upstream commit aea23c323d89836bcdcee67e49def997ffca043b ] Thomas reported a regression with IPv6 and anycast using the following reproducer: echo 1 > /proc/sys/net/ipv6/conf/all/forwarding ip -6 a add fc12::1/16 dev lo sleep 2 echo "pinging lo" ping6 -c 2 fc12:: The conversion of addrconf_f6i_alloc to use ip6_route_info_create missed the use of fib6_is_reject which checks addresses added to the loopback interface and sets the REJECT flag as needed. Update fib6_is_reject for loopback checks to handle RTF_ANYCAST addresses. Fixes: c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create") Reported-by: thomas.gambier@nexedi.com Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22ipv6: fib6_select_path can not use out path for nexthop objectsDavid Ahern
[ Upstream commit 34fe5a1cf95c3f114068fc16d919c9cf4b00e428 ] Brian reported a crash in IPv6 code when using rpfilter with a setup running FRR and external nexthop objects. The root cause of the crash is fib6_select_path setting fib6_nh in the result to NULL because of an improper check for nexthop objects. More specifically, rpfilter invokes ip6_route_lookup with flowi6_oif set causing fib6_select_path to be called with have_oif_match set. fib6_select_path has early check on have_oif_match and jumps to the out label which presumes a builtin fib6_nh. This path is invalid for nexthop objects; for external nexthops fib6_select_path needs to just return if the fib6_nh has already been set in the result otherwise it returns after the call to nexthop_path_fib6_result. Update the check on have_oif_match to not bail on external nexthops. Update selftests for this problem. Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Reported-by: Brian Rak <brak@choopa.com> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()Taehee Yoo
[ Upstream commit dafabb6590cb15f300b77c095d50312e2c7c8e0f ] In the datapath, the ip6gre_tunnel_lookup() is used and it internally uses fallback tunnel device pointer, which is fb_tunnel_dev. This pointer variable should be set to NULL when a fb interface is deleted. But there is no routine to set fb_tunnel_dev pointer to NULL. So, this pointer will be still used after interface is deleted and it eventually results in the use-after-free problem. Test commands: ip netns add A ip netns add B ip link add eth0 type veth peer name eth1 ip link set eth0 netns A ip link set eth1 netns B ip netns exec A ip link set lo up ip netns exec A ip link set eth0 up ip netns exec A ip link add ip6gre1 type ip6gre local fc:0::1 \ remote fc:0::2 ip netns exec A ip -6 a a fc:100::1/64 dev ip6gre1 ip netns exec A ip link set ip6gre1 up ip netns exec A ip -6 a a fc:0::1/64 dev eth0 ip netns exec A ip link set ip6gre0 up ip netns exec B ip link set lo up ip netns exec B ip link set eth1 up ip netns exec B ip link add ip6gre1 type ip6gre local fc:0::2 \ remote fc:0::1 ip netns exec B ip -6 a a fc:100::2/64 dev ip6gre1 ip netns exec B ip link set ip6gre1 up ip netns exec B ip -6 a a fc:0::2/64 dev eth1 ip netns exec B ip link set ip6gre0 up ip netns exec A ping fc:100::2 -s 60000 & ip netns del B Splat looks like: [ 73.087285][ C1] BUG: KASAN: use-after-free in ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre] [ 73.088361][ C1] Read of size 4 at addr ffff888040559218 by task ping/1429 [ 73.089317][ C1] [ 73.089638][ C1] CPU: 1 PID: 1429 Comm: ping Not tainted 5.7.0+ #602 [ 73.090531][ C1] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 73.091725][ C1] Call Trace: [ 73.092160][ C1] <IRQ> [ 73.092556][ C1] dump_stack+0x96/0xdb [ 73.093122][ C1] print_address_description.constprop.6+0x2cc/0x450 [ 73.094016][ C1] ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre] [ 73.094894][ C1] ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre] [ 73.095767][ C1] ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre] [ 73.096619][ C1] kasan_report+0x154/0x190 [ 73.097209][ C1] ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre] [ 73.097989][ C1] ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre] [ 73.098750][ C1] ? gre_del_protocol+0x60/0x60 [gre] [ 73.099500][ C1] gre_rcv+0x1c5/0x1450 [ip6_gre] [ 73.100199][ C1] ? ip6gre_header+0xf00/0xf00 [ip6_gre] [ 73.100985][ C1] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 73.101830][ C1] ? ip6_input_finish+0x5/0xf0 [ 73.102483][ C1] ip6_protocol_deliver_rcu+0xcbb/0x1510 [ 73.103296][ C1] ip6_input_finish+0x5b/0xf0 [ 73.103920][ C1] ip6_input+0xcd/0x2c0 [ 73.104473][ C1] ? ip6_input_finish+0xf0/0xf0 [ 73.105115][ C1] ? rcu_read_lock_held+0x90/0xa0 [ 73.105783][ C1] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 73.106548][ C1] ipv6_rcv+0x1f1/0x300 [ ... ] Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30mld: fix memory leak in ipv6_mc_destroy_dev()Wang Hai
[ Upstream commit ea2fce88d2fd678ed9d45354ff49b73f1d5615dd ] Commit a84d01647989 ("mld: fix memory leak in mld_del_delrec()") fixed the memory leak of MLD, but missing the ipv6_mc_destroy_dev() path, in which mca_sources are leaked after ma_put(). Using ip6_mc_clear_src() to take care of the missing free. BUG: memory leak unreferenced object 0xffff8881113d3180 (size 64): comm "syz-executor071", pid 389, jiffies 4294887985 (age 17.943s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 ff 02 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 ................ backtrace: [<000000002cbc483c>] kmalloc include/linux/slab.h:555 [inline] [<000000002cbc483c>] kzalloc include/linux/slab.h:669 [inline] [<000000002cbc483c>] ip6_mc_add1_src net/ipv6/mcast.c:2237 [inline] [<000000002cbc483c>] ip6_mc_add_src+0x7f5/0xbb0 net/ipv6/mcast.c:2357 [<0000000058b8b1ff>] ip6_mc_source+0xe0c/0x1530 net/ipv6/mcast.c:449 [<000000000bfc4fb5>] do_ipv6_setsockopt.isra.12+0x1b2c/0x3b30 net/ipv6/ipv6_sockglue.c:754 [<00000000e4e7a722>] ipv6_setsockopt+0xda/0x150 net/ipv6/ipv6_sockglue.c:950 [<0000000029260d9a>] rawv6_setsockopt+0x45/0x100 net/ipv6/raw.c:1081 [<000000005c1b46f9>] __sys_setsockopt+0x131/0x210 net/socket.c:2132 [<000000008491f7db>] __do_sys_setsockopt net/socket.c:2148 [inline] [<000000008491f7db>] __se_sys_setsockopt net/socket.c:2145 [inline] [<000000008491f7db>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2145 [<00000000c7bc11c5>] do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295 [<000000005fb7a3f3>] entry_SYSCALL_64_after_hwframe+0x49/0xb3 Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-17ipv6: fix IPV6_ADDRFORM operation logicHangbin Liu
[ Upstream commit 79a1f0ccdbb4ad700590f61b00525b390cb53905 ] Socket option IPV6_ADDRFORM supports UDP/UDPLITE and TCP at present. Previously the checking logic looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol != IPPROTO_TCP) break; After commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation"), TCP was blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; break; else break; Then after commit 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") UDP/UDPLITE were blocked as the logic changed to: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; if (sk->sk_protocol == IPPROTO_TCP) do_some_check; if (sk->sk_protocol != IPPROTO_TCP) break; Fix it by using Eric's code and simply remove the break in TCP check, which looks like: if (sk->sk_protocol == IPPROTO_UDP || sk->sk_protocol == IPPROTO_UDPLITE) do_some_check; else if (sk->sk_protocol == IPPROTO_TCP) do_some_check; else break; Fixes: 82c9ae440857 ("ipv6: fix restrict IPV6_ADDRFORM operation") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03esp6: get the right proto for transport mode in esp6_gso_encapXin Long
commit 3c96ec56828922e3fe5477f75eb3fc02f98f98b5 upstream. For transport mode, when ipv6 nexthdr is set, the packet format might be like: ---------------------------------------------------- | | dest | | | | ESP | ESP | | IP6 hdr| opts.| ESP | TCP | Data | Trailer | ICV | ---------------------------------------------------- What it wants to get for x-proto in esp6_gso_encap() is the proto that will be set in ESP nexthdr. So it should skip all ipv6 nexthdrs and get the real transport protocol. Othersize, the wrong proto number will be set into ESP nexthdr. This patch is to skip all ipv6 nexthdrs by calling ipv6_skip_exthdr() in esp6_gso_encap(). Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03xfrm: remove the xfrm_state_put call becofe going to out_resetXin Long
commit db87668ad1e4917cfe04e217307ba6ed9390716e upstream. This xfrm_state_put call in esp4/6_gro_receive() will cause double put for state, as in out_reset path secpath_reset() will put all states set in skb sec_path. So fix it by simply remove the xfrm_state_put call. Fixes: 6ed69184ed9c ("xfrm: Reset secpath in xfrm failure") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03net: don't return invalid table id error when we fall back to PF_UNSPECSabrina Dubroca
[ Upstream commit 41b4bd986f86331efc599b9a3f5fb86ad92e9af9 ] In case we can't find a ->dumpit callback for the requested (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're in the same situation as if userspace had requested a PF_UNSPEC dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all the registered RTM_GETROUTE handlers. The requested table id may or may not exist for all of those families. commit ae677bbb4441 ("net: Don't return invalid table id error when dumping all families") fixed the problem when userspace explicitly requests a PF_UNSPEC dump, but missed the fallback case. For example, when we pass ipv6.disable=1 to a kernel with CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y, the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in rtnl_dump_all, and listing IPv6 routes will unexpectedly print: # ip -6 r Error: ipv4: MR table does not exist. Dump terminated commit ae677bbb4441 introduced the dump_all_families variable, which gets set when userspace requests a PF_UNSPEC dump. However, we can't simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the fallback case to get dump_all_families == true, because some messages types (for example RTM_GETRULE and RTM_GETNEIGH) only register the PF_UNSPEC handler and use the family to filter in the kernel what is dumped to userspace. We would then export more entries, that userspace would have to filter. iproute does that, but other programs may not. Instead, this patch removes dump_all_families and updates the RTM_GETROUTE handlers to check if the family that is being dumped is their own. When it's not, which covers both the intentional PF_UNSPEC dumps (as dump_all_families did) and the fallback case, ignore the missing table id error. Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"Maciej Żenczykowski
[ Upstream commit 09454fd0a4ce23cb3d8af65066c91a1bf27120dd ] This reverts commit 19bda36c4299ce3d7e5bce10bebe01764a655a6d: | ipv6: add mtu lock check in __ip6_rt_update_pmtu | | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu. | It leaded to that mtu lock doesn't really work when receiving the pkt | of ICMPV6_PKT_TOOBIG. | | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4 | did in __ip_rt_update_pmtu. The above reasoning is incorrect. IPv6 *requires* icmp based pmtu to work. There's already a comment to this effect elsewhere in the kernel: $ git grep -p -B1 -A3 'RTAX_MTU lock' net/ipv6/route.c=4813= static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg) ... /* In IPv6 pmtu discovery is not optional, so that RTAX_MTU lock cannot disable it. We still use this lock to block changes caused by addrconf/ndisc. */ This reverts to the pre-4.9 behaviour. Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Fixes: 19bda36c4299 ("ipv6: add mtu lock check in __ip6_rt_update_pmtu") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20netlabel: cope with NULL catmapPaolo Abeni
[ Upstream commit eead1c2ea2509fd754c6da893a94f0e69e83ebe4 ] The cipso and calipso code can set the MLS_CAT attribute on successful parsing, even if the corresponding catmap has not been allocated, as per current configuration and external input. Later, selinux code tries to access the catmap if the MLS_CAT flag is present via netlbl_catmap_getlong(). That may cause null ptr dereference while processing incoming network traffic. Address the issue setting the MLS_CAT flag only if the catmap is really allocated. Additionally let netlbl_catmap_getlong() cope with NULL catmap. Reported-by: Matthew Sheets <matthew.sheets@gd-ms.com> Fixes: 4b8feff251da ("netlabel: fix the horribly broken catmap functions") Fixes: ceba1832b1b2 ("calipso: Set the calipso socket label to match the secattr.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14ipv6: Use global sernum for dst validation with nexthop objectsDavid Ahern
[ Upstream commit 8f34e53b60b337e559f1ea19e2780ff95ab2fa65 ] Nik reported a bug with pcpu dst cache when nexthop objects are used illustrated by the following: $ ip netns add foo $ ip -netns foo li set lo up $ ip -netns foo addr add 2001:db8:11::1/128 dev lo $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1 $ ip li add veth1 type veth peer name veth2 $ ip li set veth1 up $ ip addr add 2001:db8:10::1/64 dev veth1 $ ip li set dev veth2 netns foo $ ip -netns foo li set veth2 up $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2 $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1 $ ip -6 route add 2001:db8:11::1/128 nhid 100 Create a pcpu entry on cpu 0: $ taskset -a -c 0 ip -6 route get 2001:db8:11::1 Re-add the route entry: $ ip -6 ro del 2001:db8:11::1 $ ip -6 route add 2001:db8:11::1/128 nhid 100 Route get on cpu 0 returns the stale pcpu: $ taskset -a -c 0 ip -6 route get 2001:db8:11::1 RTNETLINK answers: Network is unreachable While cpu 1 works: $ taskset -a -c 1 ip -6 route get 2001:db8:11::1 2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium Conversion of FIB entries to work with external nexthop objects missed an important difference between IPv4 and IPv6 - how dst entries are invalidated when the FIB changes. IPv4 has a per-network namespace generation id (rt_genid) that is bumped on changes to the FIB. Checking if a dst_entry is still valid means comparing rt_genid in the rtable to the current value of rt_genid for the namespace. IPv6 also has a per network namespace counter, fib6_sernum, but the count is saved per fib6_node. With the per-node counter only dst_entries based on fib entries under the node are invalidated when changes are made to the routes - limiting the scope of invalidations. IPv6 uses a reference in the rt6_info, 'from', to track the corresponding fib entry used to create the dst_entry. When validating a dst_entry, the 'from' is used to backtrack to the fib6_node and check the sernum of it to the cookie passed to the dst_check operation. With the inline format (nexthop definition inline with the fib6_info), dst_entries cached in the fib6_nh have a 1:1 correlation between fib entries, nexthop data and dst_entries. With external nexthops, IPv6 looks more like IPv4 which means multiple fib entries across disparate fib6_nodes can all reference the same fib6_nh. That means validation of dst_entries based on external nexthops needs to use the IPv4 format - the per-network namespace counter. Add sernum to rt6_info and set it when creating a pcpu dst entry. Update rt6_get_cookie to return sernum if it is set and update dst_check for IPv6 to look for sernum set and based the check on it if so. Finally, rt6_get_pcpu_route needs to validate the cached entry before returning a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and __mkroute_output for IPv4). This problem only affects routes using the new, external nexthops. Thanks to the kbuild test robot for catching the IS_ENABLED needed around rt_genid_ipv6 before I sent this out. Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects") Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finishDavid Ahern
[ Upstream commit 0c922a4850eba2e668f73a3f1153196e09abb251 ] IPSKB_XFRM_TRANSFORMED and IP6SKB_XFRM_TRANSFORMED are skb flags set by xfrm code to tell other skb handlers that the packet has been passed through the xfrm output functions. Simplify the code and just always set them rather than conditionally based on netfilter enabled thus making the flag available for other users. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29ipv6: fix restrict IPV6_ADDRFORM operationJohn Haxby
[ Upstream commit 82c9ae440857840c56e05d4fb1427ee032531346 ] Commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a problem found by syzbot an unfortunate logic error meant that it also broke IPV6_ADDRFORM. Rearrange the checks so that the earlier test is just one of the series of checks made before moving the socket from IPv6 to IPv4. Fixes: b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") Signed-off-by: John Haxby <john.haxby@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13ipv6: don't auto-add link-local address to lag portsJarod Wilson
[ Upstream commit 744fdc8233f6aa9582ce08a51ca06e59796a3196 ] Bonding slave and team port devices should not have link-local addresses automatically added to them, as it can interfere with openvswitch being able to properly add tc ingress. Basic reproducer, courtesy of Marcelo: $ ip link add name bond0 type bond $ ip link set dev ens2f0np0 master bond0 $ ip link set dev ens2f1np2 master bond0 $ ip link set dev bond0 up $ ip a s 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link valid_lft forever preferred_lft forever (above trimmed to relevant entries, obviously) $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0 net.ipv6.conf.ens2f0np0.addr_gen_mode = 0 $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0 net.ipv6.conf.ens2f1np2.addr_gen_mode = 0 $ ip a l ens2f0np0 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever $ ip a l ens2f1np2 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is this a slave interface?" check added by commit c2edacf80e15, and results in an address getting added, while w/the proposed patch added, no address gets added. This simply adds the same gating check to another code path, and thus should prevent the same devices from erroneously obtaining an ipv6 link-local address. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Reported-by: Moshe Levi <moshele@mellanox.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Marcelo Ricardo Leitner <mleitner@redhat.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01vti6: Fix memory leak of skb if input policy check failsTorsten Hilbrich
commit 2a9de3af21aa8c31cd68b0b39330d69f8c1e59df upstream. The vti6_rcv function performs some tests on the retrieved tunnel including checking the IP protocol, the XFRM input policy, the source and destination address. In all but one places the skb is released in the error case. When the input policy check fails the network packet is leaked. Using the same goto-label discard in this case to fix this problem. Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01vti[6]: fix packet tx through bpf_redirect() in XinY casesNicolas Dichtel
commit f1ed10264ed6b66b9cd5e8461cffce69be482356 upstream. I forgot the 4in6/6in4 cases in my previous patch. Let's fix them. Fixes: 95224166a903 ("vti[6]: fix packet tx through bpf_redirect()") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18ipv6: restrict IPV6_ADDRFORM operationEric Dumazet
commit b6f6118901d1e867ac9177bbff3b00b185bd4fdc upstream. IPV6_ADDRFORM is able to transform IPv6 socket to IPv4 one. While this operation sounds illogical, we have to support it. One of the things it does for TCP socket is to switch sk->sk_prot to tcp_prot. We now have other layers playing with sk->sk_prot, so we should make sure to not interfere with them. This patch makes sure sk_prot is the default pointer for TCP IPv6 socket. syzbot reported : BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD a0113067 P4D a0113067 PUD a8771067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:0x0 Code: Bad RIP value. RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246 RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40 RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5 R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098 R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000 FS: 00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427 __sock_release net/socket.c:605 [inline] sock_close+0xe1/0x260 net/socket.c:1283 __fput+0x2e4/0x740 fs/file_table.c:280 ____fput+0x15/0x20 fs/file_table.c:313 task_work_run+0x176/0x1b0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_usermode_loop arch/x86/entry/common.c:164 [inline] prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195 syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278 do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45c429 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f2ae75dac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: 0000000000000000 RBX: 00007f2ae75db6d4 RCX: 000000000045c429 RDX: 0000000000000001 RSI: 000000000000011a RDI: 0000000000000004 RBP: 000000000076bf20 R08: 0000000000000038 R09: 0000000000000000 R10: 0000000020000180 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000a9d R14: 00000000004ccfb4 R15: 000000000076bf2c Modules linked in: CR2: 0000000000000000 ---[ end trace 82567b5207e87bae ]--- RIP: 0010:0x0 Code: Bad RIP value. RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246 RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40 RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5 R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098 R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000 FS: 00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18net/ipv6: remove the old peer route if change it to a new oneHangbin Liu
[ Upstream commit d0098e4c6b83e502cc1cd96d67ca86bc79a6c559 ] When we modify the peer route and changed it to a new one, we should remove the old route first. Before the fix: + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::2 proto kernel metric 256 pref medium + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::2 proto kernel metric 256 pref medium After the fix: + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::3 proto kernel metric 256 pref medium This patch depend on the previous patch "net/ipv6: need update peer route when modify metric" to update new peer route after delete old one. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18net/ipv6: need update peer route when modify metricHangbin Liu
[ Upstream commit 617940123e0140521f3080d2befc2bf55bcda094 ] When we modify the route metric, the peer address's route need also be updated. Before the fix: + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 60 pref medium 2001:db8::2 proto kernel metric 60 pref medium + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 61 pref medium 2001:db8::2 proto kernel metric 60 pref medium After the fix: + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 61 pref medium 2001:db8::2 proto kernel metric 61 pref medium Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18net/ipv6: use configured metric when add peer routeHangbin Liu
[ Upstream commit 07758eb9ff52794fba15d03aa88d92dbd1b7d125 ] When we add peer address with metric configured, IPv4 could set the dest metric correctly, but IPv6 do not. e.g. ]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20 ]# ip route show dev eth1 192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20 ]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20 ]# ip -6 route show dev eth1 2001:db8::1 proto kernel metric 20 pref medium 2001:db8::2 proto kernel metric 256 pref medium Fix this by using configured metric instead of default one. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interfaceHangbin Liu
[ Upstream commit 60380488e4e0b95e9e82aa68aa9705baa86de84c ] Rafał found an issue that for non-Ethernet interface, if we down and up frequently, the memory will be consumed slowly. The reason is we add allnodes/allrouters addressed in multicast list in ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up() for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb getting bigger and bigger. The call stack looks like: addrconf_notify(NETDEV_REGISTER) ipv6_add_dev ipv6_dev_mc_inc(ff01::1) ipv6_dev_mc_inc(ff02::1) ipv6_dev_mc_inc(ff02::2) addrconf_notify(NETDEV_UP) addrconf_dev_config /* Alas, we support only Ethernet autoconfiguration. */ return; addrconf_notify(NETDEV_DOWN) addrconf_ifdown ipv6_mc_down igmp6_group_dropped(ff02::2) mld_add_delrec(ff02::2) igmp6_group_dropped(ff02::1) igmp6_group_dropped(ff01::1) After investigating, I can't found a rule to disable multicast on non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM, tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up() in inetdev_event(). Even for IPv6, we don't check the dev type and call ipv6_add_dev(), ipv6_dev_mc_inc() after register device. So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for non-Ethernet interface. v2: Also check IFF_MULTICAST flag to make sure the interface supports multicast Reported-by: Rafał Miłecki <zajec5@gmail.com> Tested-by: Rafał Miłecki <zajec5@gmail.com> Fixes: 74235a25c673 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels") Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05ipv6: Fix nlmsg_flags when splitting a multipath routeBenjamin Poirier
[ Upstream commit afecdb376bd81d7e16578f0cfe82a1aec7ae18f3 ] When splitting an RTA_MULTIPATH request into multiple routes and adding the second and later components, we must not simply remove NLM_F_REPLACE but instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink message was malformed. For example, ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \ nexthop via fe80::30:2 dev dummy0 results in the following warnings: [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to what it would get if the multipath route had been added in multiple netlink operations: ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0 Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05ipv6: Fix route replacement with dev-only routeBenjamin Poirier
[ Upstream commit e404b8c7cfb31654c9024d497cec58a501501692 ] After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no longer possible to replace an ECMP-able route by a non ECMP-able route. For example, ip route add 2001:db8::1/128 via fe80::1 dev dummy0 ip route replace 2001:db8::1/128 dev dummy0 does not work as expected. Tweak the replacement logic so that point 3 in the log of the above commit becomes: 3. If the new route is not ECMP-able, and no matching non-ECMP-able route exists, replace matching ECMP-able route (if any) or add the new route. We can now summarize the entire replace semantics to: When doing a replace, prefer replacing a matching route of the same "ECMP-able-ness" as the replace argument. If there is no such candidate, fallback to the first route found. Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11ipv6/addrconf: fix potential NULL deref in inet6_set_link_af()Eric Dumazet
[ Upstream commit db3fa271022dacb9f741b96ea4714461a8911bb9 ] __in6_dev_get(dev) called from inet6_set_link_af() can return NULL. The needed check has been recently removed, let's add it back. While do_setlink() does call validate_linkmsg() : ... err = validate_linkmsg(dev, tb); /* OK at this point */ ... It is possible that the following call happening before the ->set_link_af() removes IPv6 if MTU is less than 1280 : if (tb[IFLA_MTU]) { err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack); if (err < 0) goto errout; status |= DO_SETLINK_MODIFIED; } ... if (tb[IFLA_AF_SPEC]) { ... err = af_ops->set_link_af(dev, af); ->inet6_set_link_af() // CRASH because idev is NULL Please note that IPv4 is immune to the bug since inet_set_link_af() does : struct in_device *in_dev = __in_dev_get_rcu(dev); if (!in_dev) return -EAFNOSUPPORT; This problem has been mentioned in commit cf7afbfeb8ce ("rtnl: make link af-specific updates atomic") changelog : This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7] CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754 rtnl_group_changelink net/core/rtnetlink.c:3103 [inline] __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257 rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4402e9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8 R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70 R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace cfa7664b8fdcdff3 ]--- RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 7dc2bccab0ee ("Validate required parameters in inet6_validate_link_af") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-and-reported-by: syzbot <syzkaller@googlegroups.com> Cc: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-05vti[6]: fix packet tx through bpf_redirect()Nicolas Dichtel
[ Upstream commit 95224166a9032ff5d08fca633d37113078ce7d01 ] With an ebpf program that redirects packets through a vti[6] interface, the packets are dropped because no dst is attached. This could also be reproduced with an AF_PACKET socket, with the following python script (vti1 is an ip_vti interface): import socket send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0) # scapy # p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request') # raw(p) req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00' send_s.sendto(req, ('vti1', 0x800, 0, 0)) Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-29xfrm: support output_mark for offload ESP packetsUlrich Weber
commit 4e4362d2bf2a49ff44dbbc9585207977ca3d71d0 upstream. Commit 9b42c1f179a6 ("xfrm: Extend the output_mark") added output_mark support but missed ESP offload support. xfrm_smark_get() is not called within xfrm_input() for packets coming from esp4_gro_receive() or esp6_gro_receive(). Therefore call xfrm_smark_get() directly within these functions. Fixes: 9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.") Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29net, ip6_tunnel: fix namespaces moveWilliam Dauchy
[ Upstream commit 5311a69aaca30fa849c3cc46fb25f75727fb72d0 ] in the same manner as commit d0f418516022 ("net, ip_tunnel: fix namespaces move"), fix namespace moving as it was broken since commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnel"), but for ipv6 this time; there is no reason to keep it for ip6_tunnel. Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnel") Signed-off-by: William Dauchy <w.dauchy@criteo.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29net: ip6_gre: fix moving ip6gre between namespacesNiko Kortstrom
[ Upstream commit 690afc165bb314354667f67157c1a1aea7dc797a ] Support for moving IPv4 GRE tunnels between namespaces was added in commit b57708add314 ("gre: add x-netns support"). The respective change for IPv6 tunnels, commit 22f08069e8b4 ("ip6gre: add x-netns support") did not drop NETIF_F_NETNS_LOCAL flag so moving them from one netns to another is still denied in IPv6 case. Drop NETIF_F_NETNS_LOCAL flag from ip6gre tunnels to allow moving ip6gre tunnel endpoints between network namespaces. Signed-off-by: Niko Kortstrom <niko.kortstrom@nokia.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29ipv6: sr: remove SKB_GSO_IPXIP6 on End.D* actionsYuki Taguchi
[ Upstream commit 62ebaeaedee7591c257543d040677a60e35c7aec ] After LRO/GRO is applied, SRv6 encapsulated packets have SKB_GSO_IPXIP6 feature flag, and this flag must be removed right after decapulation procedure. Currently, SKB_GSO_IPXIP6 flag is not removed on End.D* actions, which creates inconsistent packet state, that is, a normal TCP/IP packets have the SKB_GSO_IPXIP6 flag. This behavior can cause unexpected fallback to GSO on routing to netdevices that do not support SKB_GSO_IPXIP6. For example, on inter-VRF forwarding, decapsulated packets separated into small packets by GSO because VRF devices do not support TSO for packets with SKB_GSO_IPXIP6 flag, and this degrades forwarding performance. This patch removes encapsulation related GSO flags from the skb right after the End.D* action is applied. Fixes: d7a669dd2f8b ("ipv6: sr: add helper functions for seg6local") Signed-off-by: Yuki Taguchi <tagyounit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04ipv6/addrconf: only check invalid header values when NETLINK_F_STRICT_CHK is setHangbin Liu
[ Upstream commit 2beb6d2901a3f73106485d560c49981144aeacb1 ] In commit 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers") we add strict check for inet6_rtm_getaddr(). But we did the invalid header values check before checking if NETLINK_F_STRICT_CHK is set. This may break backwards compatibility if user already set the ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code. I didn't move the nlmsg_len check because I thought it's a valid check. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04sit: do not confirm neighbor when do pmtu updateHangbin Liu
[ Upstream commit 4d42df46d6372ece4cb4279870b46c2ea7304a47 ] When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end, we should not call dst_confirm_neigh() as there is no two-way communication. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04vti: do not confirm neighbor when do pmtu updateHangbin Liu
[ Upstream commit 8247a79efa2f28b44329f363272550c1738377de ] When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end, we should not call dst_confirm_neigh() as there is no two-way communication. Although vti and vti6 are immune to this problem because they are IFF_NOARP interfaces, as Guillaume pointed. There is still no sense to confirm neighbour here. v5: Update commit description. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04tunnel: do not confirm neighbor when do pmtu updateHangbin Liu
[ Upstream commit 7a1592bcb15d71400a98632727791d1e68ea0ee8 ] When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end, we should not call dst_confirm_neigh() as there is no two-way communication. v5: No Change. v4: Update commit description v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP") Reviewed-by: Guillaume Nault <gnault@redhat.com> Tested-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04ip6_gre: do not confirm neighbor when do pmtu updateHangbin Liu
[ Upstream commit 675d76ad0ad5bf41c9a129772ef0aba8f57ea9a7 ] When we do ipv6 gre pmtu update, we will also do neigh confirm currently. This will cause the neigh cache be refreshed and set to REACHABLE before xmit. But if the remote mac address changed, e.g. device is deleted and recreated, we will not able to notice this and still use the old mac address as the neigh cache is REACHABLE. Fix this by disable neigh confirm when do pmtu update v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04net: add bool confirm_neigh parameter for dst_ops.update_pmtuHangbin Liu
[ Upstream commit bd085ef678b2cc8c38c105673dfe8ff8f5ec0c57 ] The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In IPv6 PMTU update function __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor confirmed time. But for tunnel code, it will call pmtu before xmit, like: - tnl_update_pmtu() - skb_dst_update_pmtu() - ip6_rt_update_pmtu() - __ip6_rt_update_pmtu() - dst_confirm_neigh() If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update neigh cache and ping6 remote will failed. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one on later patches. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: David Miller <davem@davemloft.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-18net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca
[ Upstream commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 ] ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-18net: ipv6: add net argument to ip6_dst_lookup_flowSabrina Dubroca
[ Upstream commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e ] This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow, as some modules currently pass a net argument without a socket to ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument"). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-21net-ipv6: IPV6_TRANSPARENT - check NET_RAW prior to NET_ADMINMaciej Żenczykowski
NET_RAW is less dangerous, so more likely to be available to a process, so check it first to prevent some spurious logging. This matches IP_TRANSPARENT which checks NET_RAW first. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>