aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/macvlan.c
AgeCommit message (Collapse)Author
2020-09-03macvlan: validate setting of multiple remote source MAC addressesAlvin Šipraga
[ Upstream commit 8b61fba503904acae24aeb2bd5569b4d6544d48f ] Remote source MAC addresses can be set on a 'source mode' macvlan interface via the IFLA_MACVLAN_MACADDR_DATA attribute. This commit tightens the validation of these MAC addresses to match the validation already performed when setting or adding a single MAC address via the IFLA_MACVLAN_MACADDR attribute. iproute2 uses IFLA_MACVLAN_MACADDR_DATA for its 'macvlan macaddr set' command, and IFLA_MACVLAN_MACADDR for its 'macvlan macaddr add' command, which demonstrates the inconsistent behaviour that this commit addresses: # ip link add link eth0 name macvlan0 type macvlan mode source # ip link set link dev macvlan0 type macvlan macaddr add 01:00:00:00:00:00 RTNETLINK answers: Cannot assign requested address # ip link set link dev macvlan0 type macvlan macaddr set 01:00:00:00:00:00 # ip -d link show macvlan0 5: macvlan0@eth0: <BROADCAST,MULTICAST,DYNAMIC,UP,LOWER_UP> mtu 1500 ... link/ether 2e:ac:fd:2d:69:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0 macvlan mode source remotes (1) 01:00:00:00:00:00 numtxqueues 1 ... With this change, the 'set' command will (rightly) fail in the same way as the 'add' command. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22macvlan: Skip loopback packets in RX handlerAlexander Sverdlin
[ Upstream commit 81f3dc9349ce0bf7b8447f147f45e70f0a5b36a6 ] Ignore loopback-originatig packets soon enough and don't try to process L2 header where it doesn't exist. The very similar br_handle_frame() in bridge code performs exactly the same check. This is an example of such ICMPv6 packet: skb len=96 headroom=40 headlen=96 tailroom=56 mac=(40,0) net=(40,40) trans=80 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0) hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24 dev name=etha01.212 feat=0x0x0000000040005000 skb headroom: 00000000: 00 7c 86 52 84 88 ff ff 00 00 00 00 00 00 08 00 skb headroom: 00000010: 45 00 00 9e 5d 5c 40 00 40 11 33 33 00 00 00 01 skb headroom: 00000020: 02 40 43 80 00 00 86 dd skb linear: 00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00 skb linear: 00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00 skb linear: 00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d skb linear: 00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c skb linear: 00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81 skb linear: 00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00 skb tailroom: 00000000: ... skb tailroom: 00000010: ... skb tailroom: 00000020: ... skb tailroom: 00000030: ... Call Trace, how it happens exactly: ... macvlan_handle_frame+0x321/0x425 [macvlan] ? macvlan_forward_source+0x110/0x110 [macvlan] __netif_receive_skb_core+0x545/0xda0 ? enqueue_task_fair+0xe5/0x8e0 ? __netif_receive_skb_one_core+0x36/0x70 __netif_receive_skb_one_core+0x36/0x70 process_backlog+0x97/0x140 net_rx_action+0x1eb/0x350 ? __hrtimer_run_queues+0x136/0x2e0 __do_softirq+0xe3/0x383 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq.part.4+0x4e/0x50 netif_rx_ni+0x60/0xd0 dev_loopback_xmit+0x83/0xf0 ip6_finish_output2+0x575/0x590 [ipv6] ? ip6_cork_release.isra.1+0x64/0x90 [ipv6] ? __ip6_make_skb+0x38d/0x680 [ipv6] ? ip6_output+0x6c/0x140 [ipv6] ip6_output+0x6c/0x140 [ipv6] ip6_send_skb+0x1e/0x60 [ipv6] rawv6_sendmsg+0xc4b/0xe10 [ipv6] ? proc_put_long+0xd0/0xd0 ? rw_copy_check_uvector+0x4e/0x110 ? sock_sendmsg+0x36/0x40 sock_sendmsg+0x36/0x40 ___sys_sendmsg+0x2b6/0x2d0 ? proc_dointvec+0x23/0x30 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6] ? dev_forward_change+0x130/0x130 [ipv6] ? _raw_spin_unlock+0x12/0x30 ? proc_sys_call_handler.isra.14+0x9f/0x110 ? __call_rcu+0x213/0x510 ? get_max_files+0x10/0x10 ? trace_hardirqs_on+0x2c/0xe0 ? __sys_sendmsg+0x63/0xa0 __sys_sendmsg+0x63/0xa0 do_syscall_64+0x6c/0x1e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29macvlan: fix null dereference in macvlan_device_event()Taehee Yoo
[ Upstream commit 4dee15b4fd0d61ec6bbd179238191e959d34cf7a ] In the macvlan_device_event(), the list_first_entry_or_null() is used. This function could return null pointer if there is no node. But, the macvlan module doesn't check the null pointer. So, null-ptr-deref would occur. bond0 | +----+-----+ | | macvlan0 macvlan1 | | dummy0 dummy1 The problem scenario. If dummy1 is removed, 1. ->dellink() of dummy1 is called. 2. NETDEV_UNREGISTER of dummy1 notification is sent to macvlan module. 3. ->dellink() of macvlan1 is called. 4. NETDEV_UNREGISTER of macvlan1 notification is sent to bond module. 5. __bond_release_one() is called and it internally calls dev_set_mac_address(). 6. dev_set_mac_address() calls the ->ndo_set_mac_address() of macvlan1, which is macvlan_set_mac_address(). 7. macvlan_set_mac_address() calls the dev_set_mac_address() with dummy1. 8. NETDEV_CHANGEADDR of dummy1 is sent to macvlan module. 9. In the macvlan_device_event(), it calls list_first_entry_or_null(). At this point, dummy1 and macvlan1 were removed. So, list_first_entry_or_null() will return NULL. Test commands: ip netns add nst ip netns exec nst ip link add bond0 type bond for i in {0..10} do ip netns exec nst ip link add dummy$i type dummy ip netns exec nst ip link add macvlan$i link dummy$i \ type macvlan mode passthru ip netns exec nst ip link set macvlan$i master bond0 done ip netns del nst Splat looks like: [ 40.585687][ T146] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEI [ 40.587249][ T146] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 40.588342][ T146] CPU: 1 PID: 146 Comm: kworker/u8:2 Not tainted 5.7.0-rc1+ #532 [ 40.589299][ T146] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 40.590469][ T146] Workqueue: netns cleanup_net [ 40.591045][ T146] RIP: 0010:macvlan_device_event+0x4e2/0x900 [macvlan] [ 40.591905][ T146] Code: 00 00 00 00 00 fc ff df 80 3c 06 00 0f 85 45 02 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff d2 [ 40.594126][ T146] RSP: 0018:ffff88806116f4a0 EFLAGS: 00010246 [ 40.594783][ T146] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 40.595653][ T146] RDX: 0000000000000000 RSI: ffff88806547ddd8 RDI: ffff8880540f1360 [ 40.596495][ T146] RBP: ffff88804011a808 R08: fffffbfff4fb8421 R09: fffffbfff4fb8421 [ 40.597377][ T146] R10: ffffffffa7dc2107 R11: 0000000000000000 R12: 0000000000000008 [ 40.598186][ T146] R13: ffff88804011a000 R14: ffff8880540f1000 R15: 1ffff1100c22de9a [ 40.599012][ T146] FS: 0000000000000000(0000) GS:ffff888067800000(0000) knlGS:0000000000000000 [ 40.600004][ T146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 40.600665][ T146] CR2: 00005572d3a807b8 CR3: 000000005fcf4003 CR4: 00000000000606e0 [ 40.601485][ T146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 40.602461][ T146] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 40.603443][ T146] Call Trace: [ 40.603871][ T146] ? nf_tables_dump_setelem+0xa0/0xa0 [nf_tables] [ 40.604587][ T146] ? macvlan_uninit+0x100/0x100 [macvlan] [ 40.605212][ T146] ? __module_text_address+0x13/0x140 [ 40.605842][ T146] notifier_call_chain+0x90/0x160 [ 40.606477][ T146] dev_set_mac_address+0x28e/0x3f0 [ 40.607117][ T146] ? netdev_notify_peers+0xc0/0xc0 [ 40.607762][ T146] ? __module_text_address+0x13/0x140 [ 40.608440][ T146] ? notifier_call_chain+0x90/0x160 [ 40.609097][ T146] ? dev_set_mac_address+0x1f0/0x3f0 [ 40.609758][ T146] dev_set_mac_address+0x1f0/0x3f0 [ 40.610402][ T146] ? __local_bh_enable_ip+0xe9/0x1b0 [ 40.611071][ T146] ? bond_hw_addr_flush+0x77/0x100 [bonding] [ 40.611823][ T146] ? netdev_notify_peers+0xc0/0xc0 [ 40.612461][ T146] ? bond_hw_addr_flush+0x77/0x100 [bonding] [ 40.613213][ T146] ? bond_hw_addr_flush+0x77/0x100 [bonding] [ 40.613963][ T146] ? __local_bh_enable_ip+0xe9/0x1b0 [ 40.614631][ T146] ? bond_time_in_interval.isra.31+0x90/0x90 [bonding] [ 40.615484][ T146] ? __bond_release_one+0x9f0/0x12c0 [bonding] [ 40.616230][ T146] __bond_release_one+0x9f0/0x12c0 [bonding] [ 40.616949][ T146] ? bond_enslave+0x47c0/0x47c0 [bonding] [ 40.617642][ T146] ? lock_downgrade+0x730/0x730 [ 40.618218][ T146] ? check_flags.part.42+0x450/0x450 [ 40.618850][ T146] ? __mutex_unlock_slowpath+0xd0/0x670 [ 40.619519][ T146] ? trace_hardirqs_on+0x30/0x180 [ 40.620117][ T146] ? wait_for_completion+0x250/0x250 [ 40.620754][ T146] bond_netdev_event+0x822/0x970 [bonding] [ 40.621460][ T146] ? __module_text_address+0x13/0x140 [ 40.622097][ T146] notifier_call_chain+0x90/0x160 [ 40.622806][ T146] rollback_registered_many+0x660/0xcf0 [ 40.623522][ T146] ? netif_set_real_num_tx_queues+0x780/0x780 [ 40.624290][ T146] ? notifier_call_chain+0x90/0x160 [ 40.624957][ T146] ? netdev_upper_dev_unlink+0x114/0x180 [ 40.625686][ T146] ? __netdev_adjacent_dev_unlink_neighbour+0x30/0x30 [ 40.626421][ T146] ? mutex_is_locked+0x13/0x50 [ 40.627016][ T146] ? unregister_netdevice_queue+0xf2/0x240 [ 40.627663][ T146] unregister_netdevice_many.part.134+0x13/0x1b0 [ 40.628362][ T146] default_device_exit_batch+0x2d9/0x390 [ 40.628987][ T146] ? unregister_netdevice_many+0x40/0x40 [ 40.629615][ T146] ? dev_change_net_namespace+0xcb0/0xcb0 [ 40.630279][ T146] ? prepare_to_wait_exclusive+0x2e0/0x2e0 [ 40.630943][ T146] ? ops_exit_list.isra.9+0x97/0x140 [ 40.631554][ T146] cleanup_net+0x441/0x890 [ ... ] Fixes: e289fd28176b ("macvlan: fix the problem when mac address changes for passthru mode") Reported-by: syzbot+5035b1f9dc7ea4558d5a@syzkaller.appspotmail.com Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18macvlan: add cond_resched() during multicast processingMahesh Bandewar
[ Upstream commit ce9a4186f9ac475c415ffd20348176a4ea366670 ] The Rx bound multicast packets are deferred to a workqueue and macvlan can also suffer from the same attack that was discovered by Syzbot for IPvlan. This solution is not as effective as in IPvlan. IPvlan defers all (Tx and Rx) multicast packet processing to a workqueue while macvlan does this way only for the Rx. This fix should address the Rx codition to certain extent. Tx is still suseptible. Tx multicast processing happens when .ndo_start_xmit is called, hence we cannot add cond_resched(). However, it's not that severe since the user which is generating / flooding will be affected the most. Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23macvlan: use skb_reset_mac_header() in macvlan_queue_xmit()Eric Dumazet
[ Upstream commit 1712b2fff8c682d145c7889d2290696647d82dab ] I missed the fact that macvlan_broadcast() can be used both in RX and TX. skb_eth_hdr() makes only sense in TX paths, so we can not use it blindly in macvlan_broadcast() Fixes: 96cc4b69581d ("macvlan: do not assume mac_header is set in macvlan_broadcast()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jurgen Van Ham <juvanham@gmail.com> Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12macvlan: do not assume mac_header is set in macvlan_broadcast()Eric Dumazet
[ Upstream commit 96cc4b69581db68efc9749ef32e9cf8e0160c509 ] Use of eth_hdr() in tx path is error prone. Many drivers call skb_reset_mac_header() before using it, but others do not. Commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") attempted to fix this generically, but commit d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") brought back the macvlan bug. Lets add a new helper, so that tx paths no longer have to call skb_reset_mac_header() only to get a pointer to skb->data. Hopefully we will be able to revert 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") and save few cycles in transmit fast path. BUG: KASAN: use-after-free in __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] BUG: KASAN: use-after-free in mc_hash drivers/net/macvlan.c:251 [inline] BUG: KASAN: use-after-free in macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 Read of size 4 at addr ffff8880a4932401 by task syz-executor947/9579 CPU: 0 PID: 9579 Comm: syz-executor947 Not tainted 5.5.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:145 __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] mc_hash drivers/net/macvlan.c:251 [inline] macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 macvlan_queue_xmit drivers/net/macvlan.c:520 [inline] macvlan_start_xmit+0x402/0x77f drivers/net/macvlan.c:559 __netdev_start_xmit include/linux/netdevice.h:4447 [inline] netdev_start_xmit include/linux/netdevice.h:4461 [inline] dev_direct_xmit+0x419/0x630 net/core/dev.c:4079 packet_direct_xmit+0x1a9/0x250 net/packet/af_packet.c:240 packet_snd net/packet/af_packet.c:2966 [inline] packet_sendmsg+0x260d/0x6220 net/packet/af_packet.c:2991 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 __sys_sendto+0x262/0x380 net/socket.c:1985 __do_sys_sendto net/socket.c:1997 [inline] __se_sys_sendto net/socket.c:1993 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1993 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x442639 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc13549e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442639 RDX: 000000000000000e RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000403bb0 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc+0x163/0x770 mm/slab.c:3665 kmalloc include/linux/slab.h:561 [inline] tomoyo_realpath_from_path+0xc5/0x660 security/tomoyo/realpath.c:252 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 tomoyo_realpath_from_path+0x1a7/0x660 security/tomoyo/realpath.c:289 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a4932000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1025 bytes inside of 4096-byte region [ffff8880a4932000, ffff8880a4933000) The buggy address belongs to the page: page:ffffea0002924c80 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0 raw: 00fffe0000010200 ffffea0002846208 ffffea00028f3888 ffff8880aa402000 raw: 0000000000000000 ffff8880a4932000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a4932300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880a4932400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880a4932480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: b863ceb7ddce ("[NET]: Add macvlan driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05macvlan: schedule bc_work even if errorMenglong Dong
[ Upstream commit 1d7ea55668878bb350979c377fc72509dd6f5b21 ] While enqueueing a broadcast skb to port->bc_queue, schedule_work() is called to add port->bc_work, which processes the skbs in bc_queue, to "events" work queue. If port->bc_queue is full, the skb will be discarded and schedule_work(&port->bc_work) won't be called. However, if port->bc_queue is full and port->bc_work is not running or pending, port->bc_queue will keep full and schedule_work() won't be called any more, and all broadcast skbs to macvlan will be discarded. This case can happen: macvlan_process_broadcast() is the pending function of port->bc_work, it moves all the skbs in port->bc_queue to the queue "list", and processes the skbs in "list". During this, new skbs will keep being added to port->bc_queue in macvlan_broadcast_enqueue(), and port->bc_queue may already full when macvlan_process_broadcast() return. This may happen, especially when there are a lot of real-time threads and the process is preempted. Fix this by calling schedule_work(&port->bc_work) even if port->bc_work is full in macvlan_broadcast_enqueue(). Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-21macvlan: return correct error valueMatteo Croce
[ Upstream commit 59f997b088d26a774958cb7b17b0763cd82de7ec ] A MAC address must be unique among all the macvlan devices with the same lower device. The only exception is the passthru [sic] mode, which shares the lower device address. When duplicate addresses are detected, EBUSY is returned when bringing the interface up: # ip link add macvlan0 link eth0 type macvlan # read addr </sys/class/net/eth0/address # ip link set macvlan0 address $addr # ip link set macvlan0 up RTNETLINK answers: Device or resource busy Use correct error code which is EADDRINUSE, and do the check also earlier, on address change: # ip link set macvlan0 address $addr RTNETLINK answers: Address already in use Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-07-11macvlan: Change status when lower device goes downTravis Brown
Today macvlan ignores the notification when a lower device goes administratively down, preventing the lack of connectivity from bubbling up. Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN with NO-CARRIER which should be easy to interpret in userspace. 2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 Signed-off-by: Suresh Krishnan <skrishnan@arista.com> Signed-off-by: Travis Brown <travisb@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09net: Add support for subordinate traffic classes to netdev_pick_txAlexander Duyck
This change makes it so that we can support the concept of subordinate device traffic classes to the core networking code. In doing this we can start pulling out the driver specific bits needed to support selecting a queue based on an upper device. The solution at is currently stands is only partially implemented. I have the start of some XPS bits in here, but I would still need to allow for configuration of the XPS maps on the queues reserved for the subordinate devices. For now I am using the reference to the sb_dev XPS map as just a way to skip the lookup of the lower device XPS map for now as that would result in the wrong queue being picked. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25macvlan: Use software path for offloaded local, broadcast, and multicast trafficAlexander Duyck
This change makes it so that we use a software path for packets that are going to be locally switched between two macvlan interfaces on the same device. In addition we resort to software replication of broadcast and multicast packets instead of offloading that to hardware. The general idea is that using the device for east/west traffic local to the system is extremely inefficient. We can only support up to whatever the PCIe limit is for any given device so this caps us at somewhere around 20G for devices supported by ixgbe. This is compounded even further when you take broadcast and multicast into account as a single 10G port can come to a crawl as a packet is replicated up to 60+ times in some cases. In order to get away from that I am implementing changes so that we handle broadcast/multicast replication and east/west local traffic all in software. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25macvlan: Rename fwd_priv to accel_priv and add accessor functionAlexander Duyck
This change renames the fwd_priv member to accel_priv as this more accurately reflects the actual purpose of this value. In addition I am adding an accessor which will allow us to further abstract this in the future if needed. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-11macvlan: filter out unsupported feature flagsShannon Nelson
Adding a macvlan device on top of a lowerdev that supports the xfrm offloads fails with a new regression: # ip link add link ens1f0 mv0 type macvlan RTNETLINK answers: Operation not permitted Tracing down the failure shows that the macvlan device inherits the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags from the lowerdev, but with no dev->xfrmdev_ops API filled in, it doesn't actually support xfrm. When the request is made to add the new macvlan device, the XFRM listener for NETDEV_REGISTER calls xfrm_api_check() which fails the new registration because dev->xfrmdev_ops is NULL. The macvlan creation succeeds when we filter out the ESP feature flags in macvlan_fix_features(), so let's filter them out like we're already filtering out ~NETIF_F_NETNS_LOCAL. When XFRM support is added in the future, we can add the flags into MACVLAN_FEATURES. This same problem could crop up in the future with any other new feature flags, so let's filter out any flags that aren't defined as supported in macvlan. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-23macvlan: fix use-after-free in macvlan_common_newlink()Alexey Kodanev
The following use-after-free was reported by KASan when running LTP macvtap01 test on 4.16-rc2: [10642.528443] BUG: KASAN: use-after-free in macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450 ... [10642.963873] Call Trace: [10642.994352] dump_stack+0x5c/0x7c [10643.035325] print_address_description+0x75/0x290 [10643.092938] kasan_report+0x28d/0x390 [10643.137971] ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.207963] macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.275978] macvtap_newlink+0x171/0x260 [macvtap] [10643.334532] rtnl_newlink+0xd4f/0x1300 ... [10646.256176] Allocated by task 18450: [10646.299964] kasan_kmalloc+0xa6/0xd0 [10646.343746] kmem_cache_alloc_trace+0xf1/0x210 [10646.397826] macvlan_common_newlink+0x6de/0x14a0 [macvlan] [10646.464386] macvtap_newlink+0x171/0x260 [macvtap] [10646.522728] rtnl_newlink+0xd4f/0x1300 ... [10647.022028] Freed by task 18450: [10647.061549] __kasan_slab_free+0x138/0x180 [10647.111468] kfree+0x9e/0x1c0 [10647.147869] macvlan_port_destroy+0x3db/0x650 [macvlan] [10647.211411] rollback_registered_many+0x5b9/0xb10 [10647.268715] rollback_registered+0xd9/0x190 [10647.319675] register_netdevice+0x8eb/0xc70 [10647.370635] macvlan_common_newlink+0xe58/0x14a0 [macvlan] [10647.437195] macvtap_newlink+0x171/0x260 [macvtap] Commit d02fd6e7d293 ("macvlan: Fix one possible double free") handles the case when register_netdevice() invokes ndo_uninit() on error and as a result free the port. But 'macvlan_port_get_rtnl(dev))' check (returns dev->rx_handler_data), which was added by this commit in order to prevent double free, is not quite correct: * for macvlan it always returns NULL because 'lowerdev' is the one that was used to register rx handler (port) in macvlan_port_create() as well as to unregister it in macvlan_port_destroy(). * for macvtap it always returns a valid pointer because macvtap registers its own rx handler before macvlan_common_newlink(). Fixes: d02fd6e7d293 ("macvlan: Fix one possible double free") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02macvlan: Fix one possible double freeGao Feng
Because the macvlan_uninit would free the macvlan port, so there is one double free case in macvlan_common_newlink. When the macvlan port is just created, then register_netdevice or netdev_upper_dev_link failed and they would invoke macvlan_uninit. Then it would reach the macvlan_port_destroy which triggers the double free. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19macvlan/macvtap: Add support for L2 forwarding offloads with macvtapAlexander Duyck
This patch reverts earlier commit b13ba1b83f52 ("macvlan: forbid L2 fowarding offload for macvtap"). The reason for reverting this is because the original patch no longer fixes what it previously did as the underlying structure has changed for macvtap. Specifically macvtap originally pulled packets directly off of the lowerdev. However in commit 6acf54f1cf0a ("macvtap: Add support of packet capture on macvtap device.") that code was changed and instead macvtap would listen directly on the macvtap device itself instead of the lower device. As such, the L2 forwarding offload should now be able to provide a performance advantage of skipping the checks on the lower dev while not introducing any sort of regression. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14macvlan: Only update pkt_type if destination MAC address matchesAlexander Duyck
This patch updates the pkt_type to PACKET_HOST only if the destination MAC address matches on the on the source based macvlan. It didn't make sense to be updating broadcast, multicast, and non-local destined frames with PACKET_HOST. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14macvlan: Only deliver one copy of the frame to the macvlan interfaceAlexander Duyck
This patch intoduces a slight adjustment for macvlan to address the fact that in source mode I was seeing two copies of any packet addressed to the macvlan interface being delivered where there should have been only one. The issue appears to be that one copy was delivered based on the source MAC address and then the second copy was being delivered based on the destination MAC address. To fix it I am just treating a unicast address match as though it is not a match since source based macvlan isn't supposed to be matching based on the destination MAC anyway. Fixes: 79cf79abce71 ("macvlan: add source mode") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04net: Add extack to upper device linkingDavid Ahern
Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20macvlan: code refine to check data before usingZhang Shengju
This patch checks data first at one place, return if it's null. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18macvlan: add offload features for encapsulationDimitris Michailidis
Currently macvlan devices do not set their hw_enc_features making encapsulated Tx packets resort to SW fallbacks. Add encapsulation GSO offloads to ->features as is done for the other GSOs and set ->hw_enc_features. Signed-off-by: Dimitris Michailidis <dmichail@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17macvlan/macvtap: Remove NETIF_F_UFO advertisement.David S. Miller
It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.validateMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.changelinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: Let passthru macvlan correctly restore lower mac addressVlad Yasevich
Passthru macvlans directly change the mac address of the lower level device. That's OK, but after the macvlan is deleted, the lower device is left with changed address and one needs to reboot to bring back the origina HW addresses. This scenario is actually quite common with passthru macvtap devices. This patch attempts to solve this, by storing the mac address of the lower device in macvlan_port structure and keeping track of it through the changes. After this patch, any changes to the lower device mac address done trough the macvlan device, will be reverted back. Any changs done directly to the lower device mac address will be kept. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: convert port passthru to flags.Vlad Yasevich
Convert the port passthru boolean into flags with accesor functions. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: Fix passthru macvlan mac address inheritanceVlad Yasevich
When a lower device of the passthru macvlan changes it's address, passthru macvlan is supposed to change it's own address as well. However, that doesn't happen correctly because the check in macvlan_addr_busy() will catch the fact that the lower level (port) mac address is the same as the address we are trying to assign to the macvlan, and return an error. As a reasult, the address of the passthru macvlan device is never changed. The same thing happens when the user attempts to change the mac address of the passthru macvlan. The simple solution appers to be to not check against the lower device in case of passthru macvlan device, since the 2 addresses are _supposed_ to be the same. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22macvlan: Do not return error when setting the same mac addressVlad Yasevich
The user currently gets an EBUSY error when attempting to set the mac address on a macvlan device to the same value. This should really be a no-op as nothing changes. Catch the condition and return early. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14macvlan: propagate the mac address change status for lowerdevZhang Shengju
The macvlan dev should propagate the return value of mac address change for lower device in the passthru mode, instead of always return 0. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07net: Fix inconsistent teardown and release of private netdev state.David S. Miller
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15macvlan: Fix performance issues with vlan tagged packetsVlad Yasevich
Macvlan always turns on offload features that have sofware fallback (NETIF_GSO_SOFTWARE). This allows much higher guest-guest communications over macvtap. However, macvtap does not turn on these features for vlan tagged traffic. As a result, depending on the HW that mactap is configured on, the performance of guest-guest communication over a vlan is very inconsistent. If the HW supports TSO/UFO over vlans, then the performance will be fine. If not, the the performance will suffer greatly since the VM may continue using TSO/UFO, and will force the host segment the traffic and possibly overlow the macvtap queue. This patch adds the always on offloads to vlan_features. This makes sure that any vlan tagged traffic between 2 guest will not be segmented needlessly. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25macvlan: Fix device ref leak when purging bc_queueHerbert Xu
When a parent macvlan device is destroyed we end up purging its broadcast queue without dropping the device reference count on the packet source device. This causes the source device to linger. This patch drops that reference count. Fixes: 260916dfb48c ("macvlan: Fix potential use-after free for...") Reported-by: Joe Ghalam <Joe.Ghalam@dell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11tap: Abstract type of virtual interface from tap implementationSainath Grandhi
macvlan object is re-structured to hold tap related elements in a separate entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with idr and fetched again on tap_open. Few of the tap functions are modified to accepted tap_dev as argument. tap_dev object includes callbacks to be used by underlying virtual interface to take care of tx and rx accounting. Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20macvlan: use netdev_is_rx_handler_busy instead of checking specific typeMahesh Bandewar
netdev_is_rx_handler_busy() check is a superset of netif_is_ipvlan_port() check and hence should be preferred. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08net: make ndo_get_stats64 a void functionstephen hemminger
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07driver: macvlan: Remove the rcu member of macvlan_portGao Feng
When free macvlan_port in macvlan_port_destroy, it is safe to free directly because netdev_rx_handler_unregister could enforce one grace period. So it is unnecessary to use kfree_rcu for macvlan_port. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23driver: macvlan: Check if need rollback multicast setting in macvlan_openGao Feng
When dev_set_promiscuity failed in macvlan_open, it always invokes dev_set_allmulti without checking if necessary. Now check the IFF_ALLMULTI flag firstly before rollback the multicast setting in the error handler. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21driver: macvlan: Remove duplicated IFF_UP condition check in ↵Gao Feng
macvlan_forward_source The function macvlan_forward_source_one has already checked the flag IFF_UP, so needn't check it outside in macvlan_forward_source too. Signed-off-by: Gao Feng <gfree.wind@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of bug fixes in 'net' overlapping other changes in 'net-next-. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-14driver: macvlan: Replace integer number with bool valueGao Feng
The return value of function macvlan_addr_busy is used as bool value, so use bool value instead of integer number "1" and "0". Signed-off-by: Gao Feng <gfree.wind@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.Gao Feng
When there is no existing macvlan port in lowdev, one new macvlan port would be created. But it doesn't be destoried when something failed later. It casues some memleak. Now add one flag to indicate if new macvlan port is created. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in core net infraJarod Wilson
geneve: - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu - This one isn't quite as straight-forward as others, could use some closer inspection and testing macvlan: - set min/max_mtu tun: - set min/max_mtu, remove tun_net_change_mtu vxlan: - Merge __vxlan_change_mtu back into vxlan_change_mtu - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function - This one is also not as straight-forward and could use closer inspection and testing from vxlan folks bridge: - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function openvswitch: - set min/max_mtu, remove internal_dev_change_mtu - note: max_mtu wasn't checked previously, it's been set to 65535, which is the largest possible size supported sch_teql: - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535) macsec: - min_mtu = 0, max_mtu = 65535 macvlan: - min_mtu = 0, max_mtu = 65535 ntb_netdev: - min_mtu = 0, max_mtu = 65535 veth: - min_mtu = 68, max_mtu = 65535 8021q: - min_mtu = 0, max_mtu = 65535 CC: netdev@vger.kernel.org CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Tom Herbert <tom@herbertland.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Paolo Abeni <pabeni@redhat.com> CC: Jiri Benc <jbenc@redhat.com> CC: WANG Cong <xiyou.wangcong@gmail.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Pravin B Shelar <pshelar@ovn.org> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Patrick McHardy <kaber@trash.net> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Pravin Shelar <pshelar@nicira.com> CC: Maxim Krasnyansky <maxk@qti.qualcomm.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13net: remove type_check from dev_get_nest_level()Sabrina Dubroca
The idea for type_check in dev_get_nest_level() was to count the number of nested devices of the same type (currently, only macvlan or vlan devices). This prevented the false positive lockdep warning on configurations such as: eth0 <--- macvlan0 <--- vlan0 <--- macvlan1 However, this doesn't prevent a warning on a configuration such as: eth0 <--- macvlan0 <--- vlan0 eth1 <--- vlan1 <--- macvlan1 In this case, all the locks end up with a nesting subclass of 1, so lockdep thinks that there is still a deadlock: - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then take (vlan_netdev_xmit_lock_key, 1) - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then take (macvlan_netdev_addr_lock_key, 1) By removing the linktype check in dev_get_nest_level() and always incrementing the nesting depth, lockdep considers this configuration valid. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net: macvlan: call netdev_lockdep_set_classes()Eric Dumazet
In case a qdisc is used on a macvlan device, we need to use different lockdep classes to avoid false positives. Use the new netdev_lockdep_set_classes() generic helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-01macvlan: Avoid unnecessary multicast cloningHerbert Xu
Currently we always queue a multicast packet for further processing, even if none of the macvlan devices are subscribed to the address. This patch optimises this by adding a global multicast filter for a macvlan_port. Note that this patch doesn't handle the broadcast addresses of the individual macvlan devices correctly, if they are not all identical to vlan->lowerdev. However, this is already broken because there is no mechanism in place to update the individual multicast filters when you change the broadcast address. If someone cares enough they should fix this by collecting all broadcast addresses for a macvlan as we do for multicast and unicast. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-01macvlan: Fix potential use-after free for broadcastsHerbert Xu
When we postpone a broadcast packet we save the source port in the skb if it is local. However, the source port can disappear before we get a chance to process the packet. This patch fixes this by holding a ref count on the netdev. It also delays the skb->cb modification until after we allocate the new skb as you should not modify shared skbs. Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>