summaryrefslogtreecommitdiffstats
path: root/drivers/net/macvtap.c
AgeCommit message (Collapse)Author
2014-11-21macvtap: Fix csum_start when VLAN tags are presentHerbert Xu
commit 3ce9b20f1971690b8b3b620e735ec99431573b39 upstream. When VLAN is in use in macvtap_put_user, we end up setting csum_start to the wrong place. The result is that the whoever ends up doing the checksum setting will corrupt the packet instead of writing the checksum to the expected location, usually this means writing the checksum with an offset of -4. This patch fixes this by adjusting csum_start when VLAN tags are detected. Fixes: f09e2249c4f5 ("macvtap: restore vlan header on user read") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21Revert "drivers/net: Disable UFO through virtio"Greg Kroah-Hartman
This reverts commit 2b52d6c6beda6308ba95024a1eba1dfc9515ba32 which was commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 upstream. Ben writes: Please drop this patch for 3.14 and 3.17. It causes problems for migration of VMs and we're probably going to revert part of this. The following patch ("drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets") might no longer apply, in which case you can drop that as well until we have this sorted out upstream. Cc: Ben Hutchings <ben@decadent.org.uk> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packetsBen Hutchings
[ Upstream commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7 ] UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14drivers/net: Disable UFO through virtioBen Hutchings
[ Upstream commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 ] IPv6 does not allow fragmentation by routers, so there is no fragmentation ID in the fixed header. UFO for IPv6 requires the ID to be passed separately, but there is no provision for this in the virtio net protocol. Until recently our software implementation of UFO/IPv6 generated a new ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet passed through a tap, which is even worse. Unfortunately there is no distinction between UFO/IPv4 and v6 features, so disable UFO on taps and virtio_net completely until we have a proper solution. We cannot depend on VM managers respecting the tap feature flags, so keep accepting UFO packets but log a warning the first time we do this. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15macvtap: Fix race between device delete and open.Vlad Yasevich
[ Upstream commit 40b8fe45d1f094e3babe7b2dc2b71557ab71401d ] In macvtap device delete and open calls can race and this causes a list curruption of the vlan queue_list. The race intself is triggered by the idr accessors that located the vlan device. The device is stored into and removed from the idr under both an rtnl and a mutex. However, when attempting to locate the device in idr, only a mutex is taken. As a result, once cpu perfoming a delete may take an rtnl and wait for the mutex, while another cput doing an open() will take the idr mutex first to fetch the device pointer and later take an rtnl to add a queue for the device which may have just gotten deleted. With this patch, we now hold the rtnl for the duration of the macvtap_open() call thus making sure that open will not race with delete. CC: Michael S. Tsirkin <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-31mactap: Fix checksum errors for non-gso packets in bridge modeVlad Yasevich
[ Upstream commit cbdb04279ccaefcc702c8757757eea8ed76e50cf ] The following is a problematic configuration: VM1: virtio-net device connected to macvtap0@eth0 VM2: e1000 device connect to macvtap1@eth0 The problem is is that virtio-net supports checksum offloading and thus sends the packets to the host with CHECKSUM_PARTIAL set. On the other hand, e1000 does not support any acceleration. For small TCP packets (and this includes the 3-way handshake), e1000 ends up receiving packets that only have a partial checksum set. This causes TCP to fail checksum validation and to drop packets. As a result tcp connections can not be established. Commit 3e4f8b787370978733ca6cae452720a4f0c296b8 macvtap: Perform GSO on forwarding path. fixes this issue for large packets wthat will end up undergoing GSO. This commit adds a check for the non-GSO case and attempts to compute the checksum for partially checksummed packets in the non-GSO case. CC: Daniel Lezcano <daniel.lezcano@free.fr> CC: Patrick McHardy <kaber@trash.net> CC: Andrian Nord <nightnord@gmail.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-16drivers/net: delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. This covers everything under drivers/net except for wireless, which has been submitted separately. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: Change skb_get_rxhash to skb_get_hashTom Herbert
Changing name of function as part of making the hash in skbuff to be generic property, not just for receive path. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12macvlan: Remove custom recieve and forward handlersVlad Yasevich
Since now macvlan and macvtap use the same receive and forward handlers, we can remove them completely and use netif_rx and dev_forward_skb() directly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12macvtap: Add support of packet capture on macvtap device.Vlad Yasevich
Macvtap device currently doesn not allow a user to capture traffic on due to the fact that it steals the packets from the network stack before the skb->dev is set correctly on the receive side, and that use uses macvlan transmit path directly on the send side. As a result, we never get a change to give traffic to the taps while the correct device is set in the skb. This patch makes macvtap device behave almost exaclty like macvlan. On the send side, we switch to using dev_queue_xmit(). On the receive side, to deliver packets to macvtap, we now use netif_rx and dev_forward_skb just like macvlan. The only differnce now is that macvtap has its own rx_handler which is attached to the macvtap netdev. It is here that we now steal the packet and provide it to the socket. As a result, we can now capture traffic on the macvtap device: tcpdump -i macvtap0 It also gives us the abilit to add tc actions to the macvtap device and actually utilize different bandwidth management queues on output. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11macvtap: signal truncated packetsJason Wang
macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the trunca checking work. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10net: Revert macvtap/tun truncation signalling changes.David S. Miller
Jason Wang and Michael S. Tsirkin are still discussing how to properly fix this. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10macvtap: signal truncated packetsJason Wang
macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the truncated checking work. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10Revert "macvtap: remove useless codes in macvtap_aio_read() and ↵David S. Miller
macvtap_recvmsg()" This reverts commit 41e4af69a5984a3193ba3108fb4e067b0e34dc73. MSG_TRUNC handling was broken and is going to be fixed in the 'net' tree, so revert this. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg()Zhi Yong Wu
By checking related codes, it is impossible that ret > len or total_len, so we should remove some useless coeds in both above functions. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that Daniel's direct transmit changes depend upon. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06macvtap: remove unused parameter in macvtap_do_read()Zhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06macvtap: remove the dead branchZhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06macvtap: update file current positionZhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-29macvtap: Do not double-count received packetsVlad Yasevich
Currently macvlan will count received packets after calling each vlans receive handler. Macvtap attempts to count the packet yet again when the user reads the packet from the tap socket. This code doesn't do this consistently either. Remove the counting from macvtap and let only macvlan count received packets. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28macvtap: fix tx_dropped counting errorJason Wang
After commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless tx path), tx stat counter were converted to percpu stat structure. So we need use to this also for tx_dropped in macvtap. Otherwise, the management won't notice the dropping packet in macvtap tx path. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14macvtap: limit head length of skb allocatedJason Wang
We currently use hdr_len as a hint of head length which is advertised by guest. But when guest advertise a very big value, it can lead to an 64K+ allocating of kmalloc() which has a very high possibility of failure when host memory is fragmented or under heavy stress. The huge hdr_len also reduce the effect of zerocopy or even disable if a gso skb is linearized in guest. To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the head, which guarantees an order 0 allocation each time. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20macvtap: Ignore tap features when VNET_HDR is offVlad Yasevich
When the user turns off VNET_HDR support on the macvtap device, there is no way to provide any offload information to the user. So, it's safer to ignore offload setting then depend on the user setting them correctly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20macvtap: Correctly set tap features when IFF_VNET_HDR is disabled.Vlad Yasevich
When the user turns off IFF_VNET_HDR flag, attempts to change offload features via TUNSETOFFLOAD do not work. This could cause GSO packets to be delivered to the user when the user is not prepared to handle them. To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20macvtap: simplify usage of tap_featuresVlad Yasevich
In macvtap, tap_features specific the features of that the user has specified via ioctl(). If we treat macvtap as a macvlan+tap then we could all the tap a pseudo-device and give it other features like SG and GSO. Then we can stop using the features of lower device (macvlan) when forwarding the traffic the tap. This solves the issue of possible checksum offload mismatch between tap feature and macvlan features. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-11macvtap: fix two racesEric Dumazet
Since commit ac4e4af1e59e1 ("macvtap: Consistently use rcu functions"), Thomas gets two different warnings : BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45891/45892 caller is macvtap_do_read+0x45c/0x600 [macvtap] CPU: 1 PID: 45892 Comm: vhost-45891 Not tainted 3.11.0-bisecttest #13 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bcec>] dump_stack+0x74/0xd8 [<0000000000481066>] debug_smp_processor_id+0xf6/0x114 [<000003ff802e9a18>] macvtap_do_read+0x45c/0x600 [macvtap] [<000003ff802e9c1c>] macvtap_recvmsg+0x60/0x88 [macvtap] [<000003ff80318c5e>] handle_rx+0x5b2/0x800 [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<00000000006934a6>] kernel_thread_starter+0x6/0xc [<00000000006934a0>] kernel_thread_starter+0x0/0xc And BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45897/45898 caller is macvlan_start_xmit+0x10a/0x1b4 [macvlan] CPU: 1 PID: 45898 Comm: vhost-45897 Not tainted 3.11.0-bisecttest #16 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bdb8>] dump_stack+0x74/0xd4 [<0000000000481132>] debug_smp_processor_id+0xf6/0x114 [<000003ff802b72ca>] macvlan_start_xmit+0x10a/0x1b4 [macvlan] [<000003ff802ea69a>] macvtap_get_user+0x982/0xbc4 [macvtap] [<000003ff802ea92a>] macvtap_sendmsg+0x4e/0x60 [macvtap] [<000003ff8031947c>] handle_tx+0x494/0x5ec [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<000000000069356e>] kernel_thread_starter+0x6/0xc [<0000000000693568>] kernel_thread_starter+0x0/0xc 2 locks held by vhost-45897/45898: #0: (&vq->mutex){+.+.+.}, at: [<000003ff8031903c>] handle_tx+0x54/0x5ec [vhost_net] #1: (rcu_read_lock){.+.+..}, at: [<000003ff802ea53c>] macvtap_get_user+0x824/0xbc4 [macvtap] In the first case, macvtap_put_user() calls macvlan_count_rx() in a preempt-able context, and this is not allowed. In the second case, macvtap_get_user() calls macvlan_start_xmit() with BH enabled, and this is not allowed. Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com> Bisected-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10net: attempt high order allocations in sock_alloc_send_pskb()Eric Dumazet
Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: move zerocopy_sg_from_iovec() to net/core/datagram.cJason Wang
To let it be reused and reduce code duplication. Also document this function. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: move iov_pages() to net/core/iovec.cJason Wang
To let it be reused and reduce code duplication. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-18macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang
We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16macvtap: do not assume 802.1Q when send vlan packetsJason Wang
The hard-coded 8021.q proto will break 802.1ad traffic. So switch to use vlan->proto. Cc: Basil Gor <basil.gor@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16macvtap: fix the missing ret value of TUNSETQUEUEJason Wang
Commit 441ac0fcaadc76ad09771812382345001dd2b813 (macvtap: Convert to using rtnl lock) forget to return what macvtap_ioctl_set_queue() returns to its caller. This may break multiqueue API by always falling through to TUNGETFEATURES. Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10macvtap: correctly linearize skb when zerocopy is usedJason Wang
Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to linearize parts of the skb to let the rest of iov to be fit in the frags, we need count copylen into linear when calling macvtap_alloc_skb() instead of partly counting it into data_len. Since this breaks zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should be zero at beginning. This cause nr_frags to be increased wrongly without setting the correct frags. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Perform GSO on forwarding path.Vlad Yasevich
When macvtap forwards skb to its tap, it needs to check if GSO needs to be performed. This is sometimes necessary when the HW device performed GRO, but the guest reading from the tap does not support it (ex: Windows 7). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Let TUNSETOFFLOAD actually controll offload features.Vlad Yasevich
When the user issues TUNSETOFFLOAD ioctl, macvtap does not do anything other then to verify arguments. This patch adds functionality to allow users to actually control offload features. NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the features can be controlled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Consistently use rcu functionsVlad Yasevich
Currently macvtap uses rcu_bh functions in its user facing fuction macvtap_get_user() and macvtap_put_user(). However, its packet handlers use normal rcu as the rcu_read_lock() is taken in netif_receive_skb(). We can safely discontinue the usage or rcu with bh disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: Convert to using rtnl lockVlad Yasevich
Macvtap uses a private lock to protect the relationship between macvtap_queue and macvlan_dev. The private lock is not needed since the relationship is managed by user via open(), release(), and dellink() calls. dellink() already happens under rtnl, so we can safely convert open() and release(), and use it in ioctl() as well. Suggested by Eric Dumazet. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25macvtap: fix recovery from gup errorsMichael S. Tsirkin
get user pages might fail partially in macvtap zero copy mode. To recover we need to put all pages that we got, but code used a wrong index resulting in double-free errors. Reported-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13macvtap: fix uninitialized return value macvtap_ioctl_set_queue()Jason Wang
Return -EINVAL on illegal flag instead of uninitialized value. This fixes the kbuild test warning. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13macvtap: slient sparse warningsJason Wang
This patch silents the following sparse warnings: drivers/net/macvtap.c:98:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:98:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:98:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:120:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:120:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:120:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:151:22: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:233:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:243:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:247:15: error: incompatible types in comparison expression (different address spaces) CC [M] drivers/net/macvtap.o drivers/net/macvlan.c:232:24: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: enable multiqueue flagJason Wang
To notify the userspace about our capability of multiqueue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: add TUNSETQUEUE ioctlJason Wang
This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or enable a queue of macvtap. This is used to be compatible at API layer of tuntap to simplify the userspace to manage the queues. This is done through introducing a linked list to track all taps while using vlan->taps array to only track active taps. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: eliminate linear searchJason Wang
Linear search were used in both get_slot() and macvtap_get_queue(), this is because: - macvtap didn't reshuffle the array of taps when create or destroy a queue, so when adding a new queue, macvtap must do linear search to find a location for the new queue. This will also complicate the TUNSETQUEUE implementation for multiqueue API. - the queue itself didn't track the queue index, so the we must do a linear search in the array to find the location of a existed queue. The solution is straightforward: reshuffle the array and introduce a queue_index to macvtap_queue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: introduce macvtap_get_vlan()Jason Wang
Factor out the device holding logic to a macvtap_get_vlan(), this will be also used by multiqueue API. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: do not add self to waitqueue if doing a nonblock readJason Wang
There's no need to add self to waitqueue if doing a nonblock read. This could help to avoid the spinlock contention. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-07macvtap: fix a possible race between queue selection and changing queuesJason Wang
Complier may generate codes that re-read the vlan->numvtaps during macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the same time and which can lead unexpected result (e.g. very huge value). We need prevent the compiler from generating such codes by adding an ACCESS_ONCE() to make sure vlan->numvtaps were only read once. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>