aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/hyperv
AgeCommit message (Collapse)Author
2020-09-23hv_netvsc: Remove "unlikely" from netvsc_select_queueHaiyang Zhang
commit 4d820543c54c47a2bd3c95ddbf52f83c89a219a0 upstream. When using vf_ops->ndo_select_queue, the number of queues of VF is usually bigger than the synthetic NIC. This condition may happen often. Remove "unlikely" from the comparison of ndev->real_num_tx_queues. Fixes: b3bf5666a510 ("hv_netvsc: defer queue selection to VF") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-26hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()Haiyang Zhang
[ Upstream commit c3d897e01aef8ddc43149e4d661b86f823e3aae7 ] netvsc_vf_xmit() / dev_queue_xmit() will call VF NIC’s ndo_select_queue or netdev_pick_tx() again. They will use skb_get_rx_queue() to get the queue number, so the “skb->queue_mapping - 1” will be used. This may cause the last queue of VF not been used. Use skb_record_rx_queue() here, so that the skb_get_rx_queue() called later will get the correct queue number, and VF will be able to use all queues. Fixes: b3bf5666a510 ("hv_netvsc: defer queue selection to VF") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21hv_netvsc: do not use VF device if link is downStephen Hemminger
[ Upstream commit 7c9864bbccc23e1812ac82966555d68c13ea4006 ] If the accelerated networking SRIOV VF device has lost carrier use the synthetic network device which is available as backup path. This is a rare case since if VF link goes down, normally the VMBus device will also loose external connectivity as well. But if the communication is between two VM's on the same host the VMBus device will still work. Reported-by: "Shah, Ashish N" <ashish.n.shah@intel.com> Fixes: 0c195567a8f6 ("netvsc: transparent VF management") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-27hv_netvsc: flag software created hash valueStephen Hemminger
[ Upstream commit df9f540ca74297a84bafacfa197e9347b20beea5 ] When the driver needs to create a hash value because it was not done at higher level, then the hash should be marked as a software not hardware hash. Fixes: f72860afa2e3 ("hv_netvsc: Exclude non-TCP port numbers from vRSS hashing") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27netvsc: unshare skb in VF rx handlerStephen Hemminger
[ Upstream commit 996ed04741467f6d1552440c92988b132a9487ec ] The netvsc VF skb handler should make sure that skb is not shared. Similar logic already exists in bonding and team device drivers. This is not an issue in practice because the VF devicex does not send up shared skb's. But the netvsc driver should do the right thing if it did. Fixes: 0c195567a8f6 ("netvsc: transparent VF management") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23hv_netvsc: Fix memory leak when removing rndis deviceMohammed Gamal
[ Upstream commit 536dc5df2808efbefc5acee334d3c4f701790ec0 ] kmemleak detects the following memory leak when hot removing a network device: unreferenced object 0xffff888083f63600 (size 256): comm "kworker/0:1", pid 12, jiffies 4294831717 (age 1113.676s) hex dump (first 32 bytes): 00 40 c7 33 80 88 ff ff 00 00 00 00 10 00 00 00 .@.3............ 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... backtrace: [<00000000d4a8f5be>] rndis_filter_device_add+0x117/0x11c0 [hv_netvsc] [<000000009c02d75b>] netvsc_probe+0x5e7/0xbf0 [hv_netvsc] [<00000000ddafce23>] vmbus_probe+0x74/0x170 [hv_vmbus] [<00000000046e64f1>] really_probe+0x22f/0xb50 [<000000005cc35eb7>] driver_probe_device+0x25e/0x370 [<0000000043c642b2>] bus_for_each_drv+0x11f/0x1b0 [<000000005e3d09f0>] __device_attach+0x1c6/0x2f0 [<00000000a72c362f>] bus_probe_device+0x1a6/0x260 [<0000000008478399>] device_add+0x10a3/0x18e0 [<00000000cf07b48c>] vmbus_device_register+0xe7/0x1e0 [hv_vmbus] [<00000000d46cf032>] vmbus_add_channel_work+0x8ab/0x1770 [hv_vmbus] [<000000002c94bb64>] process_one_work+0x919/0x17d0 [<0000000096de6781>] worker_thread+0x87/0xb40 [<00000000fbe7397e>] kthread+0x333/0x3f0 [<000000004f844269>] ret_from_fork+0x3a/0x50 rndis_filter_device_add() allocates an instance of struct rndis_device which never gets deallocated as rndis_filter_device_remove() sets net_device->extension which points to the rndis_device struct to NULL, leaving the rndis_device dangling. Since net_device->extension is eventually freed in free_netvsc_device(), we refrain from setting it to NULL inside rndis_filter_device_remove() Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12hv_netvsc: Fix unwanted rx_table resetHaiyang Zhang
[ Upstream commit b0689faa8efc5a3391402d7ae93bd373b7248e51 ] In existing code, the receive indirection table, rx_table, is in struct rndis_device, which will be reset when changing MTU, ringparam, etc. User configured receive indirection table values will be lost. To fix this, move rx_table to struct net_device_context, and check netif_is_rxfh_configured(), so rx_table will be set to default only if no user configured value. Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12hv_netvsc: Fix error handling in netvsc_attach()Haiyang Zhang
[ Upstream commit 719b85c336ed35565d0f3982269d6f684087bb00 ] If rndis_filter_open() fails, we need to remove the rndis device created in earlier steps, before returning an error code. Otherwise, the retry of netvsc_attach() from its callers will fail and hang. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-10hv_netvsc: Fix a warning of suspicious RCU usageDexuan Cui
[ Upstream commit 6d0d779dca73cd5acb649c54f81401f93098b298 ] This fixes a warning of "suspicious rcu_dereference_check() usage" when nload runs. Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-17hv_netvsc: Fix unwanted wakeup after tx_disableHaiyang Zhang
[ Upstream commit 1b704c4a1ba95574832e730f23817b651db2aa59 ] After queue stopped, the wakeup mechanism may wake it up again when ring buffer usage is lower than a threshold. This may cause send path panic on NULL pointer when we stopped all tx queues in netvsc_detach and start removing the netvsc device. This patch fix it by adding a tx_disable flag to prevent unwanted queue wakeup. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Reported-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-13hv_netvsc: Fix IP header checksum for coalesced packetsHaiyang Zhang
[ Upstream commit bf48648d650db1146b75b9bd358502431e86cf4f ] Incoming packets may have IP header checksum verified by the host. They may not have IP header checksum computed after coalescing. This patch re-compute the checksum when necessary, otherwise the packets may be dropped, because Linux network stack always checks it. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-05hv_netvsc: Fix ethtool change hash key errorHaiyang Zhang
[ Upstream commit b4a10c750424e01b5e37372fef0a574ebf7b56c3 ] Hyper-V hosts require us to disable RSS before changing RSS key, otherwise the changing request will fail. This patch fixes the coding error. Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table") Reported-by: Wei Hu <weh@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [sl: fix up subject line] Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-18hv_netvsc: fix schedule in RCU contextStephen Hemminger
[ Upstream commit 018349d70f28a78d5343b3660cb66e1667005f8a ] When netvsc device is removed it can call reschedule in RCU context. This happens because canceling the subchannel setup work could (in theory) cause a reschedule when manipulating the timer. To reproduce, run with lockdep enabled kernel and unbind a network device from hv_netvsc (via sysfs). [ 160.682011] WARNING: suspicious RCU usage [ 160.707466] 4.19.0-rc3-uio+ #2 Not tainted [ 160.709937] ----------------------------- [ 160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section! [ 160.723691] [ 160.723691] other info that might help us debug this: [ 160.723691] [ 160.730955] [ 160.730955] rcu_scheduler_active = 2, debug_locks = 1 [ 160.762813] 5 locks held by rebind-eth.sh/1812: [ 160.766851] #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0 [ 160.773416] #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0 [ 160.783766] #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0 [ 160.787465] #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250 [ 160.816987] #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc] [ 160.828629] [ 160.828629] stack backtrace: [ 160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2 [ 160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012 [ 160.832952] Call Trace: [ 160.832952] dump_stack+0x85/0xcb [ 160.832952] ___might_sleep+0x1a3/0x240 [ 160.832952] __flush_work+0x57/0x2e0 [ 160.832952] ? __mutex_lock+0x83/0x990 [ 160.832952] ? __kernfs_remove+0x24f/0x2e0 [ 160.832952] ? __kernfs_remove+0x1b2/0x2e0 [ 160.832952] ? mark_held_locks+0x50/0x80 [ 160.832952] ? get_work_pool+0x90/0x90 [ 160.832952] __cancel_work_timer+0x13c/0x1e0 [ 160.832952] ? netvsc_remove+0x1e/0x250 [hv_netvsc] [ 160.832952] ? __lock_is_held+0x55/0x90 [ 160.832952] netvsc_remove+0x9a/0x250 [hv_netvsc] [ 160.832952] vmbus_remove+0x26/0x30 [ 160.832952] device_release_driver_internal+0x18a/0x250 [ 160.832952] unbind_store+0xb4/0x180 [ 160.832952] kernfs_fop_write+0x113/0x1a0 [ 160.832952] __vfs_write+0x36/0x1a0 [ 160.832952] ? rcu_read_lock_sched_held+0x6b/0x80 [ 160.832952] ? rcu_sync_lockdep_assert+0x2e/0x60 [ 160.832952] ? __sb_start_write+0x141/0x1a0 [ 160.832952] ? vfs_write+0x184/0x1b0 [ 160.832952] vfs_write+0xbe/0x1b0 [ 160.832952] ksys_write+0x55/0xc0 [ 160.832952] do_syscall_64+0x60/0x1b0 [ 160.832952] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 160.832952] RIP: 0033:0x7fe48f4c8154 Resolve this by getting RTNL earlier. This is safe because the subchannel work queue does trylock on RTNL and will detect the race. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26hv/netvsc: Fix NULL dereference at single queue mode fallbackTakashi Iwai
commit b19b46346f483ae055fa027cb2d5c2ca91484b91 upstream. The recent commit 916c5e1413be ("hv/netvsc: fix handling of fallback to single queue mode") tried to fix the fallback behavior to a single queue mode, but it changed the function to return zero incorrectly, while the function should return an object pointer. Eventually this leads to a NULL dereference at the callers that expect non-NULL value. Fix it by returning the proper net_device object. Fixes: 916c5e1413be ("hv/netvsc: fix handling of fallback to single queue mode") Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Alakesh Haloi <alakeshh@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()Dexuan Cui
[ Upstream commit e04e7a7bbd4bbabef4e1a58367e5fc9b2edc3b10 ] This patch fixes the race between netvsc_probe() and rndis_set_subchannel(), which can cause a deadlock. These are the related 3 paths which show the deadlock: path #1: Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus] Call Trace: schedule schedule_preempt_disabled __mutex_lock __device_attach bus_probe_device device_add vmbus_device_register vmbus_onoffer vmbus_onmessage_work process_one_work worker_thread kthread ret_from_fork path #2: schedule schedule_preempt_disabled __mutex_lock netvsc_probe vmbus_probe really_probe __driver_attach bus_for_each_dev driver_attach_async async_run_entry_fn process_one_work worker_thread kthread ret_from_fork path #3: Workqueue: events netvsc_subchan_work [hv_netvsc] Call Trace: schedule rndis_set_subchannel netvsc_subchan_work process_one_work worker_thread kthread ret_from_fork Before path #1 finishes, path #2 can start to run, because just before the "bus_probe_device(dev);" in device_add() in path #1, there is a line "object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can immediately try to load hv_netvsc and hence path #2 can start to run. Next, path #2 offloads the subchannal's initialization to a workqueue, i.e. path #3, so we can end up in a deadlock situation like this: Path #2 gets the device lock, and is trying to get the rtnl lock; Path #3 gets the rtnl lock and is waiting for all the subchannel messages to be processed; Path #1 is trying to get the device lock, but since #2 is not releasing the device lock, path #1 has to sleep; since the VMBus messages are processed one by one, this means the sub-channel messages can't be procedded, so #3 has to sleep with the rtnl lock held, and finally #2 has to sleep... Now all the 3 paths are sleeping and we hit the deadlock. With the patch, we can make sure #2 gets both the device lock and the rtnl lock together, gets its job done, and releases the locks, so #1 and #3 will not be blocked for ever. Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15hv_netvsc: ignore devices that are not PCIStephen Hemminger
[ Upstream commit b93c1b5ac8643cc08bb74fa8ae21d6c63dfcb23d ] Registering another device with same MAC address (such as TAP, VPN or DPDK KNI) will confuse the VF autobinding logic. Restrict the search to only run if the device is known to be a PCI attached VF. Fixes: e8ff40d4bff1 ("hv_netvsc: improve VF device matching") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-24hv/netvsc: fix handling of fallback to single queue modeStephen Hemminger
[ Upstream commit 916c5e1413be058d1c1f6e502db350df890730ce ] The netvsc device may need to fallback to running in single queue mode if host side only wants to support single queue. Recent change for handling mtu broke this in setup logic. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 3ffe64f1a641 ("hv_netvsc: split sub-channel setup into async and sync") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03hv_netvsc: fix network namespace issues with VF supportStephen Hemminger
[ Upstream commit 7bf7bb37f16a80465ee3bd7c6c966f96f5a075a6 ] When finding the parent netvsc device, the search needs to be across all netvsc device instances (independent of network namespace). Find parent device of VF using upper_dev_get routine which searches only adjacent list. Fixes: e8ff40d4bff1 ("hv_netvsc: improve VF device matching") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> netns aware byref Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25hv_netvsc: Fix napi reschedule while receive completion is busyHaiyang Zhang
[ Upstream commit 6b81b193b83e87da1ea13217d684b54fccf8ee8a ] If out ring is full temporarily and receive completion cannot go out, we may still need to reschedule napi if certain conditions are met. Otherwise the napi poll might be stopped forever, and cause network disconnect. Fixes: 7426b1a51803 ("netvsc: optimize receive completions") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-22hv_netvsc: split sub-channel setup into async and syncStephen Hemminger
[ Upstream commit 3ffe64f1a641b80a82d9ef4efa7a05ce69049871 ] When doing device hotplug the sub channel must be async to avoid deadlock issues because device is discovered in softirq context. When doing changes to MTU and number of channels, the setup must be synchronous to avoid races such as when MTU and device settings are done in a single ip command. Reported-by: Thomas Walker <Thomas.Walker@twosigma.com> Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26hv_netvsc: Fix a network regression after ifdown/ifupDexuan Cui
[ Upstream commit 52acf73b6e9a6962045feb2ba5a8921da2201915 ] Recently people reported the NIC stops working after "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not enabled, after the refactoring of the common detach logic: when the NIC has sub-channels, usually we enable all the TX queues after all sub-channels are set up: see rndis_set_subchannel() -> netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where the number of channels doesn't change, we also must make sure the TX queues are enabled. The patch fixes the regression. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30hv_netvsc: enable multicast if necessaryStephen Hemminger
[ Upstream commit f03dbb06dc380274e351ca4b1ee1587ed4529e62 ] My recent change to netvsc drive in how receive flags are handled broke multicast. The Hyper-v/Azure virtual interface there is not a multicast filter list, filtering is only all or none. The driver must enable all multicast if any multicast address is present. Fixes: 009f766ca238 ("hv_netvsc: filter multicast/broadcast") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30hv_netvsc: fix locking during VF setupStephen Hemminger
[ Upstream commit b0dee7910317f41f398838992516af6a3b981d86 ] The dev_uc/mc_sync calls need to have the device address list locked. This was spotted by running with lockdep enabled. Fixes: bee9d41b37ea ("hv_netvsc: propagate rx filters to VF") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30hv_netvsc: fix locking for rx_modeStephen Hemminger
[ Upstream commit 35a57b7fef136fa3d5b735ba773f191b95110fa0 ] The rx_mode operation handler is different than other callbacks in that is not always called with rtnl held. Therefore use RCU to ensure that references are valid. Fixes: bee9d41b37ea ("hv_netvsc: propagate rx filters to VF") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30hv_netvsc: fix filter flagsStephen Hemminger
[ Upstream commit de3d50aadd40bf68614db9fd157b275ce9c2d467 ] The recent change to not always enable all multicast and broadcast was broken; meant to set filter, not change flags. Fixes: 009f766ca238 ("hv_netvsc: filter multicast/broadcast") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30hv_netvsc: propagate rx filters to VFStephen Hemminger
[ Upstream commit bee9d41b37ea6b1f860e5bc0989cf1cf1d7e6ab3 ] The netvsc device should propagate filters to the SR-IOV VF device (if present). The flags also need to be propagated to the VF device as well. This only really matters on local Hyper-V since Azure does not support multiple addresses. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30hv_netvsc: filter multicast/broadcastStephen Hemminger
[ Upstream commit 009f766ca2383d8788acd65c2c36c51bbfb19470 ] The netvsc driver was always enabling all multicast and broadcast even if netdevice flag had not enabled it. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30hv_netvsc: use napi_schedule_irqoffStephen Hemminger
[ Upstream commit 68633edaef655ce94e51088ecef5dd4e1d2f6f34 ] Since the netvsc_channel_cb is already called in interrupt context from vmbus, there is no need to do irqsave/restore. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Fix net device attach on older Windows hostsMohammed Gamal
[ Commit 55be9f25be1ca5bda75c39808fc77e42691bc07f upstream. ] On older windows hosts the net_device instance is returned to the caller of rndis_filter_device_add() without having the presence bit set first. This would cause any subsequent calls to network device operations (e.g. MTU change, channel change) to fail after the device is detached once, returning -ENODEV. Instead of returning the device instabce, we take the exit path where we call netif_device_attach() Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Ensure correct teardown message sequence orderMohammed Gamal
[ Commit a56d99d714665591fed8527b90eef21530ea61e0 upstream. ] Prior to commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") the call sequence in netvsc_device_remove() was as follows (as implemented in netvsc_destroy_buf()): 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message 2- Teardown receive buffer GPADL 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message 4- Teardown send buffer GPADL 5- Close vmbus This didn't work for WS2016 hosts. Commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") rearranged the teardown sequence as follows: 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message 3- Close vmbus 4- Teardown receive buffer GPADL 5- Teardown send buffer GPADL That worked well for WS2016 hosts, but it prevented guests on older hosts from shutting down after changing network settings. Commit 0ef58b0a05c1 ("hv_netvsc: change GPAD teardown order on older versions") ensured the following message sequence for older hosts 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message 3- Teardown receive buffer GPADL 4- Teardown send buffer GPADL 5- Close vmbus However, with this sequence calling `ip link set eth0 mtu 1000` hangs and the process becomes uninterruptible. On futher analysis it turns out that on tearing down the receive buffer GPADL the kernel is waiting indefinitely in vmbus_teardown_gpadl() for a completion to be signaled. Here is a snippet of where this occurs: int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle) { struct vmbus_channel_gpadl_teardown *msg; struct vmbus_channel_msginfo *info; unsigned long flags; int ret; info = kmalloc(sizeof(*info) + sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL); if (!info) return -ENOMEM; init_completion(&info->waitevent); info->waiting_channel = channel; [....] ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_gpadl_teardown), true); if (ret) goto post_msg_err; wait_for_completion(&info->waitevent); [....] } The completion is signaled from vmbus_ongpadl_torndown(), which gets called when the corresponding message is received from the host, which apparently never happens in that case. This patch works around the issue by restoring the first mentioned message sequence for older hosts Fixes: 0ef58b0a05c1 ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()Mohammed Gamal
[ Commit 7992894c305eaf504d005529637ff8283d0a849d upstream. ] Split each of the functions into two for each of send/recv buffers. This will be needed in order to implement a fine-grained messaging sequence to the host so that we accommodate the requirements of different Windows versions Fixes: 0ef58b0a05c12 ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Use Windows version instead of NVSP version on GPAD teardownMohammed Gamal
commit 2afc5d61a7197de25a61f54ea4ecfb4cb62b1d42A upstram When changing network interface settings, Windows guests older than WS2016 can no longer shutdown. This was addressed by commit 0ef58b0a05c12 ("hv_netvsc: change GPAD teardown order on older versions"), however the issue also occurs on WS2012 guests that share NVSP protocol versions with WS2016 guests. Hence we use Windows version directly to differentiate them. Fixes: 0ef58b0a05c12 ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: common detach logicStephen Hemminger
[ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ] Make common function for detaching internals of device during changes to MTU and RSS. Make sure no more packets are transmitted and all packets have been received before doing device teardown. Change the wait logic to be common and use usleep_range(). Changes transmit enabling logic so that transmit queues are disabled during the period when lower device is being changed. And enabled only after sub channels are setup. This avoids issue where it could be that a packet was being sent while subchannel was not initialized. Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: change GPAD teardown order on older versionsStephen Hemminger
[ Commit 0ef58b0a05c127762f975c3dfe8b922e4aa87a29 upstream. ] On older versions of Windows, the host ignores messages after vmbus channel is closed. Workaround this by doing what Windows does and send the teardown before close on older versions of NVSP protocol. Reported-by: Mohammed Gamal <mgamal@redhat.com> Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: use RCU to fix concurrent rx and queue changesStephen Hemminger
[ Commit 02400fcee2542ee334a2394e0d9f6efd969fe782 upstream. ] The receive processing may continue to happen while the internal network device state is in RCU grace period. The internal RNDIS structure is associated with the internal netvsc_device structure; both have the same RCU lifetime. Defer freeing all associated parts until after grace period. Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: disable NAPI before channel closeStephen Hemminger
[ Commit 8348e0460ab1473f06c8b824699dd2eed3c1979d upstream. ] This makes sure that no CPU is still process packets when the channel is closed. Fixes: 76bb5db5c749 ("netvsc: fix use after free on module removal") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: defer queue selection to VFStephen Hemminger
[ Commit b3bf5666a51068ad5ddd89a76ed877101ef3bc16 upstream. ] When VF is used for accelerated networking it will likely have more queues (and different policy) than the synthetic NIC. This patch defers the queue policy to the VF so that all the queues can be used. This impacts workloads like local generate UDP. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: fix race in napi poll when reschedulingStephen Hemminger
[ Commit d64e38ae690e3337db0d38d9b149a193a1646c4b upstream. ] There is a race between napi_reschedule and re-enabling interrupts which could lead to missed host interrrupts. This occurs when interrupts are re-enabled (hv_end_read) and vmbus irq callback (netvsc_channel_cb) has already scheduled NAPI. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: cancel subchannel setup before halting deviceStephen Hemminger
[ Commit a7483ec0267c69b34e818738da60b392623da94b upstream. ] Block setup of multiple channels earlier in the teardown process. This avoids possible races between halt and subchannel initialization. Suggested-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: fix error unwind handling if vmbus_open failsStephen Hemminger
[ Commit fcfb4a00d1e514e8313277a01ef919de1113025b upstream. ] Need to delete NAPI association if vmbus_open fails. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: only wake transmit queue if link is upStephen Hemminger
[ Commit f4950e4586dfc957e0a28226eeb992ddc049b5a2 upstream. ] Don't wake transmit queues if link is not up yet. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: avoid retry on send during shutdownStephen Hemminger
[ Commit 12f69661a49446840d742d8feb593ace022d9f66 upstream. ] Change the initialization order so that the device is ready to transmit (ie connect vsp is completed) before setting the internal reference to the device with RCU. This avoids any races on initialization and prevents retry issues on shutdown. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Use the num_online_cpus() for channel limitHaiyang Zhang
[ Commit 25a39f7f975c3c26a0052fbf9b59201c06744332 upstream. ] Since we no longer localize channel/CPU affiliation within one NUMA node, num_online_cpus() is used as the number of channel cap, instead of the number of processors in a NUMA node. This patch allows a bigger range for tuning the number of channels. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: empty current transmit aggregation if flow blockedStephen Hemminger
[ Commit cfd8afd986cdb59ea9adac873c5082498a1eb7c0 upstream. ] If the transmit queue is known full, then don't keep aggregating data. And the cp_partial flag which indicates that the current aggregation buffer is full can be folded in to avoid more conditionals. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: preserve hw_features on mtu/channels/ringparam changesVitaly Kuznetsov
[ Commit aefd80e874e98a864915df5b7d90824a4340b450 upstream. ] rndis_filter_device_add() is called both from netvsc_probe() when we initially create the device and from set channels/mtu/ringparam routines where we basically remove the device and add it back. hw_features is reset in rndis_filter_device_add() and filled with host data. However, we lose all additional flags which are set outside of the driver, e.g. register_netdevice() adds NETIF_F_SOFT_FEATURES and many others. Unfortunately, calls to rndis_{query_hwcaps(), _set_offload_params()} calls cannot be avoided on every RNDIS reset: host expects us to set required features explicitly. Moreover, in theory hardware capabilities can change and we need to reflect the change in hw_features. Reset net->hw_features bits according to host data in rndis_netdev_set_hwcaps(), clear corresponding feature bits from net->features in case some features went missing (will never happen in real life I guess but let's be consistent). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: netvsc_teardown_gpadl() splitVitaly Kuznetsov
[ Commit 0cf737808ae7cb25e952be619db46b9147a92f46 upstream. ] It was found that in some cases host refuses to teardown GPADL for send/ receive buffers (probably when some work with these buffere is scheduled or ongoing). Change the teardown logic to be: 1) Send NVSP_MSG1_TYPE_REVOKE_* messages 2) Close the channel 3) Teardown GPADLs. This seems to work reliably. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Set tx_table to equal weight after subchannels openHaiyang Zhang
[ Commit a6fb6aa3cfa9047b62653dbcfc9bcde6e2272b41 upstream. ] In some cases, like internal vSwitch, the host doesn't provide send indirection table updates. This patch sets the table to be equal weight after subchannels are all open. Otherwise, all workload will be on one TX channel. As tested, this patch has largely increased the throughput over internal vSwitch. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Add initialization of tx_table in netvsc_device_add()Haiyang Zhang
[ Commit 6b0cbe315868d613123cf387052ccda5f09d49ea upstream. ] tx_table is part of the private data of kernel net_device. It is only zero-ed out when allocating net_device. We may recreate netvsc_device w/o recreating net_device, so the private netdev data, including tx_table, are not zeroed. It may contain channel numbers for the older netvsc_device. This patch adds initialization of tx_table each time we recreate netvsc_device. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Rename tx_send_table to tx_tableHaiyang Zhang
[ Commit 39e91cfbf6f5fb26ba64cc2e8874372baf1671e7 upstream. ] Simplify the variable name: tx_send_table Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25hv_netvsc: Rename ind_table to rx_tableHaiyang Zhang
[ Commit 47371300dfc269dd8d150e5b872bdbbda98ba809 upstream. ] Rename this variable because it is the Receive indirection table. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>