aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/intel/i40evf/i40evf_main.c
AgeCommit message (Collapse)Author
2021-11-26iavf: Fix for the false positive ASQ/ARQ errors while issuing VF resetSurabhi Boob
[ Upstream commit 321421b57a12e933f92b228e0e6d0b2c6541f41d ] While issuing VF Reset from the guest OS, the VF driver prints logs about critical / Overflow error detection. This is not an actual error since the VF_MBX_ARQLEN register is set to all FF's for a short period of time and the VF would catch the bits set if it was reading the register during that spike of time. This patch introduces an additional check to ignore this condition since the VF is in reset. Fixes: 19b73d8efaa4 ("i40evf: Add additional check for reset") Signed-off-by: Surabhi Boob <surabhi.boob@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-28iavf: Fix an error handling path in 'iavf_probe()'Christophe JAILLET
[ Upstream commit af30cbd2f4d6d66a9b6094e0aa32420bc8b20e08 ] If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function. Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-04-26i40evf: Don't schedule reset_task when device is being removedAvinash Dayanand
[ Upstream commit 06aa040f039404a0039a5158cd12f41187487a1f ] When a host disables and enables a PF device, all the associated VFs are removed and added back in. It also generates a PFR which in turn resets all the connected VFs. This behaviour is different from that of Linux guest on Linux host. Hence we end up in a situation where there's a PFR and device removal at the same time. And watchdog doesn't have a clue about this and schedules a reset_task. This patch adds code to send signal to reset_task that the device is currently being removed. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-12i40evf: don't rely on netif_running() outside rtnl_lock()Jacob Keller
[ Upstream commit 44b034b406211fc103159f82b9e601e05675c739 ] In i40evf_reset_task we use netif_running() to determine whether or not the device is currently up. This allows us to properly free queue memory and shut down things before we request the hardware reset. It turns out that we cannot be guaranteed of netif_running() returning false until the device is fully up, as the kernel core code sets __LINK_STATE_START prior to calling .ndo_open. Since we're not holding the rtnl_lock(), it's possible that the driver's i40evf_open handler function is currently being called while we're resetting. We can't simply hold the rtnl_lock() while checking netif_running() as this could cause a deadlock with the i40evf_open() function. Additionally, we can't avoid the deadlock by holding the rtnl_lock() over the whole reset path, as this essentially serializes all resets, and can cause massive delays if we have multiple VFs on a system. Instead, lets just check our own internal state __I40EVF_RUNNING state field. This allows us to ensure that the state is correct and is only set after we've finished bringing the device up. Without this change we might free data structures about device queues and other memory before they've been fully allocated. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25i40e/i40evf: spread CPU affinity hints across online CPUs onlyJacob Keller
[ Upstream commit be664cbefc50977aaefc868ba6a1109ec9b7449d ] Currently, when setting up the IRQ for a q_vector, we set an affinity hint based on the v_idx of that q_vector. Meaning a loop iterates on v_idx, which is an incremental value, and the cpumask is created based on this value. This is a problem in systems with multiple logical CPUs per core (like in simultaneous multithreading (SMT) scenarios). If we disable some logical CPUs, by turning SMT off for example, we will end up with a sparse cpu_online_mask, i.e., only the first CPU in a core is online, and incremental filling in q_vector cpumask might lead to multiple offline CPUs being assigned to q_vectors. Example: if we have a system with 8 cores each one containing 8 logical CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is disabled, only the 1st CPU in each core remains online, so the cpu_online_mask in this case would have only 8 bits set, in a sparse way. In general case, when SMT is off the cpu_online_mask has only C bits set: 0, 1*N, 2*N, ..., C*(N-1) where C == # of cores; N == # of logical CPUs per core. In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set. Instead, we should only assign hints for CPUs which are online. Even better, the kernel already provides a function, cpumask_local_spread() which takes an index and returns a CPU, spreading the interrupts across local NUMA nodes first, and then remote ones if necessary. Since we generally have a 1:1 mapping between vectors and CPUs, there is no real advantage to spreading vectors to local CPUs first. In order to avoid mismatch of the default XPS hints, we'll pass -1 so that it spreads across all CPUs without regard to the node locality. Note that we don't need to change the q_vector->affinity_mask as this is initialized to cpu_possible_mask, until an actual affinity is set and then notified back to us. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-27i40e: initialize our affinity_mask based on cpu_possible_maskJacob Keller
On older kernels a call to irq_set_affinity_hint does not guarantee that the IRQ affinity will be set. If nothing else on the system sets the IRQ affinity this can result in a bug in the i40e_napi_poll() routine where we notice that our interrupt fired on the "wrong" CPU according to our internal affinity_mask variable. This results in a bug where we continuously tell NAPI to stop polling to move the interrupt to a new CPU, but the CPU never changes because our affinity mask does not match the actual mask setup for the IRQ. The root problem is a mismatched affinity mask value. So lets initialize the value to cpu_possible_mask instead. This ensures that prior to the first time we get an IRQ affinity notification we'll have the mask set to include every possible CPU. We use cpu_possible_mask instead of cpu_online_mask since the former is almost certainly never going to change, while the later might change after we've made a copy. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27i40e/i40evf: support for VF VLAN tag stripping controlMariusz Stachura
This patch gives VF capability to control VLAN tag stripping via ethtool. As rx-vlan-offload was fixed before, now the VF is able to change it using "ethtool --offload <IF> rxvlan on/off" settings. Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27i40evf: fix possible snprintf truncation of q_vector->nameJacob Keller
The q_vector names are based on the interface name with a driver prefix, the type of q_vector setup, and the queue number. We previously set the size of this variable to IFNAMSIZ + 9, which is incorrect, because we actually include a minimum of 14 characters extra beyond the interface name size. New versions of GCC since 7 include a new warning that detects this possible truncation and complains. We can fix this by increasing the size in case our interface name is too large to avoid truncation. We don't need to go beyond 14 because the compiler is smart enough to realize our values can never exceed size of 1. We do go up to 15 here because possible future changes may increase the number of queues beyond one digit. While we are here, also change some variables to be unsigned (since they are never negative) and stop using an extra unnecessary %s format specifier. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25i40e: use cpumask_copy instead of direct assignmentJacob Keller
According to the header file cpumask.h, we shouldn't be directly copying a cpumask_t, since its a bitmap and might not be copied correctly. Lets use the provided cpumask_copy() function instead. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25i40evf: use netdev variable in reset taskAlan Brady
If we're going to bother initializing a variable to reference it we might as well use it. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25i40e/i40evf: rename vf_offload_flags to vf_cap_flags in struct ↵Stefan Assmann
virtchnl_vf_resource The current name of vf_offload_flags indicates that the bitmap is limited to offload related features. Make this more generic by renaming it to vf_cap_flags, which allows for other capabilities besides offloading to be added. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25i40e: separate hw_features from runtime changing flagsJacob Keller
The number of flags found in pf->flags has grown quite large, and there are a lot of different types of flags. Most of the flags are simply hardware features which are enabled on some firmware or some MAC types. Other flags are dynamic run-time flags which enable or disable certain features of the driver. Separate these two types of flags into pf->hw_features and pf->flags. The hw_features list will contain a set of features which are enabled at init time. This will not contain toggles or otherwise dynamically changing features. These flags should not need atomic protections, as they will be set once during init and then be essentially read only. Everything else will remain in the flags variable. These flags may be modified at any time during run time. A future patch may wish to convert these flags into set_bit/clear_bit/test_bit or similar approach to ensure atomic correctness. The I40E_FLAG_MFP_ENABLED flag may be a good fit for hw_features but currently is used by ethtool in the private flags settings, and thus has been left as part of flags. Additionally, I40E_FLAG_DCB_CAPABLE may be a good fit for the hw_features but this patch has not tried to untangle it yet. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25i40evf: prevent VF close returning before state transitions to DOWNSudheer Mogilappagari
Currently i40evf_close() can return before state transitions to __I40EVF_DOWN because of the latency involved in processing and receiving response from PF driver and scheduling of VF watchdog_task. Due to this inconsistency an immediate call to i40evf_open() fails because state is still DOWN_PENDING. When a VF interface is in up state and we try to add it as slave, The bonding driver calls dev_close() and dev_open() in short duration resulting in dev_open returning error. The ifenslave command needs to be run again for dev_open to succeed. This fix ensures that watchdog timer is scheduled immediately after admin queue operations are scheduled in i40evf_down(). In addition a wait condition is added at the end of i40evf_close so that function wont return when state is still DOWN_PENDING. The timeout value is chosen after some profiling and includes some buffer. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25i40e/i40evf: adjust packet size to account for double VLANsMitch Williams
Now that the kernel supports double VLAN tags, we should at least play nice. Adjust the max packet size to account for two VLAN tags, not just one. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-07-26i40evf: Use le32_to_cpu before evaluating HW desc fieldsTushar Dave
i40e hardware descriptor fields are in little-endian format. Driver must use le32_to_cpu while evaluating these fields otherwise on big-endian arch we end up evaluating incorrect values, cause errors like: i40evf 0000:03:0a.0: Expected response 24 from PF, received 402653184 i40evf 0000:03:0a.1: Expected response 7 from PF, received 117440512 Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20i40evf: assign num_active_queues inside i40evf_alloc_queuesJacob Keller
The variable num_active_queues represents the number of active queues we have for the device. We assign this pretty early in i40evf_init_subtask. Several code locations are written with loops over the tx_rings and rx_rings structures, which don't get allocated until i40evf_alloc_queues, and which get freed by i40evf_free_queues. These call sites were written under the assumption that tx_rings and rx_rings would always be allocated at least when num_active_queues is non-zero. Lets fix this by moving the assignment into the function where we allocate queues. We'll use a temporary variable for storage so that we don't assign the value in the adapter structure until after the rings have been set up. Finally, when we free the queues, we'll clear the value to ensure that we do not loop over the rings memory that no longer exists. This resolves a possible NULL pointer dereference in i40evf_get_ethtool_stats which could occur if the VF fails to recover from a reset, and then a user requests statistics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-01i40evf: Add support for Adaptive Virtual FunctionPreethi Banala
Add device ID define and mac_type assignment needed for Adaptive Virtual Function (VF Base Mode Support). Also, update version to v3.0.0 in order to indicate clearly that this is the first driver supporting the AVF device ID. Signed-off-by: Preethi Banala <preethi.banala@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-01virtchnl: finish conversion to virtchnl interfaceJesse Brandeburg
This patch implements the complete version of the virtchnl.h file with final renames, and fixes the related code in i40e and i40evf. It also expands comments, and adds details on the usage of certain fields. In addition, due to the changes a couple of casts are needed to prevent errors found by sparse after renaming some fields. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-01virtchnl: rename i40e to generic virtchnlJesse Brandeburg
This morphs all the i40e and i40evf references to/in virtchnl.h to be generic, using only automated methods. Updates all the callers to use the new names. A followup patch provides separate clean ups for messy line conversions from these "automatic" changes, to make them more reviewable. Was executed with the following sed script: sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_client.c sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_prototype.h sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40e_common.c sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40e_prototype.h sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf.h sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf_client.c sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf_main.c sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c sed -i -f transform_script include/linux/avf/virtchnl.h transform_script: ----8<---- s/I40E_VIRTCHNL_SUPPORTED_QTYPES/SAVE_ME_SUPPORTED_QTYPES/g s/I40E_VIRTCHNL_VF_CAP/SAVE_ME_VF_CAP/g s/I40E_VIRTCHNL_/VIRTCHNL_/g s/i40e_virtchnl_/virtchnl_/g s/i40e_vfr_/virtchnl_vfr_/g s/I40E_VFR_/VIRTCHNL_VFR_/g s/VIRTCHNL_OP_ADD_ETHER_ADDRESS/VIRTCHNL_OP_ADD_ETH_ADDR/g s/VIRTCHNL_OP_DEL_ETHER_ADDRESS/VIRTCHNL_OP_DEL_ETH_ADDR/g s/VIRTCHNL_OP_FCOE/VIRTCHNL_OP_RSVD/g s/SAVE_ME_SUPPORTED_QTYPES/I40E_VIRTCHNL_SUPPORTED_QTYPES/g s/SAVE_ME_VF_CAP/I40E_VIRTCHNL_VF_CAP/g ----8<---- Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30i40evf: hide unused variableArnd Bergmann
On architectures with larger pages, we get a warning about an unused variable: drivers/net/ethernet/intel/i40evf/i40evf_main.c: In function 'i40evf_configure_rx': drivers/net/ethernet/intel/i40evf/i40evf_main.c:690:21: error: unused variable 'netdev' [-Werror=unused-variable] This moves the declaration into the #ifdef to avoid the warning. Fixes: dab86afdbbd1 ("i40e/i40evf: Change the way we limit the maximum frame size for Rx") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30i40evf: allocate queues before we setup the interrupts and q_vectorsJacob Keller
This matches the ordering of how we free stuff during reset and remove. It also makes logical sense because we set the interrupts based on the number of queues. Currently this doesn't really matter in practice. However a future patch moves the assignment of num_active_queues into i40evf_alloc_queues, which is required by i40evf_set_interrupt_capability. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30i40evf: remove needless min_t() on num_online_cpus()*2Jacob Keller
We already set pairs to the value of adapter->num_active_queues. This value is limited by vsi_res->num_queue_pairs and num_online_cpus(). This means that pairs by definition is already smaller than num_online_cpus()*2, so we don't even need to bother with this check. Lets just remove it and update the comment. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30i40e: use DECLARE_BITMAP for state fieldsJacob Keller
Instead of assuming our flags fit within an unsigned long, use DECLARE_BITMAP which will ensure that we always allocate enough space. Additionally, use __I40E_STATE_SIZE__ markers as the last element of the enumeration so that the size of the BITMAP is compile-time assigned rather than programmer-time assigned. This ensures that potential future flag additions do not actually overrun the array. This is especially important as 32bit systems would only have 32bit longs instead of 64bit longs as we generally have assumed in the prior code. This change also removes a dereference of the state fields throughout the code, so it does have a bit of code churn. The conversions were automated using sed replacements with an alternation s/&(vsi->back|vsi|pf)->state/\1->state/ s/&adapter->vsi.state/adapter->vsi.state/ For debugfs, we modify the printing so that we can display chunks of the state value on new lines. This ensures that we can print the entire set of state values. Additionally, we now print them as 08lx to ensure that they display nicely. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30i40e: separate PF and VSI state flagsJacob Keller
Avoid using the same named flags for both vsi->state and pf->state. This makes code review easier, as it is more likely that future authors will use the correct state field when checking bits. Previous commits already found issues with at least one check, and possibly others may be incorrect. This reduces confusion as it is more clear what each flag represents, and which flags are valid for which state field. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30i40e: remove unnecessary msleep() delay in i40e_free_vfsJacob Keller
The delay was added because of a desire to ensure that the VF driver can finish up removing. However, pci_disable_sriov already has its own ssleep() call that will sleep for an entire second, so there is no reason to add extra delay on top of this by using msleep here. In practice, an msleep() won't have a huge impact on timing but there is no real value in keeping it, so lets just simplify the code and remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-19i40e/i40evf: Add tracepointsScott Peterson
This patch adds tracepoints to the i40e and i40evf drivers to which BPF programs can be attached for feature testing and verification. It's expected that an attached BPF program will identify and count or log some interesting subset of traffic. The bcc-tools package is helpful there for containing all the BPF arcana in a handy Python wrapper. Though you can make these tracepoints log trace messages, the messages themselves probably won't be very useful (other to verify the tracepoint is being called while you're debugging your BPF program). The idea here is that tracepoints have such low performance cost when disabled that we can leave these in the upstream drivers. This may eventually enable the instrumentation of unmodified customer systems should the need arise to verify a NIC feature is working as expected. In general this enables one set of feature verification tools to be used on these drivers whether they're built with the kernel or separately. Users are advised against using these tracepoints for anything other than a diagnostic tool. They have a performance impact when enabled, and their exact placement and form may change as we see how well they work in practice for the purposes above. Change-ID: Id6014a7322c0e6d08068114dd20bd156f2f6435e Signed-off-by: Scott Peterson <scott.d.peterson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-19i40evf: Use net_device_stats from struct net_deviceTobias Klauser
Instead of using a private copy of struct net_device_stats in struct i40evf_adapter, use stats from struct net_device. Also remove the now unnecessary .ndo_get_stats function. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e/i40evf: Add support for padding start of framesAlexander Duyck
This patch adds padding to the start of frames to make room for headroom for us to eventually start using build_skb. Right now we guarantee at least NET_SKB_PAD + NET_IP_ALIGN, however we allocate more space if more is available. For example on x86 the headroom should be 192 bytes. On systems that have too large of a cache line size to support storing 1.5K padding and shared info we default to using 3K buffers and reserve everything that isn't used for skb_shared_info or the data buffer for headroom. Change-ID: I33c641c9a1ea10cf7cc484c2d20985368d2d709a Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08i40e/i40evf: Add support for using order 1 pages with a 3K bufferAlexander Duyck
There are situations where adding padding to the front and back of an Rx buffer will require that we add additional padding. Specifically if NET_IP_ALIGN is non-zero, or the MTU size is larger than 7.5K we would need to use 2K buffers which leaves us with no room for the padding. To preemptively address these cases I am adding support for 3K buffers to the Rx path so that we can provide the additional padding needed in the event of NET_IP_ALIGN being non-zero or a cache line being greater than 64. Change-ID: I938bc1ba611285428df39a613cd66f98e60b55c7 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-06i40e/i40evf: Add capability exchange for outer checksumPreethi Banala
This patch adds a capability negotiation between VF and PF using ENCAP/ ENCAP_CSUM offload flags in order for the VF to support outer checksum and TSO offloads for encapsulated packets. These capabilities were assumed by default and enabled in current hardware. Going forward, these features needs to be negotiated with PF before advertising to the stack. Additionally, strip out the mac.type checks for X722 since outer checksums are enabled based on the ENCAP_CSUM offload negotiation flag and maintain consistency between drivers in how the features are configured. Change-ID: Ie380a6f57eca557a2bb575b66b12fae36d308920 Signed-off-by: Preethi Banala <preethi.banala@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-29i40e/i40evf: Change the way we limit the maximum frame size for RxAlexander Duyck
This patch changes the way we handle the maximum frame size for the Rx path. Previously we were rounding up to 2K for a 1500 MTU and then brining the max frame size down to MTU plus a fixed amount. With this patch applied what we now do is limit the maximum frame to 1.5K minus the value for NET_IP_ALIGN for standard MTU, and for any MTU greater than 1500 we allow up to the maximum frame size. This makes the behavior more consistent with the other drivers such as igb which had similar logic. In addition it reduces the test matrix for MTU since we only have two max frame sizes that are handled for Rx now. Change-ID: I23a9d3c857e7df04b0ef28c64df63e659c013f3f Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-15i40e/i40evf: Change version from 1.6.27 to 2.1.7Bimmy Pujari
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-15i40evf: add client interfaceMitch Williams
In preparation for upcoming RDMA-capable hardware, add a client interface to the VF driver. This is a slightly-simplified version of the PF client interface, with the names changed to protect the innocent. Due to the nature of the VF<->PF interactions, the client interface sometimes needs to call back into itself to pass messages. Because of this, we can't use the coarse-grained locking like the PF's client interface uses. Instead, we handle all client interactions in a separate thread so the watchdog can still run and process virtual channel messages. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-18i40evf: add commentMitch Williams
Add a comment to reduce confusion. Change-ID: I3d5819c0f3f5174680442ae54398a073d4a61f4f Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-18i40evf: free rings in remove functionMitch Williams
When the i40evf_remove() calls netdev close, the device doesn't actually close - it schedules the work for the watchdog to perform. Since we're stopping the watchdog, this work doesn't get done. However, we're resetting the part, so we can free resources after the reset request has gone through. This plugs a memory leak. Change-ID: Id5335dcaf76ce00d2a4c3d26e9faf711d7f051cf Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-11i40e: Add bus number info to i40e_bus_info structSudheer Mogilappagari
Currently i40e_bus_info has PCI device and function info only and log messages print device number as bus number. Added field to provide bus number info and modified log statements to print bus, device and function information. Change-ID: I811617cee2714cc0d6bade8d369f57040990756f Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-11i40e/i40evf : Changed version from 1.6.25 to 1.6.27Bimmy Pujari
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-02i40evf: remove unused device IDMitch Williams
This device ID was intended for use when running Linux VF drivers under Hyper-V, but we have determined that it is not necessary. Since it is unused, and will never be used, remove it. Change-ID: I74998ab4237db043cd400547bb54a0a5e2a37ea5 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06Changed version from 1.6.21 to 1.6.25Bimmy Pujari
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02i40evf: protect against NULL msix_entries and q_vectors pointersJacob Keller
Update the functions which free msix_entries and q_vectors so that they are safe against NULL values. This allows calling code to not care whether these have already been freed when disabling and freeing them. Change-ID: I31bfd1c0da18023d971b618edc6fb049721f3298 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02i40evf: check for msix_entries null dereferenceAlan Brady
It is possible for msix_entries to be freed by a previous suspend/remove before a VF is closed. This patch fixes the issue by checking for NULL before dereferencing msix_entries and returning early in the case where it is NULL within the i40evf_close code path. Without this patch it is possible to trigger a kernel panic through NULL dereference. Change-ID: I92a2746e82533a889e25f91578eac9abd0388ae2 Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02i40evf: Move some i40evf_reset_task code to separate functionJoe Perches
The i40evf_reset_task function is a couple hundred lines and it has a separable block that disables VF. Move that block to a new i40evf_disable_vf function to shorten i40evf_reset_task a bit. Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02i40evf: Be much more verbose about what we can and cannot offloadAlexander Duyck
This change makes it so that we are much more robust about defining what we can and cannot offload. Previously we were performing no checks. This should bring us up to parity with the i40e PF driver. In addition the device only supports GSO as long as the MSS is 64 or greater. We were not checking this so an MSS less than that was resulting in Tx hangs. Change-ID: If533553ec92fc6ba694eab6ac81fdaf3004f3592 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-31i40evf: avoid an extra msleep whileJacob Keller
Remove the second call to msleep outside the loop, and move the msleep within the loop as the first step. This guarantees that a single loop will wait the minimum time first, and then after the reset finishes we no longer need an extra msleep. Change-ID: Ib2086f0a142402b614f67846bc091754203a0b9a Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-28i40e/i40evf: Changed version from 1.6.19 to 1.6.21Bimmy Pujari
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-28i40e/i40evf: Changed version from 1.6.16 to 1.6.19Bimmy Pujari
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-28i40e/i40evf: fix interrupt affinity bugAlan Brady
There exists a bug in which a 'perfect storm' can occur and cause interrupts to fail to be correctly affinitized. This causes unexpected behavior and has a substantial impact on performance when it happens. The bug occurs if there is heavy traffic, any number of CPUs that have an i40e interrupt are pegged at 100%, and the interrupt afffinity for those CPUs is changed. Instead of moving to the new CPU, the interrupt continues to be polled while there is heavy traffic. The bug is most readily realized as the driver is first brought up and all interrupts start on CPU0. If there is heavy traffic and the interrupt starts polling before the interrupt is affinitized, the interrupt will be stuck on CPU0 until traffic stops. The bug, however, can also be wrought out more simply by affinitizing all the interrupts to a single CPU and then attempting to move any of those interrupts off while there is heavy traffic. This patch fixes the bug by registering for update notifications from the kernel when the interrupt affinity changes. When that fires, we cache the intended affinity mask. Then, while polling, if the cpu is pegged at 100% and we failed to clean the rings, we check to make sure we have the correct affinity and stop polling if we're firing on the wrong CPU. When the kernel successfully moves the interrupt, it will start polling on the correct CPU. The performance impact is minimal since the only time this section gets executed is when performance is already compromised by the CPU. Change-ID: I4410a880159b9dba1f8297aa72bef36dca34e830 Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-18ethernet/intel: use core min/max MTU checkingJarod Wilson
e100: min_mtu 68, max_mtu 1500 - remove e100_change_mtu entirely, is identical to old eth_change_mtu, and no longer serves a purpose. No need to set min_mtu or max_mtu explicitly, as ether_setup() will already set them to 68 and 1500. e1000: min_mtu 46, max_mtu 16110 e1000e: min_mtu 68, max_mtu varies based on adapter fm10k: min_mtu 68, max_mtu 15342 - remove fm10k_change_mtu entirely, does nothing now i40e: min_mtu 68, max_mtu 9706 i40evf: min_mtu 68, max_mtu 9706 igb: min_mtu 68, max_mtu 9216 - There are two different "max" frame sizes claimed and both checked in the driver, the larger value wasn't relevant though, so I've set max_mtu to the smaller of the two values here to retain identical behavior. igbvf: min_mtu 68, max_mtu 9216 - Same issue as igb duplicated ixgb: min_mtu 68, max_mtu 16114 - Also remove pointless old == new check, as that's done in dev_set_mtu ixgbe: min_mtu 68, max_mtu 9710 ixgbevf: min_mtu 68, max_mtu dependent on hardware/firmware - Some hw can only handle up to max_mtu 1504 on a vf, others 9710 CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24i40evf: support queue-specific settings for interrupt moderationJacob Keller
In commit a75e8005d506f3 ("i40e: queue-specific settings for interrupt moderation") the i40e driver gained support for setting interrupt moderation values per queue. This patch adds support for this feature to the i40evf driver as well. In addition, a few changes are made to the i40e implementation to add function header documentation comments, as well. This behaves in a similar fashion to the implementation in i40e. Thus, requesting the moderation value when no queue is provided will report queue 0 value, while setting the value without a queue will set all queues at once. Change-ID: I1f310a57c8e6c84a8524c178d44d1b7a6d3a848e Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-24i40evf: enable adaptive interrupt throttlingMitch Williams
All of the code to support adaptive interrupt throttling is already in the interrupt handler, it just needs to be enabled. Fill out the data structures properly to make it happen. Single-flow traffic tests may show slightly lower throughput, but interrupts per second will drop by about 75%. Change-ID: I9cd7d42c025b906bf1bb85c6aeb6112684aa6471 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>