aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
AgeCommit message (Collapse)Author
2021-10-06net/mlx4_en: Don't allow aRFS for encapsulated packetsAya Levin
[ Upstream commit fdbccea419dc782079ce5881d2705cc9e3881480 ] Driver doesn't support aRFS for encapsulated packets, return early error in such a case. Fixes: 1eb8c695bda9 ("net/mlx4_en: Add accelerated RFS support") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17net/mlx4_en: update moderation when config resetKevin(Yudong) Yang
commit 00ff801bb8ce6711e919af4530b6ffa14a22390a upstream. This patch fixes a bug that the moderation config will not be applied when calling mlx4_en_reset_config. For example, when turning on rx timestamping, mlx4_en_reset_config() will be called, causing the NIC to forget previous moderation config. This fix is in phase with a previous fix: commit 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called") Tested: Before this patch, on a host with NIC using mlx4, run netserver and stream TCP to the host at full utilization. $ sar -I SUM 1 INTR intr/s 14:03:56 sum 48758.00 After rx hwtstamp is enabled: $ sar -I SUM 1 14:10:38 sum 317771.00 We see the moderation is not working properly and issued 7x more interrupts. After the patch, and turned on rx hwtstamp, the rate of interrupts is as expected: $ sar -I SUM 1 14:52:11 sum 49332.00 Fixes: 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called") Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> CC: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29net/mlx4_en: Handle TX error CQEMoshe Shemesh
[ Upstream commit ba603d9d7b1215c72513d7c7aa02b6775fd4891b ] In case error CQE was found while polling TX CQ, the QP is in error state and all posted WQEs will generate error CQEs without any data transmitted. Fix it by reopening the channels, via same method used for TX timeout handling. In addition add some more info on error CQE and WQE for debug. Fixes: bd2f631d7c60 ("net/mlx4_en: Notify user when TX ring in error state") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29net/mlx4_en: Avoid scheduling restart task if it is already runningMoshe Shemesh
[ Upstream commit fed91613c9dd455dd154b22fa8e11b8526466082 ] Add restarting state flag to avoid scheduling another restart task while such task is already running. Change task name from watchdog_task to restart_task to better fit the task role. Fixes: 1e338db56e5a ("mlx4_en: Fix a race at restart task") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17net/mlx4_en: Change min MTU size to ETH_MIN_MTUEran Ben Elisha
[ Upstream commit 24be19e47779d604d1492c114459dca9a92acf78 ] NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in the RFC791 and in the network stack. Remove old mlx4_en only define for it, which was set to wrong value. Fixes: b80f71f5816f ("ethernet/mellanox: use core min/max MTU checking") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-19net/mlx4_en: Fix an error handling path in 'mlx4_en_init_netdev()'Christophe JAILLET
[ Upstream commit a577d868b768a3baf16cdd4841ab8cfb165521d6 ] If an error occurs, 'mlx4_en_destroy_netdev()' is called. It then calls 'mlx4_en_free_resources()' which does the needed resources cleanup. So, doing some explicit kfree in the error handling path would lead to some double kfree. Simplify code to avoid such a case. Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-12net/mlx4_en: Change default QoS settingsMoni Shoua
[ Upstream commit a42b63c1ac1986f17f71bc91a6b0aaa14d4dae71 ] Change the default mapping between TC and TCG as follows: Prio | TC/TCG | from to | (set by FW) (set by SW) ---------+----------------------------------- 0 | 0/0 0/7 1 | 1/0 0/6 2 | 2/0 0/5 3 | 3/0 0/4 4 | 4/0 0/3 5 | 5/0 0/2 6 | 6/0 0/1 7 | 7/0 0/0 These new settings cause that a pause frame for any prio stops traffic for all prios. Fixes: 564c274c3df0 ("net/mlx4_en: DCB QoS support") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-08-29net/mlx4: Add user mac FW update supportMoshe Shemesh
Adding support for updating the FW on new port mac, when port mac change is requested by the user. This info is required by the FW as OEM management tools require this info directly from the NIC FW. Check device capability bit to verify the FW supports user mac. If the FW does support it, use set_port command to notify the FW on the new mac. The feature is relevant only to PF port mac. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: get rid of struct tc_to_netdevJiri Pirko
Get rid of struct tc_to_netdev which is now just unnecessary container and rather pass per-type structures down to drivers directly. Along with that, consolidate the naming of per-type structure variables in cls_*. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: change return value of ndo_setup_tc for driver supporting mqprio ↵Jiri Pirko
only Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the drivers have it like that, so be aligned. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: push cls related args into cls_common structureJiri Pirko
As ndo_setup_tc is generic offload op for whole tc subsystem, does not really make sense to have cls-specific args. So move them under cls_common structurure which is embedded in all cls structs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: make type an argument for ndo_setup_tcJiri Pirko
Since the type is always present, push it to be a separate argument to ndo_setup_tc. On the way, name the type enum and use it for arg type. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24(IB, net)/mlx4: Add resource utilization supportMoshe Shemesh
Adding visibility of resource usage of QPs, CQs and counters used by virtual functions. This feature will be used to give the PF administrator more data while debugging VF status. Usage info was added to ALLOC_RES command, to notify the PF if the resource which is being reserved or allocated for the VF will be used by kernel driver or by user verbs. Updated reservation and allocation functions of QP, CQ and counter with additional usage parameter. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-29net/mlx4_en: Do not allocate redundant TX queues when TC is disabledInbar Karmy
Currently the number of TX queues that are allocated doesn't depend on the number of TCs, the module always loads with max num of UP per channel. In order to prevent the allocation of unnecessary memory, the module will load with minimum number of UPs per channel, and the user will be able to control the number of TX queues per channel by changing the number of TC to 8 using the tc command. The variable num_up will hold the information about the current number of UPs. Due to the change, needed to remove the lines that set the value of UP to be different than zero in the func "mlx4_en_select_queue", since now the num of TX queues that are allocated is only one per channel in default. In order not to force the UP to be zero in case of only one TC, added a condition before forcing it in the func "mlx4_en_fill_qp_context". Tested: After the module is loaded with minimum number of UP per channel, to increase num of TCs to 8, use: tc qdisc add dev ens8 root mqprio num_tc 8 In order to decrease the number of TCs to minimum number of UP per channel, use: tc qdisc del dev ens8 root Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Tarick Bedeir <tarick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29net/mlx4_en: Add dynamic variable to hold the number of user priorities (UP)Inbar Karmy
Until this patch, the number of UPs was hard coded for eight. Replace this with a variable in struct "mlx4_en_port_profile". Currently, the variable will hold the maximum number of UP, as before. The patch creates an infrastructure to add an option for dynamic change of the actual number of TCs. Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Tarick Bedeir <tarick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net/mlx4: fix spelling mistake: "coalesing" -> "coalescing"Colin Ian King
Trivial fix to spelling mistake in en_dbg debug message Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: mlx4: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau
Add support to mlx4 to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Poll XDP TX completion queue in RX NAPITariq Toukan
Instead of having their own NAPIs, XDP TX completion queues get polled within the corresponding RX NAPI. This prevents any possible race on TX ring prod/cons indices, between the context that issues the transmits (RX NAPI) and the context that handles the completions (was previously done in a separate NAPI). This also improves performance, as it decreases the number of NAPIs running on a CPU, saving the overhead of syncing and switching between the contexts. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 12.0 Mpps | 13.8 Mpps | 15% | IPv6 | 12.0 Mpps | 13.8 Mpps | 15% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Optimized single ring steeringSaeed Mahameed
Avoid touching RX QP RSS context when loading with only one RX ring, to allow optimized A0 RX steering. Enable by: - loading mlx4_core with module param: log_num_mgm_entry_size = -6. - then: ethtool -L <interface> rx 1 Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz XDP_DROP packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 20.5 Mpps | 28.1 Mpps | 37% | IPv6 | 18.4 Mpps | 28.1 Mpps | 53% | ------------------------------------- Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08net: propagate tc filter chain index down the ndo_setup_tc callJiri Pirko
We need to push the chain index down to the drivers, so they have the information to which chain the rule belongs. For now, no driver supports multichain offload, so only chain 0 is supported. This is needed to prevent chain squashes during offload for now. Later this will be used to implement multichain offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALLMiroslav Lichvar
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid filter and update drivers which can timestamp all packets, or which explicitly list unsupported filters instead of using a default case, to handle the filter. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15mqprio: Modify mqprio to pass user parameters via ndo_setup_tc.Amritha Nambiar
The configurable priority to traffic class mapping and the user specified queue ranges are used to configure the traffic class, overriding the hardware defaults when the 'hw' option is set to 0. However, when the 'hw' option is non-zero, the hardware QOS defaults are used. This patch makes it so that we can pass the data the user provided to ndo_setup_tc. This allows us to pull in the queue configuration if the user requested it as well as any additional hardware offload type requested by using a value other than 1 for the hw value. Finally it also provides a means for the device driver to return the level supported for the offload type via the qopt->hw value. Previously we were just always assuming the value to be 1, in the future values beyond just 1 may be supported. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-23net/mlx4: Spoofcheck and zero MAC can't coexistEugenia Emantayev
Spoofcheck can't be enabled if VF MAC is zero. Vice versa, can't zero MAC if spoofcheck is on. Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-19mlx4: fix potential divide by 0 in mlx4_en_auto_moderation()Eric Dumazet
1) In the case where rate == priv->pkt_rate_low == priv->pkt_rate_high, mlx4_en_auto_moderation() does a divide by zero. 2) We want to properly change the moderation parameters if rx_frames was changed (like in ethtool -C eth0 rx-frames 16) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The conflict was an interaction between a bug fix in the netvsc driver in 'net' and an optimization of the RX path in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G'Martin KaFai Lau
After calling mlx4_en_try_alloc_resources (e.g. by changing the number of rx-queues with ethtool -L), the existing xdp_prog becomes inactive. The bug is that the xdp_prog ptr has not been carried over from the old rx-queues to the new rx-queues Fixes: 47a38e155037 ("net/mlx4_en: add support for fast rx drop bpf program") Cc: Brenden Blanco <bblanco@plumgrid.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02mlx4: Fix memory leak after mlx4_en_update_priv()Martin KaFai Lau
In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t] are over-written by src->tx_ring[t] and src->tx_cq[t] without first calling kfree. One of the reproducible code paths is by doing 'ethtool -L'. The fix is to do the kfree in mlx4_en_free_resources(). Here is the kmemleak report: unreferenced object 0xffff880841211800 (size 2048): comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81930718>] kmemleak_alloc+0x28/0x50 [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260 [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0 [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210 [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190 [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0 [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50 [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0 [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0 [<ffffffff812480f9>] SyS_ioctl+0x79/0x90 [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff880841213000 (size 2048): comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81930718>] kmemleak_alloc+0x28/0x50 [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260 [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0 [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210 [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190 [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0 [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50 [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0 [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0 [<ffffffff812480f9>] SyS_ioctl+0x79/0x90 [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad [<ffffffffffffffff>] 0xffffffffffffffff (gdb) list *mlx4_en_try_alloc_resources+0x118 0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145). 2140 if (!dst->tx_ring_num[t]) 2141 continue; 2142 2143 dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) * 2144 MAX_TX_RINGS, GFP_KERNEL); 2145 if (!dst->tx_ring[t]) 2146 goto err_free_tx; 2147 2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) * 2149 MAX_TX_RINGS, GFP_KERNEL); (gdb) list *mlx4_en_try_alloc_resources+0x13b 0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150). 2145 if (!dst->tx_ring[t]) 2146 goto err_free_tx; 2147 2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) * 2149 MAX_TX_RINGS, GFP_KERNEL); 2150 if (!dst->tx_cq[t]) { 2151 kfree(dst->tx_ring[t]); 2152 goto err_free_tx; 2153 } 2154 } Fixes: ec25bc04ed8e ("net/mlx4_en: Add resilience in low memory systems") Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_en: Pass user MTU value to Firmware at set port commandShaker Daibes
When starting the port, driver will inform Firmware about the actual MTU which does not include implicit headers, such as FCS or VLAN tags. Signed-off-by: Shaker Daibes <shakerd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-01-16mlx4: do not call napi_schedule() without careEric Dumazet
Disable BH around the call to napi_schedule() to avoid following warning [ 52.095499] NOHZ: local_softirq_pending 08 [ 52.421291] NOHZ: local_softirq_pending 08 [ 52.608313] NOHZ: local_softirq_pending 08 Fixes: 8d59de8f7bb3 ("net/mlx4_en: Process all completions in RX rings after port goes up") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two AF_* families adding entries to the lockdep tables at the same time. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10mlx4: Return EOPNOTSUPP instead of ENOTSUPPMartin KaFai Lau
In commit b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs"), it changed EOPNOTSUPP to ENOTSUPP by mistake. This patch fixes it. Fixes: b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08net: make ndo_get_stats64 a void functionstephen hemminger
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23net/mlx4_en: Fix user prio field in XDP forwardTariq Toukan
The user prio field is wrong (and overflows) in the XDP forward flow. This is a result of a bad value for num_tx_rings_p_up, which should account all XDP TX rings, as they operate for the same user prio. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08mlx4: xdp: Reserve headroom for receiving packet when XDP prog is activeMartin KaFai Lau
Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head() support. This patch only affects the code path when XDP is active. After testing, the tx_dropped counter is incremented if the xdp_prog sends more than wire MTU. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrsMartin KaFai Lau
When XDP is active in mlx4, mlx4 is using one page/pkt. At the same time (i.e. when XDP is active), it is currently limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514 in x86. AFAICT, we can at least raise the MTU limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is doing. It will be useful in the next patch which allows XDP program to extend the packet by adding new header(s). Note: In the earlier XDP patches, there is already existing guard to ensure the page/pkt scheme only applies when XDP is active in mlx4. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08bpf: xdp: Allow head adjustment in XDP progMartin KaFai Lau
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02mlx4: fix use-after-free in mlx4_en_fold_software_stats()Eric Dumazet
My recent commit to get more precise rx/tx counters in ndo_get_stats64() can lead to crashes at device dismantle, as Jesper found out. We must prevent mlx4_en_fold_software_stats() trying to access tx/rx rings if they are deleted. Fix this by adding a test against priv->port_up in mlx4_en_fold_software_stats() Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port() allows us to eventually broadcast the latest/current counters to rtnetlink monitors. Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29mlx4: give precise rx/tx bytes/packets countersEric Dumazet
mlx4 stats are chaotic because a deferred work queue is responsible to update them every 250 ms. Even sampling stats every one second with "sar -n DEV 1" gives variations like the following : lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65 07:39:22 eth0 146877.00 3265554.00 9467.15 4828168.50 07:39:23 eth0 146587.00 3260329.00 9448.15 4820445.98 07:39:24 eth0 146894.00 3259989.00 9468.55 4819943.26 07:39:25 eth0 110368.00 2454497.00 7113.95 3629012.17 <<>> 07:39:26 eth0 146563.00 3257502.00 9447.25 4816266.23 07:39:27 eth0 145678.00 3258292.00 9389.79 4817414.39 07:39:28 eth0 145268.00 3253171.00 9363.85 4809852.46 07:39:29 eth0 146439.00 3262185.00 9438.97 4823172.48 07:39:30 eth0 146758.00 3264175.00 9459.94 4826124.13 07:39:31 eth0 146843.00 3256903.00 9465.44 4815381.97 Average: eth0 142827.50 3179259.70 9206.30 4700578.16 This patch allows rx/tx bytes/packets counters being folded at the time we need stats. We now can fetch stats every 1 ms if we want to check NIC behavior on a small time window. It is also easier to detect anomalies. lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65 07:42:50 eth0 142915.00 3177696.00 9212.06 4698270.42 07:42:51 eth0 143741.00 3200232.00 9265.15 4731593.02 07:42:52 eth0 142781.00 3171600.00 9202.92 4689260.16 07:42:53 eth0 143835.00 3192932.00 9271.80 4720761.39 07:42:54 eth0 141922.00 3165174.00 9147.64 4679759.21 07:42:55 eth0 142993.00 3207038.00 9216.78 4741653.05 07:42:56 eth0 141394.06 3154335.64 9113.85 4663731.73 07:42:57 eth0 141850.00 3161202.00 9144.48 4673866.07 07:42:58 eth0 143439.00 3180736.00 9246.05 4702755.35 07:42:59 eth0 143501.00 3210992.00 9249.99 4747501.84 Average: eth0 142835.66 3182165.93 9206.98 4704874.08 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow"Tariq Toukan
This reverts commit 9d76931180557270796f9631e2c79b9c7bb3c9fb. Using unregister_netdev at shutdown flow prevents calling the netdev's ndos or trying to access its freed resources. This fixes crashes like the following: Call Trace: [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30 [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0 [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0 [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50 [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20 [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50 [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50 [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170 [<ffffffff815a6205>] linkwatch_event+0x25/0x30 [<ffffffff81099a82>] process_one_work+0x152/0x400 [<ffffffff8109a325>] worker_thread+0x125/0x4b0 [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350 [<ffffffff8109fc6a>] kthread+0xca/0xe0 [<ffffffff8109fba0>] ? kthread_park+0x60/0x60 [<ffffffff816a1285>] ret_from_fork+0x25/0x30 Fixes: 9d7693118055 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reported-by: Steve Wise <swise@opengridcomputing.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()Eric Dumazet
Per RX ring packets/bytes counters are not protected by global priv->stats_lock. Better not confuse the reader, and use READ_ONCE() to show we read these counters without surrounding synchronization. Interrupt moderation is best effort, and we do not really care of ultra precise counters. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24mlx4: reorganize struct mlx4_en_tx_ringEric Dumazet
Goal is to reorganize this critical structure to increase performance. ndo_start_xmit() should only dirty one cache line, and access as few cache lines as possible. Add sp_ (Slow Path) prefix to fields that are not used in fast path, to make clear what is going on. After this patch pahole reports something much better, as all ndo_start_xmit() needed fields are packed into two cache lines instead of seven or eight struct mlx4_en_tx_ring { u32 last_nr_txbb; /* 0 0x4 */ u32 cons; /* 0x4 0x4 */ long unsigned int wake_queue; /* 0x8 0x8 */ struct netdev_queue * tx_queue; /* 0x10 0x8 */ u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x18 0x8 */ struct mlx4_en_rx_ring * recycle_ring; /* 0x20 0x8 */ /* XXX 24 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 prod; /* 0x40 0x4 */ unsigned int tx_dropped; /* 0x44 0x4 */ long unsigned int bytes; /* 0x48 0x8 */ long unsigned int packets; /* 0x50 0x8 */ long unsigned int tx_csum; /* 0x58 0x8 */ long unsigned int tso_packets; /* 0x60 0x8 */ long unsigned int xmit_more; /* 0x68 0x8 */ struct mlx4_bf bf; /* 0x70 0x18 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ __be32 doorbell_qpn; /* 0x88 0x4 */ __be32 mr_key; /* 0x8c 0x4 */ u32 size; /* 0x90 0x4 */ u32 size_mask; /* 0x94 0x4 */ u32 full_size; /* 0x98 0x4 */ u32 buf_size; /* 0x9c 0x4 */ void * buf; /* 0xa0 0x8 */ struct mlx4_en_tx_info * tx_info; /* 0xa8 0x8 */ int qpn; /* 0xb0 0x4 */ u8 queue_index; /* 0xb4 0x1 */ bool bf_enabled; /* 0xb5 0x1 */ bool bf_alloced; /* 0xb6 0x1 */ u8 hwtstamp_tx_type; /* 0xb7 0x1 */ u8 * bounce_buf; /* 0xb8 0x8 */ /* --- cacheline 3 boundary (192 bytes) --- */ long unsigned int queue_stopped; /* 0xc0 0x8 */ struct mlx4_hwq_resources sp_wqres; /* 0xc8 0x58 */ /* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */ struct mlx4_qp sp_qp; /* 0x120 0x30 */ /* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */ struct mlx4_qp_context sp_context; /* 0x150 0xf8 */ /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */ cpumask_t sp_affinity_mask; /* 0x248 0x20 */ enum mlx4_qp_state sp_qp_state; /* 0x268 0x4 */ u16 sp_stride; /* 0x26c 0x2 */ u16 sp_cqn; /* 0x26e 0x2 */ /* size: 640, cachelines: 10, members: 36 */ /* sum members: 600, holes: 1, sum holes: 24 */ /* padding: 16 */ }; Instead of this silly placement : struct mlx4_en_tx_ring { u32 last_nr_txbb; /* 0 0x4 */ u32 cons; /* 0x4 0x4 */ long unsigned int wake_queue; /* 0x8 0x8 */ /* XXX 48 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 prod; /* 0x40 0x4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int bytes; /* 0x48 0x8 */ long unsigned int packets; /* 0x50 0x8 */ long unsigned int tx_csum; /* 0x58 0x8 */ long unsigned int tso_packets; /* 0x60 0x8 */ long unsigned int xmit_more; /* 0x68 0x8 */ unsigned int tx_dropped; /* 0x70 0x4 */ /* XXX 4 bytes hole, try to pack */ struct mlx4_bf bf; /* 0x78 0x18 */ /* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */ long unsigned int queue_stopped; /* 0x90 0x8 */ cpumask_t affinity_mask; /* 0x98 0x10 */ struct mlx4_qp qp; /* 0xa8 0x30 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mlx4_hwq_resources wqres; /* 0xd8 0x58 */ /* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */ u32 size; /* 0x130 0x4 */ u32 size_mask; /* 0x134 0x4 */ u16 stride; /* 0x138 0x2 */ /* XXX 2 bytes hole, try to pack */ u32 full_size; /* 0x13c 0x4 */ /* --- cacheline 5 boundary (320 bytes) --- */ u16 cqn; /* 0x140 0x2 */ /* XXX 2 bytes hole, try to pack */ u32 buf_size; /* 0x144 0x4 */ __be32 doorbell_qpn; /* 0x148 0x4 */ __be32 mr_key; /* 0x14c 0x4 */ void * buf; /* 0x150 0x8 */ struct mlx4_en_tx_info * tx_info; /* 0x158 0x8 */ struct mlx4_en_rx_ring * recycle_ring; /* 0x160 0x8 */ u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168 0x8 */ u8 * bounce_buf; /* 0x170 0x8 */ struct mlx4_qp_context context; /* 0x178 0xf8 */ /* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */ int qpn; /* 0x270 0x4 */ enum mlx4_qp_state qp_state; /* 0x274 0x4 */ u8 queue_index; /* 0x278 0x1 */ bool bf_enabled; /* 0x279 0x1 */ bool bf_alloced; /* 0x27a 0x1 */ /* XXX 5 bytes hole, try to pack */ /* --- cacheline 10 boundary (640 bytes) --- */ struct netdev_queue * tx_queue; /* 0x280 0x8 */ int hwtstamp_tx_type; /* 0x288 0x4 */ /* size: 704, cachelines: 11, members: 36 */ /* sum members: 587, holes: 6, sum holes: 65 */ /* padding: 52 */ }; Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23net/mlx4_en: Free netdev resources under state lockTariq Toukan
Make sure mlx4_en_free_resources is called under the netdev state lock. This is needed since RCU dereference of XDP prog should be protected. Fixes: 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sagi Grimberg <sagi@grimberg.me> CC: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of bug fixes in 'net' overlapping other changes in 'net-next-. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error pathDaniel Borkmann
Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme") added a bug in that the prog's reference count is not dropped in the error path when mlx4_en_try_alloc_resources() is failing from mlx4_xdp_set(). We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we need to release again. Earlier in the call path, dev_change_xdp_fd() itself holds a reference to the prog as well (hence the '- 1' in the bpf_prog_add()), so a simple atomic_sub() is safe to use here. When an error is propagated, then bpf_prog_put() is called eventually from dev_change_xdp_fd() Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09Revert "net/mlx4_en: Fix panic during reboot"Tariq Toukan
This reverts commit 9d2afba058722d40cc02f430229c91611c0e8d16. The original issue would possibly exist if an external module tried calling our "ethtool_ops" without checking if it still exists. The right way of solving it is by simply doing the check in the caller side. Currently, no action is required as there's no such use case. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/mlx4_en: Add ethtool statistics for XDP casesTariq Toukan
XDP statistics are reported in ethtool, in total and per ring, as follows: - xdp_drop: the number of packets dropped by xdp. - xdp_tx: the number of packets forwarded by xdp. - xdp_tx_full: the number of times an xdp forward failed due to a full tx xdp ring. In addition, all packets that are dropped/forwarded by XDP are no longer accounted in rx_packets/rx_bytes of the ring, so that they count traffic that is passed to the stack. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>