aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
AgeCommit message (Collapse)Author
2019-09-12net/mlx5: Allocate root ns memory using kzalloc to match kfreeParav Pandit
commit 25fa506b70cadb580c1e9cbd836d6417276d4bcd upstream. root ns is yet another fs core node which is freed using kfree() by tree_put_node(). Rest of the other fs core objects are also allocated using kmalloc variants. However, root ns memory is allocated using kvzalloc(). Hence allocate root ns memory using kzalloc(). Fixes: 2530236303d9e ("net/mlx5_core: Flow steering tree initialization") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-09-12net/mlx5: Avoid double free in fs init error unwinding pathParav Pandit
commit 9414277a5df3669c67e818708c0f881597e0118e upstream. In below code flow, for ingress acl table root ns memory leads to double free. mlx5_init_fs init_ingress_acls_root_ns() init_ingress_acl_root_ns kfree(steering->esw_ingress_root_ns); /* steering->esw_ingress_root_ns is not marked NULL */ mlx5_cleanup_fs cleanup_ingress_acls_root_ns steering->esw_ingress_root_ns non NULL check passes. kfree(steering->esw_ingress_root_ns); /* double free */ Similar issue exist for other tables. Hence zero out the pointers to not process the table again. Fixes: 9b93ab981e3bf ("net/mlx5: Separate ingress/egress namespaces for each vport") Fixes: 40c3eebb49e51 ("net/mlx5: Add support in RDMA RX steering") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-01-12net/mlx5: Typo fix in del_sw_hw_ruleYuval Avnery
commit f0337889147c956721696553ffcc97212b0948fe upstream. Expression terminated with "," instead of ";", resulted in set_fte getting bad value for modify_enable_mask field. Fixes: bd5251dbf156 ("net/mlx5_core: Introduce flow steering destination of type counter") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-09-26net/mlx5: Fix possible deadlock from lockdep when adding fte to fgRoi Dayan
[ Upstream commit ad9421e36a77056a4f095d49b9605e80b4d216ed ] This is a false positive report due to incorrect nested lock annotations as we lock multiple fgs with the same subclass. Instead of locking all fgs only lock the one being used as was done before. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26net/mlx5: Fix not releasing read lock when adding flow rulesRoi Dayan
[ Upstream commit 071304772fc747d5df13c51f1cf48a4b922a5e0d ] If building match list fg fails and we never jumped to search_again_locked label then the function returned without unlocking the read lock. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-18net/mlx5: Fix 'DON'T_TRAP' functionalityRaed Salem
The flow counters binding support commit introduced a code change where none NULL 'rule_dest' is always passed to mlx5_add_flow_rules, this breaks 'DON'T_TRAP' rules insertion. The fix uses the equivalent 'dest_num' value instead of dest pointer at the failed check. fixes: 3b3233fbf02e ('IB/mlx5: Add flow counters binding support') Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26net/mlx5: E-Switch, Avoid setup attempt if not being e-switch managerOr Gerlitz
In smartnic env, the host (PF) driver might not be an e-switch manager, hence the FW will err on driver attempts to deal with setting/unsetting the eswitch and as a result the overall setup of sriov will fail. Fix that by avoiding the operation if e-switch management is not allowed for this driver instance. While here, move to use the correct name for the esw manager capability name. Fixes: 81848731ff40 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Guy Kushnir <guyk@mellanox.com> Reviewed-by: Eli Cohen <eli@melloanox.com> Tested-by: Eli Cohen <eli@melloanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has been a quiet cycle for RDMA, the big bulk is the usual smallish driver updates and bug fixes. About four new uAPI related things. Not as much Szykaller patches this time, the bugs it finds are getting harder to fix. Summary: - More work cleaning up the RDMA CM code - Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns, i40iw, iw_cxgb4, mlx5, rxe - Driver specific resource tracking and reporting via netlink - Continued work for name space support from Parav - MPLS support for the verbs flow steering uAPI - A few tricky IPoIB fixes improving robustness - HFI1 driver support for the '16B' management packet format - Some auditing to not print kernel pointers via %llx or similar - Mark the entire 'UCM' user-space interface as BROKEN with the intent to remove it entirely. The user space side of this was long ago replaced with RDMA-CM and syzkaller is finding bugs in the residual UCM interface nobody wishes to fix because nobody uses it. - Purge more bogus BUG_ON's from Leon - 'flow counters' verbs uAPI - T10 fixups for iser/isert, these are Acked by Martin but going through the RDMA tree due to dependencies" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits) RDMA/mlx5: Update SPDX tags to show proper license RDMA/restrack: Change SPDX tag to properly reflect license IB/hfi1: Fix comment on default hdr entry size IB/hfi1: Rename exp_lock to exp_mutex IB/hfi1: Add bypass register defines and replace blind constants IB/hfi1: Remove unused variable IB/hfi1: Ensure VL index is within bounds IB/hfi1: Fix user context tail allocation for DMA_RTAIL IB/hns: Use zeroing memory allocator instead of allocator/memset infiniband: fix a possible use-after-free bug iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency IB/isert: use T10-PI check mask definitions from core layer IB/iser: use T10-PI check mask definitions from core layer RDMA/core: introduce check masks for T10-PI offload IB/isert: fix T10-pi check mask setting IB/mlx5: Add counters read support IB/mlx5: Add flow counters read support IB/mlx5: Add flow counters binding support IB/mlx5: Add counters create and destroy support IB/uverbs: Add support for flow counters ...
2018-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Add Maglev hashing scheduler to IPVS, from Inju Song. 2) Lots of new TC subsystem tests from Roman Mashak. 3) Add TCP zero copy receive and fix delayed acks and autotuning with SO_RCVLOWAT, from Eric Dumazet. 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard Brouer. 5) Add ttl inherit support to vxlan, from Hangbin Liu. 6) Properly separate ipv6 routes into their logically independant components. fib6_info for the routing table, and fib6_nh for sets of nexthops, which thus can be shared. From David Ahern. 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP messages from XDP programs. From Nikita V. Shirokov. 8) Lots of long overdue cleanups to the r8169 driver, from Heiner Kallweit. 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau. 10) Add traffic condition monitoring to iwlwifi, from Luca Coelho. 11) Plumb extack down into fib_rules, from Roopa Prabhu. 12) Add Flower classifier offload support to igb, from Vinicius Costa Gomes. 13) Add UDP GSO support, from Willem de Bruijn. 14) Add documentation for eBPF helpers, from Quentin Monnet. 15) Add TLS tx offload to mlx5, from Ilya Lesokhin. 16) Allow applications to be given the number of bytes available to read on a socket via a control message returned from recvmsg(), from Soheil Hassas Yeganeh. 17) Add x86_32 eBPF JIT compiler, from Wang YanQing. 18) Add AF_XDP sockets, with zerocopy support infrastructure as well. From Björn Töpel. 19) Remove indirect load support from all of the BPF JITs and handle these operations in the verifier by translating them into native BPF instead. From Daniel Borkmann. 20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha. 21) Allow XDP programs to do lookups in the main kernel routing tables for forwarding. From David Ahern. 22) Allow drivers to store hardware state into an ELF section of kernel dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy. 23) Various RACK and loss detection improvements in TCP, from Yuchung Cheng. 24) Add TCP SACK compression, from Eric Dumazet. 25) Add User Mode Helper support and basic bpfilter infrastructure, from Alexei Starovoitov. 26) Support ports and protocol values in RTM_GETROUTE, from Roopa Prabhu. 27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard Brouer. 28) Add lots of forwarding selftests, from Petr Machata. 29) Add generic network device failover driver, from Sridhar Samudrala. * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits) strparser: Add __strp_unpause and use it in ktls. rxrpc: Fix terminal retransmission connection ID to include the channel net: hns3: Optimize PF CMDQ interrupt switching process net: hns3: Fix for VF mailbox receiving unknown message net: hns3: Fix for VF mailbox cannot receiving PF response bnx2x: use the right constant Revert "net: sched: cls: Fix offloading when ingress dev is vxlan" net: dsa: b53: Fix for brcm tag issue in Cygnus SoC enic: fix UDP rss bits netdev-FAQ: clarify DaveM's position for stable backports rtnetlink: validate attributes in do_setlink() mlxsw: Add extack messages for port_{un, }split failures netdevsim: Add extack error message for devlink reload devlink: Add extack to reload and port_{un, }split operations net: metrics: add proper netlink validation ipmr: fix error path when ipmr_new_table fails ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds net: hns3: remove unused hclgevf_cfg_func_mta_filter netfilter: provide udp*_lib_lookup for nf_tproxy qed*: Utilize FW 8.37.2.0 ...
2018-06-06treewide: Use struct_size() for kmalloc()-familyKees Cook
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script: // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len * // sizeof *pkey_cache->table, GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // Same pattern, but can't trivially locate the trailing element name, // or variable name. @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; expression SOMETHING, COUNT, ELEMENT; @@ - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP) + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-05-25net/mlx5: E-switch, Create a second level FDB flow tableChris Mi
If firmware supports the forward action with a destination list that includes a flow table, create a second level FDB flow table. This is going to be used for flow based mirroring under the switchdev offloads mode. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24Merge tag 'mlx5-updates-2018-05-17' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into for-next mlx5-updates-2018-05-17 mlx5 core dirver updates for both net-next and rdma-next branches. From Christophe JAILLET, first three patche to use kvfree where needed. From: Or Gerlitz <ogerlitz@mellanox.com> Next six patches from Roi and Co adds support for merged sriov e-switch which comes to serve cases where both PFs, VFs set on them and both uplinks are to be used in single v-switch SW model. When merged e-switch is supported, the per-port e-switch is logically merged into one e-switch that spans both physical ports and all the VFs. This model allows to offload TC eswitch rules between VFs belonging to different PFs (and hence have different eswitch affinity), it also sets the some of the foundations needed for uplink LAG support. * tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5e: Explicitly set source e-switch in offloaded TC rules net/mlx5: Add source e-switch owner net/mlx5e: Explicitly set destination e-switch in FDB rules net/mlx5: Add destination e-switch owner net/mlx5: Properly handle a vport destination when setting FTE net/mlx5: Add merged e-switch cap IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()' net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()' net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
2018-05-18Merge tag 'mlx5-updates-2018-05-17' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-05-17 mlx5 core dirver updates for both net-next and rdma-next branches. From Christophe JAILLET, first three patche to use kvfree where needed. From: Or Gerlitz <ogerlitz@mellanox.com> Next six patches from Roi and Co adds support for merged sriov e-switch which comes to serve cases where both PFs, VFs set on them and both uplinks are to be used in single v-switch SW model. When merged e-switch is supported, the per-port e-switch is logically merged into one e-switch that spans both physical ports and all the VFs. This model allows to offload TC eswitch rules between VFs belonging to different PFs (and hence have different eswitch affinity), it also sets the some of the foundations needed for uplink LAG support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17net/mlx5: Add source e-switch ownerShahar Klein
The source e-switch owner allows a vport on one e-switch port be associated with a rule defined on the second port e-switch. The role of the source eswitch owner valid bit in the flow group is to allow the firmware fail driver attempts to wild card the source eswitch match field. If this bit is not set, the firmware ignores the source eswitch owner field totally. Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17net/mlx5: Add destination e-switch ownerShahar Klein
The destination e-switch owner allows a rule in namespace of one e-switch owner to point to a vport that is natively associated with another e-switch owner. Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16IB/mlx5: Add support for MPLS flow specificationAriel Levkovich
This patch introduces support for the MPLS flow spec and allows the creation of rules that are matching on the MPLS label. Applying the rule matching depends on the flow specs order and the location of the MPLS in the spec list as there are different configurations to be made in the device in the cases of MPLSoGRE and MPLSoUDP vs. non-encapsulated MPLS. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-26net/mlx5: Properly deal with flow counters when deleting rulesChris Mi
When deleting a flow counter, the modify mask should be the action and the flow counter. Otherwise the flow counter is not deleted and we'll get a firmware warning when deleting the remaining destinations on the same FTE. It only happens in the presence of flow counter and multiple vport destinations. If there is only one vport destination, there is no need to update the FTE when deleting the only vport destination, we just delete the FTE. Fixes: ae05831424ed ("net/mlx5: Add option to add fwd rule with counter") Signed-off-by: Chris Mi <chrism@mellanox.com> Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-04-26net/mlx5: Avoid cleaning flow steering table twice during error flowTalat Batheesh
When we fail to initialize the RX root namespace, we need to clean only that and not the entire flow steering. Currently the code may try to clean the flow steering twice on error witch leads to null pointer deference. Make sure we clean correctly. Fixes: fba53f7b5719 ("net/mlx5: Introduce mlx5_flow_steering structure") Signed-off-by: Talat Batheesh <talatb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-26net/mlx5: Add core support for vlan push/pop steering actionOr Gerlitz
Newer NICs (ConnectX-5 and onward) can apply vlan pop or push as an action taking place during flow steering. Add the core bits for that. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-20mlx5: Remove call to ida_pre_getMatthew Wilcox
The mlx5 driver calls ida_pre_get() in a loop for no readily apparent reason. The driver uses ida_simple_get() which will call ida_pre_get() by itself and there's no need to use ida_pre_get() unless using ida_get_new(). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08Merge tag 'mlx5-updates-2018-02-28-2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-02-28-2 (IPSec-2) This series follows our previous one to lay out the foundations for IPSec in user-space and extend current kernel netdev IPSec support. As noted in our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)", the IPSec mechanism will be supported through our flow steering mechanism. Therefore, we need to change the initialization order. Furthermore, IPsec is also supported in both egress and ingress. Since our current flow steering is egress only, we add an empty (only implemented through FPGA steering ops) egress namespace to handle that case. We also implement the required flow steering callbacks and logic in our FPGA driver. We extend the FPGA support for ESN and modifying a xfrm too. Therefore, we add support for some new FPGA command interface that supports them. The other required bits are added too. The new features and requirements are advertised via cap bits. Last but not least, we revise our driver's accel_esp API. This API will be shared between our netdev and IB driver, so we need to have all the required functionality from both worlds. Regards, Aviad and Matan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07net/mlx5: Add flow-steering commands for FPGA IPSec implementationAviad Yehezkel
In order to add a context to the FPGA, we need to get both the software transform context (which includes the keys, etc) and the source/destination IPs (which are included in the steering rule). Therefore, we register new set of firmware like commands for the FPGA. Each time a rule is added, the steering core infrastructure calls the FPGA command layer. If the rule is intended for the FPGA, it combines the IPs information with the software transformation context and creates the respective hardware transform. Afterwards, it calls the standard steering command layer. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec capsSaeed Mahameed
Fix build break of mlx5_accel_ipsec_device_caps is not defined when MLX5_ACCEL is not selected, use MLX5_IPSEC_DEV instead which handles such case. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reported-by: Doug Ledford <dledford@redhat.com>
2018-03-07Merge tag 'mlx5-updates-2018-02-28-1' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-02-28-1 (IPSec-1) This series consists of some fixes and refactors for the mlx5 drivers, especially around the FPGA and flow steering. Most of them are trivial fixes and are the foundation of allowing IPSec acceleration from user-space. We use flow steering abstraction in order to accelerate IPSec packets. When a user creates a steering rule, [s]he states that we'll carry an encrypt/decrypt flow action (using a specific configuration) for every packet which conforms to a certain match. Since currently offloading these packets is done via FPGA, we'll add another set of flow steering ops. These ops will execute the required FPGA commands and then call the standard steering ops. In order to achieve this, we need that the commands will get all the required information. Therefore, we pass the fte object and embed the flow_action struct inside the fte. In addition, we add the shim layer that will later be used for alternating between the standard and the FPGA steering commands. Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout" are very relevant for user-space applications, as these applications could be killed, but we still want to wait for the FPGA and update the kernel's database. Regards, Aviad and Matan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06net/mlx5: Flow steering cmd interface should get the fte when deletingAviad Yehezkel
Previously, deleting a flow steering entry only got the index. Since the FPGA implementation of FTE's deletion might need to dig inside the FTE itself, we would like to get the FTE's context. Changing the interface to pass the FTE context. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06net/mlx5: Embed mlx5_flow_act into fs_fteMatan Barak
fte objects contain the match value and action. Currently, extending the actions require in adding them both to the API and fs_fte. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06net/mlx5: Add empty egress namespace to flow steering coreAviad Yehezkel
Currently, we don't support egress flow steering namespace in mlx5 flow steering core implementation. However, when we want to encrypt a packet, we model it as a flow steering rule in the egress path. To overcome this, we add an empty egress namespace to flow steering. This namespace is initialized only when ipsec support exists. In the future, this will grow to a full blown full steering implementation, resembling the ingress path. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06net/mlx5: Add shim layer between fs and cmdMatan Barak
The shim layer allows each namespace to define possibly different functionality for add/delete/update commands. The shim layer introduced here, will be used to support flow steering with the FPGA. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06{net,IB}/mlx5: Add has_tag to mlx5_flow_actMatan Barak
The has_tag member will indicate whether a tag action was specified in flow specification. A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0 means that the user doesn't care about flow_tag. HW always provide a flow_tag = 0 if all flow tags requested on a specific flow are 0. So we need a way (in the driver) to differentiate between a user really requesting flow_tag = 0 and a user who does not care, in order to be able to report conflicting flow tags on a specific flow. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5: Fix error handling when adding flow rulesVlad Buslov
If building match list or adding existing fg fails when node is locked, function returned without unlocking it. This happened if node version changed or adding existing fg returned with EAGAIN after jumping to search_again_locked label. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5: Add header re-write to the checks for conflicting actionsOr Gerlitz
We can't allow only some of the rules sharing an FTE to ask for header re-write, add it to the conflicting action checks. Fixes: 0d235c3fabb7 ('net/mlx5: Add hash table to search FTEs in a flow-group') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19net/mlx5e: Enlarge the NIC TC offload steering prio to support two levelsOr Gerlitz
This will allow to be able and set TC rule whose steering dest is RSS TTC steering table. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28Merge tag 'mlx5-shared-4.16-1' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== Mellanox, mlx5 E-Switch updates 2017-12-19 This series includes updates for mlx5 E-Switch infrastructures, to be merged into net-next and rdma-next trees. Mark's patches provide E-Switch refactoring that generalize the mlx5 E-Switch vf representors interfaces and data structures. The serious is mainly focused on moving ethernet (netdev) specific representors logic out of E-Switch (eswitch.c) into mlx5e representor module (en_rep.c), which provides better separation and allows future support for other types of vf representors (e.g. RDMA). Gal's patches at the end of this serious, provide a simple syntax fix and two other patches that handles vport ingress/egress ACL steering name spaces to be aligned with the Firmware/Hardware specs. V1->V2: - Addressed coding style comments in patches #1 and #7 - The series is still based on rc4, as now I see net-next is also @rc4. V2->V3: - Fixed compilation warning, reported by Dave. Please pull and let me know if there's any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-29net/mlx5: Separate ingress/egress namespaces for each vportGal Pressman
Each vport has its own root flow table for the ACL flow tables and root flow table is per namespace, therefore we should create a namespace for each vport. Fixes: efdc810ba39d ("net/mlx5: Flow steering, Add vport ACL support") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29net/mlx5: Fix ingress/egress naming mistakeGal Pressman
The functions names do not represent their actions, switch the mistaken ingress/egress naming. Fixes: fba53f7b5719 ("net/mlx5: Introduce mlx5_flow_steering structure") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Fix steering memory leakMaor Gottlieb
Flow steering priority and namespace are software only objects that didn't have the proper destructors and were not freed during steering cleanup. Fix it by adding destructor functions for these objects. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04net/mlx5: Initialize destination_flow struct to 0Rabie Loulou
This is needed in order to enlarge it with more members that will get value of 0 when not set. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-22drivers, net, mlx5: convert fs_node.refcount from atomic_t to refcount_tElena Reshetova
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable fs_node.refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16Merge tag 'mlx5-updates-2017-10-11' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2017-10-11: IPoIB Multi Pkey support This series provides the support for IPoIB Multi Pkey. InfiniBand Pkeys are the equivalent of Ethernet vlans. Currently IPoIB device driver supports only default Pkey and IPoIB Pkey child interfaces are not supported with IPoIB offloads mode, this series will add the support for that by allowing creating mlx5 multiple IPoIB netdevices with a non-default Pkey. mlx5 IPoIB Pkey child interface is smaller version of mlx5i IPoIB interfaces and shares most of its resources with the parent IPoIB interface, namely RX steering and ring queue resources. The only mlx5 resources a child Pkey interface will be creating are the TX rings, since they should be assigned to a specific Pkey. mlx5i Pkey netdev is implemented via new mlx5e netdev profile implemented in mlx5/core/ipoib/ipoib_vlan.c. The series starts with a refactoring of mlx5e PTP and mlx5 clock implementation to move the code to be part of mlx5 core rather than mlx5e netdevice, in order to make mlx5 clock and PTP registration part of the core to be shared with mlx5e master Ethernet netdev/IPoIB parent netdev and mlx5_ib in the near future. Add the support for attaching multiple underlay QPs for the different Pkeys in mlx5 core RX steering. Add Pkey index to rdma_netdev to add the ability to set PKEY index to lower IPoIB offload netdev. Use hash-table to map between DQPN (Destination QP number) to child netdev for the IPoIB parent netdev to forward RX packets to the corresponding child Pkey netdev, since the RX rings are shared. The reset of the series adds the ipoib child Pkey: mlx5e netdev profile, netdev nods implementation and minimal set of ethtool callbacks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14net/mlx5: Support for attaching multiple underlay QPs to root flow tableAlex Vesker
Previous support allowed connecting only a single QPN to the FT. Now using a linked list multiple QPNs can be attached to the same FT. Supporting attaching multiple underlay QPs is required for PKEY support in which child and parent share the same FT. The actual attaching/detaching FW commands will be called inside the function symmetrically. This change requires a change in IPoIB open and close functions, the attaching/detaching to/from the FT is done each time we open/close. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-09-26net/mlx5: Add FGs and FTEs memory poolMaor Gottlieb
Add memory pool allocation for flow groups and flow table entry. It is useful because these objects are not small and could be allocated/deallocated many times. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Allocate FTE object without lockMaor Gottlieb
Allocation of new FTE is a massive operation, part of it could be done without taking the flow group write lock. Split the FTE allocation to two functions of actions which need to be under lock and action which don't have. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Support multiple updates of steering rules in parallelMaor Gottlieb
Most of the time spent on adding new flow steering rule is executing the firmware command. The most common action is adding a new flow steering entry. In order to enhance the update rate we parallelize the commands by doing the following: 1) Replace the mutex lock with readers-writers semaphore and take the write lock only when necessary (e.g. allocating a new flow table entry index or adding a node to the parent's children list). When we try to find a suitable child in the parent's children list (e.g. search for flow group with the same match_criteria of the rule) then we only take the read lock. 2) Add versioning mechanism - each steering entity (FT, FG, FTE, DST) will have an incremental version. The version is increased when the entity is changed (e.g. when a new FTE was added to FG - the FG's version is increased). Versioning is used in order to determine if the last traverse of an entity's children is valid or a rescan under write lock is required. This support improves the insertion rate of steering rules from ~5k/sec to ~40k/sec. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Replace fs_node mutex with reader/writer semaphoreMaor Gottlieb
Currently, steering object is protected by mutex lock, replace the mutex lock with reader/writer semaphore . In this patch we still use only write semaphore. In downstream patches we will switch part of the write locks to read locks. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Refactor FTE and FG creation codeMaor Gottlieb
Split the creation code to two parts: 1) Object allocation - allocate the steering node and initialize its resources. 2) The firmware command execution. Adding active flag to each node - this flag indicates if the object exists in the hardware or not, if not we don't free the hardware resource in error flow. This change will give us the ability to take write lock on the parent node (e.g. FG for FTE creationg) only on the first part. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Export building of matched flow groups listMaor Gottlieb
Refactor the code and export the build of the matched flow groups list to separate function. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Move the entry index allocator to flow groupMaor Gottlieb
When new flow table entry is added, we search for free index in the flow group and not in the flow table, therefore we can move the allocator from flow table to flow group. In downstream patches it will enable us to lock smaller part of the steering tree. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Avoid NULL pointer dereference on steering cleanupMaor Gottlieb
On cleanup, when the node is the last child of parent then it calls to tree_put_node on the parent, if the parent's reference count is decremented to 0 (for e.g. when deleting last destination of FTE) then we free the parent as well and vice versa. In such a case we will try to free the parent node again. Increment the parent reference count before cleaning it's children will prevent implicit release of the parent object. Fixes: 0da2d66666d3 ('net/mlx5: Properly remove all steering objects') signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26net/mlx5: Fix creating a new FTE when an existing but full FTE existsMatan Barak
Currently, when a flow steering rule is added, we look for a FTE with an identical value. If we find a match, we try to merge the required destinations with the existing ones. In a case where the existing destination list is full, the code should return an error to its consumer. However, the current code just tries to create another FTE. Fixing that by returning an error in this special scenario. Fixes: f478be79a22e ("net/mlx5: Add hash table for flow groups in flow table") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-31net/mlx5e: Support RSS for GRE tunneled packetsGal Pressman
Introduce a new flow table and indirect TIRs which are used to hash the inner packet headers of GRE tunneled packets. When a GRE tunneled packet is received, the TTC flow table will match the new IPv4/6->GRE rules which will forward it to the inner TTC table. The inner TTC is similar to its counterpart outer TTC table, but matching the inner packet headers instead of the outer ones (and does not include the new IPv4/6->GRE rules). The new rules will not add steering hops since they are added to an already existing flow group which will be matched regardless of this patch. Non GRE traffic will not be affected. The inner flow table will forward the packet to inner indirect TIRs which hash the inner packet and thus result in RSS for the tunneled packets. Testing 8 TCP streams bandwidth over GRE: System: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] Before: 21.3 Gbps (Single RQ) Now : 90.5 Gbps (RSS spread on 8 RQs) Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>