aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core
AgeCommit message (Collapse)Author
2016-08-22IB/core: Fix possible memory leak in cma_resolve_iboe_route()Wei Yongjun
'work' and 'route->path_rec' are malloced in cma_resolve_iboe_route() and should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Fixes: 200298326b27 ('IB/core: Validate route when we init ah') Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge tag 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford
2016-08-04dma-mapping: use unsigned long for dma_attrsKrzysztof Kozlowski
The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs *attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs *attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03IB/core: Support for CMA multicast join flagsAlex Vesker
Added UCMA and CMA support for multicast join flags. Flags are passed using UCMA CM join command previously reserved fields. Currently supporting two join flags indicating two different multicast JoinStates: 1. Full Member: The initiator creates the Multicast group(MCG) if it wasn't previously created, can send Multicast messages to the group and receive messages from the MCG. 2. Send Only Full Member: The initiator creates the Multicast group(MCG) if it wasn't previously created, can send Multicast messages to the group but doesn't receive any messages from the MCG. IB: Send Only Full Member requires a query of ClassPortInfo to determine if SM/SA supports this option. If SM/SA doesn't support Send-Only there will be no join request sent and an error will be returned. ETH: When Send Only Full Member is requested no IGMP join will be sent. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03IB/sa: Add cached attribute containing SM information to SA portAlex Vesker
Added a new SA port attribute containing SM ClassPortInfo fields, (ClassPortInfo fields: Table 126 IB Spec 1.3.). This is useful for checking SM support for specific features. The attribute is cached to avoid resending queries, caching is done when a successful ClassPortInfo reply is received on the port. Invalidation of the attribute is done on SM change events, SM re-registration events, and SM LID change events. The fields in ClassPortInfo should not change during SM runtime without an event. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03IB/uverbs: Fix race between uverbs_close and remove_oneJason Gunthorpe
Fixes an oops that might happen if uverbs_close races with remove_one. Both contexts may run ib_uverbs_cleanup_ucontext, it depends on the flow. Currently, there is no protection for a case that remove_one didn't make the cleanup it runs to its end, the underlying ib_device was freed then uverbs_close will call ib_uverbs_cleanup_ucontext and OOPs. Above might happen if uverbs_close deleted the file from the list then remove_one didn't find it and runs to its end. Fixes to protect against that case by a new cleanup lock so that ib_uverbs_cleanup_ucontext will be called always before that remove_one is ended. Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one") Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03Use smaller 512 byte messages for portmapper messagesMustafa Ismail
Portmapper messages are short and do not occupy more than 512 bytes. Lower portmapper message size to 512 bytes. This change significantly reduces the amount of memory needed when trying to establish a large number of connections simultaneously. The old value is based on page size. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03Merge branches 'cxgb4' and 'mlx5' into k.o/for-4.8Doug Ledford
2016-08-02iw_cm: free cm_id resources on the last derefSteve Wise
Remove the complicated logic to free the iw_cm_id inside iw_cm event handlers vs when an application thread destroys the cm_id. Also remove the block in iw_destroy_cm_id() to block the application until all references are removed. This block can cause a deadlock when disconnecting or destroying cm_ids inside an rdma_cm event handler. Simply allowing the last deref of the iw_cm_id to free the memory is cleaner and avoids this potential deadlock. Also a flag is added, IW_CM_DROP_EVENTS, that is set when the cm_id is marked for destruction. If any events are pending on this iw_cm_id, then as they are processed they will be dropped vs posted upstream if IW_CM_DROP_EVENTS is set. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/core: Add flow control to the portmapper netlink callsMustafa Ismail
During connection establishment with a large number of connections, it is possible that the connection requests might fail. Adding flow control prevents this failure. Change ibnl_unicast to use blocking to enable flow control. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/core, RDMA RW API: Do not exceed QP SGE send limitBart Van Assche
Compute the SGE limit for RDMA READ and WRITE requests in ib_create_qp(). Use that limit in the RDMA RW API implementation. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> #v4.7+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/core: Make rdma_rw_ctx_init() initialize all used fieldsBart Van Assche
Some but not all callers of rdma_rw_ctx_init() zero-initialize struct rdma_rw_ctx. Hence make rdma_rw_ctx_init() initialize all work request fields that will be read by ib_post_send(). Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: <stable@vger.kernel.org> #v4.7+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-07-12IB core: Add port_xmit_wait counterChristoph Lameter
Add the missing port_xmit_wait counter. This counter is displayed through some tools like perfquery but is not available via sysfs. For the PORT_PMA_ATTR macro the _counter field is set to zero allowing us to specify the offset directly like with PORT_PMA_ATTR_EXT See also the earlier work in 2008 by Vladimir Skolovsky https://www.mail-archive.com/general@lists.openfabrics.org/msg20313.html Signed-off-by: Vladimir Sokolvsky <vlad@mellanox.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23Merge branches 'cxgb4-4.8', 'mlx5-4.8' and 'fw-version' into k.o/for-4.8Doug Ledford
2016-06-23Merge branches '4.7-rc-misc', 'hfi1-fixes', 'i40iw-rc-fixes' and ↵Doug Ledford
'mellanox-rc-fixes' into k.o/for-4.7-rc
2016-06-23IB/core: Export a common fw_ver sysfs entryIra Weiny
Now that all the devices have stopped exporting their own sysfs entry points we can have the core export this on their behalf. Eventually this may be removed but this provides for backwards compatibility. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Add get FW version string to the coreIra Weiny
Allow for a common core function to get firmware version strings from the individual devices. In later patches this format can then then be used to pass a properly formated version string through the IPoIB layer. The problem with the current code in the IPoIB layer is that it is specific to certain hardware types. Furthermore, this gives us a common function through which the core can provide a common sysfs entry. Eventually we may want to remove the sysfs export but this provides for user space backwards compatibility. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Add IPv6 support to flow steeringMaor Gottlieb
Add IPv6 flow specification support. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Extend create QP to get RWQ indirection tableYishai Hadas
User applications that want to spread incoming traffic between several WQs should create a QP which contains an indirection table. When such a QP is created other receive side parameters are not valid and should not be given. Its send side is optional and assumed active based on max_send_wr capability value. Extend create QP to work accordingly. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Extend create QP to get indirection tableYishai Hadas
Extend create QP to get Receive Work Queue (WQ) indirection table. QP can be created with external Receive Work Queue indirection table, in that case it is ready to receive immediately. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Introduce RWQ Indirection tableYishai Hadas
User applications that want to spread traffic on several WQs, need to create an indirection table, by using already created WQs. Adding uverbs API in order to create and destroy this table. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Introduce Receive Work Queue indirection tableYishai Hadas
Introduce Receive Work Queue (WQ) indirection table. This object can be used to spread incoming traffic to different receive Work Queues. A Receive WQ indirection table points to variable size of WQs. This table is given to a QP in downstream patches. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimerg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Add WQ supportYishai Hadas
User space applications which use RSS functionality need to create a work queue object (WQ). The lifetime of such an object is: * Create a WQ * Modify the WQ from reset to init state. * Use the WQ (by downstream patches). * Destroy the WQ. These commands are added to the uverbs API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Introduce Work Queue object and its verbsYishai Hadas
Introduce Work Queue object and its create/destroy/modify verbs. QP can be created without internal WQs "packaged" inside it, this QP can be configured to use "external" WQ object as its receive/send queue. WQ is a necessary component for RSS technology since RSS mechanism is supposed to distribute the traffic between multiple Receive Work Queues. WQ associated (many to one) with Completion Queue and it owns WQ properties (PD, WQ size, etc.). WQ has a type, this patch introduces the IB_WQT_RQ (i.e.receive queue), it may be extend to others such as IB_WQT_SQ. (send queue). WQ from type IB_WQT_RQ contains receive work requests. PD is an attribute of a work queue (i.e. send/receive queue), it's used by the hardware for security validation before scattering to a memory region which is pointed by the WQ. For that, an external WQ object needs a PD, letting the hardware makes that validation. When accessing a memory region that is pointed by the WQ its PD is used and not the QP's PD, this behavior is similar to a SRQ and a QP. WQ context is subject to a well-defined state transitions done by the modify_wq verb. When WQ is created its initial state becomes IB_WQS_RESET. >From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY. >From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET or to IB_WQS_ERR. >From IB_WQS_ERR it can be modified to IB_WQS_RESET. Note: transition to IB_WQS_ERR might occur implicitly in case there was some HW error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Initialize ib_qp_init_attr with zerosMaor Gottlieb
Initialize ib_qp_init_attr with zeros in order to avoid from garbage in fields that won't be set with user values. Fixes: a060b5629ab06 ('IB/core: generic RDMA READ/WRITE API') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUIDEli Cohen
When virtualziation is supported, VFs may send SA MADs to a GID formed by the concatenation of the subnet prefix with the IB_SA_WELL_KNOWN_GUID. When a response is required, the current code will search the local HCA's port for the received GID to figure out the GID index of the entry containing this GID. However, since this is not a real GID it will not be found and error will be printed. We change the logic to check if the destination GID is this special GID and avoid lookup in this case and use GID index 0. Fixes: a0c1b2a35087 ('IB/core: Support accessing SA in virtualized environment') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix RoCE v1 multicast join logic issueAlex Vesker
During multicast join of RoCEv1, IGMP join state and max hop limit were updated incorrectly. IGMP join should be sent and marked as joined only on RoCEv2 after a successful join. Max hops should be updated to the hop limit on RoCEv2 regardless of the join state. Fixes: bee3c3c91865 ('IB/cma: Join and leave multicast groups...') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix no default GIDs when netdevice reregistersTalat Batheesh
Currently, when the netdevice returned by get_netdev is unregistered, we delete all GIDs (including the default GIDs) and reset their attributes. Therefore, when we re-register it, no default GIDs will be assigned (as their "default GID") attribute will be reset. Fixing this by keeping "default GID" attribute. Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17IB/cma: Make the code easier to verifyBart Van Assche
Static source code analysis tools like smatch cannot handle functions that lock or not lock a mutex depending on the value of the arguments. Hence inline the function cma_disable_callback(). Additionally, this patch realizes a small performance optimization by reducing the number of mutex_lock() and mutex_unlock() calls in the modified functions. With this patch applied smatch no longer complains about source file cma.c. Without this patch smatch reports the following for this source file: drivers/infiniband/core/cma.c:1959: cma_req_handler() warn: inconsistent returns 'mutex:&listen_id->handler_mutex'. Locked on: line 1880 line 1959 Unlocked on: line 1941 drivers/infiniband/core/cma.c:2112: iw_conn_req_handler() warn: inconsistent returns 'mutex:&listen_id->handler_mutex'. Locked on: line 2048 Unlocked on: line 2112 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Leon Romanovsky <leonro@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: Initialize sysfs attributes before sysfs create groupMark Bloch
For dynamically allocated sysfs attributes there is a need to call sysfs_attr_init in order to comply with lockdep, not calling it will result in error complaining key is not in .data section. Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: Fix removal of default GID cache entryAviv Heller
When deleting a default GID from the cache, its gid_type field is set to 0. This could set the gid_type to RoCE v1 for a RoCE v2 default GID, essentially making it inaccessible to future modifications, since it is no longer found by find_gid(). This fix preserves the gid_type value for default gids during cache operations. Fixes: b39ffa1df505 ('IB/core: Add gid_type to gid attribute') Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: Fix query port failure in RoCEEli Cohen
Currently ib_query_port always attempts to to read the subnet prefix by calling ib_query_gid(). For RoCE/iWARP there is no subnet manager and no subnet prefix. Fix this by querying GID[0] only for IB networks. Fixes: fad61ad4e755 ('IB/core: Add subnet prefix to port info') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: fix error unwind in sysfs hw counters codeDoug Ledford
Between the initial and final versions of the function setup_hw_stats, the order of variable initialization was changed. However, the unwind flow on error did not properly keep up with the flow changes. Make the unwind flow match a proper unwind of the allocation flow, then remove no longer needed variable initializations. Fixes: b40f4757daa1 (IB/core: Make device counter infrastructure dynamic) Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/core: Fix array length allocationDoug Ledford
The new sysfs hw_counters code had an off by one in its array allocation length. Fix that and the comment along with it. Reported-by: Mark Bloch <markb@mellanox.com> Fixes: b40f4757daa1 (IB/core: Make device counter infrastructure dynamic) Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/mad: Fix indentationBart Van Assche
Make indentation consistent. Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hal Rosenstock <hal@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-By: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06RDMA/core: Fix indentationBart Van Assche
Make indentation consistent. Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Cc: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@gimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/core: fix null pointer deref and mem leak in error handlingColin Ian King
The current error handling in setup_hw_stats has a couple of issues. It is possible to generate a null pointer deference on the kfree of hsag->attrs[i] because two of the early error exit paths jump to the kfree when hsags NULL and not allocated. Fix this by moving the kfree on stats and jumping to that, avoiding the hsag freeing. Secondly, there is a memory leak of stats if the hsag allocation fails; instead of returning, jump to the kfree on stats. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/core: fix an error code in ib_core_init()Dan Carpenter
We should return the error code if ib_add_ibnl_clients() fails. The current code returns success. Fixes: 735c631ae99d ('IB/core: Register SA ibnl client during ib_core initialization') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/cm: Fix a recently introduced locking bugBart Van Assche
ib_cm_notify() can be called from interrupt context. Hence do not reenable interrupts unconditionally in cm_establish(). This patch avoids that lockdep reports the following warning: WARNING: CPU: 0 PID: 23317 at kernel/locking/lockdep.c:2624 trace _hardirqs_on_caller+0x112/0x1b0 DEBUG_LOCKS_WARN_ON(current->hardirq_context) Call Trace: <IRQ> [<ffffffff812bd0e5>] dump_stack+0x67/0x92 [<ffffffff81056f21>] __warn+0xc1/0xe0 [<ffffffff81056f8a>] warn_slowpath_fmt+0x4a/0x50 [<ffffffff810a5932>] trace_hardirqs_on_caller+0x112/0x1b0 [<ffffffff810a59dd>] trace_hardirqs_on+0xd/0x10 [<ffffffff815992c7>] _raw_spin_unlock_irq+0x27/0x40 [<ffffffffa0382e9c>] ib_cm_notify+0x25c/0x290 [ib_cm] [<ffffffffa068fbc1>] srpt_qp_event+0xa1/0xf0 [ib_srpt] [<ffffffffa04efb97>] mlx4_ib_qp_event+0x67/0xd0 [mlx4_ib] [<ffffffffa034ec0a>] mlx4_qp_event+0x5a/0xc0 [mlx4_core] [<ffffffffa03365f8>] mlx4_eq_int+0x3d8/0xcf0 [mlx4_core] [<ffffffffa0336f9c>] mlx4_msi_x_interrupt+0xc/0x20 [mlx4_core] [<ffffffff810b0914>] handle_irq_event_percpu+0x64/0x100 [<ffffffff810b09e4>] handle_irq_event+0x34/0x60 [<ffffffff810b3a6a>] handle_edge_irq+0x6a/0x150 [<ffffffff8101ad05>] handle_irq+0x15/0x20 [<ffffffff8101a66c>] do_IRQ+0x5c/0x110 [<ffffffff8159a2c9>] common_interrupt+0x89/0x89 [<ffffffff81297a17>] blk_run_queue_async+0x37/0x40 [<ffffffffa0163e53>] rq_completed+0x43/0x70 [dm_mod] [<ffffffffa0164896>] dm_softirq_done+0x176/0x280 [dm_mod] [<ffffffff812a26c2>] blk_done_softirq+0x52/0x90 [<ffffffff8105bc1f>] __do_softirq+0x10f/0x230 [<ffffffff8105bec8>] irq_exit+0xa8/0xb0 [<ffffffff8103653e>] smp_trace_call_function_single_interrupt+0x2e/0x30 [<ffffffff81036549>] smp_call_function_single_interrupt+0x9/0x10 [<ffffffff8159a959>] call_function_single_interrupt+0x89/0x90 <EOI> Fixes: commit be4b499323bf (IB/cm: Do not queue work to a device that's going away) Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Nikolay Borisov <kernel@kyup.com> Cc: stable <stable@vger.kernel.org> # v4.2+ Acked-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/core: Make device counter infrastructure dynamicChristoph Lameter
In practice, each RDMA device has a unique set of counters that the hardware implements. Having a central set of counters that they must all adhere to is limiting and causes many useful counters to not be available. Therefore we create a dynamic counter registration infrastructure. The driver must implement a stats structure allocation routine, in which the driver must place the directory name it wants, a list of names for all of the counters, an array of u64 counters themselves, plus a few generic configuration options. We then implement a core routine to create a sysfs file for each of the named stats elements, and a core routine to retrieve the stats when any of the sysfs attribute files are read. To avoid excessive beating on the stats generation routine in the drivers, the core code also caches the stats for a short period of time so that someone attempting to read all of the stats in a given device's directory will not result in a stats generation call per file read. Future work will attempt to standardize just the shared stats elements, and possibly add a method to get the stats via netlink in addition to sysfs. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [ Add caching, make structure names more informative, add i40iw support, other significant rewrites from the original patch ]
2016-05-26Merge branches 'misc-4.7-2', 'ipoib' and 'ib-router' into k.o/for-4.7Doug Ledford
2016-05-25IB/core: Support new type of join-state for multicastErez Shitrit
There are four types for MCG, FullMember, NonMember, SendOnlyNonMember, and the new added type: SendOnlyFullMember. Add support for the new SendOnlyFullMember join state. The new type allows host to send join request as sendonly, it will cause the group to be created but without getting packets from this multicast back to the host. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-25IB/SA Agent: Add support for SA agent get ClassPortInfoErez Shitrit
New SA query function to return the ClassPortInfo struct from the SA. If the SM supports FullMemberSendOnly mode for MCG's, it sets a capability bit in the capability_mask2 field of the response. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/core: Add IP to GID netlink offloadMark Bloch
There is an assumption that rdmacm is used only between nodes in the same IB subnet, this why ARP resolution can be used to turn IP to GID in rdmacm. When dealing with IB communication between subnets this assumption is no longer valid. ARP resolution will get us the next hop device address and not the peer node's device address. To solve this issue, we will check user space if it can provide the GID of the peer node, and fail if not. We add a sequence number to identify each request and fill in the GID upon answer from userspace. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/core: Register SA ibnl client during ib_core initializationMark Bloch
Move SA ibnl client registration to ib_core module init. This will allow us to register a single client to handle all RDMA_NL_LS operations and make it SA independent. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/SA: Integrate ib_sa module into ib_core moduleMark Bloch
Consolidate ib_sa into ib_core, this commit eliminates ib_sa.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/MAD: Integrate ib_mad module into ib_core moduleMark Bloch
Consolidate ib_mad into ib_core, this commit eliminates ib_mad.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/core: Integrate IB address resolution module into coreLeon Romanovsky
IB address resolution is declared as a module (ib_addr.ko) which loads itself before IB core module (ib_core.ko). It causes to the scenario where IB netlink which is initialized by IB core can't be used by ib_addr.ko. In order to solve it, we are converting ib_addr.ko to be part of IB core module. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>