aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/hfi1/pio.c
AgeCommit message (Collapse)Author
2024-03-01IB/hfi1: Fix a memleak in init_credit_returnZhipeng Lu
[ Upstream commit 809aa64ebff51eb170ee31a95f83b2d21efa32e2 ] When dma_alloc_coherent fails to allocate dd->cr_base[i].va, init_credit_return should deallocate dd->cr_base and dd->cr_base[i] that allocated before. Or those resources would be never freed and a memleak is triggered. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Link: https://lore.kernel.org/r/20240112085523.3731720-1-alexious@zju.edu.cn Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-10IB/hfi1: Correctly move list in sc_disable()Dean Luick
[ Upstream commit 1afac08b39d85437187bb2a92d89a741b1078f55 ] Commit 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") incorrectly tries to move a list from one list head to another. The result is a kernel crash. The crash is triggered when a link goes down and there are waiters for a send to complete. The following signature is seen: BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] Call Trace: sc_disable+0x1ba/0x240 [hfi1] pio_freeze+0x3d/0x60 [hfi1] handle_freeze+0x27/0x1b0 [hfi1] process_one_work+0x1b0/0x380 ? process_one_work+0x380/0x380 worker_thread+0x30/0x360 ? process_one_work+0x380/0x380 kthread+0xd7/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 The fix is to use the correct call to move the list. Fixes: 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/166610327042.674422.6146908799669288976.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-02IB/hfi1: Fix abba locking issue with sc_disable()Mike Marciniszyn
commit 13bac861952a78664907a0f927d3e874e9a59034 upstream. sc_disable() after having disabled the send context wakes up any waiters by calling hfi1_qp_wakeup() while holding the waitlock for the sc. This is contrary to the model for all other calls to hfi1_qp_wakeup() where the waitlock is dropped and a local is used to drive calls to hfi1_qp_wakeup(). Fix by moving the sc->piowait into a local list and driving the wakeup calls from the list. Fixes: 099a884ba4c0 ("IB/hfi1: Handle wakeup of orphaned QPs for pio") Link: https://lore.kernel.org/r/20211013141852.128104.2682.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-17IB/hfi1: Handle port down properly in pioMike Marciniszyn
The call to sc_buffer_alloc currently returns NULL (no buffer) or a buffer descriptor. There is a third case when the port is down. Currently that returns NULL and this prevents the caller from properly handling the sc_buffer_alloc() failure. A verbs code link test after the call is racy so the indication needs to come from the state check inside the allocation routine to be valid. Fix by encoding the ECOMM failure like SDMA. IS_ERR_OR_NULL() tests are added at all call sites. For verbs send, this needs to treat any error by returning a completion without any MMIO copy. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17IB/hfi1: Handle wakeup of orphaned QPs for pioMike Marciniszyn
Once a send context is taken down due to a link failure, any QPs waiting for pio credits will stay on the waitlist indefinitely. Fix by wakeing up all QPs linked to piowait list. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-05-29IB/hfi1: Remove extra brackets from an ifDennis Dalessandro
A recent patch to hfi1 left behind a checkpatch error. Fixes: fb24ea52f78e ("drivers: Remove explicit invocations of mmiowb()") Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08drivers: Remove explicit invocations of mmiowb()Will Deacon
mmiowb() is now implied by spin_unlock() on architectures that require it, so there is no reason to call it from driver code. This patch was generated using coccinelle: @mmiowb@ @@ - mmiowb(); and invoked as: $ for d in drivers include/linux/qed sound; do \ spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation. If you've ended up bisecting to this commit, you can reintroduce the mmiowb() calls using wmb() instead, which should restore the old behaviour on all architectures other than some esoteric ia64 systems. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-02-05IB/hfi1: Prioritize the sending of ACK packetsKaike Wan
ACK packets are generally associated with request completion and resource release and therefore should be sent first. This patch optimizes the send engine by using the following policies: (1) QPs with RVT_S_ACK_PENDING bit set in qp->s_flags or qpriv->s_flags should have their priority incremented; (2) QPs with ACK or TID-ACK packet queued should have their priority incremented; (3) When a QP is queued to the wait list due to resource constraints, it will be queued to the head if it has ACK packet to send; (4) When selecting qps to run from the wait list, the one with the highest priority and starve_cnt will be selected; each priority will be equivalent to a fixed number of starve_cnt (16). Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-01-08cross-tree: phase out dma_zalloc_coherent()Luis Chamberlain
We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-06IB/hfi1: Reduce lock contention on iowait_lock for sdma and pioMike Marciniszyn
Commit 4e045572e2c2 ("IB/hfi1: Add unique txwait_lock for txreq events") laid the ground work to support per resource waiting locking. This patch adds that with a lock unique to each sdma engine and pio sendcontext and makes necessary changes for verbs, PSM, and vnic to use the new locks. This is particularly beneficial for smaller messages that will exhaust resources at a faster rate. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Dump pio info for non-user send contextsKaike Wan
This patch dumps the pio info for non-user send contexts to assist debugging in the field. Reviewed-by: Mike Marciniczyn <mike.marciniszyn@intel.com> Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16Merge branch 'for-rc' into rdma.git for-nextJason Gunthorpe
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git This is required to resolve dependencies of the next series of RDMA patches. The code motion conflicts in drivers/infiniband/core/cache.c were resolved. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-26IB/hfi1: Move UnsupportedVL bits definitions to the correct headerMichael J. Ruhl
The UnsupportedVL SendCtrl register bit information is defined in the module rather than the chip register header file. Move the defines to the appropriate header file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20IB/hfi1: Fix destroy_qp hang after a link downMichael J. Ruhl
rvt_destroy_qp() cannot complete until all in process packets have been released from the underlying hardware. If a link down event occurs, an application can hang with a kernel stack similar to: cat /proc/<app PID>/stack quiesce_qp+0x178/0x250 [hfi1] rvt_reset_qp+0x23d/0x400 [rdmavt] rvt_destroy_qp+0x69/0x210 [rdmavt] ib_destroy_qp+0xba/0x1c0 [ib_core] nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma] nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma] nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma] nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma] process_one_work+0x17a/0x440 worker_thread+0x126/0x3c0 kthread+0xcf/0xe0 ret_from_fork+0x58/0x90 0xffffffffffffffff quiesce_qp() waits until all outstanding packets have been freed. This wait should be momentary. During a link down event, the cleanup handling does not ensure that all packets caught by the link down are flushed properly. This is caused by the fact that the freeze path and the link down event is handled the same. This is not correct. The freeze path waits until the HFI is unfrozen and then restarts PIO. A link down is not a freeze event. The link down path cannot restart the PIO until link is restored. If the PIO path is restarted before the link comes up, the application (QP) using the PIO path will hang (until link is restored). Fix by separating the linkdown path from the freeze path and use the link down path for link down events. Close a race condition sc_disable() by acquiring both the progress and release locks. Close a race condition in sc_stop() by moving the setting of the flag bits under the alloc lock. Cc: <stable@vger.kernel.org> # 4.9.x+ Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20IB/hfi1: Fix context recovery when PBC has an UnsupportedVLMichael J. Ruhl
If a packet stream uses an UnsupportedVL (virtual lane), the send engine will not send the packet, and it will not indicate that an error has occurred. This will cause the packet stream to block. HFI has 8 virtual lanes available for packet streams. Each lane can be enabled or disabled using the UnsupportedVL mask. If a lane is disabled, adding a packet to the send context must be disallowed. The current mask for determining unsupported VLs defaults to 0 (allow all). This is incorrect. Only the VLs that are defined should be allowed. Determine which VLs are disabled (mtu == 0), and set the appropriate unsupported bit in the mask. The correct mask will allow the send engine to error on the invalid VL, and error recovery will work correctly. Cc: <stable@vger.kernel.org> # 4.9.x+ Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22IB/hfi1: Remove caches of chip CSRsMike Marciniszyn
Remove the sizeable cache of the chip sizing CSRs and replace with CSR reads as needed. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19IB/rdmavt, IB/hfi1: Create device dependent s_flagsMike Marciniszyn
Move some s_flags defines out of rdmavt and into hfi1 because they are hfi1 specific and therefore should remain in the driver instead of bubbling up to rdmavt. Document device specific ranges in rdmavt and remap those in hfi1. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-09IB/hfi1: Reorder incorrect send context disableMichael J. Ruhl
User send context integrity bits are cleared before the context is disabled. If the send context is still processing data, any packets that need those integrity bits will cause an error and halt the send context. During the disable handling, the driver waits for the context to drain. If the context is halted, the driver will eventually timeout because the context won't drain and then incorrectly bounce the link. Reorder the bit clearing and the context disable. Examine the software state and send context status as well as the egress status to determine if a send context is in the halted state. Promote the check macros to static functions for consistency with the new check and to follow kernel style. Remove an unused define that refers to the egress timeout. Cc: <stable@vger.kernel.org> # 4.9.x Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-01IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_nodeKamenee Arumugam
Kzalloc_node API doesn't check for overflows in size multiplication. While kcalloc API check for overflows in size multiplication but these implementations are not NUMA-aware. This conversion allowed for correcting an allocation used in the hot path to be on the local NUMA and ensure us overflow free multiplication for the size of a memory allocation. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-11-15Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is a fairly plain pull request. Lots of driver updates across the stack, a huge number of static analysis cleanups including a close to 50 patch series from Bart Van Assche, and a number of new features inside the stack such as general CQ moderation support. Nothing really stands out, but there might be a few conflicts as you take things in. In particular, the cleanups touched some of the same lines as the new timer_setup changes. Everything in this pull request has been through 0day and at least two days of linux-next (since Stephen doesn't necessarily flag new errors/warnings until day2). A few more items (about 30 patches) from Intel and Mellanox showed up on the list on Tuesday. I've excluded those from this pull request, and I'm sure some of them qualify as fixes suitable to send any time, but I still have to review them fully. If they contain mostly fixes and little or no new development, then I will probably send them through by the end of the week just to get them out of the way. There was a break in my acceptance of patches which coincides with the computer problems I had, and then when I got things mostly back under control I had a backlog of patches to process, which I did mostly last Friday and Monday. So there is a larger number of patches processed in that timeframe than I was striving for. Summary: - Add iWARP support to qedr driver - Lots of misc fixes across subsystem - Multiple update series to hns roce driver - Multiple update series to hfi1 driver - Updates to vnic driver - Add kref to wait struct in cxgb4 driver - Updates to i40iw driver - Mellanox shared pull request - timer_setup changes - massive cleanup series from Bart Van Assche - Two series of SRP/SRPT changes from Bart Van Assche - Core updates from Mellanox - i40iw updates - IPoIB updates - mlx5 updates - mlx4 updates - hns updates - bnxt_re fixes - PCI write padding support - Sparse/Smatch/warning cleanups/fixes - CQ moderation support - SRQ support in vmw_pvrdma" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits) RDMA/core: Rename kernel modify_cq to better describe its usage IB/mlx5: Add CQ moderation capability to query_device IB/mlx4: Add CQ moderation capability to query_device IB/uverbs: Add CQ moderation capability to query_device IB/mlx5: Exposing modify CQ callback to uverbs layer IB/mlx4: Exposing modify CQ callback to uverbs layer IB/uverbs: Allow CQ moderation with modify CQ iw_cxgb4: atomically flush the qp iw_cxgb4: only call the cq comp_handler when the cq is armed iw_cxgb4: Fix possible circular dependency locking warning RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion IB/core: Only maintain real QPs in the security lists IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey RDMA/core: Make function rdma_copy_addr return void RDMA/vmw_pvrdma: Add shared receive queue support RDMA/core: avoid uninitialized variable warning in create_udata RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP RDMA/bnxt_re: Set QP state in case of response completion errors RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries ...
2017-11-13IB/hfi1: Do not allocate PIO send contexts for VNICNiranjana Vishwanathapura
OPA VNIC does not use PIO contexts and instead only uses SDMA engines. Do not allocate PIO contexts for VNIC ports. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns ↵Mark Rutland
to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-31IB/hfi1: Create workqueue for link eventsSebastian Sanchez
Currently, link down interrupts queue link entries on a workqueue intended for sending events only. Create a workqueue for queuing link events. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Serve the most starved iowait entry firstKaike Wan
When an egress resource(SDMA descriptors, pio credits) is not available, a sending thread will be put on the resource's wait queue. When the resource becomes available again, up to a fixed number of sending threads can be awakened sequentially and removed from the wait queue, depending on the number of waiting threads and the number of free resources. Since each awakened sending thread will send as many packets as possible, it is highly likely that the first sending thread will consume all the egress resources. Subsequently, it will be put back to the end of the wait queue. Depending on the timing when the later sending threads wake up, they may not be able to send any packet and be again put back to the end of the wait queue sequentially, right behind the first sending thread. This starvation cycle continues until some sending threads exceed their retry limit and consequently fail. This patch fixes the issue by two simple approaches: (1) Any starved sending thread will be put to the head of the wait queue while a served sending thread will be put to the tail; (2) The most starved sending thread will be served first. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/hfi1: Virtual Network Interface Controller (VNIC) HW supportVishwanathapura, Niranjana
HFI1 HW specific support for VNIC functionality. Dynamically allocate a set of contexts for VNIC when the first vnic port is instantiated. Allocate VNIC contexts from user contexts pool and return them back to the same pool while freeing up. Set aside enough MSI-X interrupts for VNIC contexts and assign them when the contexts are allocated. On the receive side, use an RSM rule to spread TCP/UDP streams among VNIC contexts. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-15Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...
2016-12-14Merge branch 'hfi1' into merge-testDoug Ledford
2016-12-11IB/hfi1: Avoid credit return allocation for cpu-less NUMA nodesHarish Chegondi
Do not allocate credit return base and DMA memory for NUMA nodes without CPUs. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11IB/hfi1: Remove critical section gap in sc_buffer_alloc()Sebastian Sanchez
In sc_buffer_alloc(), the sc->alloc_lock is released before calling sc_release_update(), and it is reacquired after the function call. This causes CPU lock trading. Fix it by not dropping the lock before calling sc_release_update(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03IB/hfi1: Remove debug prints after allocation failureLeon Romanovsky
The prints after [k|v][m|z|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-15IB/hfi1: Optimize pio_buf and send_context structsSebastian Sanchez
Both pio_buf and send_context structs have oversized fields and have cachelines that can be optimized. Reduce oversized fields for both structs. Make sure pio_buf struct fits within a cacheline. Move read-only fields to their own cacheline in send_context struct. All of this will avoid cacheline trading as the ring progresses and pio buffers/send contexts are used. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-15IB/hfi1: Get rid of divide in pio buffer allocatorSebastian Sanchez
The div instruction shows costly in profiles. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-15IB/hfi1: Add unique txwait_lock for txreq eventsMike Marciniszyn
Profiling suggests that the read_seqbegin() in the txreq put logic is colliding with other uses of the iowait lock. The packet at a time use of this lock dictates a unique lock to avoid reader/writer collisions when the number of vTxWait events is low. In order to support a unique lock the iowait struct embedded in the QP is extended to remember the lock that protects the queue head. The QP destroy removes that QP from any wait list. It doesn't need to know the head because of the linked list API, but it does need to know the lock required to protect the head. This also opens up the wait logic to have unique per resources locks which needs to be in future refinement. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-15IB/hfi1: Fix integrity check flags default valuesJakub Pawlak
Prevent setting up integrity check flags when module is loaded with NO_INTEGRITY capability. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix user-space buffers mapping with IOMMU enabledTymoteusz Kielan
The dma_XXX API functions return bus addresses which are physical addresses when IOMMU is disabled. Buffer mapping to user-space is done via remap_pfn_range() with PFN based on bus address instead of physical. This results in wrong pages being mapped to user-space when IOMMU is enabled. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/hfi1: Handle kzalloc failure in init_pervl_scsIra Weiny
Checking the return value of the memory allocation call in init_pervl_scs() was missed. Recently the kmalloc() was changed to kzalloc() which identified the problem. While fixing this issue 2 other bugs were noticed. First, the array being allocated is accessed in the nomem path which can be reached before it is allocated. Second, kernel_send_context was not released on error. Fix both of these by creating a more common memory unwind label structure. Fixes: 35f6befc8441 ("staging/rdma/hfi1: Add qp to send context mapping for PIO") Reported-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/hfi1: Fix to fully initialize send context areaTymoteusz Kielan
While handling buffer control MAD, partially initialized dd->kernel_send_context area may cause potential dereference of uninitialized pointers. Fix by using kzalloc_node() instead of kmalloc_node(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17IB/hfi1: Increase packet egress timeoutJubin John
The current value of 500us for the packet egress timeout is too small which causes the host to declare failure on draining packets too early and unnecessarily bounces the link. Increase this to 50ms taking into account the switch packet discard timer default and the worst case per-VL package drainage rate. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17IB/hfi1: Fix credit return threshold adjustmentJubin John
The credit return threshold adjustment on mtu change algorithm does not take into account all the kernel send contexts that are assigned per VL. Use the pio send context map to adjust the credit return thresholds for all the allocated and assigned kernel send contexts based on the MTU adjustment per VL. The pio send context map can be changed dynamically based on the actual number of operational vls which is set by the fabric manager. When this happens update the credit return threshold values for all the remapped kernel send contexts. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Fix pio map initializationJubin John
The pio map initialization function is off by 1 causing the last kernel send context that is allocated to not get mapped into the pio map which leads to the last kernel send context not being used by any of the qps. The send context reserved for VL15 is taken care of by setting the scontext variable that is used as the index into the kernel send context array to 1 and does not need to be accounted for in the kernel send context counting loop as it is currently done. Fix the kernel send context counting loop to account for all the allocated send contexts and map all of them to the different VLs. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/hfi1: Move driver out of stagingDennis Dalessandro
The TODO list for the hfi1 driver was completed during 4.6. In addition other objections raised (which are far beyond what was in the TODO list) have been addressed as well. It is now time to remove the driver from staging and into the drivers/infiniband sub-tree. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>