aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2019-05-25RDMA/ipoib: Allow user space differentiate between valid dev_portLeon Romanovsky
commit b79656ed44c6865e17bcd93472ec39488bcc4984 upstream. Systemd triggers the following warning during IPoIB device load: mlx5_core 0000:00:0c.0 ib0: "systemd-udevd" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info. This is caused due to user space attempt to differentiate old systems without dev_port and new systems with dev_port. In case dev_port will be zero, the systemd will try to read dev_id instead. There is no need to print a warning in such case, because it is valid situation and it is needed to ensure systemd compatibility with old kernels. Link: https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c#L358 Cc: <stable@vger.kernel.org> # 4.19 Fixes: f6350da41dc7 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25RDMA/mlx5: Use get_zeroed_page() for clock_infoJason Gunthorpe
commit ddcdc368b1033e19fd3a5f750752e10e28a87826 upstream. get_zeroed_page() returns a virtual address for the page which is better than allocating a struct page and doing a permanent kmap on it. Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16RDMA/hns: Bugfix for mapping user dbLijun Ou
[ Upstream commit 2557fabd6e29f349bfa0ac13f38ac98aa5eafc74 ] When the maximum send wr delivered by the user is zero, the qp does not have a sq. When allocating the sq db buffer to store the user sq pi pointer and map it to the kernel mode, max_send_wr is used as the trigger condition, while the kernel does not consider the max_send_wr trigger condition when mapmping db. It will cause sq record doorbell map fail and create qp fail. The failed print information as follows: hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000 hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff hns3 0000:7d:00.1: sq record doorbell map failed! hns3 0000:7d:00.1: Create RC QP failed Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16IB/mlx5: Fix scatter to CQE in DCT QP creationGuy Levi
[ Upstream commit 7249c8ea227a582c14f63e9e8853eb7369122f10 ] When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command since it tried to treat it as as QP create mailbox command instead of a DCT create command. The corrupted mailbox command causes userspace to malfunction as the device doesn't create the QP as expected. A new mlx5 capability is exposed to user-space which ensures that it will not enable the feature on DCT without this fix in the kernel. Fixes: 5d6ff1babe78 ("IB/mlx5: Support scatter to CQE for DC transport type") Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-10RDMA/hns: Fix bug that caused srq creation to failLijun Ou
[ Upstream commit 4772e03d239484f3461e33c79d721c8ea03f7416 ] Due to the incorrect use of the seg and obj information, the position of the mtt is calculated incorrectly, and the free space of the page is not enough to store the entire mtt, resulting in access to the next page. This patch fixes this problem. Unable to handle kernel paging request at virtual address ffff00006e3cd000 ... Call trace: hns_roce_write_mtt+0x154/0x2f0 [hns_roce] hns_roce_buf_write_mtt+0xa8/0xd8 [hns_roce] hns_roce_create_srq+0x74c/0x808 [hns_roce] ib_create_srq+0x28/0xc8 Fixes: 0203b14c4f32 ("RDMA/hns: Unify the calculation for hem index in hip08") Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-10RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_removeKamal Heib
[ Upstream commit ea7a5c706fa49273cf6d1d9def053ecb50db2076 ] Make sure to free the DSR on pvrdma_pci_remove() to avoid the memory leak. Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-10IB/hfi1: Fix the allocation of RSM tableKaike Wan
[ Upstream commit d0294344470e6b52d097aa7369173f32d11f2f52 ] The receive side mapping (RSM) on hfi1 hardware is a special matching mechanism to direct an incoming packet to a given hardware receive context. It has 4 instances of matching capabilities (RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of 256 entries, each of which points to a receive context. Currently, three instances of RSM have been used: 1. RSM0 by QOS; 2. RSM1 by PSM FECN; 3. RSM2 by VNIC. Each RSM instance should reserve enough entries in RMT to function properly. Since both PSM and VNIC could allocate any receive context between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must reserve enough RMT entries to cover the entire receive context index range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only the user receive contexts allocated for PSM (dd->num_user_contexts). Consequently, the sizing of dd->num_user_contexts in set_up_context_variables is incorrect. Fixes: 2280740f01ae ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-10IB/hfi1: Eliminate opcode tests on mr derefKaike Wan
[ Upstream commit a8639a79e85c18c16c10089edd589c7948f19bbd ] When an old ack_queue entry is used to store an incoming request, it may need to clean up the old entry if it is still referencing the MR. Originally only RDMA READ request needed to reference MR on the responder side and therefore the opcode was tested when cleaning up the old entry. The introduction of tid rdma specific operations in the ack_queue makes the specific opcode tests wrong. Multiple opcodes (RDMA READ, TID RDMA READ, and TID RDMA WRITE) may need MR ref cleanup. Remove the opcode specific tests associated with the ack_queue. Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-10IB/hfi1: Clear the IOWAIT pending bits when QP is put into error stateKaike Wan
[ Upstream commit 93b289b9aff66eca7575b09f36f5abbeca8e6167 ] When a QP is put into error state, it may be waiting for send engine resources. In this case, the QP will be removed from the send engine's waiting list, but its IOWAIT pending bits are not cleared. This will normally not have any major impact as the QP is being destroyed. However, the QP still needs to wind down its operations, such as draining the send queue by scheduling the send engine. Clearing the pending bits will avoid any potential complications. In addition, if the QP will eventually hang, clearing the pending bits can help debugging by presenting a consistent picture if the user dumps the qp_stats. This patch clears a QP's IOWAIT_PENDING_IB and IO_PENDING_TID bits in priv->s_iowait.flags in this case. Fixes: 5da0fc9dbf89 ("IB/hfi1: Prepare resource waits for dual leg") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08IB/core: Destroy QP if XRC QP failsYuval Avnery
commit 535005ca8e5e71918d64074032f4b9d4fef8981e upstream. The open-coded variant missed destroy of SELinux created QP, reuse already existing ib_detroy_qp() call and use this opportunity to clean ib_create_qp() from double prints and unclear exit paths. Reported-by: Parav Pandit <parav@mellanox.com> Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-08IB/core: Fix potential memory leak while creating MAD agentsDaniel Jurgens
commit 6e88e672b69f0e627acdae74a527b730ea224b6b upstream. If the MAD agents isn't allowed to manage the subnet, or fails to register for the LSM notifier, the security context is leaked. Free the context in these cases. Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-08IB/core: Unregister notifier before freeing MAD securityDaniel Jurgens
commit d60667fc398ed34b3c7456b020481c55c760e503 upstream. If the notifier runs after the security context is freed an access of freed memory can occur. Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-08scsi: RDMA/srpt: Fix a credit leak for aborted commandsBart Van Assche
commit 40ca8757291ca7a8775498112d320205b2a2e571 upstream. Make sure that the next time a response is sent to the initiator that the credit it had allocated for the aborted request gets freed. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Fixes: 131e6abc674e ("target: Add TFO->abort_task for aborted task resources release") # v3.15 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE useLinus Torvalds
commit 6a5c5d26c4c6c3cc486fef0bf04ff9551132611b upstream. The parameter to ZERO_PAGE() was wrong, but since all architectures except for MIPS and s390 ignore it, it wasn't noticed until 0-day reported the build error. Fixes: 67f269b37f9b ("RDMA/ucontext: Fix regression with disassociate") Cc: stable@vger.kernel.org Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02RDMA/ucontext: Fix regression with disassociateJason Gunthorpe
commit 67f269b37f9b4d52c5e7f97acea26c0852e9b8a1 upstream. When this code was consolidated the intention was that the VMA would become backed by anonymous zero pages after the zap_vma_pte - however this very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED bits to transform the VMA into an anonymous VMA. Since the vm_ops was removed this broke. Now userspace gets a SIGBUS if it touches the vma after disassociation. Instead of converting the VMA to anonymous provide a fault handler that puts a zero'd page into the VMA when user-space touches it after disassociation. Cc: stable@vger.kernel.org Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02RDMA/mlx5: Use rdma_user_map_io for mapping BAR pagesJason Gunthorpe
commit d5e560d3f72382ac4e3bfe4e0f0420e6a220b039 upstream. Since mlx5 supports device disassociate it must use this API for all BAR page mmaps, otherwise the pages can remain mapped after the device is unplugged causing a system crash. Cc: stable@vger.kernel.org Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02RDMA/mlx5: Do not allow the user to write to the clock pageJason Gunthorpe
commit c660133c339f9ab684fdf568c0d51b9ae5e86002 upstream. The intent of this VMA was to be read-only from user space, but the VM_MAYWRITE masking was missed, so mprotect could make it writable. Cc: stable@vger.kernel.org Fixes: 5c99eaecb1fc ("IB/mlx5: Mmap the HCA's clock info to user-space") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02IB/rdmavt: Fix frwr memory registrationJosh Collier
commit 7c39f7f671d2acc0a1f39ebbbee4303ad499bbfa upstream. Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client. The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue. Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27coredump: fix race condition between mmget_not_zero()/get_task_mm() and core ↵Andrea Arcangeli
dumping commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream. The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-20IB/hfi1: Failed to drain send queue when QP is put into error stateKaike Wan
commit 662d66466637862ef955f7f6e78a286d8cf0ebef upstream. When a QP is put into error state, all pending requests in the send work queue should be drained. The following sequence of events could lead to a failure, causing a request to hang: (1) The QP builds a packet and tries to send through SDMA engine. However, PIO engine is still busy. Consequently, this packet is put on the QP's tx list and the QP is put on the PIO waiting list. The field qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN; (2) The QP is put into error state by the user application and notify_error_qp() is called, which removes the QP from the PIO waiting list and the packet from the QP's tx list. In addition, qp->s_flags is cleared of RVT_S_ANY_WAIT_IO bits, which does not include HFI1_S_WAIT_PIO_DRAIN bit; (3) The hfi1_schdule_send() function is called to drain the QP's send queue. Subsequently, hfi1_do_send() is called. Since the flag bit HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As a result, hfi1_do_send() bails out without draining any request from the send queue; (4) The PIO engine completes the sending and tries to wake up any QP on its waiting list. But the QP has been removed from the PIO waiting list and therefore is kept in sleep forever. The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2). HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN. Fixes: 2e2ba09e48b7 ("IB/rdmavt, IB/hfi1: Create device dependent s_flags") Cc: <stable@vger.kernel.org> # 4.19.x+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-20RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occursWei Hu (Xavier)
[ Upstream commit d061effc36f7bd38a12912977a37a50ac9140d11 ] In the reset process, the hns3 NIC driver notifies the RoCE driver to perform reset related processing by calling the .reset_notify() interface registered by the RoCE driver in hip08 SoC. In the current version, if a reset occurs simultaneously during the execution of rmmod or insmod ko, there may be Oops error as below: Internal error: Oops: 86000007 [#1] PREEMPT SMP Modules linked in: hns_roce(O) hns3(O) hclge(O) hnae3(O) [last unloaded: hns_roce_hw_v2] CPU: 0 PID: 14 Comm: kworker/0:1 Tainted: G O 4.19.0-ge00d540 #1 Hardware name: Huawei Technologies Co., Ltd. Workqueue: events hclge_reset_service_task [hclge] pstate: 60c00009 (nZCv daif +PAN +UAO) pc : 0xffff00000100b0b8 lr : 0xffff00000100aea0 sp : ffff000009afbab0 x29: ffff000009afbab0 x28: 0000000000000800 x27: 0000000000007ff0 x26: ffff80002f90c004 x25: 00000000000007ff x24: ffff000008f97000 x23: ffff80003efee0a8 x22: 0000000000001000 x21: ffff80002f917ff0 x20: ffff8000286ea070 x19: 0000000000000800 x18: 0000000000000400 x17: 00000000c4d3225d x16: 00000000000021b8 x15: 0000000000000400 x14: 0000000000000400 x13: 0000000000000000 x12: ffff80003fac6e30 x11: 0000800036303000 x10: 0000000000000001 x9 : 0000000000000000 x8 : ffff80003016d000 x7 : 0000000000000000 x6 : 000000000000003f x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004 x2 : 00000000000007ff x1 : 0000000000000000 x0 : 0000000000000000 Process kworker/0:1 (pid: 14, stack limit = 0x00000000af8f0ad9) Call trace: 0xffff00000100b0b8 0xffff00000100b3a0 hns_roce_init+0x624/0xc88 [hns_roce] 0xffff000001002df8 0xffff000001006960 hclge_notify_roce_client+0x74/0xe0 [hclge] hclge_reset_service_task+0xa58/0xbc0 [hclge] process_one_work+0x1e4/0x458 worker_thread+0x40/0x450 kthread+0x12c/0x130 ret_from_fork+0x10/0x18 Code: bad PC value In the reset process, we will release the resources firstly, and after the hardware reset is completed, we will reapply resources and reconfigure the hardware. We can solve this problem by modifying both the NIC and the RoCE driver. We can modify the concurrent processing in the NIC driver to avoid calling the .reset_notify and .uninit_instance ops at the same time. And we need to modify the RoCE driver to record the reset stage and the driver's init/uninit state, and check the state in the .reset_notify, .init_instance. and uninit_instance functions to avoid NULL pointer operation. Fixes: cb7a94c9c808 ("RDMA/hns: Add reset process for RoCE in hip08") Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20i40iw: Avoid panic when handling the inetdev eventFeng Tang
[ Upstream commit ec4fe4bcc584b55e24e8d1768f5510a62c0fd619 ] There is a panic reported that on a system with x722 ethernet, when doing the operations like: # ip link add br0 type bridge # ip link set eno1 master br0 # systemctl restart systemd-networkd The system will panic "BUG: unable to handle kernel null pointer dereference at 0000000000000034", with call chain: i40iw_inetaddr_event notifier_call_chain blocking_notifier_call_chain notifier_call_chain __inet_del_ifa inet_rtm_deladdr rtnetlink_rcv_msg netlink_rcv_skb rtnetlink_rcv netlink_unicast netlink_sendmsg sock_sendmsg __sys_sendto It is caused by "local_ipaddr = ntohl(in->ifa_list->ifa_address)", while the in->ifa_list is NULL. So add a check for the "in->ifa_list == NULL" case, and skip the ARP operation accordingly. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20IB/mlx4: Fix race condition between catas error reset and aliasguid flowsJack Morgenstein
[ Upstream commit 587443e7773e150ae29e643ee8f41a1eed226565 ] Code review revealed a race condition which could allow the catas error flow to interrupt the alias guid query post mechanism at random points. Thiis is fixed by doing cancel_delayed_work_sync() instead of cancel_delayed_work() during the alias guid mechanism destroy flow. Fixes: a0c64a17aba8 ("mlx4: Add alias_guid mechanism") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-17IB/mlx5: Reset access mask when looping inside page fault handlerMoni Shoua
commit 1abe186ed8a6593069bc122da55fc684383fdc1c upstream. If page-fault handler spans multiple MRs then the access mask needs to be reset before each MR handling or otherwise write access will be granted to mapped pages instead of read-only. Cc: <stable@vger.kernel.org> # 3.19 Fixes: 7bdf65d411c1 ("IB/mlx5: Handle page faults") Reported-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-05iw_cxgb4: fix srqidx leak during connection abortRaju Rangoju
[ Upstream commit f368ff188ae4b3ef6f740a15999ea0373261b619 ] When an application aborts the connection by moving QP from RTS to ERROR, then iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets the *srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0), and aborts the connection by calling c4iw_ep_disconnect(). c4iw_ep_disconnect() does the following: 1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4. 2. sends abort request CPL to hw. But, since the close_complete_upcall() is sent before sending the ABORT_REQ to hw, libcxgb4 would fail to release the srqidx if the connection holds one. Because, the srqidx is passed up to libcxgb4 only after corresponding ABORT_RPL is processed by kernel in abort_rpl(). This patch handle the corner-case by moving the call to close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl(). So that libcxgb4 is notified about the -ECONNRESET only after abort_rpl(), and libcxgb4 can relinquish the srqidx properly. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05IB/mlx4: Increase the timeout for CM cacheHåkon Bugge
[ Upstream commit 2612d723aadcf8281f9bf8305657129bd9f3cd57 ] Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted form the cache. But life is not perfect, remote peers may die or be rebooted. Hence, it's a timeout to wipe out a cache entry, when the PF driver assumes the remote peer has gone. During workloads where a high number of QPs are destroyed concurrently, excessive amount of CM DREQ retries has been observed The problem can be demonstrated in a bare-metal environment, where two nodes have instantiated 8 VFs each. This using dual ported HCAs, so we have 16 vPorts per physical server. 64 processes are associated with each vPort and creates and destroys one QP for each of the remote 64 processes. That is, 1024 QPs per vPort, all in all 16K QPs. The QPs are created/destroyed using the CM. When tearing down these 16K QPs, excessive CM DREQ retries (and duplicates) are observed. With some cat/paste/awk wizardry on the infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the nodes: cm_rx_duplicates: dreq 2102 cm_rx_msgs: drep 1989 dreq 6195 rep 3968 req 4224 rtu 4224 cm_tx_msgs: drep 4093 dreq 27568 rep 4224 req 3968 rtu 3968 cm_tx_retries: dreq 23469 Note that the active/passive side is equally distributed between the two nodes. Enabling pr_debug in cm.c gives tons of: [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL! By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the tear-down phase of the application is reduced from approximately 90 to 50 seconds. Retries/duplicates are also significantly reduced: cm_rx_duplicates: dreq 2460 [] cm_tx_retries: dreq 3010 req 47 Increasing the timeout further didn't help, as these duplicates and retries stems from a too short CMA timeout, which was 20 (~4 seconds) on the systems. By increasing the CMA timeout to 22 (~17 seconds), the numbers fell down to about 10 for both of them. Adjustment of the CMA timeout is not part of this commit. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-27RDMA/cma: Rollback source IP address if failing to acquire deviceMyungho Jung
commit 5fc01fb846bce8fa6d5f95e2625b8ce0f8e86810 upstream. If cma_acquire_dev_by_src_ip() returns error in addr_handler(), the device state changes back to RDMA_CM_ADDR_BOUND but the resolved source IP address is still left. After that, if rdma_destroy_id() is called after rdma_listen(), the device is freed without removed from listen_any_list in cma_cancel_operation(). Revert to the previous IP address if acquiring device fails. Reported-by: syzbot+f3ce716af730c8f96637@syzkaller.appspotmail.com Signed-off-by: Myungho Jung <mhjungk@gmail.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23IB/rdmavt: Fix concurrency panics in QP post_send and modify to errorMichael J. Ruhl
commit d757c60eca9b22f4d108929a24401e0fdecda0b1 upstream. The RC/UC code path can go through a software loopback. In this code path the receive side QP is manipulated. If two threads are working on the QP receive side (i.e. post_send, and modify_qp to an error state), QP information can be corrupted. (post_send via loopback) set r_sge loop update r_sge (modify_qp) take r_lock update r_sge <---- r_sge is now incorrect (post_send) update r_sge <---- crash, etc. ... This can lead to one of the two following crashes: BUG: unable to handle kernel NULL pointer dereference at (null) IP: hfi1_copy_sge+0xf1/0x2e0 [hfi1] PGD 8000001fe6a57067 PUD 1fd9e0c067 PMD 0 Call Trace: ruc_loopback+0x49b/0xbc0 [hfi1] hfi1_do_send+0x38e/0x3e0 [hfi1] _hfi1_do_send+0x1e/0x20 [hfi1] process_one_work+0x17f/0x440 worker_thread+0x126/0x3c0 kthread+0xd1/0xe0 ret_from_fork_nospec_begin+0x21/0x21 or: BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 IP: rvt_clear_mr_refs+0x45/0x370 [rdmavt] PGD 80000006ae5eb067 PUD ef15d0067 PMD 0 Call Trace: rvt_error_qp+0xaa/0x240 [rdmavt] rvt_modify_qp+0x47f/0xaa0 [rdmavt] ib_security_modify_qp+0x8f/0x400 [ib_core] ib_modify_qp_with_udata+0x44/0x70 [ib_core] modify_qp.isra.23+0x1eb/0x2b0 [ib_uverbs] ib_uverbs_modify_qp+0xaa/0xf0 [ib_uverbs] ib_uverbs_write+0x272/0x430 [ib_uverbs] vfs_write+0xc0/0x1f0 SyS_write+0x7f/0xf0 system_call_fastpath+0x1c/0x21 Fix by using the appropriate locking on the receiving QP. Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt") Cc: <stable@vger.kernel.org> #v4.9+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23IB/rdmavt: Fix loopback send with invalidate orderingMike Marciniszyn
commit 38bbc9f0381550d1d227fc57afa08436e36b32fc upstream. The IBTA spec notes: o9-5.2.1: For any HCA which supports SEND with Invalidate, upon receiving an IETH, the Invalidate operation must not take place until after the normal transport header validation checks have been successfully completed. The rdmavt loopback code does the validation after the invalidate. Fix by relocating the operation specific logic for all SEND variants until after the validity checks. Cc: <stable@vger.kernel.org> #v4.20+ Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23IB/hfi1: Close race condition on user context disable and closeMichael J. Ruhl
commit bc5add09764c123f58942a37c8335247e683d234 upstream. When disabling and removing a receive context, it is possible for an asynchronous event (i.e IRQ) to occur. Because of this, there is a race between cleaning up the context, and the context being used by the asynchronous event. cpu 0 (context cleanup) rc->ref_count-- (ref_count == 0) hfi1_rcd_free() cpu 1 (IRQ (with rcd index)) rcd_get_by_index() lock ref_count+++ <-- reference count race (WARNING) return rcd unlock cpu 0 hfi1_free_ctxtdata() <-- incorrect free location lock remove rcd from array unlock free rcd This race will cause the following WARNING trace: WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 Call Trace: dump_stack+0x19/0x1b __warn+0xd8/0x100 warn_slowpath_null+0x1d/0x20 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] is_rcv_urgent_int+0x24/0x90 [hfi1] general_interrupt+0x1b6/0x210 [hfi1] __handle_irq_event_percpu+0x44/0x1c0 handle_irq_event_percpu+0x32/0x80 handle_irq_event+0x3c/0x60 handle_edge_irq+0x7f/0x150 handle_irq+0xe4/0x1a0 do_IRQ+0x4d/0xf0 common_interrupt+0x162/0x162 The race can also lead to a use after free which could be similar to: general protection fault: 0000 1 SMP CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230 Call Trace: ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_aio_write+0xba/0x110 [hfi1] do_sync_readv_writev+0x7b/0xd0 do_readv_writev+0xce/0x260 ? handle_mm_fault+0x39d/0x9b0 ? pick_next_task_fair+0x5f/0x1b0 ? sched_clock_cpu+0x85/0xc0 ? __schedule+0x13a/0x890 vfs_writev+0x35/0x60 SyS_writev+0x7f/0x110 system_call_fastpath+0x22/0x27 Use the appropriate kref API to verify access. Reorder context cleanup to ensure context removal before cleanup occurs correctly. Cc: stable@vger.kernel.org # v4.14.0+ Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting receive contexts") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15iw_cxgb4: cq/qp mask depends on bar2 pages in a host pageRaju Rangoju
Adjust the cq/qp mask based on the number of bar2 pages in a host page. For user-mode rdma, the granularity of the BAR2 memory mapped to a user rdma process during queue allocation must be based on the host page size. The lld attributes udb_density and ucq_density are used to figure out how many sge contexts are in a bar2 page. So the rdev->qpmask and rdev->cqmask in iw_cxgb4 need to now be adjusted based on how many sge bar2 pages are in a host page. Otherwise the device fails to work on non 4k page size systems. Fixes: 2391b0030e24 ("cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size") Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04RDMA/srp: Rework SCSI device reset handlingBart Van Assche
Since .scsi_done() must only be called after scsi_queue_rq() has finished, make sure that the SRP initiator driver does not call .scsi_done() while scsi_queue_rq() is in progress. Although invoking sg_reset -d while I/O is in progress works fine with kernel v4.20 and before, that is not the case with kernel v5.0-rc1. This patch avoids that the following crash is triggered with kernel v5.0-rc1: BUG: unable to handle kernel NULL pointer dereference at 0000000000000138 CPU: 0 PID: 360 Comm: kworker/0:1H Tainted: G B 5.0.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:blk_mq_dispatch_rq_list+0x116/0xb10 Call Trace: blk_mq_sched_dispatch_requests+0x2f7/0x300 __blk_mq_run_hw_queue+0xd6/0x180 blk_mq_run_work_fn+0x27/0x30 process_one_work+0x4f1/0xa20 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Cc: <stable@vger.kernel.org> Fixes: 94a9174c630c ("IB/srp: reduce lock coverage of command completion") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociateYishai Hadas
The vma->vm_mm can become impossible to get before rdma_umap_close() is called, in this case we must not try to get an mm that is already undergoing process exit. In this case there is no need to wait for anything as the VMA will be destroyed by another thread soon and is already effectively 'unreachable' by userspace. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__rb_erase_color+0xb9/0x280 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89 58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d 10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000 FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0 Call Trace: unlink_file_vma+0x3b/0x50 free_pgtables+0xa1/0x110 exit_mmap+0xca/0x1a0 ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib] mmput+0x54/0x140 uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs] uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: <stable@vger.kernel.org> # 4.19 Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25IB/ipoib: Fix for use-after-free in ipoib_cm_tx_startFeras Daoud
The following BUG was reported by kasan: BUG: KASAN: use-after-free in ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib] Read of size 80 at addr ffff88034c30bcd0 by task kworker/u16:1/24020 Workqueue: ipoib_wq ipoib_cm_tx_start [ib_ipoib] Call Trace: dump_stack+0x9a/0xeb print_address_description+0xe3/0x2e0 kasan_report+0x18a/0x2e0 ? ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib] memcpy+0x1f/0x50 ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib] ? kvm_clock_read+0x1f/0x30 ? ipoib_cm_skb_reap+0x610/0x610 [ib_ipoib] ? __lock_is_held+0xc2/0x170 ? process_one_work+0x880/0x1960 ? process_one_work+0x912/0x1960 process_one_work+0x912/0x1960 ? wq_pool_ids_show+0x310/0x310 ? lock_acquire+0x145/0x440 worker_thread+0x87/0xbb0 ? process_one_work+0x1960/0x1960 kthread+0x314/0x3d0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 Allocated by task 0: kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc_trace+0x168/0x3e0 path_rec_create+0xa2/0x1f0 [ib_ipoib] ipoib_start_xmit+0xa98/0x19e0 [ib_ipoib] dev_hard_start_xmit+0x159/0x8d0 sch_direct_xmit+0x226/0xb40 __dev_queue_xmit+0x1d63/0x2950 neigh_update+0x889/0x1770 arp_process+0xc47/0x21f0 arp_rcv+0x462/0x760 __netif_receive_skb_core+0x1546/0x2da0 netif_receive_skb_internal+0xf2/0x590 napi_gro_receive+0x28e/0x390 ipoib_ib_handle_rx_wc_rss+0x873/0x1b60 [ib_ipoib] ipoib_rx_poll_rss+0x17d/0x320 [ib_ipoib] net_rx_action+0x427/0xe30 __do_softirq+0x28e/0xc42 Freed by task 26680: __kasan_slab_free+0x11d/0x160 kfree+0xf5/0x360 ipoib_flush_paths+0x532/0x9d0 [ib_ipoib] ipoib_set_mode_rss+0x1ad/0x560 [ib_ipoib] set_mode+0xc8/0x150 [ib_ipoib] kernfs_fop_write+0x279/0x440 __vfs_write+0xd8/0x5c0 vfs_write+0x15e/0x470 ksys_write+0xb8/0x180 do_syscall_64+0x9b/0x420 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff88034c30bcc8 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 8 bytes inside of 512-byte region [ffff88034c30bcc8, ffff88034c30bec8) The buggy address belongs to the page: The following race between change mode and xmit flow is the reason for this use-after-free: Change mode Send packet 1 to GID XX Send packet 2 to GID XX | | | start | | | | | | | | | Create new path for GID XX | | and update neigh path | | | | | | | | | | flush_paths | | | | queue_work(cm.start_task) | | Path for GID XX not found | create new path | | start_task runs with old released path There is no locking to protect the lifetime of the path through the ipoib_cm_tx struct, so delete it entirely and always use the newly looked up path under the priv->lock. Fixes: 546481c2816e ("IB/ipoib: Fix memory corruption in ipoib cm mode connect flow") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25IB/uverbs: Fix ioctl query port to consider device disassociationYishai Hadas
Methods cannot peak into the ufile, the only way to get a ucontext and hence a device is via the ib_uverbs_get_ucontext() call or inspecing a locked uobject. Otherwise during/after disassociation the pointers may be null or free'd. BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 PGD 800000005ece6067 P4D 800000005ece6067 PUD 5ece7067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 10631 Comm: ibv_ud_pingpong Tainted: GW OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT+0x53/0x191 [ib_uverbs] Code: 80 00 00 00 31 c0 48 8b 47 40 48 8d 5c 24 38 48 8d 6c 24 08 48 89 df 48 8b 40 08 4c 8b a0 18 03 00 00 31 c0 f3 48 ab 48 89 ef <49> 83 7c 24 78 00 b1 06 f3 48 ab 0f 84 89 00 00 00 45 31 c9 31 d2 RSP: 0018:ffffb54802ccfb10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffb54802ccfb48 RCX:0000000000000000 RDX: fffffffffffffffa RSI: ffffb54802ccfcf8 RDI:ffffb54802ccfb18 RBP: ffffb54802ccfb18 R08: ffffb54802ccfd18 R09:0000000000000000 R10: 0000000000000000 R11: 00000000000000d0 R12:0000000000000000 R13: ffffb54802ccfcb0 R14: ffffb54802ccfc48 R15:ffff9f736e0059a0 FS: 00007f55a6bd7740(0000) GS:ffff9f737ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000078 CR3: 0000000064214000 CR4:00000000000006f0 Call Trace: ib_uverbs_cmd_verbs.isra.5+0x94d/0xa60 [ib_uverbs] ? copy_port_attr_to_resp+0x120/0x120 [ib_uverbs] ? arch_tlb_finish_mmu+0x16/0xc0 ? tlb_finish_mmu+0x1f/0x30 ? unmap_region+0xd9/0x120 ib_uverbs_ioctl+0xbc/0x120 [ib_uverbs] do_vfs_ioctl+0xa9/0x620 ? __do_munmap+0x29f/0x3a0 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f55a62cb567 Fixes: 641d1207d2ed ("IB/core: Move query port to ioctl") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25RDMA/mlx5: Fix flow creation on representorsMark Bloch
The intention of the flow_is_supported was to disable the entire tree and methods that allow raw flow creation, but the grammar syntax has this disable the entire UVERBS_FLOW object. Since the method requires a MLX5_IB_OBJECT_FLOW_MATCHER there is no need to do anything, as it is automatically disabled when matchers are disabled. This restores the ability to create flow steering rules on representors via regular verbs. Fixes: a1462351b590 ("RDMA/mlx5: Fail early if user tries to create flows on IB representors") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25IB/uverbs: Fix OOPs upon device disassociationYishai Hadas
The async_file might be freed before the disassociation has been ended, causing qp shutdown to use after free on it. Since uverbs_destroy_ufile_hw is not a fence, it returns if a disassociation is ongoing in another thread. It has to be written this way to avoid deadlock. However this means that the ufile FD close cannot destroy anything that may still be used by an active kref, such as the the async_file. To fix that move the kref_put() to be in ib_uverbs_release_file(). BUG: unable to handle kernel paging request at ffffffffba682787 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061 Oops: 0003 [#1] SMP PTI CPU: 1 PID: 32410 Comm: bash Tainted: G OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85 d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00 FS: 00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0 Call Trace: _raw_spin_lock_irq+0x27/0x30 ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs] uverbs_free_qp+0x7e/0x90 [ib_uverbs] destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs] __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fac551aac60 Cc: <stable@vger.kernel.org> # 4.2 Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25RDMA/umem: Add missing initialization of owning_mmArtemy Kovalyov
When allocating a umem leaf for implicit ODP MR during page fault the field owning_mm was not set. Initialize and take a reference on this field to avoid kernel panic when trying to access this field. BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 PGD 800000022dfed067 P4D 800000022dfed067 PUD 22dfcf067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 634 Comm: kworker/u33:0 Not tainted 4.20.0-rc6+ #89 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib] RIP: 0010:ib_umem_odp_map_dma_pages+0xf3/0x710 [ib_core] Code: 45 c0 48 21 f3 48 89 75 b0 31 f6 4a 8d 04 33 48 89 45 a8 49 8b 44 24 60 48 8b 78 10 e8 66 16 a8 c5 49 8b 54 24 08 48 89 45 98 <8b> 42 58 85 c0 0f 84 8e 05 00 00 8d 48 01 48 8d 72 58 f0 0f b1 4a RSP: 0000:ffffb610813a7c20 EFLAGS: 00010202 RAX: ffff95ace6e8ac80 RBX: 0000000000000000 RCX: 000000000000000c RDX: 0000000000000000 RSI: 0000000000000850 RDI: ffff95aceaadae80 RBP: ffffb610813a7ce0 R08: 0000000000000000 R09: 0000000000080c77 R10: ffff95acfffdbd00 R11: 0000000000000000 R12: ffff95aceaa20a00 R13: 0000000000001000 R14: 0000000000001000 R15: 000000000000000c FS: 0000000000000000(0000) GS:ffff95acf7800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 000000022c834001 CR4: 00000000001606f0 Call Trace: pagefault_single_data_segment+0x1df/0xc60 [mlx5_ib] mlx5_ib_eqe_pf_action+0x7bc/0xa70 [mlx5_ib] ? __switch_to+0xe1/0x470 process_one_work+0x174/0x390 worker_thread+0x4f/0x3e0 kthread+0x102/0x140 ? drain_workqueue+0x130/0x130 ? kthread_stop+0x110/0x110 ret_from_fork+0x1f/0x30 Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25RDMA/hns: Update the kernel header file of hnsLijun Ou
The hns_roce_ib_create_srq_resp is used to interact with the user for data, this was open coded to use a u32 directly, instead use a properly sized structure. Fixes: c7bcb13442e1 ("RDMA/hns: Add SRQ support for hip08 kernel mode") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21IB/mlx5: Fix how advise_mr() launches async workJason Gunthorpe
Work must hold a kref on the ib_device otherwise the dev pointer can become free before the work runs. This can happen because the work is being pushed onto the system work queue which is not flushed during driver unregister. Remove the bogus use of 'reg_state': - While in uverbs the reg_state is guaranteed to always be REGISTERED - Testing reg_state with no locking is bogus. Use ib_device_try_get() to get back into a region that prevents unregistration. For now continue with a flow that is similar to the existing code. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com>
2019-01-21RDMA/device: Expose ib_device_try_get(()Jason Gunthorpe
It turns out future patches need this capability quite widely now, not just for netlink, so provide two global functions to manage the registration lock refcount. This also moves the point the lock becomes 1 to within ib_register_device() so that the semantics of the public API are very sane and clear. Calling ib_device_try_get() will fail on devices that are only allocated but not yet registered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
2019-01-21IB/hfi1: Add limit test for RC/UC send via loopbackMike Marciniszyn
Fix potential memory corruption and panic in loopback for IB_WR_SEND variants. The code blindly assumes the posted length will fit in the fetched rwqe, which is not a valid assumption. Fix by adding a limit test, and triggering the appropriate send completion and putting the QP in an error state. This mimics the handling for non-loopback QPs. Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt") Cc: <stable@vger.kernel.org> #v4.20+ Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21IB/hfi1: Remove overly conservative VM_EXEC flag checkMichael J. Ruhl
Applications that use the stack for execution purposes cause userspace PSM jobs to fail during mmap(). Both Fortran (non-standard format parsing) and C (callback functions located in the stack) applications can be written such that stack execution is required. The linker notes this via the gnu_stack ELF flag. This causes READ_IMPLIES_EXEC to be set which forces all PROT_READ mmaps to have PROT_EXEC for the process. Checking for VM_EXEC bit and failing the request with EPERM is overly conservative and will break any PSM application using executable stacks. Cc: <stable@vger.kernel.org> #v4.14+ Fixes: 12220267645c ("IB/hfi: Protect against writable mmap") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMMBrian Welty
The work completion length for a receiving a UD send with immediate is short by 4 bytes causing application using this opcode to fail. The UD receive logic incorrectly subtracts 4 bytes for immediate value. These bytes are already included in header length and are used to calculate header/payload split, so the result is these 4 bytes are subtracted twice, once when the header length subtracted from the overall length and once again in the UD opcode specific path. Remove the extra subtraction when handling the opcode. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOVJack Morgenstein
The commit cited below replaced rdma_create_ah with mlx4_ib_create_slave_ah when creating AHs for the paravirtualized special QPs. However, this change also required replacing rdma_destroy_ah with mlx4_ib_destroy_ah in the affected flows. The commit missed 3 places where rdma_destroy_ah should have been replaced with mlx4_ib_destroy_ah. As a result, the pd usecount was decremented when the ah was destroyed -- although the usecount was NOT incremented when the ah was created. This caused the pd usecount to become negative, and resulted in the WARN_ON stack trace below when the mlx4_ib.ko module was unloaded: WARNING: CPU: 3 PID: 25303 at drivers/infiniband/core/verbs.c:329 ib_dealloc_pd+0x6d/0x80 [ib_core] Modules linked in: rdma_ucm rdma_cm iw_cm ib_cm ib_umad mlx4_ib(-) ib_uverbs ib_core mlx4_en mlx4_core nfsv3 nfs fscache configfs xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc dm_mirror dm_region_hash dm_log dm_mod dax rndis_wlan rndis_host coretemp kvm_intel cdc_ether kvm usbnet iTCO_wdt iTCO_vendor_support cfg80211 irqbypass lpc_ich ipmi_si i2c_i801 mii pcspkr i2c_core mfd_core ipmi_devintf i7core_edac ipmi_msghandler ioatdma pcc_cpufreq dca acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas mptscsih crc32c_intel ata_piix bnx2 mptbase ipv6 crc_ccitt autofs4 [last unloaded: mlx4_core] CPU: 3 PID: 25303 Comm: modprobe Tainted: G W I 5.0.0-rc1-net-mlx4+ #1 Hardware name: IBM -[7148ZV6]-/Node 1, System Card, BIOS -[MLE170CUS-1.70]- 09/23/2011 RIP: 0010:ib_dealloc_pd+0x6d/0x80 [ib_core] Code: 00 00 85 c0 75 02 5b c3 80 3d aa 87 03 00 00 75 f5 48 c7 c7 88 d7 8f a0 31 c0 c6 05 98 87 03 00 01 e8 07 4c 79 e0 0f 0b 5b c3 <0f> 0b eb be 0f 0b eb ab 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 RSP: 0018:ffffc90005347e30 EFLAGS: 00010282 RAX: 00000000ffffffea RBX: ffff8888589e9540 RCX: 0000000000000006 RDX: 0000000000000006 RSI: ffff88885d57ad40 RDI: 0000000000000000 RBP: ffff88885b029c00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000004 R12: ffff8887f06c0000 R13: ffff8887f06c13e8 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fd6743c6740(0000) GS:ffff88887fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000ed1038 CR3: 00000007e3156000 CR4: 00000000000006e0 Call Trace: mlx4_ib_close_sriov+0x125/0x180 [mlx4_ib] mlx4_ib_remove+0x57/0x1f0 [mlx4_ib] mlx4_remove_device+0x92/0xa0 [mlx4_core] mlx4_unregister_interface+0x39/0x90 [mlx4_core] mlx4_ib_cleanup+0xc/0xd7 [mlx4_ib] __x64_sys_delete_module+0x17d/0x290 ? trace_hardirqs_off_thunk+0x1a/0x1c ? do_syscall_64+0x12/0x180 do_syscall_64+0x4a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 5e62d5ff1b9a ("IB/mlx4: Create slave AH's directly") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21RDMA/mlx5: Fix check for supported user flags when creating a QPMark Bloch
When the flags verification was added two flags were missed from the check: * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC * MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC This causes user applications that were using these flags to break. Fixes: 2e43bb31b8df ("IB/mlx5: Verify that driver supports user flags") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes frfom Jason Gunthorpe: "Not much so far. We have the usual batch of bugs and two fixes to code merged this cycle: - Restore valgrind support for the ioctl verbs interface merged this window, and fix a missed error code on an error path from that conversion - A user reported crash on obsolete mthca hardware - pvrdma was using the wrong command opcode toward the hypervisor - NULL pointer crash regression when dumping rdma-cm over netlink - Be conservative about exposing the global rkey" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT RDMA/mthca: Clear QP objects during their allocation RDMA/vmw_pvrdma: Return the correct opcode when creating WR RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type RDMA/nldev: Don't expose unsafe global rkey to regular user RDMA/uverbs: Fix post send success return value in case of error
2019-01-14RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUTJason Gunthorpe
When the ioctl interface for the write commands was introduced it did not mark the core response with UVERBS_ATTR_F_VALID_OUTPUT. This causes rdma-core in userspace to not mark the buffers as written for valgrind. Along the same lines it turns out we have always missed marking the driver data. Fixing both of these makes valgrind work properly with rdma-core and ioctl. Fixes: 4785860e04bc ("RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-10RDMA/mthca: Clear QP objects during their allocationLeon Romanovsky
As part of audit process to update drivers to use rdma_restrack_add() ensure that QP objects is cleared before access. Such change fixes the crash observed with uninitialized non zero sgid attr accessed by ib_destroy_qp(). CPU: 3 PID: 74 Comm: kworker/u16:1 Not tainted 4.19.10-300.fc29.x86_64 Workqueue: ipoib_wq ipoib_cm_tx_reap [ib_ipoib] RIP: 0010:rdma_put_gid_attr+0x9/0x30 [ib_core] RSP: 0018:ffffb7ad819dbde8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8d1bdf5a2e00 RCX: 0000000000002699 RDX: 206c656e72656af8 RSI: ffff8d1bf7ae6160 RDI: 206c656e72656b20 RBP: 0000000000000000 R08: 0000000000026160 R09: ffffffffc06b45bf R10: ffffe849887da000 R11: 0000000000000002 R12: ffff8d1be30cb400 R13: ffff8d1bdf681800 R14: ffff8d1be2272400 R15: ffff8d1be30ca000 FS: 0000000000000000(0000) GS:ffff8d1bf7ac0000(0000) knlGS:0000000000000000 Trace: ib_destroy_qp+0xc9/0x240 [ib_core] ipoib_cm_tx_reap+0x1f9/0x4e0 [ib_ipoib] process_one_work+0x1a1/0x3a0 worker_thread+0x30/0x380 ? pwq_unbound_release_workfn+0xd0/0xd0 kthread+0x112/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x22/0x40 Reported-by: Alexander Murashkin <AlexanderMurashkin@msn.com> Tested-by: Alexander Murashkin <AlexanderMurashkin@msn.com> Fixes: 1a1f460ff151 ("RDMA: Hold the sgid_attr inside the struct ib_ah/qp") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10RDMA/vmw_pvrdma: Return the correct opcode when creating WRAdit Ranadive
Since the IB_WR_REG_MR opcode value changed, let's set the PVRDMA device opcodes explicitly. Reported-by: Ruishuang Wang <ruishuangw@vmware.com> Fixes: 9a59739bd01f ("IB/rxe: Revise the ib_wr_opcode enum") Cc: stable@vger.kernel.org Reviewed-by: Bryan Tan <bryantan@vmware.com> Reviewed-by: Ruishuang Wang <ruishuangw@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>