summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2021-03-09IB/mlx5: Add missing error codeYueHaibing
[ Upstream commit 3a9b3d4536e0c25bd3906a28c1f584177e49dd0f ] Set err to -ENOMEM if kzalloc fails instead of 0. Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://lore.kernel.org/r/20210222122343.19720-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-09RDMA/rxe: Fix missing kconfig dependency on CRYPTOJulian Braha
[ Upstream commit 475f23b8c66d2892ad6acbf90ed757cafab13de7 ] When RDMA_RXE is enabled and CRYPTO is disabled, Kbuild gives the following warning: WARNING: unmet direct dependencies detected for CRYPTO_CRC32 Depends on [n]: CRYPTO [=n] Selected by [y]: - RDMA_RXE [=y] && (INFINIBAND_USER_ACCESS [=y] || !INFINIBAND_USER_ACCESS [=y]) && INET [=y] && PCI [=y] && INFINIBAND [=y] && INFINIBAND_VIRT_DMA [=y] This is because RDMA_RXE selects CRYPTO_CRC32, without depending on or selecting CRYPTO, despite that config option being subordinate to CRYPTO. Fixes: cee2688e3cd6 ("IB/rxe: Offload CRC calculation when possible") Signed-off-by: Julian Braha <julianbraha@gmail.com> Link: https://lore.kernel.org/r/21525878.NYvzQUHefP@ubuntu-mate-laptop Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/hns: Fixes missing error code of CMDQLang Cheng
[ Upstream commit 8f86e2eadac968200a6ab1d7074fc0f5cbc1e075 ] When posting a multi-descriptors command, the error code of previous failed descriptors may be rewrote to 0 by a later successful descriptor. Fixes: a04ff739f2a9 ("RDMA/hns: Add command queue support for hip08 RoCE driver") Link: https://lore.kernel.org/r/1612688143-28226-3-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/hns: Fix type of sq_signal_bitsWeihang Li
[ Upstream commit ea4092f3b56b236d08890ea589506ebd76248c53 ] This bit should be in type of enum ib_sig_type, or there will be a sparse warning. Fixes: bfe860351e31 ("RDMA/hns: Fix cast from or to restricted __le32 for driver") Link: https://lore.kernel.org/r/1612517974-31867-3-git-send-email-liweihang@huawei.com Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/siw: Fix calculation of tx_valid_cpus sizeKamal Heib
[ Upstream commit 429fa9698957d1a910535ce5e33aedf5adfdabc1 ] The size of tx_valid_cpus was calculated under the assumption that the numa nodes identifiers are continuous, which is not the case in all archs as this could lead to the following panic when trying to access an invalid tx_valid_cpus index, avoid the following panic by using nr_node_ids instead of num_online_nodes() to allocate the tx_valid_cpus size. Kernel attempted to read user page (8) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000008 Faulting instruction address: 0xc0080000081b4a90 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: siw(+) rfkill rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm sunrpc ib_umad rdma_cm ib_cm iw_cm i40iw ib_uverbs ib_core i40e ses enclosure scsi_transport_sas ipmi_powernv ibmpowernv at24 ofpart ipmi_devintf regmap_i2c ipmi_msghandler powernv_flash uio_pdrv_genirq uio mtd opal_prd zram ip_tables xfs libcrc32c sd_mod t10_pi ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm vmx_crypto aacraid drm_panel_orientation_quirks dm_mod CPU: 40 PID: 3279 Comm: modprobe Tainted: G W X --------- --- 5.11.0-0.rc4.129.eln108.ppc64le #2 NIP: c0080000081b4a90 LR: c0080000081b4a2c CTR: c0000000007ce1c0 REGS: c000000027fa77b0 TRAP: 0300 Tainted: G W X --------- --- (5.11.0-0.rc4.129.eln108.ppc64le) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 44224882 XER: 00000000 CFAR: c0000000007ce200 DAR: 0000000000000008 DSISR: 40000000 IRQMASK: 0 GPR00: c0080000081b4a2c c000000027fa7a50 c0080000081c3900 0000000000000040 GPR04: c000000002023080 c000000012e1c300 000020072ad70000 0000000000000001 GPR08: c000000001726068 0000000000000008 0000000000000008 c0080000081b5758 GPR12: c0000000007ce1c0 c0000007fffc3000 00000001590b1e40 0000000000000000 GPR16: 0000000000000000 0000000000000001 000000011ad68fc8 00007fffcc09c5c8 GPR20: 0000000000000008 0000000000000000 00000001590b2850 00000001590b1d30 GPR24: 0000000000043d68 000000011ad67a80 000000011ad67a80 0000000000100000 GPR28: c000000012e1c300 c0000000020271c8 0000000000000001 c0080000081bf608 NIP [c0080000081b4a90] siw_init_cpulist+0x194/0x214 [siw] LR [c0080000081b4a2c] siw_init_cpulist+0x130/0x214 [siw] Call Trace: [c000000027fa7a50] [c0080000081b4a2c] siw_init_cpulist+0x130/0x214 [siw] (unreliable) [c000000027fa7a90] [c0080000081b4e68] siw_init_module+0x40/0x2a0 [siw] [c000000027fa7b30] [c0000000000124f4] do_one_initcall+0x84/0x2e0 [c000000027fa7c00] [c000000000267ffc] do_init_module+0x7c/0x350 [c000000027fa7c90] [c00000000026a180] __do_sys_init_module+0x210/0x250 [c000000027fa7db0] [c0000000000387e4] system_call_exception+0x134/0x230 [c000000027fa7e10] [c00000000000d660] system_call_common+0xf0/0x27c Instruction dump: 40810044 3d420000 e8bf0000 e88a82d0 3d420000 e90a82c8 792a1f24 7cc4302a 7d2642aa 79291f24 7d25482a 7d295214 <7d4048a8> 7d4a3b78 7d4049ad 40c2fff4 Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface") Link: https://lore.kernel.org/r/20210201112922.141085-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/hns: Fixed wrong judgments in the goto branchWenpeng Liang
[ Upstream commit bb74fe7e81c8b2b65c6a351a247fdb9a969cbaec ] When an error occurs, the qp_table must be cleared, regardless of whether the SRQ feature is enabled. Fixes: 5c1f167af112 ("RDMA/hns: Init SRQ table for hip08") Link: https://lore.kernel.org/r/1611997090-48820-5-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/rxe: Correct skb on loopback pathBob Pearson
[ Upstream commit 5120bf0a5fc15dec210a0fe0f39e4a256bb6e349 ] rxe_net.c sends packets at the IP layer with skb->data pointing at the IP header but receives packets from a UDP tunnel with skb->data pointing at the UDP header. On the loopback path this was not correctly accounted for. This patch corrects for this by using sbk_pull() to strip the IP header from the skb on received packets. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20210128182301.16859-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/rxe: Fix coding error in rxe_rcv_mcast_pktBob Pearson
[ Upstream commit 8fc1b7027fc162738d5a85c82410e501a371a404 ] rxe_rcv_mcast_pkt() in rxe_recv.c can leak SKBs in error path code. The loop over the QPs attached to a multicast group creates new cloned SKBs for all but the last QP in the list and passes the SKB and its clones to rxe_rcv_pkt() for further processing. Any QPs that do not pass some checks are skipped. If the last QP in the list fails the tests the SKB is leaked. This patch checks if the SKB for the last QP was used and if not frees it. Also removes a redundant loop invariant assignment. Fixes: 8700e3e7c485 ("Soft RoCE driver") Fixes: 71abf20b28ff ("RDMA/rxe: Handle skb_clone() failure in rxe_recv.c") Link: https://lore.kernel.org/r/20210128174752.16128-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/rxe: Fix coding error in rxe_recv.cBob Pearson
[ Upstream commit 7d9ae80e31df57dd3253e1ec514f0000aa588a81 ] check_type_state() in rxe_recv.c is written as if the type bits in the packet opcode were a bit mask which is not correct. This patch corrects this code to compare all 3 type bits to the required type. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20210127214500.3707-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04IB/cm: Avoid a loop when device has 255 portsParav Pandit
[ Upstream commit 131be26750379592f0dd6244b2a90bbb504a10bb ] When RDMA device has 255 ports, loop iterator i overflows. Due to which cm_add_one() port iterator loops infinitely. Use core provided port iterator to avoid the infinite loop. Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/20210127150010.1876121-9-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04IB/mlx5: Return appropriate error code instead of ENOMEMParav Pandit
[ Upstream commit d286ac1d05210695c312b9018b3aa7c2048e9aca ] When mlx5_ib_stage_init_init() fails, return the error code related to failure instead of -ENOMEM. Fixes: 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") Link: https://lore.kernel.org/r/20210127150010.1876121-8-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04IB/umad: Return EPOLLERR in case of when device disassociatedShay Drory
[ Upstream commit def4cd43f522253645b72c97181399c241b54536 ] Currently, polling a umad device will always works, even if the device was disassociated. A disassociated device should immediately return EPOLLERR from poll(). Otherwise userspace is endlessly hung on poll() with no idea that the device has been removed from the system. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/20210125121339.837518-3-leon@kernel.org Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04IB/umad: Return EIO in case of when device disassociatedShay Drory
[ Upstream commit 4fc5461823c9cad547a9bdfbf17d13f0da0d6bb5 ] MAD message received by the user has EINVAL error in all flows including when the device is disassociated. That makes it impossible for the applications to treat such flow differently. Change it to return EIO, so the applications will be able to perform disassociation recovery. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/20210125121339.837518-2-leon@kernel.org Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/mlx5: Use the correct obj_id upon DEVX TIR creationYishai Hadas
[ Upstream commit 8798e4ad0abe0ba1221928a46561981c510be0c6 ] Use the correct obj_id upon DEVX TIR creation by strictly taking the tirn 24 bits and not the general obj_id which is 32 bits. Fixes: 7efce3691d33 ("IB/mlx5: Add obj create and destroy functionality") Link: https://lore.kernel.org/r/20201230130121.180350-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04RDMA/siw: Fix handling of zero-sized Read and Receive Queues.Bernard Metzler
[ Upstream commit 661f385961f06f36da24cf408d461f988d0c39ad ] During connection setup, the application may choose to zero-size inbound and outbound READ queues, as well as the Receive queue. This patch fixes handling of zero-sized queues, but not prevents it. Kamal Heib says in an initial error report: When running the blktests over siw the following shift-out-of-bounds is reported, this is happening because the passed IRD or ORD from the ulp could be zero which will lead to unexpected behavior when calling roundup_pow_of_two(), fix that by blocking zero values of ORD or IRD. UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 20 PID: 3957 Comm: kworker/u64:13 Tainted: G S 5.10.0-rc6 #2 Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.1.5 04/11/2016 Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: dump_stack+0x99/0xcb ubsan_epilogue+0x5/0x40 __ubsan_handle_shift_out_of_bounds.cold.11+0xb4/0xf3 ? down_write+0x183/0x3d0 siw_qp_modify.cold.8+0x2d/0x32 [siw] ? __local_bh_enable_ip+0xa5/0xf0 siw_accept+0x906/0x1b60 [siw] ? xa_load+0x147/0x1f0 ? siw_connect+0x17a0/0x17a0 [siw] ? lock_downgrade+0x700/0x700 ? siw_get_base_qp+0x1c2/0x340 [siw] ? _raw_spin_unlock_irqrestore+0x39/0x40 iw_cm_accept+0x1f4/0x430 [iw_cm] rdma_accept+0x3fa/0xb10 [rdma_cm] ? check_flush_dependency+0x410/0x410 ? cma_rep_recv+0x570/0x570 [rdma_cm] nvmet_rdma_queue_connect+0x1a62/0x2680 [nvmet_rdma] ? nvmet_rdma_alloc_cmds+0xce0/0xce0 [nvmet_rdma] ? lock_release+0x56e/0xcc0 ? lock_downgrade+0x700/0x700 ? lock_downgrade+0x700/0x700 ? __xa_alloc_cyclic+0xef/0x350 ? __xa_alloc+0x2d0/0x2d0 ? rdma_restrack_add+0xbe/0x2c0 [ib_core] ? __ww_mutex_die+0x190/0x190 cma_cm_event_handler+0xf2/0x500 [rdma_cm] iw_conn_req_handler+0x910/0xcb0 [rdma_cm] ? _raw_spin_unlock_irqrestore+0x39/0x40 ? trace_hardirqs_on+0x1c/0x150 ? cma_ib_handler+0x8a0/0x8a0 [rdma_cm] ? __kasan_kmalloc.constprop.7+0xc1/0xd0 cm_work_handler+0x121c/0x17a0 [iw_cm] ? iw_cm_reject+0x190/0x190 [iw_cm] ? trace_hardirqs_on+0x1c/0x150 process_one_work+0x8fb/0x16c0 ? pwq_dec_nr_in_flight+0x320/0x320 worker_thread+0x87/0xb40 ? __kthread_parkme+0xd1/0x1a0 ? process_one_work+0x16c0/0x16c0 kthread+0x35f/0x430 ? kthread_mod_delayed_work+0x180/0x180 ret_from_fork+0x22/0x30 Fixes: a531975279f3 ("rdma/siw: main include file") Fixes: f29dd55b0236 ("rdma/siw: queue pair methods") Fixes: 8b6a361b8c48 ("rdma/siw: receive path") Fixes: b9be6f18cf9e ("rdma/siw: transmit path") Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Link: https://lore.kernel.org/r/20210108125845.1803-1-bmt@zurich.ibm.com Reported-by: Kamal Heib <kamalheib1@gmail.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-03RDMA/cxgb4: Fix the reported max_recv_sge valueKamal Heib
[ Upstream commit a372173bf314d374da4dd1155549d8ca7fc44709 ] The max_recv_sge value is wrongly reported when calling query_qp, This is happening due to a typo when assigning the max_recv_sge value, the value of sq_max_sges was assigned instead of rq_max_sges. Fixes: 3e5c02c9ef9a ("iw_cxgb4: Support query_qp() verb") Link: https://lore.kernel.org/r/20210114191423.423529-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19IB/mlx5: Fix error unwinding when set_has_smi_cap failsParav Pandit
commit 2cb091f6293df898b47f4e0f2e54324e2bbaf816 upstream. When set_has_smi_cap() fails, multiport master cleanup is missed. Fix it by doing the correct error unwinding goto. Fixes: a989ea01cb10 ("RDMA/mlx5: Move SMI caps logic") Link: https://lore.kernel.org/r/20210113121703.559778-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19RDMA/mlx5: Fix wrong free of blue flame register on errorMark Bloch
commit 1c3aa6bd0b823105c2030af85d92d158e815d669 upstream. If the allocation of the fast path blue flame register fails, the driver should free the regular blue flame register allocated a statement above, not the one that it just failed to allocate. Fixes: 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") Link: https://lore.kernel.org/r/20210113121703.559778-6-leon@kernel.org Reported-by: Hans Petter Selasky <hanss@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grpDinghao Liu
commit a306aba9c8d869b1fdfc8ad9237f1ed718ea55e6 upstream. If usnic_ib_qp_grp_create() fails at the first call, dev_list will not be freed on error, which leads to memleak. Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Link: https://lore.kernel.org/r/20201226074248.2893-1-dinghao.liu@zju.edu.cn Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19RDMA/restrack: Don't treat as an error allocation ID wrappingLeon Romanovsky
commit 3c638cdb8ecc0442552156e0fed8708dd2c7f35b upstream. xa_alloc_cyclic() call returns positive number if ID allocation succeeded but wrapped. It is not an error, so normalize the "ret" variable to zero as marker of not-an-error. drivers/infiniband/core/restrack.c:261 rdma_restrack_add() warn: 'ret' can be either negative or positive Fixes: fd47c2f99f04 ("RDMA/restrack: Convert internal DB from hash to XArray") Link: https://lore.kernel.org/r/20201216100753.1127638-1-leon@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()Tom Rix
commit f2bc3af6353cb2a33dfa9d270d999d839eef54cb upstream. In ocrdma_dealloc_ucontext_pd() uctx->cntxt_pd is assigned to the variable pd and then after uctx->cntxt_pd is freed, the variable pd is passed to function _ocrdma_dealloc_pd() which dereferences pd directly or through its call to ocrdma_mbx_dealloc_pd(). Reorder the free using the variable pd. Cc: stable@vger.kernel.org Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20201230024653.1516495-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30RDMA/cma: Don't overwrite sgid_attr after device is releasedLeon Romanovsky
[ Upstream commit e246b7c035d74abfb3507fa10082d0c42cc016c3 ] As part of the cma_dev release, that pointer will be set to NULL. In case it happens in rdma_bind_addr() (part of an error flow), the next call to addr_handler() will have a call to cma_acquire_dev_by_src_ip() which will overwrite sgid_attr without releasing it. WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline] WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649 CPU: 2 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: ib_addr process_one_req RIP: 0010:cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline] RIP: 0010:cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649 Code: 66 d9 4a ff 4d 8b 6e 10 49 8d bd 1c 08 00 00 e8 b6 d6 4a ff 45 0f b6 bd 1c 08 00 00 41 83 e7 01 e9 49 fd ff ff e8 90 c5 29 ff <0f> 0b e9 80 fe ff ff e8 84 c5 29 ff 4c 89 f7 e8 2c d9 4a ff 4d 8b RSP: 0018:ffff8881047c7b40 EFLAGS: 00010293 RAX: ffff888104789c80 RBX: 0000000000000001 RCX: ffffffff820b8ef8 RDX: 0000000000000000 RSI: ffffffff820b9080 RDI: ffff88810cd4c998 RBP: ffff8881047c7c08 R08: ffff888104789c80 R09: ffffed10209f4036 R10: ffff888104fa01ab R11: ffffed10209f4035 R12: ffff88810cd4c800 R13: ffff888105750e28 R14: ffff888108f0a100 R15: ffff88810cd4c998 FS: 0000000000000000(0000) GS:ffff888119c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000104e60005 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: addr_handler+0x266/0x350 drivers/infiniband/core/cma.c:3190 process_one_req+0xa3/0x300 drivers/infiniband/core/addr.c:645 process_one_work+0x54c/0x930 kernel/workqueue.c:2272 worker_thread+0x82/0x830 kernel/workqueue.c:2418 kthread+0x1ca/0x220 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Fixes: ff11c6cd521f ("RDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()") Link: https://lore.kernel.org/r/20201213132940.345554-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30RDMA/core: Do not indicate device ready when device enablement failsJack Morgenstein
[ Upstream commit 779e0bf47632c609c59f527f9711ecd3214dccb0 ] In procedure ib_register_device, procedure kobject_uevent is called (advertising that the device is ready for userspace usage) even when device_enable_and_get() returned an error. As a result, various RDMA modules attempted to register for the device even while the device driver was preparing to unregister the device. Fix this by advertising the device availability only after enabling the device succeeds. Fixes: e7a5b4aafd82 ("RDMA/device: Don't fire uevent before device is fully initialized") Link: https://lore.kernel.org/r/20201208073545.9723-3-leon@kernel.org Suggested-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30RDMA/cxgb4: Validate the number of CQEsKamal Heib
[ Upstream commit 6d8285e604e0221b67bd5db736921b7ddce37d00 ] Before create CQ, make sure that the requested number of CQEs is in the supported range. Fixes: cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Link: https://lore.kernel.org/r/20201108132007.67537-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30RDMa/mthca: Work around -Wenum-conversion warningArnd Bergmann
[ Upstream commit fbb7dc5db6dee553b5a07c27e86364a5223e244c ] gcc points out a suspicious mixing of enum types in a function that converts from MTHCA_OPCODE_* values to IB_WC_* values: drivers/infiniband/hw/mthca/mthca_cq.c: In function 'mthca_poll_one': drivers/infiniband/hw/mthca/mthca_cq.c:607:21: warning: implicit conversion from 'enum <anonymous>' to 'enum ib_wc_opcode' [-Wenum-conversion] 607 | entry->opcode = MTHCA_OPCODE_INVALID; Nothing seems to ever check for MTHCA_OPCODE_INVALID again, no idea if this is meaningful, but it seems harmless as it deals with an invalid input. Remove MTHCA_OPCODE_INVALID and set the ib_wc_opcode to 0xFF, which is still bogus, but at least doesn't make compiler warnings. Fixes: 2a4443a69934 ("[PATCH] IB/mthca: fill in opcode field for send completions") Link: https://lore.kernel.org/r/20201026211311.3887003-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30RDMA/rxe: Compute PSN windows correctlyBob Pearson
[ Upstream commit bb3ab2979fd69db23328691cb10067861df89037 ] The code which limited the number of unacknowledged PSNs was incorrect. The PSNs are limited to 24 bits and wrap back to zero from 0x00ffffff. The test was computing a 32 bit value which wraps at 32 bits so that qp->req.psn can appear smaller than the limit when it is actually larger. Replace '>' test with psn_compare which is used for other PSN comparisons and correctly handles the 24 bit size. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20201013170741.3590-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30RDMA/bnxt_re: Set queue pair state when being queriedKamal Heib
[ Upstream commit 53839b51a7671eeb3fb44d479d541cf3a0f2dd45 ] The API for ib_query_qp requires the driver to set cur_qp_state on return, add the missing set. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/20201021114952.38876-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewaitLeon Romanovsky
[ Upstream commit 340b940ea0ed12d9adbb8f72dea17d516b2019e8 ] If cm_create_timewait_info() fails, the timewait_info pointer will contain an error value and will be used in cm_remove_remote() later. general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0×0000000000000120-0×0000000000000127] CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:cm_remove_remote.isra.0+0x24/0×170 drivers/infiniband/core/cm.c:978 Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00 RSP: 0018:ffff888013127918 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000 RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4 RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70 R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c FS: 00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: cm_destroy_id+0x189/0×15b0 drivers/infiniband/core/cm.c:1155 cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline] rdma_connect_locked+0x1100/0×17c0 drivers/infiniband/core/cma.c:4107 rdma_connect+0x2a/0×40 drivers/infiniband/core/cma.c:4140 ucma_connect+0x277/0×340 drivers/infiniband/core/ucma.c:1069 ucma_write+0x236/0×2f0 drivers/infiniband/core/ucma.c:1724 vfs_write+0x220/0×830 fs/read_write.c:603 ksys_write+0x1df/0×240 fs/read_write.c:658 do_syscall_64+0x33/0×40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/20201204064205.145795-1-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reported-by: Amit Matityahu <mitm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-08RDMA/i40iw: Address an mmap handler exploit in i40iwShiraz Saleem
commit 2ed381439e89fa6d1a0839ef45ccd45d99d8e915 upstream. i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range without any validation. This is vulnerable to an mmap exploit as described in: https://lore.kernel.org/r/20201119093523.7588-1-zhudi21@huawei.com The push feature is disabled in the driver currently and therefore no push mmaps are issued from user-space. The feature does not work as expected in the x722 product. Remove the push module parameter and all VMA attribute manipulations for this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user mmapings at offset = 0. Check vm_pgoff for zero and if the mmaps are bound to a single page. Cc: <stable@kernel.org> Fixes: d37498417947 ("i40iw: add files for iwarp interface") Link: https://lore.kernel.org/r/20201125005616.1800-2-shiraz.saleem@intel.com Reported-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-02RDMA/hns: Bugfix for memory window mtpt configurationYixian Liu
[ Upstream commit 17475e104dcb74217c282781817f8f52b46130d3 ] When a memory window is bound to a memory region, the local write access should be set for its mtpt table. Fixes: c7c28191408b ("RDMA/hns: Add MW support for hip08") Link: https://lore.kernel.org/r/1606386372-21094-1-git-send-email-liweihang@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-02RDMA/hns: Fix retry_cnt and rnr_cnt when querying QPWenpeng Liang
[ Upstream commit ab6f7248cc446b85fe9e31091670ad7c4293d7fd ] The maximum number of retransmission should be returned when querying QP, not the value of retransmission counter. Fixes: 99fcf82521d9 ("RDMA/hns: Fix the wrong value of rnr_retry when querying qp") Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1606382977-21431-1-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-02IB/mthca: fix return value of error branch in mthca_init_cq()Xiongfeng Wang
[ Upstream commit 6830ff853a5764c75e56750d59d0bbb6b26f1835 ] We return 'err' in the error branch, but this variable may be set as zero by the above code. Fix it by setting 'err' as a negative value before we goto the error label. Fixes: 74c2174e7be5 ("IB uverbs: add mthca user CQ support") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/1605837422-42724-1-git-send-email-wangxiongfeng2@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configsChristoph Hellwig
[ Upstream commit b1e678bf290db5a76f1b6a9f7c381310e03440d6 ] dma_virt_ops requires that all pages have a kernel virtual address. Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM and make all three drivers depend on the new symbol. Also remove the ARCH_DMA_ADDR_T_64BIT dependency, which has been obsolete since commit 4965a68780c5 ("arch: define the ARCH_DMA_ADDR_T_64BIT config symbol in lib/Kconfig") Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops") Link: https://lore.kernel.org/r/20201106181941.1878556-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24RDMA/pvrdma: Fix missing kfree() in pvrdma_register_device()Qinglang Miao
[ Upstream commit d035c3f6cdb8e5d5a17adcbb79d7453417a6077d ] Fix missing kfree in pvrdma_register_device() when failure from ib_device_set_netdev(). Fixes: 4b38da75e089 ("RDMA/drivers: Convert easy drivers to use ib_device_set_netdev()") Link: https://lore.kernel.org/r/20201111032202.17925-1-miaoqinglang@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05RDMA/qedr: Fix memory leak in iWARP CMAlok Prasad
[ Upstream commit a2267f8a52eea9096861affd463f691be0f0e8c9 ] Fixes memory leak in iWARP CM Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions") Link: https://lore.kernel.org/r/20201021115008.28138-1-palok@marvell.com Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-01RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()Jason Gunthorpe
commit 2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e upstream. This three thread race can result in the work being run once the callback becomes NULL: CPU1 CPU2 CPU3 netevent_callback() process_one_req() rdma_addr_cancel() [..] spin_lock_bh() set_timeout() spin_unlock_bh() spin_lock_bh() list_del_init(&req->list); spin_unlock_bh() req->callback = NULL spin_lock_bh() if (!list_empty(&req->list)) // Skipped! // cancel_delayed_work(&req->work); spin_unlock_bh() process_one_req() // again req->callback() // BOOM cancel_delayed_work_sync() The solution is to always cancel the work once it is completed so any in between set_timeout() does not result in it running again. Cc: stable@vger.kernel.org Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence") Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.org Reported-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29RDMA/rxe: Handle skb_clone() failure in rxe_recv.cBob Pearson
[ Upstream commit 71abf20b28ff87fee6951ec2218d5ce7969c4e87 ] If skb_clone() is unable to allocate memory for a new sk_buff this is not detected by the current code. Check for a NULL return and continue. This is similar to other errors in this loop over QPs attached to the multicast address and consistent with the unreliable UD transport. Fixes: e7ec96fc7932f ("RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()") Addresses-Coverity-ID: 1497804: Null pointer dereferences (NULL_RETURNS) Link: https://lore.kernel.org/r/20201013184236.5231-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()Bob Pearson
[ Upstream commit e7ec96fc7932f48a6d6cdd05bf82004a1a04285b ] The changes referenced below replaced sbk_clone)_ by taking additional references, passing the skb along and then freeing the skb. This deleted the packets before they could be processed and additionally passed bad data in each packet. Since pkt is stored in skb->cb changing pkt->qp changed it for all the packets. Replace skb_get() by sbk_clone() in rxe_rcv_mcast_pkt() for cases where multiple QPs are receiving multicast packets on the same address. Delete kfree_skb() because the packets need to live until they have been processed by each QP. They are freed later. Fixes: 86af61764151 ("IB/rxe: remove unnecessary skb_clone") Fixes: fe896ceb5772 ("IB/rxe: replace refcount_inc with skb_get") Link: https://lore.kernel.org/r/20201008203651.256958-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29IB/rdmavt: Fix sizeof mismatchColin Ian King
[ Upstream commit 8e71f694e0c819db39af2336f16eb9689f1ae53f ] An incorrect sizeof is being used, struct rvt_ibport ** is not correct, it should be struct rvt_ibport *. Note that since ** is the same size as * this is not causing any issues. Improve this fix by using sizeof(*rdi->ports) as this allows us to not even reference the type of the pointer. Also remove line breaks as the entire statement can fit on one line. Link: https://lore.kernel.org/r/20201008095204.82683-1-colin.king@canonical.com Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)") Fixes: ff6acd69518e ("IB/rdmavt: Add device structure allocation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/ipoib: Set rtnl_link_ops for ipoib interfacesKamal Heib
[ Upstream commit 5ce2dced8e95e76ff7439863a118a053a7fc6f91 ] Report the "ipoib pkey", "mode" and "umcast" netlink attributes for every IPoiB interface type, not just children created with 'ip link add'. After setting the rtnl_link_ops for the parent interface, implement the dellink() callback to block users from trying to remove it. Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Link: https://lore.kernel.org/r/20201004132948.26669-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/hns: Fix missing sq_sig_type when querying QPWeihang Li
[ Upstream commit 05df49279f8926178ecb3ce88e61b63104cd6293 ] The sq_sig_type field should be filled when querying QP, or the users may get a wrong value. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1600509802-44382-9-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/hns: Fix the wrong value of rnr_retry when querying qpWenpeng Liang
[ Upstream commit 99fcf82521d91468ee6115a3c253aa032dc63cbc ] The rnr_retry returned to the user is not correct, it should be got from another fields in QPC. Fixes: bfe860351e31 ("RDMA/hns: Fix cast from or to restricted __le32 for driver") Link: https://lore.kernel.org/r/1600509802-44382-7-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29i40iw: Add support to make destroy QP synchronousSindhu, Devale
[ Upstream commit f2334964e969762e266a616acf9377f6046470a2 ] Occasionally ib_write_bw crash is seen due to access of a pd object in i40iw_sc_qp_destroy after it is freed. Destroy qp is not synchronous in i40iw and thus the iwqp object could be referencing a pd object that is freed by ib core as a result of successful return from i40iw_destroy_qp. Wait in i40iw_destroy_qp till all QP references are released and destroy the QP and its associated resources before returning. Switch to use the refcount API vs atomic API for lifetime management of the qp. RIP: 0010:i40iw_sc_qp_destroy+0x4b/0x120 [i40iw] [...] RSP: 0018:ffffb4a7042e3ba8 EFLAGS: 00010002 RAX: 0000000000000000 RBX: 0000000000000001 RCX: dead000000000122 RDX: ffffb4a7042e3bac RSI: ffff8b7ef9b1e940 RDI: ffff8b7efbf09080 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 8080808080808080 R11: 0000000000000010 R12: ffff8b7efbf08050 R13: 0000000000000001 R14: ffff8b7f15042928 R15: ffff8b7ef9b1e940 FS: 0000000000000000(0000) GS:ffff8b7f2fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000400 CR3: 000000020d60a006 CR4: 00000000001606e0 Call Trace: i40iw_exec_cqp_cmd+0x4d3/0x5c0 [i40iw] ? try_to_wake_up+0x1ea/0x5d0 ? __switch_to_asm+0x40/0x70 i40iw_process_cqp_cmd+0x95/0xa0 [i40iw] i40iw_handle_cqp_op+0x42/0x1a0 [i40iw] ? cm_event_handler+0x13c/0x1f0 [iw_cm] i40iw_rem_ref+0xa0/0xf0 [i40iw] cm_work_handler+0x99c/0xd10 [iw_cm] process_one_work+0x1a1/0x360 worker_thread+0x30/0x380 ? process_one_work+0x360/0x360 kthread+0x10c/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x35/0x40 Fixes: d37498417947 ("i40iw: add files for iwarp interface") Link: https://lore.kernel.org/r/20200916131811.2077-1-shiraz.saleem@intel.com Reported-by: Kamal Heib <kheib@redhat.com> Signed-off-by: Sindhu, Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't workJason Gunthorpe
[ Upstream commit 0ec52f0194638e2d284ad55eba5a7aff753de1b9 ] set_reg_wr() always fails if !umr_modify_entity_size_disabled because mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR doesn't work and that means the device should not advertise IB_DEVICE_MEM_MGT_EXTENSIONS. Fixes: 841b07f99a47 ("IB/mlx5: Block MR WR if UMR is not possible") Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/hns: Set the unsupported wr opcodeLijun Ou
[ Upstream commit 22d3e1ed2cc837af87f76c3c8a4ccf4455e225c5 ] hip06 does not support IB_WR_LOCAL_INV, so the ps_opcode should be set to an invalid value instead of being left uninitialized. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Fixes: a2f3d4479fe9 ("RDMA/hns: Avoid unncessary initialization") Link: https://lore.kernel.org/r/1600350615-115217-1-git-send-email-oulijun@huawei.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/cma: Consolidate the destruction of a cma_multicast in one placeJason Gunthorpe
[ Upstream commit 3788d2997bc0150ea911a964d5b5a2e11808a936 ] Two places were open coding this sequence, and also pull in cma_leave_roce_mc_group() which was called only once. Link: https://lore.kernel.org/r/20200902081122.745412-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/cma: Remove dead code for kernel rdmacm multicastJason Gunthorpe
[ Upstream commit 1bb5091def706732c749df9aae45fbca003696f2 ] There is no kernel user of RDMA CM multicast so this code managing the multicast subscription of the kernel-only internal QP is dead. Remove it. This makes the bug fixes in the next patches much simpler. Link: https://lore.kernel.org/r/20200902081122.745412-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/qedr: Fix inline size returned for iWARPMichal Kalderon
[ Upstream commit fbf58026b2256e9cd5f241a4801d79d3b2b7b89d ] commit 59e8970b3798 ("RDMA/qedr: Return max inline data in QP query result") changed query_qp max_inline size to return the max roce inline size. When iwarp was introduced, this should have been modified to return the max inline size based on protocol. This size is cached in the device attributes Fixes: 69ad0e7fe845 ("RDMA/qedr: Add support for iWARP in user space") Link: https://lore.kernel.org/r/20200902165741.8355-8-michal.kalderon@marvell.com Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/qedr: Fix return code if accept is called on a destroyed qpMichal Kalderon
[ Upstream commit 8a5a10a1a74465065c75d9de1aa6685e1f1aa117 ] In iWARP, accept could be called after a QP is already destroyed. In this case an error should be returned and not success. Fixes: 82af6d19d8d9 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr") Link: https://lore.kernel.org/r/20200902165741.8355-5-michal.kalderon@marvell.com Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29RDMA/qedr: Fix use of uninitialized fieldMichal Kalderon
[ Upstream commit a379ad54e55a12618cae7f6333fd1b3071de9606 ] dev->attr.page_size_caps was used uninitialized when setting device attributes Fixes: ec72fce401c6 ("qedr: Add support for RoCE HW init") Link: https://lore.kernel.org/r/20200902165741.8355-4-michal.kalderon@marvell.com Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>