aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2019-08-17RDMA/hns: Fix bad endianess of port_pd variableLeon Romanovsky
commit 6734b2973565e36659e97e12ab0d0faf1d9f3fbe upstream. port_pd is treated as le32 in declaration and read, fix assignment to be in le32 too. This change fixes the following compilation warnings. drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: warning: incorrect type in assignment (different base types) drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: expected restricted __le32 [usertype] port_pd drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: got restricted __be32 [usertype] Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Lijun Ou <ouliun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-17RDMA/cxgb4: Fix null pointer dereference on alloc_skb failureColin Ian King
commit a6d2a5a92e67d151c98886babdc86d530d27111c upstream. Currently if alloc_skb fails to allocate the skb a null skb is passed to t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid the NULL pointer dereference by checking for a NULL skb and returning early. Addresses-Coverity: ("Dereference null return") Fixes: b38a0ad8ec11 ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02RDMA/cma: Consider scope_id while binding to ipv6 ll addressParav Pandit
commit 5d7ed2f27bbd482fd29e6b2e204b1a1ee8a0b268 upstream. When two netdev have same link local addresses (such as vlan and non vlan), two rdma cm listen id should be able to bind to following different addresses. listener-1: addr=lla, scope_id=A, port=X listener-2: addr=lla, scope_id=B, port=X However while comparing the addresses only addr and port are considered, due to which 2nd listener fails to listen. In below example of two listeners, 2nd listener is failing with address in use error. $ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545& $ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545 rdma_bind_addr: Address already in use To overcome this, consider the scope_ids as well which forms the accurate IPv6 link local address. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-08-02IB/hfi1: Fix WQ_MEM_RECLAIM warningMike Marciniszyn
commit 4c4b1996b5db688e2dcb8242b0a3bf7b1e845e42 upstream. The work_item cancels that occur when a QP is destroyed can elicit the following trace: workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1] WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100 Call Trace: __flush_work.isra.29+0x8c/0x1a0 ? __switch_to_asm+0x40/0x70 __cancel_work_timer+0x103/0x190 ? schedule+0x32/0x80 iowait_cancel_work+0x15/0x30 [hfi1] rvt_reset_qp+0x1f8/0x3e0 [rdmavt] rvt_destroy_qp+0x65/0x1f0 [rdmavt] ? _cond_resched+0x15/0x30 ib_destroy_qp+0xe9/0x230 [ib_core] ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib] process_one_work+0x171/0x370 worker_thread+0x49/0x3f0 kthread+0xf8/0x130 ? max_active_store+0x80/0x80 ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM. The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become entangled with memory reclaim, so this flag is appropriate. Fixes: 0a226edd203f ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-30RDMA/hns: Fix bug that caused srq creation to failLijun Ou
commit 4772e03d239484f3461e33c79d721c8ea03f7416 upstream. Due to the incorrect use of the seg and obj information, the position of the mtt is calculated incorrectly, and the free space of the page is not enough to store the entire mtt, resulting in access to the next page. This patch fixes this problem. Unable to handle kernel paging request at virtual address ffff00006e3cd000 ... Call trace: hns_roce_write_mtt+0x154/0x2f0 [hns_roce] hns_roce_buf_write_mtt+0xa8/0xd8 [hns_roce] hns_roce_create_srq+0x74c/0x808 [hns_roce] ib_create_srq+0x28/0xc8 Fixes: 0203b14c4f32 ("RDMA/hns: Unify the calculation for hem index in hip08") Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-30RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_removeKamal Heib
commit ea7a5c706fa49273cf6d1d9def053ecb50db2076 upstream. Make sure to free the DSR on pvrdma_pci_remove() to avoid the memory leak. Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-30IB/hfi1: Fix the allocation of RSM tableKaike Wan
commit d0294344470e6b52d097aa7369173f32d11f2f52 upstream. The receive side mapping (RSM) on hfi1 hardware is a special matching mechanism to direct an incoming packet to a given hardware receive context. It has 4 instances of matching capabilities (RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of 256 entries, each of which points to a receive context. Currently, three instances of RSM have been used: 1. RSM0 by QOS; 2. RSM1 by PSM FECN; 3. RSM2 by VNIC. Each RSM instance should reserve enough entries in RMT to function properly. Since both PSM and VNIC could allocate any receive context between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must reserve enough RMT entries to cover the entire receive context index range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only the user receive contexts allocated for PSM (dd->num_user_contexts). Consequently, the sizing of dd->num_user_contexts in set_up_context_variables is incorrect. Fixes: 2280740f01ae ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-30IB/hfi1: Eliminate opcode tests on mr derefKaike Wan
commit a8639a79e85c18c16c10089edd589c7948f19bbd upstream. When an old ack_queue entry is used to store an incoming request, it may need to clean up the old entry if it is still referencing the MR. Originally only RDMA READ request needed to reference MR on the responder side and therefore the opcode was tested when cleaning up the old entry. The introduction of tid rdma specific operations in the ack_queue makes the specific opcode tests wrong. Multiple opcodes (RDMA READ, TID RDMA READ, and TID RDMA WRITE) may need MR ref cleanup. Remove the opcode specific tests associated with the ack_queue. Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-23IB/core: Destroy QP if XRC QP failsYuval Avnery
commit 535005ca8e5e71918d64074032f4b9d4fef8981e upstream. The open-coded variant missed destroy of SELinux created QP, reuse already existing ib_detroy_qp() call and use this opportunity to clean ib_create_qp() from double prints and unclear exit paths. Reported-by: Parav Pandit <parav@mellanox.com> Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-23IB/core: Fix potential memory leak while creating MAD agentsDaniel Jurgens
commit 6e88e672b69f0e627acdae74a527b730ea224b6b upstream. If the MAD agents isn't allowed to manage the subnet, or fails to register for the LSM notifier, the security context is leaked. Free the context in these cases. Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-23IB/core: Unregister notifier before freeing MAD securityDaniel Jurgens
commit d60667fc398ed34b3c7456b020481c55c760e503 upstream. If the notifier runs after the security context is freed an access of freed memory can occur. Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-23scsi: RDMA/srpt: Fix a credit leak for aborted commandsBart Van Assche
commit 40ca8757291ca7a8775498112d320205b2a2e571 upstream. Make sure that the next time a response is sent to the initiator that the credit it had allocated for the aborted request gets freed. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Fixes: 131e6abc674e ("target: Add TFO->abort_task for aborted task resources release") # v3.15 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-15RDMA/mlx5: Do not allow the user to write to the clock pageJason Gunthorpe
commit c660133c339f9ab684fdf568c0d51b9ae5e86002 upstream. The intent of this VMA was to be read-only from user space, but the VM_MAYWRITE masking was missed, so mprotect could make it writable. Cc: stable@vger.kernel.org Fixes: 5c99eaecb1fc ("IB/mlx5: Mmap the HCA's clock info to user-space") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-15IB/rdmavt: Fix frwr memory registrationJosh Collier
commit 7c39f7f671d2acc0a1f39ebbbee4303ad499bbfa upstream. Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client. The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue. Fixes: a41081aa5936 ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-05i40iw: Avoid panic when handling the inetdev eventFeng Tang
commit ec4fe4bcc584b55e24e8d1768f5510a62c0fd619 upstream. There is a panic reported that on a system with x722 ethernet, when doing the operations like: # ip link add br0 type bridge # ip link set eno1 master br0 # systemctl restart systemd-networkd The system will panic "BUG: unable to handle kernel null pointer dereference at 0000000000000034", with call chain: i40iw_inetaddr_event notifier_call_chain blocking_notifier_call_chain notifier_call_chain __inet_del_ifa inet_rtm_deladdr rtnetlink_rcv_msg netlink_rcv_skb rtnetlink_rcv netlink_unicast netlink_sendmsg sock_sendmsg __sys_sendto It is caused by "local_ipaddr = ntohl(in->ifa_list->ifa_address)", while the in->ifa_list is NULL. So add a check for the "in->ifa_list == NULL" case, and skip the ARP operation accordingly. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-05IB/mlx4: Fix race condition between catas error reset and aliasguid flowsJack Morgenstein
commit 587443e7773e150ae29e643ee8f41a1eed226565 upstream. Code review revealed a race condition which could allow the catas error flow to interrupt the alias guid query post mechanism at random points. Thiis is fixed by doing cancel_delayed_work_sync() instead of cancel_delayed_work() during the alias guid mechanism destroy flow. Fixes: a0c64a17aba8 ("mlx4: Add alias_guid mechanism") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-04-12iw_cxgb4: fix srqidx leak during connection abortRaju Rangoju
commit f368ff188ae4b3ef6f740a15999ea0373261b619 upstream. When an application aborts the connection by moving QP from RTS to ERROR, then iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets the *srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0), and aborts the connection by calling c4iw_ep_disconnect(). c4iw_ep_disconnect() does the following: 1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4. 2. sends abort request CPL to hw. But, since the close_complete_upcall() is sent before sending the ABORT_REQ to hw, libcxgb4 would fail to release the srqidx if the connection holds one. Because, the srqidx is passed up to libcxgb4 only after corresponding ABORT_RPL is processed by kernel in abort_rpl(). This patch handle the corner-case by moving the call to close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl(). So that libcxgb4 is notified about the -ECONNRESET only after abort_rpl(), and libcxgb4 can relinquish the srqidx properly. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-04-12IB/mlx4: Increase the timeout for CM cacheHåkon Bugge
commit 2612d723aadcf8281f9bf8305657129bd9f3cd57 upstream. Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted form the cache. But life is not perfect, remote peers may die or be rebooted. Hence, it's a timeout to wipe out a cache entry, when the PF driver assumes the remote peer has gone. During workloads where a high number of QPs are destroyed concurrently, excessive amount of CM DREQ retries has been observed The problem can be demonstrated in a bare-metal environment, where two nodes have instantiated 8 VFs each. This using dual ported HCAs, so we have 16 vPorts per physical server. 64 processes are associated with each vPort and creates and destroys one QP for each of the remote 64 processes. That is, 1024 QPs per vPort, all in all 16K QPs. The QPs are created/destroyed using the CM. When tearing down these 16K QPs, excessive CM DREQ retries (and duplicates) are observed. With some cat/paste/awk wizardry on the infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the nodes: cm_rx_duplicates: dreq 2102 cm_rx_msgs: drep 1989 dreq 6195 rep 3968 req 4224 rtu 4224 cm_tx_msgs: drep 4093 dreq 27568 rep 4224 req 3968 rtu 3968 cm_tx_retries: dreq 23469 Note that the active/passive side is equally distributed between the two nodes. Enabling pr_debug in cm.c gives tons of: [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL! By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the tear-down phase of the application is reduced from approximately 90 to 50 seconds. Retries/duplicates are also significantly reduced: cm_rx_duplicates: dreq 2460 [] cm_tx_retries: dreq 3010 req 47 Increasing the timeout further didn't help, as these duplicates and retries stems from a too short CMA timeout, which was 20 (~4 seconds) on the systems. By increasing the CMA timeout to 22 (~17 seconds), the numbers fell down to about 10 for both of them. Adjustment of the CMA timeout is not part of this commit. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-04-07RDMA/cma: Rollback source IP address if failing to acquire deviceMyungho Jung
commit 5fc01fb846bce8fa6d5f95e2625b8ce0f8e86810 upstream. If cma_acquire_dev_by_src_ip() returns error in addr_handler(), the device state changes back to RDMA_CM_ADDR_BOUND but the resolved source IP address is still left. After that, if rdma_destroy_id() is called after rdma_listen(), the device is freed without removed from listen_any_list in cma_cancel_operation(). Revert to the previous IP address if acquiring device fails. Reported-by: syzbot+f3ce716af730c8f96637@syzkaller.appspotmail.com Signed-off-by: Myungho Jung <mhjungk@gmail.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-03-26IB/hfi1: Close race condition on user context disable and closeMichael J. Ruhl
commit bc5add09764c123f58942a37c8335247e683d234 upstream. When disabling and removing a receive context, it is possible for an asynchronous event (i.e IRQ) to occur. Because of this, there is a race between cleaning up the context, and the context being used by the asynchronous event. cpu 0 (context cleanup) rc->ref_count-- (ref_count == 0) hfi1_rcd_free() cpu 1 (IRQ (with rcd index)) rcd_get_by_index() lock ref_count+++ <-- reference count race (WARNING) return rcd unlock cpu 0 hfi1_free_ctxtdata() <-- incorrect free location lock remove rcd from array unlock free rcd This race will cause the following WARNING trace: WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 Call Trace: dump_stack+0x19/0x1b __warn+0xd8/0x100 warn_slowpath_null+0x1d/0x20 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] is_rcv_urgent_int+0x24/0x90 [hfi1] general_interrupt+0x1b6/0x210 [hfi1] __handle_irq_event_percpu+0x44/0x1c0 handle_irq_event_percpu+0x32/0x80 handle_irq_event+0x3c/0x60 handle_edge_irq+0x7f/0x150 handle_irq+0xe4/0x1a0 do_IRQ+0x4d/0xf0 common_interrupt+0x162/0x162 The race can also lead to a use after free which could be similar to: general protection fault: 0000 1 SMP CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230 Call Trace: ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_aio_write+0xba/0x110 [hfi1] do_sync_readv_writev+0x7b/0xd0 do_readv_writev+0xce/0x260 ? handle_mm_fault+0x39d/0x9b0 ? pick_next_task_fair+0x5f/0x1b0 ? sched_clock_cpu+0x85/0xc0 ? __schedule+0x13a/0x890 vfs_writev+0x35/0x60 SyS_writev+0x7f/0x110 system_call_fastpath+0x22/0x27 Use the appropriate kref API to verify access. Reorder context cleanup to ensure context removal before cleanup occurs correctly. Cc: stable@vger.kernel.org # v4.14.0+ Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting receive contexts") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-03-20IB/ipoib: Fix for use-after-free in ipoib_cm_tx_startFeras Daoud
commit 6ab4aba00f811a5265acc4d3eb1863bb3ca60562 upstream. The following BUG was reported by kasan: BUG: KASAN: use-after-free in ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib] Read of size 80 at addr ffff88034c30bcd0 by task kworker/u16:1/24020 Workqueue: ipoib_wq ipoib_cm_tx_start [ib_ipoib] Call Trace: dump_stack+0x9a/0xeb print_address_description+0xe3/0x2e0 kasan_report+0x18a/0x2e0 ? ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib] memcpy+0x1f/0x50 ipoib_cm_tx_start+0x430/0x1390 [ib_ipoib] ? kvm_clock_read+0x1f/0x30 ? ipoib_cm_skb_reap+0x610/0x610 [ib_ipoib] ? __lock_is_held+0xc2/0x170 ? process_one_work+0x880/0x1960 ? process_one_work+0x912/0x1960 process_one_work+0x912/0x1960 ? wq_pool_ids_show+0x310/0x310 ? lock_acquire+0x145/0x440 worker_thread+0x87/0xbb0 ? process_one_work+0x1960/0x1960 kthread+0x314/0x3d0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 Allocated by task 0: kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc_trace+0x168/0x3e0 path_rec_create+0xa2/0x1f0 [ib_ipoib] ipoib_start_xmit+0xa98/0x19e0 [ib_ipoib] dev_hard_start_xmit+0x159/0x8d0 sch_direct_xmit+0x226/0xb40 __dev_queue_xmit+0x1d63/0x2950 neigh_update+0x889/0x1770 arp_process+0xc47/0x21f0 arp_rcv+0x462/0x760 __netif_receive_skb_core+0x1546/0x2da0 netif_receive_skb_internal+0xf2/0x590 napi_gro_receive+0x28e/0x390 ipoib_ib_handle_rx_wc_rss+0x873/0x1b60 [ib_ipoib] ipoib_rx_poll_rss+0x17d/0x320 [ib_ipoib] net_rx_action+0x427/0xe30 __do_softirq+0x28e/0xc42 Freed by task 26680: __kasan_slab_free+0x11d/0x160 kfree+0xf5/0x360 ipoib_flush_paths+0x532/0x9d0 [ib_ipoib] ipoib_set_mode_rss+0x1ad/0x560 [ib_ipoib] set_mode+0xc8/0x150 [ib_ipoib] kernfs_fop_write+0x279/0x440 __vfs_write+0xd8/0x5c0 vfs_write+0x15e/0x470 ksys_write+0xb8/0x180 do_syscall_64+0x9b/0x420 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff88034c30bcc8 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 8 bytes inside of 512-byte region [ffff88034c30bcc8, ffff88034c30bec8) The buggy address belongs to the page: The following race between change mode and xmit flow is the reason for this use-after-free: Change mode Send packet 1 to GID XX Send packet 2 to GID XX | | | start | | | | | | | | | Create new path for GID XX | | and update neigh path | | | | | | | | | | flush_paths | | | | queue_work(cm.start_task) | | Path for GID XX not found | create new path | | start_task runs with old released path There is no locking to protect the lifetime of the path through the ipoib_cm_tx struct, so delete it entirely and always use the newly looked up path under the priv->lock. Fixes: 546481c2816e ("IB/ipoib: Fix memory corruption in ipoib cm mode connect flow") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-03-20RDMA/ipoib: Fix use of sizeof()Kamal Heib
commit b1b639708f7431c85df4f70ae0d82c336705d7d4 upstream. Make sure to use sizeof(...) instead of sizeof ... which is more preferred. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-03-20IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMMBrian Welty
commit 904bba211acc2112fdf866e5a2bc6cd9ecd0de1b upstream. The work completion length for a receiving a UD send with immediate is short by 4 bytes causing application using this opcode to fail. The UD receive logic incorrectly subtracts 4 bytes for immediate value. These bytes are already included in header length and are used to calculate header/payload split, so the result is these 4 bytes are subtracted twice, once when the header length subtracted from the overall length and once again in the UD opcode specific path. Remove the extra subtraction when handling the opcode. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-03-08RDMA/srp: Rework SCSI device reset handlingBart Van Assche
commit 48396e80fb6526ea5ed267bd84f028bae56d2f9e upstream. Since .scsi_done() must only be called after scsi_queue_rq() has finished, make sure that the SRP initiator driver does not call .scsi_done() while scsi_queue_rq() is in progress. Although invoking sg_reset -d while I/O is in progress works fine with kernel v4.20 and before, that is not the case with kernel v5.0-rc1. This patch avoids that the following crash is triggered with kernel v5.0-rc1: BUG: unable to handle kernel NULL pointer dereference at 0000000000000138 CPU: 0 PID: 360 Comm: kworker/0:1H Tainted: G B 5.0.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:blk_mq_dispatch_rq_list+0x116/0xb10 Call Trace: blk_mq_sched_dispatch_requests+0x2f7/0x300 __blk_mq_run_hw_queue+0xd6/0x180 blk_mq_run_work_fn+0x27/0x30 process_one_work+0x4f1/0xa20 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Cc: <stable@vger.kernel.org> Fixes: 94a9174c630c ("IB/srp: reduce lock coverage of command completion") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-03-01IB/hfi1: Unreserve a reserved request when it is completedKaike Wan
commit ca95f802ef5139722acc8d30aeaab6fe5bbe939e upstream. Currently, When a reserved operation is completed, its entry in the send queue will not be unreserved, which leads to the miscalculation of qp->s_avail and thus the triggering of a WARN_ON call trace. This patch fixes the problem by unreserving the reserved operation when it is completed. Fixes: 856cc4c237ad ("IB/hfi1: Add the capability for reserved operations") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-02-19IB/hfi1: Remove overly conservative VM_EXEC flag checkMichael J. Ruhl
commit 7709b0dc265f28695487712c45f02bbd1f98415d upstream. Applications that use the stack for execution purposes cause userspace PSM jobs to fail during mmap(). Both Fortran (non-standard format parsing) and C (callback functions located in the stack) applications can be written such that stack execution is required. The linker notes this via the gnu_stack ELF flag. This causes READ_IMPLIES_EXEC to be set which forces all PROT_READ mmaps to have PROT_EXEC for the process. Checking for VM_EXEC bit and failing the request with EPERM is overly conservative and will break any PSM application using executable stacks. Cc: <stable@vger.kernel.org> #v4.14+ Fixes: 12220267645c ("IB/hfi: Protect against writable mmap") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-02-05IB/usnic: Fix potential deadlockParvi Kaustubhi
commit 8036e90f92aae2784b855a0007ae2d8154d28b3c upstream. Acquiring the rtnl lock while holding usdev_lock could result in a deadlock. For example: usnic_ib_query_port() | mutex_lock(&us_ibdev->usdev_lock) | ib_get_eth_speed() | rtnl_lock() rtnl_lock() | usnic_ib_netdevice_event() | mutex_lock(&us_ibdev->usdev_lock) This commit moves the usdev_lock acquisition after the rtnl lock has been released. This is safe to do because usdev_lock is not protecting anything being accessed in ib_get_eth_speed(). Hence, the correct order of holding locks (rtnl -> usdev_lock) is not violated. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-02-05rxe: IB_WR_REG_MR does not capture MR's iova fieldChuck Lever
commit b024dd0eba6e6d568f69d63c5e3153aba94c23e3 upstream. FRWR memory registration is done with a series of calls and WRs. 1. ULP invokes ib_dma_map_sg() 2. ULP invokes ib_map_mr_sg() 3. ULP posts an IB_WR_REG_MR on the Send queue Step 2 generates an iova. It is permissible for ULPs to change this iova (with certain restrictions) between steps 2 and 3. rxe_map_mr_sg captures the MR's iova but later when rxe processes the REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova after step 2 but before step 3, rxe never captures that change. When the remote sends an RDMA Read targeting that MR, rxe looks up the R_key, but the altered iova does not match the iova stored in the MR, causing the RDMA Read request to fail. Reported-by: Anna Schumaker <schumaker.anna@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-02-05RDMA/nldev: Don't expose unsafe global rkey to regular userLeon Romanovsky
commit a9666c1cae8dbcd1a9aacd08a778bf2a28eea300 upstream. Unsafe global rkey is considered dangerous because it exposes memory registered for all memory in the system. Only users with a QP on the same PD can use the rkey, and generally those QPs will already know the value. However, out of caution, do not expose the value to unprivleged users on the local system. Require CAP_NET_ADMIN instead. Cc: <stable@vger.kernel.org> # 4.16 Fixes: 29cf1351d450 ("RDMA/nldev: provide detailed PD information") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-01-30i40iw: Reorganize acquire/release of locks in i40iw_manage_apbvtShiraz Saleem
commit aaf5e003b1c454c5722bc4ea9dfd3506c57d36a9 upstream. Commit f43c00c04bbf ("i40iw: Extend port reuse support for listeners") introduces a sparse warning: include/linux/spinlock.h:365:9: sparse: context imbalance in 'i40iw_manage_apbvt' - unexpected unlock Fix this by reorganizing the acquire/release of locks in i40iw_manage_apbvt and add a new function i40iw_cqp_manage_abvpt_cmd to perform the CQP command. Also, use __clear_bit and __test_and_set_bit as we do not need atomic versions. Fixes: f43c00c04bbf ("i40iw: Extend port reuse support for listeners") Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-01-14RDMA/srpt: Fix a use-after-free in the channel release codeBart Van Assche
commit ed041919f0d23c109d52cde8da6ddc211c52d67e upstream. This patch avoids that KASAN sporadically reports the following: BUG: KASAN: use-after-free in rxe_run_task+0x1e/0x60 [rdma_rxe] Read of size 1 at addr ffff88801c50d8f4 by task check/24830 CPU: 4 PID: 24830 Comm: check Not tainted 4.20.0-rc6-dbg+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xca print_address_description+0x71/0x239 kasan_report.cold.5+0x242/0x301 __asan_load1+0x47/0x50 rxe_run_task+0x1e/0x60 [rdma_rxe] rxe_post_send+0x4bd/0x8d0 [rdma_rxe] srpt_zerolength_write+0xe1/0x160 [ib_srpt] srpt_close_ch+0x8b/0xe0 [ib_srpt] srpt_set_enabled+0xe7/0x150 [ib_srpt] srpt_tpg_enable_store+0xc0/0x100 [ib_srpt] configfs_write_file+0x157/0x1d0 __vfs_write+0xd7/0x3d0 vfs_write+0x102/0x290 ksys_write+0xab/0x130 __x64_sys_write+0x43/0x50 do_syscall_64+0x71/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 13856: save_stack+0x43/0xd0 kasan_kmalloc+0xc7/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0x105/0x320 rxe_alloc+0xff/0x1f0 [rdma_rxe] rxe_create_qp+0x9f/0x160 [rdma_rxe] ib_create_qp+0xf5/0x690 [ib_core] rdma_create_qp+0x6a/0x140 [rdma_cm] srpt_cm_req_recv.cold.59+0x1588/0x237b [ib_srpt] srpt_rdma_cm_req_recv.isra.35+0x1d5/0x220 [ib_srpt] srpt_rdma_cm_handler+0x6f/0x100 [ib_srpt] cma_listen_handler+0x59/0x60 [rdma_cm] cma_ib_req_handler+0xd5b/0x2570 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_work_handler+0x2aae/0x502b [ib_cm] process_one_work+0x481/0x9e0 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Freed by task 3440: save_stack+0x43/0xd0 __kasan_slab_free+0x139/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xbc/0x330 rxe_elem_release+0x66/0xe0 [rdma_rxe] rxe_destroy_qp+0x3f/0x50 [rdma_rxe] ib_destroy_qp+0x140/0x360 [ib_core] srpt_release_channel_work+0xdc/0x310 [ib_srpt] process_one_work+0x481/0x9e0 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Cc: Sergey Gorenko <sergeygo@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-01-14rxe: fix error completion wr_id and qp_numSagi Grimberg
commit e48d8ed9c6193502d849b35767fd18e20bbd7ba2 upstream. Error completions must still contain a valid wr_id and qp_num such that the consumer can rely on. Correctly fill these fields in receive error completions. Reported-by: Walker Benjamin <benjamin.walker@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-01-12IB/hfi1: Incorrect sizing of sge for PIO will OOPsMichael J. Ruhl
commit dbc2970caef74e8ff41923d302aa6fb5a4812d0e upstream. An incorrect sge sizing in the HFI PIO path will cause an OOPs similar to this: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] hfi1_verbs_send_pio+0x3d8/0x530 [hfi1] PGD 0 Oops: 0000 1 SMP Call Trace: ? hfi1_verbs_send_dma+0xad0/0xad0 [hfi1] hfi1_verbs_send+0xdf/0x250 [hfi1] ? make_rc_ack+0xa80/0xa80 [hfi1] hfi1_do_send+0x192/0x430 [hfi1] hfi1_do_send_from_rvt+0x10/0x20 [hfi1] rvt_post_send+0x369/0x820 [rdmavt] ib_uverbs_post_send+0x317/0x570 [ib_uverbs] ib_uverbs_write+0x26f/0x420 [ib_uverbs] ? security_file_permission+0x21/0xa0 vfs_write+0xbd/0x1e0 ? mntput+0x24/0x40 SyS_write+0x7f/0xe0 system_call_fastpath+0x16/0x1b Fix by adding the missing sizing check to correctly determine the sge length. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-01-12IB/hfi1: Remove race conditions in user_sdma send pathMichael J. Ruhl
commit 28a9a9e83ceae2cee25b9af9ad20d53aaa9ab951 upstream. Packet queue state is over used to determine SDMA descriptor availablitity and packet queue request state. cpu 0 ret = user_sdma_send_pkts(req, pcount); cpu 0 if (atomic_read(&pq->n_reqs)) cpu 1 IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE) cpu 0 xchg(&pq->state, SDMA_PKT_Q_ACTIVE); At this point pq->n_reqs == 0 and pq->state is incorrectly SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state to return to _INACTIVE. This can also change the state from _DEFERRED to _ACTIVE. However, this is a mostly benign race. Remove the racy code path. Use n_reqs to determine if a packet queue is active or not. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-30IB/hfi1: Fix an out-of-bounds access in get_hw_statsPiotr Stankiewicz
commit 36d842194a57f1b21fbc6a6875f2fa2f9a7f8679 upstream. When running with KASAN, the following trace is produced: [ 62.535888] ================================================================== [ 62.544930] BUG: KASAN: slab-out-of-bounds in gut_hw_stats+0x122/0x230 [hfi1] [ 62.553856] Write of size 8 at addr ffff88080e8d6330 by task kworker/0:1/14 [ 62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.19.0-test-build-kasan+ #8 [ 62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016 [ 62.587951] Workqueue: events work_for_cpu_fn [ 62.594050] Call Trace: [ 62.598023] dump_stack+0xc6/0x14c [ 62.603089] ? dump_stack_print_info.cold.1+0x2f/0x2f [ 62.610041] ? kmsg_dump_rewind_nolock+0x59/0x59 [ 62.616615] ? get_hw_stats+0x122/0x230 [hfi1] [ 62.622985] print_address_description+0x6c/0x23c [ 62.629744] ? get_hw_stats+0x122/0x230 [hfi1] [ 62.636108] kasan_report.cold.6+0x241/0x308 [ 62.642365] get_hw_stats+0x122/0x230 [hfi1] [ 62.648703] ? hfi1_alloc_rn+0x40/0x40 [hfi1] [ 62.655088] ? __kmalloc+0x110/0x240 [ 62.660695] ? hfi1_alloc_rn+0x40/0x40 [hfi1] [ 62.667142] setup_hw_stats+0xd8/0x430 [ib_core] [ 62.673972] ? show_hfi+0x50/0x50 [hfi1] [ 62.680026] ib_device_register_sysfs+0x165/0x180 [ib_core] [ 62.687995] ib_register_device+0x5a2/0xa10 [ib_core] [ 62.695340] ? show_hfi+0x50/0x50 [hfi1] [ 62.701421] ? ib_unregister_device+0x2e0/0x2e0 [ib_core] [ 62.709222] ? __vmalloc_node_range+0x2d0/0x380 [ 62.716131] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt] [ 62.723735] ? vmalloc_node+0x5c/0x70 [ 62.729697] ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt] [ 62.737347] ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt] [ 62.744998] ? __rvt_alloc_mr+0x110/0x110 [rdmavt] [ 62.752315] ? rvt_rc_error+0x140/0x140 [rdmavt] [ 62.759434] ? rvt_vma_open+0x30/0x30 [rdmavt] [ 62.766364] ? mutex_unlock+0x1d/0x40 [ 62.772445] ? kmem_cache_create_usercopy+0x15d/0x230 [ 62.780115] rvt_register_device+0x1f6/0x360 [rdmavt] [ 62.787823] ? rvt_get_port_immutable+0x180/0x180 [rdmavt] [ 62.796058] ? __get_txreq+0x400/0x400 [hfi1] [ 62.802969] ? memcpy+0x34/0x50 [ 62.808611] hfi1_register_ib_device+0xde6/0xeb0 [hfi1] [ 62.816601] ? hfi1_get_npkeys+0x10/0x10 [hfi1] [ 62.823760] ? hfi1_init+0x89f/0x9a0 [hfi1] [ 62.830469] ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1] [ 62.838204] ? pcie_capability_clear_and_set_word+0xcd/0xe0 [ 62.846429] ? pcie_capability_read_word+0xd0/0xd0 [ 62.853791] ? hfi1_pcie_init+0x187/0x4b0 [hfi1] [ 62.860958] init_one+0x67f/0xae0 [hfi1] [ 62.867301] ? hfi1_init+0x9a0/0x9a0 [hfi1] [ 62.873876] ? wait_woken+0x130/0x130 [ 62.879860] ? read_word_at_a_time+0xe/0x20 [ 62.886329] ? strscpy+0x14b/0x280 [ 62.891998] ? hfi1_init+0x9a0/0x9a0 [hfi1] [ 62.898405] local_pci_probe+0x70/0xd0 [ 62.904295] ? pci_device_shutdown+0x90/0x90 [ 62.910833] work_for_cpu_fn+0x29/0x40 [ 62.916750] process_one_work+0x584/0x960 [ 62.922974] ? rcu_work_rcufn+0x40/0x40 [ 62.928991] ? __schedule+0x396/0xdc0 [ 62.934806] ? __sched_text_start+0x8/0x8 [ 62.941020] ? pick_next_task_fair+0x68b/0xc60 [ 62.947674] ? run_rebalance_domains+0x260/0x260 [ 62.954471] ? __list_add_valid+0x29/0xa0 [ 62.960607] ? move_linked_works+0x1c7/0x230 [ 62.967077] ? trace_event_raw_event_workqueue_execute_start+0x140/0x140 [ 62.976248] ? mutex_lock+0xa6/0x100 [ 62.982029] ? __mutex_lock_slowpath+0x10/0x10 [ 62.988795] ? __switch_to+0x37a/0x710 [ 62.994731] worker_thread+0x62e/0x9d0 [ 63.000602] ? max_active_store+0xf0/0xf0 [ 63.006828] ? __switch_to_asm+0x40/0x70 [ 63.012932] ? __switch_to_asm+0x34/0x70 [ 63.019013] ? __switch_to_asm+0x40/0x70 [ 63.025042] ? __switch_to_asm+0x34/0x70 [ 63.031030] ? __switch_to_asm+0x40/0x70 [ 63.037006] ? __schedule+0x396/0xdc0 [ 63.042660] ? kmem_cache_alloc_trace+0xf3/0x1f0 [ 63.049323] ? kthread+0x59/0x1d0 [ 63.054594] ? ret_from_fork+0x35/0x40 [ 63.060257] ? __sched_text_start+0x8/0x8 [ 63.066212] ? schedule+0xcf/0x250 [ 63.071529] ? __wake_up_common+0x110/0x350 [ 63.077794] ? __schedule+0xdc0/0xdc0 [ 63.083348] ? wait_woken+0x130/0x130 [ 63.088963] ? finish_task_switch+0x1f1/0x520 [ 63.095258] ? kasan_unpoison_shadow+0x30/0x40 [ 63.101792] ? __init_waitqueue_head+0xa0/0xd0 [ 63.108183] ? replenish_dl_entity.cold.60+0x18/0x18 [ 63.115151] ? _raw_spin_lock_irqsave+0x25/0x50 [ 63.121754] ? max_active_store+0xf0/0xf0 [ 63.127753] kthread+0x1ae/0x1d0 [ 63.132894] ? kthread_bind+0x30/0x30 [ 63.138422] ret_from_fork+0x35/0x40 [ 63.146973] Allocated by task 14: [ 63.152077] kasan_kmalloc+0xbf/0xe0 [ 63.157471] __kmalloc+0x110/0x240 [ 63.162804] init_cntrs+0x34d/0xdf0 [hfi1] [ 63.168883] hfi1_init_dd+0x29a3/0x2f90 [hfi1] [ 63.175244] init_one+0x551/0xae0 [hfi1] [ 63.181065] local_pci_probe+0x70/0xd0 [ 63.186759] work_for_cpu_fn+0x29/0x40 [ 63.192310] process_one_work+0x584/0x960 [ 63.198163] worker_thread+0x62e/0x9d0 [ 63.203843] kthread+0x1ae/0x1d0 [ 63.208874] ret_from_fork+0x35/0x40 [ 63.217203] Freed by task 1: [ 63.221844] __kasan_slab_free+0x12e/0x180 [ 63.227844] kfree+0x92/0x1a0 [ 63.232570] single_release+0x3a/0x60 [ 63.238024] __fput+0x1d9/0x480 [ 63.242911] task_work_run+0x139/0x190 [ 63.248440] exit_to_usermode_loop+0x191/0x1a0 [ 63.254814] do_syscall_64+0x301/0x330 [ 63.260283] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 63.270199] The buggy address belongs to the object at ffff88080e8d5500 which belongs to the cache kmalloc-4096 of size 4096 [ 63.287247] The buggy address is located 3632 bytes inside of 4096-byte region [ffff88080e8d5500, ffff88080e8d6500) [ 63.303564] The buggy address belongs to the page: [ 63.310447] page:ffffea00203a3400 count:1 mapcount:0 mapping:ffff88081380e840 index:0x0 compound_mapcount: 0 [ 63.323102] flags: 0x2fffff80008100(slab|head) [ 63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001 ffff88081380e840 [ 63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 63.350564] page dumped because: kasan: bad access detected [ 63.361974] Memory state around the buggy address: [ 63.369137] ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 63.379082] ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc [ 63.398944] ^ [ 63.406141] ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 63.416109] ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 63.426099] ================================================================== The trace happens because get_hw_stats() assumes there is room in the memory allocated in init_cntrs() to accommodate the driver counters. Unfortunately, that routine only allocated space for the device counters. Fix by insuring the allocation has room for the additional driver counters. Cc: <Stable@vger.kernel.org> # v4.14+ Fixes: b7481944b06e9 ("IB/hfi1: Show statistics counters under IB stats interface") Reviewed-by: Mike Marciniczyn <mike.marciniszyn@intel.com> Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-30IB/mlx5: Fix page fault handling for MWArtemy Kovalyov
commit 75b7b86bdb0df37e08e44b6c1f99010967f81944 upstream. Memory windows are implemented with an indirect MKey, when a page fault event comes for a MW Mkey we need to find the MR at the end of the list of the indirect MKeys by iterating on all items from the first to the last. The offset calculated during this process has to be zeroed after the first iteration or the next iteration will start from a wrong address, resulting incorrect ODP faulting behavior. Fixes: db570d7deafb ("IB/mlx5: Add ODP support to MW") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-30RDMA/hns: Bugfix pbl configuration for rereg mrYixian Liu
commit ca088320a02537f36c243ac21794525d8eabb3bd upstream. Current hns driver assigned the first two PBL page addresses from previous registered MR to the hardware when reregister MR changing the memory locations occurred. This will lead to PBL addressing error as the PBL has already been released. This patch fixes this wrong assignment by using the page address from new allocated PBL. Fixes: a2c80b7b4119 ("RDMA/hns: Add rereg mr support for hip08") Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-30RDMA/rdmavt: Fix rvt_create_ah function signatureKamal Heib
commit 4f32fb921b153ae9ea280e02a3e91509fffc03d3 upstream. rdmavt uses a crazy system that looses the type checking when assinging functions to struct ib_device function pointers. Because of this the signature to this function was not changed when the below commit revised things. Fix the signature so we are not calling a function pointer with a mismatched signature. Fixes: 477864c8fcd9 ("IB/core: Let create_ah return extended response to user") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-30RDMA/bnxt_re: Avoid accessing the device structure after it is freedSelvin Xavier
commit a6c66d6a08b88cc10aca9d3f65cfae31e7652a99 upstream. When bnxt_re_ib_reg returns failure, the device structure gets freed. Driver tries to access the device pointer after it is freed. [ 4871.034744] Failed to register with netedev: 0xffffffa1 [ 4871.034765] infiniband (null): Failed to register with IB: 0xffffffea [ 4871.046430] ================================================================== [ 4871.046437] BUG: KASAN: use-after-free in bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046439] Write of size 4 at addr ffff880fa8406f48 by task kworker/u48:2/17813 [ 4871.046443] CPU: 20 PID: 17813 Comm: kworker/u48:2 Kdump: loaded Tainted: G B OE 4.20.0-rc1+ #42 [ 4871.046444] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 4871.046447] Workqueue: bnxt_re bnxt_re_task [bnxt_re] [ 4871.046449] Call Trace: [ 4871.046454] dump_stack+0x91/0xeb [ 4871.046458] print_address_description+0x6a/0x2a0 [ 4871.046461] kasan_report+0x176/0x2d0 [ 4871.046463] ? bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046466] bnxt_re_task+0x63/0x180 [bnxt_re] [ 4871.046470] process_one_work+0x216/0x5b0 [ 4871.046471] ? process_one_work+0x189/0x5b0 [ 4871.046475] worker_thread+0x4e/0x3d0 [ 4871.046479] kthread+0x10e/0x140 [ 4871.046480] ? process_one_work+0x5b0/0x5b0 [ 4871.046482] ? kthread_stop+0x220/0x220 [ 4871.046486] ret_from_fork+0x3a/0x50 [ 4871.046492] The buggy address belongs to the page: [ 4871.046494] page:ffffea003ea10180 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 4871.046495] flags: 0x57ffffc0000000() [ 4871.046498] raw: 0057ffffc0000000 0000000000000000 ffffea003ea10188 0000000000000000 [ 4871.046500] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 4871.046501] page dumped because: kasan: bad access detected Avoid accessing the device structure once it is freed. Fixes: 497158aa5f52 ("RDMA/bnxt_re: Fix the ib_reg failure cleanup") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-30RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WRMajd Dibbiny
commit 074fca3a18e7e1e0d4d7dcc9d7badc43b90232f4 upstream. Currently, for IB_WR_LOCAL_INV WR, when the next fence is None, the current fence will be SMALL instead of Normal Fence. Without this patch krping doesn't work on CX-5 devices and throws following error: The error messages are from CX5 driver are: (from server side) [ 710.434014] mlx5_0:dump_cqe:278:(pid 2712): dump error cqe [ 710.434016] 00000000 00000000 00000000 00000000 [ 710.434016] 00000000 00000000 00000000 00000000 [ 710.434017] 00000000 00000000 00000000 00000000 [ 710.434018] 00000000 93003204 100000b8 000524d2 [ 710.434019] krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32 Fixed the logic to set the correct fence type. Fixes: 6e8484c5cf07 ("RDMA/mlx5: set UMR wqe fence according to HCA cap") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-15IB/mlx5: Avoid load failure due to unknown link widthMichael Guralnik
commit db7a691a1551a748cb92d9c89c6b190ea87e28d5 upstream. If the firmware reports a connection width that is not 1x, 4x, 8x or 12x it causes the driver to fail during initialization. To prevent this failure every time a new width is introduced to the RDMA stack, we will set a default 4x width for these widths which ar unknown to the driver. This is needed to allow to run old kernels with new firmware. Cc: <stable@vger.kernel.org> # 4.1 Fixes: 1b5daf11b015 ("IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode") Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-15iser: set sector for ambiguous mr status errorsSagi Grimberg
commit 24c3456c8d5ee6fc1933ca40f7b4406130682668 upstream. If for some reason we failed to query the mr status, we need to make sure to provide sufficient information for an ambiguous error (guard error on sector 0). Fixes: 0a7a08ad6f5f ("IB/iser: Implement check_protection") Cc: <stable@vger.kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-12-15IB/hfi1: Eliminate races in the SDMA send error pathMichael J. Ruhl
commit a0e0cb82804a6a21d9067022c2dfdf80d11da429 upstream. pq_update() can only be called in two places: from the completion function when the complete (npkts) sequence of packets has been submitted and processed, or from setup function if a subset of the packets were submitted (i.e. the error path). Currently both paths can call pq_update() if an error occurrs. This race will cause the n_req value to go negative, hanging file_close(), or cause a crash by freeing the txlist more than once. Several variables are used to determine SDMA send state. Most of these are unnecessary, and have code inspectible races between the setup function and the completion function, in both the send path and the error path. The request 'status' value can be set by the setup or by the completion function. This is code inspectibly racy. Since the status is not needed in the completion code or by the caller it has been removed. The request 'done' value races between usage by the setup and the completion function. The completion function does not need this. When the number of processed packets matches npkts, it is done. The 'has_error' value races between usage of the setup and the completion function. This can cause incorrect error handling and leave the n_req in an incorrect value (i.e. negative). Simplify the code by removing all of the unneeded state checks and variables. Clean up iovs node when it is freed. Eliminate race conditions in the error path: If all packets are submitted, the completion handler will set the completion status correctly (ok or aborted). If all packets are not submitted, the caller must wait until the submitted packets have completed, and then set the completion status. These two change eliminate the race condition in the error path. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2018-11-13IB/mlx5: Fix MR cache initializationArtemy Kovalyov
commit 013c2403bf32e48119aeb13126929f81352cc7ac upstream. Schedule MR cache work only after bucket was initialized. Cc: <stable@vger.kernel.org> # 4.10 Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13IB/rxe: fix for duplicate request processing and ack psnsVijay Immanuel
[ Upstream commit b97db58557f4aa6d9903f8e1deea6b3d1ed0ba43 ] Don't reset the resp opcode for a replayed read response. The resp opcode could be in the middle of a write or send sequence, when the duplicate read request was received. An example sequence is as follows: - Receive read request for 12KB PSN 20. Transmit read response first, middle and last with PSNs 20,21,22. - Receive write first PSN 23. At this point the resp psn is 24 and resp opcode is write first. - The sender notices that PSN 20 is dropped and retransmits. Receive read request for 12KB PSN 20. Transmit read response first, middle and last with PSNs 20,21,22. The resp opcode is set to -1, the resp psn remains 24. - Receive write first PSN 23. This is processed by duplicate_request(). The resp opcode remains -1 and resp psn remains 24. - Receive write middle PSN 24. check_op_seq() reports a missing first error since the resp opcode is -1. When sending an ack for a duplicate send or write request, use the psn of the previous ack sent. Do not use the psn of a read response for the ack. An example sequence is as follows: - Receive write PSN 30. Transmit ACK for PSN 30. - Receive read request 4KB PSN 31. Transmit read response with PSN 31. The resp psn is now 32. - The sender notices that PSN 30 is dropped and retransmits. Receive write PSN 30. duplicate_request() sends an ACK with PSN 31. That is incorrect since PSN 31 was a read request. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13IB/mlx5: Allow transition of DCI QP to resetMoni Shoua
[ Upstream commit 99ed748e878a99c6c7b87bbec063eefd9e47cb42 ] The transition is allowed from any state and the atrribute mask must be IB_QP_STATE. Fixes: c32a4f296e1d ("IB/mlx5: Add support for DC Initiator QP") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13RDMA/bnxt_re: Fix recursive lock warning in debug kernelSelvin Xavier
[ Upstream commit d455f29f6d76a5f94881ca1289aaa1e90617ff5d ] Fix possible recursive lock warning. Its a false warning as the locks are part of two differnt HW Queue data structure - cmdq and creq. Debug kernel is throwing the following warning and stack trace. [ 783.914967] ============================================ [ 783.914970] WARNING: possible recursive locking detected [ 783.914973] 4.19.0-rc2+ #33 Not tainted [ 783.914976] -------------------------------------------- [ 783.914979] swapper/2/0 is trying to acquire lock: [ 783.914982] 000000002aa3949d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.914999] but task is already holding lock: [ 783.915002] 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re] [ 783.915013] other info that might help us debug this: [ 783.915016] Possible unsafe locking scenario: [ 783.915019] CPU0 [ 783.915021] ---- [ 783.915034] lock(&(&hwq->lock)->rlock); [ 783.915035] lock(&(&hwq->lock)->rlock); [ 783.915037] *** DEADLOCK *** [ 783.915038] May be due to missing lock nesting notation [ 783.915039] 1 lock held by swapper/2/0: [ 783.915040] #0: 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re] [ 783.915044] stack backtrace: [ 783.915046] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0-rc2+ #33 [ 783.915047] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 783.915048] Call Trace: [ 783.915049] <IRQ> [ 783.915054] dump_stack+0x90/0xe3 [ 783.915058] __lock_acquire+0x106c/0x1080 [ 783.915061] ? sched_clock+0x5/0x10 [ 783.915063] lock_acquire+0xbd/0x1a0 [ 783.915065] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915069] _raw_spin_lock_irqsave+0x4a/0x90 [ 783.915071] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915073] bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915078] tasklet_action_common.isra.17+0x197/0x1b0 [ 783.915081] __do_softirq+0xcb/0x3a6 [ 783.915084] irq_exit+0xe9/0x100 [ 783.915085] do_IRQ+0x6a/0x120 [ 783.915087] common_interrupt+0xf/0xf [ 783.915088] </IRQ> Use nested notation for the spin_lock to avoid this warning. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13RDMA/bnxt_re: Avoid accessing nq->bar_reg_iomem in failure caseSelvin Xavier
[ Upstream commit ed51efd2ce44091a858ad829f666727e7c95695e ] In the failure path, nq->bar_reg_iomem gets accessed without initializing. Avoid this by calling the bnxt_qplib_nq_stop_irq only if the initialization is complete. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13IB/ipoib: Clear IPCB before icmp_sendDenis Drozdov
[ Upstream commit 4d6e4d12da2c308f8f976d3955c45ee62539ac98 ] IPCB should be cleared before icmp_send, since it may contain data from previous layers and the data could be misinterpreted as ip header options, which later caused the ihl to be set to an invalid value and resulted in the following stack corruption: [ 1083.031512] ib0: packet len 57824 (> 2048) too long to send, dropping [ 1083.031843] ib0: packet len 37904 (> 2048) too long to send, dropping [ 1083.032004] ib0: packet len 4040 (> 2048) too long to send, dropping [ 1083.032253] ib0: packet len 63800 (> 2048) too long to send, dropping [ 1083.032481] ib0: packet len 23960 (> 2048) too long to send, dropping [ 1083.033149] ib0: packet len 63800 (> 2048) too long to send, dropping [ 1083.033439] ib0: packet len 63800 (> 2048) too long to send, dropping [ 1083.033700] ib0: packet len 63800 (> 2048) too long to send, dropping [ 1083.034124] ib0: packet len 63800 (> 2048) too long to send, dropping [ 1083.034387] ================================================================== [ 1083.034602] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xf08/0x1310 [ 1083.034798] Write of size 4 at addr ffff880353457c5f by task kworker/u16:0/7 [ 1083.034990] [ 1083.035104] CPU: 7 PID: 7 Comm: kworker/u16:0 Tainted: G O 4.19.0-rc5+ #1 [ 1083.035316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014 [ 1083.035573] Workqueue: ipoib_wq ipoib_cm_skb_reap [ib_ipoib] [ 1083.035750] Call Trace: [ 1083.035888] dump_stack+0x9a/0xeb [ 1083.036031] print_address_description+0xe3/0x2e0 [ 1083.036213] kasan_report+0x18a/0x2e0 [ 1083.036356] ? __ip_options_echo+0xf08/0x1310 [ 1083.036522] __ip_options_echo+0xf08/0x1310 [ 1083.036688] icmp_send+0x7b9/0x1cd0 [ 1083.036843] ? icmp_route_lookup.constprop.9+0x1070/0x1070 [ 1083.037018] ? netif_schedule_queue+0x5/0x200 [ 1083.037180] ? debug_show_all_locks+0x310/0x310 [ 1083.037341] ? rcu_dynticks_curr_cpu_in_eqs+0x85/0x120 [ 1083.037519] ? debug_locks_off+0x11/0x80 [ 1083.037673] ? debug_check_no_obj_freed+0x207/0x4c6 [ 1083.037841] ? check_flags.part.27+0x450/0x450 [ 1083.037995] ? debug_check_no_obj_freed+0xc3/0x4c6 [ 1083.038169] ? debug_locks_off+0x11/0x80 [ 1083.038318] ? skb_dequeue+0x10e/0x1a0 [ 1083.038476] ? ipoib_cm_skb_reap+0x2b5/0x650 [ib_ipoib] [ 1083.038642] ? netif_schedule_queue+0xa8/0x200 [ 1083.038820] ? ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib] [ 1083.038996] ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib] [ 1083.039174] process_one_work+0x912/0x1830 [ 1083.039336] ? wq_pool_ids_show+0x310/0x310 [ 1083.039491] ? lock_acquire+0x145/0x3a0 [ 1083.042312] worker_thread+0x87/0xbb0 [ 1083.045099] ? process_one_work+0x1830/0x1830 [ 1083.047865] kthread+0x322/0x3e0 [ 1083.050624] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 1083.053354] ret_from_fork+0x3a/0x50 For instance __ip_options_echo is failing to proceed with invalid srr and optlen passed from another layer via IPCB [ 762.139568] IPv4: __ip_options_echo rr=0 ts=0 srr=43 cipso=0 [ 762.139720] IPv4: ip_options_build: IPCB 00000000f3cd969e opt 000000002ccb3533 [ 762.139838] IPv4: __ip_options_echo in srr: optlen 197 soffset 84 [ 762.139852] IPv4: ip_options_build srr=0 is_frag=0 rr_needaddr=0 ts_needaddr=0 ts_needtime=0 rr=0 ts=0 [ 762.140269] ================================================================== [ 762.140713] IPv4: __ip_options_echo rr=0 ts=0 srr=0 cipso=0 [ 762.141078] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x12ec/0x1680 [ 762.141087] Write of size 4 at addr ffff880353457c7f by task kworker/u16:0/7 Signed-off-by: Denis Drozdov <denisd@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13RDMA/core: Do not expose unsupported countersParav Pandit
[ Upstream commit 0f6ef65d1c6ec8deb5d0f11f86631ec4cfe8f22e ] If the provider driver (such as rdma_rxe) doesn't support pma counters, avoid exposing its directory similar to optional hw_counters directory. If core fails to read the PMA counter, return an error so that user can retry later if needed. Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c") Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>