summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2019-10-28RDMA/hns: Prevent memory leaks of eq->buf_listLijun Ou
eq->buf_list->buf and eq->buf_list should also be freed when eqe_hop_num is set to 0, or there will be memory leaks. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/1572072995-11277-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure casePotnuri Bharat Teja
_put_ep_safe() and _put_pass_ep_safe() free the skb before it is freed by process_work(). fix double free by freeing the skb only in process_work(). Fixes: 1dad0ebeea1c ("iw_cxgb4: Avoid touch after free error in ARP failure handlers") Link: https://lore.kernel.org/r/1572006880-5800-1-git-send-email-bharat@chelsio.com Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Use irq xarray locking for mkey_tableJason Gunthorpe
The mkey_table xarray is touched by the reg_mr_callback() function which is called from a hard irq. Thus all other uses of xa_lock must use the _irq variants. WARNING: inconsistent lock state 5.4.0-rc1 #12 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30 {IN-HARDIRQ-W} state was registered at: lock_acquire+0xe1/0x200 _raw_spin_lock_irqsave+0x35/0x50 reg_mr_callback+0x2dd/0x450 [mlx5_ib] mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core] mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core] [..] Possible unsafe locking scenario: CPU0 ---- lock(&(&xa->xa_lock)->rlock#3); <Interrupt> lock(&(&xa->xa_lock)->rlock#3); *** DEADLOCK *** 2 locks held by python3/343: #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs] #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs] stack backtrace: CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x86/0xca print_usage_bug.cold.50+0x2e5/0x355 mark_lock+0x871/0xb50 ? match_held_lock+0x20/0x250 ? check_usage_forwards+0x240/0x240 __lock_acquire+0x7de/0x23a0 ? __kasan_check_read+0x11/0x20 ? mark_lock+0xae/0xb50 ? mark_held_locks+0xb0/0xb0 ? find_held_lock+0xca/0xf0 lock_acquire+0xe1/0x200 ? xa_erase+0x12/0x30 _raw_spin_lock+0x2a/0x40 ? xa_erase+0x12/0x30 xa_erase+0x12/0x30 mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib] uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs] uverbs_free_mw+0x1a/0x20 [ib_uverbs] destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs] [..] Fixes: 0417791536ae ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases") Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24IB/core: Avoid deadlock during netlink message handlingParav Pandit
When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive locking of rdma_nl_mutex with the below call sequence. [..] rdma_nl_rcv() mutex_lock() [..] rdma_nl_rcv_msg() ib_get_client_nl_info() request_module() iw_cm_init() rdma_nl_register() mutex_lock(); <- Deadlock, acquiring mutex again Due to above call sequence, following call trace and deadlock is observed. kernel: __mutex_lock+0x35e/0x860 kernel: ? __mutex_lock+0x129/0x860 kernel: ? rdma_nl_register+0x1a/0x90 [ib_core] kernel: rdma_nl_register+0x1a/0x90 [ib_core] kernel: ? 0xffffffffc029b000 kernel: iw_cm_init+0x34/0x1000 [iw_cm] kernel: do_one_initcall+0x67/0x2d4 kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0 kernel: do_init_module+0x5a/0x223 kernel: load_module+0x1998/0x1e10 kernel: ? __symbol_put+0x60/0x60 kernel: __do_sys_finit_module+0x94/0xe0 kernel: do_syscall_64+0x5a/0x270 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe process stack trace: [<0>] __request_module+0x1c9/0x460 [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core] [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core] [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core] [<0>] netlink_unicast+0x179/0x220 [<0>] netlink_sendmsg+0x2f6/0x3f0 [<0>] sock_sendmsg+0x30/0x40 [<0>] ___sys_sendmsg+0x27a/0x290 [<0>] __sys_sendmsg+0x58/0xa0 [<0>] do_syscall_64+0x5a/0x270 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe To overcome this deadlock and to allow multiple netlink messages to progress in parallel, following scheme is implemented. 1. Split the lock protecting the cb_table into a per-index lock, and make it a rwlock. This lock is used to ensure no callbacks are running after unregistration returns. Since a module will not be registered once it is already running callbacks, this avoids the deadlock. 2. Use smp_store_release() to update the cb_table during registration so that no lock is required. This avoids lockdep problems with thinking all the rwsems are the same lock class. Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload") Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24RDMA/nldev: Skip counter if port doesn't matchMark Zhang
The counter resource should return -EAGAIN if it was requested for a different port, this is similar to how QP works if the users provides a port filter. Otherwise port filtering in netlink will return broken counter nests. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/20191020062800.8065-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/uverbs: Prevent potential underflowDan Carpenter
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that: if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) { But we don't check if "attr.comp_vector" is negative. It could potentially lead to an array underflow. My concern would be where cq->vector is used in the create_cq() function from the cxgb4 driver. And really "attr.comp_vector" is appears as a u32 to user space so that's the right type to use. Fixes: 9ee79fce3642 ("IB/core: Add completion queue (cq) object actions") Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-18IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fieldsParav Pandit
Current code tries to derive VLAN ID and compares it with GID attribute for matching entry. This raw search fails on macvlan netdevice as its not a VLAN device, but its an upper device of a VLAN netdevice. Due to this limitation, incoming QP1 packets fail to match in the GID table. Such packets are dropped. Hence, to support it, use the existing rdma_read_gid_l2_fields() that takes care of diffferent device types. Fixes: dbf727de7440 ("IB/core: Use GID table in AH creation and dmac resolution") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18RDMA/qedr: Fix reported firmware versionKamal Heib
Remove spaces from the reported firmware version string. Actual value: $ cat /sys/class/infiniband/qedr0/fw_ver 8. 37. 7. 0 Expected value: $ cat /sys/class/infiniband/qedr0/fw_ver 8.37.7.0 Fixes: ec72fce401c6 ("qedr: Add support for RoCE HW init") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Link: https://lore.kernel.org/r/20191007210730.7173-1-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18RDMA/siw: free siw_base_qp in kref release routineKrishnamraju Eraparaju
As siw_free_qp() is the last routine to access 'siw_base_qp' structure, freeing this structure early in siw_destroy_qp() could cause touch-after-free issue. Hence, moved kfree(siw_base_qp) from siw_destroy_qp() to siw_free_qp(). Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Link: https://lore.kernel.org/r/20191007104229.29412-1-krishna2@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18RDMA/iwcm: move iw_rem_ref() calls out of spinlockKrishnamraju Eraparaju
kref release routines usually perform memory release operations, hence, they should not be called with spinlocks held. one such case is: SIW kref release routine siw_free_qp(), which can sleep via vfree() while freeing queue memory. Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks. Fixes: 922a8e9fb2e0 ("RDMA: iWARP Connection Manager.") Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18iw_cxgb4: fix ECN check on the passive acceptPotnuri Bharat Teja
pass_accept_req() is using the same skb for handling accept request and sending accept reply to HW. Here req and rpl structures are pointing to same skb->data which is over written by INIT_TP_WR() and leads to accessing corrupt req fields in accept_cr() while checking for ECN flags. Reordered code in accept_cr() to fetch correct req fields. Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections") Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17IB/hfi1: Use a common pad buffer for 9B and 16B packetsMike Marciniszyn
There is no reason for a different pad buffer for the two packet types. Expand the current buffer allocation to allow for both packet types. Fixes: f8195f3b14a0 ("IB/hfi1: Eliminate allocation while atomic") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17IB/hfi1: Avoid excessive retry for TID RDMA READ requestKaike Wan
A TID RDMA READ request could be retried under one of the following conditions: - The RC retry timer expires; - A later TID RDMA READ RESP packet is received before the next expected one. For the latter, under normal conditions, the PSN in IB space is used for comparison. More specifically, the IB PSN in the incoming TID RDMA READ RESP packet is compared with the last IB PSN of a given TID RDMA READ request to determine if the request should be retried. This is similar to the retry logic for noraml RDMA READ request. However, if a TID RDMA READ RESP packet is lost due to congestion, header suppresion will be disabled and each incoming packet will raise an interrupt until the hardware flow is reloaded. Under this condition, each packet KDETH PSN will be checked by software against r_next_psn and a retry will be requested if the packet KDETH PSN is later than r_next_psn. Since each TID RDMA READ segment could have up to 64 packets and each TID RDMA READ request could have many segments, we could make far more retries under such conditions, and thus leading to RETRY_EXC_ERR status. This patch fixes the issue by removing the retry when the incoming packet KDETH PSN is later than r_next_psn. Instead, it resorts to RC timer and normal IB PSN comparison for any request retry. Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-17RDMA/mlx5: Clear old rate limit when closing QPRafi Wiener
Before QP is closed it changes to ERROR state, when this happens the QP was left with old rate limit that was already removed from the table. Fixes: 7d29f349a4b9 ("IB/mlx5: Properly adjust rate limit on QP state transitions") Signed-off-by: Rafi Wiener <rafiw@mellanox.com> Signed-off-by: Oleg Kuporosov <olegk@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-04RDMA/mlx5: Add missing synchronize_srcu() for MW casesJason Gunthorpe
While MR uses live as the SRCU 'update', the MW case uses the xarray directly, xa_erase() causes the MW to become inaccessible to the pagefault thread. Thus whenever a MW is removed from the xarray we must synchronize_srcu() before freeing it. This must be done before freeing the mkey as re-use of the mkey while the pagefault thread is using the stale mkey is undesirable. Add the missing synchronizes to MW and DEVX indirect mkey and delete the bogus protection against double destroy in mlx5_core_destroy_mkey() Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Put live in the correct place for ODP MRsJason Gunthorpe
live is used to signal to the pagefault thread that the MR is initialized and ready for use. It should be after the umem is assigned and all other setup is completed. This prevents races (at least) of the form: CPU0 CPU1 mlx5_ib_alloc_implicit_mr() implicit_mr_alloc() live = 1 imr->umem = umem num_pending_prefetch_inc() if (live) atomic_inc(num_pending_prefetch) atomic_set(num_pending_prefetch,0) // Overwrites other thread's store Further, live is being used with SRCU as the 'update' in an acquire/release fashion, so it can not be read and written raw. Move all live = 1's to after MR initialization is completed and use smp_store_release/smp_load_acquire() for manipulating it. Add a missing live = 0 when an implicit MR child is deleted, before queuing work to do synchronize_srcu(). The barriers in update_odp_mr() were some broken attempt to create a acquire/release, but were not even applied consistently and missed the point, delete it as well. Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcuJason Gunthorpe
During destroy setting live = 0 and then synchronize_srcu() prevents num_pending_prefetch from incrementing, and also, ensures that all work holding that count is queued on the WQ. Testing before causes races of the form: CPU0 CPU1 dereg_mr() mlx5_ib_advise_mr_prefetch() srcu_read_lock() num_pending_prefetch_inc() if (!live) live = 0 atomic_read() == 0 // skip flush_workqueue() atomic_inc() queue_work(); srcu_read_unlock() WARN_ON(atomic_read()) // Fails Swap the order so that the synchronize_srcu() prevents this. Fixes: a6bc3875f176 ("IB/mlx5: Protect against prefetch of invalid MR") Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()Jason Gunthorpe
This fixes a race of the form: CPU0 CPU1 mlx5_ib_invalidate_range() mlx5_ib_invalidate_range() // This one actually makes npages == 0 ib_umem_odp_unmap_dma_pages() if (npages == 0 && !dying) // This one does nothing ib_umem_odp_unmap_dma_pages() if (npages == 0 && !dying) dying = 1; dying = 1; schedule_work(&umem_odp->work); // Double schedule of the same work schedule_work(&umem_odp->work); // BOOM npages and dying must be read and written under the umem_mutex lock. Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call mlx5_ib_update_xlt, and both need to be done in the same locking region, hoist the lock out of unmap. This avoids an expensive double critical section in mlx5_ib_invalidate_range(). Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MRJason Gunthorpe
mlx5_ib_update_xlt() must be protected against parallel free of the MR it is accessing, also it must be called single threaded while updating the HW. Otherwise we can have races of the form: CPU0 CPU1 mlx5_ib_update_xlt() mlx5_odp_populate_klm() odp_lookup() == NULL pklm = ZAP implicit_mr_get_data() implicit_mr_alloc() <update interval tree> mlx5_ib_update_xlt mlx5_odp_populate_klm() odp_lookup() != NULL pklm = VALID mlx5_ib_post_send_wait() mlx5_ib_post_send_wait() // Replaces VALID with ZAP This can be solved by putting both the SRCU and the umem_mutex lock around every call to mlx5_ib_update_xlt(). This ensures that the content of the interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr) will not change while it is running, and thus the posted WRs to update the KLM will always reflect the correct information. The race above will resolve by either having CPU1 wait till CPU0 completes the ZAP or CPU0 will run after the add and instead store VALID. The pagefault path adding children already holds the umem_mutex and SRCU, so the only missed lock is during MR destruction. Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Do not allow rereg of a ODP MRJason Gunthorpe
This code is completely broken, the umem of a ODP MR simply cannot be discarded without a lot more locking, nor can an ODP mkey be blithely destroyed via destroy_mkey(). Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04IB/core: Fix wrong iterating on portsMohamad Heib
rdma_for_each_port is already incrementing the iterator's value it receives therefore, after the first iteration the iterator is increased by 2 which eventually causing wrong queries and possible traces. Fix the above by removing the old redundant incrementation that was used before rdma_for_each_port() macro. Cc: <stable@vger.kernel.org> Fixes: ea1075edcbab ("RDMA: Add and use rdma_for_each_port") Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org Signed-off-by: Mohamad Heib <mohamadh@mellanox.com> Reviewed-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error pathLeon Romanovsky
Properly unwind QP counter rebinding in case of failure. Trying to rebind the counter after unbiding it is not going to work reliably, move the unbind to the end so it doesn't have to be unwound. Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink") Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/cxgb4: Do not dma memory off of the stackGreg KH
Nicolas pointed out that the cxgb4 driver is doing dma off of the stack, which is generally considered a very bad thing. On some architectures it could be a security problem, but odds are none of them actually run this driver, so it's just a "normal" bug. Resolve this by allocating the memory for a message off of the heap instead of the stack. kmalloc() always will give us a proper memory location that DMA will work correctly from. Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com Reported-by: Nicolas Waisman <nico@semmle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/cm: Fix memory leak in cm_add/remove_oneJack Morgenstein
In the process of moving the debug counters sysfs entries, the commit mentioned below eliminated the cm_infiniband sysfs directory. This sysfs directory was tied to the cm_port object allocated in procedure cm_add_one(). Before the commit below, this cm_port object was freed via a call to kobject_put(port->kobj) in procedure cm_remove_port_fs(). Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated. This, however, meant that kfree was never called for the cm_port buffers. Fix this by adding explicit kfree(port) calls to functions cm_add_one() and cm_remove_one(). Note: the kfree call in the first chunk below (in the cm_add_one error flow) fixes an old, undetected memory leak. Fixes: c87e65cfb97c ("RDMA/cm: Move debug counters to be under relevant IB device") Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/core: Fix an error handling path in 'res_get_common_doit()'Christophe JAILLET
According to surrounding error paths, it is likely that 'goto err_get;' is expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be missing. Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback") Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/i40iw: Associate ibdev to netdev before IB device registrationShiraz, Saleem
i40iw IB device registration fails with ENODEV. ib_register_device setup_device/setup_port_data i40iw_port_immutable ib_query_port iw_query_port ib_device_get_netdev(ENODEV) ib_device_get_netdev() does not have a netdev associated with the ibdev and thus fails. Use ib_device_set_netdev() to associate netdev to ibdev in i40iw before IB device registration. Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port") Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Reviewed-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/iwcm: Fix a lock inversion issueBart Van Assche
This patch fixes the lock inversion complaint: ============================================ WARNING: possible recursive locking detected 5.3.0-rc7-dbg+ #1 Not tainted -------------------------------------------- kworker/u16:6/171 is trying to acquire lock: 00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm] but task is already holding lock: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&id_priv->handler_mutex); lock(&id_priv->handler_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/u16:6/171: #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm] stack backtrace: CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: dump_stack+0x8a/0xd6 __lock_acquire.cold+0xe1/0x24d lock_acquire+0x106/0x240 __mutex_lock+0x12e/0xcb0 mutex_lock_nested+0x1f/0x30 rdma_destroy_id+0x78/0x4a0 [rdma_cm] iw_conn_req_handler+0x5c9/0x680 [rdma_cm] cm_work_handler+0xe62/0x1100 [iw_cm] process_one_work+0x56d/0xac0 worker_thread+0x7a/0x5d0 kthread+0x1bc/0x210 ret_from_fork+0x24/0x30 This is not a bug as there are actually two lock classes here. Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/iw_cxgb4: fix SRQ access from dump_qp()Potnuri Bharat Teja
dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used by the application. This patch matches the QPID before dumping them. Also removes unwanted SRQ id addition to QP id xarray. Fixes: 2f43129127e6 ("cxgb4: Convert qpidr to XArray") Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/hfi1: Prevent memory leak in sdma_initNavid Emamdoost
In sdma_init if rhashtable_init fails the allocated memory for tmp_sdma_rht should be released. Fixes: 5a52a7acf7e2 ("IB/hfi1: NULL pointer dereference when freeing rhashtable") Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/core: Fix use after free and refcnt leak on ndev in_device in ↵Michal Kalderon
iwarp_query_port If an iWARP driver is probed and removed while there are no ips set for the device, it will lead to a reference count leak on the inet device of the netdevice. In addition, the netdevice was accessed after already calling netdev_put, which could lead to using the netdev after already freed. Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port") Link: https://lore.kernel.org/r/20190925123332.10746-1-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/siw: Fix serialization issue in write_space()Krishnamraju Eraparaju
In siw_qp_llp_write_space(), 'sock' members should be accessed with sk_callback_lock held, otherwise, it could race with siw_sk_restore_upcalls(). And this could cause "NULL deref" panic. Below panic is due to the NULL cep returned from sk_to_cep(sk): Call Trace: <IRQ> siw_qp_llp_write_space+0x11/0x40 [siw] tcp_check_space+0x4c/0xf0 tcp_rcv_established+0x52b/0x630 tcp_v4_do_rcv+0xf4/0x1e0 tcp_v4_rcv+0x9b8/0xab0 ip_protocol_deliver_rcu+0x2c/0x1c0 ip_local_deliver_finish+0x44/0x50 ip_local_deliver+0x6b/0xf0 ? ip_protocol_deliver_rcu+0x1c0/0x1c0 ip_rcv+0x52/0xd0 ? ip_rcv_finish_core.isra.14+0x390/0x390 __netif_receive_skb_one_core+0x83/0xa0 netif_receive_skb_internal+0x73/0xb0 napi_gro_frags+0x1ff/0x2b0 t4_ethrx_handler+0x4a7/0x740 [cxgb4] process_responses+0x2c9/0x590 [cxgb4] ? t4_sge_intr_msix+0x1d/0x30 [cxgb4] ? handle_irq_event_percpu+0x51/0x70 ? handle_irq_event+0x41/0x60 ? handle_edge_irq+0x97/0x1a0 napi_rx_handler+0x14/0xe0 [cxgb4] net_rx_action+0x2af/0x410 __do_softirq+0xda/0x2a8 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq+0x50/0x60 __local_bh_enable_ip+0x50/0x60 ip_finish_output2+0x18f/0x520 ip_output+0x6e/0xf0 ? __ip_finish_output+0x1f0/0x1f0 __ip_queue_xmit+0x14f/0x3d0 ? __slab_alloc+0x4b/0x58 __tcp_transmit_skb+0x57d/0xa60 tcp_write_xmit+0x23b/0xfd0 __tcp_push_pending_frames+0x2e/0xf0 tcp_sendmsg_locked+0x939/0xd50 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x57/0x80 siw_tx_hdt+0x894/0xb20 [siw] ? find_busiest_group+0x3e/0x5b0 ? common_interrupt+0xa/0xf ? common_interrupt+0xa/0xf ? common_interrupt+0xa/0xf siw_qp_sq_process+0xf1/0xe60 [siw] ? __wake_up_common_lock+0x87/0xc0 siw_sq_resume+0x33/0xe0 [siw] siw_run_sq+0xac/0x190 [siw] ? remove_wait_queue+0x60/0x60 kthread+0xf8/0x130 ? siw_sq_resume+0xe0/0xe0 [siw] ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 Fixes: f29dd55b0236 ("rdma/siw: queue pair methods") Link: https://lore.kernel.org/r/20190923101112.32685-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/vmw_pvrdma: Free SRQ only onceAdit Ranadive
An extra kfree cleanup was missed since these are now deallocated by core. Link: https://lore.kernel.org/r/1568848066-12449-1-git-send-email-aditr@vmware.com Cc: <stable@vger.kernel.org> Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Sanity check URB networking device parameters to avoid divide by zero, from Oliver Neukum. 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6 don't work properly. Longer term this needs a better fix tho. From Vijay Khemka. 3) Small fixes to selftests (use ping when ping6 is not present, etc.) from David Ahern. 4) Bring back rt_uses_gateway member of struct rtable, it's semantics were not well understood and trying to remove it broke things. From David Ahern. 5) Move usbnet snaity checking, ignore endpoints with invalid wMaxPacketSize. From Bjørn Mork. 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan. 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel, Alex Vesker, and Yevgeny Kliteynik 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron. 9) Fix crash when removing sch_cbs entry while offloading is enabled, from Vinicius Costa Gomes. 10) Signedness bug fixes, generally in looking at the result given by of_get_phy_mode() and friends. From Dan Crapenter. 11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet. 12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern. 13) Fix quantization code in tcp_bbr, from Kevin Yang. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits) net: tap: clean up an indentation issue nfp: abm: fix memory leak in nfp_abm_u32_knode_replace tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state sk_buff: drop all skb extensions on free and skb scrubbing tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions Documentation: Clarify trap's description mlxsw: spectrum: Clear VLAN filters during port initialization net: ena: clean up indentation issue NFC: st95hf: clean up indentation issue net: phy: micrel: add Asym Pause workaround for KSZ9021 net: socionext: ave: Avoid using netdev_err() before calling register_netdev() ptp: correctly disable flags on old ioctls lib: dimlib: fix help text typos net: dsa: microchip: Always set regmap stride to 1 nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled net: sched: sch_sfb: don't call qdisc_put() while holding tree lock ...
2019-09-26IB/hfi1: remove unlikely() from IS_ERR*() conditionDenis Efremov
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-8-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Joe Perches <joe@perches.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few hot fixes - ocfs2 updates - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan, cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug, sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy, oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap, zsmalloc) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) mm/zsmalloc.c: fix a -Wunused-function warning zswap: do not map same object twice zswap: use movable memory if zpool support allocate movable memory zpool: add malloc_support_movable to zpool_driver shmem: fix obsolete comment in shmem_getpage_gfp() mm/madvise: reduce code duplication in error handling paths mm: mmap: increase sockets maximum memory size pgoff for 32bits mm/mmap.c: refine find_vma_prev() with rb_last() riscv: make mmap allocation top-down by default mips: use generic mmap top-down layout and brk randomization mips: replace arch specific way to determine 32bit task with generic version mips: adjust brk randomization offset to fit generic version mips: use STACK_TOP when computing mmap base address mips: properly account for stack randomization and stack guard gap arm: use generic mmap top-down layout and brk randomization arm: use STACK_TOP when computing mmap base address arm: properly account for stack randomization and stack guard gap arm64, mm: make randomization selected by generic topdown mmap layout arm64, mm: move generic mmap layout functions to mm arm64: consider stack randomization for mmap base only when necessary ...
2019-09-24mm/gup: add make_dirty arg to put_user_pages_dirty_lock()akpm@linux-foundation.org
[11~From: John Hubbard <jhubbard@nvidia.com> Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock() Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()", v3. There are about 50+ patches in my tree [2], and I'll be sending out the remaining ones in a few more groups: * The block/bio related changes (Jerome mostly wrote those, but I've had to move stuff around extensively, and add a little code) * mm/ changes * other subsystem patches * an RFC that shows the current state of the tracking patch set. That can only be applied after all call sites are converted, but it's good to get an early look at it. This is part a tree-wide conversion, as described in fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). This patch (of 3): Provide more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following: 1. Lots of call sites become simpler if a bool is passed into put_user_page*(), instead of making the call site choose which put_user_page*() variant to call. 2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page*() calling chain. This leads to the following API choices: * put_user_pages_dirty_lock(page, npages, make_dirty) * There is no put_user_pages_dirty(). You have to hand code that, in the rare case that it's required. [jhubbard@nvidia.com: remove unused variable in siw_free_plist()] Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-23Merge tag 'pci-v5.4-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it (Krzysztof Wilczynski) - Fix incorrect PCIe device types and remove dev->has_secondary_link to simplify code that deals with upstream/downstream ports (Mika Westerberg) - After suspend, restore Resizable BAR size bits correctly for 1MB BARs (Sumit Saxena) - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra) Virtualization: - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna Labs (Ali Saidi) - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg) - Remove group write permissions from sysfs sriov_numvfs, sriov_drivers_autoprobe (Kelsey Skunberg) Hotplug: - Simplify pciehp indicator control (Denis Efremov) Peer-to-peer DMA: - Allow P2P DMA between root ports for whitelisted bridges (Logan Gunthorpe) - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe) - DMA map P2P DMA requests that traverse host bridge (Logan Gunthorpe) Amazon Annapurna Labs host bridge driver: - Add DT binding and controller driver (Jonathan Chocron) Hyper-V host bridge driver: - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui) - Fix PCI domain number collisions (Haiyang Zhang) - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang) - Fix build errors on non-SYSFS config (Randy Dunlap) i.MX6 host bridge driver: - Limit DBI register length (Stefan Agner) Intel VMD host bridge driver: - Fix config addressing issues (Jon Derrick) Layerscape host bridge driver: - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao) - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately (Xiaowei Bao) Mediatek host bridge driver: - Add MT7629 controller support (Jianjun Wang) Mobiveil host bridge driver: - Fix CPU base address setup (Hou Zhiqiang) - Make "num-lanes" property optional (Hou Zhiqiang) Tegra host bridge driver: - Fix OF node reference leak (Nishka Dasgupta) - Disable MSI for root ports to work around design problem (Vidya Sagar) - Add Tegra194 DT binding and controller support (Vidya Sagar) - Add support for sideband pins and slot regulators (Vidya Sagar) - Add PIPE2UPHY support (Vidya Sagar) Misc: - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg) - Unexport pci_bus_get(), etc (Kelsey Skunberg) - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in the PCI core (Kelsey Skunberg) - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg) - Mark expected switch fall-through (Gustavo A. R. Silva) - Propagate errors for optional regulators and PHYs (Thierry Reding) - Fix kernel command line resource_alignment parameter issues (Logan Gunthorpe)" * tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits) PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI arm64: tegra: Add PCIe slot supply information in p2972-0000 platform arm64: tegra: Add configuration for PCIe C5 sideband signals PCI: tegra: Add support to enable slot regulators PCI: tegra: Add support to configure sideband pins PCI: vmd: Fix shadow offsets to reflect spec changes PCI: vmd: Fix config addressing when using bus offsets PCI: dwc: Add validation that PCIe core is set to correct mode PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port PCI: Add ACS quirk for Amazon Annapurna Labs root ports PCI: Add Amazon's Annapurna Labs vendor ID MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries dt-bindings: PCI: tegra: Add sideband pins configuration entries PCI: tegra: Add Tegra194 PCIe support PCI: Get rid of dev->has_secondary_link flag ...
2019-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull RDMA subsystem updates from Jason Gunthorpe: "This cycle mainly saw lots of bug fixes and clean up code across the core code and several drivers, few new functional changes were made. - Many cleanup and bug fixes for hns - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed, bnxt_re, efa - Share the query_port code between all the iWarp drivers - General rework and cleanup of the ODP MR umem code to fit better with the mmu notifier get/put scheme - Support rdma netlink in non init_net name spaces - mlx5 support for XRC devx and DC ODP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA: Fix double-free in srq creation error flow RDMA/efa: Fix incorrect error print IB/mlx5: Free mpi in mp_slave mode IB/mlx5: Use the original address for the page during free_pages RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp" RDMA/hns: Package operations of rq inline buffer into separate functions RDMA/hns: Optimize cmd init and mode selection for hip08 IB/hfi1: Define variables as unsigned long to fix KASAN warning IB/{rdmavt, hfi1, qib}: Add a counter for credit waits IB/hfi1: Add traces for TID RDMA READ RDMA/siw: Relax from kmap_atomic() use in TX path IB/iser: Support up to 16MB data transfer in a single command RDMA/siw: Fix page address mapping in TX path RDMA: Fix goto target to release the allocated memory RDMA/usnic: Avoid overly large buffers on stack RDMA/odp: Add missing cast for 32 bit RDMA/hns: Use devm_platform_ioremap_resource() to simplify code Documentation/infiniband: update name of some functions RDMA/cma: Fix false error message RDMA/hns: Fix wrong assignment of qp_access_flags ...
2019-09-21Merge tag 'for-linus-hmm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ...
2019-09-20ipv4: Revert removal of rt_uses_gatewayDavid Ahern
Julian noted that rt_uses_gateway has a more subtle use than 'is gateway set': https://lore.kernel.org/netdev/alpine.LFD.2.21.1909151104060.2546@ja.home.ssi.bg/ Revert that part of the commit referenced in the Fixes tag. Currently, there are no u8 holes in 'struct rtable'. There is a 4-byte hole in the second cacheline which contains the gateway declaration. So move rt_gw_family down to the gateway declarations since they are always used together, and then re-use that u8 for rt_uses_gateway. End result is that rtable size is unchanged. Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway") Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-09-18Merge branch 'work.namei' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs namei updates from Al Viro: "Pathwalk-related stuff" [ Audit-related cleanups, misc simplifications, and easier to follow nd->root refcounts - Linus ] * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: devpts_pty_kill(): don't bother with d_delete() infiniband: don't bother with d_delete() hypfs: don't bother with d_delete() fs/namei.c: keep track of nd->root refcount status fs/namei.c: new helper - legitimize_root() kill the last users of user_{path,lpath,path_dir}() namei.h: get the comments on LOOKUP_... in sync with reality kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h audit_inode(): switch to passing AUDIT_INODE_... filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there filename_lookup(): audit_inode() argument is always 0
2019-09-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski. 2) Use bio_vec in the networking instead of custom skb_frag_t, from Matthew Wilcox. 3) Make use of xmit_more in r8169 driver, from Heiner Kallweit. 4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen. 5) Support all variants of 5750X bnxt_en chips, from Michael Chan. 6) More RTNL avoidance work in the core and mlx5 driver, from Vlad Buslov. 7) Add TCP syn cookies bpf helper, from Petar Penkov. 8) Add 'nettest' to selftests and use it, from David Ahern. 9) Add extack support to drop_monitor, add packet alert mode and support for HW drops, from Ido Schimmel. 10) Add VLAN offload to stmmac, from Jose Abreu. 11) Lots of devm_platform_ioremap_resource() conversions, from YueHaibing. 12) Add IONIC driver, from Shannon Nelson. 13) Several kTLS cleanups, from Jakub Kicinski. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits) mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer mlxsw: spectrum: Register CPU port with devlink mlxsw: spectrum_buffers: Prevent changing CPU port's configuration net: ena: fix incorrect update of intr_delay_resolution net: ena: fix retrieval of nonadaptive interrupt moderation intervals net: ena: fix update of interrupt moderation register net: ena: remove all old adaptive rx interrupt moderation code from ena_com net: ena: remove ena_restore_ethtool_params() and relevant fields net: ena: remove old adaptive interrupt moderation code from ena_netdev net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*() net: ena: enable the interrupt_moderation in driver_supported_features net: ena: reimplement set/get_coalesce() net: ena: switch to dim algorithm for rx adaptive interrupt moderation net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable ethtool: implement Energy Detect Powerdown support via phy-tunable xen-netfront: do not assume sk_buff_head list is empty in error handling s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb” net: ena: don't wake up tx queue when down drop_monitor: Better sanitize notified packets ...
2019-09-17Merge tag 'leds-for-5.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED updates from Jacek Anaszewski: "In this cycle we've finally managed to contribute the patch set sorting out LED naming issues. Besides that there are many changes scattered among various LED class drivers and triggers. LED naming related improvements: - add new 'function' and 'color' fwnode properties and deprecate 'label' property which has been frequently abused for conveying vendor specific names that have been available in sysfs anyway - introduce a set of standard LED_FUNCTION* definitions - introduce a set of standard LED_COLOR_ID* definitions - add a new {devm_}led_classdev_register_ext() API with the capability of automatic LED name composition basing on the properties available in the passed fwnode; the function is backwards compatible in a sense that it uses 'label' data, if present in the fwnode, for creating LED name - add tools/leds/get_led_device_info.sh script for retrieving LED vendor, product and bus names, if applicable; it also performs basic validation of an LED name - update following drivers and their DT bindings to use the new LED registration API: - leds-an30259a, leds-gpio, leds-as3645a, leds-aat1290, leds-cr0014114, leds-lm3601x, leds-lm3692x, leds-lp8860, leds-lt3593, leds-sc27xx-blt Other LED class improvements: - replace {devm_}led_classdev_register() macros with inlines - allow to call led_classdev_unregister() unconditionally - switch to use fwnode instead of be stuck with OF one LED triggers improvements: - led-triggers: - fix dereferencing of null pointer - fix a memory leak bug - ledtrig-gpio: - GPIO 0 is valid Drop superseeded apu2/3 support from leds-apu since for apu2+ a newer, more complete driver exists, based on a generic driver for the AMD SOCs gpio-controller, supporting LEDs as well other devices: - drop profile field from priv data - drop iosize field from priv data - drop enum_apu_led_platform_types - drop superseeded apu2/3 led support - add pr_fmt prefix for better log output - fix error message on probing failure Other misc fixes and improvements to existing LED class drivers: - leds-ns2, leds-max77650: - add of_node_put() before return - leds-pwm, leds-is31fl32xx: - use struct_size() helper - leds-lm3697, leds-lm36274, leds-lm3532: - switch to use fwnode_property_count_uXX() - leds-lm3532: - fix brightness control for i2c mode - change the define for the fs current register - fixes for the driver for stability - add full scale current configuration - dt: Add property for full scale current. - avoid potentially unpaired regulator calls - move static keyword to the front of declarations - fix optional led-max-microamp prop error handling - leds-max77650: - add of_node_put() before return - add MODULE_ALIAS() - Switch to fwnode property API - leds-as3645a: - fix misuse of strlcpy - leds-netxbig: - add of_node_put() in netxbig_leds_get_of_pdata() - remove legacy board-file support - leds-is31fl319x: - simplify getting the adapter of a client - leds-ti-lmu-common: - fix coccinelle issue - move static keyword to the front of declaration - leds-syscon: - use resource managed variant of device register - leds-ktd2692: - fix a typo in the name of a constant - leds-lp5562: - allow firmware files up to the maximum length - leds-an30259a: - fix typo - leds-pca953x: - include the right header" * tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (72 commits) leds: lm3532: Fix optional led-max-microamp prop error handling led: triggers: Fix dereferencing of null pointer leds: ti-lmu-common: Move static keyword to the front of declaration leds: lm3532: Move static keyword to the front of declarations leds: trigger: gpio: GPIO 0 is valid leds: pwm: Use struct_size() helper leds: is31fl32xx: Use struct_size() helper leds: ti-lmu-common: Fix coccinelle issue in TI LMU leds: lm3532: Avoid potentially unpaired regulator calls leds: syscon: Use resource managed variant of device register leds: Replace {devm_}led_classdev_register() macros with inlines leds: Allow to call led_classdev_unregister() unconditionally leds: lm3532: Add full scale current configuration dt: lm3532: Add property for full scale current. leds: lm3532: Fixes for the driver for stability leds: lm3532: Change the define for the fs current register leds: lm3532: Fix brightness control for i2c mode leds: Switch to use fwnode instead of be stuck with OF one leds: max77650: Switch to fwnode property API led: triggers: Fix a memory leak bug ...
2019-09-16RDMA: Fix double-free in srq creation error flowJack Morgenstein
The cited commit introduced a double-free of the srq buffer in the error flow of procedure __uverbs_create_xsrq(). The problem is that ib_destroy_srq_user() called in the error flow also frees the srq buffer. Thus, if uverbs_response() fails in __uverbs_create_srq(), the srq buffer will be freed twice. Cc: <stable@vger.kernel.org> Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20190916071154.20383-5-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16RDMA/efa: Fix incorrect error printGal Pressman
The error print should indicate that it failed to get the queue attributes, not network attributes. Link: https://lore.kernel.org/r/20190910134301.4194-2-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16IB/mlx5: Free mpi in mp_slave modeDanit Goldberg
ib_add_slave_port() allocates a multiport struct but never frees it. Don't leak memory, free the allocated mpi struct during driver unload. Cc: <stable@vger.kernel.org> Fixes: 32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE") Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16IB/mlx5: Use the original address for the page during free_pagesDanit Goldberg
The removal of 'buffer' in the patch below caused free_page() to use a value that had been offset since the wqe pointer is adjusted while the routine runs. The current implementation of free_pages() rounds down to a pfn, discarding the adjustment, but this is not the right way to use the API. Preserve the initial value and use it for free_page(). Fixes: 0f51427bd097 ("RDMA/mlx5: Cleanup WQE page fault handler") Link: https://lore.kernel.org/r/20190916064818.19823-2-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"Colin Ian King
There is a spelling mistake in a literal string, fix it. Fixes: 89f81008baac ("RDMA/bnxt_re: expose detailed stats retrieved from HW") Link: https://lore.kernel.org/r/20190911092856.11146-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16RDMA/hns: Package operations of rq inline buffer into separate functionsLijun Ou
Here packages the codes of allocating and freeing rq inline buffer in hns_roce_create_qp_common function in order to reduce the complexity. Link: https://lore.kernel.org/r/1567068102-56919-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-16RDMA/hns: Optimize cmd init and mode selection for hip08Yixian Liu
There are two modes for mailbox command (cmd) queue, i.e., event mode and poll mode. For each mode, we use corresponding semaphores to protect the cmd queue resource competition, so called event_sem and poll_sem. During cmd init, both semaphores are initialized and poll mode is selected. Thus, there is no need to up poll_sema again in cmd_use_polling. Furthermore, there is no need to down the sema of the other side while switching mode. This patch aims to decouple the switch between event mode and poll mode of cmd. Link: https://lore.kernel.org/r/1567068102-56919-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>