aboutsummaryrefslogtreecommitdiffstats
path: root/net/sunrpc/xprtrdma
AgeCommit message (Collapse)Author
2022-04-13SUNRPC/xprt: async tasks mustn't block waiting for memoryNeilBrown
[ Upstream commit a721035477fb5fb8abc738fbe410b07c12af3dc5 ] When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13SUNRPC/call_alloc: async tasks mustn't block waiting for memoryNeilBrown
[ Upstream commit c487216bec83b0c5a8803e5c61433d33ad7b104d ] When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_createDan Aloni
[ Upstream commit a9c10b5b3b67b3750a10c8b089b2e05f5e176e33 ] If there are failures then we must not leave the non-NULL pointers with the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries free them, resulting in an Oops. Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18SUNRPC/xprtrdma: Fix reconnection lockingTrond Myklebust
[ Upstream commit f99fa50880f5300fbbb3c0754ddc7f8738d24fe7 ] The xprtrdma client code currently relies on the task that initiated the connect to hold the XPRT_LOCK for the duration of the connection attempt. If the task is woken early, due to some other event, then that lock could get released early. Avoid races by using the same mechanism that the socket code uses of transferring lock ownership to the RDMA connect worker itself. That frees us to call rpcrdma_xprt_disconnect() directly since we're now guaranteed exclusion w.r.t. other callers. Fixes: 4cf44be6f1e8 ("xprtrdma: Fix recursion into rpcrdma_xprt_disconnect()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03SUNRPC: More fixes for backlog congestionTrond Myklebust
commit e86be3a04bc4aeaf12f93af35f08f8d4385bcd98 upstream. Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well as socket based transports. Ensure we always initialise the request after waking up from the backlog list. Fixes: e877a88d1f06 ("SUNRPC in case of backlog, hand free slots directly to waiting task") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19xprtrdma: rpcrdma_mr_pop() already does list_del_init()Chuck Lever
[ Upstream commit 1363e6388c363d0433f9aa4e2f33efe047572687 ] The rpcrdma_mr_pop() earlier in the function has already cleared out mr_list, so it must not be done again in the error path. Fixes: 847568942f93 ("xprtrdma: Remove fr_state") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19xprtrdma: Fix cwnd update orderingChuck Lever
[ Upstream commit 35d8b10a25884050bb3b0149b62c3818ec59f77c ] After a reconnect, the reply handler is opening the cwnd (and thus enabling more RPC Calls to be sent) /before/ rpcrdma_post_recvs() can post enough Receive WRs to receive their replies. This causes an RNR and the new connection is lost immediately. The race is most clearly exposed when KASAN and disconnect injection are enabled. This slows down rpcrdma_rep_create() enough to allow the send side to post a bunch of RPC Calls before the Receive completion handler can invoke ib_post_recv(). Fixes: 2ae50ad68cd7 ("xprtrdma: Close window between waking RPC senders and posting Receives") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19xprtrdma: Avoid Receive Queue wrappingChuck Lever
[ Upstream commit 32e6b68167f1d446111c973d57e6f52aee11897a ] Commit e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") increased the number of Receive WRs that are posted by the client, but did not increase the size of the Receive Queue allocated during transport set-up. This is usually not an issue because RPCRDMA_BACKWARD_WRS is defined as (32) when SUNRPC_BACKCHANNEL is defined. In cases where it isn't, there is a real risk of Receive Queue wrapping. Fixes: e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19SUNRPC: Move fault injection call sitesChuck Lever
[ Upstream commit 7638e0bfaed1b653d3ca663e560e9ffb44bb1030 ] I've hit some crashes that occur in the xprt_rdma_inject_disconnect path. It appears that, for some provides, rdma_disconnect() can take so long that the transport can disconnect and release its hardware resources while rdma_disconnect() is still running, resulting in a UAF in the provider. The transport's fault injection method may depend on the stability of transport data structures. That means it needs to be invoked only from contexts that hold the transport write lock. Fixes: 4a0682583988 ("SUNRPC: Transport fault injection") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-25svcrdma: disable timeouts on rdma backchannelTimo Rothenpieler
commit 6820bf77864d5894ff67b5c00d7dba8f92011e3d upstream. This brings it in line with the regular tcp backchannel, which also has all those timeouts disabled. Prevents the backchannel from timing out, getting some async operations like server side copying getting stuck indefinitely on the client side. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org> Fixes: 5d252f90a800 ("svcrdma: Add class for RDMA backwards direction transport") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04svcrdma: Hold private mutex while invoking rdma_accept()Chuck Lever
[ Upstream commit 0ac24c320c4d89a9de6ec802591398b8675c7b3c ] RDMA core mutex locking was restructured by commit d114c6feedfe ("RDMA/cma: Add missing locking to rdma_accept()") [Aug 2020]. When lock debugging is enabled, the RPC/RDMA server trips over the new lockdep assertion in rdma_accept() because it doesn't call rdma_accept() from its CM event handler. As a temporary fix, have svc_rdma_accept() take the handler_mutex explicitly. In the meantime, let's consider how to restructure the RPC/RDMA transport to invoke rdma_accept() from the proper context. Calls to svc_rdma_accept() are serialized with calls to svc_rdma_free() by the generic RPC server layer. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-rdma/20210209154014.GO4247@nvidia.com/ Fixes: d114c6feedfe ("RDMA/cma: Add missing locking to rdma_accept()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30xprtrdma: Fix XDRBUF_SPARSE_PAGES supportChuck Lever
commit 15261b9126cd5bb2ad8521da49d8f5c042d904c7 upstream. Olga K. observed that rpcrdma_marsh_req() allocates sparse pages only when it has determined that a Reply chunk is necessary. There are plenty of cases where no Reply chunk is needed, but the XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in rpcrdma_inline_fixup() when it tries to copy parts of the received Reply into a missing page. To avoid crashing, handle sparse page allocation up front. Until XATTR support was added, this issue did not appear often because the only SPARSE_PAGES consumer always expected a reply large enough to always require a Reply chunk. Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30SUNRPC: xprt_load_transport() needs to support the netid "rdma6"Trond Myklebust
[ Upstream commit d5aa6b22e2258f05317313ecc02efbb988ed6d38 ] According to RFC5666, the correct netid for an IPv6 addressed RDMA transport is "rdma6", which we've supported as a mount option since Linux-4.7. The problem is when we try to load the module "xprtrdma6", that will fail, since there is no modulealias of that name. Fixes: 181342c5ebe8 ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-22Merge tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "The one new feature this time, from Anna Schumaker, is READ_PLUS, which has the same arguments as READ but allows the server to return an array of data and hole extents. Otherwise it's a lot of cleanup and bugfixes" * tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits) NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy SUNRPC: fix copying of multiple pages in gss_read_proxy_verf() sunrpc: raise kernel RPC channel buffer size svcrdma: fix bounce buffers for unaligned offsets and multiple pages nfsd: remove unneeded break net/sunrpc: Fix return value for sysctl sunrpc.transports NFSD: Encode a full READ_PLUS reply NFSD: Return both a hole and a data segment NFSD: Add READ_PLUS hole segment encoding NFSD: Add READ_PLUS data support NFSD: Hoist status code encoding into XDR encoder functions NFSD: Map nfserr_wrongsec outside of nfsd_dispatch NFSD: Remove the RETURN_STATUS() macro NFSD: Call NFSv2 encoders on error returns NFSD: Fix .pc_release method for NFSv2 NFSD: Remove vestigial typedefs NFSD: Refactor nfsd_dispatch() error paths NFSD: Clean up nfsd_dispatch() variables NFSD: Clean up stale comments in nfsd_dispatch() NFSD: Clean up switch statement in nfsd_dispatch() ...
2020-10-16svcrdma: fix bounce buffers for unaligned offsets and multiple pagesDan Aloni
This was discovered using O_DIRECT at the client side, with small unaligned file offsets or IOs that span multiple file pages. Fixes: e248aa7be86 ("svcrdma: Remove max_sge check at connect time") Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-09-25net: sunrpc: delete repeated wordsRandy Dunlap
Drop duplicate words in net/sunrpc/. Also fix "Anyone" to be "Any one". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-09-21xprtrdma: drop double zeroingJulia Lawall
sg_init_table zeroes its first argument, so the allocation of that argument doesn't have to. the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,n,flags; @@ x = - kcalloc + kmalloc_array (n,sizeof(*x),flags) ... sg_init_table(x,n) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21SUNRPC: Hoist trace_xprtrdma_op_setport into generic codeChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21SUNRPC: Remove debugging instrumentation from xprt_releaseChuck Lever
These instruments don't appear to add any substantial value. We already have this at the termination of each RPC: iozone-2617 [002] 975.713126: rpc_stats_latency: task:418@5 xid=0x260eab5d nfsv3 LOOKUP backlog=15 rtt=32 execute=58 iozone-2617 [002] 975.713127: xprt_release_cong: task:418@5 snd_task:4294967295 cong=256 cwnd=16384 iozone-2617 [002] 975.713127: xprt_put_cong: task:418@5 snd_task:4294967295 cong=0 cwnd=16384 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21SUNRPC: Hoist trace_xprtrdma_op_allocate into generic codeChuck Lever
Introduce a tracepoint in call_allocate that reports the exact sizes in the RPC buffer allocation request and the status of the result. This helps catch problems with XDR buffer provisioning, and replaces transport-specific debugging instrumentation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-09Merge tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fix an NFS/RDMA resource leak - Fix the error handling during delegation recall - NFSv4.0 needs to return the delegation on a zero-stateid SETATTR - Stop printk reading past end of string * tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: stop printk reading past end of string NFS: Zero-stateid SETATTR should first return delegation NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall xprtrdma: Release in-flight MRs on disconnect
2020-08-26xprtrdma: Release in-flight MRs on disconnectChuck Lever
Dan Aloni reports that when a server disconnects abruptly, a few memory regions are left DMA mapped. Over time this leak could pin enough I/O resources to slow or even deadlock an NFS/RDMA client. I found that if a transport disconnects before pending Send and FastReg WRs can be posted, the to-be-registered MRs are stranded on the req's rl_registered list and never released -- since they weren't posted, there's no Send completion to DMA unmap them. Reported-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-09Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds
Pull NFS server updates from Chuck Lever: "Highlights: - Support for user extended attributes on NFS (RFC 8276) - Further reduce unnecessary NFSv4 delegation recalls Notable fixes: - Fix recent krb5p regression - Address a few resource leaks and a rare NULL dereference Other: - De-duplicate RPC/RDMA error handling and other utility functions - Replace storage and display of kernel memory addresses by tracepoints" * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits) svcrdma: CM event handler clean up svcrdma: Remove transport reference counting svcrdma: Fix another Receive buffer leak SUNRPC: Refresh the show_rqstp_flags() macro nfsd: netns.h: delete a duplicated word SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()") nfsd: avoid a NULL dereference in __cld_pipe_upcall() nfsd4: a client's own opens needn't prevent delegations nfsd: Use seq_putc() in two functions svcrdma: Display chunk completion ID when posting a rw_ctxt svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send() svcrdma: Introduce Send completion IDs svcrdma: Record Receive completion ID in svc_rdma_decode_rqst svcrdma: Introduce Receive completion IDs svcrdma: Introduce infrastructure to support completion IDs svcrdma: Add common XDR encoders for RDMA and Read segments svcrdma: Add common XDR decoders for RDMA and Read segments SUNRPC: Add helpers for decoding list discriminators symbolically svcrdma: Remove declarations for functions long removed svcrdma: Clean up trace_svcrdma_send_failed() tracepoint ...
2020-07-28svcrdma: CM event handler clean upChuck Lever
Now that there's a core tracepoint that reports these events, there's no need to maintain dprintk() call sites in each arm of the switch statements. We also refresh the documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-28svcrdma: Remove transport reference countingChuck Lever
Jason tells me that a ULP cannot rely on getting an ESTABLISHED and DISCONNECTED event pair for each connection, so transport reference counting in the CM event handler will never be reliable. Now that we have ib_drain_qp(), svcrdma should no longer need to hold transport references while Sends and Receives are posted. So remove the get/put call sites in the CM event handlers. This eliminates a significant source of locked memory bus traffic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-28svcrdma: Fix another Receive buffer leakChuck Lever
During a connection tear down, the Receive queue is flushed before the device resources are freed. Typically, all the Receives flush with IB_WR_FLUSH_ERR. However, any pending successful Receives flush with IB_WR_SUCCESS, and the server automatically posts a fresh Receive to replace the completing one. This happens even after the connection has closed and the RQ is drained. Receives that are posted after the RQ is drained appear never to complete, causing a Receive resource leak. The leaked Receive buffer is left DMA-mapped. To prevent these late-posted recv_ctxt's from leaking, block new Receive posting after XPT_CLOSE is set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-15xprtrdma: fix incorrect header size calculationsColin Ian King
Currently the header size calculations are using an assignment operator instead of a += operator when accumulating the header size leading to incorrect sizes. Fix this by using the correct operator. Addresses-Coverity: ("Unused value") Fixes: 302d3deb2068 ("xprtrdma: Prevent inline overflow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-13svcrdma: Display chunk completion ID when posting a rw_ctxtChuck Lever
Re-use the post_rw tracepoint (safely) to trace cc_info lifetime events, including completion IDs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()Chuck Lever
First, refactor: Dereference the svc_rdma_send_ctxt inside svc_rdma_send() instead of at every call site. Then, it can be passed into trace_svcrdma_post_send() to get the proper completion ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Introduce Send completion IDsChuck Lever
Set up a completion ID in each svc_rdma_send_ctxt. The ID is used to match an incoming Send completion to a transport and to a previous ib_post_send(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Record Receive completion ID in svc_rdma_decode_rqstChuck Lever
When recording a trace event in the Receive path, tie decoding results and errors to an incoming Receive completion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Introduce Receive completion IDsChuck Lever
Set up a completion ID in each svc_rdma_recv_ctxt. The ID is used to match an incoming Receive completion to a transport and to a previous ib_post_recv(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Add common XDR encoders for RDMA and Read segmentsChuck Lever
Clean up: De-duplicate some code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Add common XDR decoders for RDMA and Read segmentsChuck Lever
Clean up: De-duplicate some code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13SUNRPC: Add helpers for decoding list discriminators symbolicallyChuck Lever
Use these helpers in a few spots to demonstrate their use. The remaining open-coded discriminator checks in rpcrdma will be addressed in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Clean up trace_svcrdma_send_failed() tracepointChuck Lever
- Use the _err naming convention instead - Remove display of kernel memory address of the controlling xprt Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Consolidate send_error helper functionsChuck Lever
Final refactor: Replace internals of svc_rdma_send_error() with a simple call to svc_rdma_send_error_msg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Make svc_rdma_send_error_msg() a global functionChuck Lever
Prepare for svc_rdma_send_error_msg() to be invoked from another source file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Eliminate return value for svc_rdma_send_error_msg()Chuck Lever
Like svc_rdma_send_error(), have svc_rdma_send_error_msg() handle any error conditions internally, rather than duplicating that recovery logic at every call site. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Add a @status parameter to svc_rdma_send_error_msg()Chuck Lever
The common "send RDMA_ERR" function should be in svc_rdma_sendto.c, since that is where the other Send-related functions are located. So from here, I will beef up svc_rdma_send_error_msg() and deprecate svc_rdma_send_error(). A generic svc_rdma_send_error_msg() will need to handle both ERR_CHUNK and ERR_VERS. Copy that logic from svc_rdma_send_error() to svc_rdma_send_error_msg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Add @rctxt parameter to svc_rdma_send_error() functionsChuck Lever
Another step towards making svc_rdma_send_error_msg() and svc_rdma_send_error() similar enough to eliminate one of them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Remove save_io_pages() call from send_error_msg()Chuck Lever
Commit 4757d90b15d8 ("svcrdma: Report Write/Reply chunk overruns") made an effort to preserve I/O pages until RDMA Write completion. In a subsequent patch, I intend to de-duplicate the two functions that send ERR_CHUNK responses. Pull the save_io_pages() call out of svc_rdma_send_error_msg() to make it more like svc_rdma_send_error(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13svcrdma: Fix page leak in svc_rdma_recv_read_chunk()Chuck Lever
Commit 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path") moved the page saver logic so that it gets executed event when an error occurs. In that case, the I/O is never posted, and those pages are then leaked. Errors in this path, however, are quite rare. Fixes: 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13xprtrdma: Fix handling of connect errorsChuck Lever
Ensure that the connect worker is awoken if an attempt to establish a connection is unsuccessful. Otherwise the worker waits forever and the transport workload hangs. Connect errors should not attempt to destroy the ep, since the connect worker continues to use it after the handler runs, so these errors are now handled independently of DISCONNECTED events. Reported-by: Dan Aloni <dan@kernelim.com> Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-13xprtrdma: Fix return code from rpcrdma_xprt_connect()Chuck Lever
I noticed that when rpcrdma_xprt_connect() returns -ENOMEM, instead of retrying the connect, the RPC client kills the RPC task that requested the connection. We want a retry here. Fixes: cb586decbb88 ("xprtrdma: Make sendctx queue lifetime the same as connection lifetime") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-13xprtrdma: Fix recursion into rpcrdma_xprt_disconnect()Chuck Lever
Both Dan and I have observed two processes invoking rpcrdma_xprt_disconnect() concurrently. In my case: 1. The connect worker invokes rpcrdma_xprt_disconnect(), which drains the QP and waits for the final completion 2. This causes the newly posted Receive to flush and invoke xprt_force_disconnect() 3. xprt_force_disconnect() sets CLOSE_WAIT and wakes up the RPC task that is holding the transport lock 4. The RPC task invokes xprt_connect(), which calls ->ops->close 5. xprt_rdma_close() invokes rpcrdma_xprt_disconnect(), which tries to destroy the QP. Deadlock. To prevent xprt_force_disconnect() from waking anything, handle the clean up after a failed connection attempt in the xprt's sndtask. The retry loop is removed from rpcrdma_xprt_connect() to ensure that the newly allocated ep and id are properly released before a REJECTED connection attempt can be retried. Reported-by: Dan Aloni <dan@kernelim.com> Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-13xprtrdma: Fix double-free in rpcrdma_ep_create()Chuck Lever
In the error paths, there's no need to call kfree(ep) after calling rpcrdma_ep_put(ep). Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-22xprtrdma: Fix handling of RDMA_ERROR repliesChuck Lever
The RPC client currently doesn't handle ERR_CHUNK replies correctly. rpcrdma_complete_rqst() incorrectly passes a negative number to xprt_complete_rqst() as the number of bytes copied. Instead, set task->tk_status to the error value, and return zero bytes copied. In these cases, return -EIO rather than -EREMOTEIO. The RPC client's finite state machine doesn't know what to do with -EREMOTEIO. Additional clean ups: - Don't double-count RDMA_ERROR replies - Remove a stale comment Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.vger.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-22xprtrdma: Clean up disconnectChuck Lever
1. Ensure that only rpcrdma_cm_event_handler() modifies ep->re_connect_status to avoid racy changes to that field. 2. Ensure that xprt_force_disconnect() is invoked only once as a transport is closed or destroyed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>