path: root/fs/nfsd
Age | Commit message | Author
2019-07-31 | nfsd: Fix overflow causing non-working mounts on 1 TB machines | Paul Menzel
[ Upstream commit 3b2d4dcf71c4a91b420f835e52ddea8192300a3b ]

Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client.

The gist of commit 10a68cdf10 is the change below.

    -avail = clamp_t(int, avail, slotsize, avail/3);
    +avail = clamp_t(int, avail, slotsize, total_avail/3);

Here are the macros.

    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
    #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)

`total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1).

`avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`.

When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182.

My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client.

Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue.

Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same.

Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation)
Cc: stable@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
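The arithmetic in this report can be reproduced in user space. What follows is a minimal sketch, not the kernel code: the macros are simplified stand-ins (no `__careful_cmp`), the function names and the `slotsize` value of 2048 used in the note below are hypothetical, and `unsigned long long` stands in for the kernel's 64-bit `unsigned long`. It also assumes a compiler such as GCC or Clang, where converting an out-of-range value to `int` wraps modulo 2^32 (the C standard leaves that conversion implementation-defined).

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros (the real ones use __careful_cmp()). */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))
#define max_t(type, x, y) ((type)(x) > (type)(y) ? (type)(x) : (type)(y))
#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)

/* Bound computed with int casts, as in the calculation described above. */
static unsigned long long avail_with_int_cast(unsigned long long avail,
                                              unsigned long long slotsize,
                                              unsigned long long hi_total)
{
    assert(sizeof(int) == 4); /* the sketch assumes a 32-bit int */
    /*
     * (int)(hi_total / 3) turns negative once hi_total / 3 exceeds INT_MAX;
     * the negative int is then sign-extended when converted back to an
     * unsigned 64-bit value, producing a huge number.
     */
    return clamp_t(int, avail, slotsize, hi_total / 3);
}

/* Same bound with unsigned 64-bit casts, mirroring the "unsigned long" fix. */
static unsigned long long avail_with_unsigned_cast(unsigned long long avail,
                                                   unsigned long long slotsize,
                                                   unsigned long long hi_total)
{
    return clamp_t(unsigned long long, avail, slotsize, hi_total / 3);
}
```

Under these assumptions, `avail_with_int_cast(65536, 2048, 8434659328)` yields 18446744072226137429, the value quoted in the report, while `avail_with_unsigned_cast()` with the same arguments yields 65536.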
2019-07-31 | nfsd: fix performance-limiting session calculation | J. Bruce Fields
[ Upstream commit c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 ] We're unintentionally limiting the number of slots per nfsv4.1 session to 10. Often more than 10 simultaneous RPCs are needed for the best performance. This calculation was meant to prevent any one client from using up more than a third of the limit we set for total memory use across all clients and sessions. Instead, it's limiting the client to a third of the maximum for a single session. Fix this. Reported-by: Chris Tracy <ctracy@engr.scu.edu> Cc: stable@vger.kernel.org Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
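The difference between "a third of the per-session maximum" and "a third of the total" can be sketched numerically. This is an illustrative model only: `session_slots()`, its parameters, and the 65536-byte per-session cap are assumptions made for the example, not the kernel's function or constants.

```c
#include <assert.h>

/* Hypothetical per-session memory cap in bytes, chosen for illustration. */
#define SKETCH_MAX_MEM_PER_SESSION 65536ULL

static unsigned long long min_ull(unsigned long long a, unsigned long long b)
{
    return a < b ? a : b;
}

static unsigned long long max_ull(unsigned long long a, unsigned long long b)
{
    return a > b ? a : b;
}

/*
 * Number of slots granted to one session.
 * use_total == 0: cap at a third of the per-session amount (the old behaviour
 *                 described above).
 * use_total != 0: cap at a third of the memory still available to all
 *                 clients (the intended behaviour).
 */
static unsigned long long session_slots(unsigned long long requested,
                                        unsigned long long slotsize,
                                        unsigned long long total_avail,
                                        int use_total)
{
    assert(slotsize > 0);
    unsigned long long avail = min_ull(SKETCH_MAX_MEM_PER_SESSION, total_avail);
    unsigned long long third = (use_total ? total_avail : avail) / 3;

    /* clamp(avail, slotsize, third), written out as min(max(...), ...) */
    avail = min_ull(max_ull(avail, slotsize), third);
    return min_ull(requested, avail / slotsize);
}
```

With a hypothetical 2048-byte slot and 64 MiB still available, a request for 64 slots is cut to 10 under the old rule and to 32 under the new one, where the per-session cap becomes the limiting factor.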
2019-07-31 | nfsd: give out fewer session slots as limit approaches | J. Bruce Fields
[ Upstream commit de766e570413bd0484af0b580299b495ada625c3 ] Instead of granting client's full requests until we hit our DRC size limit and then failing CREATE_SESSIONs (and hence mounts) completely, start granting clients smaller slot tables as we approach the limit. The factor chosen here is pretty much arbitrary. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 | nfsd: increase DRC cache limit | J. Bruce Fields
[ Upstream commit 44d8660d3bb0a1c8363ebcb906af2343ea8e15f6 ] An NFSv4.1+ client negotiates the size of its duplicate reply cache size in the initial CREATE_SESSION request. The server preallocates the memory for the duplicate reply cache to ensure that we'll never fail to record the response to a nonidempotent operation. To prevent a few CREATE_SESSIONs from consuming all of memory we set an upper limit based on nr_free_buffer_pages(). 1/2^10 has been too limiting in practice; 1/2^7 is still less than one percent. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
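The effect of moving from 1/2^10 to 1/2^7 of the free buffer pages is easy to check by hand. The helper below is a hypothetical user-space sketch: in the kernel the page count would come from nr_free_buffer_pages(), whereas here it is simply a parameter.

```c
#include <assert.h>

/*
 * Upper limit for duplicate reply cache memory, in bytes:
 * (free_pages / 2^shift) * page_size.
 * Hypothetical helper for illustration; not a kernel function.
 */
static unsigned long long drc_limit_bytes(unsigned long long free_pages,
                                          unsigned long long page_size,
                                          unsigned int shift)
{
    assert(shift < 64); /* shifting by >= the type width is undefined */
    return (free_pages >> shift) * page_size;
}
```

For an assumed machine with 2^20 free pages of 4096 bytes (4 GiB), a shift of 10 gives a 4 MiB limit and a shift of 7 gives 32 MiB, which is 1/128 of the total, or roughly 0.78 percent.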
2019-06-15 | nfsd: allow fh_want_write to be called twice | J. Bruce Fields
[ Upstream commit 0b8f62625dc309651d0efcb6a6247c933acd8b45 ] A fuzzer recently triggered lockdep warnings about potential sb_writers deadlocks caused by fh_want_write(). Looks like we aren't careful to pair each fh_want_write() with an fh_drop_write(). It's not normally a problem since fh_put() will call fh_drop_write() for us. And was OK for NFSv3 where we'd do one operation that might call fh_want_write(), and then put the filehandle. But an NFSv4 protocol fuzzer can do weird things like call unlink twice in a compound, and then we get into trouble. I'm a little worried about this approach of just leaving everything to fh_put(). But I think there are probably a lot of fh_want_write()/fh_drop_write() imbalances so for now I think we need it to be more forgiving. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
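One way to make an acquire call forgiving of repeats is to remember, per handle, whether write access is already held. The sketch below illustrates that idea with hypothetical `fake_fh` types and functions; it is not the nfsd code, and the counter merely stands in for taking write access on the mount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file handle; only what this sketch needs. */
struct fake_fh {
    bool want_write;  /* write access is currently held through this handle */
    int mnt_writers;  /* stand-in for the mount's writer count */
};

/* Acquire write access; a second call on the same handle is a no-op. */
static int fake_fh_want_write(struct fake_fh *fh)
{
    assert(fh);
    if (fh->want_write)
        return 0;
    fh->mnt_writers++;
    fh->want_write = true;
    return 0;
}

/* Release write access if held; safe to call when nothing is held. */
static void fake_fh_drop_write(struct fake_fh *fh)
{
    assert(fh);
    if (fh->want_write) {
        fh->want_write = false;
        fh->mnt_writers--;
    }
}
```

With this shape, two acquires followed by a single release (for example from a final put of the handle) leave the writer count balanced.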
2019-05-02 | nfsd: Don't release the callback slot unless it was actually held | Trond Myklebust
commit e6abc8caa6deb14be2a206253f7e1c5e37e9515b upstream. If there are multiple callbacks queued, waiting for the callback slot when the callback gets shut down, then they all currently end up acting as if they hold the slot, and call nfsd4_cb_sequence_done() resulting in interesting side-effects. In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done() causes a loop back to nfsd4_cb_prepare() without first freeing the slot, which causes a deadlock when nfsd41_cb_get_slot() gets called a second time. This patch therefore adds a boolean to track whether or not the callback did pick up the slot, so that it can do the right thing in these 2 cases. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 | nfsd: fix wrong check in write_v4_end_grace() | Yihao Wu
commit dd838821f0a29781b185cd8fb8e48d5c177bd838 upstream. Commit 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before nfsd startup" is trying to fix a NULL dereference issue, but it mistakenly checks if the nfsd server is started. So fix it. Fixes: 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before nfsd startup" Cc: stable@vger.kernel.org Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 | nfsd: fix memory corruption caused by readdir | NeilBrown
commit b602345da6cbb135ba68cf042df8ec9a73da7981 upstream.

If the result of an NFSv3 readdir{,plus} request results in the "offset" on one entry having to be split across 2 pages, and is sized so that the next directory entry doesn't fit in the requested size, then memory corruption can happen.

When encode_entry() is called after encoding the last entry that fits, it notices that ->offset and ->offset1 are set, and so stores the offset value in the two pages as required. It clears ->offset1 but *does not* clear ->offset.

Normally this omission doesn't matter as encode_entry_baggage() will be called, and will set ->offset to a suitable value (not on a page boundary). But in the case where cd->buflen < elen and nfserr_toosmall is returned, ->offset is not reset.

This means that nfsd3proc_readdirplus will see ->offset with a value 4 bytes before the end of a page, and ->offset1 set to NULL. It will try to write 8 bytes to ->offset. If we are lucky, the next page will be read-only, and the system will

    BUG: unable to handle kernel paging request at...

If we are unlucky, some innocent page will have the first 4 bytes corrupted.

nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly writes 8 bytes to the offset wherever it is.

Fix this by clearing ->offset after it is used, and copying the ->offset handling code from nfsd3_proc_readdirplus into nfsd3_proc_readdir.

(Note that the commit hash in the Fixes tag is from the 'history' tree - this bug predates git).

Fixes: 0b1d57cf7654 ("[PATCH] kNFSd: Fix nfs3 dentry encoding")
Fixes-URL: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0b1d57cf7654
Cc: stable@vger.kernel.org (v2.6.12+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 | nfsd4: catch some false session retries | J. Bruce Fields
commit 53da6a53e1d414e05759fa59b7032ee08f4e22d7 upstream. The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that the client is making a call that matches a previous (slot, seqid) pair but that *isn't* actually a replay, because some detail of the call doesn't actually match the previous one. Catching every such case is difficult, but we may as well catch a few easy ones. This also handles the case described in the previous patch, in a different way. The spec does however require us to catch the case where the difference is in the rpc credentials. This prevents somebody from snooping another user's replies by fabricating retries. (But the practical value of the attack is limited by the fact that the replies with the most sensitive data are READ replies, which are not normally cached.) Tested-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 | nfsd4: fix cached replies to solo SEQUENCE compounds | J. Bruce Fields
commit 085def3ade52f2ffe3e31f42e98c27dcc222dd37 upstream.

Currently our handling of 4.1+ requests without "cachethis" set is confusing and not quite correct.

Suppose a client sends a compound consisting of only a single SEQUENCE op, and it matches the seqid in a session slot (so it's a retry), but the previous request with that seqid did not have "cachethis" set.

The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP, but the protocol only allows that to be returned on the op following the SEQUENCE, and there is no such op in this case.

The protocol permits us to cache replies even if the client didn't ask us to. And it's easy to do so in the case of solo SEQUENCE compounds.

So, when we get a solo SEQUENCE, we can either return the previously cached reply or NFS4ERR_SEQ_FALSE_RETRY if we notice it differs in some way from the original call.

Currently, we're returning a corrupt reply in the case a solo SEQUENCE matches a previous compound with more ops. This actually matters because the Linux client recently started doing this as a way to recover from lost replies to idempotent operations in the case the process doing the original reply was killed: in that case it's difficult to keep the original arguments around to do a real retry, and the client no longer cares what the result is anyway, but it would like to make sure that the slot's sequence id has been incremented, and the solo SEQUENCE assures that: if the server never got the original request, it will increment the sequence id. If it did get the original request, it won't increment, and nothing else about the reply really matters much. But we can at least attempt to return valid xdr!

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 | nfsd4: fix crash on writing v4_end_grace before nfsd startup | J. Bruce Fields
[ Upstream commit 62a063b8e7d1db684db3f207261a466fa3194e72 ]

Anatoly Trosinenko reports that this:

1) Checkout fresh master Linux branch (tested with commit e195ca6cb)
2) Copy x84_64-config-4.14 to .config, then enable NFS server v4 and build
3) From `kvm-xfstests shell`:

results in NULL dereference in locks_end_grace.

Check that nfsd has been started before trying to end the grace period.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-21 | nfsd: COPY and CLONE operations require the saved filehandle to be set | Scott Mayhew
commit 01310bb7c9c98752cc763b36532fab028e0f8f81 upstream. Make sure we have a saved filehandle, otherwise we'll oops with a null pointer dereference in nfs4_preprocess_stateid_op(). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-10 | vfs: swap names of {do,vfs}_clone_file_range() | Amir Goldstein
commit a725356b6659469d182d662f22d770d83d3bc7b5 upstream.

Commit 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze protection") created a wrapper do_clone_file_range() around vfs_clone_file_range() moving the freeze protection to the former, so overlayfs could call the latter.

The more common vfs practice is to call do_xxx helpers from vfs_xxx helpers, where freeze protection is taken in the vfs_xxx helper, so this anomaly could be a source of confusion.

It seems that commit 8ede205541ff ("ovl: add reflink/copyfile/dedup support") may have fallen victim to this confusion - ovl_clone_file_range() calls the vfs_clone_file_range() helper in the hope of getting freeze protection on upper fs, but in fact results in overlayfs allowing to bypass upper fs freeze protection.

Swap the names of the two helpers to conform to common vfs practice and call the correct helpers from overlayfs and nfsd.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze...")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-03 | nfsd: fix corrupted reply to badly ordered compound | J. Bruce Fields
[ Upstream commit 5b7b15aee641904ae269be9846610a3950cbd64c ] We're encoding a single op in the reply but leaving the number of ops zero, so the reply makes no sense. Somewhat academic as this isn't a case any real client will hit, though in theory perhaps that could change in a future protocol extension. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 | nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo | Scott Mayhew
[ Upstream commit 3171822fdcdd6e6d536047c425af6dc7a92dc585 ]

When running a fuzz tester against a KASAN-enabled kernel, the following splat periodically occurs.

The problem occurs when the test sends a GETDEVICEINFO request with a malformed xdr array (size but no data) for gdia_notify_types and the array size is > 0x3fffffff, which results in an overflow in the value of nbytes which is passed to read_buf().

If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after the overflow occurs, the value of nbytes is 0, and when that happens the pointer returned by read_buf() points to the end of the xdr data (i.e. argp->end) when really it should be returning NULL.

Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this value is arbitrary, but it's the same threshold used by nfsd4_decode_bitmap()... it could really be any value >= 1 since it's expected to get at most a single bitmap in gdia_notify_types).

[ 119.256854] ==================================================================
[ 119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538
[ 119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
[ 119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[ 119.261202] Call Trace:
[ 119.262265] dump_stack+0x71/0xab
[ 119.263371] print_address_description+0x6a/0x270
[ 119.264609] kasan_report+0x258/0x380
[ 119.265854] ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.267291] nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.268549] ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.269873] ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
[ 119.271095] nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.272393] ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
[ 119.273658] nfsd_dispatch+0x183/0x850 [nfsd]
[ 119.274918] svc_process+0x161c/0x31a0 [sunrpc]
[ 119.276172] ? svc_printk+0x190/0x190 [sunrpc]
[ 119.277386] ? svc_xprt_release+0x451/0x680 [sunrpc]
[ 119.278622] nfsd+0x2b9/0x430 [nfsd]
[ 119.279771] ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
[ 119.281157] kthread+0x2db/0x390
[ 119.282347] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 119.283756] ret_from_fork+0x35/0x40
[ 119.286041] Allocated by task 436:
[ 119.287525] kasan_kmalloc+0xa0/0xd0
[ 119.288685] kmem_cache_alloc+0xe9/0x1f0
[ 119.289900] get_empty_filp+0x7b/0x410
[ 119.291037] path_openat+0xca/0x4220
[ 119.292242] do_filp_open+0x182/0x280
[ 119.293411] do_sys_open+0x216/0x360
[ 119.294555] do_syscall_64+0xa0/0x2f0
[ 119.295721] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 119.298068] Freed by task 436:
[ 119.299271] __kasan_slab_free+0x130/0x180
[ 119.300557] kmem_cache_free+0x78/0x210
[ 119.301823] rcu_process_callbacks+0x35b/0xbd0
[ 119.303162] __do_softirq+0x192/0x5ea
[ 119.305443] The buggy address belongs to the object at ffff880113ada000 which belongs to the cache filp of size 256
[ 119.308556] The buggy address is located 0 bytes inside of 256-byte region [ffff880113ada000, ffff880113ada100)
[ 119.311376] The buggy address belongs to the page:
[ 119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
[ 119.314428] flags: 0x17ffe000000100(slab)
[ 119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
[ 119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
[ 119.319050] page dumped because: kasan: bad access detected
[ 119.321652] Memory state around the buggy address:
[ 119.322993] ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.324515] ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.327547] ^
[ 119.328730] ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.330218] ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 119.331740] ==================================================================

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
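The overflow described in this commit is ordinary 32-bit wraparound: a count of 4-byte words is multiplied by 4, and the top two bits fall off. The sketch below models that in user space. The function names and the way the bound is expressed are hypothetical; only the threshold of 1000 and the example array sizes come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Byte length of `num` 32-bit words, computed in 32 bits (wraps modulo 2^32). */
static uint32_t notify_types_nbytes(uint32_t num)
{
    return (uint32_t)(num << 2);
}

/* Bound check in the spirit of the fix: reject counts above 1000. */
static int notify_types_count_ok(uint32_t num)
{
    return num <= 1000u;
}

/* Small self-check using the array sizes named in the commit message. */
static void notify_types_selfcheck(void)
{
    assert(notify_types_nbytes(0x40000000u) == 0u); /* wrapped to zero */
    assert(notify_types_nbytes(0x3fffffffu) == 0xfffffffcu); /* largest non-wrapping count */
    assert(!notify_types_count_ok(0x40000000u)); /* rejected before use */
}
```

Checking the count before computing the byte length means the wrapped value is never handed to the buffer reader.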
2018-07-03 | nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdir | Scott Mayhew
commit 9c2ece6ef67e9d376f32823086169b489c422ed0 upstream. nfsd4_readdir_rsize restricts rd_maxcount to svc_max_payload when estimating the size of the readdir reply, but nfsd_encode_readdir restricts it to INT_MAX when encoding the reply. This can result in log messages like "kernel: RPC request reserved 32896 but used 1049444". Restrict rd_dircount similarly (no reason it should be larger than svc_max_payload). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-19 | nfsd: fix incorrect umasks | J. Bruce Fields
commit 880a3a5325489a143269a8e172e7563ebf9897bc upstream. We're neglecting to clear the umask after it's set, which can cause a later unrelated rpc to (incorrectly) use the same umask if it happens to be processed by the same thread. There's a more subtle problem here too: An NFSv4 compound request is decoded all in one pass before any operations are executed. Currently we're setting current->fs->umask at the time we decode the compound. In theory a single compound could contain multiple creates each setting a umask. In that case we'd end up using whichever umask was passed in the *last* operation as the umask for all the creates, whether that was correct or not. So, we should just be saving the umask at decode time and waiting to set it until we actually process the corresponding operation. In practice it's unlikely any client would do multiple creates in a single compound. And even if it did they'd likely be from the same process (hence carry the same umask). So this is a little academic, but we should get it right anyway. Fixes: 47057abde515 (nfsd: add support for the umask attribute) Cc: stable@vger.kernel.org Reported-by: Lucash Stach <l.stach@pengutronix.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28 | nfsd: remove blocked locks on client teardown | Jeff Layton
commit 68ef3bc3166468678d5e1fdd216628c35bd1186f upstream. We had some reports of panics in nfsd4_lm_notify, and that showed a nfs4_lockowner that had outlived its so_client. Ensure that we walk any leftover lockowners after tearing down all of the stateids, and remove any blocked locks that they hold. With this change, we also don't need to walk the nbl_lru on nfsd_net shutdown, as that will happen naturally when we tear down the clients. Fixes: 76d348fadff5 (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks) Reported-by: Frank Sorenson <fsorenso@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 4.9 Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24 | nfsd4: permit layoutget of executable-only files | Benjamin Coddington
[ Upstream commit 66282ec1cf004c09083c29cb5e49019037937bbd ] Clients must be able to read a file in order to execute it, and for pNFS that means the client needs to be able to perform a LAYOUTGET on the file. This behavior for executable-only files was added for OPEN in commit a043226bc140 "nfsd4: permit read opens of executable-only files". This fixes up xfstests generic/126 on block/scsi layouts. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-03 | lockd: fix "list_add double add" caused by legacy signal interface | Vasily Averin
[ Upstream commit 81833de1a46edce9ca20cfe079872ac1c20ef359 ]

restart_grace() uses hardcoded init_net. It can cause "list_add double add" in the following scenario:

1) nfsd and lockd were started in several net namespaces
2) nfsd in init_net was stopped (lockd was not stopped because it has users from other net namespaces)
3) lockd got a signal, called restart_grace() -> set_grace_period() and enabled lock_manager in hardcoded init_net.
4) nfsd in init_net is started again, its lockd_up() calls set_grace_period() and tries to add lock_manager into init_net a 2nd time.

Jeff Layton suggests:

"Make it safe to call locks_start_grace multiple times on the same lock_manager. If it's already on the global grace_list, then don't try to add it again. (But we don't intentionally add twice, so for now we WARN about that case.)

With this change, we also need to ensure that the nfsd4 lock manager initializes the list before we call locks_start_grace. While we're at it, move the rest of the nfsd_net initialization into nfs4_state_create_net. I see no reason to have it spread over two functions like it is today."

The suggested patch was updated to generate a warning in the described situation.

Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
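The "don't add it again if it is already on the list" idea relies on the node having been initialized to point at itself, so that an empty check is meaningful. A minimal user-space sketch with a hand-rolled circular list follows; the `list_node` type and all function names are hypothetical and only mimic the shape of the kernel's list helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node, modelled on the kernel's shape. */
struct list_node {
    struct list_node *next, *prev;
};

/* A node that points at itself is "not on any list". */
static void list_node_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

static bool list_node_unlinked(const struct list_node *n)
{
    return n->next == n;
}

static void list_node_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Add `lm` to `grace_list` unless it is already linked somewhere.
 * Returns true if it was added, false if the call was a repeat.
 * `lm` must have been initialized with list_node_init() beforehand.
 */
static bool start_grace_once(struct list_node *lm, struct list_node *grace_list)
{
    assert(lm && grace_list);
    if (!list_node_unlinked(lm))
        return false; /* already queued; a real implementation might warn here */
    list_node_add(lm, grace_list);
    return true;
}
```

The requirement to initialize the node first corresponds to the remark in the quoted suggestion that the lock manager's list must be initialized before the first start call.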
2018-02-03 | nfsd: check for use of the closed special stateid | Andrew Elble
[ Upstream commit ae254dac721d44c0bfebe2795df87459e2e88219 ] Prevent the use of the closed (invalid) special stateid by clients. Signed-off-by: Andrew Elble <aweits@rit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-03 | nfsd: Ensure we check stateid validity in the seqid operation checks | Trond Myklebust
[ Upstream commit 9271d7e509c1bfc0b9a418caec29ec8d1ac38270 ] After taking the stateid st_mutex, we want to know that the stateid still represents valid state before performing any non-idempotent actions. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-03 | nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0) | Trond Myklebust
[ Upstream commit fb500a7cfee7f2f447d2bbf30cb59629feab6ac1 ] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-31 | nfsd: auth: Fix gid sorting when rootsquash enabled | Ben Hutchings
commit 1995266727fa8143897e89b55f5d3c79aa828420 upstream. Commit bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility group_info allocators") appears to break nfsd rootsquash in a pretty major way. It adds a call to groups_sort() inside the loop that copies/squashes gids, which means the valid gids are sorted along with the following garbage. The net result is that the highest numbered valid gids are replaced with any lower-valued garbage gids, possibly including 0. We should sort only once, after filling in all the gids. Fixes: bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Wolfgang Walter <linux@stwm.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
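The ordering problem described here, sorting while the array is still being filled versus sorting once at the end, can be shown with a plain array. The sketch uses `unsigned int` gids, `qsort()`, and a hypothetical `squash_gids()` helper; it is not the nfsd code and does not use the kernel's group_info API.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison of two gids for qsort(). */
static int cmp_gid(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}

/*
 * Copy `n` gids from src to dst, replacing gid 0 with anon_gid, then sort.
 * The sort runs exactly once, after every slot of dst has been written.
 * Sorting inside the loop would also sort the not-yet-filled tail of dst,
 * so leftover values there could displace valid entries.
 */
static void squash_gids(const unsigned int *src, unsigned int *dst,
                        size_t n, unsigned int anon_gid)
{
    assert(src && dst);
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] == 0) ? anon_gid : src[i];
    qsort(dst, n, sizeof(dst[0]), cmp_gid);
}
```

For an input of `{0, 30, 10, 20}` and an assumed anonymous gid of 65534, the destination ends up as `{10, 20, 30, 65534}` regardless of what it contained before the call.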
2017-12-20 | kernel: make groups_sort calling a responsibility group_info allocators | Thiago Rafael Becker
commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.

In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 | nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat | Naofumi Honda
commit 64ebe12494fd5d193f014ce38e1fd83cc57883c8 upstream.

From kernel 4.9, my two nfsv4 servers sometimes suffer from "panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat().

These panics disappear if we revert the commit "nfsd: add a LRU list for blocked locks".

The cause appears to be a typo in nfs4_laundromat(), which is also present in nfs4_state_shutdown_net().

Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks"
Cc: jlayton@redhat.com
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 | nfsd: Fix another OPEN stateid race | Trond Myklebust
commit d8a1a000555ecd1b824ac1ed6df8fe364dfbbbb0 upstream. If nfsd4_process_open2() is initialising a new stateid, and yet the call to nfs4_get_vfs_file() fails for some reason, then we must declare the stateid closed, and unhash it before dropping the mutex. Right now, we unhash the stateid after dropping the mutex, and without changing the stateid type, meaning that another OPEN could theoretically look it up and attempt to use it. Reported-by: Andrew W Elble <aweits@rit.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 | nfsd: Fix stateid races between OPEN and CLOSE | Trond Myklebust
commit 15ca08d3299682dc49bad73251677b2c5017ef08 upstream. Open file stateids can linger on the nfs4_file list of stateids even after they have been closed. In order to avoid reusing such a stateid, and confusing the client, we need to recheck the nfs4_stid's type after taking the mutex. Otherwise, we risk reusing an old stateid that was already closed, which will confuse clients that expect new stateids to conform to RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 | nfsd: deal with revoked delegations appropriately | Andrew Elble
commit 95da1b3a5aded124dd1bda1e3cdb876184813140 upstream.

If a delegation has been revoked by the server, operations using that delegation should error out with NFS4ERR_DELEG_REVOKED in the >= 4.1 case, and NFS4ERR_BAD_STATEID otherwise.

The server needs NFSv4.1 clients to explicitly free revoked delegations. If the server returns NFS4ERR_DELEG_REVOKED, the client will do that; otherwise it may just forget about the delegation and be unable to recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a SEQUENCE reply. That can cause the Linux 4.1 client to loop in its state manager.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. 
- Make and config files were included as candidates if they contained >5 lines of source
- File already had some variant of a license header in it (even if <5 lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license identifiers to apply.

- when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.

  For non */uapi/* files that summary was:

  SPDX license identifier                             # files
  ---------------------------------------------------|-------
  GPL-2.0                                               11139

  and resulted in the first patch in this series.

  If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

  SPDX license identifier                             # files
  ---------------------------------------------------|-------
  GPL-2.0 WITH Linux-syscall-note                         930

  and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:

  SPDX license identifier                             # files
  ---------------------------------------------------|-------
  GPL-2.0 WITH Linux-syscall-note                         270
  GPL-2.0+ WITH Linux-syscall-note                        169
  ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
  ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
  LGPL-2.1+ WITH Linux-syscall-note                        15
  GPL-1.0+ WITH Linux-syscall-note                         14
  ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
  LGPL-2.0+ WITH Linux-syscall-note                         4
  LGPL-2.1 WITH Linux-syscall-note                          3
  ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
  ((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1

  and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became the concluded license(s).
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there were any new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with the SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In the initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. 
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05nfsd4: define nfsd4_secinfo_no_name_release()Eryu Guan
Commit 34b1744c91cc ("nfsd4: define ->op_release for compound ops") defined a couple of ->op_release functions and runs them if necessary. The problem is that it reused nfsd4_secinfo_release() as the op_release of OP_SECINFO_NO_NAME, which caused a leak on struct nfsd4_secinfo_no_name in nfsd4_encode_secinfo_no_name(), because there's no .si_exp field in struct nfsd4_secinfo_no_name. I found this because I was unable to umount an ext4 partition after exporting it via NFS and running fsstress on the nfs mount. A simplified reproducer would be:

    # mount a local-fs device at /mnt/test, and export it via NFS with
    # fsid=0 export option (this is required)
    mount /dev/sda5 /mnt/test
    echo "/mnt/test *(rw,no_root_squash,fsid=0)" >> /etc/exports
    service nfs restart

    # locally mount the nfs export with all default, note that I have
    # nfsv4.1 configured as the default nfs version, because of the
    # fsid export option, v4 mount would fail and fall back to v3
    mount localhost:/mnt/test /mnt/nfs

    # try to umount the underlying device, but got EBUSY
    umount /mnt/nfs
    service nfs stop
    umount /mnt/test <=== EBUSY here

Fixed it by defining a separate nfsd4_secinfo_no_name_release() function as the op_release method of OP_SECINFO_NO_NAME that releases the correct nfsd4_secinfo_no_name structure. Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops") Signed-off-by: Eryu Guan <eguan@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
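The shape of the fix can be illustrated with a small userspace sketch. Everything here is a simplified stand-in and an assumption made for the example: struct export, export_put(), the reduced argument structs and the op_release table are not the kernel's definitions. The point is only that each op needs a release function that reads the field its own argument struct actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a reference-counted export; not the kernel's struct. */
struct export { int refcount; };

static void export_put(struct export *exp)
{
    if (!exp)
        return;
    assert(exp->refcount > 0);
    exp->refcount--;
}

/* Simplified argument structs: each op keeps its export in its own field. */
struct secinfo_args         { struct export *si_exp; };
struct secinfo_no_name_args { int sin_style; struct export *sin_exp; };

/* Both ops share one argument union, as compound ops do. */
union op_args {
    struct secinfo_args         secinfo;
    struct secinfo_no_name_args secinfo_no_name;
};

static void secinfo_release(union op_args *u)
{
    export_put(u->secinfo.si_exp);
}

/* Separate release: reads sin_exp, which secinfo_release() never looks at. */
static void secinfo_no_name_release(union op_args *u)
{
    export_put(u->secinfo_no_name.sin_exp);
}

typedef void (*op_release_fn)(union op_args *);

enum { OP_SECINFO, OP_SECINFO_NO_NAME, OP_COUNT };

/* Hypothetical per-op release table. */
static const op_release_fn op_release[OP_COUNT] = {
    [OP_SECINFO]         = secinfo_release,
    [OP_SECINFO_NO_NAME] = secinfo_no_name_release,
};

/* Returns the export's refcount after the op's release hook has run. */
int release_demo(void)
{
    struct export exp = { .refcount = 1 };
    union op_args u = { .secinfo_no_name = { .sin_style = 0, .sin_exp = &exp } };

    op_release[OP_SECINFO_NO_NAME](&u);
    return exp.refcount;
}
```

With the right hook installed the reference is dropped, which in the reported scenario is what lets the underlying filesystem be unmounted.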
2017-09-09Merge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "More RDMA work and some op-structure constification from Chuck Lever, and a small cleanup to our xdr encoding" * tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux: svcrdma: Estimate Send Queue depth properly rdma core: Add rdma_rw_mr_payload() svcrdma: Limit RQ depth svcrdma: Populate tail iovec when receiving nfsd: Incoming xdr_bufs may have content in tail buffer svcrdma: Clean up svc_rdma_build_read_chunk() sunrpc: Const-ify struct sv_serv_ops nfsd: Const-ify NFSv4 encoding and decoding ops arrays sunrpc: Const-ify instances of struct svc_xprt_ops nfsd4: individual encoders no longer see error cases nfsd4: skip encoder in trivial error cases nfsd4: define ->op_release for compound ops nfsd4: opdesc will be useful outside nfs4proc.c nfsd4: move some nfsd4 op definitions to xdr4.h
2017-09-05nfsd: Incoming xdr_bufs may have content in tail bufferChuck Lever
Since the beginning, svcsock has built a received RPC Call message by populating the xdr_buf's head, then placing the remaining message bytes in the xdr_buf's page list. The xdr_buf's tail is never populated. This means that an NFSv4 COMPOUND containing an NFS WRITE operation plus trailing operations has a page list that contains the WRITE data payload followed by the trailing operations. NFSv4 XDR decoders will not look in the xdr_buf's tail, ever, because svcsock never put anything there. To support transports that can pass the write payload in the xdr_buf's pagelist and trailing content in the xdr_buf's tail, introduce logic in READ_BUF that switches to the xdr_buf's tail vec when the decoder runs out of content in rq_arg.pages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
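A rough userspace sketch of the idea follows: a decode cursor that moves on to the tail once the page data is used up. The struct cursor layout and the read_buf() helper are assumptions made for illustration and do not reproduce the READ_BUF macro; in particular the sketch does not handle an item that straddles the two segments.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified decode cursor; field names are illustrative only. */
struct cursor {
    const unsigned char *p, *end;   /* segment currently being decoded */
    const unsigned char *tail;      /* tail buffer, NULL once switched to */
    size_t tail_len;
};

/*
 * Return a pointer to nbytes of contiguous input, or NULL if the request
 * cannot be met. When the current segment is exhausted, continue in the tail.
 */
static const unsigned char *read_buf(struct cursor *c, size_t nbytes)
{
    const unsigned char *ret;

    assert(c->p <= c->end);
    if (c->p == c->end && c->tail) {
        c->p = c->tail;
        c->end = c->tail + c->tail_len;
        c->tail = NULL;
    }
    if ((size_t)(c->end - c->p) < nbytes)
        return NULL;
    ret = c->p;
    c->p += nbytes;
    return ret;
}

/* Usage: payload sits in the "pages", a trailing operation in the tail. */
int read_buf_demo(void)
{
    static const unsigned char pages[4] = { 1, 2, 3, 4 };
    static const unsigned char tail[2]  = { 9, 8 };
    struct cursor c = { pages, pages + 4, tail, sizeof(tail) };
    const unsigned char *p;

    if (!read_buf(&c, 4))       /* consume the payload */
        return -1;
    p = read_buf(&c, 2);        /* pages exhausted: served from the tail */
    return p ? p[0] : -1;
}
```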
2017-09-05merge nfsd 4.13 bugfixes into nfsd for-4.14 branchJ. Bruce Fields
2017-08-31annotate RWF_... flagsChristoph Hellwig
[AV: added missing annotations in syscalls.h/compat.h] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-08-24sunrpc: Const-ify struct sv_serv_opsChuck Lever
Close an attack vector by moving the arrays of per-server methods to read-only memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24nfsd: Const-ify NFSv4 encoding and decoding ops arraysChuck Lever
Close an attack vector by moving the arrays of encoding and decoding methods to read-only memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24nfsd4: individual encoders no longer see error casesJ. Bruce Fields
With a few exceptions, most individual encoders don't handle error cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24nfsd4: skip encoder in trivial error casesJ. Bruce Fields
Most encoders do nothing in the error case. But they can still screw things up in that case: most errors happen very early in rpc processing, possibly before argument fields are filled in and bounds-tested, so encoders that do anything other than immediately bail on error can easily crash in odd error cases. So just handle errors centrally most of the time to remove the chance of error. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
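As a minimal sketch of the central check, assuming a hypothetical struct op and encoder signature rather than nfsd's real types: the dispatcher tests the status once, so individual encoders never see arguments that may not have been decoded or validated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-operation state. */
struct op {
    int status;     /* 0 on success, nonzero protocol error otherwise */
    int arg;        /* only meaningful when status == 0 */
    int encoded;    /* written by the encoder */
};

typedef void (*encoder_fn)(struct op *);

/* An encoder may now assume its arguments were decoded and validated. */
static void encode_result(struct op *op)
{
    assert(op->status == 0);
    op->encoded = op->arg * 2;
}

/* Central dispatch: on error the encoder is skipped entirely. */
static void encode_operation(struct op *op, encoder_fn enc)
{
    if (op->status != 0)
        return;
    enc(op);
}

/* Returns what the encoder wrote, or -1 if it was never called. */
int encode_demo(int status)
{
    struct op op = { .status = status, .arg = 21, .encoded = -1 };

    encode_operation(&op, encode_result);
    return op.encoded;
}
```

The commit message says errors are handled centrally "most of the time", so a real dispatcher would need a way to let selected encoders run on error; that part is omitted here.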
2017-08-24nfsd4: define ->op_release for compound opsJ. Bruce Fields
Run a separate ->op_release function if necessary instead of depending on the xdr encoder to do this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24nfsd4: opdesc will be useful outside nfs4proc.cJ. Bruce Fields
Trivial cleanup, no change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-24nfsd: Limit end of page list when decoding NFSv4 WRITEChuck Lever
When processing an NFSv4 WRITE operation, argp->end should never point past the end of the data in the final page of the page list. Otherwise, nfsd4_decode_compound can walk into uninitialized memory. More critical, nfsd4_decode_write is failing to increment argp->pagelen when it increments argp->pagelist. This can cause later xdr decoders to assume more data is available than really is, which can cause server crashes on malformed requests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-08-01nfsd4: move some nfsd4 op definitions to xdr4.hJ. Bruce Fields
I want code in nfs4xdr.c to have access to this stuff. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-17nfsd: Fix a memory scribble in the callback channelTrond Myklebust
The offset of the entry in struct rpc_version has to match the version number. Reported-by: Dave Jones <davej@codemonkey.org.uk> Fixes: 1c5876ddbdb4 ("sunrpc: move p_count out of struct rpc_procinfo") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
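The invariant can be illustrated with a version table whose slots are selected by designated initializers, so that slot i always describes version i. The names struct version, cb_versions and lookup_version() are hypothetical and chosen for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced version descriptor. */
struct version { unsigned int number; };

static const struct version cb_version1 = { .number = 1 };

/* Slot i holds version i; slot 0 stays NULL because no version 0 exists. */
static const struct version *const cb_versions[] = {
    [1] = &cb_version1,
};

#define NR_VERSIONS (sizeof(cb_versions) / sizeof(cb_versions[0]))

/* Look a version up by number, using the number directly as the index. */
static const struct version *lookup_version(unsigned int vers)
{
    const struct version *v;

    if (vers >= NR_VERSIONS)
        return NULL;
    v = cb_versions[vers];
    /* The invariant from the commit message: index matches version number. */
    assert(v == NULL || v->number == vers);
    return v;
}

/* Returns 1 when lookups by number find exactly the expected entries. */
int version_lookup_demo(void)
{
    return lookup_version(1) == &cb_version1 &&
           lookup_version(0) == NULL &&
           lookup_version(2) == NULL;
}
```

If the entry were stored at index 0 instead, code that indexes by version number would read or write the wrong slot.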
2017-07-13Merge tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Chuck's RDMA update overhauls the "call receive" side of the RPC-over-RDMA transport to use the new rdma_rw API. Christoph cleaned the way nfs operations are declared, removing a bunch of function-pointer casts and declaring the operation vectors as const. Christoph's changes touch both client and server, and both client and server pulls this time around should be based on the same commits from Christoph" * tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux: (53 commits) svcrdma: fix an incorrect check on -E2BIG and -EINVAL nfsd4: factor ctime into change attribute svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field svcrdma: use offset_in_page() macro svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API svcrdma: Clean-up svc_rdma_unmap_dma svcrdma: Remove frmr cache svcrdma: Remove unused Read completion handlers svcrdma: Properly compute .len and .buflen for received RPC Calls svcrdma: Use generic RDMA R/W API in RPC Call path svcrdma: Add recvfrom helpers to svc_rdma_rw.c sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst svcrdma: Don't account for Receive queue "starvation" svcrdma: Improve Reply chunk sanity checking svcrdma: Improve Write chunk sanity checking svcrdma: Improve Read chunk sanity checking svcrdma: Remove svc_rdma_marshal.c svcrdma: Avoid Send Queue overflow svcrdma: Squelch disconnection messages sunrpc: Disable splice for krb5i ...
2017-07-12nfsd4: factor ctime into change attributeJ. Bruce Fields
Factoring ctime into the nfsv4 change attribute gives us better properties than just i_version alone. Eventually we'll likely also expose this (as opposed to raw i_version) to userspace, at which point we'll want to move it to a common helper, called from either userspace or individual filesystems. For now, nfsd is the only user. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
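The commit message does not spell out the formula, so the following is only one plausible way to fold ctime into a 64-bit change attribute. The struct inode_times type and the 30-bit shift (enough room for a nanosecond count below 1,000,000,000) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode fields involved. */
struct inode_times {
    int64_t  ctime_sec;
    int32_t  ctime_nsec;    /* 0 .. 999,999,999, which fits in 30 bits */
    uint64_t i_version;
};

/* Combine ctime and i_version into a single 64-bit value. */
static uint64_t change_attribute(const struct inode_times *t)
{
    uint64_t chattr;

    assert(t->ctime_nsec >= 0 && t->ctime_nsec < 1000000000);
    chattr = (uint64_t)t->ctime_sec;
    chattr <<= 30;                      /* make room for the nanoseconds */
    chattr += (uint64_t)t->ctime_nsec;
    chattr += t->i_version;
    return chattr;
}

/* Returns 1 if a ctime-only update changes the attribute. */
int change_attribute_demo(void)
{
    struct inode_times a = { 1000, 5, 7 };
    struct inode_times b = a;

    b.ctime_nsec = 6;       /* ctime moved, i_version did not */
    return change_attribute(&a) != change_attribute(&b);
}
```

Under this scheme a change to either ctime or i_version is visible to clients, which is the kind of property the message alludes to.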
2017-07-05Merge branch 'work.read_write' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull read/write updates from Al Viro: "Christoph's fs/read_write.c series - consolidation and cleanups" * 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: remove nfsd_vfs_read nfsd: use vfs_iter_read/write fs: implement vfs_iter_write using do_iter_write fs: implement vfs_iter_read using do_iter_read fs: move more code into do_iter_read/do_iter_write fs: remove __do_readv_writev fs: remove do_compat_readv_writev fs: remove do_readv_writev
2017-07-05Merge branch 'work.misc-set_fs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc user access cleanups from Al Viro: "The first pile is assorted getting rid of cargo-culted access_ok(), cargo-culted set_fs() and field-by-field copyouts. The same description applies to a lot of stuff in other branches - this is just the stuff that didn't fit into a more specific topical branch" * 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Switch flock copyin/copyout primitives to copy_{from,to}_user() fs/fcntl: return -ESRCH in f_setown when pid/pgid can't be found fs/fcntl: f_setown, avoid undefined behaviour fs/fcntl: f_setown, allow returning error lpfc debugfs: get rid of pointless access_ok() adb: get rid of pointless access_ok() isdn: get rid of pointless access_ok() compat statfs: switch to copy_to_user() fs/locks: don't mess with the address limit in compat_fcntl64 nfsd_readlink(): switch to vfs_get_link() drbd: ->sendpage() never needed set_fs() fs/locks: pass kernel struct flock to fcntl_getlk/setlk fs: locks: Fix some troubles at kernel-doc comments
2017-06-29nfsd: remove nfsd_vfs_readChristoph Hellwig
Simpler done in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-29nfsd: use vfs_iter_read/writeChristoph Hellwig
Instead of messing with the address limit to use vfs_read/vfs_writev. Note that this requires that exported files implement ->read_iter and ->write_iter. All currently exportable file systems do this. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>