summaryrefslogtreecommitdiffstats
path: root/fs/nfsd/nfs4state.c
AgeCommit message (Collapse)Author
2020-10-01nfsd: Don't add locks to closed or closing open stateidsTrond Myklebust
[ Upstream commit a451b12311aa8c96c6f6e01c783a86995dc3ec6b ] In NFSv4, the lock stateids are tied to the lockowner, and the open stateid, so that the action of closing the file also results in either an automatic loss of the locks, or an error of the form NFS4ERR_LOCKS_HELD. In practice this means we must not add new locks to the open stateid after the close process has been invoked. In fact doing so, can result in the following panic: kernel BUG at lib/list_debug.c:51! invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 PID: 1085 Comm: nfsd Not tainted 5.6.0-rc3+ #2 Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.14410784.B64.1908150010 08/15/2019 RIP: 0010:__list_del_entry_valid.cold+0x31/0x55 Code: 1a 3d 9b e8 74 10 c2 ff 0f 0b 48 c7 c7 f0 1a 3d 9b e8 66 10 c2 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 b0 1a 3d 9b e8 52 10 c2 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 78 1a 3d 9b e8 3e 10 c2 ff 0f 0b RSP: 0018:ffffb296c1d47d90 EFLAGS: 00010246 RAX: 0000000000000054 RBX: ffff8ba032456ec8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8ba039e99cc8 RDI: ffff8ba039e99cc8 RBP: ffff8ba032456e60 R08: 0000000000000781 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ba009a4abe0 R13: ffff8ba032456e8c R14: 0000000000000000 R15: ffff8ba00adb01d8 FS: 0000000000000000(0000) GS:ffff8ba039e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb213f0b008 CR3: 00000001347de006 CR4: 00000000003606e0 Call Trace: release_lock_stateid+0x2b/0x80 [nfsd] nfsd4_free_stateid+0x1e9/0x210 [nfsd] nfsd4_proc_compound+0x414/0x700 [nfsd] ? nfs4svc_decode_compoundargs+0x407/0x4c0 [nfsd] nfsd_dispatch+0xc1/0x200 [nfsd] svc_process_common+0x476/0x6f0 [sunrpc] ? svc_sock_secure_port+0x12/0x30 [sunrpc] ? svc_recv+0x313/0x9c0 [sunrpc] ? nfsd_svc+0x2d0/0x2d0 [nfsd] svc_process+0xd4/0x110 [sunrpc] nfsd+0xe3/0x140 [nfsd] kthread+0xf9/0x130 ? nfsd_destroy+0x50/0x50 [nfsd] ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x40 The fix is to ensure that lock creation tests for whether or not the open stateid is unhashed, and to fail if that is the case. Fixes: 659aefb68eca ("nfsd: Ensure we don't recognise lock stateids after freeing them") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29nfsd4: fix NULL dereference in nfsd/clients display codeJ. Bruce Fields
[ Upstream commit 9affa435817711861d774f5626c393c80f16d044 ] We hold the cl_lock here, and that's enough to keep stateid's from going away, but it's not enough to prevent the files they point to from going away. Take fi_lock and a reference and check for NULL, as we do in other code. Reported-by: NeilBrown <neilb@suse.de> Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-09nfsd4: fix nfsdfs reference count loopJ. Bruce Fields
[ Upstream commit 681370f4b00af0fcc65bbfb9f82de526ab7ceb0a ] We don't drop the reference on the nfsdfs filesystem with mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called until the nfsd module's unloaded, and we can't unload the module as long as there's a reference on nfsdfs. So this prevents module unloading. Fixes: 2c830dd7209b ("nfsd: persist nfsd filesystem across mounts") Reported-and-Tested-by: Luo Xiaogang <lxgrxd@163.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-02nfsd: memory corruption in nfsd4_lock()Vasily Averin
commit e1e8399eee72e9d5246d4d1bcacd793debe34dd3 upstream. New struct nfsd4_blocked_lock allocated in find_or_allocate_block() does not initialized nbl_list and nbl_lru. If conflock allocation fails rollback can call list_del_init() access uninitialized fields and corrupt memory. v2: just initialize nbl_list and nbl_lru right after nbl allocation. Fixes: 76d348fadff5 ("nfsd: have nfsd4_lock use blocking locks for v4.1+ lock") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11nfsd: fix jiffies/time_t mixup in LRU listArnd Bergmann
commit 9594497f2c78993cb66b696122f7c65528ace985 upstream. The nfsd4_blocked_lock->nbl_time timestamp is recorded in jiffies, but then compared to a CLOCK_REALTIME timestamp later on, which makes no sense. For consistency with the other timestamps, change this to use a time_t. This is a change in behavior, which may cause regressions, but the current code is not sensible. On a system with CONFIG_HZ=1000, the 'time_after((unsigned long)nbl->nbl_time, (unsigned long)cutoff))' check is false for roughly the first 18 days of uptime and then true for the next 49 days. Fixes: 7919d0a27f1e ("nfsd: add a LRU list for blocked locks") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09nfsd4: fix up replay_matches_cache()Scott Mayhew
commit 6e73e92b155c868ff7fce9d108839668caf1d9be upstream. When running an nfs stress test, I see quite a few cached replies that don't match up with the actual request. The first comment in replay_matches_cache() makes sense, but the code doesn't seem to match... fix it. This isn't exactly a bugfix, as the server isn't required to catch every case of a false retry. So, we may as well do this, but if this is fixing a problem then that suggests there's a client bug. Fixes: 53da6a53e1d4 ("nfsd4: catch some false session retries") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-20nfsd: degraded slot-count more gracefully as allocation nears exhaustion.NeilBrown
This original code in nfsd4_get_drc_mem() would hand out 30 slots (approximately NFSD_MAX_MEM_PER_SESSION bytes at slightly over 2K per slot) to each requesting client until it ran out of space, then it would possibly give one last client a reduced allocation, then fail the allocation. Since commit de766e570413 ("nfsd: give out fewer session slots as limit approaches") the last 90 slots to be given to about 12 clients with quickly reducing slot counts (better than just 3 clients). This still seems unnecessarily hasty. A subsequent patch allows over-allocation so every client gets at least one slot, but that might be a bit restrictive. The requested number of nfsd threads is the best guide we have to the expected number of clients, so use that - if it is at least 8. 256 threads on a 256Meg machine - which is a lot for a tiny machine - would result in nfsd_drc_max_mem being 2Meg, so 8K (3 slots) would be available for the first client, and over 200 clients would get more than 1 slot. So I don't think this change will be too debilitating on poorly configured machines, though it does mean that a sensible configuration is a little more important. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-09-20nfsd: handle drc over-allocation gracefully.NeilBrown
Currently, if there are more clients than allowed for by the space allocation in set_max_drc(), we fail a SESSION_CREATE request with NFS4ERR_DELAY. This means that the client retries indefinitely, which isn't a user-friendly response. The RFC requires NFS4ERR_NOSPC, but that would at best result in a clean failure on the client, which is not much more friendly. The current space allocation is a best-guess and doesn't provide any guarantees, we could still run out of space when trying to allocate drc space. So fail more gracefully - always give out at least one slot. If all clients used all the space in all slots, we might start getting memory pressure, but that is possible anyway. So ensure 'num' is always at least 1, and remove the test for it being zero. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-09-10nfsd: add support for upcall version 2Scott Mayhew
Version 2 upcalls will allow the nfsd to include a hash of the kerberos principal string in the Cld_Create upcall. If a principal is present in the svc_cred, then the hash will be included in the Cld_Create upcall. We attempt to use the svc_cred.cr_raw_principal (which is returned by gssproxy) first, and then fall back to using the svc_cred.cr_principal (which is returned by both gssproxy and rpc.svcgssd). Upon a subsequent restart, the hash will be returned in the Cld_Gracestart downcall and stored in the reclaim_str_hashtbl so it can be used when handling reclaim opens. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19nfsd: have nfsd_test_lock use the nfsd_file cacheJeff Layton
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cacheJeff Layton
Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp. Since we now presume that the struct file will be persistent in most cases, we can stop fiddling with the raparms in the read code. This also means that we don't really care about the rd_tmp_file field anymore. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19nfsd: convert fi_deleg_file and ls_file fields to nfsd_fileJeff Layton
Have them keep an nfsd_file reference instead of a struct file. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19nfsd: convert nfs4_file->fi_fds array to use nfsd_filesJeff Layton
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-09nfsd: Make two functions staticYueHaibing
Fix sparse warnings: fs/nfsd/nfs4state.c:1908:6: warning: symbol 'drop_client' was not declared. Should it be static? fs/nfsd/nfs4state.c:2518:6: warning: symbol 'force_expire_client' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: decode implementation idJ. Bruce Fields
Decode the implementation ID and display in nfsd/clients/#/info. It may be help identify the client. It won't be used otherwise. (When this went into the protocol, I thought the implementation ID would be a slippery slope towards implementation-specific workarounds as with the http user-agent. But I guess I was wrong, the risk seems pretty low now.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: create xdr_netobj_dup helperJ. Bruce Fields
Move some repeated code to a common helper. No change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: allow forced expiration of NFSv4 clientsJ. Bruce Fields
NFSv4 clients are automatically expired and all their locks removed if they don't contact the server for a certain amount of time (the lease period, 90 seconds by default). There can still be situations where that's not enough, so allow userspace to force expiry by writing "expire\n" to the new nfsd/client/#/ctl file. (The generic "ctl" name is because I expect we may want to allow other operations on clients in the future.) The write will not return until the client is expired and all of its locks and other state removed. The fault injection code also provides a way of expiring clients, but it fails if there are any in-progress RPC's referencing the client. Also, its method of selecting a client to expire is a little more primitive--it uses an IP address, which can't always uniquely specify an NFSv4 client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: create get_nfsdfs_clp helperJ. Bruce Fields
Factor our some common code. No change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: show layout stateidsJ. Bruce Fields
These are also minimal for now, I'm not sure what information would be useful. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: show lock and deleg stateidsJ. Bruce Fields
These entries are pretty minimal for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: add file to display list of client's opensJ. Bruce Fields
Add a nfsd/clients/#/opens file to list some information about all the opens held by the given client, including open modes, device numbers, inode numbers, and open owners. Open owners are totally opaque but seem to sometimes have some useful ascii strings included, so passing through printable ascii characters and escaping the rest seems useful while still being machine-readable. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: add more information to client info fileJ. Bruce Fields
Add ip address, full client-provided identifier, and minor version. There's much more that could possibly be useful but this is a start. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: copy client's address including port number to cl_addrJ. Bruce Fields
rpc_copy_addr() copies only the IP address and misses any port numbers. It seems potentially useful to keep the port number around too. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: add a client info fileJ. Bruce Fields
Add a new nfsd/clients/#/info file with some basic information about each NFSv4 client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: make client/ directory names small intsJ. Bruce Fields
We want clientid's on the wire to be randomized for reasons explained in ebd7c72c63ac "nfsd: randomize SETCLIENTID reply to help distinguish servers". But I'd rather have mostly small integers for the clients/ directory. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: add nfsd/clients directoryJ. Bruce Fields
I plan to expose some information about nfsv4 clients here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd4: use reference count to free clientJ. Bruce Fields
Keep a second reference count which is what is really used to decide when to free the client's memory. Next I'm going to add an nfsd/clients/ directory with a subdirectory for each NFSv4 client. File objects under nfsd/clients/ will hold these references. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: rename cl_refcountJ. Bruce Fields
Rename this to a more descriptive name: it counts the number of in-progress rpc's referencing this client. Next I'm going to add a second refcount with a slightly different use. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03nfsd: Fix overflow causing non-working mounts on 1 TB machinesPaul Menzel
Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client. The gist of commit 10a68cdf10 is the change below. -avail = clamp_t(int, avail, slotsize, avail/3); +avail = clamp_t(int, avail, slotsize, total_avail/3); Here are the macros. #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1). `avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`. When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182. My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client. Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue. Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same. Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation) Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-05-15Merge tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "This consists mostly of nfsd container work: Scott Mayhew revived an old api that communicates with a userspace daemon to manage some on-disk state that's used to track clients across server reboots. We've been using a usermode_helper upcall for that, but it's tough to run those with the right namespaces, so a daemon is much friendlier to container use cases. Trond fixed nfsd's handling of user credentials in user namespaces. He also contributed patches that allow containers to support different sets of NFS protocol versions. The only remaining container bug I'm aware of is that the NFS reply cache is shared between all containers. If anyone's aware of other gaps in our container support, let me know. The rest of this is miscellaneous bugfixes" * tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits) nfsd: update callback done processing locks: move checks from locks_free_lock() to locks_release_private() nfsd: fh_drop_write in nfsd_unlink nfsd: allow fh_want_write to be called twice nfsd: knfsd must use the container user namespace SUNRPC: rsi_parse() should use the current user namespace SUNRPC: Fix the server AUTH_UNIX userspace mappings lockd: Pass the user cred from knfsd when starting the lockd server SUNRPC: Temporary sockets should inherit the cred from their parent SUNRPC: Cache the process user cred in the RPC server listener nfsd: Allow containers to set supported nfs versions nfsd: Add custom rpcbind callbacks for knfsd SUNRPC: Allow further customisation of RPC program registration SUNRPC: Clean up generic dispatcher code SUNRPC: Add a callback to initialise server requests SUNRPC/nfs: Fix return value for nfs4_callback_compound() nfsd: handle legacy client tracking records sent by nfsdcld nfsd: re-order client tracking method selection nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld nfsd: un-deprecate nfsdcld ...
2019-05-07Merge tag 'Wimplicit-fallthrough-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva: "Mark switch cases where we are expecting to fall through. This is part of the ongoing efforts to enable -Wimplicit-fallthrough. Most of them have been baking in linux-next for a whole development cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails going out for newly introduced code that triggers -Wimplicit-fallthrough to avoid gaining more of these cases while we work to remove the ones that are already present. We are getting close to completing this work. Currently, there are only 32 of 2311 of these cases left to be addressed in linux-next. I'm auditing every case; I take a look into the code and analyze it in order to determine if I'm dealing with an actual bug or a false positive, as explained here: https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/ While working on this, I've found and fixed the several missing break/return bugs, some of them introduced more than 5 years ago. Once this work is finished, we'll be able to universally enable "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from entering the kernel again" * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) memstick: mark expected switch fall-throughs drm/nouveau/nvkm: mark expected switch fall-throughs NFC: st21nfca: Fix fall-through warnings NFC: pn533: mark expected switch fall-throughs block: Mark expected switch fall-throughs ASN.1: mark expected switch fall-through lib/cmdline.c: mark expected switch fall-throughs lib: zstd: Mark expected switch fall-throughs scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs scsi: ppa: mark expected switch fall-through scsi: osst: mark expected switch fall-throughs scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs scsi: lpfc: lpfc_nvme: Mark expected switch fall-through scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs scsi: lpfc: lpfc_els: Mark expected switch fall-throughs scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs scsi: imm: mark expected switch fall-throughs scsi: csiostor: csio_wr: mark expected switch fall-through ...
2019-05-03nfsd: update callback done processingScott Mayhew
Instead of having the convention where individual nfsd4_callback_ops->done operations return -1 to indicate the callback path is down, move the check to nfsd4_cb_done. Only mark the callback path down on transport-level errors, not NFS-level errors. The existing logic causes the server to set SEQ4_STATUS_CB_PATH_DOWN just because the client returned an error to a CB_RECALL for a delegation that the client had already done a FREE_STATEID for. But clearly that error doesn't mean that there's anything wrong with the backchannel. Additionally, handle NFS4ERR_DELAY in nfsd4_cb_recall_done. The client returns NFS4ERR_DELAY if it is already in the process of returning the delegation. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcldScott Mayhew
When using nfsdcld for NFSv4 client tracking, track the number of RECLAIM_COMPLETE operations we receive from "known" clients to help in deciding if we can lift the grace period early (or whether we need to start a v4 grace period at all). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-24nfsd: make nfs4_client_reclaim use an xdr_netobj instead of a fixed char arrayScott Mayhew
This will allow the reclaim_str_hashtbl to store either the recovery directory names used by the legacy client tracking code or the full client strings used by the nfsdcld client tracking code. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-22nfsd: wake blocked file lock waiters before sending callbackJeff Layton
When a blocked NFS lock is "awoken" we send a callback to the server and then wake any hosts waiting on it. If a client attempts to get a lock and then drops off the net, we could end up waiting for a long time until we end up waking locks blocked on that request. So, wake any other waiting lock requests before sending the callback. Do this by calling locks_delete_block in a new "prepare" phase for CB_NOTIFY_LOCK callbacks. URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363 Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Reported-by: Slawomir Pryczek <slawek1211@gmail.com> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-22nfsd: wake waiters blocked on file_lock before deleting itJeff Layton
After a blocked nfsd file_lock request is deleted, knfsd will send a callback to the client and then free the request. Commit 16306a61d3b7 ("fs/locks: always delete_block after waiting.") changed it such that locks_delete_block is always called on a request after it is awoken, but that patch missed fixing up blocked nfsd request handling. Call locks_delete_block on the block to wake up any locks still blocked on the nfsd lock request before freeing it. Some of its callers already do this however, so just remove those calls. URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363 Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Reported-by: Slawomir Pryczek <slawek1211@gmail.com> Cc: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-04-08fs: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-02-21nfsd: fix performance-limiting session calculationJ. Bruce Fields
We're unintentionally limiting the number of slots per nfsv4.1 session to 10. Often more than 10 simultaneous RPCs are needed for the best performance. This calculation was meant to prevent any one client from using up more than a third of the limit we set for total memory use across all clients and sessions. Instead, it's limiting the client to a third of the maximum for a single session. Fix this. Reported-by: Chris Tracy <ctracy@engr.scu.edu> Cc: stable@vger.kernel.org Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-01-02Merge tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Thanks to Vasily Averin for fixing a use-after-free in the containerized NFSv4.2 client, and cleaning up some convoluted backchannel server code in the process. Otherwise, miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-4.21' of git://linux-nfs.org/~bfields/linux: (25 commits) nfs: fixed broken compilation in nfs_callback_up_net() nfs: minor typo in nfs4_callback_up_net() sunrpc: fix debug message in svc_create_xprt() sunrpc: make visible processing error in bc_svc_process() sunrpc: remove unused xpo_prep_reply_hdr callback sunrpc: remove svc_rdma_bc_class sunrpc: remove svc_tcp_bc_class sunrpc: remove unused bc_up operation from rpc_xprt_ops sunrpc: replace svc_serv->sv_bc_xprt by boolean flag sunrpc: use-after-free in svc_process_common() sunrpc: use SVC_NET() in svcauth_gss_* functions nfsd: drop useless LIST_HEAD lockd: Show pid of lockd for remote locks NFSD remove OP_CACHEME from 4.2 op_flags nfsd: Return EPERM, not EACCES, in some SETATTR cases sunrpc: fix cache_head leak due to queued request nfsd: clean up indentation, increase indentation in switch statement svcrdma: Optimize the logic that selects the R_key to invalidate nfsd: fix a warning in __cld_pipe_upcall() nfsd4: fix crash on writing v4_end_grace before nfsd startup ...
2018-12-07fs/locks: merge posix_unblock_lock() and locks_delete_block()NeilBrown
posix_unblock_lock() is not specific to posix locks, and behaves nearly identically to locks_delete_block() - the former returning a status while the later doesn't. So discard posix_unblock_lock() and use locks_delete_block() instead, after giving that function an appropriate return value. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-11-28nfsd: clean up indentation, increase indentation in switch statementColin Ian King
Trivial fix to clean up indentation, add in missing tabs. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-27nfsd4: remove unused nfs4_check_olstateid parameterJ. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29nfsd: correctly decrement odstate refcount in error pathAndrew Elble
alloc_init_deleg() both allocates an nfs4_delegation, and bumps the refcount on odstate. So after this point, we need to put_clnt_odstate() and nfs4_put_stid() to not leave the odstate refcount inappropriately bumped. Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-09-25NFSD introduce async copy featureOlga Kornievskaia
Upon receiving a request for async copy, create a new kthread. If we get asynchronous request, make sure to copy the needed arguments/state from the stack before starting the copy. Then start the thread and reply back to the client indicating copy is asynchronous. nfsd_copy_file_range() will copy in a loop over the total number of bytes is needed to copy. In case a failure happens in the middle, we ignore the error and return how much we copied so far. Once done creating a workitem for the callback workqueue and send CB_OFFLOAD with the results. The lifetime of the copy stateid is bound to the vfs copy. This way we don't need to keep the nfsd_net structure for the callback. We could keep it around longer so that an OFFLOAD_STATUS that came late would still get results, but clients should be able to deal without that. We handle OFFLOAD_CANCEL by sending a signal to the copy thread and calling kthread_stop. A client should cancel any ongoing copies before calling DESTROY_CLIENT; if not, we return a CLIENT_BUSY error. If the client is destroyed for some other reason (lease expiration, or server shutdown), we must clean up any ongoing copies ourselves. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> [colin.king@canonical.com: fix leak in error case] [bfields@fieldses.org: remove signalling, merge patches] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-22nfsd: Remove callback_credChuck Lever
Clean up: The global callback_cred is no longer used, so it can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-22sunrpc: Extract target name into svc_credChuck Lever
NFSv4.0 callback needs to know the GSS target name the client used when it established its lease. That information is available from the GSS context created by gssproxy. Make it available in each svc_cred. Note this will also give us access to the real target service principal name (which is typically "nfs", but spec does not require that). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09nfsd: fix leaked file lock with nfs exported overlayfsAmir Goldstein
nfsd and lockd call vfs_lock_file() to lock/unlock the inode returned by locks_inode(file). Many places in nfsd/lockd code use the inode returned by file_inode(file) for lock manipulation. With Overlayfs, file_inode() (the underlying inode) is not the same object as locks_inode() (the overlay inode). This can result in "Leaked POSIX lock" messages and eventually to a kernel crash as reported by Eddie Horng: https://marc.info/?l=linux-unionfs&m=153086643202072&w=2 Fix all the call sites in nfsd/lockd that should use locks_inode(). This is a correctness bug that manifested when overlayfs gained NFS export support in v4.16. Reported-by: Eddie Horng <eddiehorng.tw@gmail.com> Tested-by: Eddie Horng <eddiehorng.tw@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Fixes: 8383f1748829 ("ovl: wire up NFS export operations") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd: update obselete comment referencing the BKLJ. Bruce Fields
It's inode->i_lock that's now taken in setlease and break_lease, instead of the big kernel lock. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd4: cleanup sessionid in nfsd4_destroy_sessionJ. Bruce Fields
The name of this variable doesn't fit the type. And we only ever use one field of it. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd4: less confusing nfsd4_compound_in_sessionJ. Bruce Fields
Make the function prototype match the name a little better. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>