summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2016-10-22cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]David Howells
commit a818101d7b92e76db2f9a597e4830734767473b9 upstream. An NULL-pointer dereference happens in cachefiles_mark_object_inactive() when it tries to read i_blocks so that it can tell the cachefilesd daemon how much space it's making available. The problem is that cachefiles_drop_object() calls cachefiles_mark_object_inactive() after calling cachefiles_delete_object() because the object being marked active staves off attempts to (re-)use the file at that filename until after it has been deleted. This means that d_inode is NULL by the time we come to try to access it. To fix the problem, have the caller of cachefiles_mark_object_inactive() supply the number of blocks freed up. Without this, the following oops may occur: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles] ... CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G I ------------ 3.10.0-470.el7.x86_64 #1 Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011 Workqueue: fscache_object fscache_object_work_func [fscache] task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000 RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles] RSP: 0018:ffff8800b77c3d70 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034 RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8 RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000 R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600 R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498 FS: 0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0 ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658 ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600 Call Trace: [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles] [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache] [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache] [<ffffffff810a605b>] process_one_work+0x17b/0x470 [<ffffffff810a6e96>] worker_thread+0x126/0x410 [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460 [<ffffffff810ae64f>] kthread+0xcf/0xe0 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81695418>] ret_from_fork+0x58/0x90 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140 The oopsing code shows: callq 0xffffffff810af6a0 <wake_up_bit> mov 0xf8(%r12),%rax mov 0x30(%rax),%rax mov 0x98(%rax),%rax <---- oops here lock add %rax,0x130(%rbx) where this is: d_backing_inode(object->dentry)->i_blocks Fixes: a5b3a80b899bda0f456f1246c4c5a1191ea01519 (CacheFiles: Provide read-and-reset release counters for cachefilesd) Reported-by: Jianhong Yin <jiyin@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Steve Dickson <steved@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22vfs: move permission checking into notify_change() for utimes(NULL)Miklos Szeredi
commit f2b20f6ee842313a0d681dbbf7f87b70291a6a3b upstream. This fixes a bug where the permission was not properly checked in overlayfs. The testcase is ltp/utimensat01. It is also cleaner and safer to do the permission checking in the vfs helper instead of the caller. This patch introduces an additional ia_valid flag ATTR_TOUCH (since touch(1) is the most obvious user of utimes(NULL)) that is passed into notify_change whenever the conditions for this special permission checking mode are met. Reported-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22dlm: free workqueues after the connectionsMarcelo Ricardo Leitner
commit 3a8db79889ce16930aff19b818f5b09651bb7644 upstream. After backporting commit ee44b4bc054a ("dlm: use sctp 1-to-1 API") series to a kernel with an older workqueue which didn't use RCU yet, it was noticed that we are freeing the workqueues in dlm_lowcomms_stop() too early as free_conn() will try to access that memory for canceling the queued works if any. This issue was introduced by commit 0d737a8cfd83 as before it such attempt to cancel the queued works wasn't performed, so the issue was not present. This patch fixes it by simply inverting the free order. Fixes: 0d737a8cfd83 ("dlm: fix race while closing connections") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22ext4: unmap metadata when zeroing blocksJan Kara
commit 9b623df614576680cadeaa4d7e0b5884de8f7c17 upstream. When zeroing blocks for DAX allocations, we also have to unmap aliases in the block device mappings. Otherwise writeback can overwrite zeros with stale data from block device page cache. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22ext4: release bh in make_indexed_dirgmail
commit e81d44778d1d57bbaef9e24c4eac7c8a7a401d40 upstream. The commit 6050d47adcad: "ext4: bail out from make_indexed_dir() on first error" could end up leaking bh2 in the error path. [ Also avoid renaming bh2 to bh, which just confuses things --tytso ] Signed-off-by: yangsheng <yngsion@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22ext4: allow DAX writeback for hole punchRoss Zwisler
commit cca32b7eeb4ea24fa6596650e06279ad9130af98 upstream. Currently when doing a DAX hole punch with ext4 we fail to do a writeback. This is because the logic around filemap_write_and_wait_range() in ext4_punch_hole() only looks for dirty page cache pages in the radix tree, not for dirty DAX exceptional entries. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22ext4: fix memory leak in ext4_insert_range()Fabian Frederick
commit edf15aa180d7b98fe16bd3eda42f9dd0e60dee20 upstream. Running xfstests generic/013 with kmemleak gives the following: unreferenced object 0xffff8801d3d27de0 (size 96): comm "fsstress", pid 4941, jiffies 4294860168 (age 53.485s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff818eaaf3>] kmemleak_alloc+0x23/0x40 [<ffffffff81179805>] __kmalloc+0xf5/0x1d0 [<ffffffff8122ef5c>] ext4_find_extent+0x1ec/0x2f0 [<ffffffff8123530c>] ext4_insert_range+0x34c/0x4a0 [<ffffffff81235942>] ext4_fallocate+0x4e2/0x8b0 [<ffffffff81181334>] vfs_fallocate+0x134/0x210 [<ffffffff8118203f>] SyS_fallocate+0x3f/0x60 [<ffffffff818efa9b>] entry_SYSCALL_64_fastpath+0x13/0x8f [<ffffffffffffffff>] 0xffffffffffffffff Problem seems mitigated by dropping refs and freeing path when there's no path[depth].p_ext Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22ext4: bugfix for mmaped pages in mpage_release_unused_pages()wangguang
commit 4e800c0359d9a53e6bf0ab216954971b2515247f upstream. Pages clear buffers after ext4 delayed block allocation failed, However, it does not clean its pte_dirty flag. if the pages unmap ,in cording to the pte_dirty , unmap_page_range may try to call __set_page_dirty, which may lead to the bugon at mpage_prepare_extent_to_map:head = page_buffers(page);. This patch just call clear_page_dirty_for_io to clean pte_dirty at mpage_release_unused_pages for pages mmaped. Steps to reproduce the bug: (1) mmap a file in ext4 addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); memset(addr, 'i', 4096); (2) return EIO at ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent which causes this log message to be print: ext4_msg(sb, KERN_CRIT, "Delayed block allocation failed for " "inode %lu at logical offset %llu with" " max blocks %u with error %d", inode->i_ino, (unsigned long long)map->m_lblk, (unsigned)map->m_len, -err); (3)Unmap the addr cause warning at __set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page)); (4) wait for a minute,then bugon happen. Signed-off-by: wangguang <wangguang03@zte.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22ext4: reinforce check of i_dtime when clearing high fields of uid and gidDaeho Jeong
commit 93e3b4e6631d2a74a8cf7429138096862ff9f452 upstream. Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid of deleted and evicted inode to fix up interoperability with old kernels. However, it checks only i_dtime of an inode to determine whether the inode was deleted and evicted, and this is very risky, because i_dtime can be used for the pointer maintaining orphan inode list, too. We need to further check whether the i_dtime is being used for the orphan inode list even if the i_dtime is not NULL. We found that high 16-bit fields of uid/gid of inode are unintentionally and permanently cleared when the inode truncation is just triggered, but not finished, and the inode metadata, whose high uid/gid bits are cleared, is written on disk, and the sudden power-off follows that in order. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22ext4: enforce online defrag restriction for encrypted filesEric Whitney
commit 14fbd4aa613bd5110556c281799ce36dc6f3ba97 upstream. Online defragging of encrypted files is not currently implemented. However, the move extent ioctl can still return successfully when called. For example, this occurs when xfstest ext4/020 is run on an encrypted file system, resulting in a corrupted test file and a corresponding test failure. Until the proper functionality is implemented, fail the move extent ioctl if either the original or donor file is encrypted. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()Mike Galbraith
commit 420902c9d086848a7548c83e0a49021514bd71b7 upstream. If we hold the superblock lock while calling reiserfs_quota_on_mount(), we can deadlock our own worker - mount blocks kworker/3:2, sleeps forever more. crash> ps|grep UN 715 2 3 ffff880220734d30 UN 0.0 0 0 [kworker/3:2] 9369 9341 2 ffff88021ffb7560 UN 1.3 493404 123184 Xorg 9665 9664 3 ffff880225b92ab0 UN 0.0 47368 812 udisks-daemon 10635 10403 3 ffff880222f22c70 UN 0.0 14904 936 mount crash> bt ffff880220734d30 PID: 715 TASK: ffff880220734d30 CPU: 3 COMMAND: "kworker/3:2" #0 [ffff8802244c3c20] schedule at ffffffff8144584b #1 [ffff8802244c3cc8] __rt_mutex_slowlock at ffffffff814472b3 #2 [ffff8802244c3d28] rt_mutex_slowlock at ffffffff814473f5 #3 [ffff8802244c3dc8] reiserfs_write_lock at ffffffffa05f28fd [reiserfs] #4 [ffff8802244c3de8] flush_async_commits at ffffffffa05ec91d [reiserfs] #5 [ffff8802244c3e08] process_one_work at ffffffff81073726 #6 [ffff8802244c3e68] worker_thread at ffffffff81073eba #7 [ffff8802244c3ec8] kthread at ffffffff810782e0 #8 [ffff8802244c3f48] kernel_thread_helper at ffffffff81450064 crash> rd ffff8802244c3cc8 10 ffff8802244c3cc8: ffffffff814472b3 ffff880222f23250 .rD.....P2.".... ffff8802244c3cd8: 0000000000000000 0000000000000286 ................ ffff8802244c3ce8: ffff8802244c3d30 ffff880220734d80 0=L$.....Ms .... ffff8802244c3cf8: ffff880222e8f628 0000000000000000 (.."............ ffff8802244c3d08: 0000000000000000 0000000000000002 ................ crash> struct rt_mutex ffff880222e8f628 struct rt_mutex { wait_lock = { raw_lock = { slock = 65537 } }, wait_list = { node_list = { next = 0xffff8802244c3d48, prev = 0xffff8802244c3d48 } }, owner = 0xffff880222f22c71, save_state = 0 } crash> bt 0xffff880222f22c70 PID: 10635 TASK: ffff880222f22c70 CPU: 3 COMMAND: "mount" #0 [ffff8802216a9868] schedule at ffffffff8144584b #1 [ffff8802216a9910] schedule_timeout at ffffffff81446865 #2 [ffff8802216a99a0] wait_for_common at ffffffff81445f74 #3 [ffff8802216a9a30] flush_work at ffffffff810712d3 #4 [ffff8802216a9ab0] schedule_on_each_cpu at ffffffff81074463 #5 [ffff8802216a9ae0] invalidate_bdev at ffffffff81178aba #6 [ffff8802216a9af0] vfs_load_quota_inode at ffffffff811a3632 #7 [ffff8802216a9b50] dquot_quota_on_mount at ffffffff811a375c #8 [ffff8802216a9b80] finish_unfinished at ffffffffa05dd8b0 [reiserfs] #9 [ffff8802216a9cc0] reiserfs_fill_super at ffffffffa05de825 [reiserfs] RIP: 00007f7b9303997a RSP: 00007ffff443c7a8 RFLAGS: 00010202 RAX: 00000000000000a5 RBX: ffffffff8144ef12 RCX: 00007f7b932e9ee0 RDX: 00007f7b93d9a400 RSI: 00007f7b93d9a3e0 RDI: 00007f7b93d9a3c0 RBP: 00007f7b93d9a2c0 R8: 00007f7b93d9a550 R9: 0000000000000001 R10: ffffffffc0ed040e R11: 0000000000000202 R12: 000000000000040e R13: 0000000000000000 R14: 00000000c0ed040e R15: 00007ffff443ca20 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22fuse: fix killing s[ug]id in setattrMiklos Szeredi
commit a09f99eddef44035ec764075a37bace8181bec38 upstream. Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on chown and truncate, and (since writeback_cache) write. The problem with this is that it'll potentially restore a stale mode. The poper fix would be to let the filesystems do the suid/sgid clearing on the relevant operations. Possibly some are already doing it but there's no way we can detect this. So fix this by refreshing and recalculating the mode. Do this only if ATTR_KILL_S[UG]ID is set to not destroy performance for writes. This is still racy but the size of the window is reduced. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22fuse: invalidate dir dentry after chmodMiklos Szeredi
commit 5e2b8828ff3d79aca8c3a1730652758753205b61 upstream. Without "default_permissions" the userspace filesystem's lookup operation needs to perform the check for search permission on the directory. If directory does not allow search for everyone (this is quite rare) then userspace filesystem has to set entry timeout to zero to make sure permissions are always performed. Changing the mode bits of the directory should also invalidate the (previously cached) dentry to make sure the next lookup will have a chance of updating the timeout, if needed. Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22fuse: listxattr: verify xattr listMiklos Szeredi
commit cb3ae6d25a5471be62bfe6ac1fccc0e91edeaba0 upstream. Make sure userspace filesystem is returning a well formed list of xattr names (zero or more nonzero length, null terminated strings). [Michael Theall: only verify in the nonzero size case] Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22btrfs: assign error values to the correct bio structsJunjie Mao
commit 14155cafeadda946376260e2ad5d39a0528a332f upstream. Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio") Signed-off-by: Junjie Mao <junjie.mao@enight.me> Acked-by: David Sterba <dsterba@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22Btrfs: catch invalid free space treesOmar Sandoval
commit 6675df311db87aa2107a04ef97e19420953cbace upstream. There are two separate issues that can lead to corrupted free space trees. 1. The free space tree bitmaps had an endianness issue on big-endian systems which is fixed by an earlier patch in this series. 2. btrfs-progs before v4.7.3 modified filesystems without updating the free space tree. To catch both of these issues at once, we need to force the free space tree to be rebuilt. To do so, add a FREE_SPACE_TREE_VALID compat_ro bit. If the bit isn't set, we know that it was either produced by a broken big-endian kernel or may have been corrupted by btrfs-progs. This also provides us with a way to add rudimentary read-write support for the free space tree to btrfs-progs: it can just clear this bit and have the kernel rebuild the free space tree. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22Btrfs: fix free space tree bitmaps on big-endian systemsOmar Sandoval
commit 2fe1d55134fce05c17ea118a2e37a4af771887bc upstream. In convert_free_space_to_{bitmaps,extents}(), we buffer the free space bitmaps in memory and copy them directly to/from the extent buffers with {read,write}_extent_buffer(). The extent buffer bitmap helpers use byte granularity, which is equivalent to a little-endian bitmap. This means that on big-endian systems, the in-memory bitmaps will be written to disk byte-swapped. To fix this, use byte-granularity for the bitmaps in memory. Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree") Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-22debugfs: introduce a public file_operations accessorChristian Lamparter
commit 86f0e06767dda7863d6d2a8f0b3b857e6ea876a0 upstream. This patch introduces an accessor which can be used by the users of debugfs (drivers, fs, ...) to get the original file_operations struct. It also removes the REAL_FOPS_DEREF macro in file.c and converts the code to use the public version. Previously, REAL_FOPS_DEREF was only available within the file.c of debugfs. But having a public getter available for debugfs users is important as some drivers (carl9170 and b43) use the pointer of the original file_operations in conjunction with container_of() within their debugfs implementations. Reviewed-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-20Make __xfs_xattr_put_listen preperly report errors.Artem Savkov
commit 791cc43b36eb1f88166c8505900cad1b43c7fe1a upstream. Commit 2a6fba6 "xfs: only return -errno or success from attr ->put_listent" changes the returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient space in the buffer assuming that setting context->count to -1 would be enough, but all of the ->put_listent callers only check seen_enough. This results in a failed assertion: XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175 in insufficient buffer size case. This is only reproducible with at least 2 xattrs and only when the buffer gets depleted before the last one. Furthermore if buffersize is such that it is enough to hold the last xattr's name, but not enough to hold the sum of preceeding xattr names listxattr won't fail with ERANGE, but will suceed returning last xattr's name without the first character. The first character end's up overwriting data stored at (context->alist - 1). Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07aio: mark AIO pseudo-fs noexecJann Horn
commit 22f6b4d34fcf039c63a94e7670e0da24f8575a5a upstream. This ensures that do_mmap() won't implicitly make AIO memory mappings executable if the READ_IMPLIES_EXEC personality flag is set. Such behavior is problematic because the security_mmap_file LSM hook doesn't catch this case, potentially permitting an attacker to bypass a W^X policy enforced by SELinux. I have tested the patch on my machine. To test the behavior, compile and run this: #define _GNU_SOURCE #include <unistd.h> #include <sys/personality.h> #include <linux/aio_abi.h> #include <err.h> #include <stdlib.h> #include <stdio.h> #include <sys/syscall.h> int main(void) { personality(READ_IMPLIES_EXEC); aio_context_t ctx = 0; if (syscall(__NR_io_setup, 1, &ctx)) err(1, "io_setup"); char cmd[1000]; sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'", (int)getpid()); system(cmd); return 0; } In the output, "rw-s" is good, "rwxs" is bad. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07ceph: do not modify fi->frag in need_reset_readdir()Nicolas Iooss
commit 0f5aa88a7bb28b73253fb42b3df8202142769f39 upstream. Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |= fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a comparison operator with an assignment one. This looks like a typo which is reported by clang when building the kernel with some warning flags: fs/ceph/dir.c:600:22: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] } else if (fi->frag |= fpos_frag(new_pos)) { ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~ fs/ceph/dir.c:600:22: note: place parentheses around the assignment to silence this warning } else if (fi->frag |= fpos_frag(new_pos)) { ^ ( ) fs/ceph/dir.c:600:22: note: use '!=' to turn this compound assignment into an inequality comparison } else if (fi->frag |= fpos_frag(new_pos)) { ^~ != Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07pNFS/flexfiles: Fix layoutstat periodic reportingTrond Myklebust
commit 1c8d477a77e2d1d3504419e7f2e02e6422becf9a upstream. Putting the periodicity timer in the mirror instances is causing non-scalable reporting behaviour and missed reporting intervals. When you recall layouts and/or implement client side mirroring, it leads to consecutive reports with only a few ms between RPC calls. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: d0379a5d066a9 ("pNFS/flexfiles: Support server-supplied...") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07pNFS/flexfiles: Fix layoutcommit after a commit to DSTrond Myklebust
commit c001c87a63aa2f35358e33eb05e45e4cbcb34f54 upstream. We should always do a layoutcommit after commit to DS, except if the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07pNFS/files: Fix layoutcommit after a commit to DSTrond Myklebust
commit 73e6c5d854d3f7f75e8b46d3e54aeb5d83fe6b1f upstream. According to the errata https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751 we should always send layout commit after a commit to DS. Fixes: bc7d4b8fd091 ("nfs/filelayout: set layoutcommit...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07NFS: Don't drop CB requests with invalid principalsChuck Lever
commit a4e187d83d88eeaba6252aac0a2ffe5eaa73a818 upstream. Before commit 778be232a207 ("NFS do not find client in NFSv4 pg_authenticate"), the Linux callback server replied with RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB request. Let's restore that behavior so the server has a chance to do something useful about it, and provide a warning that helps admins correct the problem. Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07NFS/pnfs: Do not clobber existing pgio_done_cb in nfs4_proc_read_setupTrond Myklebust
commit ca857cc1d4cf17aba4bbb3b95d35454ad96924b3 upstream. If a pNFS client sets hdr->pgio_done_cb, then we should not overwrite that in nfs4_proc_read_setup() Fixes: 75bf47ebf6b5 ("pNFS/flexfile: Fix erroneous fall back to...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07pstore: drop file opened reference countGeliang Tang
commit 52d210d961a62a9662e27f14d6505d6741b2fb02 upstream. In ee1d267423a1 ("pstore: add pstore unregister") I added: .owner = THIS_MODULE, in both pstore_fs_type and pstore_file_operations to increase a reference count when pstore filesystem is mounted and pstore file is opened. But, it's repetitive. There is no need to increase the opened reference count. We only need to increase the mounted reference count. When a file is opened, the filesystem can't be unmounted. Hence the pstore module can't be unloaded either. So I drop the opened reference count in this patch. Fixes: ee1d267423a1 ("pstore: add pstore unregister") Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()Dan Carpenter
commit 8a545f185145e3c09348cd74326268ecfc6715a3 upstream. We can't pass error pointers to kfree() or it causes an oops. Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30ceph: Correctly return NXIO errors from ceph_llseekPhil Turnbull
commit 955818cd5b6c4b58ea574ace4573e7afa4c19c1e upstream. ceph_llseek does not correctly return NXIO errors because the 'out' path always returns 'offset'. Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30libceph: add an ONSTACK initializer for oidsIlya Dryomov
commit 281dbe5db81c6137def9757e07a7aea14b1ed86e upstream. An on-stack oid in ceph_ioctl_get_dataloc() is not initialized, resulting in a WARN and a NULL pointer dereference later on. We will have more of these on-stack in the future, so fix it with a convenience macro. Fixes: d30291b985d1 ("libceph: variable-sized ceph_object_id") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30btrfs: ensure that file descriptor used with subvol ioctls is a dirJeff Mahoney
commit 325c50e3cebb9208009083e841550f98a863bfa0 upstream. If the subvol/snapshot create/destroy ioctls are passed a regular file with execute permissions set, we'll eventually Oops while trying to do inode->i_op->lookup via lookup_one_len. This patch ensures that the file descriptor refers to a directory. Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules) Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl) Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30configfs: Return -EFBIG from configfs_write_bin_file.Phil Turnbull
commit 42857cf512cb34c2c8cb50f1e766689d979d64e0 upstream. The check for writing more than cb_max_size bytes does not 'goto out' so it is a no-op which allows users to vmalloc an arbitrary amount. Fixes: 03607ace807b ("configfs: implement binary attributes") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30fanotify: fix list corruption in fanotify_get_response()Jan Kara
commit 96d41019e3ac55f6f0115b0ce97e4f24a3d636d2 upstream. fanotify_get_response() calls fsnotify_remove_event() when it finds that group is being released from fanotify_release() (bypass_perm is set). However the event it removes need not be only in the group's notification queue but it can have already moved to access_list (userspace read the event before closing the fanotify instance fd) which is protected by a different lock. Thus when fsnotify_remove_event() races with fanotify_release() operating on access_list, the list can get corrupted. Fix the problem by moving all the logic removing permission events from the lists to one place - fanotify_release(). Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events") Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30fsnotify: add a way to stop queueing events on group shutdownJan Kara
commit 12703dbfeb15402260e7554d32a34ac40c233990 upstream. Implement a function that can be called when a group is being shutdown to stop queueing new events to the group. Fanotify will use this. Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events") Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30xfs: prevent dropping ioend completions during buftarg waitBrian Foster
commit 800b2694f890cc35a1bda63501fc71c94389d517 upstream. xfs_wait_buftarg() waits for all pending I/O, drains the ioend completion workqueue and walks the LRU until all buffers in the cache have been released. This is traditionally an unmount operation` but the mechanism is also reused during filesystem freeze. xfs_wait_buftarg() invokes drain_workqueue() as part of the quiesce, which is intended more for a shutdown sequence in that it indicates to the queue that new operations are not expected once the drain has begun. New work jobs after this point result in a WARN_ON_ONCE() and are otherwise dropped. With filesystem freeze, however, read operations are allowed and can proceed during or after the workqueue drain. If such a read occurs during the drain sequence, the workqueue infrastructure complains about the queued ioend completion work item and drops it on the floor. As a result, the buffer remains on the LRU and the freeze never completes. Despite the fact that the overall buffer cache cleanup is not necessary during freeze, fix up this operation such that it is safe to invoke during non-unmount quiesce operations. Replace the drain_workqueue() call with flush_workqueue(), which runs a similar serialization on pending workqueue jobs without causing new jobs to be dropped. This is safe for unmount as unmount independently locks out new operations by the time xfs_wait_buftarg() is invoked. cc: <stable@vger.kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30autofs: use dentry flags to block walks during expireIan Kent
commit 7cbdb4a286a60c5d519cb9223fe2134d26870d39 upstream. Somewhere along the way the autofs expire operation has changed to hold a spin lock over expired dentry selection. The autofs indirect mount expired dentry selection is complicated and quite lengthy so it isn't appropriate to hold a spin lock over the operation. Commit 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") added a might_sleep() to dput() causing a WARN_ONCE() about this usage to be issued. But the spin lock doesn't need to be held over this check, the autofs dentry info. flags are enough to block walks into dentrys during the expire. I've left the direct mount expire as it is (for now) because it is much simpler and quicker than the indirect mount expire and adding spin lock release and re-aquires would do nothing more than add overhead. Fixes: 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") Link: http://lkml.kernel.org/r/20160912014017.1773.73060.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Reported-by: Takashi Iwai <tiwai@suse.de> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: Takashi Iwai <tiwai@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30Revert "ocfs2: bump up o2cb network protocol version"Junxiao Bi
commit 63b52c4936a2e679639c38ef51a50aa8ca1c5c07 upstream. This reverts commit 38b52efd218b ("ocfs2: bump up o2cb network protocol version"). This commit made rolling upgrade fail. When one node is upgraded to new version with this commit, the remaining nodes will fail to establish connections to it, then the application like VMs on the remaining nodes can't be live migrated to the upgraded one. This will cause an outage. Since negotiate hb timeout behavior didn't change without this commit, so revert it. Fixes: 38b52efd218bf ("ocfs2: bump up o2cb network protocol version") Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30ocfs2: fix start offset to ocfs2_zero_range_for_truncate()Ashish Samant
commit d21c353d5e99c56cdd5b5c1183ffbcaf23b8b960 upstream. If we punch a hole on a reflink such that following conditions are met: 1. start offset is on a cluster boundary 2. end offset is not on a cluster boundary 3. (end offset is somewhere in another extent) or (hole range > MAX_CONTIG_BYTES(1MB)), we dont COW the first cluster starting at the start offset. But in this case, we were wrongly passing this cluster to ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster in place and zero it in the source too. Fix this by skipping this cluster in such a scenario. To reproduce: 1. Create a random file of say 10 MB xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile 2. Reflink it reflink -f 10MBfile reflnktest 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can also use a range that will put the end offset in another extent. fallocate -p -o 0 -l 1048615 reflnktest 4. sync 5. Check the first cluster in the source file. (It will be zeroed out). dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reported-by: Saar Maoz <saar.maoz@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Eric Ren <zren@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30ocfs2/dlm: fix race between convert and migrationJoseph Qi
commit e6f0c6e6170fec175fe676495f29029aecdf486c upstream. Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") checks if lockres master has changed to identify whether new master has finished recovery or not. This will introduce a race that right after old master does umount ( means master will change), a new convert request comes. In this case, it will reset lockres state to DLM_RECOVERING and then retry convert, and then fail with lockres->l_action being set to OCFS2_AST_INVALID, which will cause inconsistent lock level between ocfs2 and dlm, and then finally BUG. Since dlm recovery will clear lock->convert_pending in dlm_move_lockres_to_recovery_list, we can use it to correctly identify the race case between convert and recovery. So fix it. Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30reiserfs: fix "new_insert_key may be used uninitialized ..."Jeff Mahoney
commit 0a11b9aae49adf1f952427ef1a1d9e793dd6ffb6 upstream. new_insert_key only makes any sense when it's associated with a new_insert_ptr, which is initialized to NULL and changed to a buffer_head when we also initialize new_insert_key. We can key off of that to avoid the uninitialized warning. Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24fuse: direct-io: don't dirty ITER_BVEC pagesMiklos Szeredi
commit 8fba54aebbdf1f999738121922e74bf796ad60ee upstream. When reading from a loop device backed by a fuse file it deadlocks on lock_page(). This is because the page is already locked by the read() operation done on the loop device. In this case we don't want to either lock the page or dirty it. So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors. Reported-by: Sheng Yang <sheng@yasker.org> Fixes: aa4d86163e4e ("block: loop: switch to VFS ITER_BVEC") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Sheng Yang <sheng@yasker.org> Reviewed-by: Ashish Samant <ashish.samant@oracle.com> Tested-by: Sheng Yang <sheng@yasker.org> Tested-by: Ashish Samant <ashish.samant@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returnsChris Mason
commit cbd60aa7cd17d81a434234268c55192862147439 upstream. We use a btrfs_log_ctx structure to pass information into the tree log commit, and get error values out. It gets added to a per log-transaction list which we walk when things go bad. Commit d1433debe added an optimization to skip waiting for the log commit, but didn't take root_log_ctx out of the list. This patch makes sure we remove things before exiting. Signed-off-by: Chris Mason <clm@fb.com> Fixes: d1433debe7f4346cf9fc0dafc71c3137d2a97bc4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24devpts: return NULL pts 'priv' entry for non-devpts nodesLinus Torvalds
commit 3e423945ea94412283eaba8bfbe9d6e0a80b434f upstream. In commit 8ead9dd54716 ("devpts: more pty driver interface cleanups") I made devpts_get_priv() just return the dentry->fs_data directly. And because I thought it wouldn't happen, I added a warning if you ever saw a pts node that wasn't on devpts. And no, that warning never triggered under any actual real use, but you can trigger it by creating nonsensical pts nodes by hand. So just revert the warning, and make devpts_get_priv() return NULL for that case like it used to. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: "Eric W Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24mm: introduce get_task_exe_fileMateusz Guzik
commit cd81a9170e69e018bbaba547c1fd85a585f5697a upstream. For more convenient access if one has a pointer to the task. As a minor nit take advantage of the fact that only task lock + rcu are needed to safely grab ->exe_file. This saves mm refcount dance. Use the helper in proc_exe_link. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24NFSv4.1: Fix the CREATE_SESSION slot number accountingTrond Myklebust
commit b519d408ea32040b1c7e10b155a3ee9a36660947 upstream. Ensure that we conform to the algorithm described in RFC5661, section 18.36.4 for when to bump the sequence id. In essence we do it for all cases except when the RPC call timed out, or in case of the server returning NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialisedTrond Myklebust
commit bf0291dd2267a2b9a4cd74d65249553d11bb45d6 upstream. According to RFC5661, the client is responsible for serialising LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case where we send both in parallel. Client Server ====== ====== LAYOUTGET(seqid=X) LAYOUTRETURN(seqid=X) LAYOUTGET return seqid=X+1 LAYOUTRETURN return seqid=X+2 Process LAYOUTRETURN Forget layout stateid Process LAYOUTGET Set seqid=X+1 The client processes the layoutget/layoutreturn in the wrong order, and since the result of the layoutreturn was to clear the only existing layout segment, the client forgets the layout stateid. When the LAYOUTGET comes in, it is treated as having a completely new stateid, and so the client sets the wrong sequence id... Fix is to check if there are outstanding LAYOUTGET requests before we send the LAYOUTRETURN (note that LAYOUGET will already wait if it sees an outstanding LAYOUTRETURN). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24nfsd: Close race between nfsd4_release_lockowner and nfsd4_lockChuck Lever
commit 885848186fbc2d1d8fb6d2fdc2156638ae289a46 upstream. nfsd4_release_lockowner finds a lock owner that has no lock state, and drops cl_lock. Then release_lockowner picks up cl_lock and unhashes the lock owner. During the window where cl_lock is dropped, I don't see anything preventing a concurrent nfsd4_lock from finding that same lock owner and adding lock state to it. Move release_lockowner() into nfsd4_release_lockowner and hang onto the cl_lock until after the lock owner's state cannot be found again. Found by inspection, we don't currently have a reproducer. Fixes: 2c41beb0e5cf ("nfsd: reduce cl_lock thrashing in ... ") Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24NFSv4.x: Fix a refcount leak in nfs_callback_up_netTrond Myklebust
commit 98b0f80c2396224bbbed81792b526e6c72ba9efa upstream. On error, the callers expect us to return without bumping nn->cb_users[]. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24NFSv4.1: Fix Oopsable condition in server callback racesTrond Myklebust
commit e09c978aae5bedfdb379be80363b024b7d82638b upstream. The slot table hasn't been an array since v3.7. Ensure that we use nfs4_lookup_slot() to access the slot correctly. Fixes: 87dda67e7386 ("NFSv4.1: Allow SEQUENCE to resize the slot table...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24pNFS: The client must not do I/O to the DS if it's lease has expiredTrond Myklebust
commit b88fa69eaa8649f11828158c7b65c4bcd886ebd5 upstream. Ensure that the client conforms to the normative behaviour described in RFC5661 Section 12.7.2: "If a client believes its lease has expired, it MUST NOT send I/O to the storage device until it has validated its lease." So ensure that we wait for the lease to be validated before using the layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>