summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2018-10-20Revert "vfs: fix freeze protection in mnt_want_write_file() for overlayfs"Greg Kroah-Hartman
This reverts commit 5e1002ab5c9bde81a0c1eed12f243987e98f7bd0 which was commit a6795a585929d94ca3e931bc8518f8deb8bbe627 upstream. Turns out this causes problems and was to fix a patch only in the 4.19 and newer tree. Reported-by: Amir Goldstein <amir73il@gmail.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18filesystem-dax: Fix dax_layout_busy_page() livelockDan Williams
commit d7782145e1ad537df4ce74e58c50f1f732a1462d upstream. In the presence of multi-order entries the typical pagevec_lookup_entries() pattern may loop forever: while (index < end && pagevec_lookup_entries(&pvec, mapping, index, min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) { ... for (i = 0; i < pagevec_count(&pvec); i++) { index = indices[i]; ... } index++; /* BUG */ } The loop updates 'index' for each index found and then increments to the next possible page to continue the lookup. However, if the last entry in the pagevec is multi-order then the next possible page index is more than 1 page away. Fix this locally for the filesystem-dax case by checking for dax-multi-order entries. Going forward new users of multi-order entries need to be similarly careful, or we need a generic way to report the page increment in the radix iterator. Fixes: 5fac7408d828 ("mm, fs, dax: handle layout changes to pinned dax...") Cc: <stable@vger.kernel.org> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18afs: Fix clearance of replyDavid Howells
commit f0a7d1883d9f78ae7bf15fc258bf9a2b20f35b76 upstream. The recent patch to fix the afs_server struct leak didn't actually fix the bug, but rather fixed some of the symptoms. The problem is that an asynchronous call that holds a resource pointed to by call->reply[0] will find the pointer cleared in the call destructor, thereby preventing the resource from being cleaned up. In the case of the server record leak, the afs_fs_get_capabilities() function in devel code sets up a call with reply[0] pointing at the server record that should be altered when the result is obtained, but this was being cleared before the destructor was called, so the put in the destructor does nothing and the record is leaked. Commit f014ffb025c1 removed the additional ref obtained by afs_install_server(), but the removal of this ref is actually used by the garbage collector to mark a server record as being defunct after the record has expired through lack of use. The offending clearance of call->reply[0] upon completion in afs_process_async_call() has been there from the origin of the code, but none of the asynchronous calls actually use that pointer currently, so it should be safe to remove (note that synchronous calls don't involve this function). Fix this by the following means: (1) Revert commit f014ffb025c1. (2) Remove the clearance of reply[0] from afs_process_async_call(). Without this, afs_manage_servers() will suffer an assertion failure if it sees a server record that didn't get used because the usage count is not 1. Fixes: f014ffb025c1 ("afs: Fix afs_server struct leak") Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.") Signed-off-by: David Howells <dhowells@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18afs: Fix afs_server struct leakDavid Howells
commit f014ffb025c159fd51d19af8af0022a991aaa4f8 upstream. Fix a leak of afs_server structs. The routine that installs them in the various lookup lists and trees gets a ref on leaving the function, whether it added the server or a server already exists. It shouldn't increment the refcount if it added the server. The effect of this that "rmmod kafs" will hang waiting for the leaked server to become unused. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13ubifs: Check for name being NULL while mountingRichard Weinberger
commit 37f31b6ca4311b94d985fb398a72e5399ad57925 upstream. The requested device name can be NULL or an empty string. Check for that and refuse to continue. UBIFS has to do this manually since we cannot use mount_bdev(), which checks for this condition. Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13f2fs: fix invalid memory accessChao Yu
commit d3f07c049dab1a3f1740f476afd3d5e5b738c21c upstream. syzbot found the following crash on: HEAD commit: d9bd94c0bcaa Add linux-next specific files for 20180801 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=1001189c400000 kernel config: https://syzkaller.appspot.com/x/.config?x=cc8964ea4d04518c dashboard link: https://syzkaller.appspot.com/bug?extid=c966a82db0b14aa37e81 compiler: gcc (GCC) 8.0.1 20180413 (experimental) Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+c966a82db0b14aa37e81@syzkaller.appspotmail.com loop7: rw=12288, want=8200, limit=20 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. openvswitch: netlink: Message has 8 unknown bytes. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 1 PID: 7615 Comm: syz-executor7 Not tainted 4.18.0-rc7-next-20180801+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_get_valid_checkpoint+0x436/0x1ec0 fs/f2fs/checkpoint.c:860 f2fs_fill_super+0x2d42/0x8110 fs/f2fs/super.c:2883 mount_bdev+0x314/0x3e0 fs/super.c:1344 f2fs_mount+0x3c/0x50 fs/f2fs/super.c:3133 legacy_get_tree+0x131/0x460 fs/fs_context.c:729 vfs_get_tree+0x1cb/0x5c0 fs/super.c:1743 do_new_mount fs/namespace.c:2603 [inline] do_mount+0x6f2/0x1e20 fs/namespace.c:2927 ksys_mount+0x12d/0x140 fs/namespace.c:3143 __do_sys_mount fs/namespace.c:3157 [inline] __se_sys_mount fs/namespace.c:3154 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3154 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45943a Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 bd 8a fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 9a 8a fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:00007f36a61d4a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f36a61d4b30 RCX: 000000000045943a RDX: 00007f36a61d4ad0 RSI: 0000000020000100 RDI: 00007f36a61d4af0 RBP: 0000000020000100 R08: 00007f36a61d4b30 R09: 00007f36a61d4ad0 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000013 R13: 0000000000000000 R14: 00000000004c8ea0 R15: 0000000000000000 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace bd8550c129352286 ]--- RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 openvswitch: netlink: Message has 8 unknown bytes. R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 In validate_checkpoint(), if we failed to call get_checkpoint_version(), we will pass returned invalid page pointer into f2fs_put_page, cause accessing invalid memory, this patch tries to handle error path correctly to fix this issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-13pstore/ram: Fix failure-path memory leak in ramoops_initKees Cook
commit bac6f6cda206ad7cbe0c73c35e494377ce9c4749 upstream. As reported by nixiaoming, with some minor clarifications: 1) memory leak in ramoops_register_dummy(): dummy_data = kzalloc(sizeof(*dummy_data), GFP_KERNEL); but no kfree() if platform_device_register_data() fails. 2) memory leak in ramoops_init(): Missing platform_device_unregister(dummy) and kfree(dummy_data) if platform_driver_register(&ramoops_driver) fails. I've clarified the purpose of ramoops_register_dummy(), and added a common cleanup routine for all three failure paths to call. Reported-by: nixiaoming <nixiaoming@huawei.com> Cc: stable@vger.kernel.org Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ocfs2: fix locking for res->tracking and dlm->tracking_listAshish Samant
commit cbe355f57c8074bc4f452e5b6e35509044c6fa23 upstream. In dlm_init_lockres() we access and modify res->tracking and dlm->tracking_list without holding dlm->track_lock. This can cause list corruptions and can end up in kernel panic. Fix this by locking res->tracking and dlm->tracking_list with dlm->track_lock instead of dlm->spinlock. Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10proc: restrict kernel stack dumps to rootJann Horn
commit f8a00cef17206ecd1b30d3d9f99e10d9fa707aa7 upstream. Currently, you can use /proc/self/task/*/stack to cause a stack walk on a task you control while it is running on another CPU. That means that the stack can change under the stack walker. The stack walker does have guards against going completely off the rails and into random kernel memory, but it can interpret random data from your kernel stack as instruction pointers and stack pointers. This can cause exposure of kernel stack contents to userspace. Restrict the ability to inspect kernel stacks of arbitrary tasks to root in order to prevent a local attacker from exploiting racy stack unwinding to leak kernel task stack contents. See the added comment for a longer rationale. There don't seem to be any users of this userspace API that can't gracefully bail out if reading from the file fails. Therefore, I believe that this change is unlikely to break things. In the case that this patch does end up needing a revert, the next-best solution might be to fake a single-entry stack based on wchan. Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 2ec220e27f50 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10smb2: fix missing files in root share directory listingAurelien Aptel
commit 0595751f267994c3c7027377058e4185b3a28e75 upstream. When mounting a Windows share that is the root of a drive (eg. C$) the server does not return . and .. directory entries. This results in the smb2 code path erroneously skipping the 2 first entries. Pseudo-code of the readdir() code path: cifs_readdir(struct file, struct dir_context) initiate_cifs_search <-- if no reponse cached yet server->ops->query_dir_first dir_emit_dots dir_emit <-- adds "." and ".." if we're at pos=0 find_cifs_entry initiate_cifs_search <-- if pos < start of current response (restart search) server->ops->query_dir_next <-- if pos > end of current response (fetch next search res) for(...) <-- loops over cur response entries starting at pos cifs_filldir <-- skip . and .., emit entry cifs_fill_dirent dir_emit pos++ A) dir_emit_dots() always adds . & .. and sets the current dir pos to 2 (0 and 1 are done). Therefore we always want the index_to_find to be 2 regardless of if the response has . and .. B) smb1 code initializes index_of_last_entry with a +2 offset in cifssmb.c CIFSFindFirst(): psrch_inf->index_of_last_entry = 2 /* skip . and .. */ + psrch_inf->entries_in_buffer; Later in find_cifs_entry() we want to find the next dir entry at pos=2 as a result of (A) first_entry_in_buffer = cfile->srch_inf.index_of_last_entry - cfile->srch_inf.entries_in_buffer; This var is the dir pos that the first entry in the buffer will have therefore it must be 2 in the first call. If we don't offset index_of_last_entry by 2 (like in (B)), first_entry_in_buffer=0 but we were instructed to get pos=2 so this code in find_cifs_entry() skips the 2 first which is ok for non-root shares, as it skips . and .. from the response but is not ok for root shares where the 2 first are actual files pos_in_buf = index_to_find - first_entry_in_buffer; // pos_in_buf=2 // we skip 2 first response entries :( for (i = 0; (i < (pos_in_buf)) && (cur_ent != NULL); i++) { /* go entry by entry figuring out which is first */ cur_ent = nxt_dir_entry(cur_ent, end_of_smb, cfile->srch_inf.info_level); } C) cifs_filldir() skips . and .. so we can safely ignore them for now. Sample program: int main(int argc, char **argv) { const char *path = argc >= 2 ? argv[1] : "."; DIR *dh; struct dirent *de; printf("listing path <%s>\n", path); dh = opendir(path); if (!dh) { printf("opendir error %d\n", errno); return 1; } while (1) { de = readdir(dh); if (!de) { if (errno) { printf("readdir error %d\n", errno); return 1; } printf("end of listing\n"); break; } printf("off=%lu <%s>\n", de->d_off, de->d_name); } return 0; } Before the fix with SMB1 on root shares: <.> off=1 <..> off=2 <$Recycle.Bin> off=3 <bootmgr> off=4 and on non-root shares: <.> off=1 <..> off=4 <-- after adding .., the offsets jumps to +2 because <2536> off=5 we skipped . and .. from response buffer (C) <411> off=6 but still incremented pos <file> off=7 <fsx> off=8 Therefore the fix for smb2 is to mimic smb1 behaviour and offset the index_of_last_entry by 2. Test results comparing smb1 and smb2 before/after the fix on root share, non-root shares and on large directories (ie. multi-response dir listing): PRE FIX ======= pre-1-root VS pre-2-root: ERR pre-2-root is missing [bootmgr, $Recycle.Bin] pre-1-nonroot VS pre-2-nonroot: OK~ same files, same order, different offsets pre-1-nonroot-large VS pre-2-nonroot-large: OK~ same files, same order, different offsets POST FIX ======== post-1-root VS post-2-root: OK same files, same order, same offsets post-1-nonroot VS post-2-nonroot: OK same files, same order, same offsets post-1-nonroot-large VS post-2-nonroot-large: OK same files, same order, same offsets REGRESSION? =========== pre-1-root VS post-1-root: OK same files, same order, same offsets pre-1-nonroot VS post-1-nonroot: OK same files, same order, same offsets BugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107 Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Paulo Alcantara <palcantara@suse.deR> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10sysfs: Do not return POSIX ACL xattrs via listxattrAndreas Gruenbacher
commit ffc4c92227db5699493e43eb140b4cb5904c30ff upstream. Commit 786534b92f3c introduced a regression that caused listxattr to return the POSIX ACL attribute names even though sysfs doesn't support POSIX ACLs. This happens because simple_xattr_list checks for NULL i_acl / i_default_acl, but inode_init_always initializes those fields to ACL_NOT_CACHED ((void *)-1). For example: $ getfattr -m- -d /sys /sys: system.posix_acl_access: Operation not supported /sys: system.posix_acl_default: Operation not supported Fix this in simple_xattr_list by checking if the filesystem supports POSIX ACLs. Fixes: 786534b92f3c ("tmpfs: listxattr should include POSIX ACL xattrs") Reported-by: Marc Aurèle La France <tsi@tuyoix.net> Tested-by: Marc Aurèle La France <tsi@tuyoix.net> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ovl: fix format of setxattr debugMiklos Szeredi
commit 1a8f8d2a443ef9ad9a3065ba8c8119df714240fa upstream. Format has a typo: it was meant to be "%.*s", not "%*s". But at some point callers grew nonprintable values as well, so use "%*pE" instead with a maximized length. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ovl: fix memory leak on unlink of indexed fileAmir Goldstein
commit 63e132528032ce937126aba591a7b37ec593a6bb upstream. The memory leak was detected by kmemleak when running xfstests overlay/051,053 Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ovl: fix access beyond unterminated stringsAmir Goldstein
commit 601350ff58d5415a001769532f6b8333820e5786 upstream. KASAN detected slab-out-of-bounds access in printk from overlayfs, because string format used %*s instead of %.*s. > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604 > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811 > > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36 ... > printk+0xa7/0xcf kernel/printk/printk.c:1996 > ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689 Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ovl: set I_CREATING on inode being createdMiklos Szeredi
commit 6faf05c2b2b4fe70d9068067437649401531de0a upstream. ...otherwise there will be list corruption due to inode_sb_list_add() being called for inode already on the sb list. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: e950564b97fd ("vfs: don't evict uninitialized inode") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> To: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10vfs: don't evict uninitialized inodeMiklos Szeredi
commit e950564b97fd0f541b02eb207685d0746f5ecf29 upstream. iput() ends up calling ->evict() on new inode, which is not yet initialized by owning fs. So use destroy_inode() instead. Add to sb->s_inodes list only if inode is not in I_CREATING state (meaning that it wasn't allocated with new_inode(), which already does the insertion). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 80ea09a002bf ("vfs: factor out inode_insert5()") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10new primitive: discard_new_inode()Al Viro
commit c2b6d621c4ffe9936adf7a55c8b1c769672c306f upstream. We don't want open-by-handle picking half-set-up in-core struct inode from e.g. mkdir() having failed halfway through. In other words, we don't want such inodes returned by iget_locked() on their way to extinction. However, we can't just have them unhashed - otherwise open-by-handle immediately *after* that would've ended up creating a new in-core inode over the on-disk one that is in process of being freed right under us. Solution: new flag (I_CREATING) set by insert_inode_locked() and removed by unlock_new_inode() and a new primitive (discard_new_inode()) to be used by such halfway-through-setup failure exits instead of unlock_new_inode() / iput() combinations. That primitive unlocks new inode, but leaves I_CREATING in place. iget_locked() treats finding an I_CREATING inode as failure (-ESTALE, once we sort out the error propagation). insert_inode_locked() treats the same as instant -EBUSY. ilookup() treats those as icache miss. [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10cifs: read overflow in is_valid_oplock_break()Dan Carpenter
[ Upstream commit 097f5863b1a0c9901f180bbd56ae7d630655faaa ] We need to verify that the "data_offset" is within bounds. Reported-by: Dr Silvio Cesare of InfoSect <silvio.cesare@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10fs/cifs: suppress a string overflow warningStephen Rothwell
[ Upstream commit bcfb84a996f6fa90b5e6e2954b2accb7a4711097 ] A powerpc build of cifs with gcc v8.2.0 produces this warning: fs/cifs/cifssmb.c: In function ‘CIFSSMBNegotiate’: fs/cifs/cifssmb.c:605:3: warning: ‘strncpy’ writing 16 bytes into a region of size 1 overflows the destination [-Wstringop-overflow=] strncpy(pSMB->DialectsArray+count, protocols[i].name, 16); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since we are already doing a strlen() on the source, change the strncpy to a memcpy(). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10afs: Fix cell specification to permit an empty address listDavid Howells
[ Upstream commit ecfe951f0c1b169ea4b7dd6f3a404dfedd795bc2 ] Fix the cell specification mechanism to allow cells to be pre-created without having to specify at least one address (the addresses will be upcalled for). This allows the cell information preload service to avoid the need to issue loads of DNS lookups during boot to get the addresses for each cell (500+ lookups for the 'standard' cell list[*]). The lookups can be done later as each cell is accessed through the filesystem. Also remove the print statement that prints a line every time a new cell is added. [*] There are 144 cells in the list. Each cell is first looked up for an SRV record, and if that fails, for an AFSDB record. These get a list of server names, each of which then has to be looked up to get the addresses for that server. E.g.: dig srv _afs3-vlserver._udp.grand.central.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ceph: avoid a use-after-free in ceph_destroy_options()Ilya Dryomov
[ Upstream commit 8aaff15168cfbc7c8980fdb0e8a585f1afe56ec0 ] syzbot reported a use-after-free in ceph_destroy_options(), called from ceph_mount(). The problem was that create_fs_client() consumed the opt pointer on some errors, but not on all of them. Make sure it always consumes both libceph and ceph options. Reported-by: syzbot+8ab6f1042021b4eed062@syzkaller.appspotmail.com Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10fsnotify: fix ignore mask logic in fsnotify()Amir Goldstein
[ Upstream commit 9bdda4e9cf2dcecb60a0683b10ffb8cd7e5f2f45 ] Commit 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group()") acknoledges the use case of ignoring an event on an inode mark, because of an ignore mask on a mount mark of the same group (i.e. I want to get all events on this file, except for the events that came from that mount). This change depends on correctly merging the inode marks and mount marks group lists, so that the mount mark ignore mask would be tested in send_to_group(). Alas, the merging of the lists did not take into account the case where event in question is not in the mask of any of the mount marks. To fix this, completely remove the tests for inode and mount event masks from the lists merging code. Fixes: 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10fs/cifs: don't translate SFM_SLASH (U+F026) to backslashJon Kuhn
[ Upstream commit c15e3f19a6d5c89b1209dc94b40e568177cb0921 ] When a Mac client saves an item containing a backslash to a file server the backslash is represented in the CIFS/SMB protocol as as U+F026. Before this change, listing a directory containing an item with a backslash in its name will return that item with the backslash represented with a true backslash character (U+005C) because convert_sfm_character mapped U+F026 to U+005C when interpretting the CIFS/SMB protocol response. However, attempting to open or stat the path using a true backslash will result in an error because convert_to_sfm_char does not map U+005C back to U+F026 causing the CIFS/SMB request to be made with the backslash represented as U+005C. This change simply prevents the U+F026 to U+005C conversion from happenning. This is analogous to how the code does not do any translation of UNI_SLASH (U+F000). Signed-off-by: Jon Kuhn <jkuhn@barracuda.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10btrfs: btrfs_shrink_device should call commit transaction at the endAnand Jain
[ Upstream commit 801660b040d132f67fac6a95910ad307c5929b49 ] Test case btrfs/164 reports use-after-free: [ 6712.084324] general protection fault: 0000 [#1] PREEMPT SMP .. [ 6712.195423] btrfs_update_commit_device_size+0x75/0xf0 [btrfs] [ 6712.201424] btrfs_commit_transaction+0x57d/0xa90 [btrfs] [ 6712.206999] btrfs_rm_device+0x627/0x850 [btrfs] [ 6712.211800] btrfs_ioctl+0x2b03/0x3120 [btrfs] Reason for this is that btrfs_shrink_device adds the resized device to the fs_devices::resized_devices after it has called the last commit transaction. So the list fs_devices::resized_devices is not empty when btrfs_shrink_device returns. Now the parent function btrfs_rm_device calls: btrfs_close_bdev(device); call_rcu(&device->rcu, free_device_rcu); and then does the transactio ncommit. It goes through the fs_devices::resized_devices in btrfs_update_commit_device_size and leads to use-after-free. Fix this by making sure btrfs_shrink_device calls the last needed btrfs_commit_transaction before the return. This is consistent with what the grow counterpart does and this makes sure the on-disk state is persistent when the function returns. Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Tested-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10Btrfs: fix unexpected failure of nocow buffered writes after snapshotting ↵Robbie Ko
when low on space [ Upstream commit 8ecebf4d767e2307a946c8905278d6358eda35c3 ] Commit e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") forced nocow writes to fallback to COW, during writeback, when a snapshot is created. This resulted in writes made before creating the snapshot to unexpectedly fail with ENOSPC during writeback when success (0) was returned to user space through the write system call. The steps leading to this problem are: 1. When it's not possible to allocate data space for a write, the buffered write path checks if a NOCOW write is possible. If it is, it will not reserve space and success (0) is returned to user space. 2. Then when a snapshot is created, the root's will_be_snapshotted atomic is incremented and writeback is triggered for all inode's that belong to the root being snapshotted. Incrementing that atomic forces all previous writes to fallback to COW during writeback (running delalloc). 3. This results in the writeback for the inodes to fail and therefore setting the ENOSPC error in their mappings, so that a subsequent fsync on them will report the error to user space. So it's not a completely silent data loss (since fsync will report ENOSPC) but it's a very unexpected and undesirable behaviour, because if a clean shutdown/unmount of the filesystem happens without previous calls to fsync, it is expected to have the data present in the files after mounting the filesystem again. So fix this by adding a new atomic named snapshot_force_cow to the root structure which prevents this behaviour and works the following way: 1. It is incremented when we start to create a snapshot after triggering writeback and before waiting for writeback to finish. 2. This new atomic is now what is used by writeback (running delalloc) to decide whether we need to fallback to COW or not. Because we incremented this new atomic after triggering writeback in the snapshot creation ioctl, we ensure that all buffered writes that happened before snapshot creation will succeed and not fallback to COW (which would make them fail with ENOSPC). 3. The existing atomic, will_be_snapshotted, is kept because it is used to force new buffered writes, that start after we started snapshotting, to reserve data space even when NOCOW is possible. This makes these writes fail early with ENOSPC when there's no available space to allocate, preventing the unexpected behaviour of writeback later failing with ENOSPC due to a fallback to COW mode. Fixes: e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03isofs: reject hardware sector size > 2048 bytesEric Sandeen
[ Upstream commit 09a4e0be5826aa66c4ce9954841f110ffe63ef4f ] The largest block size supported by isofs is ISOFS_BLOCK_SIZE (2048), but isofs_fill_super calls sb_min_blocksize and sets the blocksize to the device's logical block size if it's larger than what we ended up with after option parsing. If for some reason we try to mount a hard 4k device as an isofs filesystem, we'll set opt.blocksize to 4096, and when we try to read the superblock we found via: block = iso_blknum << (ISOFS_BLOCK_BITS - s->s_blocksize_bits) with s_blocksize_bits greater than ISOFS_BLOCK_BITS, we'll have a negative shift and the bread will fail somewhat cryptically: isofs_fill_super: bread failed, dev=sda, iso_blknum=17, block=-2147483648 It seems best to just catch and clearly reject mounts of such a device. Reported-by: Bryan Gurney <bgurney@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03filesystem-dax: Fix use of zero pageMatthew Wilcox
commit b90ca5cc32f59bb214847c6855959702f00c6801 upstream. Use my_zero_pfn instead of ZERO_PAGE(), and pass the vaddr to it instead of zero so it works on MIPS and s390 who reference the vaddr to select a zero page. Cc: <stable@vger.kernel.org> Fixes: 91d25ba8a6b0 ("dax: use common 4k zero page for dax mmap reads") Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03ext2, dax: set ext2_dax_aops for dax filesToshi Kani
commit 9e796c9db93b4840d1b00e550eea26db7cb741e2 upstream. Sync syscall to DAX file needs to flush processor cache, but it currently does not flush to existing DAX files. This is because 'ext2_da_aops' is set to address_space_operations of existing DAX files, instead of 'ext2_dax_aops', since S_DAX flag is set after ext2_set_aops() in the open path. Similar to ext4, change ext2_iget() to initialize i_flags before ext2_set_aops(). Fixes: fb094c90748f ("ext2, dax: introduce ext2_dax_aops") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Jan Kara <jack@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03fs/lock: skip lock owner pid translation in case we are in init_pid_nsKonstantin Khorenko
[ Upstream commit 826d7bc9f013d01e92997883d2fd0c25f4af1f1c ] If the flock owner process is dead and its pid has been already freed, pid translation won't work, but we still want to show flock owner pid number when expecting /proc/$PID/fdinfo/$FD in init pidns. Reproducer: process A process A1 process A2 fork()---------> exit() open() flock() fork()---------> exit() sleep() Before the patch: ================ (root@vz7)/: cat /proc/${PID_A2}/fdinfo/3 pos: 4 flags: 02100002 mnt_id: 257 lock: (root@vz7)/: After the patch: =============== (root@vz7)/:cat /proc/${PID_A2}/fdinfo/3 pos: 4 flags: 02100002 mnt_id: 295 lock: 1: FLOCK ADVISORY WRITE ${PID_A1} b6:f8a61:529946 0 EOF Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks") Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Acked-by: Andrey Vagin <avagin@openvz.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03nfsd: fix corrupted reply to badly ordered compoundJ. Bruce Fields
[ Upstream commit 5b7b15aee641904ae269be9846610a3950cbd64c ] We're encoding a single op in the reply but leaving the number of ops zero, so the reply makes no sense. Somewhat academic as this isn't a case any real client will hit, though in theory perhaps that could change in a future protocol extension. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03iomap: complete partial direct I/O writes synchronouslyAndreas Gruenbacher
[ Upstream commit ebf00be37de35788cad72f4f20b4a39e30c0be4a ] According to xfstest generic/240, applications seem to expect direct I/O writes to either complete as a whole or to fail; short direct I/O writes are apparently not appreciated. This means that when only part of an asynchronous direct I/O write succeeds, we can either fail the entire write, or we can wait for the partial write to complete and retry the remaining write as buffered I/O. The old __blockdev_direct_IO helper has code for waiting for partial writes to complete; the new iomap_dio_rw iomap helper does not. The above mentioned fallback mode is needed for gfs2, which doesn't allow block allocations under direct I/O to avoid taking cluster-wide exclusive locks. As a consequence, an asynchronous direct I/O write to a file range that contains a hole will result in a short write. In that case, wait for the short write to complete to allow gfs2 to recover. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4, dax: set ext4_dax_aops for dax filesToshi Kani
commit cce6c9f7e6029caee45c459db5b3e78fec6973cb upstream. Sync syscall to DAX file needs to flush processor cache, but it currently does not flush to existing DAX files. This is because 'ext4_da_aops' is set to address_space_operations of existing DAX files, instead of 'ext4_dax_aops', since S_DAX flag is set after ext4_set_aops() in the open path. New file -------- lookup_open ext4_create __ext4_new_inode ext4_set_inode_flags // Set S_DAX flag ext4_set_aops // Set aops to ext4_dax_aops Existing file ------------- lookup_open ext4_lookup ext4_iget ext4_set_aops // Set aops to ext4_da_aops ext4_set_inode_flags // Set S_DAX flag Change ext4_iget() to initialize i_flags before ext4_set_aops(). Fixes: 5f0663bb4a64 ("ext4, dax: introduce ext4_dax_aops") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Suggested-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4, dax: add ext4_bmap to ext4_dax_aopsToshi Kani
commit 94dbb63117e82253c9592816aa4465f0a9c94850 upstream. Ext4 mount path calls .bmap to the journal inode. This currently works for the DAX mount case because ext4_iget() always set 'ext4_da_aops' to any regular files. In preparation to fix ext4_iget() to set 'ext4_dax_aops' for ext4 DAX files, add ext4_bmap() to 'ext4_dax_aops', since bmap works for DAX inodes. Fixes: 5f0663bb4a64 ("ext4, dax: introduce ext4_dax_aops") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Suggested-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: show test_dummy_encryption mount option in /proc/mountsEric Biggers
commit 338affb548c243d2af25b1ca628e67819350de6b upstream. When in effect, add "test_dummy_encryption" to _ext4_show_options() so that it is shown in /proc/mounts and other relevant procfs files. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: don't mark mmp buffer head dirtyLi Dongyang
commit fe18d649891d813964d3aaeebad873f281627fbc upstream. Marking mmp bh dirty before writing it will make writeback pick up mmp block later and submit a write, we don't want the duplicate write as kmmpd thread should have full control of reading and writing the mmp block. Another reason is we will also have random I/O error on the writeback request when blk integrity is enabled, because kmmpd could modify the content of the mmp block(e.g. setting new seq and time) while the mmp block is under I/O requested by writeback. Signed-off-by: Li Dongyang <dongyangli@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: fix online resizing for bigalloc file systems with a 1k block sizeTheodore Ts'o
commit 5f8c10936fab2b69a487400f2872902e597dd320 upstream. An online resize of a file system with the bigalloc feature enabled and a 1k block size would be refused since ext4_resize_begin() did not understand s_first_data_block is 0 for all bigalloc file systems, even when the block size is 1k. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: fix online resize's handling of a too-small final block groupTheodore Ts'o
commit f0a459dec5495a3580f8d784555e6f8f3bf7f263 upstream. Avoid growing the file system to an extent so that the last block group is too small to hold all of the metadata that must be stored in the block group. This problem can be triggered with the following reproducer: umount /mnt mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \ -E resize=1073741824 /tmp/foo.img 128M mount /tmp/foo.img /mnt truncate --size 1708M /tmp/foo.img resize2fs /dev/loop0 295400 umount /mnt e2fsck -fy /tmp/foo.img Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: recalucate superblock checksum after updating free blocks/inodesTheodore Ts'o
commit 4274f516d4bc50648a4d97e4f67ecbd7b65cde4a upstream. When mounting the superblock, ext4_fill_super() calculates the free blocks and free inodes and stores them in the superblock. It's not strictly necessary, since we don't use them any more, but it's nice to keep them roughly aligned to reality. Since it's not critical for file system correctness, the code doesn't call ext4_commit_super(). The problem is that it's in ext4_commit_super() that we recalculate the superblock checksum. So if we're not going to call ext4_commit_super(), we need to call ext4_superblock_csum_set() to make sure the superblock checksum is consistent. Most of the time, this doesn't matter, since we end up calling ext4_commit_super() very soon thereafter, and definitely by the time the file system is unmounted. However, it doesn't work in this sequence: mke2fs -Fq -t ext4 /dev/vdc 128M mount /dev/vdc /vdc cp xfstests/git-versions /vdc godown /vdc umount /vdc mount /dev/vdc tune2fs -l /dev/vdc With this commit, the "tune2fs -l" no longer fails. Reported-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: avoid arithemetic overflow that can trigger a BUGTheodore Ts'o
commit bcd8e91f98c156f4b1ebcfacae675f9cfd962441 upstream. A maliciously crafted file system can cause an overflow when the results of a 64-bit calculation is stored into a 32-bit length parameter. https://bugzilla.kernel.org/show_bug.cgi?id=200623 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: avoid divide by zero fault when deleting corrupted inline directoriesTheodore Ts'o
commit 4d982e25d0bdc83d8c64e66fdeca0b89240b3b85 upstream. A specially crafted file system can trick empty_inline_dir() into reading past the last valid entry in a inline directory, and then run into the end of xattr marker. This will trigger a divide by zero fault. Fix this by using the size of the inline directory instead of dir->i_size. Also clean up error reporting in __ext4_check_dir_entry so that the message is clearer and more understandable --- and avoids the division by zero trap if the size passed in is zero. (I'm not sure why we coded it that way in the first place; printing offset % size is actually more confusing and less useful.) https://bugzilla.kernel.org/show_bug.cgi?id=200933 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ext4: check to make sure the rename(2)'s destination is not freedTheodore Ts'o
commit b50282f3241acee880514212d88b6049fb5039c8 upstream. If the destination of the rename(2) system call exists, the inode's link count (i_nlinks) must be non-zero. If it is, the inode can end up on the orphan list prematurely, leading to all sorts of hilarity, including a use-after-free. https://bugzilla.kernel.org/show_bug.cgi?id=200931 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29ocfs2: fix ocfs2 read block panicJunxiao Bi
commit 234b69e3e089d850a98e7b3145bd00e9b52b1111 upstream. While reading block, it is possible that io error return due to underlying storage issue, in this case, BH_NeedsValidate was left in the buffer head. Then when reading the very block next time, if it was already linked into journal, that will trigger the following panic. [203748.702517] kernel BUG at fs/ocfs2/buffer_head_io.c:342! [203748.702533] invalid opcode: 0000 [#1] SMP [203748.702561] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc dm_switch dm_queue_length dm_multipath bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif i2c_core ipmi_si ipmi_msghandler acpi_pad pcspkr sb_edac edac_core lpc_ich mfd_core shpchp sg tg3 ptp pps_core ext4 jbd2 mbcache2 sr_mod cdrom sd_mod ahci libahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [203748.703024] CPU: 7 PID: 38369 Comm: touch Not tainted 4.1.12-124.18.6.el6uek.x86_64 #2 [203748.703045] Hardware name: Dell Inc. PowerEdge R620/0PXXHP, BIOS 2.5.2 01/28/2015 [203748.703067] task: ffff880768139c00 ti: ffff88006ff48000 task.ti: ffff88006ff48000 [203748.703088] RIP: 0010:[<ffffffffa05e9f09>] [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2] [203748.703130] RSP: 0018:ffff88006ff4b818 EFLAGS: 00010206 [203748.703389] RAX: 0000000008620029 RBX: ffff88006ff4b910 RCX: 0000000000000000 [203748.703885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000023079fe [203748.704382] RBP: ffff88006ff4b8d8 R08: 0000000000000000 R09: ffff8807578c25b0 [203748.704877] R10: 000000000f637376 R11: 000000003030322e R12: 0000000000000000 [203748.705373] R13: ffff88006ff4b910 R14: ffff880732fe38f0 R15: 0000000000000000 [203748.705871] FS: 00007f401992c700(0000) GS:ffff880bfebc0000(0000) knlGS:0000000000000000 [203748.706370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [203748.706627] CR2: 00007f4019252440 CR3: 00000000a621e000 CR4: 0000000000060670 [203748.707124] Stack: [203748.707371] ffff88006ff4b828 ffffffffa0609f52 ffff88006ff4b838 0000000000000001 [203748.707885] 0000000000000000 0000000000000000 ffff880bf67c3800 ffffffffa05eca00 [203748.708399] 00000000023079ff ffffffff81c58b80 0000000000000000 0000000000000000 [203748.708915] Call Trace: [203748.709175] [<ffffffffa0609f52>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2] [203748.709680] [<ffffffffa05eca00>] ? ocfs2_empty_dir_filldir+0x80/0x80 [ocfs2] [203748.710185] [<ffffffffa05ec0cb>] ocfs2_read_dir_block_direct+0x3b/0x200 [ocfs2] [203748.710691] [<ffffffffa05f0fbf>] ocfs2_prepare_dx_dir_for_insert.isra.57+0x19f/0xf60 [ocfs2] [203748.711204] [<ffffffffa065660f>] ? ocfs2_metadata_cache_io_unlock+0x1f/0x30 [ocfs2] [203748.711716] [<ffffffffa05f4f3a>] ocfs2_prepare_dir_for_insert+0x13a/0x890 [ocfs2] [203748.712227] [<ffffffffa05f442e>] ? ocfs2_check_dir_for_entry+0x8e/0x140 [ocfs2] [203748.712737] [<ffffffffa061b2f2>] ocfs2_mknod+0x4b2/0x1370 [ocfs2] [203748.713003] [<ffffffffa061c385>] ocfs2_create+0x65/0x170 [ocfs2] [203748.713263] [<ffffffff8121714b>] vfs_create+0xdb/0x150 [203748.713518] [<ffffffff8121b225>] do_last+0x815/0x1210 [203748.713772] [<ffffffff812192e9>] ? path_init+0xb9/0x450 [203748.714123] [<ffffffff8121bca0>] path_openat+0x80/0x600 [203748.714378] [<ffffffff811bcd45>] ? handle_pte_fault+0xd15/0x1620 [203748.714634] [<ffffffff8121d7ba>] do_filp_open+0x3a/0xb0 [203748.714888] [<ffffffff8122a767>] ? __alloc_fd+0xa7/0x130 [203748.715143] [<ffffffff81209ffc>] do_sys_open+0x12c/0x220 [203748.715403] [<ffffffff81026ddb>] ? syscall_trace_enter_phase1+0x11b/0x180 [203748.715668] [<ffffffff816f0c9f>] ? system_call_after_swapgs+0xe9/0x190 [203748.715928] [<ffffffff8120a10e>] SyS_open+0x1e/0x20 [203748.716184] [<ffffffff816f0d5e>] system_call_fastpath+0x18/0xd7 [203748.716440] Code: 00 00 48 8b 7b 08 48 83 c3 10 45 89 f8 44 89 e1 44 89 f2 4c 89 ee e8 07 06 11 e1 48 8b 03 48 85 c0 75 df 8b 5d c8 e9 4d fa ff ff <0f> 0b 48 8b 7d a0 e8 dc c6 06 00 48 b8 00 00 00 00 00 00 00 10 [203748.717505] RIP [<ffffffffa05e9f09>] ocfs2_read_blocks+0x669/0x7f0 [ocfs2] [203748.717775] RSP <ffff88006ff4b818> Joesph ever reported a similar panic. Link: https://oss.oracle.com/pipermail/ocfs2-devel/2013-May/008931.html Link: http://lkml.kernel.org/r/20180912063207.29484-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29Revert "ubifs: xattr: Don't operate on deleted inodes"Richard Weinberger
commit f061c1cc404a618858a77aea233fde0aeaad2f2d upstream. This reverts commit 11a6fc3dc743e22fb50f2196ec55bee5140d3c52. UBIFS wants to assert that xattr operations are only issued on files with positive link count. The said patch made this operations return -ENOENT for unlinked files such that the asserts will no longer trigger. This was wrong since xattr operations are perfectly fine on unlinked files. Instead the assertions need to be fixed/removed. Cc: <stable@vger.kernel.org> Fixes: 11a6fc3dc743 ("ubifs: xattr: Don't operate on deleted inodes") Reported-by: Koen Vandeputte <koen.vandeputte@ncentric.com> Tested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26f2fs: do checkpoint in kill_sbJaegeuk Kim
[ Upstream commit 1cb50f87e10696e8cc61fb62d0d948e11b0e6dc1 ] When unmounting f2fs in force mode, we can get it stuck by io_schedule() by some pending IOs in meta_inode. io_schedule+0xd/0x30 wait_on_page_bit_common+0xc6/0x130 __filemap_fdatawait_range+0xbd/0x100 filemap_fdatawait_keep_errors+0x15/0x40 sync_inodes_sb+0x1cf/0x240 sync_filesystem+0x52/0x90 generic_shutdown_super+0x1d/0x110 kill_f2fs_super+0x28/0x80 [f2fs] deactivate_locked_super+0x35/0x60 cleanup_mnt+0x36/0x70 task_work_run+0x79/0xa0 exit_to_usermode_loop+0x62/0x70 do_syscall_64+0xdb/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 0xffffffffffffffff Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26configfs: fix registered group removalMike Christie
[ Upstream commit cc57c07343bd071cdf1915a91a24ab7d40c9b590 ] This patch fixes a bug where configfs_register_group had added a group in a tree, and userspace has done a rmdir on a dir somewhere above that group and we hit a kernel crash. The problem is configfs_rmdir will detach everything under it and unlink groups on the default_groups list. It will not unlink groups added with configfs_register_group so when configfs_unregister_group is called to drop its references to the group/items we crash when we try to access the freed dentrys. The patch just adds a check for if a rmdir has been done above us and if so just does the unlink part of unregistration. Sorry if you are getting this multiple times. I thouhgt I sent this to some of you and lkml, but I do not see it. Signed-off-by: Mike Christie <mchristi@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26vfs: fix freeze protection in mnt_want_write_file() for overlayfsMiklos Szeredi
[ Upstream commit a6795a585929d94ca3e931bc8518f8deb8bbe627 ] The underlying real file used by overlayfs still contains the overlay path. This results in mnt_want_write_file() calls by the filesystem getting freeze protection on the wrong inode (the overlayfs one instead of the real one). Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26binfmt_elf: Respect error return from `regset->active'Maciej W. Rozycki
[ Upstream commit 2f819db565e82e5f73cd42b39925098986693378 ] The regset API documented in <linux/regset.h> defines -ENODEV as the result of the `->active' handler to be used where the feature requested is not available on the hardware found. However code handling core file note generation in `fill_thread_core_info' interpretes any non-zero result from the `->active' handler as the regset requested being active. Consequently processing continues (and hopefully gracefully fails later on) rather than being abandoned right away for the regset requested. Fix the problem then by making the code proceed only if a positive result is returned from the `->active' handler. Signed-off-by: Maciej W. Rozycki <macro@mips.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 4206d3aa1978 ("elf core dump: notes user_regset") Patchwork: https://patchwork.linux-mips.org/patch/19332/ Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26NFSv4.1 fix infinite loop on I/O.Trond Myklebust
commit 994b15b983a72e1148a173b61e5b279219bb45ae upstream. The previous fix broke recovery of delegated stateids because it assumes that if we did not mark the delegation as suspect, then the delegation has effectively been revoked, and so it removes that delegation irrespectively of whether or not it is valid and still in use. While this is "mostly harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning in an infinite loop while complaining that we're using an invalid stateid (in this case the all-zero stateid). What we rather want to do here is ensure that the delegation is always correctly marked as needing testing when that is the case. So we want to close the loophole offered by nfs4_schedule_stateid_recovery(), which marks the state as needing to be reclaimed, but not the delegation that may be backing it. Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26NFSv4: Fix a tracepoint Oops in initiate_file_draining()Trond Myklebust
commit 2a534a7473bf4e7f1c12805113f80c795fc8e89a upstream. Now that the value of 'ino' can be NULL or an ERR_PTR(), we need to change the test in the tracepoint. Fixes: ce5624f7e6675 ("NFSv4: Return NFS4ERR_DELAY when a layout fails...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26cifs: integer overflow in in SMB2_ioctl()Dan Carpenter
commit 2d204ee9d671327915260071c19350d84344e096 upstream. The "le32_to_cpu(rsp->OutputOffset) + *plen" addition can overflow and wrap around to a smaller value which looks like it would lead to an information leak. Fixes: 4a72dafa19ba ("SMB2 FSCTL and IOCTL worker function") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>