summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2018-07-11f2fs: truncate preallocated blocks in error caseJaegeuk Kim
commit dc7a10ddee0c56c6d891dd18de5c4ee9869545e0 upstream. If write is failed, we must deallocate the blocks that we couldn't write. Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: check superblock mapped prior to committingJon Derrick
commit a17712c8e4be4fa5404d20e9cd3b2b21eae7bc56 upstream. This patch attempts to close a hole leading to a BUG seen with hot removals during writes [1]. A block device (NVME namespace in this test case) is formatted to EXT4 without partitions. It's mounted and write I/O is run to a file, then the device is hot removed from the slot. The superblock attempts to be written to the drive which is no longer present. The typical chain of events leading to the BUG: ext4_commit_super() __sync_dirty_buffer() submit_bh() submit_bh_wbc() BUG_ON(!buffer_mapped(bh)); This fix checks for the superblock's buffer head being mapped prior to syncing. [1] https://www.spinics.net/lists/linux-ext4/msg56527.html Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: add more mount time checks of the superblockTheodore Ts'o
commit bfe0a5f47ada40d7984de67e59a7d3390b9b9ecc upstream. The kernel's ext4 mount-time checks were more permissive than e2fsprogs's libext2fs checks when opening a file system. The superblock is considered too insane for debugfs or e2fsck to operate on it, the kernel has no business trying to mount it. This will make file system fuzzing tools work harder, but the failure cases that they find will be more useful and be easier to evaluate. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: add more inode number paranoia checksTheodore Ts'o
commit c37e9e013469521d9adb932d17a1795c139b36db upstream. If there is a directory entry pointing to a system inode (such as a journal inode), complain and declare the file system to be corrupted. Also, if the superblock's first inode number field is too small, refuse to mount the file system. This addresses CVE-2018-10882. https://bugzilla.kernel.org/show_bug.cgi?id=200069 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: avoid running out of journal credits when appending to an inline fileTheodore Ts'o
commit 8bc1379b82b8e809eef77a9fedbb75c6c297be19 upstream. Use a separate journal transaction if it turns out that we need to convert an inline file to use an data block. Otherwise we could end up failing due to not having journal credits. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: never move the system.data xattr out of the inode bodyTheodore Ts'o
commit 8cdb5240ec5928b20490a2bb34cb87e9a5f40226 upstream. When expanding the extra isize space, we must never move the system.data xattr out of the inode body. For performance reasons, it doesn't make any sense, and the inline data implementation assumes that system.data xattr is never in the external xattr block. This addresses CVE-2018-10880 https://bugzilla.kernel.org/show_bug.cgi?id=200005 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: clear i_data in ext4_inode_info when removing inline dataTheodore Ts'o
commit 6e8ab72a812396996035a37e5ca4b3b99b5d214b upstream. When converting from an inode from storing the data in-line to a data block, ext4_destroy_inline_data_nolock() was only clearing the on-disk copy of the i_blocks[] array. It was not clearing copy of the i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually used by ext4_map_blocks(). This didn't matter much if we are using extents, since the extents header would be invalid and thus the extents could would re-initialize the extents tree. But if we are using indirect blocks, the previous contents of the i_blocks array will be treated as block numbers, with potentially catastrophic results to the file system integrity and/or user data. This gets worse if the file system is using a 1k block size and s_first_data is zero, but even without this, the file system can get quite badly corrupted. This addresses CVE-2018-10881. https://bugzilla.kernel.org/show_bug.cgi?id=200015 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: include the illegal physical block in the bad map ext4_error msgTheodore Ts'o
commit bdbd6ce01a70f02e9373a584d0ae9538dcf0a121 upstream. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: verify the depth of extent tree in ext4_find_extent()Theodore Ts'o
commit bc890a60247171294acc0bd67d211fa4b88d40ba upstream. If there is a corupted file system where the claimed depth of the extent tree is -1, this can cause a massive buffer overrun leading to sadness. This addresses CVE-2018-10877. https://bugzilla.kernel.org/show_bug.cgi?id=199417 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: only look at the bg_flags field if it is validTheodore Ts'o
commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream. The bg_flags field in the block group descripts is only valid if the uninit_bg or metadata_csum feature is enabled. We were not consistently looking at this field; fix this. Also block group #0 must never have uninitialized allocation bitmaps, or need to be zeroed, since that's where the root inode, and other special inodes are set up. Check for these conditions and mark the file system as corrupted if they are detected. This addresses CVE-2018-10876. https://bugzilla.kernel.org/show_bug.cgi?id=199403 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: always check block group bounds in ext4_init_block_bitmap()Theodore Ts'o
commit 819b23f1c501b17b9694325471789e6b5cc2d0d2 upstream. Regardless of whether the flex_bg feature is set, we should always check to make sure the bits we are setting in the block bitmap are within the block group bounds. https://bugzilla.kernel.org/show_bug.cgi?id=199865 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: make sure bitmaps and the inode table don't overlap with bg descriptorsTheodore Ts'o
commit 77260807d1170a8cf35dbb06e07461a655f67eee upstream. It's really bad when the allocation bitmaps and the inode table overlap with the block group descriptors, since it causes random corruption of the bg descriptors. So we really want to head those off at the pass. https://bugzilla.kernel.org/show_bug.cgi?id=199865 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: always verify the magic number in xattr blocksTheodore Ts'o
commit 513f86d73855ce556ea9522b6bfd79f87356dc3a upstream. If there an inode points to a block which is also some other type of metadata block (such as a block allocation bitmap), the buffer_verified flag can be set when it was validated as that other metadata block type; however, it would make a really terrible external attribute block. The reason why we use the verified flag is to avoid constantly reverifying the block. However, it doesn't take much overhead to make sure the magic number of the xattr block is correct, and this will avoid potential crashes. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=200001 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11ext4: add corruption check in ext4_xattr_set_entry()Theodore Ts'o
commit 5369a762c882c0b6e9599e4ebbb3a9ba9eee7e2d upstream. In theory this should have been caught earlier when the xattr list was verified, but in case it got missed, it's simple enough to add check to make sure we don't overrun the xattr buffer. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=200001 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11jbd2: don't mark block as modified if the handle is out of creditsTheodore Ts'o
commit e09463f220ca9a1a1ecfda84fcda658f99a1f12a upstream. Do not set the b_modified flag in block's journal head should not until after we're sure that jbd2_journal_dirty_metadat() will not abort with an error due to there not being enough space reserved in the jbd2 handle. Otherwise, future attempts to modify the buffer may lead a large number of spurious errors and warnings. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE settingStefano Brivio
commit f46ecbd97f508e68a7806291a139499794874f3d upstream. A "small" CIFS buffer is not big enough in general to hold a setacl request for SMB2, and we end up overflowing the buffer in send_set_info(). For instance: # mount.cifs //127.0.0.1/test /mnt/test -o username=test,password=test,nounix,cifsacl # touch /mnt/test/acltest # getcifsacl /mnt/test/acltest REVISION:0x1 CONTROL:0x9004 OWNER:S-1-5-21-2926364953-924364008-418108241-1000 GROUP:S-1-22-2-1001 ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff ACL:S-1-22-2-1001:ALLOWED/0x0/R ACL:S-1-22-2-1001:ALLOWED/0x0/R ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff ACL:S-1-1-0:ALLOWED/0x0/R # setcifsacl -a "ACL:S-1-22-2-1004:ALLOWED/0x0/R" /mnt/test/acltest this setacl will cause the following KASAN splat: [ 330.777927] BUG: KASAN: slab-out-of-bounds in send_set_info+0x4dd/0xc20 [cifs] [ 330.779696] Write of size 696 at addr ffff88010d5e2860 by task setcifsacl/1012 [ 330.781882] CPU: 1 PID: 1012 Comm: setcifsacl Not tainted 4.18.0-rc2+ #2 [ 330.783140] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 330.784395] Call Trace: [ 330.784789] dump_stack+0xc2/0x16b [ 330.786777] print_address_description+0x6a/0x270 [ 330.787520] kasan_report+0x258/0x380 [ 330.788845] memcpy+0x34/0x50 [ 330.789369] send_set_info+0x4dd/0xc20 [cifs] [ 330.799511] SMB2_set_acl+0x76/0xa0 [cifs] [ 330.801395] set_smb2_acl+0x7ac/0xf30 [cifs] [ 330.830888] cifs_xattr_set+0x963/0xe40 [cifs] [ 330.840367] __vfs_setxattr+0x84/0xb0 [ 330.842060] __vfs_setxattr_noperm+0xe6/0x370 [ 330.843848] vfs_setxattr+0xc2/0xd0 [ 330.845519] setxattr+0x258/0x320 [ 330.859211] path_setxattr+0x15b/0x1b0 [ 330.864392] __x64_sys_setxattr+0xc0/0x160 [ 330.866133] do_syscall_64+0x14e/0x4b0 [ 330.876631] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 330.878503] RIP: 0033:0x7ff2e507db0a [ 330.880151] Code: 48 8b 0d 89 93 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 bc 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 93 2c 00 f7 d8 64 89 01 48 [ 330.885358] RSP: 002b:00007ffdc4903c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc [ 330.887733] RAX: ffffffffffffffda RBX: 000055d1170de140 RCX: 00007ff2e507db0a [ 330.890067] RDX: 000055d1170de7d0 RSI: 000055d115b39184 RDI: 00007ffdc4904818 [ 330.892410] RBP: 0000000000000001 R08: 0000000000000000 R09: 000055d1170de7e4 [ 330.894785] R10: 00000000000002b8 R11: 0000000000000246 R12: 0000000000000007 [ 330.897148] R13: 000055d1170de0c0 R14: 0000000000000008 R15: 000055d1170de550 [ 330.901057] Allocated by task 1012: [ 330.902888] kasan_kmalloc+0xa0/0xd0 [ 330.904714] kmem_cache_alloc+0xc8/0x1d0 [ 330.906615] mempool_alloc+0x11e/0x380 [ 330.908496] cifs_small_buf_get+0x35/0x60 [cifs] [ 330.910510] smb2_plain_req_init+0x4a/0xd60 [cifs] [ 330.912551] send_set_info+0x198/0xc20 [cifs] [ 330.914535] SMB2_set_acl+0x76/0xa0 [cifs] [ 330.916465] set_smb2_acl+0x7ac/0xf30 [cifs] [ 330.918453] cifs_xattr_set+0x963/0xe40 [cifs] [ 330.920426] __vfs_setxattr+0x84/0xb0 [ 330.922284] __vfs_setxattr_noperm+0xe6/0x370 [ 330.924213] vfs_setxattr+0xc2/0xd0 [ 330.926008] setxattr+0x258/0x320 [ 330.927762] path_setxattr+0x15b/0x1b0 [ 330.929592] __x64_sys_setxattr+0xc0/0x160 [ 330.931459] do_syscall_64+0x14e/0x4b0 [ 330.933314] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 330.936843] Freed by task 0: [ 330.938588] (stack is not available) [ 330.941886] The buggy address belongs to the object at ffff88010d5e2800 which belongs to the cache cifs_small_rq of size 448 [ 330.946362] The buggy address is located 96 bytes inside of 448-byte region [ffff88010d5e2800, ffff88010d5e29c0) [ 330.950722] The buggy address belongs to the page: [ 330.952789] page:ffffea0004357880 count:1 mapcount:0 mapping:ffff880108fdca80 index:0x0 compound_mapcount: 0 [ 330.955665] flags: 0x17ffffc0008100(slab|head) [ 330.957760] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880108fdca80 [ 330.960356] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 330.963005] page dumped because: kasan: bad access detected [ 330.967039] Memory state around the buggy address: [ 330.969255] ffff88010d5e2880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 330.971833] ffff88010d5e2900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 330.974397] >ffff88010d5e2980: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 330.976956] ^ [ 330.979226] ffff88010d5e2a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 330.981755] ffff88010d5e2a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 330.984225] ================================================================== Fix this by allocating a regular CIFS buffer in smb2_plain_req_init() if the request command is SMB2_SET_INFO. Reported-by: Jianhong Yin <jiyin@redhat.com> Fixes: 366ed846df60 ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function") CC: Stable <stable@vger.kernel.org> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-and-tested-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11cifs: Fix infinite loop when using hard mount optionPaulo Alcantara
commit 7ffbe65578b44fafdef577a360eb0583929f7c6e upstream. For every request we send, whether it is SMB1 or SMB2+, we attempt to reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying out the request. So, while server->tcpStatus != CifsNeedReconnect, we wait for the reconnection to succeed on wait_event_interruptible_timeout(). If it returns, that means that either the condition was evaluated to true, or timeout elapsed, or it was interrupted by a signal. Since we're not handling the case where the process woke up due to a received signal (-ERESTARTSYS), the next call to wait_event_interruptible_timeout() will _always_ fail and we end up looping forever inside either cifs_reconnect_tcon() or smb2_reconnect(). Here's an example of how to trigger that: $ mount.cifs //foo/share /mnt/test -o username=foo,password=foo,vers=1.0,hard (break connection to server before executing bellow cmd) $ stat -f /mnt/test & sleep 140 [1] 2511 $ ps -aux -q 2511 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2511 0.0 0.0 12892 1008 pts/0 S 12:24 0:00 stat -f /mnt/test $ kill -9 2511 (wait for a while; process is stuck in the kernel) $ ps -aux -q 2511 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2511 83.2 0.0 12892 1008 pts/0 R 12:24 30:01 stat -f /mnt/test By using 'hard' mount point means that cifs.ko will keep retrying indefinitely, however we must allow the process to be killed otherwise it would hang the system. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Cc: stable@vger.kernel.org Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11cifs: Fix memory leak in smb2_set_ea()Paulo Alcantara
commit 6aa0c114eceec8cc61715f74a4ce91b048d7561c upstream. This patch fixes a memory leak when doing a setxattr(2) in SMB2+. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11cifs: Fix use after free of a mid_q_entryLars Persson
commit 696e420bb2a6624478105651d5368d45b502b324 upstream. With protocol version 2.0 mounts we have seen crashes with corrupt mid entries. Either the server->pending_mid_q list becomes corrupt with a cyclic reference in one element or a mid object fetched by the demultiplexer thread becomes overwritten during use. Code review identified a race between the demultiplexer thread and the request issuing thread. The demultiplexer thread seems to be written with the assumption that it is the sole user of the mid object until it calls the mid callback which either wakes the issuer task or deletes the mid. This assumption is not true because the issuer task can be woken up earlier by a signal. If the demultiplexer thread has proceeded as far as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer thread will happily end up calling cifs_delete_mid while the demultiplexer thread still is using the mid object. Inserting a delay in the cifs demultiplexer thread widens the race window and makes reproduction of the race very easy: if (server->large_buf) buf = server->bigbuf; + usleep_range(500, 4000); server->lstrp = jiffies; To resolve this I think the proper solution involves putting a reference count on the mid object. This patch makes sure that the demultiplexer thread holds a reference until it has finished processing the transaction. Cc: stable@vger.kernel.org Signed-off-by: Lars Persson <larper@axis.com> Acked-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-11userfaultfd: hugetlbfs: fix userfaultfd_huge_must_wait() pte accessJanosch Frank
commit 1e2c043628c7736dd56536d16c0ce009bc834ae7 upstream. Use huge_ptep_get() to translate huge ptes to normal ptes so we can check them with the huge_pte_* functions. Otherwise some architectures will check the wrong values and will not wait for userspace to bring in the memory. Link: http://lkml.kernel.org/r/20180626132421.78084-1-frankja@linux.ibm.com Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges") Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-08fs: clear writeback errors in inode_init_alwaysDarrick J. Wong
[ Upstream commit 829bc787c1a0403e4d886296dd4d90c5f9c1744a ] In inode_init_always(), we clear the inode mapping flags, which clears any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not also clear wb_err, which means that old mapping errors can leak through to new inodes. This is crucial for the XFS inode allocation path because we recycle old in-core inodes and we do not want error state from an old file to leak into the new file. This bug was discovered by running generic/036 and generic/047 in a loop and noticing that the EIOs generated by the collision of direct and buffered writes in generic/036 would survive the remount between 036 and 047, and get reported to the fsyncs (on different files!) in generic/047. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-08afs: Fix directory permissions checkDavid Howells
[ Upstream commit 378831e4daec75fbba6d3612bcf3b4dd00ddbf08 ] Doing faccessat("/afs/some/directory", 0) triggers a BUG in the permissions check code. Fix this by just removing the BUG section. If no permissions are asked for, just return okay if the file exists. Also: (1) Split up the directory check so that it has separate if-statements rather than if-else-if (e.g. checking for MAY_EXEC shouldn't skip the check for MAY_READ and MAY_WRITE). (2) Check for MAY_CHDIR as MAY_EXEC. Without the main fix, the following BUG may occur: kernel BUG at fs/afs/security.c:386! invalid opcode: 0000 [#1] SMP PTI ... RIP: 0010:afs_permission+0x19d/0x1a0 [kafs] ... Call Trace: ? inode_permission+0xbe/0x180 ? do_faccessat+0xdc/0x270 ? do_syscall_64+0x60/0x1f0 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 00d3b7a4533e ("[AFS]: Add security support.") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03udf: Detect incorrect directory sizeJan Kara
commit fa65653e575fbd958bdf5fb9c4a71a324e39510d upstream. Detect when a directory entry is (possibly partially) beyond directory size and return EIO in that case since it means the filesystem is corrupted. Otherwise directory operations can further corrupt the directory and possibly also oops the kernel. CC: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> CC: stable@vger.kernel.org Reported-and-tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03NFSv4: Fix a typo in nfs41_sequence_processTrond Myklebust
commit 995891006ccbb73c0c9c3923cf9d25c4d07ec16b upstream. We want to compare the slot_id to the highest slot number advertised by the server. Fixes: 3be0f80b5fe9c ("NFSv4.1: Fix up replays of interrupted requests") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03NFSv4: Revert commit 5f83d86cf531d ("NFSv4.x: Fix wraparound issues..")Trond Myklebust
commit fc40724fc6731d90cc7fb6d62d66135f85a33dd2 upstream. The correct behaviour for NFSv4 sequence IDs is to wrap around to the value 0 after 0xffffffff. See https://tools.ietf.org/html/rfc5661#section-2.10.6.1 Fixes: 5f83d86cf531d ("NFSv4.x: Fix wraparound issues when validing...") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_messageDave Wysochanski
commit d68894800ec5712d7ddf042356f11e36f87d7f78 upstream. In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d' that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which is a stack char array variable of length NFS_UINT_MAXLEN == 11. If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion overflows into a negative value, for example: crash> p (unsigned) (0x80000000) $1 = 2147483648 crash> p (signed) (0x80000000) $2 = -2147483648 The '-' sign is written to the buffer and this causes a 1 byte overflow when the NULL byte is written, which corrupts kernel stack memory. If CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic: [11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c [11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1 [11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014 [11558053.644462] ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac [11558053.646430] ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8 [11558053.648313] ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c [11558053.650107] Call Trace: [11558053.651347] [<ffffffff81685eac>] dump_stack+0x19/0x1b [11558053.653013] [<ffffffff8167f2b3>] panic+0xe3/0x1f2 [11558053.666240] [<ffffffff811dcb03>] ? kfree+0x103/0x140 [11558053.682589] [<ffffffffa05b8a8c>] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.689710] [<ffffffff810855db>] __stack_chk_fail+0x1b/0x30 [11558053.691619] [<ffffffffa05b8a8c>] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.693867] [<ffffffffa00209d6>] rpc_pipe_write+0x56/0x70 [sunrpc] [11558053.695763] [<ffffffff811fe12d>] vfs_write+0xbd/0x1e0 [11558053.702236] [<ffffffff810acccc>] ? task_work_run+0xac/0xe0 [11558053.704215] [<ffffffff811fec4f>] SyS_write+0x7f/0xe0 [11558053.709674] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b Fix this by calling the internally defined nfs_map_numeric_to_string() function which properly uses '%u' to convert this __u32. For consistency, also replace the one other place where snprintf is called. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reported-by: Stephen Johnston <sjohnsto@redhat.com> Fixes: cf4ab538f1516 ("NFSv4: Fix the string length returned by the idmapper") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdirScott Mayhew
commit 9c2ece6ef67e9d376f32823086169b489c422ed0 upstream. nfsd4_readdir_rsize restricts rd_maxcount to svc_max_payload when estimating the size of the readdir reply, but nfsd_encode_readdir restricts it to INT_MAX when encoding the reply. This can result in log messages like "kernel: RPC request reserved 32896 but used 1049444". Restrict rd_dircount similarly (no reason it should be larger than svc_max_payload). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03UBIFS: Fix potential integer overflow in allocationSilvio Cesare
commit 353748a359f1821ee934afc579cf04572406b420 upstream. There is potential for the size and len fields in ubifs_data_node to be too large causing either a negative value for the length fields or an integer overflow leading to an incorrect memory allocation. Likewise, when the len field is small, an integer underflow may occur. Signed-off-by: Silvio Cesare <silvio.cesare@gmail.com> Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03Btrfs: fix return value on rename exchange failureFilipe Manana
commit c5b4a50b74018b3677098151ec5f4fce07d5e6a0 upstream. If we failed during a rename exchange operation after starting/joining a transaction, we would end up replacing the return value, stored in the local 'ret' variable, with the return value from btrfs_end_transaction(). So this could end up returning 0 (success) to user space despite the operation having failed and aborted the transaction, because if there are multiple tasks having a reference on the transaction at the time btrfs_end_transaction() is called by the rename exchange, that function returns 0 (otherwise it returns -EIO and not the original error value). So fix this by not overwriting the return value on error after getting a transaction handle. Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03fuse: fix control dir setup and teardownMiklos Szeredi
commit 6becdb601bae2a043d7fb9762c4d48699528ea6e upstream. syzbot is reporting NULL pointer dereference at fuse_ctl_remove_conn() [1]. Since fc->ctl_ndents is incremented by fuse_ctl_add_conn() when new_inode() failed, fuse_ctl_remove_conn() reaches an inode-less dentry and tries to clear d_inode(dentry)->i_private field. Fix by only adding the dentry to the array after being fully set up. When tearing down the control directory, do d_invalidate() on it to get rid of any mounts that might have been added. [1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6 Reported-by: syzbot <syzbot+32c236387d66c4516827@syzkaller.appspotmail.com> Fixes: bafa96541b25 ("[PATCH] fuse: add control filesystem") Cc: <stable@vger.kernel.org> # v2.6.18 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03fuse: don't keep dead fuse_conn at fuse_fill_super().Tetsuo Handa
commit 543b8f8662fe6d21f19958b666ab0051af9db21a upstream. syzbot is reporting use-after-free at fuse_kill_sb_blk() [1]. Since sb->s_fs_info field is not cleared after fc was released by fuse_conn_put() when initialization failed, fuse_kill_sb_blk() finds already released fc and tries to hold the lock. Fix this by clearing sb->s_fs_info field after calling fuse_conn_put(). [1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+ec3986119086fe4eec97@syzkaller.appspotmail.com> Fixes: 3b463ae0c626 ("fuse: invalidation reverse calls") Cc: John Muir <john@jmuir.com> Cc: Csaba Henk <csaba@gluster.com> Cc: Anand Avati <avati@redhat.com> Cc: <stable@vger.kernel.org> # v2.6.31 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03fuse: atomic_o_trunc should truncate pagecacheMiklos Szeredi
commit df0e91d488276086bc07da2e389986cae0048c37 upstream. Fuse has an "atomic_o_trunc" mode, where userspace filesystem uses the O_TRUNC flag in the OPEN request to truncate the file atomically with the open. In this mode there's no need to send a SETATTR request to userspace after the open, so fuse_do_setattr() checks this mode and returns. But this misses the important step of truncating the pagecache. Add the missing parts of truncation to the ATTR_OPEN branch. Reported-by: Chad Austin <chadaustin@fb.com> Fixes: 6ff958edbf39 ("fuse: add atomic open+truncate support") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03fuse: fix congested state leak on aborted connectionsTejun Heo
commit 8a301eb16d99983a4961f884690ec97b92e7dcfe upstream. If a connection gets aborted while congested, FUSE can leave nr_wb_congested[] stuck until reboot causing wait_iff_congested() to wait spuriously which can lead to severe performance degradation. The leak is caused by gating congestion state clearing with fc->connected test in request_end(). This was added way back in 2009 by 26c3679101db ("fuse: destroy bdi on umount"). While the commit description doesn't explain why the test was added, it most likely was to avoid dereferencing bdi after it got destroyed. Since then, bdi lifetime rules have changed many times and now we're always guaranteed to have access to the bdi while the superblock is alive (fc->sb). Drop fc->connected conditional to avoid leaking congestion states. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Joshua Miller <joshmiller@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org # v2.6.29+ Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26fs/binfmt_misc.c: do not allow offset overflowThadeu Lima de Souza Cascardo
commit 5cc41e099504b77014358b58567c5ea6293dd220 upstream. WHen registering a new binfmt_misc handler, it is possible to overflow the offset to get a negative value, which might crash the system, or possibly leak kernel data. Here is a crash log when 2500000000 was used as an offset: BUG: unable to handle kernel paging request at ffff989cfd6edca0 IP: load_misc_binary+0x22b/0x470 [binfmt_misc] PGD 1ef3e067 P4D 1ef3e067 PUD 0 Oops: 0000 [#1] SMP NOPTI Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc] Call Trace: search_binary_handler+0x97/0x1d0 do_execveat_common.isra.34+0x667/0x810 SyS_execve+0x31/0x40 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Use kstrtoint instead of simple_strtoul. It will work as the code already set the delimiter byte to '\0' and we only do it when the field is not empty. Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX. Also tested with examples documented at Documentation/admin-guide/binfmt-misc.rst and other registrations from packages on Ubuntu. Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26orangefs: report attributes_mask and attributes for statxMartin Brandenburg
commit 7f54910fa8dfe504f2e1563f4f6ddc3294dfbf3a upstream. OrangeFS formerly failed to set attributes_mask with the result that software could not see immutable and append flags present in the filesystem. Reported-by: Becky Ligon <ligon@clemson.edu> Signed-off-by: Martin Brandenburg <martin@omnibond.com> Fixes: 68a24a6cc4a6 ("orangefs: implement statx") Cc: stable@vger.kernel.org Cc: hubcap@omnibond.com Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26orangefs: set i_size on new symlinkMartin Brandenburg
commit f6a4b4c9d07dda90c7c29dae96d6119ac6425dca upstream. As long as a symlink inode remains in-core, the destination (and therefore size) will not be re-fetched from the server, as it cannot change. The original implementation of the attribute cache assumed that setting the expiry time in the past was sufficient to cause a re-fetch of all attributes on the next getattr. That does not work in this case. The bug manifested itself as follows. When the command sequence touch foo; ln -s foo bar; ls -l bar is run, the output was lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo However, after a re-mount, ls -l bar produces lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo After this commit, even before a re-mount, the output is lrwxrwxrwx. 1 fedora fedora 3 Apr 24 19:10 bar -> foo Reported-by: Becky Ligon <ligon@clemson.edu> Signed-off-by: Martin Brandenburg <martin@omnibond.com> Fixes: 71680c18c8f2 ("orangefs: Cache getattr results.") Cc: stable@vger.kernel.org Cc: hubcap@omnibond.com Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26cifs: For SMB2 security informaion query, check for minimum sized security ↵Shirish Pargaonkar
descriptor instead of sizeof FileAllInformation class commit ee25c6dd7b05113783ce1f4fab6b30fc00d29b8d upstream. Validate_buf () function checks for an expected minimum sized response passed to query_info() function. For security information, the size of a security descriptor can be smaller (one subauthority, no ACEs) than the size of the structure that defines FileInfoClass of FileAllInformation. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199725 Cc: <stable@vger.kernel.org> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Noah Morrison <noah.morrison@rubrik.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26CIFS: 511c54a2f69195b28afb9dd119f03787b1625bb4 adds a check for session expiryMark Syms
commit d81243c697ffc71f983736e7da2db31a8be0001f upstream. Handle this additional status in the same way as SESSION_EXPIRED. Signed-off-by: Mark Syms <mark.syms@citrix.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26smb3: on reconnect set PreviousSessionId fieldSteve French
commit b2adf22fdfba85a6701c481faccdbbb3a418ccfc upstream. The server detects reconnect by the (non-zero) value in PreviousSessionId of SMB2/SMB3 SessionSetup request, but this behavior regressed due to commit 166cea4dc3a4f66f020cfb9286225ecd228ab61d ("SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup") CC: Stable <stable@vger.kernel.org> CC: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26smb3: fix various xid leaksSteve French
commit cfe89091644c441a1ade6dae6d2e47b715648615 upstream. Fix a few cases where we were not freeing the xid which led to active requests being non-zero at unmount time. Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26btrfs: scrub: Don't use inode pages for device replaceQu Wenruo
commit ac0b4145d662a3b9e34085dea460fb06ede9b69b upstream. [BUG] Btrfs can create compressed extent without checksum (even though it shouldn't), and if we then try to replace device containing such extent, the result device will contain all the uncompressed data instead of the compressed one. Test case already submitted to fstests: https://patchwork.kernel.org/patch/10442353/ [CAUSE] When handling compressed extent without checksum, device replace will goe into copy_nocow_pages() function. In that function, btrfs will get all inodes referring to this data extents and then use find_or_create_page() to get pages direct from that inode. The problem here is, pages directly from inode are always uncompressed. And for compressed data extent, they mismatch with on-disk data. Thus this leads to corrupted compressed data extent written to replace device. [FIX] In this attempt, we could just remove the "optimization" branch, and let unified scrub_pages() to handle it. Although scrub_pages() won't bother reusing page cache, it will be a little slower, but it does the correct csum checking and won't cause such data corruption caused by "optimization". Note about the fix: this is the minimal fix that can be backported to older stable trees without conflicts. The whole callchain from copy_nocow_pages() can be deleted, and will be in followup patches. Fixes: ff023aac3119 ("Btrfs: add code to scrub to copy read data to another disk") CC: stable@vger.kernel.org # 4.4+ Reported-by: James Harvey <jamespharvey20@gmail.com> Reviewed-by: James Harvey <jamespharvey20@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> [ remove code removal, add note why ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26btrfs: return error value if create_io_em failed in cow_file_rangeSu Yue
commit 090a127afa8f73e9618d4058d6755f7ec7453dd6 upstream. In cow_file_range(), create_io_em() may fail, but its return value is not recorded. Then return value may be 0 even it failed which is a wrong behavior. Let cow_file_range() return PTR_ERR(em) if create_io_em() failed. Fixes: 6f9994dbabe5 ("Btrfs: create a helper to create em for IO") CC: stable@vger.kernel.org # 4.11+ Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2()Omar Sandoval
commit fd4e994bd1f9dc9628e168a7f619bf69f6984635 upstream. If we have invalid flags set, when we error out we must drop our writer counter and free the buffer we allocated for the arguments. This bug is trivially reproduced with the following program on 4.7+: #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <linux/btrfs.h> #include <linux/btrfs_tree.h> int main(int argc, char **argv) { struct btrfs_ioctl_vol_args_v2 vol_args = { .flags = UINT64_MAX, }; int ret; int fd; if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } fd = open(argv[1], O_WRONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args); if (ret == -1) perror("ioctl"); close(fd); return EXIT_SUCCESS; } When unmounting the filesystem, we'll hit the WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the filesystem to be remounted read-only as the writer count will stay lifted. Fixes: 6b526ed70cf1 ("btrfs: introduce device delete by devid") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26Btrfs: fix clone vs chattr NODATASUM raceOmar Sandoval
commit b5c40d598f5408bd0ca22dfffa82f03cd9433f23 upstream. In btrfs_clone_files(), we must check the NODATASUM flag while the inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags() will change the flags after we check and we can end up with a party checksummed file. The race window is only a few instructions in size, between the if and the locks which is: 3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode)) 3835 return -EISDIR; where the setflags must be run and toggle the NODATASUM flag (provided the file size is 0). The clone will block on the inode lock, segflags takes the inode lock, changes flags, releases log and clone continues. Not impossible but still needs a lot of bad luck to hit unintentionally. Fixes: 0e7b824c4ef9 ("Btrfs: don't make a file partly checksummed through file clone") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: fix fencepost error in check for inode count overflow during resizeJan Kara
commit 4f2f76f751433908364ccff82f437a57d0e6e9b7 upstream. ext4_resize_fs() has an off-by-one bug when checking whether growing of a filesystem will not overflow inode count. As a result it allows a filesystem with 8192 inodes per group to grow to 64TB which overflows inode count to 0 and makes filesystem unusable. Fix it. Cc: stable@vger.kernel.org Fixes: 3f8a6411fbada1fa482276591e037f3b1adcf55b Reported-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: correctly handle a zero-length xattr with a non-zero e_value_offsTheodore Ts'o
commit 8a2b307c21d4b290e3cbe33f768f194286d07c23 upstream. Ext4 will always create ext4 extended attributes which do not have a value (where e_value_size is zero) with e_value_offs set to zero. In most places e_value_offs will not be used in a substantive way if e_value_size is zero. There was one exception to this, which is in ext4_xattr_set_entry(), where if there is a maliciously crafted file system where there is an extended attribute with e_value_offs is non-zero and e_value_size is 0, the attempt to remove this xattr will result in a negative value getting passed to memmove, leading to the following sadness: [ 41.225365] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) [ 44.538641] BUG: unable to handle kernel paging request at ffff9ec9a3000000 [ 44.538733] IP: __memmove+0x81/0x1a0 [ 44.538755] PGD 1249bd067 P4D 1249bd067 PUD 1249c1067 PMD 80000001230000e1 [ 44.538793] Oops: 0003 [#1] SMP PTI [ 44.539074] CPU: 0 PID: 1470 Comm: poc Not tainted 4.16.0-rc1+ #1 ... [ 44.539475] Call Trace: [ 44.539832] ext4_xattr_set_entry+0x9e7/0xf80 ... [ 44.539972] ext4_xattr_block_set+0x212/0xea0 ... [ 44.540041] ext4_xattr_set_handle+0x514/0x610 [ 44.540065] ext4_xattr_set+0x7f/0x120 [ 44.540090] __vfs_removexattr+0x4d/0x60 [ 44.540112] vfs_removexattr+0x75/0xe0 [ 44.540132] removexattr+0x4d/0x80 ... [ 44.540279] path_removexattr+0x91/0xb0 [ 44.540300] SyS_removexattr+0xf/0x20 [ 44.540322] do_syscall_64+0x71/0x120 [ 44.540344] entry_SYSCALL_64_after_hwframe+0x21/0x86 https://bugzilla.kernel.org/show_bug.cgi?id=199347 This addresses CVE-2018-10840. Reported-by: "Xu, Wen" <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Fixes: dec214d00e0d7 ("ext4: xattr inode deduplication") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget()Theodore Ts'o
commit eb9b5f01c33adebc31cbc236c02695f605b0e417 upstream. If ext4_find_inline_data_nolock() returns an error it needs to get reflected up to ext4_iget(). In order to fix this, ext4_iget_extra_inode() needs to return an error (and not return void). This is related to "ext4: do not allow external inodes for inline data" (which fixes CVE-2018-11412) in that in the errors=continue case, it would be useful to for userspace to receive an error indicating that file system is corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: do not allow external inodes for inline dataTheodore Ts'o
commit 117166efb1ee8f13c38f9e96b258f16d4923f888 upstream. The inline data feature was implemented before we added support for external inodes for xattrs. It makes no sense to support that combination, but the problem is that there are a number of extended attribute checks that are skipped if e_value_inum is non-zero. Unfortunately, the inline data code is completely e_value_inum unaware, and attempts to interpret the xattr fields as if it were an inline xattr --- at which point, Hilarty Ensues. This addresses CVE-2018-11412. https://bugzilla.kernel.org/show_bug.cgi?id=199803 Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: update mtime in ext4_punch_hole even if no blocks are releasedLukas Czerner
commit eee597ac931305eff3d3fd1d61d6aae553bc0984 upstream. Currently in ext4_punch_hole we're going to skip the mtime update if there are no actual blocks to release. However we've actually modified the file by zeroing the partial block so the mtime should be updated. Moreover the sync and datasync handling is skipped as well, which is also wrong. Fix it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Joe Habermann <joe.habermann@quantum.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-26ext4: fix hole length detection in ext4_ind_map_blocks()Jan Kara
commit 2ee3ee06a8fd792765fa3267ddf928997797eec5 upstream. When ext4_ind_map_blocks() computes a length of a hole, it doesn't count with the fact that mapped offset may be somewhere in the middle of the completely empty subtree. In such case it will return too large length of the hole which then results in lseek(SEEK_DATA) to end up returning an incorrect offset beyond the end of the hole. Fix the problem by correctly taking offset within a subtree into account when computing a length of a hole. Fixes: facab4d9711e7aa3532cb82643803e8f1b9518e8 CC: stable@vger.kernel.org Reported-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>