summaryrefslogtreecommitdiffstats
path: root/fs/ext4
AgeCommit message (Collapse)Author
2020-02-28ext4: fix a data race in EXT4_I(inode)->i_disksizeQian Cai
commit 35df4299a6487f323b0aca120ea3f485dfee2ae3 upstream. EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by KCSAN, BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4] write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127: ext4_write_end+0x4e3/0x750 [ext4] ext4_update_i_disksize at fs/ext4/ext4.h:3032 (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046 (inlined by) ext4_write_end at fs/ext4/inode.c:1287 generic_perform_write+0x208/0x2a0 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37: ext4_writepages+0x10ac/0x1d00 [ext4] mpage_map_and_submit_extent at fs/ext4/inode.c:2468 (inlined by) ext4_writepages at fs/ext4/inode.c:2772 do_writepages+0x5e/0x130 __writeback_single_inode+0xeb/0xb20 writeback_sb_inodes+0x429/0x900 __writeback_inodes_wb+0xc4/0x150 wb_writeback+0x4bd/0x870 wb_workfn+0x6b4/0x960 process_one_work+0x54c/0xbe0 worker_thread+0x80/0x650 kthread+0x1e0/0x200 ret_from_fork+0x27/0x50 Reported by Kernel Concurrency Sanitizer on: CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G W O L 5.5.0-next-20200204+ #5 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: writeback wb_workfn (flush-7:0) Since only the read is operating as lockless (outside of the "i_data_sem"), load tearing could introduce a logic bug. Fix it by adding READ_ONCE() for the read and WRITE_ONCE() for the write. Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAITRitesh Harjani
[ Upstream commit f629afe3369e9885fd6e9cc7a4f514b6a65cf9e9 ] Apparently our current rwsem code doesn't like doing the trylock, then lock for real scheme. So change our dax read/write methods to just do the trylock for the RWF_NOWAIT case. This seems to fix AIM7 regression in some scalable filesystems upto ~25% in some cases. Claimed in commit 942491c9e6d6 ("xfs: fix AIM7 regression") Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191212055557.11151-2-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-19ext4: improve explanation of a mount failure caused by a misconfigured kernelTheodore Ts'o
commit d65d87a07476aa17df2dcb3ad18c22c154315bec upstream. If CONFIG_QFMT_V2 is not enabled, but CONFIG_QUOTA is enabled, when a user tries to mount a file system with the quota or project quota enabled, the kernel will emit a very confusing messsage: EXT4-fs warning (device vdc): ext4_enable_quotas:5914: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix. EXT4-fs (vdc): mount failed We will now report an explanatory message indicating which kernel configuration options have to be enabled, to avoid customer/sysadmin confusion. Link: https://lore.kernel.org/r/20200215012738.565735-1-tytso@mit.edu Google-Bug-Id: 149093531 Fixes: 7c319d328505b778 ("ext4: make quota as first class supported feature") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-19ext4: add cond_resched() to ext4_protect_reserved_inodeShijie Luo
commit af133ade9a40794a37104ecbcc2827c0ea373a3c upstream. When journal size is set too big by "mkfs.ext4 -J size=", or when we mount a crafted image to make journal inode->i_size too big, the loop, "while (i < num)", holds cpu too long. This could cause soft lockup. [ 529.357541] Call trace: [ 529.357551] dump_backtrace+0x0/0x198 [ 529.357555] show_stack+0x24/0x30 [ 529.357562] dump_stack+0xa4/0xcc [ 529.357568] watchdog_timer_fn+0x300/0x3e8 [ 529.357574] __hrtimer_run_queues+0x114/0x358 [ 529.357576] hrtimer_interrupt+0x104/0x2d8 [ 529.357580] arch_timer_handler_virt+0x38/0x58 [ 529.357584] handle_percpu_devid_irq+0x90/0x248 [ 529.357588] generic_handle_irq+0x34/0x50 [ 529.357590] __handle_domain_irq+0x68/0xc0 [ 529.357593] gic_handle_irq+0x6c/0x150 [ 529.357595] el1_irq+0xb8/0x140 [ 529.357599] __ll_sc_atomic_add_return_acquire+0x14/0x20 [ 529.357668] ext4_map_blocks+0x64/0x5c0 [ext4] [ 529.357693] ext4_setup_system_zone+0x330/0x458 [ext4] [ 529.357717] ext4_fill_super+0x2170/0x2ba8 [ext4] [ 529.357722] mount_bdev+0x1a8/0x1e8 [ 529.357746] ext4_mount+0x44/0x58 [ext4] [ 529.357748] mount_fs+0x50/0x170 [ 529.357752] vfs_kern_mount.part.9+0x54/0x188 [ 529.357755] do_mount+0x5ac/0xd78 [ 529.357758] ksys_mount+0x9c/0x118 [ 529.357760] __arm64_sys_mount+0x28/0x38 [ 529.357764] el0_svc_common+0x78/0x130 [ 529.357766] el0_svc_handler+0x38/0x78 [ 529.357769] el0_svc+0x8/0xc [ 541.356516] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mount:18674] Link: https://lore.kernel.org/r/20200211011752.29242-1-luoshijie1@huawei.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-19ext4: fix checksum errors with indexed dirsJan Kara
commit 48a34311953d921235f4d7bbd2111690d2e469cf upstream. DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled. 1) We just disallow loading any directory inodes with EXT4_INDEX_FL when DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?" 2) When we find out htree structure is corrupted (but the filesystem and the directory should in support htrees), we continue just ignoring htree information for reading but we refuse to add new entries to the directory to avoid corrupting it more. Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-19ext4: fix support for inode sizes > 1024 bytesTheodore Ts'o
commit 4f97a68192bd33b9963b400759cef0ca5963af00 upstream. A recent commit, 9803387c55f7 ("ext4: validate the debug_want_extra_isize mount option at parse time"), moved mount-time checks around. One of those changes moved the inode size check before the blocksize variable was set to the blocksize of the file system. After 9803387c55f7 was set to the minimum allowable blocksize, which in practice on most systems would be 1024 bytes. This cuased file systems with inode sizes larger than 1024 bytes to be rejected with a message: EXT4-fs (sdXX): unsupported inode size: 4096 Fixes: 9803387c55f7 ("ext4: validate the debug_want_extra_isize mount option at parse time") Link: https://lore.kernel.org/r/20200206225252.GA3673@mit.edu Reported-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-19ext4: don't assume that mmp_nodename/bdevname have NULAndreas Dilger
commit 14c9ca0583eee8df285d68a0e6ec71053efd2228 upstream. Don't assume that the mmp_nodename and mmp_bdevname strings are NUL terminated, since they are filled in by snprintf(), which is not guaranteed to do so. Link: https://lore.kernel.org/r/1580076215-1048-1-git-send-email-adilger@dilger.ca Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11ext4: fix deadlock allocating crypto bounce page from mempoolEric Biggers
[ Upstream commit 547c556f4db7c09447ecf5f833ab6aaae0c5ab58 ] ext4_writepages() on an encrypted file has to encrypt the data, but it can't modify the pagecache pages in-place, so it encrypts the data into bounce pages and writes those instead. All bounce pages are allocated from a mempool using GFP_NOFS. This is not correct use of a mempool, and it can deadlock. This is because GFP_NOFS includes __GFP_DIRECT_RECLAIM, which enables the "never fail" mode for mempool_alloc() where a failed allocation will fall back to waiting for one of the preallocated elements in the pool. But since this mode is used for all a bio's pages and not just the first, it can deadlock waiting for pages already in the bio to be freed. This deadlock can be reproduced by patching mempool_alloc() to pretend that pool->alloc() always fails (so that it always falls back to the preallocations), and then creating an encrypted file of size > 128 KiB. Fix it by only using GFP_NOFS for the first page in the bio. For subsequent pages just use GFP_NOWAIT, and if any of those fail, just submit the bio and start a new one. This will need to be fixed in f2fs too, but that's less straightforward. Fixes: c9af28fdd449 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191231181149.47619-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-05ext4: validate the debug_want_extra_isize mount option at parse timeTheodore Ts'o
commit 9803387c55f7d2ce69aa64340c5fdc6b3027dbc8 upstream. Instead of setting s_want_extra_size and then making sure that it is a valid value afterwards, validate the field before we set it. This avoids races and other problems when remounting the file system. Link: https://lore.kernel.org/r/20191215063020.GA11512@mit.edu Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-and-tested-by: syzbot+4a39a025912b265cacef@syzkaller.appspotmail.com Signed-off-by: Zubin Mithra <zsm@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-27ext4: set error return correctly when ext4_htree_store_dirent failsColin Ian King
[ Upstream commit 7a14826ede1d714f0bb56de8167c0e519041eeda ] Currently when the call to ext4_htree_store_dirent fails the error return variable 'ret' is is not being set to the error code and variable count is instead, hence the error code is not being returned. Fix this by assigning ret to the error return code. Addresses-Coverity: ("Unused value") Fixes: 8af0f0822797 ("ext4: fix readdir error in the case of inline_data+dir_index") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04ext4: iomap that extends beyond EOF should be marked dirtyMatthew Bobrowski
[ Upstream commit 2e9b51d78229d5145725a481bb5464ebc0a3f9b2 ] This patch addresses what Dave Chinner had discovered and fixed within commit: 7684e2c4384d. This changes does not have any user visible impact for ext4 as none of the current users of ext4_iomap_begin() that extend files depend on IOMAP_F_DIRTY. When doing a direct IO that spans the current EOF, and there are written blocks beyond EOF that extend beyond the current write, the only metadata update that needs to be done is a file size extension. However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that there is IO completion metadata updates required, and hence we may fail to correctly sync file size extensions made in IO completion when O_DSYNC writes are being used and the hardware supports FUA. Hence when setting IOMAP_F_DIRTY, we need to also take into account whether the iomap spans the current EOF. If it does, then we need to mark it dirty so that IO completion will call generic_write_sync() to flush the inode size update to stable storage correctly. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/8b43ee9ee94bee5328da56ba0909b7d2229ef150.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04ext4: update direct I/O read lock pattern for IOCB_NOWAITMatthew Bobrowski
[ Upstream commit 548feebec7e93e58b647dba70b3303dcb569c914 ] This patch updates the lock pattern in ext4_direct_IO_read() to not block on inode lock in cases of IOCB_NOWAIT direct I/O reads. The locking condition implemented here is similar to that of 942491c9e6d6 ("xfs: fix AIM7 regression"). Fixes: 16c54688592c ("ext4: Allow parallel DIO reads") Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c5d5e759f91747359fbd2c6f9a36240cf75ad79f.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31ext4: unlock on error in ext4_expand_extra_isize()Dan Carpenter
commit 7f420d64a08c1dcd65b27be82a27cf2bdb2e7847 upstream. We need to unlock the xattr before returning on this error path. Cc: stable@kernel.org # 4.13 Fixes: c03b45b853f5 ("ext4, project: expand inode extra size if possible") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20191213185010.6k7yl2tck3wlsdkt@kili.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31ext4: check for directory entries too close to block endJan Kara
commit 109ba779d6cca2d519c5dd624a3276d03e21948e upstream. ext4_check_dir_entry() currently does not catch a case when a directory entry ends so close to the block end that the header of the next directory entry would not fit in the remaining space. This can lead to directory iteration code trying to access address beyond end of current buffer head leading to oops. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191202170213.4761-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31ext4: fix ext4_empty_dir() for directories with holesJan Kara
commit 64d4ce892383b2ad6d782e080d25502f91bf2a38 upstream. Function ext4_empty_dir() doesn't correctly handle directories with holes and crashes on bh->b_data dereference when bh is NULL. Reorganize the loop to use 'offset' variable all the times instead of comparing pointers to current direntry with bh->b_data pointer. Also add more strict checking of '.' and '..' directory entries to avoid entering loop in possibly invalid state on corrupted filesystems. References: CVE-2019-19037 CC: stable@vger.kernel.org Fixes: 4e19d6b65fb4 ("ext4: allow directory holes") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191202170213.4761-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17ext4: fix a bug in ext4_wait_for_tail_page_commityangerkun
commit 565333a1554d704789e74205989305c811fd9c7a upstream. No need to wait for any commit once the page is fully truncated. Besides, it may confuse e.g. concurrent ext4_writepage() with the page still be dirty (will be cleared by truncate_pagecache() in ext4_setattr()) but buffers has been freed; and then trigger a bug show as below: [ 26.057508] ------------[ cut here ]------------ [ 26.058531] kernel BUG at fs/ext4/inode.c:2134! ... [ 26.088130] Call trace: [ 26.088695] ext4_writepage+0x914/0xb28 [ 26.089541] writeout.isra.4+0x1b4/0x2b8 [ 26.090409] move_to_new_page+0x3b0/0x568 [ 26.091338] __unmap_and_move+0x648/0x988 [ 26.092241] unmap_and_move+0x48c/0xbb8 [ 26.093096] migrate_pages+0x220/0xb28 [ 26.093945] kernel_mbind+0x828/0xa18 [ 26.094791] __arm64_sys_mbind+0xc8/0x138 [ 26.095716] el0_svc_common+0x190/0x490 [ 26.096571] el0_svc_handler+0x60/0xd0 [ 26.097423] el0_svc+0x8/0xc Run the procedure (generate by syzkaller) parallel with ext3. void main() { int fd, fd1, ret; void *addr; size_t length = 4096; int flags; off_t offset = 0; char *str = "12345"; fd = open("a", O_RDWR | O_CREAT); assert(fd >= 0); /* Truncate to 4k */ ret = ftruncate(fd, length); assert(ret == 0); /* Journal data mode */ flags = 0xc00f; ret = ioctl(fd, _IOW('f', 2, long), &flags); assert(ret == 0); /* Truncate to 0 */ fd1 = open("a", O_TRUNC | O_NOATIME); assert(fd1 >= 0); addr = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_SHARED, fd, offset); assert(addr != (void *)-1); memcpy(addr, str, 5); mbind(addr, length, 0, 0, 0, MPOL_MF_MOVE); } And the bug will be triggered once we seen the below order. reproduce1 reproduce2 ... | ... truncate to 4k | change to journal data mode | | memcpy(set page dirty) truncate to 0: | ext4_setattr: | ... | ext4_wait_for_tail_page_commit | | mbind(trigger bug) truncate_pagecache(clean dirty)| ... ... | mbind will call ext4_writepage() since the page still be dirty, and then report the bug since the buffers has been free. Fix it by return directly once offset equals to 0 which means the page has been fully truncated. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20190919063508.1045-1-yangerkun@huawei.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17ext4: work around deleting a file with i_nlink == 0 safelyTheodore Ts'o
commit c7df4a1ecb8579838ec8c56b2bb6a6716e974f37 upstream. If the file system is corrupted such that a file's i_links_count is too small, then it's possible that when unlinking that file, i_nlink will already be zero. Previously we were working around this kind of corruption by forcing i_nlink to one; but we were doing this before trying to delete the directory entry --- and if the file system is corrupted enough that ext4_delete_entry() fails, then we exit with i_nlink elevated, and this causes the orphan inode list handling to be FUBAR'ed, such that when we unmount the file system, the orphan inode list can get corrupted. A better way to fix this is to simply skip trying to call drop_nlink() if i_nlink is already zero, thus moving the check to the place where it makes the most sense. https://bugzilla.kernel.org/show_bug.cgi?id=205433 Link: https://lore.kernel.org/r/20191112032903.8828-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17ext4: Fix credit estimate for final inode freeingJan Kara
commit 65db869c754e7c271691dd5feabf884347e694f5 upstream. Estimate for the number of credits needed for final freeing of inode in ext4_evict_inode() was to small. We may modify 4 blocks (inode & sb for orphan deletion, bitmap & group descriptor for inode freeing) and not just 3. [ Fixed minor whitespace nit. -- TYT ] Fixes: e50e5129f384 ("ext4: xattr-in-inode support") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05ext4: add more paranoia checking in ext4_expand_extra_isize handlingTheodore Ts'o
commit 4ea99936a1630f51fc3a2d61a58ec4a1c4b7d55a upstream. It's possible to specify a non-zero s_want_extra_isize via debugging option, and this can cause bad things(tm) to happen when using a file system with an inode size of 128 bytes. Add better checking when the file system is mounted, as well as when we are actually doing the trying to do the inode expansion. Link: https://lore.kernel.org/r/20191110121510.GH23325@mit.edu Reported-by: syzbot+f8d6f8386ceacdbfff57@syzkaller.appspotmail.com Reported-by: syzbot+33d7ea72e47de3bdf4e1@syzkaller.appspotmail.com Reported-by: syzbot+44b6763edfc17144296f@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-24ext4: fix build error when DX_DEBUG is definedGabriel Krisman Bertazi
[ Upstream commit 799578ab16e86b074c184ec5abbda0bc698c7b0b ] Enabling DX_DEBUG triggers the build error below. info is an attribute of the dxroot structure. linux/fs/ext4/namei.c:2264:12: error: ‘info’ undeclared (first use in this function); did you mean ‘insl’? info->indirect_levels)); Fixes: e08ac99fa2a2 ("ext4: add largedir feature") Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06ext4: disallow files with EXT4_JOURNAL_DATA_FL from EXT4_IOC_SWAP_BOOTTheodore Ts'o
[ Upstream commit 6e589291f4b1b700ca12baec5930592a0d51e63c ] A malicious/clueless root user can use EXT4_IOC_SWAP_BOOT to force a corner casew which can lead to the file system getting corrupted. There's no usefulness to allowing this, so just prohibit this case. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07ext4: fix potential use after free after remounting with noblock_validityzhangyi (F)
[ Upstream commit 7727ae52975d4f4ef7ff69ed8e6e25f6a4168158 ] Remount process will release system zone which was allocated before if "noblock_validity" is specified. If we mount an ext4 file system to two mountpoints with default mount options, and then remount one of them with "noblock_validity", it may trigger a use after free problem when someone accessing the other one. # mount /dev/sda foo # mount /dev/sda bar User access mountpoint "foo" | Remount mountpoint "bar" | ext4_map_blocks() | ext4_remount() check_block_validity() | ext4_setup_system_zone() ext4_data_block_valid() | ext4_release_system_zone() | free system_blks rb nodes access system_blks rb nodes | trigger use after free | This problem can also be reproduced by one mountpint, At the same time, add_system_zone() can get called during remount as well so there can be racing ext4_data_block_valid() reading the rbtree at the same time. This patch add RCU to protect system zone from releasing or building when doing a remount which inverse current "noblock_validity" mount option. It assign the rbtree after the whole tree was complete and do actual freeing after rcu grace period, avoid any intermediate state. Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05ext4: fix punch hole for inline_data file systemsTheodore Ts'o
commit c1e8220bd316d8ae8e524df39534b8a412a45d5e upstream. If a program attempts to punch a hole on an inline data file, we need to convert it to a normal file first. This was detected using ext4/032 using the adv configuration. Simple reproducer: mke2fs -Fq -t ext4 -O inline_data /dev/vdc mount /vdc echo "" > /vdc/testfile xfs_io -c 'truncate 33554432' /vdc/testfile xfs_io -c 'fpunch 0 1048576' /vdc/testfile umount /vdc e2fsck -fy /dev/vdc Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05ext4: fix warning inside ext4_convert_unwritten_extents_endioRakesh Pandit
commit e3d550c2c4f2f3dba469bc3c4b83d9332b4e99e1 upstream. Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing first argument. This was introduced in commit ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio") and splitting extents inside endio would trigger it. Fixes: ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-16ext4: unsigned int compared against zeroColin Ian King
[ Upstream commit fbbbbd2f28aec991f3fbc248df211550fbdfd58c ] There are two cases where u32 variables n and err are being checked for less than zero error values, the checks is always false because the variables are not signed. Fix this by making the variables ints. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16ext4: fix block validity checks for journal inodes using indirect blocksTheodore Ts'o
[ Upstream commit 170417c8c7bb2cbbdd949bf5c443c0c8f24a203b ] Commit 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") failed to add an exception for the journal inode in ext4_check_blockref(), which is the function used by ext4_get_branch() for indirect blocks. This caused attempts to read from the ext3-style journals to fail with: [ 848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block Fix this by adding the missing exception check. Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16ext4: don't perform block validity checks on the journal inodeTheodore Ts'o
[ Upstream commit 0a944e8a6c66ca04c7afbaa17e22bf208a8b37f0 ] Since the journal inode is already checked when we added it to the block validity's system zone, if we check it again, we'll just trigger a failure. This was causing failures like this: [ 53.897001] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0) [ 53.931430] jbd2_journal_bmap: journal block not found at offset 49 on sda-8 [ 53.938480] Aborting journal on device sda-8. ... but only if the system was under enough memory pressure that logical->physical mapping for the journal inode gets pushed out of the extent cache. (This is why it wasn't noticed earlier.) Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity") Reported-by: Dan Rue <dan.rue@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16ext4: protect journal inode's blocks using block_validityTheodore Ts'o
[ Upstream commit 345c0dbf3a30872d9b204db96b5857cd00808cae ] Add the blocks which belong to the journal inode to block_validity's system zone so attempts to deallocate or overwrite the journal due a corrupted file system where the journal blocks are also claimed by another inode. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-28ext4: allow directory holesTheodore Ts'o
commit 4e19d6b65fb4fc42e352ce9883649e049da14743 upstream. The largedir feature was intended to allow ext4 directories to have unmapped directory blocks (e.g., directory holes). And so the released e2fsprogs no longer enforces this for largedir file systems; however, the corresponding change to the kernel-side code was not made. This commit fixes this oversight. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28ext4: use jbd2_inode dirty range scopingRoss Zwisler
commit 73131fbb003b3691cfcf9656f234b00da497fcd6 upstream. Use the newly introduced jbd2_inode dirty range scoping to prevent us from waiting forever when trying to complete a journal transaction. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28ext4: enforce the immutable flag on open filesTheodore Ts'o
commit 02b016ca7f99229ae6227e7b2fc950c4e140d74a upstream. According to the chattr man page, "a file with the 'i' attribute cannot be modified..." Historically, this was only enforced when the file was opened, per the rest of the description, "... and the file can not be opened in write mode". There is general agreement that we should standardize all file systems to prevent modifications even for files that were opened at the time the immutable flag is set. Eventually, a change to enforce this at the VFS layer should be landing in mainline. Until then, enforce this at the ext4 level to prevent xfstests generic/553 from failing. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28ext4: don't allow any modifications to an immutable fileDarrick J. Wong
commit 2e53840362771c73eb0a5ff71611507e64e8eecd upstream. Don't allow any modifications to a file that's marked immutable, which means that we have to flush all the writable pages to make the readonly and we have to check the setattr/setflags parameters more closely. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31ext4: wait for outstanding dio during truncate in nojournal modeJan Kara
commit 82a25b027ca48d7ef197295846b352345853dfa8 upstream. We didn't wait for outstanding direct IO during truncate in nojournal mode (as we skip orphan handling in that case). This can lead to fs corruption or stale data exposure if truncate ends up freeing blocks and these get reallocated before direct IO finishes. Fix the condition determining whether the wait is necessary. CC: stable@vger.kernel.org Fixes: 1c9114f9c0f1 ("ext4: serialize unlocked dio reads with truncate") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31ext4: do not delete unlinked inode from orphan list on failed truncateJan Kara
commit ee0ed02ca93ef1ecf8963ad96638795d55af2c14 upstream. It is possible that unlinked inode enters ext4_setattr() (e.g. if somebody calls ftruncate(2) on unlinked but still open file). In such case we should not delete the inode from the orphan list if truncate fails. Note that this is mostly a theoretical concern as filesystem is corrupted if we reach this path anyway but let's be consistent in our orphan handling. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: don't update s_rev_level if not requiredAndreas Dilger
commit c9e716eb9b3455a83ed7c5f5a81256a3da779a95 upstream. Don't update the superblock s_rev_level during mount if it isn't actually necessary, only if superblock features are being set by the kernel. This was originally added for ext3 since it always set the INCOMPAT_RECOVER and HAS_JOURNAL features during mount, but this is not needed since no journal mode was added to ext4. That will allow Geert to mount his 20-year-old ext2 rev 0.0 m68k filesystem, as a testament of the backward compatibility of ext4. Fixes: 0390131ba84f ("ext4: Allow ext4 to run without a journal") Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: fix compile error when using BUFFER_TRACEzhangyi (F)
commit ddccb6dbe780d68133191477571cb7c69e17bb8c upstream. Fix compile error below when using BUFFER_TRACE. fs/ext4/inode.c: In function ‘ext4_expand_extra_isize’: fs/ext4/inode.c:5979:19: error: request for member ‘bh’ in something not a structure or union BUFFER_TRACE(iloc.bh, "get_write_access"); Fixes: c03b45b853f58 ("ext4, project: expand inode extra size if possible") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: avoid panic during forced reboot due to aborted journalJan Kara
commit 2c1d0e3631e5732dba98ef49ac0bec1388776793 upstream. Handling of aborted journal is a special code path different from standard ext4_error() one and it can call panic() as well. Commit 1dc1097ff60e ("ext4: avoid panic during forced reboot") forgot to update this path so fix that omission. Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 5.1 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: fix use-after-free in dx_release()Sahitya Tummala
commit 08fc98a4d6424af66eb3ac4e2cedd2fc927ed436 upstream. The buffer_head (frames[0].bh) and it's corresping page can be potentially free'd once brelse() is done inside the for loop but before the for loop exits in dx_release(). It can be free'd in another context, when the page cache is flushed via drop_caches_sysctl_handler(). This results into below data abort when accessing info->indirect_levels in dx_release(). Unable to handle kernel paging request at virtual address ffffffc17ac3e01e Call trace: dx_release+0x70/0x90 ext4_htree_fill_tree+0x2d4/0x300 ext4_readdir+0x244/0x6f8 iterate_dir+0xbc/0x160 SyS_getdents64+0x94/0x174 Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: fix data corruption caused by overlapping unaligned and aligned IOLukas Czerner
commit 57a0da28ced8707cb9f79f071a016b9d005caf5a upstream. Unaligned AIO must be serialized because the zeroing of partial blocks of unaligned AIO can result in data corruption in case it's overlapping another in flight IO. Currently we wait for all unwritten extents before we submit unaligned AIO which protects data in case of unaligned AIO is following overlapping IO. However if a unaligned AIO is followed by overlapping aligned AIO we can still end up corrupting data. To fix this, we must make sure that the unaligned AIO is the only IO in flight by waiting for unwritten extents conversion not just before the IO submission, but right after it as well. This problem can be reproduced by xfstest generic/538 Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: zero out the unused memory region in the extent tree blockSriram Rajagopalan
commit 592acbf16821288ecdc4192c47e3774a4c48bb64 upstream. This commit zeroes out the unused memory region in the buffer_head corresponding to the extent metablock after writing the extent header and the corresponding extent node entries. This is done to prevent random uninitialized data from getting into the filesystem when the extent block is synced. This fixes CVE-2019-11833. Signed-off-by: Sriram Rajagopalan <sriramr@arista.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: fix ext4_show_options for file systems w/o journalDebabrata Banerjee
commit 50b29d8f033a7c88c5bc011abc2068b1691ab755 upstream. Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as I assume was intended, all other options were blown away leading to _ext4_show_options() output being incorrect. Fixes: 1e381f60dad9 ("ext4: do not allow journal_opts for fs w/o journal") Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: actually request zeroing of inode table after growKirill Tkhai
commit 310a997fd74de778b9a4848a64be9cda9f18764a upstream. It is never possible, that number of block groups decreases, since only online grow is supported. But after a growing occured, we have to zero inode tables for just created new block groups. Fixes: 19c5246d2516 ("ext4: add new online resize interface") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: fix use-after-free race with debug_want_extra_isizeBarret Rhoden
commit 7bc04c5c2cc467c5b40f2b03ba08da174a0d5fa7 upstream. When remounting with debug_want_extra_isize, we were not performing the same checks that we do during a normal mount. That allowed us to set a value for s_want_extra_isize that reached outside the s_inode_size. Fixes: e2b911c53584 ("ext4: clean up feature test macros with predicate functions") Reported-by: syzbot+f584efa0ac7213c226b7@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: avoid drop reference to iloc.bh twicePan Bian
commit 8c380ab4b7b59c0c602743810be1b712514eaebc upstream. The reference to iloc.bh has been dropped in ext4_mark_iloc_dirty. However, the reference is dropped again if error occurs during ext4_handle_dirty_metadata, which may result in use-after-free bugs. Fixes: fb265c9cb49e("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: ignore e_value_offs for xattrs with value-in-ea-inodeTheodore Ts'o
commit e5d01196c0428a206f307e9ee5f6842964098ff0 upstream. In other places in fs/ext4/xattr.c, if e_value_inum is non-zero, the code ignores the value in e_value_offs. The e_value_offs *should* be zero, but we shouldn't depend upon it, since it might not be true in a corrupted/fuzzed file system. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202897 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202877 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22ext4: make sanity check in mballoc more strictJan Kara
commit 31562b954b60f02acb91b7349dc6432d3f8c3c5f upstream. The sanity check in mb_find_extent() only checked that returned extent does not extend past blocksize * 8, however it should not extend past EXT4_CLUSTERS_PER_GROUP(sb). This can happen when clusters_per_group < blocksize * 8 and the tail of the bitmap is not properly filled by 1s which happened e.g. when ancient kernels have grown the filesystem. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ext4: fix some error pointer dereferencesDan Carpenter
[ Upstream commit 7159a986b4202343f6cca3bb8079ecace5816fd6 ] We can't pass error pointers to brelse(). Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20ext4: prohibit fstrim in norecovery modeDarrick J. Wong
[ Upstream commit 18915b5873f07e5030e6fb108a050fa7c71c59fb ] The ext4 fstrim implementation uses the block bitmaps to find free space that can be discarded. If we haven't replayed the journal, the bitmaps will be stale and we absolutely *cannot* use stale metadata to zap the underlying storage. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20ext4: report real fs size after failed resizeLukas Czerner
[ Upstream commit 6c7328400e0488f7d49e19e02290ba343b6811b2 ] Currently when the file system resize using ext4_resize_fs() fails it will report into log that "resized filesystem to <requested block count>". However this may not be true in the case of failure. Use the current block count as returned by ext4_blocks_count() to report the block count. Additionally, report a warning that "error occurred during file system resize" Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20ext4: add missing brelse() in add_new_gdb_meta_bg()Lukas Czerner
[ Upstream commit d64264d6218e6892edd832dc3a5a5857c2856c53 ] Currently in add_new_gdb_meta_bg() there is a missing brelse of gdb_bh in case ext4_journal_get_write_access() fails. Additionally kvfree() is missing in the same error path. Fix it by moving the ext4_journal_get_write_access() before the ext4 sb update as Ted suggested and release n_group_desc and gdb_bh in case it fails. Fixes: 61a9c11e5e7a ("ext4: add missing brelse() add_new_gdb_meta_bg()'s error path") Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>