aboutsummaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2021-05-13fs: dcache: avoid livelocks on d_alloc_parallelv5.2/standard/preempt-rt/baseWenlin Kang
With RT, if a high priority task that runs d_alloc_parallel() preempts a low priority task that runs __d_add(), it can make d_alloc_parallel() go into an infinite retry. low priority task: high priority task: __lookup_slow() d_add() __d_add start_dir_add(dir) __lookup_slow() d_alloc_parallel() When the low priority task finished start_dir_add(), i_dir_seq has been incremented to an odd value. Since __d_add() uses spin_lock(), migration is disabled, so if the low priority task is preempted during start_dir_add() /end_dir_add(), then high priority task can go into an infinite waiting for i_dir_seq. This patch avoid the issue by enable preempt_disable() during start_dir_add() /end_dir_add(). Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
2020-11-19Merge branch 'v5.2/standard/base' into v5.2/standard/preempt-rt/baseBruce Ashfield
2020-11-19ext4: fix -Wstringop-truncation warningsv5.2/standard/tiny/intel-x86v5.2/standard/tiny/common-pcv5.2/standard/tiny/basev5.2/standard/qemuppcv5.2/standard/qemuarm64v5.2/standard/intel-x86v5.2/standard/edgerouterv5.2/standard/beaglebonev5.2/standard/baseWenlin Kang
The strncpy() function may create a unterminated string, use strscpy_pad() instead. This fixes the following warning: fs/ext4/super.c: In function '__save_error_info': fs/ext4/super.c:349:2: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation] strncpy(es->s_last_error_func, func, sizeof(es->s_last_error_func)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/ext4/super.c:353:3: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation] strncpy(es->s_first_error_func, func, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ sizeof(es->s_first_error_func)); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
2020-09-21Merge branch 'v5.2/standard/base' into v5.2/standard/preempt-rt/baseBruce Ashfield
2020-09-21Merge tag 'v5.2.60' into v5.2/standard/baseBruce Ashfield
This is the 5.2.60 stable release
2020-09-17rxrpc: Make rxrpc_kernel_get_srtt() indicate validityDavid Howells
commit 1d4adfaf65746203861c72d9d78de349eb97d528 upstream. Fix rxrpc_kernel_get_srtt() to indicate the validity of the returned smoothed RTT. If we haven't had any valid samples yet, the SRTT isn't useful. Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17debugfs: Fix module state check conditionVladis Dronov
commit e3b9fc7eec55e6fdc8beeed18f2ed207086341e2 upstream. The '#ifdef MODULE' check in the original commit does not work as intended. The code under the check is not built at all if CONFIG_DEBUG_FS=y. Fix this by using a correct check. Fixes: 275678e7a9be ("debugfs: Check module state before warning in {full/open}_proxy_open()") Signed-off-by: Vladis Dronov <vdronov@redhat.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200811150129.53343-1-vdronov@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17fs/ufs: avoid potential u32 multiplication overflowColin Ian King
commit 88b2e9b06381551b707d980627ad0591191f7a2d upstream. The 64 bit ino is being compared to the product of two u32 values, however, the multiplication is being performed using a 32 bit multiply so there is a potential of an overflow. To be fully safe, cast uspi->s_ncg to a u64 to ensure a 64 bit multiplication occurs to avoid any chance of overflow. Fixes: f3e2a520f5fb ("ufs: NFS support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Alexey Dobriyan <adobriyan@gmail.com> Link: http://lkml.kernel.org/r/20200715170355.1081713-1-colin.king@canonical.com Addresses-Coverity: ("Unintentional integer overflow") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17fs/minix: remove expected error message in block_to_path()Eric Biggers
commit f666f9fb9a36f1c833b9d18923572f0e4d304754 upstream. When truncating a file to a size within the last allowed logical block, block_to_path() is called with the *next* block. This exceeds the limit, causing the "block %ld too big" error message to be printed. This case isn't actually an error; there are just no more blocks past that point. So, remove this error message. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Link: http://lkml.kernel.org/r/20200628060846.682158-7-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17fs/minix: fix block limit check for V1 filesystemsEric Biggers
commit 0a12c4a8069607247cb8edc3b035a664e636fd9a upstream. The minix filesystem reads its maximum file size from its on-disk superblock. This value isn't necessarily a multiple of the block size. When it's not, the V1 block mapping code doesn't allow mapping the last possible block. Commit 6ed6a722f9ab ("minixfs: fix block limit check") fixed this in the V2 mapping code. Fix it in the V1 mapping code too. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Link: http://lkml.kernel.org/r/20200628060846.682158-6-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17fs/minix: set s_maxbytes correctlyEric Biggers
commit 32ac86efff91a3e4ef8c3d1cadd4559e23c8e73a upstream. The minix filesystem leaves super_block::s_maxbytes at MAX_NON_LFS rather than setting it to the actual filesystem-specific limit. This is broken because it means userspace doesn't see the standard behavior like getting EFBIG and SIGXFSZ when exceeding the maximum file size. Fix this by setting s_maxbytes correctly. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Link: http://lkml.kernel.org/r/20200628060846.682158-5-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17nfs: Fix getxattr kernel panic and memory overflowJeffrey Mitchell
commit b4487b93545214a9db8cbf32e86411677b0cca21 upstream. Move the buffer size check to decode_attr_security_label() before memcpy() Only call memcpy() if the buffer is large enough Fixes: aa9c2669626c ("NFS: Client implementation of Labeled-NFS") Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io> [Trond: clean up duplicate test of label->len != 0] Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17nfs: nfs_file_write() should check for writeback errorsScott Mayhew
commit ce368536dd614452407dc31e2449eb84681a06af upstream. The NFS_CONTEXT_ERROR_WRITE flag (as well as the check of said flag) was removed by commit 6fbda89b257f. The absence of an error check allows writes to be continually queued up for a server that may no longer be able to handle them. Fix it by adding an error check using the generic error reporting functions. Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17ubifs: Fix wrong orphan node deletion in ubifs_jnl_update|renameZhihao Cheng
commit 094b6d1295474f338201b846a1f15e72eb0b12cf upstream. There a wrong orphan node deleting in error handling path in ubifs_jnl_update() and ubifs_jnl_rename(), which may cause following error msg: UBIFS error (ubi0:0 pid 1522): ubifs_delete_orphan [ubifs]: missing orphan ino 65 Fix this by checking whether the node has been operated for adding to orphan list before being deleted, Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Fixes: 823838a486888cf484e ("ubifs: Add hashes to the tree node cache") Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17nfs: ensure correct writeback errors are returned on close()Scott Mayhew
commit 67dd23f9e6fbaf163431912ef5599c5e0693476c upstream. nfs_wb_all() calls filemap_write_and_wait(), which uses filemap_check_errors() to determine the error to return. filemap_check_errors() only looks at the mapping->flags and will therefore only return either -ENOSPC or -EIO. To ensure that the correct error is returned on close(), nfs{,4}_file_flush() should call filemap_check_wb_err() which looks at the errseq value in mapping->wb_err without consuming it. Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17orangefs: get rid of knob code...Mike Marshall
commit ec95f1dedc9c64ac5a8b0bdb7c276936c70fdedd upstream. Christoph Hellwig sent in a reversion of "orangefs: remember count when reading." because: ->read_iter calls can race with each other and one or more ->flush calls. Remove the the scheme to store the read count in the file private data as is is completely racy and can cause use after free or double free conditions Christoph's reversion caused Orangefs not to work or to compile. I added a patch that fixed that, but intel's kbuild test robot pointed out that sending Christoph's patch followed by my patch upstream, it would break bisection because of the failure to compile. So I have combined the reversion plus my patch... here's the commit message that was in my patch: Logically, optimal Orangefs "pages" are 4 megabytes. Reading large Orangefs files 4096 bytes at a time is like trying to kick a dead whale down the beach. Before Christoph's "Revert orangefs: remember count when reading." I tried to give users a knob whereby they could, for example, use "count" in read(2) or bs with dd(1) to get whatever they considered an appropriate amount of bytes at a time from Orangefs and fill as many page cache pages as they could at once. Without the racy code that Christoph reverted Orangefs won't even compile, much less work. So this replaces the logic that used the private file data that Christoph reverted with a static number of bytes to read from Orangefs. I ran tests like the following to determine what a reasonable static number of bytes might be: dd if=/pvfsmnt/asdf of=/dev/null count=128 bs=4194304 dd if=/pvfsmnt/asdf of=/dev/null count=256 bs=2097152 dd if=/pvfsmnt/asdf of=/dev/null count=512 bs=1048576 . . . dd if=/pvfsmnt/asdf of=/dev/null count=4194304 bs=128 Reads seem faster using the static number, so my "knob code" wasn't just racy, it wasn't even a good idea... Signed-off-by: Mike Marshall <hubcap@omnibond.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17xfs: fix boundary test in xfs_attr_shortform_verifyEric Sandeen
commit f4020438fab05364018c91f7e02ebdd192085933 upstream. The boundary test for the fixed-offset parts of xfs_attr_sf_entry in xfs_attr_shortform_verify is off by one, because the variable array at the end is defined as nameval[1] not nameval[]. Hence we need to subtract 1 from the calculation. This can be shown by: # touch file # setfattr -n root.a file and verifications will fail when it's written to disk. This only matters for a last attribute which has a single-byte name and no value, otherwise the combination of namelen & valuelen will push endp further out and this test won't fail. Fixes: 1e1bbd8e7ee06 ("xfs: create structure verifier function for shortform xattrs") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17jbd2: add the missing unlock_buffer() in the error path of ↵zhangyi (F)
jbd2_write_superblock() commit ef3f5830b859604eda8723c26d90ab23edc027a4 upstream. jbd2_write_superblock() is under the buffer lock of journal superblock before ending that superblock write, so add a missing unlock_buffer() in in the error path before submitting buffer. Fixes: 742b06b5628f ("jbd2: check superblock mapped prior to committing") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20200620061948.2049579-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17ext4: fix potential negative array index in do_split()Eric Sandeen
commit 5872331b3d91820e14716632ebb56b1399b34fe1 upstream. If for any reason a directory passed to do_split() does not have enough active entries to exceed half the size of the block, we can end up iterating over all "count" entries without finding a split point. In this case, count == move, and split will be zero, and we will attempt a negative index into map[]. Guard against this by detecting this case, and falling back to split-to-half-of-count instead; in this case we will still have plenty of space (> half blocksize) in each split block. Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/f53e246b-647c-64bb-16ec-135383c70ad7@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17ext2: fix missing percpu_counter_incMikulas Patocka
commit bc2fbaa4d3808aef82dd1064a8e61c16549fe956 upstream. sbi->s_freeinodes_counter is only decreased by the ext2 code, it is never increased. This patch fixes it. Note that sbi->s_freeinodes_counter is only used in the algorithm that tries to find the group for new allocations, so this bug is not easily visible (the only visibility is that the group finding algorithm selects inoptinal result). Link: https://lore.kernel.org/r/alpine.LRH.2.02.2004201538300.19436@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17cifs: Fix leak when handling lease break for cached root fidPaul Aurich
commit baf57b56d3604880ccb3956ec6c62ea894f5de99 upstream. Handling a lease break for the cached root didn't free the smb2_lease_break_work allocation, resulting in a leak: unreferenced object 0xffff98383a5af480 (size 128): comm "cifsd", pid 684, jiffies 4294936606 (age 534.868s) hex dump (first 32 bytes): c0 ff ff ff 1f 00 00 00 88 f4 5a 3a 38 98 ff ff ..........Z:8... 88 f4 5a 3a 38 98 ff ff 80 88 d6 8a ff ff ff ff ..Z:8........... backtrace: [<0000000068957336>] smb2_is_valid_oplock_break+0x1fa/0x8c0 [<0000000073b70b9e>] cifs_demultiplex_thread+0x73d/0xcc0 [<00000000905fa372>] kthread+0x11c/0x150 [<0000000079378e4e>] ret_from_fork+0x22/0x30 Avoid this leak by only allocating when necessary. Fixes: a93864d93977 ("cifs: add lease tracking to the cached root fid") Signed-off-by: Paul Aurich <paul@darkrain42.org> CC: Stable <stable@vger.kernel.org> # v4.18+ Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: fix memory leaks after failure to lookup checksums during inode loggingFilipe Manana
commit 4f26433e9b3eb7a55ed70d8f882ae9cd48ba448b upstream. While logging an inode, at copy_items(), if we fail to lookup the checksums for an extent we release the destination path, free the ins_data array and then return immediately. However a previous iteration of the for loop may have added checksums to the ordered_sums list, in which case we leak the memory used by them. So fix this by making sure we iterate the ordered_sums list and free all its checksums before returning. Fixes: 3650860b90cc2a ("Btrfs: remove almost all of the BUG()'s from tree-log.c") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: only search for left_info if there is no right_info in ↵Josef Bacik
try_merge_free_space commit bf53d4687b8f3f6b752f091eb85f62369a515dfd upstream. In try_to_merge_free_space we attempt to find entries to the left and right of the entry we are adding to see if they can be merged. We search for an entry past our current info (saved into right_info), and then if right_info exists and it has a rb_prev() we save the rb_prev() into left_info. However there's a slight problem in the case that we have a right_info, but no entry previous to that entry. At that point we will search for an entry just before the info we're attempting to insert. This will simply find right_info again, and assign it to left_info, making them both the same pointer. Now if right_info _can_ be merged with the range we're inserting, we'll add it to the info and free right_info. However further down we'll access left_info, which was right_info, and thus get a use-after-free. Fix this by only searching for the left entry if we don't find a right entry at all. The CVE referenced had a specially crafted file system that could trigger this use-after-free. However with the tree checker improvements we no longer trigger the conditions for the UAF. But the original conditions still apply, hence this fix. Reference: CVE-2019-19448 Fixes: 963030817060 ("Btrfs: use hybrid extents+bitmap rb tree for free space") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: fix messages after changing compression level by remountDavid Sterba
commit 27942c9971cc405c60432eca9395e514a2ae9f5e upstream. Reported by Forza on IRC that remounting with compression options does not reflect the change in level, or at least it does not appear to do so according to the messages: mount -o compress=zstd:1 /dev/sda /mnt mount -o remount,compress=zstd:15 /mnt does not print the change to the level to syslog: [ 41.366060] BTRFS info (device vda): use zstd compression, level 1 [ 41.368254] BTRFS info (device vda): disk space caching is enabled [ 41.390429] BTRFS info (device vda): disk space caching is enabled What really happens is that the message is lost but the level is actualy changed. There's another weird output, if compression is reset to 'no': [ 45.413776] BTRFS info (device vda): use no compression, level 4 To fix that, save the previous compression level and print the message in that case too and use separate message for 'no' compression. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: move the chunk_mutex in btrfs_read_chunk_treeJosef Bacik
commit 01d01caf19ff7c537527d352d169c4368375c0a1 upstream. We are currently getting this lockdep splat in btrfs/161: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ #20 Tainted: G E ------------------------------------------------------ mount/678048 is trying to acquire lock: ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs] but task is already holding lock: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __mutex_lock+0x8b/0x8f0 btrfs_init_new_device+0x2d2/0x1240 [btrfs] btrfs_ioctl+0x1de/0x2d20 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 __mutex_lock+0x8b/0x8f0 clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] btrfs_mount_root.cold+0x13/0xfa [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); *** DEADLOCK *** 3 locks held by mount/678048: #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs] #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] stack backtrace: CPU: 2 PID: 678048 Comm: mount Tainted: G E 5.8.0-rc5+ #20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 lock_acquire+0xab/0x360 ? clone_fs_devices+0x4d/0x170 [btrfs] __mutex_lock+0x8b/0x8f0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? rcu_read_lock_sched_held+0x52/0x60 ? cpumask_next+0x16/0x20 ? module_assert_mutex_or_preempt+0x14/0x40 ? __module_address+0x28/0xf0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? static_obj+0x4f/0x60 ? lockdep_init_map_waits+0x43/0x200 ? clone_fs_devices+0x4d/0x170 [btrfs] clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] ? super_setup_bdi_name+0x79/0xd0 btrfs_mount_root.cold+0x13/0xfa [btrfs] ? vfs_parse_fs_string+0x84/0xb0 ? rcu_read_lock_sched_held+0x52/0x60 ? kfree+0x2b5/0x310 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] ? cred_has_capability+0x7c/0x120 ? rcu_read_lock_sched_held+0x52/0x60 ? legacy_get_tree+0x30/0x50 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 ? memdup_user+0x4e/0x90 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because btrfs_read_chunk_tree() can come upon DEV_EXTENT's and then read the device, which takes the device_list_mutex. The device_list_mutex needs to be taken before the chunk_mutex, so this is a problem. We only really need the chunk mutex around adding the chunk, so move the mutex around read_one_chunk. An argument could be made that we don't even need the chunk_mutex here as it's during mount, and we are protected by various other locks. However we already have special rules for ->device_list_mutex, and I'd rather not have another special case for ->chunk_mutex. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: open device without device_list_mutexJosef Bacik
commit 18c850fdc5a801bad4977b0f1723761d42267e45 upstream. There's long existed a lockdep splat because we open our bdev's under the ->device_list_mutex at mount time, which acquires the bd_mutex. Usually this goes unnoticed, but if you do loopback devices at all suddenly the bd_mutex comes with a whole host of other dependencies, which results in the splat when you mount a btrfs file system. ====================================================== WARNING: possible circular locking dependency detected 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted ------------------------------------------------------ systemd-journal/509 is trying to acquire lock: ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs] but task is already holding lock: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (sb_pagefaults){.+.+}-{0:0}: __sb_start_write+0x13e/0x220 btrfs_page_mkwrite+0x59/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 -> #5 (&mm->mmap_lock#2){++++}-{3:3}: __might_fault+0x60/0x80 _copy_from_user+0x20/0xb0 get_sg_io_hdr+0x9a/0xb0 scsi_cmd_ioctl+0x1ea/0x2f0 cdrom_ioctl+0x3c/0x12b4 sr_block_ioctl+0xa4/0xd0 block_ioctl+0x3f/0x50 ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #4 (&cd->lock){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 sr_block_open+0xa2/0x180 __blkdev_get+0xdd/0x550 blkdev_get+0x38/0x150 do_dentry_open+0x16b/0x3e0 path_openat+0x3c9/0xa00 do_filp_open+0x75/0x100 do_sys_openat2+0x8a/0x140 __x64_sys_openat+0x46/0x70 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 __blkdev_get+0x6a/0x550 blkdev_get+0x85/0x150 blkdev_get_by_path+0x2c/0x70 btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] open_fs_devices+0x88/0x240 [btrfs] btrfs_open_devices+0x92/0xa0 [btrfs] btrfs_mount_root+0x250/0x490 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x119/0x380 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x8c6/0xca0 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_run_dev_stats+0x36/0x420 [btrfs] commit_cowonly_roots+0x91/0x2d0 [btrfs] btrfs_commit_transaction+0x4e6/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}: __mutex_lock+0x7b/0x820 btrfs_commit_transaction+0x48e/0x9f0 [btrfs] btrfs_sync_file+0x38a/0x480 [btrfs] __x64_sys_fdatasync+0x47/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}: __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 __mutex_lock+0x7b/0x820 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_pagefaults); lock(&mm->mmap_lock#2); lock(sb_pagefaults); lock(&fs_info->reloc_mutex); *** DEADLOCK *** 3 locks held by systemd-journal/509: #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs] #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs] stack backtrace: CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x92/0xc8 check_noncircular+0x134/0x150 __lock_acquire+0x1241/0x20c0 lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? lock_acquire+0xb0/0x400 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] __mutex_lock+0x7b/0x820 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs] ? kvm_sched_clock_read+0x14/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0xc/0xb0 btrfs_record_root_in_trans+0x44/0x70 [btrfs] start_transaction+0xd2/0x500 [btrfs] btrfs_dirty_inode+0x44/0xd0 [btrfs] file_update_time+0xc6/0x120 btrfs_page_mkwrite+0xda/0x560 [btrfs] ? sched_clock+0x5/0x10 do_page_mkwrite+0x4f/0x130 do_wp_page+0x3b0/0x4f0 handle_mm_fault+0xf47/0x1850 do_user_addr_fault+0x1fc/0x4b0 exc_page_fault+0x88/0x300 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x7fa3972fdbfe Code: Bad RIP value. Fix this by not holding the ->device_list_mutex at this point. The device_list_mutex exists to protect us from modifying the device list while the file system is running. However it can also be modified by doing a scan on a device. But this action is specifically protected by the uuid_mutex, which we are holding here. We cannot race with opening at this point because we have the ->s_mount lock held during the mount. Not having the ->device_list_mutex here is perfectly safe as we're not going to change the devices at this point. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add some comments ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: don't traverse into the seed devices in show_devnameAnand Jain
commit 4faf55b03823e96c44dc4e364520000ed3b12fdb upstream. ->show_devname currently shows the lowest devid in the list. As the seed devices have the lowest devid in the sprouted filesystem, the userland tool such as findmnt end up seeing seed device instead of the device from the read-writable sprouted filesystem. As shown below. mount /dev/sda /btrfs mount: /btrfs: WARNING: device write-protected, mounted read-only. findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 btrfs dev add -f /dev/sdb /btrfs umount /btrfs mount /dev/sdb /btrfs findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 All sprouts from a single seed will show the same seed device and the same fsid. That's confusing. This is causing problems in our prototype as there isn't any reference to the sprout file-system(s) which is being used for actual read and write. This was added in the patch which implemented the show_devname in btrfs commit 9c5085c14798 ("Btrfs: implement ->show_devname"). I tried to look for any particular reason that we need to show the seed device, there isn't any. So instead, do not traverse through the seed devices, just show the lowest devid in the sprouted fsid. After the patch: mount /dev/sda /btrfs mount: /btrfs: WARNING: device write-protected, mounted read-only. findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 btrfs dev add -f /dev/sdb /btrfs mount -o rw,remount /dev/sdb /btrfs findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sdb /btrfs 595ca0e6-b82e-46b5-b9e2-c72a6928be48 mount /dev/sda /btrfs1 mount: /btrfs1: WARNING: device write-protected, mounted read-only. btrfs dev add -f /dev/sdc /btrfs1 findmnt --output SOURCE,TARGET,UUID /btrfs1 SOURCE TARGET UUID /dev/sdc /btrfs1 ca1dbb7a-8446-4f95-853c-a20f3f82bdbb cat /proc/self/mounts | grep btrfs /dev/sdb /btrfs btrfs rw,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0 /dev/sdc /btrfs1 btrfs ro,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0 Reported-by: Martin K. Petersen <martin.petersen@oracle.com> CC: stable@vger.kernel.org # 4.19+ Tested-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: ref-verify: fix memory leak in add_block_entryTom Rix
commit d60ba8de1164e1b42e296ff270c622a070ef8fe7 upstream. clang static analysis flags this error fs/btrfs/ref-verify.c:290:3: warning: Potential leak of memory pointed to by 're' [unix.Malloc] kfree(be); ^~~~~ The problem is in this block of code: if (root_objectid) { struct root_entry *exist_re; exist_re = insert_root_entry(&exist->roots, re); if (exist_re) kfree(re); } There is no 'else' block freeing when root_objectid is 0. Add the missing kfree to the else branch. Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: don't allocate anonymous block device for user invisible rootsQu Wenruo
commit 851fd730a743e072badaf67caf39883e32439431 upstream. [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted: BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] The error is EMFILE (Too many files open) and comes from the anonymous block device allocation. The ids are in a shared pool of size 1<<20. The ids are assigned to live subvolumes, ie. the root structure exists in memory (eg. after creation or after the root appears in some path). The pool could be exhausted if the numbers are not reclaimed fast enough, after subvolume deletion or if other system component uses the anon block devices. [WORKAROUND] Since it's not possible to completely solve the problem, we can only minimize the time the id is allocated to a subvolume root. Firstly, we can reduce the use of anon_dev by trees that are not subvolume roots, like data reloc tree. This patch will do extra check on root objectid, to skip roots that don't need anon_dev. Currently it's only data reloc tree and orphan roots. Reported-by: Greed Rong <greedrong@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17btrfs: free anon block device right after subvolume deletionQu Wenruo
commit 082b6c970f02fefd278c7833880cda29691a5f34 upstream. [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted caused by slow anonymous block device reclaim: BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] The anonymous device pool is shared and its size is 1M. It's possible to hit that limit if the subvolume deletion is not fast enough and the subvolumes to be cleaned keep the ids allocated. [WORKAROUND] We can't avoid the anon device pool exhaustion but we can shorten the time the id is attached to the subvolume root once the subvolume becomes invisible to the user. Reported-by: Greed Rong <greedrong@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-17smb3: warn on confusing error scenario with sec=krb5Steve French
commit 0a018944eee913962bce8ffebbb121960d5125d9 upstream. When mounting with Kerberos, users have been confused about the default error returned in scenarios in which either keyutils is not installed or the user did not properly acquire a krb5 ticket. Log a warning message in the case that "ENOKEY" is returned from the get_spnego_key upcall so that users can better understand why mount failed in those two cases. CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-07Merge branch 'v5.2/standard/base' into v5.2/standard/preempt-rt/baseBruce Ashfield
2020-09-07Merge tag 'v5.2.59' into v5.2/standard/baseBruce Ashfield
This is the 5.2.59 stable release
2020-09-04ext4: fix checking of directory entry validity for inline directoriesJan Kara
commit 7303cb5bfe845f7d43cd9b2dbd37dbb266efda9b upstream. ext4_search_dir() and ext4_generic_delete_entry() can be called both for standard director blocks and for inline directories stored inside inode or inline xattr space. For the second case we didn't call ext4_check_dir_entry() with proper constraints that could result in accepting corrupted directory entry as well as false positive filesystem errors like: EXT4-fs error (device dm-0): ext4_search_dir:1395: inode #28320400: block 113246792: comm dockerd: bad entry in directory: directory entry too close to block end - offset=0, inode=28320403, rec_len=32, name_len=8, size=4096 Fix the arguments passed to ext4_check_dir_entry(). Fixes: 109ba779d6cc ("ext4: check for directory entries too close to block end") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200731162135.8080-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04fs/minix: reject too-large maximum file sizeEric Biggers
commit 270ef41094e9fa95273f288d7d785313ceab2ff3 upstream. If the minix filesystem tries to map a very large logical block number to its on-disk location, block_to_path() can return offsets that are too large, causing out-of-bounds memory accesses when accessing indirect index blocks. This should be prevented by the check against the maximum file size, but this doesn't work because the maximum file size is read directly from the on-disk superblock and isn't validated itself. Fix this by validating the maximum file size at mount time. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+c7d9ec7a1a7272dd71b3@syzkaller.appspotmail.com Reported-by: syzbot+3b7b03a0c28948054fb5@syzkaller.appspotmail.com Reported-by: syzbot+6e056ee473568865f3e6@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200628060846.682158-4-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04fs/minix: don't allow getting deleted inodesEric Biggers
commit facb03dddec04e4aac1bb2139accdceb04deb1f3 upstream. If an inode has no links, we need to mark it bad rather than allowing it to be accessed. This avoids WARNINGs in inc_nlink() and drop_nlink() when doing directory operations on a fuzzed filesystem. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a9ac3de1b5de5fb10efc@syzkaller.appspotmail.com Reported-by: syzbot+df958cf5688a96ad3287@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qiujun Huang <anenbupt@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200628060846.682158-3-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04fs/minix: check return value of sb_getblk()Eric Biggers
commit da27e0a0e5f655f0d58d4e153c3182bb2b290f64 upstream. Patch series "fs/minix: fix syzbot bugs and set s_maxbytes". This series fixes all syzbot bugs in the minix filesystem: KASAN: null-ptr-deref Write in get_block KASAN: use-after-free Write in get_block KASAN: use-after-free Read in get_block WARNING in inc_nlink KMSAN: uninit-value in get_block WARNING in drop_nlink It also fixes the minix filesystem to set s_maxbytes correctly, so that userspace sees the correct behavior when exceeding the max file size. This patch (of 6): sb_getblk() can fail, so check its return value. This fixes a NULL pointer dereference. Originally from Qiujun Huang. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+4a88b2b9dc280f47baf4@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qiujun Huang <anenbupt@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200628060846.682158-1-ebiggers@kernel.org Link: http://lkml.kernel.org/r/20200628060846.682158-2-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04pstore: Fix linking when crypto API disabledMatteo Croce
commit fd49e03280e596e54edb93a91bc96170f8e97e4a upstream. When building a kernel with CONFIG_PSTORE=y and CONFIG_CRYPTO not set, a build error happens: ld: fs/pstore/platform.o: in function `pstore_dump': platform.c:(.text+0x3f9): undefined reference to `crypto_comp_compress' ld: fs/pstore/platform.o: in function `pstore_get_backend_records': platform.c:(.text+0x784): undefined reference to `crypto_comp_decompress' This because some pstore code uses crypto_comp_(de)compress regardless of the CONFIG_CRYPTO status. Fix it by wrapping the (de)compress usage by IS_ENABLED(CONFIG_PSTORE_COMPRESS) Signed-off-by: Matteo Croce <mcroce@linux.microsoft.com> Link: https://lore.kernel.org/lkml/20200706234045.9516-1-mcroce@linux.microsoft.com Fixes: cb3bee0369bc ("pstore: Use crypto compress API") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04NFS: Don't return layout segments that are in useTrond Myklebust
commit d474f96104bd4377573526ebae2ee212205a6839 upstream. If the NFS_LAYOUT_RETURN_REQUESTED flag is set, we want to return the layout as soon as possible, meaning that the affected layout segments should be marked as invalid, and should no longer be in use for I/O. Fixes: f0b429819b5f ("pNFS: Ignore non-recalled layouts in pnfs_layout_need_return()") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04NFS: Don't move layouts to plh_return_segs list while in useTrond Myklebust
commit ff041727e9e029845857cac41aae118ead5e261b upstream. If the layout segment is still in use for a read or a write, we should not move it to the layout plh_return_segs list. If we do, we can end up returning the layout while I/O is still in progress. Fixes: e0b7d420f72a ("pNFS: Don't discard layout segments that are marked for return") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-049p: Fix memory leak in v9fs_mountZheng Bin
commit cb0aae0e31c632c407a2cab4307be85a001d4d98 upstream. v9fs_mount v9fs_session_init v9fs_cache_session_get_cookie v9fs_random_cachetag -->alloc cachetag v9ses->fscache = fscache_acquire_cookie -->maybe NULL sb = sget -->fail, goto clunk clunk_fid: v9fs_session_close if (v9ses->fscache) -->NULL kfree(v9ses->cachetag) Thus memleak happens. Link: http://lkml.kernel.org/r/20200615012153.89538-1-zhengbin13@huawei.com Fixes: 60e78d2c993e ("9p: Add fscache support to 9p") Cc: <stable@vger.kernel.org> # v2.6.32+ Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04dlm: Fix kobject memleakWang Hai
commit 0ffddafc3a3970ef7013696e7f36b3d378bc4c16 upstream. Currently the error return path from kobject_init_and_add() is not followed by a call to kobject_put() - which means we are leaking the kobject. Set do_unreg = 1 before kobject_init_and_add() to ensure that kobject_put() can be called in its error patch. Fixes: 901195ed7f4b ("Kobject: change GFS2 to use kobject_init_and_add") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04xfs: fix inode allocation block res calculation precedenceBrian Foster
commit b2a8864728683443f34a9fd33a2b78b860934cc1 upstream. The block reservation calculation for inode allocation is supposed to consist of the blocks required for the inode chunk plus (maxlevels-1) of the inode btree multiplied by the number of inode btrees in the fs (2 when finobt is enabled, 1 otherwise). Instead, the macro returns (ialloc_blocks + 2) due to a precedence error in the calculation logic. This leads to block reservation overruns via generic/531 on small block filesystems with finobt enabled. Add braces to fix the calculation and reserve the appropriate number of blocks. Fixes: 9d43b180af67 ("xfs: update inode allocation/free transaction reservations for finobt") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04kernfs: do not call fsnotify() with name without a parentAmir Goldstein
commit 9991bb84b27a2594187898f261866cfc50255454 upstream. When creating an FS_MODIFY event on inode itself (not on parent) the file_name argument should be NULL. The change to send a non NULL name to inode itself was done on purpuse as part of another commit, as Tejun writes: "...While at it, supply the target file name to fsnotify() from kernfs_node->name.". But this is wrong practice and inconsistent with inotify behavior when watching a single file. When a child is being watched (as opposed to the parent directory) the inotify event should contain the watch descriptor, but not the file name. Fixes: df6a58c5c5aa ("kernfs: don't depend on d_find_any_alias()...") Link: https://lore.kernel.org/r/20200708111156.24659-5-amir73il@gmail.com Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04xfs: fix reflink quota reservation accounting errorDarrick J. Wong
commit 83895227aba1ade33e81f586aa7b6b1e143096a5 upstream. Quota reservations are supposed to account for the blocks that might be allocated due to a bmap btree split. Reflink doesn't do this, so fix this to make the quota accounting more accurate before we start rearranging things. Fixes: 862bb360ef56 ("xfs: reflink extents from one file to another") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data forkDarrick J. Wong
commit eb0efe5063bb10bcb653e4f8e92a74719c03a347 upstream. The data fork scrubber calls filemap_write_and_wait to flush dirty pages and delalloc reservations out to disk prior to checking the data fork's extent mappings. Unfortunately, this means that scrub can consume the EIO/ENOSPC errors that would otherwise have stayed around in the address space until (we hope) the writer application calls fsync to persist data and collect errors. The end result is that programs that wrote to a file might never see the error code and proceed as if nothing were wrong. xfs_scrub is not in a position to notify file writers about the writeback failure, and it's only here to check metadata, not file contents. Therefore, if writeback fails, we should stuff the error code back into the address space so that an fsync by the writer application can pick that up. Fixes: 99d9d8d05da2 ("xfs: scrub inode block mappings") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-09-04fs/btrfs: Add cond_resched() for try_release_extent_mapping() stallsPaul E. McKenney
commit 9f47eb5461aaeb6cb8696f9d11503ae90e4d5cb0 upstream. Very large I/Os can cause the following RCU CPU stall warning: RIP: 0010:rb_prev+0x8/0x50 Code: 49 89 c0 49 89 d1 48 89 c2 48 89 f8 e9 e5 fd ff ff 4c 89 48 10 c3 4c = 89 06 c3 4c 89 40 10 c3 0f 1f 00 48 8b 0f 48 39 cf 74 38 <48> 8b 47 10 48 85 c0 74 22 48 8b 50 08 48 85 d2 74 0c 48 89 d0 48 RSP: 0018:ffffc9002212bab0 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13 RAX: ffff888821f93630 RBX: ffff888821f93630 RCX: ffff888821f937e0 RDX: 0000000000000000 RSI: 0000000000102000 RDI: ffff888821f93630 RBP: 0000000000103000 R08: 000000000006c000 R09: 0000000000000238 R10: 0000000000102fff R11: ffffc9002212bac8 R12: 0000000000000001 R13: ffffffffffffffff R14: 0000000000102000 R15: ffff888821f937e0 __lookup_extent_mapping+0xa0/0x110 try_release_extent_mapping+0xdc/0x220 btrfs_releasepage+0x45/0x70 shrink_page_list+0xa39/0xb30 shrink_inactive_list+0x18f/0x3b0 shrink_lruvec+0x38e/0x6b0 shrink_node+0x14d/0x690 do_try_to_free_pages+0xc6/0x3e0 try_to_free_mem_cgroup_pages+0xe6/0x1e0 reclaim_high.constprop.73+0x87/0xc0 mem_cgroup_handle_over_high+0x66/0x150 exit_to_usermode_loop+0x82/0xd0 do_syscall_64+0xd4/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 On a PREEMPT=n kernel, the try_release_extent_mapping() function's "while" loop might run for a very long time on a large I/O. This commit therefore adds a cond_resched() to this loop, providing RCU any needed quiescent states. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-29Merge branch 'v5.2/standard/base' into v5.2/standard/preempt-rt/baseBruce Ashfield
2020-08-29Merge tag 'v5.2.58' into v5.2/standard/baseBruce Ashfield
This is the 5.2.58 stable release
2020-08-29Merge tag 'v5.2.57' into v5.2/standard/baseBruce Ashfield
This is the 5.2.57 stable release