path: root/fs
Age | Commit message | Author
2020-01-27 | Btrfs: fix hang when loading existing inode cache off disk | Filipe Manana
[ Upstream commit 7764d56baa844d7f6206394f21a0e8c1f303c476 ]

If we are able to load an existing inode cache off disk, we set the state of the cache to BTRFS_CACHE_FINISHED, but we don't wake up anyone waiting for the cache to be available. This means that anyone waiting for the cache to be available, waiting on the condition that either its state is BTRFS_CACHE_FINISHED or its available free space is greater than zero, can hang forever.

This could be observed running fstests with MOUNT_OPTIONS="-o inode_cache", in particular test case generic/161 triggered it very frequently for me, producing a trace like the following:

  [63795.739712] BTRFS info (device sdc): enabling inode map caching
  [63795.739714] BTRFS info (device sdc): disk space caching is enabled
  [63795.739716] BTRFS info (device sdc): has skinny extents
  [64036.653886] INFO: task btrfs-transacti:3917 blocked for more than 120 seconds.
  [64036.654079] Not tainted 5.2.0-rc4-btrfs-next-50 #1
  [64036.654143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [64036.654232] btrfs-transacti D 0 3917 2 0x80004000
  [64036.654239] Call Trace:
  [64036.654258] ? __schedule+0x3ae/0x7b0
  [64036.654271] schedule+0x3a/0xb0
  [64036.654325] btrfs_commit_transaction+0x978/0xae0 [btrfs]
  [64036.654339] ? remove_wait_queue+0x60/0x60
  [64036.654395] transaction_kthread+0x146/0x180 [btrfs]
  [64036.654450] ? btrfs_cleanup_transaction+0x620/0x620 [btrfs]
  [64036.654456] kthread+0x103/0x140
  [64036.654464] ? kthread_create_worker_on_cpu+0x70/0x70
  [64036.654476] ret_from_fork+0x3a/0x50
  [64036.654504] INFO: task xfs_io:3919 blocked for more than 120 seconds.
  [64036.654568] Not tainted 5.2.0-rc4-btrfs-next-50 #1
  [64036.654617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [64036.654685] xfs_io D 0 3919 3633 0x00000000
  [64036.654691] Call Trace:
  [64036.654703] ? __schedule+0x3ae/0x7b0
  [64036.654716] schedule+0x3a/0xb0
  [64036.654756] btrfs_find_free_ino+0xa9/0x120 [btrfs]
  [64036.654764] ? remove_wait_queue+0x60/0x60
  [64036.654809] btrfs_create+0x72/0x1f0 [btrfs]
  [64036.654822] lookup_open+0x6bc/0x790
  [64036.654849] path_openat+0x3bc/0xc00
  [64036.654854] ? __lock_acquire+0x331/0x1cb0
  [64036.654869] do_filp_open+0x99/0x110
  [64036.654884] ? __alloc_fd+0xee/0x200
  [64036.654895] ? do_raw_spin_unlock+0x49/0xc0
  [64036.654909] ? do_sys_open+0x132/0x220
  [64036.654913] do_sys_open+0x132/0x220
  [64036.654926] do_syscall_64+0x60/0x1d0
  [64036.654933] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix this by adding a wake_up() call right after setting the cache state to BTRFS_CACHE_FINISHED, at start_caching(), when we are able to load the cache from disk.

Fixes: 82d5902d9c681b ("Btrfs: Support reading/writing on disk free ino cache")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | f2fs: fix error path of f2fs_convert_inline_page() | Chao Yu
[ Upstream commit e8c82c11c93d586d03d80305959527bcac383555 ] In the error path of f2fs_convert_inline_page(), we failed to truncate the newly reserved block in .i_addrs[0] once get_node_info() had failed; fix it. Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | f2fs: fix wrong error injection path in inc_valid_block_count() | Chao Yu
[ Upstream commit 9ea2f0be6ceaebae1518a5f897cff2645830dd95 ] If FAULT_BLOCK type error injection is on, in inc_valid_block_count() we may decrease sbi->alloc_valid_block_count percpu stat count incorrectly, fix it. Fixes: 36b877af7992 ("f2fs: Keep alloc_valid_block_count in sync") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | signal: Allow cifs and drbd to receive their terminating signals | Eric W. Biederman
[ Upstream commit 33da8e7c814f77310250bb54a9db36a44c5de784 ]

My recent change to only use force_sig for synchronous events wound up breaking signal reception in cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel threads to catch their signals.

Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig: the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL, which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong.

So correct this problem by adding allow_kernel_signal, which lets through signals whose siginfo reports they were sent by the kernel but does not allow userspace-generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their threads can receive this signal.

Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected.

This change was partly inspired by similar cifs and drbd patches that added allow_signal.

Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com>
Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: Steve French <smfrench@gmail.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | ext4: set error return correctly when ext4_htree_store_dirent fails | Colin Ian King
[ Upstream commit 7a14826ede1d714f0bb56de8167c0e519041eeda ] Currently when the call to ext4_htree_store_dirent fails, the error return variable 'ret' is not being set to the error code; the variable 'count' is set instead, hence the error code is not being returned. Fix this by assigning the error return code to ret. Addresses-Coverity: ("Unused value") Fixes: 8af0f0822797 ("ext4: fix readdir error in the case of inline_data+dir_index") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
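The mix-up described above can be sketched in plain userspace C. This is only an illustration of the pattern, not the ext4 code: store_entry(), scan_buggy() and scan_fixed() are hypothetical names, and the failure condition (a negative value) is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the store helper: fails on negative input. */
static int store_entry(int value)
{
    return value < 0 ? -EINVAL : 0;
}

/* Buggy shape: the error lands in 'count', so 'ret' is still 0 at 'out'. */
static int scan_buggy(const int *vals, int n)
{
    int ret = 0, count = 0;

    assert(vals != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        int err = store_entry(vals[i]);
        if (err) {
            count = err;    /* wrong variable: never returned */
            goto out;
        }
        count++;
    }
    ret = count;
out:
    return ret;
}

/* Fixed shape: the error code is assigned to the variable that is returned. */
static int scan_fixed(const int *vals, int n)
{
    int ret = 0, count = 0;

    assert(vals != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        int err = store_entry(vals[i]);
        if (err) {
            ret = err;
            goto out;
        }
        count++;
    }
    ret = count;
out:
    return ret;
}
```

With an input that fails on the third entry, scan_buggy() reports success (0) while scan_fixed() returns -EINVAL.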
2020-01-27 | cifs: fix rmmod regression in cifs.ko caused by force_sig changes | Steve French
[ Upstream commit 247bc9470b1eeefc7b58cdf2c39f2866ba651509 ] Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") The global change from force_sig caused module unloading of cifs.ko to fail (since the cifsd process could not be killed, "rmmod cifs" now would always fail) Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | ceph: fix "ceph.dir.rctime" vxattr value | David Disseldorp
[ Upstream commit 718807289d4130be1fe13f24f018733116958070 ] The vxattr value incorrectly places a "09" prefix to the nanoseconds field, instead of providing it as a zero-pad width specifier after '%'. Fixes: 3489b42a72a4 ("ceph: fix three bugs, two in ceph_vxattrcb_file_layout()") Link: https://tracker.ceph.com/issues/39943 Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
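The difference between a literal "09" before '%' and a zero-pad width after it can be seen with ordinary snprintf(). The helper names below are hypothetical and the argument types are simplified to long; this is a sketch of the formatting issue, not the ceph source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Buggy shape: "09" precedes '%', so it is emitted as literal text. */
static int fmt_rctime_buggy(char *buf, size_t len, long sec, long nsec)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%ld.09%ld", sec, nsec);
}

/* Fixed shape: "09" follows '%', zero-padding nanoseconds to nine digits. */
static int fmt_rctime_fixed(char *buf, size_t len, long sec, long nsec)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%ld.%09ld", sec, nsec);
}
```

For sec = 1 and nsec = 5 the buggy format yields "1.095" and the fixed one yields "1.000000005".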
2020-01-27 | signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig | Eric W. Biederman
[ Upstream commit 72abe3bcf0911d69b46c1e8bdb5612675e0ac42c ] The locking in force_sig_info is not prepared to deal with a task that exits or execs (as sighand may change). This is not a locking problem in force_sig as force_sig is only built to handle synchronous exceptions. Further, the function force_sig_info changes the signal state if the signal is ignored, or blocked, or if SIGNAL_UNKILLABLE will prevent the delivery of the signal. The signal SIGKILL can not be ignored and can not be blocked, and SIGNAL_UNKILLABLE won't prevent it from being delivered. So using force_sig rather than send_sig for SIGKILL is confusing and pointless. Because it won't impact the sending of the signal and because using force_sig is wrong, replace force_sig with send_sig. Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Jeff Layton <jlayton@primarydata.com> Cc: Steve French <smfrench@gmail.com> Fixes: a5c3e1c725af ("Revert "cifs: No need to send SIGKILL to demux_thread during umount"") Fixes: e7ddee9037e7 ("cifs: disable sharing session and tcon and add new TCP sharing code") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | afs: Fix double inc of vnode->cb_break | David Howells
[ Upstream commit fd711586bb7d63f257da5eff234e68c446ac35ea ] When __afs_break_callback() clears the CB_PROMISED flag, it increments vnode->cb_break to trigger a future refetch of the status and callback - however it also calls afs_clear_permits(), which also increments vnode->cb_break. Fix this by removing the increment from afs_clear_permits(). Whilst we're at it, fix the conditional call to afs_put_permits() as the function checks to see if the argument is NULL, so the check is redundant. Fixes: be080a6f43c4 ("afs: Overhaul permit caching"); Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | afs: Fix lock-wait/callback-break double locking | David Howells
[ Upstream commit c7226e407b6065d3bda8bd9dc627663d2c505ea3 ] __afs_break_callback() holds vnode->lock around its call of afs_lock_may_be_available() - which also takes that lock. Fix this by not taking the lock in __afs_break_callback(). Also, there's no point checking the granted_locks and pending_locks queues; it's sufficient to check lock_state, so move that check out of afs_lock_may_be_available() into __afs_break_callback() to replace the queue checks. Fixes: e8d6c554126b ("AFS: implement file locking") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set | David Howells
[ Upstream commit d9052dda8a39069312218f913d22d99c48d90004 ] Don't invalidate the callback promise on a directory if the AFS_VNODE_DIR_VALID flag is not set (which indicates that the directory contents are invalid, due to edit failure, callback break, page reclaim). The directory will be reloaded next time the directory is accessed, so clearing the callback flag at this point may race with a reload of the directory and cancel its recorded callback promise. Fixes: f3ddee8dc4e2 ("afs: Fix directory handling") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | afs: Fix key leak in afs_release() and afs_evict_inode() | David Howells
[ Upstream commit a1b879eefc2b34cd3f17187ef6fc1cf3960e9518 ] Fix afs_release() to go through the cleanup part of the function if FMODE_WRITE is set rather than exiting through vfs_fsync() (which skips the cleanup). The cleanup involves discarding the refs on the key used for file ops and the writeback key record. Also fix afs_evict_inode() to clean up any left over wb keys attached to the inode/vnode when it is removed. Fixes: 5a8132761609 ("afs: Do better accretion of small writes on newly created content") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | afs: Fix the afs.cell and afs.volume xattr handlers | David Howells
[ Upstream commit c73aa4102f5b9f261a907c3b3df94cd2c478504d ] Fix the ->get handlers for the afs.cell and afs.volume xattrs to pass the source data size to memcpy() rather than target buffer size. Overcopying the source data occasionally causes the kernel to oops. Fixes: d3e3b7eac886 ("afs: Add metadata xattrs") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
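The point about passing the source size rather than the target size to memcpy() can be shown with a small userspace helper. get_name_attr() is a hypothetical function that only imitates the general shape of an xattr get handler (a size of 0 probes for the length, a too-small buffer yields -ERANGE); those conventions are assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Copy 'name' into 'buffer' (capacity 'size'); returns length or -ERANGE. */
static long get_name_attr(const char *name, char *buffer, size_t size)
{
    size_t namelen;

    assert(name != NULL);
    namelen = strlen(name);
    if (size == 0)
        return (long)namelen;       /* caller only wants the length */
    if (namelen > size)
        return -ERANGE;             /* buffer too small for the value */
    /*
     * Copy exactly as many bytes as the source holds. Passing 'size'
     * here would read past the end of 'name' whenever size > namelen.
     */
    memcpy(buffer, name, namelen);
    return (long)namelen;
}
```

Copying namelen bytes keeps the read inside the source string even when the caller supplies a much larger buffer.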
2020-01-27 | NFS: Don't interrupt file writeout due to fatal errors | Trond Myklebust
[ Upstream commit 14bebe3c90b326d2a0df78aed5e9de090c71d878 ] When flushing out dirty pages, the fact that we may hit fatal errors is not a reason to stop writeback. Those errors are reported through fsync(), not through the flush mechanism. Fixes: a6598813a4c5b ("NFS: Don't write back further requests if there...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | afs: Further fix file locking | David Howells
[ Upstream commit 4be5975aea154e164696128d049dec9ed341585c ]

Further fix the file locking in the afs filesystem client in a number of ways, including:

 (1) Don't submit the operation to obtain a lock from the server in a work queue context, but rather do it in the process context of whoever issued the requesting system call.

 (2) The owner of the file_lock struct at the front of the pending_locks queue now owns the right to talk to the server.

 (3) Write locks can be instantly granted if they don't overlap with any other locks *and* we have a write lock on the server.

 (4) In the event of an authentication/permission error, all other matching pending lock requests are also immediately aborted.

 (5) Properly use VFS core locks_lock_file_wait() to distribute the server lock amongst local client locks, including waiting for the lock to become available.

Test with:

  sqlite3 /afs/.../scratch/billings.sqlite <<EOF
  CREATE TABLE hosts (
      hostname varchar(80),
      shorthost varchar(80),
      room varchar(30),
      building varchar(30),
      PRIMARY KEY(shorthost)
  );
  EOF

With the version of sqlite3 that I have, this should fail consistently with EAGAIN, whether or not the program is straced (which introduces some delays between lock syscalls).

Fixes: 0fafdc9f888b ("afs: Fix file locking")
Reported-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | afs: Fix AFS file locking to allow fine grained locks | David Howells
[ Upstream commit 68ce801ffd82e72d5005ab5458e8b9e59f24d9cc ] Fix AFS file locking to allow fine grained locks as some applications, such as firefox, won't work if they can't take such locks on certain state files - thereby preventing the use of kAFS to distribute a home directory. Note that this cannot be made completely functional as the protocol only has provision for whole-file locks, so there exists the possibility of a process deadlocking itself by getting a partial read-lock on a file first and then trying to get a non-overlapping write-lock - but we got the server's read lock with the first lock, so we're now stuck. OpenAFS solves this by just granting any partial-range lock directly without consulting the server - and hoping there's no remote collision. I want to implement that in a separate patch and it requires a bit more thought. Fixes: 8d6c554126b8 ("AFS: implement file locking") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | jfs: fix bogus variable self-initialization | Arnd Bergmann
[ Upstream commit a5fdd713d256887b5f012608701149fa939e5645 ]

A statement was originally added in 2006 to shut up a gcc warning, but now clang warns about it:

  fs/jfs/jfs_txnmgr.c:1932:15: error: variable 'pxd' is uninitialized when used within its own initialization [-Werror,-Wuninitialized]
  pxd_t pxd = pxd; /* truncated extent of xad */
        ~~~   ^~~

Modern versions of gcc are fine without the silly assignment, so just drop it. Tested with gcc-4.6 (released 2011), 4.7, 4.8, and 4.9.

Fixes: c9e3ad6021e5 ("JFS: Get rid of "may be used uninitialized" warnings")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE() | Trond Myklebust
[ Upstream commit 108bb4afd351d65826648a47f11fa3104e250d9b ] If the attempt to instantiate the mirror's layout DS pointer failed, then that pointer may hold a value of type ERR_PTR(), so we need to check that before we dereference it. Fixes: 65990d1afbd2d ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | NFS: Add missing encode / decode sequence_maxsz to v4.2 operations | Anna Schumaker
[ Upstream commit 1a3466aed3a17eed41cd9411f89eb637f58349b0 ] These really should have been there from the beginning, but we never noticed because there was enough slack in the RPC request for the extra bytes. Chuck's recent patch to use au_cslack and au_rslack to compute buffer size shrunk the buffer enough that this was now a problem for SEEK operations on my test client. Fixes: f4ac1674f5da4 ("nfs: Add ALLOCATE support") Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation") Fixes: cb95deea0b4aa ("NFS OFFLOAD_CANCEL xdr") Fixes: 624bd5b7b683c ("nfs: Add DEALLOCATE support") Fixes: 1c6dcbe5ceff8 ("NFS: Implement SEEK") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount | Trond Myklebust
[ Upstream commit 5085607d209102b37b169bc94d0aa39566a9842a ] If a bulk layout recall or a metadata server reboot coincides with a umount, then holding a reference to an inode is unsafe unless we also hold a reference to the super block. Fixes: fd9a8d7160937 ("NFSv4.1: Fix bulk recall and destroy of layouts") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | NFS: Fix a soft lockup in the delegation recovery code | Trond Myklebust
[ Upstream commit 6f9449be53f3ce383caed797708b332ede8d952c ] Fix a soft lockup when NFS client delegation recovery is attempted but the inode is in the process of being freed. When the igrab(inode) call fails, and we have to restart the recovery process, we need to ensure that we won't attempt to recover the same delegation again. Fixes: 45870d6909d5a ("NFSv4.1: Test delegation stateids when server...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | fs/nfs: Fix nfs_parse_devname to not modify it's argument | Eric W. Biederman
[ Upstream commit 40cc394be1aa18848b8757e03bd8ed23281f572e ] In the rare and unsupported case of a hostname list, nfs_parse_devname will modify dev_name. There is no need to modify dev_name, as all that is being computed is the length of the hostname, so the computed length can just be shortened. Fixes: dc04589827f7 ("NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 | exportfs: fix 'passing zero to ERR_PTR()' warning | YueHaibing
[ Upstream commit 909e22e05353a783c526829427e9a8de122fba9c ] Fix a static code checker warning: fs/exportfs/expfs.c:171 reconnect_one() warn: passing zero to 'ERR_PTR' The error path for lookup_one_len_unlocked failure should set err to PTR_ERR. Fixes: bbf7a8a3562f ("exportfs: move most of reconnect_path to helper function") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
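Why passing zero to ERR_PTR() is a problem can be sketched in userspace. The three helpers below are a simplified re-creation of the kernel's error-pointer convention, written for this example; lookup(), reconnect_buggy() and reconnect_fixed() are hypothetical names and do not reproduce the exportfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_object;

/* Hypothetical lookup: returns an error pointer when asked to fail. */
static void *lookup(int fail)
{
    return fail ? ERR_PTR(-ENOENT) : (void *)&dummy_object;
}

/* Buggy shape: 'err' is still 0 on the error path, so NULL is returned. */
static void *reconnect_buggy(int fail)
{
    int err = 0;
    void *tmp = lookup(fail);

    assert(tmp != NULL);
    if (IS_ERR(tmp))
        goto out_err;
    return tmp;
out_err:
    return ERR_PTR(err);    /* ERR_PTR(0) is NULL, not an error pointer */
}

/* Fixed shape: take the error code from the failed pointer first. */
static void *reconnect_fixed(int fail)
{
    int err = 0;
    void *tmp = lookup(fail);

    assert(tmp != NULL);
    if (IS_ERR(tmp)) {
        err = (int)PTR_ERR(tmp);
        goto out_err;
    }
    return tmp;
out_err:
    return ERR_PTR(err);
}
```

A caller that tests the result with IS_ERR() would treat the NULL from reconnect_buggy() as a valid pointer, whereas reconnect_fixed() propagates -ENOENT.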
2020-01-27 | xfs: Sanity check flags of Q_XQUOTARM call | Jan Kara
commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 upstream. Flags passed to Q_XQUOTARM were not sanity checked for invalid values. Fix that. Fixes: 9da93f9b7cdf ("xfs: fix Q_XQUOTARM ioctl") Reported-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 | reiserfs: fix handling of -EOPNOTSUPP in reiserfs_for_each_xattr | Jeff Mahoney
commit 394440d469413fa9b74f88a11f144d76017221f2 upstream. Commit 60e4cf67a58 (reiserfs: fix extended attributes on the root directory) introduced a regression: open_xa_root started returning -EOPNOTSUPP, but it was not handled properly in reiserfs_for_each_xattr. When the reiserfs module is built without CONFIG_REISERFS_FS_XATTR, deleting an inode would result in a warning and chowning an inode would also result in a warning and then fail to complete. With CONFIG_REISERFS_FS_XATTR enabled, the xattr root would always be present for read-write operations. This commit handles -EOPNOTSUPP in the same way -ENODATA is handled. Fixes: 60e4cf67a582 ("reiserfs: fix extended attributes on the root directory") CC: stable@vger.kernel.org # Commit 60e4cf67a58 was picked up by stable Link: https://lore.kernel.org/r/20200115180059.6935-1-jeffm@suse.com Reported-by: Michael Brunnbauer <brunni@netestate.de> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 | btrfs: fix memory leak in qgroup accounting | Johannes Thumshirn
commit 26ef8493e1ab771cb01d27defca2fa1315dc3980 upstream.

When running xfstests on the current btrfs I get the following splat from kmemleak:

  unreferenced object 0xffff88821b2404e0 (size 32):
    comm "kworker/u4:7", pid 26663, jiffies 4295283698 (age 8.776s)
    hex dump (first 32 bytes):
      01 00 00 00 00 00 00 00 10 ff fd 26 82 88 ff ff  ...........&....
      10 ff fd 26 82 88 ff ff 20 ff fd 26 82 88 ff ff  ...&.... ..&....
    backtrace:
      [<00000000f94fd43f>] ulist_alloc+0x25/0x60 [btrfs]
      [<00000000fd023d99>] btrfs_find_all_roots_safe+0x41/0x100 [btrfs]
      [<000000008f17bd32>] btrfs_find_all_roots+0x52/0x70 [btrfs]
      [<00000000b7660afb>] btrfs_qgroup_rescan_worker+0x343/0x680 [btrfs]
      [<0000000058e66778>] btrfs_work_helper+0xac/0x1e0 [btrfs]
      [<00000000f0188930>] process_one_work+0x1cf/0x350
      [<00000000af5f2f8e>] worker_thread+0x28/0x3c0
      [<00000000b55a1add>] kthread+0x109/0x120
      [<00000000f88cbd17>] ret_from_fork+0x35/0x40

This corresponds to:

  (gdb) l *(btrfs_find_all_roots_safe+0x41)
  0x8d7e1 is in btrfs_find_all_roots_safe (fs/btrfs/backref.c:1413).
  1408
  1409            tmp = ulist_alloc(GFP_NOFS);
  1410            if (!tmp)
  1411                    return -ENOMEM;
  1412            *roots = ulist_alloc(GFP_NOFS);
  1413            if (!*roots) {
  1414                    ulist_free(tmp);
  1415                    return -ENOMEM;
  1416            }
  1417

Following the lifetime of the allocated 'roots' ulist, it gets freed again in btrfs_qgroup_account_extent().

But this does not happen if the function is called with the 'BTRFS_FS_QUOTA_ENABLED' flag cleared: in that case btrfs_qgroup_account_extent() takes an early exit and returns directly.

Instead of returning directly we should jump to the 'out_free' label in order to free all resources as expected.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
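The early-return leak and the 'out_free' remedy can be sketched with a userspace allocation counter. Everything here is hypothetical (list_alloc(), list_free(), account_extent_buggy(), account_extent_fixed(), live_allocs); it models only the ownership rule that the callee must free the list on every path, not the btrfs implementation.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;     /* lists currently allocated and not yet freed */

static int *list_alloc(void)
{
    int *p = malloc(sizeof(*p));

    if (p)
        live_allocs++;
    return p;
}

static void list_free(int *p)
{
    if (!p)
        return;
    assert(live_allocs > 0);
    free(p);
    live_allocs--;
}

/* Buggy shape: the callee owns 'roots' but returns early without freeing. */
static int account_extent_buggy(int quota_enabled, int *roots)
{
    if (!quota_enabled)
        return 0;           /* 'roots' is leaked on this path */
    /* ... accounting work would go here ... */
    list_free(roots);
    return 0;
}

/* Fixed shape: every exit funnels through the cleanup label. */
static int account_extent_fixed(int quota_enabled, int *roots)
{
    int ret = 0;

    if (!quota_enabled)
        goto out_free;
    /* ... accounting work would go here ... */
out_free:
    list_free(roots);
    return ret;
}
```

Routing the disabled-quota case through the same label as the normal path means the cleanup cannot be skipped by a later early exit either.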
2020-01-23 | btrfs: do not delete mismatched root refs | Josef Bacik
commit 423a716cd7be16fb08690760691befe3be97d3fc upstream. btrfs_del_root_ref() will simply WARN_ON() if the ref doesn't match in any way, and then continue to delete the reference. This shouldn't happen, we have these values because there's more to the reference than the original root and the sub root. If any of these checks fail, return -ENOENT. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 | btrfs: fix invalid removal of root ref | Josef Bacik
commit d49d3287e74ffe55ae7430d1e795e5f9bf7359ea upstream.

If we have the following sequence of events

  btrfs sub create A
  btrfs sub create A/B
  btrfs sub snap A C
  mkdir C/foo
  mv A/B C/foo
  rm -rf *

We will end up with a transaction abort.

The reason for this is because we create a root ref for B pointing to A. When we create a snapshot of C we still have B in our tree, but because the root ref points to A and not C we will make it appear to be empty.

The problem happens when we move B into C. This removes the root ref for B pointing to A and adds a ref of B pointing to C. When we rmdir C we'll see that we have a ref to our root and remove the root ref, despite not actually matching our reference name.

Now btrfs_del_root_ref() allowing this to work is a bug as well, however we know that this inode does not actually point to a root ref in the first place, so we shouldn't be calling btrfs_del_root_ref() in the first place and instead simply look up our dir index for this item and do the rest of the removal.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 | btrfs: rework arguments of btrfs_unlink_subvol | Josef Bacik
[ Upstream commit 045d3967b6920b663fc010ad414ade1b24143bd1 ] btrfs_unlink_subvol takes the name of the dentry and the root objectid based on what kind of inode this is, either a real subvolume link or an empty one that we inherited as a snapshot. We need to fix how we unlink in the case for BTRFS_EMPTY_SUBVOL_DIR_OBJECTID in the future, so rework btrfs_unlink_subvol to just take the dentry and handle getting the right objectid given the type of inode this is. There is no functional change here, simply pushing the work into btrfs_unlink_subvol() proper. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-17 | ocfs2: call journal flush to mark journal as empty after journal recovery when mount | Kai Li
[ Upstream commit 397eac17f86f404f5ba31d8c3e39ec3124b39fd3 ]

If the journal is dirty at mount time, it will be replayed, but the jbd2 sb log tail cannot be updated to mark a new start because journal->j_flag has already been set with JBD2_ABORT first in journal_init_common. When a new transaction is committed, it will be recorded in block 1 first (journal->j_tail is set to 1 in journal_reset). If an emergency restart happens again before the journal super block is updated, the newly recorded transaction will not be replayed at the next mount.

The following steps describe this procedure in detail.

1. mount and touch some files
2. these transactions are committed to journal area but not checkpointed
3. emergency restart
4. mount again and its journals are replayed
5. journal super block's first s_start is 1, but its s_seq is not updated
6. touch a new file and its trans is committed but not checkpointed
7. emergency restart again
8. mount and journal is dirty, but trans committed in 6 will not be replayed.

This exception happens easily when this lun is used by only one node. If it is used by multiple nodes, another node will replay its journal and its journal super block will be updated after recovery, like what this patch does: ocfs2_recover_node->ocfs2_replay_journal.

The following jbd2 journal can be generated by touching a new file after the journal is replayed; seq 15 is the first valid commit, but the first seq is 13 in the journal super block.

logdump:
Block 0: Journal Superblock
  Seq: 0  Type: 4 (JBD2_SUPERBLOCK_V2)
  Blocksize: 4096  Total Blocks: 32768  First Block: 1
  First Commit ID: 13  Start Log Blknum: 1
  Error: 0
  Feature Compat: 0
  Feature Incompat: 2 block64
  Feature RO compat: 0
  Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
  FS Share Cnt: 1  Dynamic Superblk Blknum: 0
  Per Txn Block Limit  Journal: 0  Data: 0

Block 1: Journal Commit Block
  Seq: 14  Type: 2 (JBD2_COMMIT_BLOCK)

Block 2: Journal Descriptor
  Seq: 15  Type: 1 (JBD2_DESCRIPTOR_BLOCK)
  No. Blocknum Flags
  0. 587 none
  UUID: 00000000000000000000000000000000
  1. 8257792 JBD2_FLAG_SAME_UUID
  2. 619 JBD2_FLAG_SAME_UUID
  3. 24772864 JBD2_FLAG_SAME_UUID
  4. 8257802 JBD2_FLAG_SAME_UUID
  5. 513 JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
...
Block 7: Inode
  Inode: 8257802  Mode: 0640  Generation: 57157641 (0x3682809)
  FS Generation: 2839773110 (0xa9437fb6)
  CRC32: 00000000  ECC: 0000
  Type: Regular  Attr: 0x0  Flags: Valid
  Dynamic Features: (0x1) InlineData
  User: 0 (root)  Group: 0 (root)  Size: 7
  Links: 1  Clusters: 0
  ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
  atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019
  mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
  dtime: 0x0 -- Thu Jan 1 08:00:00 1970
...
Block 9: Journal Commit Block
  Seq: 15  Type: 2 (JBD2_COMMIT_BLOCK)

The following is the journal recovery log when recovering the above jbd2 journal at the next mount.

syslog:
  ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
  fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

Because the first commit seq 13 recorded in the journal super block is not consistent with the value recorded in block 1 (seq is 14), journal recovery will be terminated before seq 15 even though it is an unbroken commit; inode 8257802 is a new file and it will be lost.

Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
Signed-off-by: Kai Li <li.kai4@h3c.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Changwei Ge <gechangwei@live.cn>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-17 | f2fs: fix potential overflow | Chao Yu
commit 1f0d5c911b64165c9754139a26c8c2fad352c132 upstream. We expect a 64-bit result from the statement below; however, on a 32-bit machine, the left shift operation on a pgoff_t type variable may overflow. Fix it by forcing a type cast. page->index << PAGE_SHIFT; Fixes: 26de9b117130 ("f2fs: avoid unnecessary updating inode during fsync") Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
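The overflow can be reproduced on any machine by modelling the 32-bit page index with uint32_t. The type pgoff32_t, the PAGE_SHIFT value of 12 and both function names are assumptions made for this sketch; the cast shown is one way to widen the operand and is not claimed to be the exact cast used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12               /* assumption: 4 KiB pages */

typedef uint32_t pgoff32_t;         /* models pgoff_t on a 32-bit machine */

static_assert(sizeof(pgoff32_t) == 4, "pgoff32_t models a 32-bit pgoff_t");

/* Shift performed in 32 bits: high bits are lost before widening. */
static uint64_t offset_truncated(pgoff32_t index)
{
    return index << PAGE_SHIFT;
}

/* Widen first, then shift: the full 64-bit byte offset survives. */
static uint64_t offset_widened(pgoff32_t index)
{
    return (uint64_t)index << PAGE_SHIFT;
}
```

For page index 0x100000 (the page at the 4 GiB boundary), the truncated version wraps to 0 while the widened version yields 0x100000000.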
2020-01-17 | NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn | Trond Myklebust
commit 5326de9e94bedcf7366e7e7625d4deb8c1f1ca8a upstream. If nfs4_delegreturn_prepare needs to wait for a layoutreturn to complete then make sure we drop the sequence slot if we hold it. Fixes: 1c5bd76d17cc ("pNFS: Enable layoutreturn operation for return-on-close") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17  NFSv2: Fix a typo in encode_sattr()  Trond Myklebust
commit ad97a995d8edff820d4238bd0dfc69f440031ae6 upstream.

Encode the mtime correctly.

Fixes: 95582b0083883 ("vfs: change inode times to use struct timespec64")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17  btrfs: simplify inode locking for RWF_NOWAIT  Goldwyn Rodrigues
commit 9cf35f673583ccc9f3e2507498b3079d56614ad3 upstream.

This is similar to 942491c9e6d6 ("xfs: fix AIM7 regression"). Apparently our current rwsem code doesn't like the "trylock, then lock for real" scheme. This causes extra contention on the lock and can be measured e.g. by the AIM7 benchmark. So change our read/write methods to just do the trylock for the RWF_NOWAIT case.

Fixes: edf064e7c6fe ("btrfs: nowait aio support")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17  afs: Fix missing cell comparison in afs_test_super()  David Howells
commit 106bc79843c3c6f4f00753d1f46e54e815f99377 upstream.

Fix missing cell comparison in afs_test_super(). Without this, any pair of volumes that have the same volume ID will share a superblock, no matter the cell, unless they're in different network namespaces.

Normally, most users will only deal with a single cell and so they won't see this. Even if they do look into a second cell, they won't see a problem unless they happen to hit a volume with the same ID as one they've already got mounted.

Before the patch:

    # ls /afs/grand.central.org/archive
    linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/
    # ls /afs/kth.se/
    linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/
    # cat /proc/mounts | grep afs
    none /afs afs rw,relatime,dyn,autocell 0 0
    #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/kth.se afs ro,relatime 0 0

After the patch:

    # ls /afs/grand.central.org/archive
    linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/
    # ls /afs/kth.se/
    admin/ common/ install/ OldFiles/ service/ system/
    bakrestores/ home/ misc/ pkg/ src/ wsadmin/
    # cat /proc/mounts | grep afs
    none /afs afs rw,relatime,dyn,autocell 0 0
    #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0
    #kth.se:root.cell /afs/kth.se afs ro,relatime 0 0

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Carsten Jacobi <jacobi@de.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
cc: Todd DeSantis <atd@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
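The effect of the missing comparison can be illustrated with a small user-space sketch. The structure and function below are hypothetical and only model the idea that a superblock may be shared when the network namespace, the cell and the volume ID all match. Cells are compared by name here purely for illustration, and the volume ID used in any example is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key identifying a mounted volume. */
struct sketch_super_key {
	const void *net_ns;  /* network namespace identity */
	const char *cell;    /* cell name, e.g. "kth.se" */
	uint32_t volume_id;  /* volume ID, only unique within one cell */
};

/* Two mounts may share a superblock only if all three fields match.
 * Dropping the cell comparison makes equal volume IDs from different
 * cells look like the same volume. */
static bool sketch_same_super(const struct sketch_super_key *a,
			      const struct sketch_super_key *b)
{
	assert(a != NULL && b != NULL);
	assert(a->cell != NULL && b->cell != NULL);

	return a->net_ns == b->net_ns &&
	       strcmp(a->cell, b->cell) == 0 &&
	       a->volume_id == b->volume_id;
}
```

In the "before" listing above, /afs/kth.se shows the contents of a grand.central.org volume, which is what a match on volume ID alone would produce.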
2020-01-17  cifs: Adjust indentation in smb2_open_file  Nathan Chancellor
commit 7935799e041ae10d380d04ea23868240f082bd11 upstream.

Clang warns:

../fs/cifs/smb2file.c:70:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation]
         if (oparms->tcon->use_resilient) {
         ^
../fs/cifs/smb2file.c:66:2: note: previous statement is here
        if (rc)
        ^
1 warning generated.

This warning occurs because there is a space after the tab on this line. Remove it so that the indentation is consistent with the Linux kernel coding style and clang no longer warns.

Fixes: 592fafe644bf ("Add resilienthandles mount parm")
Link: https://github.com/ClangBuiltLinux/linux/issues/826
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17  f2fs: check if file namelen exceeds max value  Sheng Yong
commit 720db068634c91553a8e1d9a0fcd8c7050e06d2b upstream.

Dentry bitmap is not enough to detect incorrect dentries. So this patch also checks the namelen value of a dentry.

Signed-off-by: Gong Chen <gongchen4@huawei.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17  f2fs: check memory boundary by insane namelen  Jaegeuk Kim
commit 4e240d1bab1ead280ddf5eb05058dba6bbd57d10 upstream.

If namelen is corrupted to have a very long value, fill_dentries can copy from the wrong memory area.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
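The two namelen checks above guard against trusting a length field read from disk. A minimal user-space sketch of that kind of validation follows. It does not reproduce the f2fs checks: `sketch_copy_name`, its parameters and `SKETCH_NAME_MAX` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_MAX 255 /* assumption: upper bound on a file name length */

/* Copy a name of name_len bytes located at name_off inside an on-disk block.
 * Returns 0 on success, -1 if the length field cannot be trusted. */
static int sketch_copy_name(char *dst, size_t dst_size,
			    const char *block, size_t block_size,
			    size_t name_off, size_t name_len)
{
	assert(dst != NULL && block != NULL);

	/* Reject lengths outside the format's limits. */
	if (name_len == 0 || name_len > SKETCH_NAME_MAX)
		return -1;
	/* Reject names that would run past the end of the block. */
	if (name_off > block_size || name_len > block_size - name_off)
		return -1;
	/* Leave room for the terminating NUL in the destination. */
	if (name_len >= dst_size)
		return -1;

	memcpy(dst, block + name_off, name_len);
	dst[name_len] = '\0';
	return 0;
}
```

The bounds test is written as `name_len > block_size - name_off` rather than `name_off + name_len > block_size` so that a huge corrupted length cannot wrap the addition.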
2020-01-17  f2fs: Move err variable to function scope in f2fs_fill_dentries()  Ben Hutchings
This is preparation for the following backported fixes. It was done upstream as part of commit e1293bdfa01d "f2fs: plug readahead IO in readdir()", the rest of which does not seem suitable for stable.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14  chardev: Avoid potential use-after-free in 'chrdev_open()'  Will Deacon
commit 68faa679b8be1a74e6663c21c3a9d25d32f1c079 upstream. 'chrdev_open()' calls 'cdev_get()' to obtain a reference to the 'struct cdev *' stashed in the 'i_cdev' field of the target inode structure. If the pointer is NULL, then it is initialised lazily by looking up the kobject in the 'cdev_map' and so the whole procedure is protected by the 'cdev_lock' spinlock to serialise initialisation of the shared pointer. Unfortunately, it is possible for the initialising thread to fail *after* installing the new pointer, for example if the subsequent '->open()' call on the file fails. In this case, 'cdev_put()' is called, the reference count on the kobject is dropped and, if nobody else has taken a reference, the release function is called which finally clears 'inode->i_cdev' from 'cdev_purge()' before potentially freeing the object. The problem here is that a racing thread can happily take the 'cdev_lock' and see the non-NULL pointer in the inode, which can result in a refcount increment from zero and a warning: | ------------[ cut here ]------------ | refcount_t: addition on 0; use-after-free. 
| WARNING: CPU: 2 PID: 6385 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0 | Modules linked in: | CPU: 2 PID: 6385 Comm: repro Not tainted 5.5.0-rc2+ #22 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 | RIP: 0010:refcount_warn_saturate+0x6d/0xf0 | Code: 05 55 9a 15 01 01 e8 9d aa c8 ff 0f 0b c3 80 3d 45 9a 15 01 00 75 ce 48 c7 c7 00 9c 62 b3 c6 08 | RSP: 0018:ffffb524c1b9bc70 EFLAGS: 00010282 | RAX: 0000000000000000 RBX: ffff9e9da1f71390 RCX: 0000000000000000 | RDX: ffff9e9dbbd27618 RSI: ffff9e9dbbd18798 RDI: ffff9e9dbbd18798 | RBP: 0000000000000000 R08: 000000000000095f R09: 0000000000000039 | R10: 0000000000000000 R11: ffffb524c1b9bb20 R12: ffff9e9da1e8c700 | R13: ffffffffb25ee8b0 R14: 0000000000000000 R15: ffff9e9da1e8c700 | FS: 00007f3b87d26700(0000) GS:ffff9e9dbbd00000(0000) knlGS:0000000000000000 | CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 | CR2: 00007fc16909c000 CR3: 000000012df9c000 CR4: 00000000000006e0 | DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 | DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 | Call Trace: | kobject_get+0x5c/0x60 | cdev_get+0x2b/0x60 | chrdev_open+0x55/0x220 | ? cdev_put.part.3+0x20/0x20 | do_dentry_open+0x13a/0x390 | path_openat+0x2c8/0x1470 | do_filp_open+0x93/0x100 | ? 
selinux_file_ioctl+0x17f/0x220 | do_sys_open+0x186/0x220 | do_syscall_64+0x48/0x150 | entry_SYSCALL_64_after_hwframe+0x44/0xa9 | RIP: 0033:0x7f3b87efcd0e | Code: 89 54 24 08 e8 a3 f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f4 | RSP: 002b:00007f3b87d259f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 | RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b87efcd0e | RDX: 0000000000000000 RSI: 00007f3b87d25a80 RDI: 00000000ffffff9c | RBP: 00007f3b87d25e90 R08: 0000000000000000 R09: 0000000000000000 | R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe188f504e | R13: 00007ffe188f504f R14: 00007f3b87d26700 R15: 0000000000000000 | ---[ end trace 24f53ca58db8180a ]--- Since 'cdev_get()' can already fail to obtain a reference, simply move it over to use 'kobject_get_unless_zero()' instead of 'kobject_get()', which will cause the racing thread to return -ENXIO if the initialising thread fails unexpectedly. Cc: Hillf Danton <hdanton@sina.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: syzbot+82defefbbd8527e1c2cb@syzkaller.appspotmail.com Signed-off-by: Will Deacon <will@kernel.org> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191219120203.32691-1-will@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
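The "get unless zero" idea that the fix relies on can be sketched with C11 atomics. This is a user-space illustration, not the kernel's kobject or refcount implementation, and `sketch_get_unless_zero` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Take a reference only if the object is still alive (count > 0).
 * A count of zero means the release path has already started, so the
 * caller must not resurrect the object and has to fail instead. */
static bool sketch_get_unless_zero(atomic_int *refs)
{
	int old;

	assert(refs != NULL);

	old = atomic_load(refs);
	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	}
	return false;
}
```

A plain increment would turn 0 into 1 and hand out a pointer to an object that is about to be freed, which is the situation the refcount warning in the trace reports.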
2020-01-12  fs: avoid softlockups in s_inodes iterators  Eric Sandeen
[ Upstream commit 04646aebd30b99f2cfa0182435a2ec252fcb16d0 ]

Anything that walks all inodes on sb->s_inodes list without rescheduling risks softlockups.

Previous efforts were made in 2 functions, see:

c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes

but there hasn't been an audit of all walkers, so do that now. This also consistently moves the cond_resched() calls to the bottom of each loop in cases where it already exists.

One loop remains: remove_dquot_ref(), because I'm not quite sure how to deal with that one w/o taking the i_lock.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12  btrfs: Fix error messages in qgroup_rescan_init  Nikolay Borisov
[ Upstream commit 37d02592f11bb76e4ab1dcaa5b8a2a0715403207 ]

The branch of qgroup_rescan_init which is executed from the mount path prints the wrong error messages. The messages printed when BTRFS_QGROUP_STATUS_FLAG_RESCAN and BTRFS_QGROUP_STATUS_FLAG_ON are not set are transposed. Fix it by exchanging their places.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09  ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps  Zhihao Cheng
[ Upstream commit 6abf57262166b4f4294667fb5206ae7ba1ba96f5 ] Running stress-test test_2 in mtd-utils on ubi device, sometimes we can get following oops message: BUG: unable to handle page fault for address: ffffffff00000140 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 280a067 P4D 280a067 PUD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 60 Comm: kworker/u16:1 Kdump: loaded Not tainted 5.2.0 #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0 -0-ga698c8995f-prebuilt.qemu.org 04/01/2014 Workqueue: writeback wb_workfn (flush-ubifs_0_0) RIP: 0010:rb_next_postorder+0x2e/0xb0 Code: 80 db 03 01 48 85 ff 0f 84 97 00 00 00 48 8b 17 48 83 05 bc 80 db 03 01 48 83 e2 fc 0f 84 82 00 00 00 48 83 05 b2 80 db 03 01 <48> 3b 7a 10 48 89 d0 74 02 f3 c3 48 8b 52 08 48 83 05 a3 80 db 03 RSP: 0018:ffffc90000887758 EFLAGS: 00010202 RAX: ffff888129ae4700 RBX: ffff888138b08400 RCX: 0000000080800001 RDX: ffffffff00000130 RSI: 0000000080800024 RDI: ffff888138b08400 RBP: ffff888138b08400 R08: ffffea0004a6b920 R09: 0000000000000000 R10: ffffc90000887740 R11: 0000000000000001 R12: ffff888128d48000 R13: 0000000000000800 R14: 000000000000011e R15: 00000000000007c8 FS: 0000000000000000(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff00000140 CR3: 000000013789d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: destroy_old_idx+0x5d/0xa0 [ubifs] ubifs_tnc_start_commit+0x4fe/0x1380 [ubifs] do_commit+0x3eb/0x830 [ubifs] ubifs_run_commit+0xdc/0x1c0 [ubifs] Above Oops are due to the slab-out-of-bounds happened in do-while of function layout_in_gaps indirectly called by ubifs_tnc_start_commit. 
In function layout_in_gaps, there is a do-while loop that places index nodes into the gaps created by obsolete index nodes in non-empty index LEBs until the remaining index nodes can all be placed into pre-allocated empty LEBs. @c->gap_lebs points to a memory area (an integer array) which records the LEB numbers used by the 'in-the-gaps' method. Whenever a suitable index LEB is found, the corresponding lnum is appended to the memory area pointed to by @c->gap_lebs. The memory area, of size ((@c->lst.idx_lebs + 1) * sizeof(int)), is allocated before the do-while loop and cannot be changed in the loop. But @c->lst.idx_lebs can be increased by function ubifs_change_lp (called by layout_leb_in_gaps->ubifs_find_dirty_idx_leb->get_idx_gc_leb) during the loop. So an out-of-bounds access sometimes happens when the number of iterations of the do-while loop exceeds the original value of @c->lst.idx_lebs. See details in https://bugzilla.kernel.org/show_bug.cgi?id=204229.

This patch fixes the out-of-bounds access in layout_in_gaps.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
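The failure mode described above is an array sized from a counter that later grows. One way to guard such a pattern is sketched below in user space. It is an assumption-laden illustration rather than the UBIFS patch: `struct sketch_gaps`, `sketch_record_leb` and the `live_count` parameter (standing in for the current value of the counter) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of the LEB numbers used so far. */
struct sketch_gaps {
	int *lebs;    /* array of LEB numbers */
	size_t cap;   /* capacity fixed at the last (re)allocation */
	size_t used;  /* entries written so far */
};

/* Append lnum. If the array is full, enlarge it only when the live counter
 * has grown since the last allocation; otherwise report failure instead of
 * writing past the end. Returns 0 on success, -1 on failure. */
static int sketch_record_leb(struct sketch_gaps *g, int lnum, size_t live_count)
{
	assert(g != NULL);

	if (g->used == g->cap) {
		size_t new_cap = live_count + 1;
		int *tmp;

		if (new_cap <= g->cap)
			return -1; /* counter did not grow: no room left */
		tmp = realloc(g->lebs, new_cap * sizeof(*tmp));
		if (tmp == NULL)
			return -1;
		g->lebs = tmp;
		g->cap = new_cap;
	}
	g->lebs[g->used++] = lnum;
	return 0;
}
```

The key point is that the write index is always compared against the capacity recorded at allocation time, never against the counter that may have changed since.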
2020-01-09  xfs: periodically yield scrub threads to the scheduler  Darrick J. Wong
[ Upstream commit 5d1116d4c6af3e580f1ed0382ca5a94bd65a34cf ] Christoph Hellwig complained about the following soft lockup warning when running scrub after generic/175 when preemption is disabled and slub debugging is enabled: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [xfs_scrub:161] Modules linked in: irq event stamp: 41692326 hardirqs last enabled at (41692325): [<ffffffff8232c3b7>] _raw_0 hardirqs last disabled at (41692326): [<ffffffff81001c5a>] trace0 softirqs last enabled at (41684994): [<ffffffff8260031f>] __do_e softirqs last disabled at (41684987): [<ffffffff81127d8c>] irq_e0 CPU: 3 PID: 16189 Comm: xfs_scrub Not tainted 5.4.0-rc3+ #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.124 RIP: 0010:_raw_spin_unlock_irqrestore+0x39/0x40 Code: 89 f3 be 01 00 00 00 e8 d5 3a e5 fe 48 89 ef e8 ed 87 e5 f2 RSP: 0018:ffffc9000233f970 EFLAGS: 00000286 ORIG_RAX: ffffffffff3 RAX: ffff88813b398040 RBX: 0000000000000286 RCX: 0000000000000006 RDX: 0000000000000006 RSI: ffff88813b3988c0 RDI: ffff88813b398040 RBP: ffff888137958640 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00042b0c00 R13: 0000000000000001 R14: ffff88810ac32308 R15: ffff8881376fc040 FS: 00007f6113dea700(0000) GS:ffff88813bb80000(0000) knlGS:00000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6113de8ff8 CR3: 000000012f290000 CR4: 00000000000006e0 Call Trace: free_debug_processing+0x1dd/0x240 __slab_free+0x231/0x410 kmem_cache_free+0x30e/0x360 xchk_ag_btcur_free+0x76/0xb0 xchk_ag_free+0x10/0x80 xchk_bmap_iextent_xref.isra.14+0xd9/0x120 xchk_bmap_iextent+0x187/0x210 xchk_bmap+0x2e0/0x3b0 xfs_scrub_metadata+0x2e7/0x500 xfs_ioc_scrub_metadata+0x4a/0xa0 xfs_file_ioctl+0x58a/0xcd0 do_vfs_ioctl+0xa0/0x6f0 ksys_ioctl+0x5b/0x90 __x64_sys_ioctl+0x11/0x20 do_syscall_64+0x4b/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe If preemption is disabled, all metadata buffers needed to perform the scrub are already in memory, and there 
are a lot of records to check, it's possible that the scrub thread will run for an extended period of time without sleeping for IO or any other reason. Then the watchdog timer or the RCU stall timeout can trigger, producing the backtrace above.

To fix this problem, call cond_resched() from the scrub thread so that we back out to the scheduler whenever necessary.

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09  bdev: Refresh bdev size for disks without partitioning  Jan Kara
commit cba22d86e0a10b7070d2e6a7379dbea51aa0883c upstream.

Currently, block device size is not updated on the second and further opens for block devices where partition scan is disabled. This is particularly annoying for example for DVD drives, as it means the block device size does not get updated once the media is inserted into a drive if the device is already open when inserting the media. This is actually always the case for example when pktcdvd is in use.

Fix the problem by revalidating block device size on every open even for devices with partition scan disabled.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09  bdev: Factor out bdev revalidation into a common helper  Jan Kara
commit 731dc4868311ee097757b8746eaa1b4f8b2b4f1c upstream.

Factor out code handling revalidation of bdev on disk change into a common helper.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09  fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP  Al Viro
commit 6b2daec19094a90435abe67d16fb43b1a5527254 upstream.

Unlike FICLONE, all of those take a pointer argument; they do need compat_ptr() applied to arg.

Fixes: d79bdd52d8be ("vfs: wire up compat ioctl for CLONE/CLONE_RANGE")
Fixes: 54dbc1517237 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
Fixes: ceac204e1da9 ("fs: make fiemap work from compat_ioctl")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09  xfs: don't check for AG deadlock for realtime files in bunmapi  Omar Sandoval
commit 69ffe5960df16938bccfe1b65382af0b3de51265 upstream.

Commit 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi") added a check in __xfs_bunmapi() to stop early if we would touch multiple AGs in the wrong order. However, this check isn't applicable for realtime files. In most cases, it just makes us do unnecessary commits. However, without the fix from the previous commit ("xfs: fix realtime file data space leak"), if the last and second-to-last extents also happen to have different "AG numbers", then the break actually causes __xfs_bunmapi() to return without making any progress, which sends xfs_itruncate_extents_flags() into an infinite loop.

Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09  nfsd4: fix up replay_matches_cache()  Scott Mayhew
commit 6e73e92b155c868ff7fce9d108839668caf1d9be upstream.

When running an nfs stress test, I see quite a few cached replies that don't match up with the actual request. The first comment in replay_matches_cache() makes sense, but the code doesn't seem to match... fix it.

This isn't exactly a bugfix, as the server isn't required to catch every case of a false retry. So, we may as well do this, but if this is fixing a problem then that suggests there's a client bug.

Fixes: 53da6a53e1d4 ("nfsd4: catch some false session retries")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09  locks: print unsigned ino in /proc/locks  Amir Goldstein
commit 98ca480a8f22fdbd768e3dad07024c8d4856576c upstream.

An ino is unsigned, so display it as such in /proc/locks.

Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
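The difference a signed versus unsigned conversion makes for large inode numbers can be seen in a short user-space sketch. The two helpers below are hypothetical and only contrast the `%ld` and `%lu` conversions. They do not reproduce the /proc/locks line format.

```c
#include <assert.h>
#include <stdio.h>

/* Signed formatting: an inode number with the top bit set is converted to
 * long first, which on common platforms yields a negative value. */
static int sketch_format_ino_signed(char *buf, size_t len, unsigned long ino)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%ld", (long)ino);
}

/* Unsigned formatting: the inode number is shown as it is stored. */
static int sketch_format_ino_unsigned(char *buf, size_t len, unsigned long ino)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%lu", ino);
}
```

Only inode numbers above LONG_MAX are affected, so the two forms produce identical text for most files.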