aboutsummaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)Author
2016-08-04nfsd: remove some dead code in nfsd_create_locked()Dan Carpenter
We changed this around in f135af1041f ('nfsd: reorganize nfsd_create') so "dchild" can't be an error pointer any more. Also, dchild can't be NULL here (and dput would already handle this even if it was). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: drop unnecessary MAY_EXEC check from createJ. Bruce Fields
We need an fh_verify to make sure we at least have a dentry, but actual permission checks happen later. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: clean up bad-type check in nfsd_create_lockedJ. Bruce Fields
Minor cleanup, no change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: remove unnecessary positive-dentry checkJ. Bruce Fields
vfs_{create,mkdir,mknod} each begin with a call to may_create(), which returns EEXIST if the object already exists. This check is therefore unnecessary. (In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to RFC 1094, our code seems to believe that a CREATE of an existing file should succeed. I'm leaving that behavior alone.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: reorganize nfsd_createJ. Bruce Fields
There's some odd logic in nfsd_create() that allows it to be called with the parent directory either locked or unlocked. The only already-locked caller is NFSv2's nfsd_proc_create(). It's less confusing to split out the unlocked case into a separate function which the NFSv2 code can call directly. Also fix some comments while we're here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: check d_can_lookup in fh_verify of directoriesJ. Bruce Fields
Create and other nfsd ops generally assume we can call lookup_one_len on inodes with S_IFDIR set. Al says that this assumption isn't true in general, though it should be for the filesystem objects nfsd sees. Add a check just to make sure our assumption isn't violated. Remove a couple checks for i_op->lookup in create code. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: remove redundant zero-length check from createJ. Bruce Fields
lookup_one_len already has this check. The only effect of this patch is to return access instead of perm in the 0-length-filename case. I actually prefer nfserr_perm (or _inval?), but I doubt anyone cares. The isdotent check seems redundant too, but I worry that some client might actually care about that strange nfserr_exist error. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: Make creates return EEXIST instead of EACCESOleg Drokin
When doing a create (mkdir/mknod) on a name, it's worth checking the name exists first before returning EACCES in case the directory is not writeable by the user. This makes return values on the client more consistent regardless of whenever the entry there is cached in the local cache or not. Another positive side effect is certain programs only expect EEXIST in that case even despite POSIX allowing any valid error to be returned. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04mm/block: convert rw_page users to bio op useMike Christie
The rw_page users were not converted to use bio/req ops. As a result bdev_write_page is not passing down REQ_OP_WRITE and the IOs will be sent down as reads. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code") Modified by me to: 1) Drop op_flags passing into ->rw_page(), as we don't use it. 2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04Fixup direct bi_rw modifiersShaun Tancheff
bi_rw should be using bio_set_op_attrs to set bi_rw. Signed-off-by: Shaun Tancheff <shaun@tancheff.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04f2fs: drop bio->bi_rw manual assignmentJens Axboe
Merge 4fc29c1aa375 included this extra line, but it's not needed (or useful) since we'll bio_set_op_attrs() right after to properly set the op and flags for the bio. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04block: add missing group association in bio-cloning functionsPaolo Valente
When a bio is cloned, the newly created bio must be associated with the same blkcg as the original bio (if BLK_CGROUP is enabled). If this operation is not performed, then the new bio is not associated with any group, and the group of the current task is returned when the group of the bio is requested. Depending on the cloning frequency, this may cause a large percentage of the bios belonging to a given group to be treated as if belonging to other groups (in most cases as if belonging to the root group). The expected group isolation may thereby be broken. This commit adds the missing association in bio-cloning functions. Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Nikolay Borisov <kernel@kyup.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04writeback: Write dirty times for WB_SYNC_ALL writebackJan Kara
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and queue_io() so that inodes which have only dirty timestamps are properly written on fsync(2) and sync(2). However there are other call sites - most notably going through write_inode_now() - which expect inode to be clean after WB_SYNC_ALL writeback. This is not currently true as we do not clear I_DIRTY_TIME in __writeback_single_inode() even for WB_SYNC_ALL writeback in all the cases. This then resulted in the following oops because bdev_write_inode() did not clean the inode and writeback code later stumbled over a dirty inode with detached wb. general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-11:0) task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000 RIP: 0010:[<ffffffff818884d2>] [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750 RSP: 0018:ffff88006cdaf7d0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050 RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286 RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000 R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0 R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40 FS: 0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0 DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Stack: ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948 ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948 ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e Call Trace: [< inline >] inode_to_wb_and_lock_list fs/fs-writeback.c:309 [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554 [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600 [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709 [< inline >] wb_do_writeback fs/fs-writeback.c:1844 [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884 [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094 [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228 [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303 [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392 Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e 00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3 RIP [< inline >] wb_get include/linux/backing-dev-defs.h:212 RIP [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750 fs/fs-writeback.c:281 RSP <ffff88006cdaf7d0> ---[ end trace 986a4d314dcb2694 ]--- Fix the problem by making sure __writeback_single_inode() writes inode only with dirty times in WB_SYNC_ALL mode. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04block: remove BLK_DEV_DAX config optionRoss Zwisler
The functionality for block device DAX was already removed with commit acc93d30d7d4 ("Revert "block: enable dax for raw block devices"") However, we still had a config option hanging around that was always disabled because it depended on CONFIG_BROKEN. This config option was introduced in commit 03cdadb04077 ("block: disable block device DAX by default") This change reverts that commit, removing the dead config option. Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()Dan Carpenter
We can't pass error pointers to kfree() or it causes an oops. Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-08-03Btrfs: fix __MAX_CSUM_ITEMSChris Mason
Jeff Mahoney's cleanup commit (14a1e067b4) wasn't correct for csums on machines where the pagesize >= metadata blocksize. This just reverts the relevant hunks to bring the old math back. Signed-off-by: Chris Mason <clm@fb.com>
2016-08-03cachefiles: Fix race between inactivating and culling a cache objectDavid Howells
There's a race between cachefiles_mark_object_inactive() and cachefiles_cull(): (1) cachefiles_cull() can't delete a backing file until the cache object is marked inactive, but as soon as that's the case it's fair game. (2) cachefiles_mark_object_inactive() marks the object as being inactive and *only then* reads the i_blocks on the backing inode - but cachefiles_cull() might've managed to delete it by this point. Fix this by making sure cachefiles_mark_object_inactive() gets any data it needs from the backing inode before deactivating the object. Without this, the following oops may occur: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles] ... CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G I ------------ 3.10.0-470.el7.x86_64 #1 Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011 Workqueue: fscache_object fscache_object_work_func [fscache] task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000 RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles] RSP: 0018:ffff8800b77c3d70 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034 RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8 RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000 R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600 R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498 FS: 0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0 ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658 ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600 Call Trace: [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles] [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache] [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache] [<ffffffff810a605b>] process_one_work+0x17b/0x470 [<ffffffff810a6e96>] worker_thread+0x126/0x410 [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460 [<ffffffff810ae64f>] kthread+0xcf/0xe0 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81695418>] ret_from_fork+0x58/0x90 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140 The oopsing code shows: callq 0xffffffff810af6a0 <wake_up_bit> mov 0xf8(%r12),%rax mov 0x30(%rax),%rax mov 0x98(%rax),%rax <---- oops here lock add %rax,0x130(%rbx) where this is: d_backing_inode(object->dentry)->i_blocks Fixes: a5b3a80b899bda0f456f1246c4c5a1191ea01519 (CacheFiles: Provide read-and-reset release counters for cachefilesd) Reported-by: Jianhong Yin <jiyin@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Steve Dickson <steved@redhat.com> cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-03Merge branch 'for-viro' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-linus
2016-08-03fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2Geert Uytterhoeven
With gcc < 4.2 (e.g. 4.1.2): CC fs/proc/task_mmu.o cc1: error: unrecognized command line option "-Wno-override-init" To fix this, only enable the compiler option when it is actually supported by the compiler. Fixes: ca52953f5f24 ("fs/proc/task_mmu.c: suppress compilation warnings with W=1") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-039p: use clone_fid()Al Viro
in a bunch of places it cleans the things up Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-039p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"Al Viro
In v9fs_vfs_rename() we need to clone the parents' fids, not just find them. Spotted-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-03vfs: make dentry_needs_remove_privs() internalMiklos Szeredi
Only used by the vfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-08-03vfs: fix deadlock in file_remove_privs() on overlayfsMiklos Szeredi
file_remove_privs() is called with inode lock on file_inode(), which proceeds to calling notify_change() on file->f_path.dentry. Which triggers the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later when ovl_setattr tries to lock the underlying inode again. Fix this mess by not mixing the layers, but doing everything on underlying dentry/inode. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode") Cc: <stable@vger.kernel.org>
2016-08-03Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve()Filipe Manana
No longer used as of commit 5846a3c26873 ("btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans"). Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-08-03xfs: move (and rename) the deferred bmap-free tracepointsDarrick J. Wong
Rename the deferred bmap-free to extent_free and make them only trigger when we're really running deferred ops. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: collapse single use static functionsDarrick J. Wong
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: remove unnecessary parentheses from log redo item recovery functionsDarrick J. Wong
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: remove the extents array from the rmap update done log itemDarrick J. Wong
Nothing ever uses the extent array in the rmap update done redo item, so remove it before it is fixed in the on-disk log format. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: in btree_lshift, only allocate temporary cursor when neededDarrick J. Wong
We only need the temporary cursor in _btree_lshift if we're shifting in an overlapped btree. Therefore, factor that into a single block of code so we avoid unnecessary cursor duplication. Also fix use of the wrong cursor when checking for corruption in xfs_btree_rshift(). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: remove unnecesary lshift/rshift key initializationDarrick J. Wong
In the lshift/rshift functions we don't use the key variable for anything now, so remove the variable and its initializer. The update_keys functions figure out the key for a block on their own. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: remove the get*keys and update_keys btree ops pointersDarrick J. Wong
These are internal btree functions; we don't need them to be dispatched via function pointers. Make them static again and just check the overlapped flag to figure out what we need to do. The strategy behind this patch was suggested by Christoph. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: enable the rmap btree functionalityDarrick J. Wong
Originally-From: Dave Chinner <dchinner@redhat.com> Add the feature flag to the supported matrix so that the kernel can mount and use rmap btree enabled filesystems Signed-off-by: Dave Chinner <dchinner@redhat.com> [darrick.wong@oracle.com: move the experimental tag] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: don't update rmapbt when fixing agflDarrick J. Wong
Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates when fixing the AG freelist. xfs_repair needs this during phase 5 to be able to adjust the freelist while it's reconstructing the rmap btree; the missing entries will be added back at the very end of phase 5 once the AGFL contents settle down. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabledDarrick J. Wong
Swapping extents between two inodes requires the owner to be updated in the rmap tree for all the extents that are swapped. This code does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until support has been implemented. This will need to be done before the rmap btree code can have the experimental tag removed. This functionality will be provided in a (much) later patch, using some of the reflink deferred block remapping functionality to accomlish extent swapping with rmap updates. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: add rmap btree block detection to log recoveryDarrick J. Wong
Originally-From: Dave Chinner <dchinner@redhat.com> So such blocks can be correctly identified and have their operations structures attached to validate recovery has not resulted in a correct block. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: add rmap btree geometry feature flagDarrick J. Wong
Originally-From: Dave Chinner <dchinner@redhat.com> So xfs_info and other userspace utilities know the filesystem is using this feature. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: propagate bmap updates to rmapbtDarrick J. Wong
When we map, unmap, or convert an extent in a file's data or attr fork, schedule a respective update in the rmapbt. Previous versions of this patch required a 1:1 correspondence between bmap and rmap, but this is no longer true as we now have ability to make interval queries against the rmapbt. We use the deferred operations code to handle redo operations atomically and deadlock free. This plumbs in all five rmap actions (map, unmap, convert extent, alloc, free); we'll use the first three now for file data, and reflink will want the last two. We also add an error injection site to test log recovery. Finally, we need to fix the bmap shift extent code to adjust the rmaps correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: enable the xfs_defer mechanism to process rmaps to updateDarrick J. Wong
Connect the xfs_defer mechanism with the pieces that we'll need to handle deferred rmap updates. We'll wire up the existing code to our new deferred mechanism later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: log rmap intent itemsDarrick J. Wong
Provide a mechanism for higher levels to create RUI/RUD items, submit them to the log, and a stub function to deal with recovered RUI items. These parts will be connected to the rmapbt in a later patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: create rmap update intent log itemsDarrick J. Wong
Create rmap update intent/done log items to record redo information in the log. Because we need to roll transactions between updating the bmbt mapping and updating the reverse mapping, we also have to track the status of the metadata updates that will be recorded in the post-roll transactions, just in case we crash before committing the final transaction. This mechanism enables log recovery to finish what was already started. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: add rmap btree insert and delete helpersDarrick J. Wong
Add a couple of helper functions to encapsulate rmap btree insert and delete operations. Add tracepoints to the update function. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: convert unwritten status of reverse mappingsDarrick J. Wong
Provide a function to convert an unwritten rmap extent to a real one and vice versa. [ dchinner: Note that this algorithm and code was derived from the existing bmapbt unwritten extent conversion code in xfs_bmap_add_extent_unwritten_real(). ] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: remove an extent from the rmap btreeDarrick J. Wong
Originally-From: Dave Chinner <dchinner@redhat.com> Now that we have records in the rmap btree, we need to remove them when extents are freed. This needs to find the relevant record in the btree and remove/trim/split it accordingly. [darrick.wong@oracle.com: make rmap routines handle the enlarged keyspace] [dchinner: remove remaining unused debug printks] [darrick: fix a bug when growfs in an AG with an rmap ending at EOFS] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: add an extent to the rmap btreeDarrick J. Wong
Originally-From: Dave Chinner <dchinner@redhat.com> Now all the btree, free space and transaction infrastructure is in place, we can finally add the code to insert reverse mappings to the rmap btree. Freeing will be done in a separate patch, so just the addition operation can be focussed on here. [darrick: handle owner offsets when adding rmaps] [dchinner: remove remaining debug printk statements] [darrick: move unwritten bit to rm_offset] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: add tracepoints for the rmap functionsDarrick J. Wong
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: teach rmapbt to support interval queriesDarrick J. Wong
Now that the generic btree code supports querying all records within a range of keys, use that functionality to allow us to ask for all the extents mapped to a range of physical blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: support overlapping intervals in the rmap btreeDarrick J. Wong
Now that the generic btree code supports overlapping intervals, plug in the rmap btree to this functionality. We will need it to find potential left neighbors in xfs_rmap_{alloc,free} later in the patch set. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: add rmap btree operationsDarrick J. Wong
Originally-From: Dave Chinner <dchinner@redhat.com> Implement the generic btree operations needed to manipulate rmap btree blocks. This is very similar to the per-ag freespace btree implementation, and uses the AGFL for allocation and freeing of blocks. Adapt the rmap btree to store owner offsets within each rmap record, and to handle the primary key being redefined as the tuple [agblk, owner, offset]. The expansion of the primary key is crucial to allowing multiple owners per extent. [darrick: adapt the btree ops to deal with offsets] [darrick: remove init_rec_from_key] [darrick: move unwritten bit to rm_offset] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: rmap btree requires more reserved free spaceDarrick J. Wong
Originally-From: Dave Chinner <dchinner@redhat.com> The rmap btree is allocated from the AGFL, which means we have to ensure ENOSPC is reported to userspace before we run out of free space in each AG. The last allocation in an AG can cause a full height rmap btree split, and that means we have to reserve at least this many blocks *in each AG* to be placed on the AGFL at ENOSPC. Update the various space calculation functions to handle this. Also, because the macros are now executing conditional code and are called quite frequently, convert them to functions that initialise variables in the struct xfs_mount, use the new variables everywhere and document the calculations better. [darrick.wong@oracle.com: don't reserve blocks if !rmap] [dchinner@redhat.com: update m_ag_max_usable after growfs] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: rmap btree transaction reservationsDarrick J. Wong
The rmap btrees will use the AGFL as the block allocation source, so we need to ensure that the transaction reservations reflect the fact this tree is modified by allocation and freeing. Hence we need to extend all the extent allocation/free reservations used in transactions to handle this. Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES macro, as we now do buffer reservations based on the number of buffers logged via xfs_calc_buf_res(). Hence we only need the buffer count calculation now. [darrick: use rmap_maxlevels when calculating log block resv] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>