summaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_inode.h
AgeCommit message (Collapse)Author
2018-08-02xfs: fold dfops into the transactionBrian Foster
struct xfs_defer_ops has now been reduced to a single list_head. The external dfops mechanism is unused and thus everywhere a (permanent) transaction is accessible the associated dfops structure is as well. Remove the xfs_defer_ops structure and fold the list_head into the transaction. Also remove the last remnant of external dfops in xfs_trans_dup(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-30xfs: introduce a new xfs_inode_has_cow_data helperChristoph Hellwig
We have a few places that already check if an inode has actual data in the COW fork to avoid work on reflink inodes that do not actually have outstanding COW blocks. There are a few more places that can avoid working if doing the same check, so add a documented helper for this condition and use it in all places where it makes sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-30xfs: remove the xfs_ifork_t typedefChristoph Hellwig
We only have a few more callers left, so seize the opportunity and kill it off. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-26xfs: clean up IRELE/iput callsitesDarrick J. Wong
Replace the IRELE macro with a proper function so that we can do proper typechecking and so that we can stop open-coding iput in scrub, which means that we'll be able to ftrace inode lifetimes going through scrub correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-07-26xfs: kill IHOLDDarrick J. Wong
Nobody uses this macro, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2018-07-11xfs: remove dfops parameter from ifree call stackBrian Foster
The inode free callchain starting in xfs_inactive_ifree() already associates its dfops with the transaction. It still passes the dfops on the stack down through xfs_difree_inobt(), however. Clean up the call stack and reference dfops directly from the transaction. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-12Merge tag 'xfs-4.18-merge-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "Here's the second round of patches for XFS for 4.18. Most of the commits are small cleanups, bug fixes, and continued strengthening of metadata verifiers; the bulk of the diff is the conversion of the fs/xfs/ tree to use SPDX tags. This series has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Summary: - Strengthen metadata checking to avoid ASSERTing on bad disk contents - Validate btree records that are being retrieved for clients - Strengthen root inode verification - Convert license blurbs to SPDX tags - Enable changing DAX flag on directories - Fix some writeback deadlocks in reflink - Refactor out some old xfs helpers - Move type verifiers to a separate file - Fix some fuzzer crashes - Various other bug fixes" * tag 'xfs-4.18-merge-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits) xfs: update incore per-AG inode count xfs: replace do_mod with native operations xfs: don't call xfs_da_shrink_inode with NULL bp xfs: clean up MIN/MAX xfs: move various type verifiers to common file xfs: xfs_reflink_convert_cow() memory allocation deadlock xfs: setup VFS i_rwsem lockdep state correctly xfs: fix string handling in label get/set functions xfs: convert to SPDX license tags xfs: validate btree records on retrieval xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() xfs: verify root inode more thoroughly xfs: verify COW extent size hint is valid in inode verifier xfs: verify extent size hint is valid in inode verifier xfs: catch bad stripe alignment configurations iomap: fsync swap files before iterating mappings xfs: use xfs_trans_getsb in xfs_sync_sb_buf xfs: don't assert on corrupted unlinked inode list xfs: explicitly pass buffer size to xfs_corruption_error xfs: don't assert when on-disk btree pointers are garbage ...
2018-06-08Merge tag 'libnvdimm-for-4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This adds a user for the new 'bytes-remaining' updates to memcpy_mcsafe() that you already received through Ingo via the x86-dax- for-linus pull. Not included here, but still targeting this cycle, is support for handling memory media errors (poison) consumed via userspace dax mappings. Summary: - DAX broke a fundamental assumption of truncate of file mapped pages. The truncate path assumed that it is safe to disconnect a pinned page from a file and let the filesystem reclaim the physical block. With DAX the page is equivalent to the filesystem block. Introduce dax_layout_busy_page() to enable filesystems to wait for pinned DAX pages to be released. Without this wait a filesystem could allocate blocks under active device-DMA to a new file. - DAX arranges for the block layer to be bypassed and uses dax_direct_access() + copy_to_iter() to satisfy read(2) calls. However, the memcpy_mcsafe() facility is available through the pmem block driver. In order to safely handle media errors, via the DAX block-layer bypass, introduce copy_to_iter_mcsafe(). - Fix cache management policy relative to the ACPI NFIT Platform Capabilities Structure to properly elide cache flushes when they are not necessary. The table indicates whether CPU caches are power-fail protected. Clarify that a deep flush is always performed on REQ_{FUA,PREFLUSH} requests" * tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) dax: Use dax_write_cache* helpers libnvdimm, pmem: Do not flush power-fail protected CPU caches libnvdimm, pmem: Unconditionally deep flush on *sync libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH acpi, nfit: Remove ecc_unit_size dax: dax_insert_mapping_entry always succeeds libnvdimm, e820: Register all pmem resources libnvdimm: Debug probe times linvdimm, pmem: Preserve read-only setting for pmem devices x86, nfit_test: Add unit test for memcpy_mcsafe() pmem: Switch to copy_to_iter_mcsafe() dax: Report bytes remaining in dax_iomap_actor() dax: Introduce a ->copy_to_iter dax operation uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation xfs, dax: introduce xfs_break_dax_layouts() xfs: prepare xfs_break_layouts() for another layout type xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL mm, fs, dax: handle layout changes to pinned dax mappings mm: fix __gup_device_huge vs unmap mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS ...
2018-06-06xfs: convert to SPDX license tagsDave Chinner
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-22xfs: prepare xfs_break_layouts() for another layout typeDan Williams
When xfs is operating as the back-end of a pNFS block server, it prevents collisions between local and remote operations by requiring a lease to be held for remotely accessed blocks. Local filesystem operations break those leases before writing or mutating the extent map of the file. A similar mechanism is needed to prevent operations on pinned dax mappings, like device-DMA, from colliding with extent unmap operations. BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of layout breaking. Layouts are broken in the BREAK_WRITE case to ensure that layout-holders do not collide with local writes. Additionally, layouts are broken in the BREAK_UNMAP case to make sure the layout-holder has a consistent view of the file's extent map. While BREAK_WRITE breaks can be satisfied be recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL. After this refactoring xfs_break_layouts() becomes the entry point for coordinating both types of breaks. Finally, xfs_break_leased_layouts() becomes just the BREAK_WRITE handler. Note that the unlock tracking is needed in a follow on change. That will coordinate retrying either break handler until both successfully test for a lease break while maintaining the lock state. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Reported-by: Dave Chinner <david@fromorbit.com> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-15xfs: factor out nodiscard helpersBrian Foster
The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: skip online discard during eofblocks trimsBrian Foster
We've had reports of online discard operations being sent from XFS on write-only workloads. These discards occur as a result of eofblocks trims that can occur after a large file copy completes. These discards are slightly confusing for users who might be paying close attention to online discards (i.e., vdo) due to performance sensitivity. They also happen to be spurious because freed post-eof blocks by definition have not been written to during the current allocation cycle. Update xfs_free_eofblocks() to skip discards that are purely attributed to eofblocks trims. This cuts down the number of spurious discards that may occur on write-only workloads due to normal preallocation activity. Note that discards of post-eof extents can still occur from other codepaths that do not isolate handling of post-eof blocks from those within eof. For example, file unlinks and truncates may still cause discards for any file blocks affected by the operation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-09xfs: non-scrub - remove unused function parametersEric Sandeen
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-02xfs: Remove "committed" argument of xfs_dir_iallocChandan Rajendra
xfs_dir_ialloc() rolls the current transaction when allocation of a new inode required the space manager to perform an allocation and replinish the Inode btree. None of the callers of xfs_dir_ialloc() need to know if the transaction was committed. Hence this commit removes the "committed" argument of xfs_dir_ialloc. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: remove xfs_zero_rangeChristoph Hellwig
This helper doesn't add any real value over just calling iomap_zero_range directly, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modesDarrick J. Wong
Refactor xfs_lock_two_inodes to take separate locking modes for each inode. Specifically, this enables us to take a SHARED lock on one inode and an EXCL lock on the other. The lock class (MMAPLOCK/ILOCK) must be the same for each inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-08xfs: provide a centralized method for verifying inline fork dataDarrick J. Wong
Replace the current haphazard dir2 shortform verifier callsites with a centralized verifier function that can be called either with the default verifier functions or with a custom set. This helps us strengthen integrity checking while providing us with flexibility for repair tools. xfs_repair wants this to be able to supply its own verifier functions when trying to fix possibly corrupt metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-12-20xfs: track cowblocks separately in i_flagsDarrick J. Wong
The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them with separate i_flags. Right now we're abusing IEOFBLOCKS for both, which is totally bogus because we won't tag the inode with COWBLOCKS if IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS. Found by wiring up clonerange to fsstress in xfs/017. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-12-08xfs: remove "no-allocation" reservations for file creationsChristoph Hellwig
If we create a new file we will need an inode, and usually some metadata in the parent direction. Aiming for everything to go well despite the lack of a reservation leads to dirty transactions cancelled under a heavy create/delete load. This patch removes those nospace transactions, which will lead to slightly earlier ENOSPC on some workloads, but instead prevent file system shutdowns due to cancelling dirty transactions for others. A customer could observe assertations failures and shutdowns due to cancelation of dirty transactions during heavy NFS workloads as shown below: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]--- 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-26xfs: remove if_rdevChristoph Hellwig
We can simply use the i_rdev field in the Linux inode and just convert to and from the XFS dev_t when reading or logging/writing the inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-07-02xfs: Switch to iomap for SEEK_HOLE / SEEK_DATAChristoph Hellwig
Switch to the iomap_seek_hole and iomap_seek_data helpers for implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that isn't needed any more. Based on patches from Andreas Gruenbacher <agruenba@redhat.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-06-19xfs: remove double-underscore integer typesDarrick J. Wong
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-28xfs: support ability to wait on new inodesBrian Foster
Inodes that are inserted into the perag tree but still under construction are flagged with the XFS_INEW bit. Most contexts either skip such inodes when they are encountered or have the ability to handle them. The runtime quotaoff sequence introduces a context that must wait for construction of such inodes to correctly ensure that all dquots in the fs are released. In anticipation of this, support the ability to wait on new inodes. Wake the appropriate bit when XFS_INEW is cleared. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-11-30xfs: remove i_iolock and use i_rwsem in the VFS inode insteadChristoph Hellwig
This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which recently replaced i_mutex instead. This means we only have to take one lock instead of two in many fast path operations, and we can also shrink the xfs_inode structure. Thanks to the xfs_ilock family there is very little churn, the only thing of note is that we need to switch to use the lock_two_directory helper for taking the i_rwsem on two inodes in a few places to make sure our lock order matches the one used in the VFS. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-10xfs: fix unbalanced inode reclaim flush lockingBrian Foster
Filesystem shutdown testing on an older distro kernel has uncovered an imbalanced locking pattern for the inode flush lock in xfs_reclaim_inode(). Specifically, there is a double unlock sequence between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the "reclaim:" label. This actually does not cause obvious problems on current kernels due to the current flush lock implementation. Older kernels use a counting based flush lock mechanism, however, which effectively breaks the lock indefinitely when an already unlocked flush lock is repeatedly unlocked. Though this only currently occurs on filesystem shutdown, it has reproduced the effect of elevating an fs shutdown to a system-wide crash or hang. As it turns out, the flush lock is not actually required for the reclaim logic in xfs_reclaim_inode() because by that time we have already cycled the flush lock once while holding ILOCK_EXCL. Therefore, remove the additional flush lock/unlock cycle around the 'reclaim:' label and update branches into this label to release the flush lock where appropriate. Add an assert to xfs_ifunlock() to help prevent future occurences of the same problem. Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-05xfs: set a default CoW extent size of 32 blocksDarrick J. Wong
If the admin doesn't set a CoW extent size or a regular extent size hint, default to creating CoW reservations 32 blocks long to reduce fragmentation. Signed-off-by: DarricK J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05xfs: create a separate cow extent size hint for the allocatorDarrick J. Wong
Create a per-inode extent size allocator hint for copy-on-write. This hint is separate from the existing extent size hint so that CoW can take advantage of the fragmentation-reducing properties of extent size hints without disabling delalloc for regular writes. The extent size hint that's fed to the allocator during a copy on write operation is the greater of the cowextsize and regular extsize hint. During reflink, if we're sharing the entire source file to the entire destination file and the destination file doesn't already have a cowextsize hint, propagate the source file's cowextsize hint to the destination file. Furthermore, zero the bulkstat buffer prior to setting the fields so that we don't copy kernel memory contents into userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-04xfs: introduce the CoW forkDarrick J. Wong
Introduce a new in-core fork for storing copy-on-write delalloc reservations and allocated extents that are in the process of being written out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-04xfs: when replaying bmap operations, don't let unlinked inodes get reapedDarrick J. Wong
Log recovery will iget an inode to replay BUI items and iput the inode when it's done. Unfortunately, if the inode was unlinked, the iput will see that i_nlink == 0 and decide to truncate & free the inode, which prevents us from replaying subsequent BUIs. We can't skip the BUIs because we have to replay all the redo items to ensure that atomic operations complete. Since unlinked inode recovery will reap the inode anyway, we can safely introduce a new inode flag to indicate that an inode is in this 'unlinked recovery' state and should not be auto-reaped in the drop_inode path. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03xfs: introduce refcount btree definitionsDarrick J. Wong
Add new per-AG refcount btree definitions to the per-AG structures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-09-19xfs: make xfs_inode_set_eofblocks_tag cheaper for the common caseChristoph Hellwig
For long growing file writes we will usually already have the eofblocks tag set when adding more speculative preallocations. Add a flag in the inode to allow us to skip the the fairly expensive AG-wide spinlocks and multiple radix tree operations in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*Darrick J. Wong
Drop the compatibility shims that we were using to integrate the new deferred operation mechanism into the existing code. No new code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20Merge branch 'xfs-4.8-split-dax-dio' into for-nextDave Chinner
2016-07-20xfs: kill ioflagsChristoph Hellwig
Now that we have the direct I/O kiocb flag there is no real need to sample the value inside of XFS, and the invis flag was always just partially used and isn't worth keeping this infrastructure around for. This also splits the read tracepoint into buffered vs direct as we've done for writes a long time ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-21Merge branch 'xfs-4.8-iomap-write' into for-nextDave Chinner
2016-06-21xfs: handle 64-bit length in xfs_iozeroChristoph Hellwig
We'll want to use this code for large offsets now that we're skipping holes and unwritten extents efficiently. Also rename it to xfs_zero_range to be a bit more descriptive, and tell the caller if we actually did any zeroing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-06-01xfs: make several functions staticEric Sandeen
Al Viro noticed that xfs_lock_inodes should be static, and that led to ... a few more. These are just the easy ones, others require moving functions higher in source files, so that's not done here to keep this review simple. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-04-06xfs: set up inode operation vectors laterChristoph Hellwig
In the next patch we'll set up different inode operations for inline vs out of line symlinks, for that we need to make sure the flags are already set up properly. [dchinner: added xfs_setup_iops() call to xfs_rename_alloc_whiteout()] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-03-07Merge branch 'xfs-gut-icdinode-4.6' into for-nextDave Chinner
2016-02-09xfs: mode di_mode to vfs inodeDave Chinner
Move the di_mode value from the xfs_icdinode to the VFS inode, reducing the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole in the structure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-02-09xfs: use vfs inode nlink field everywhereDave Chinner
The VFS tracks the inode nlink just like the xfs_icdinode. We can remove the variable from the icdinode and use the VFS inode variable everywhere, reducing the size of the xfs_icdinode by a further 4 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-02-09xfs: introduce inode log format objectDave Chinner
We currently carry around and log an entire inode core in the struct xfs_inode. A lot of the information in the inode core is duplicated in the VFS inode, but we cannot remove this duplication of infomration because the inode core is logged directly in xfs_inode_item_format(). Add a new function xfs_inode_item_format_core() that copies the inode core data into a struct xfs_icdinode that is pulled directly from the log vector buffer. This means we no longer directly copy the inode core, but copy the structures one member at a time. This will be slightly less efficient than copying, but will allow us to remove duplicate and unnecessary items from the struct xfs_inode. To enable us to do this, call the new structure a xfs_log_dinode, so that we know it's different to the physical xfs_dinode and the in-core xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-02-08xfs: Factor xfs_seek_hole_data into helperEric Sandeen
Factor xfs_seek_hole_data into an unlocked helper which takes an xfs inode rather than a file for internal use. Also allow specification of "end" - the vfs lseek interface is defined such that any offset past eof/i_size shall return -ENXIO, but we will use this for quota code which does not maintain i_size, and we want to be able to SEEK_DATA past i_size as well. So the lseek path can send in i_size, and the quota code can determine its own ending offset. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-19xfs: clean up inode lockdep annotationsDave Chinner
Lockdep annotations are a maintenance nightmare. Locking has to be modified to suit the limitations of the annotations, and we're always having to fix the annotations because they are unable to express the complexity of locking heirarchies correctly. So, next up, we've got more issues with lockdep annotations for inode locking w.r.t. XFS_LOCK_PARENT: - lockdep classes are exclusive and can't be ORed together to form new classes. - IOLOCK needs multiple PARENT subclasses to express the changes needed for the readdir locking rework needed to stop the endless flow of lockdep false positives involving readdir calling filldir under the ILOCK. - there are only 8 unique lockdep subclasses available, so we can't create a generic solution. IOWs we need to treat the 3-bit space available to each lock type differently: - IOLOCK uses xfs_lock_two_inodes(), so needs: - at least 2 IOLOCK subclasses - at least 2 IOLOCK_PARENT subclasses - MMAPLOCK uses xfs_lock_two_inodes(), so needs: - at least 2 MMAPLOCK subclasses - ILOCK uses xfs_lock_inodes with up to 5 inodes, so needs: - at least 5 ILOCK subclasses - one ILOCK_PARENT subclass - one RTBITMAP subclass - one RTSUM subclass For the IOLOCK, split the space into two sets of subclasses. For the MMAPLOCK, just use half the space for the one subclass to match the non-parent lock classes of the IOLOCK. For the ILOCK, use 0-4 as the ILOCK subclasses, 5-7 for the remaining individual subclasses. Because they are now all different, modify xfs_lock_inumorder() to handle the nested subclasses, and to assert fail if passed an invalid subclass. Further, annotate xfs_lock_inodes() to assert fail if an invalid combination of lock primitives and inode counts are passed that would result in a lockdep subclass annotation overflow. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-24Merge branch 'xfs-mmap-lock' into for-nextDave Chinner
2015-02-24Merge branch 'xfs-misc-fixes-for-4.1' into for-nextDave Chinner
2015-02-23xfs: inodes are new until the dentry cache is set upDave Chinner
Al Viro noticed a generic set of issues to do with filehandle lookup racing with dentry cache setup. They involve a filehandle lookup occurring while an inode is being created and the filehandle lookup racing with the dentry creation for the real file. This can lead to multiple dentries for the one path being instantiated. There are a host of other issues around this same set of paths. The underlying cause is that file handle lookup only waits on inode cache instantiation rather than full dentry cache instantiation. XFS is mostly immune to the problems discovered due to it's own internal inode cache, but there are a couple of corner cases where races can happen. We currently clear the XFS_INEW flag when the inode is fully set up after insertion into the cache. Newly allocated inodes are inserted locked and so aren't usable until the allocation transaction commits. This, however, occurs before the dentry and security information is fully initialised and hence the inode is unlocked and available for lookups to find too early. To solve the problem, only clear the XFS_INEW flag for newly created inodes once the dentry is fully instantiated. This means lookups will retry until the XFS_INEW flag is removed from the inode and hence avoids the race conditions in questions. THis also means that xfs_create(), xfs_create_tmpfile() and xfs_symlink() need to finish the setup of the inode in their error paths if we had allocated the inode but failed later in the creation process. xfs_symlink(), in particular, needed a lot of help to make it's error handling match that of xfs_create(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23xfs: ensure truncate forces zeroed blocks to diskDave Chinner
A new fsync vs power fail test in xfstests indicated that XFS can have unreliable data consistency when doing extending truncates that require block zeroing. The blocks beyond EOF get zeroed in memory, but we never force those changes to disk before we run the transaction that extends the file size and exposes those blocks to userspace. This can result in the blocks not being correctly zeroed after a crash. Because in-memory behaviour is correct, tools like fsx don't pick up any coherency problems - it's not until the filesystem is shutdown or the system crashes after writing the truncate transaction to the journal but before the zeroed data in the page cache is flushed that the issue is exposed. Fix this by also flushing the dirty data in memory region between the old size and new size when we've found blocks that need zeroing in the truncate process. Reported-by: Liu Bo <bo.li.liu@oracle.com> cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23xfs: introduce mmap/truncate lockDave Chinner
Right now we cannot serialise mmap against truncate or hole punch sanely. ->page_mkwrite is not able to take locks that the read IO path normally takes (i.e. the inode iolock) because that could result in lock inversions (read - iolock - page fault - page_mkwrite - iolock) and so we cannot use an IO path lock to serialise page write faults against truncate operations. Instead, introduce a new lock that is used *only* in the ->page_mkwrite path that is the equivalent of the iolock. The lock ordering in a page fault is i_mmaplock -> page lock -> i_ilock, and so in truncate we can i_iolock -> i_mmaplock and so lock out new write faults during the process of truncation. Because i_mmap_lock is outside the page lock, we can hold it across all the same operations we hold the i_iolock for. The only difference is that we never hold the i_mmaplock in the normal IO path and so do not ever have the possibility that we can page fault inside it. Hence there are no recursion issues on the i_mmap_lock and so we can use it to serialise page fault IO against inode modification operations that affect the IO path. This patch introduces the i_mmaplock infrastructure, lockdep annotations and initialisation/destruction code. Use of the new lock will be in subsequent patches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-02Merge branch 'xfs-misc-fixes-for-3.20-3' into for-nextDave Chinner