aboutsummaryrefslogtreecommitdiffstats
path: root/fs/xfs/xfs_trans.h
AgeCommit message (Collapse)Author
2016-08-03xfs: remove the extents array from the rmap update done log itemDarrick J. Wong
Nothing ever uses the extent array in the rmap update done redo item, so remove it before it is fixed in the on-disk log format. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: propagate bmap updates to rmapbtDarrick J. Wong
When we map, unmap, or convert an extent in a file's data or attr fork, schedule a respective update in the rmapbt. Previous versions of this patch required a 1:1 correspondence between bmap and rmap, but this is no longer true as we now have ability to make interval queries against the rmapbt. We use the deferred operations code to handle redo operations atomically and deadlock free. This plumbs in all five rmap actions (map, unmap, convert extent, alloc, free); we'll use the first three now for file data, and reflink will want the last two. We also add an error injection site to test log recovery. Finally, we need to fix the bmap shift extent code to adjust the rmaps correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: enable the xfs_defer mechanism to process rmaps to updateDarrick J. Wong
Connect the xfs_defer mechanism with the pieces that we'll need to handle deferred rmap updates. We'll wire up the existing code to our new deferred mechanism later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: log rmap intent itemsDarrick J. Wong
Provide a mechanism for higher levels to create RUI/RUD items, submit them to the log, and a stub function to deal with recovered RUI items. These parts will be connected to the rmapbt in a later patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: add owner field to extent allocation and freeingDarrick J. Wong
For the rmap btree to work, we have to feed the extent owner information to the the allocation and freeing functions. This information is what will end up in the rmap btree that tracks allocated extents. While we technically don't need the owner information when freeing extents, passing it allows us to validate that the extent we are removing from the rmap btree actually belonged to the owner we expected it to belong to. We also define a special set of owner values for internal metadata that would otherwise have no owner. This allows us to tell the difference between metadata owned by different per-ag btrees, as well as static fs metadata (e.g. AG headers) and internal journal blocks. There are also a couple of special cases we need to take care of - during EFI recovery, we don't actually know who the original owner was, so we need to pass a wildcard to indicate that we aren't checking the owner for validity. We also need special handling in growfs, as we "free" the space in the last AG when extending it, but because it's new space it has no actual owner... While touching the xfs_bmap_add_free() function, re-order the parameters to put the struct xfs_mount first. Extend the owner field to include both the owner type and some sort of index within the owner. The index field will be used to support reverse mappings when reflink is enabled. When we're freeing extents from an EFI, we don't have the owner information available (rmap updates have their own redo items). xfs_free_extent therefore doesn't need to do an rmap update. Make sure that the log replay code signals this correctly. This is based upon a patch originally from Dave Chinner. It has been extended to add more owner information with the intent of helping recovery operations when things go wrong (e.g. offset of user data block in a file). [dchinner: de-shout the xfs_rmap_*_owner helpers] [darrick: minor style fixes suggested by Christoph Hellwig] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*Darrick J. Wong
Drop the compatibility shims that we were using to integrate the new deferred operation mechanism into the existing code. No new code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: enable the xfs_defer mechanism to process extents to freeDarrick J. Wong
Connect the xfs_defer mechanism with the pieces that we'll need to handle deferred extent freeing. We'll wire up the existing code to our new deferred mechanism later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03xfs: clean up typedef usage in the EFI/EFD handling codeDarrick J. Wong
Replace structure typedefs with struct xfs_foo_* in the EFI/EFD handling code in preparation to move it over to deferred ops. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-22xfs: allocate log vector buffers outside CIL context lockDave Chinner
One of the problems we currently have with delayed logging is that under serious memory pressure we can deadlock memory reclaim. THis occurs when memory reclaim (such as run by kswapd) is reclaiming XFS inodes and issues a log force to unpin inodes that are dirty in the CIL. The CIL is pushed, but this will only occur once it gets the CIL context lock to ensure that all committing transactions are complete and no new transactions start being committed to the CIL while the push switches to a new context. The deadlock occurs when the CIL context lock is held by a committing process that is doing memory allocation for log vector buffers, and that allocation is then blocked on memory reclaim making progress. Memory reclaim, however, is blocked waiting for a log force to make progress, and so we effectively deadlock at this point. To solve this problem, we have to move the CIL log vector buffer allocation outside of the context lock so that memory reclaim can always make progress when it needs to force the log. The problem with doing this is that a CIL push can take place while we are determining if we need to allocate a new log vector buffer for an item and hence the current log vector may go away without warning. That means we canot rely on the existing log vector being present when we finally grab the context lock and so we must have a replacement buffer ready to go at all times. To ensure this, introduce a "shadow log vector" buffer that is always guaranteed to be present when we gain the CIL context lock and format the item. This shadow buffer may or may not be used during the formatting, but if the log item does not have an existing log vector buffer or that buffer is too small for the new modifications, we swap it for the new shadow buffer and format the modifications into that new log vector buffer. The result of this is that for any object we modify more than once in a given CIL checkpoint, we double the memory required to track dirty regions in the log. For single modifications then we consume the shadow log vectorwe allocate on commit, and that gets consumed by the checkpoint. However, if we make multiple modifications, then the second transaction commit will allocate a shadow log vector and hence we will end up with double the memory usage as only one of the log vectors is consumed by the CIL checkpoint. The remaining shadow vector will be freed when th elog item is freed. This can probably be optimised in future - access to the shadow log vector is serialised by the object lock (as opposited to the active log vector, which is controlled by the CIL context lock) and so we can probably free shadow log vector from some objects when the log item is marked clean on removal from the AIL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-04-06xfs: better xfs_trans_alloc interfaceChristoph Hellwig
Merge xfs_trans_reserve and xfs_trans_alloc into a single function call that returns a transaction with all the required log and block reservations, and which allows passing transaction flags directly to avoid the cumbersome _xfs_trans_alloc interface. While we're at it we also get rid of the transaction type argument that has been superflous since we stopped supporting the non-CIL logging mode. The guts of it will be removed in another patch. [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-03-02xfs: remove xfs_trans_get_block_resChristoph Hellwig
Just use the t_blk_res field directly instead of obsfucating the reference by a macro. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-19xfs: ensure EFD trans aborts on log recovery extent free failureBrian Foster
Log recovery attempts to free extents with leftover EFIs in the AIL after initial processing. If the extent free fails (e.g., due to unrelated fs corruption), the transaction is cancelled, though it might not be dirtied at the time. If this is the case, the EFD does not abort and thus does not release the EFI. This can lead to hangs as the EFI pins the AIL. Update xlog_recover_process_efi() to log the EFD in the transaction before xfs_free_extent() errors are handled to ensure the transaction is dirty, aborts the EFD and releases the EFI on error. Since this is a requirement for EFD processing (and consistent with xfs_bmap_finish()), update the EFD logging helper to do the extent free and unconditionally log the EFD. This encodes the required EFD logging behavior into the helper and reduces the likelihood of errors down the road. [dchinner: re-add xfs_alloc.h to xfs_log_recover.c to fix build failure.] Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-19xfs: return committed status from xfs_trans_roll()Brian Foster
Some callers need to make error handling decisions based on whether the current transaction successfully committed or not. Rename xfs_trans_roll(), add a new parameter and provide a wrapper to preserve existing callers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-19xfs: disentagle EFI release from the extent countBrian Foster
Release of the EFI either occurs based on the reference count or the extent count. The extent count used is either the count tracked in the EFI or EFD, depending on the particular situation. In either case, the count is initialized to the final value and thus always matches the current efi_next_extent value once the EFI is completely constructed. For example, the EFI extent count is increased as the extents are logged in xfs_bmap_finish() and the full free list is always completely processed. Therefore, the count is guaranteed to be complete once the EFI transaction is committed. The EFD uses the efd_nextents counter to release the EFI. This counter is initialized to the count of the EFI when the EFD is created. Thus the EFD, as currently used, has no concept of partial EFI release based on extent count. Given that the EFI extent count is always released in whole, use of the extent count for reference counting is unnecessary. Remove this level of the API and release the EFI based on the core reference count. The efi_next_extent counter remains because it is still used to track the slot to log the next extent to free. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: saner xfs_trans_commit interfaceChristoph Hellwig
The flags argument to xfs_trans_commit is not useful for most callers, as a commit of a transaction without a permanent log reservation must pass 0 here, and all callers for a transaction with a permanent log reservation except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove the flags argument from the public xfs_trans_commit interfaces, and introduce low-level __xfs_trans_commit variant just for xfs_trans_roll that regrants a log reservation instead of releasing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: remove the flags argument to xfs_trans_cancelChristoph Hellwig
xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and XFS_TRANS_ABORT. Both of them are a direct product of the transaction state, and can be deducted: - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled, and XFS_TRANS_ABORT is a noop for a transaction that is not dirty. - any transaction with a permanent log reservation needs XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent log reservation is invalid. So just remove the flags argument and do the right thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: switch remaining xfs_trans_dup users to xfs_trans_rollChristoph Hellwig
We have three remaining callers of xfs_trans_dup: - xfs_itruncate_extents which open codes xfs_trans_roll - xfs_bmap_finish doesn't have an xfs_inode argument and thus leaves attaching them to it's callers, but otherwise is identical to xfs_trans_roll - xfs_dir_ialloc looks at the log reservations in the old xfs_trans structure instead of the log reservation parameters, but otherwise is identical to xfs_trans_roll. By allowing a NULL xfs_inode argument to xfs_trans_roll we can switch these three remaining users over to xfs_trans_roll and mark xfs_trans_dup static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13xfs: format log items write directly into the linear CIL bufferChristoph Hellwig
Instead of setting up pointers to memory locations in iop_format which then get copied into the CIL linear buffer after return move the copy into the individual inode items. This avoids the need to always have a memory block in the exact same layout that gets written into the log around, and allow the log items to be much more flexible in their in-memory layouts. The only caveat is that we need to properly align the data for each iovec so that don't have structures misaligned in subsequent iovecs. Note that all log item format routines now need to be careful to modify the copy of the item that was placed into the CIL after calls to xlog_copy_iovec instead of the in-memory copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-10-23xfs: decouple inode and bmap btree header filesDave Chinner
Currently the xfs_inode.h header has a dependency on the definition of the BMAP btree records as the inode fork includes an array of xfs_bmbt_rec_host_t objects in it's definition. Move all the btree format definitions from xfs_btree.h, xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to xfs_format.h to continue the process of centralising the on-disk format definitions. With this done, the xfs inode definitions are no longer dependent on btree header files. The enables a massive culling of unnecessary includes, with close to 200 #include directives removed from the XFS kernel code base. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: decouple log and transaction headersDave Chinner
xfs_trans.h has a dependency on xfs_log.h for a couple of structures. Most code that does transactions doesn't need to know anything about the log, but this dependency means that they have to include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header files and clean up the includes to be in dependency order. In doing this, remove the direct include of xfs_trans_reserve.h from xfs_trans.h so that we remove the dependency between xfs_trans.h and xfs_mount.h. Hence the xfs_trans.h include can be moved to the indicate the actual dependencies other header files have on it. Note that these are kernel only header files, so this does not translate to any userspace changes at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23xfs: remove unused transaction callback variablesDave Chinner
We don't do callbacks at transaction commit time, no do we have any infrastructure to set up or run such callbacks, so remove the variables and typedefs for these operations. If we ever need to add callbacks, we can reintroduce the variables at that time. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-30xfs: finish removing IOP_* macros.Dave Chinner
In optimising the CIL operations, some of the IOP_* macros for calling log item operations were removed. Remove the rest of them as Christoph requested. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Geoffrey Wehrman <gwehrman@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13xfs: avoid CIL allocation during insertDave Chinner
Now that we have the size of the log vector that has been allocated, we can determine if we need to allocate a new log vector for formatting and insertion. We only need to allocate a new vector if it won't fit into the existing buffer. However, we need to hold the CIL context lock while we do this so that we can't race with a push draining the currently queued log vectors. It is safe to do this as long as we do GFP_NOFS allocation to avoid avoid memory allocation recursing into the filesystem. Hence we can safely overwrite the existing log vector on the CIL if it is large enough to hold all the dirty regions of the current item. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13xfs: Reduce allocations during CIL insertionDave Chinner
Now that we have the size of the object before the formatting pass is called, we can allocation the log vector and it's buffer in a single allocation rather than two separate allocations. Store the size of the allocated buffer in the log vector so that we potentially avoid allocation for future modifications of the object. While touching this code, remove the IOP_FORMAT definition. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-13xfs: return log item size in IOP_SIZEDave Chinner
To begin optimising the CIL commit process, we need to have IOP_SIZE return both the number of vectors and the size of the data pointed to by the vectors. This enables us to calculate the size ofthe memory allocation needed before the formatting step and reduces the number of memory allocations per item by one. While there, kill the IOP_SIZE macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: refactor xfs_trans_reserve() interfaceJie Liu
With the new xfs_trans_res structure has been introduced, the log reservation size, log count as well as log flags are pre-initialized at mount time. So it's time to refine xfs_trans_reserve() interface to be more neat. Also, introduce a new helper M_RES() to return a pointer to the mp->m_resv structure to simplify the input. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: split out transaction reservation codeDave Chinner
The transaction reservation size calculations is used by both kernel and userspace, but most of the transaction code in xfs_trans.c is kernel specific. Split all the transaction reservation code out into it's own files to make sharing with userspace simpler. This just leaves kernel-only definitions in xfs_trans.h, so it doesn't need to be shared with userspace anymore, either. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: sync minor header differences needed by userspace.Dave Chinner
Little things like exported functions, __KERNEL__ protections, and so on that ensure user and kernel shared headers are identical. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: split out on-disk transaction definitionsDave Chinner
There's a bunch of definitions in xfs_trans.h that define on-disk formats - transaction headers that get written into the log, log item type definitions, etc. Split out everything into a separate file so that all which remains in xfs_trans.h are kernel only definitions. Also, remove the duplicate magic number definitions for XFS_TRANS_MAGIC... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-27xfs: Inode create log itemsDave Chinner
Introduce the inode create log item type for logical inode create logging. Instead of logging the changes in buffers, pass the range to be initialised through the log by a new transaction type. This reduces the amount of log space required to record initialisation during allocation from about 128 bytes per inode to a small fixed amount per inode extent to be initialised. This requires a new log item type to track it through the log and the AIL. This is a relatively simple item - most callbacks are noops as this item has the same life cycle as the transaction. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-27xfs: Introduce an ordered buffer itemDave Chinner
If we have a buffer that we have modified but we do not wish to physically log in a transaction (e.g. we've logged a logical change), we still need to ensure that transactional integrity is maintained. Hence we must not move the tail of the log past the transaction that the buffer is associated with before the buffer is written to disk. This means these special buffers still need to be included in the transaction and added to the AIL just like a normal buffer, but we do not want the modifications to the buffer written into the transaction. IOWs, what we want is an "ordered buffer" that maintains the same transactional life cycle as a physically logged buffer, just without the transcribing of the modifications to the log. Hence we need to flag the buffer as an "ordered buffer" to avoid including it in vector size calculations or formatting during the transaction. Once the transaction is committed, the buffer appears for all intents to be the same as a physically logged buffer as it transitions through the log and AIL. Relogging will also work just fine for such an ordered buffer - the logical transaction will be replayed before the subsequent modifications that relog the buffer, so everything will be reconstructed correctly by recovery. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-19xfs: Remove two dead transaction log reservaion macrosJie Liu
Upstream commit 5b292ae3a951a58e32119d73c7ac8f5bec7395a3 xfs: make use of xfs_calc_buf_res() in xfs_trans.c Beginning from above commit, neither XFS_ALLOCFREE_LOG_RES() nor XFS_DIROP_LOG_RES() is used by those routines for calculating transaction space reservations, so it's safe to remove them now. Also, with a slightly update for the relevant comments to reflect the ideas of why those log count numbers should be. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-07xfs: introduce CONFIG_XFS_WARNDave Chinner
Running a CONFIG_XFS_DEBUG kernel in production environments is not the best idea as it introduces significant overhead, can change the behaviour of algorithms (such as allocation) to improve test coverage, and (most importantly) panic the machine on non-fatal errors. There are many cases where all we want to do is run a kernel with more bounds checking enabled, such as is provided by the ASSERT() statements throughout the code, but without all the potential overhead and drawbacks. This patch converts all the ASSERT statements to evaluate as WARN_ON(1) statements and hence if they fail dump a warning and a stack trace to the log. This has minimal overhead and does not change any algorithms, and will allow us to find strange "out of bounds" problems more easily on production machines. There are a few places where assert statements contain debug only code. These are converted to be debug-or-warn only code so that we still get all the assert checks in the code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27xfs: buffer type overruns blf_flags fieldDave Chinner
The buffer type passed to log recvoery in the buffer log item overruns the blf_flags field. I had assumed that flags field was a 32 bit value, and it turns out it is a unisgned short. Therefore having 19 flags doesn't really work. Convert the buffer type field to numeric value, and use the top 5 bits of the flags field for it. We currently have 17 types of buffers, so using 5 bits gives us plenty of room for expansion in future.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27xfs: add buffer types to directory and attribute buffersDave Chinner
Add buffer types to the buffer log items so that log recovery can validate the buffers and calculate CRCs correctly after the buffers are recovered. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-21xfs: add support for large btree blocksChristoph Hellwig
Add support for larger btree blocks that contains a CRC32C checksum, a filesystem uuid and block number for detecting filesystem consistency and out of place writes. [dchinner@redhat.com] Also include an owner field to allow reverse mappings to be implemented for improved repairability and a LSN field to so that log recovery can easily determine the last modification that made it to disk for each buffer. [dchinner@redhat.com] Add buffer log format flags to indicate the type of buffer to recovery so that we don't have to do blind magic number tests to determine what the buffer is. [dchinner@redhat.com] Modified to fit into the verifier structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01xfs: refactor space log reservation for XFS_TRANS_ATTR_SETJeff Liu
Currently, we calculate the attribute set transaction log space reservation at runtime in two parts: 1) XFS_ATTRSET_LOG_RES() which is calcuated out at mount time. 2) ((ext * (mp)->m_sb.sb_sectsize) + \ (ext * XFS_FSB_TO_B((mp), XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))) + \ (128 * (ext + (ext * XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)))))) which is calculated out at runtime since it depend on the given extent length in blocks. This patch renamed XFS_ATTRSET_LOG_RES(mp) to XFS_ATTRSETM_LOG_RES(mp) to indicate that it is figured out at mount time. Introduce XFS_ATTRSETRT_LOG_RES(mp) which would be used to calculate out the unit of the log space reservation for one block. In this way, the total runtime space for the given extent length can be figured out by: XFS_ATTRSETM_LOG_RES(mp) + XFS_ATTRSETRT_LOG_RES(mp) * ext Signed-off-by: Jie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01xfs: introduce XFS_SB_LOG_RES() for transactions that modify sb on diskJeff Liu
Introduce a new transaction space reservation XFS_SB_LOG_RES() for those transactions that need to modify the superblock on disk. Signed-off-by: Jie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01xfs: calculate XFS_TRANS_QM_QUOTAOFF_END space log reservation at mount timeJeff Liu
Convert the calculation for end of quotaoff log space reservation from runtime to mount time. Signed-off-by: Jie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01xfs: calculate XFS_TRANS_QM_QUOTAOFF space log reservation at mount timeJeff Liu
Convert the calculation of quota off transaction log space reservation from runtime to mount time. Signed-off-by: Jie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01xfs: calculate XFS_TRANS_QM_DQALLOC space log reservation at mount timeJeff Liu
The disk quota allocation log space reservation is calcuated at runtime, this patch does it at mount time. Signed-off-by: Jie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01xfs: calcuate XFS_TRANS_QM_SETQLIM space log reservation at mount timeJeff Liu
For adjusting quota limits transactions, we calculate out the log space reservation at runtime, this patch does it at mount time. Signed-off-by: Jie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01xfs: calculate XFS_TRANS_QM_SBCHANGE space log reservation at mount timeJeff Liu
The transaction log space for clearing/reseting the quota flags is calculated out at runtime, this patch can figure it out at mount time. Signed-off-by: Jie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-15xfs: convert buffer verifiers to an ops structure.Dave Chinner
To separate the verifiers from iodone functions and associate read and write verifiers at the same time, introduce a buffer verifier operations structure to the xfs_buf. This avoids the need for assigning the write verifier, clearing the iodone function and re-running ioend processing in the read verifier, and gets rid of the nasty "b_pre_io" name for the write verifier function pointer. If we ever need to, it will also be easier to add further content specific callbacks to a buffer with an ops structure in place. We also avoid needing to export verifier functions, instead we can simply export the ops structures for those that are needed outside the function they are defined in. This patch also fixes a directory block readahead verifier issue it exposed. This patch also adds ops callbacks to the inode/alloc btree blocks initialised by growfs. These will need more work before they will work with CRCs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-15xfs: make buffer read verication an IO completion functionDave Chinner
Add a verifier function callback capability to the buffer read interfaces. This will be used by the callers to supply a function that verifies the contents of the buffer when it is read from disk. This patch does not provide callback functions, but simply modifies the interfaces to allow them to be called. The reason for adding this to the read interfaces is that it is very difficult to tell fom the outside is a buffer was just read from disk or whether we just pulled it out of cache. Supplying a callbck allows the buffer cache to use it's internal knowledge of the buffer to execute it only when the buffer is read from disk. It is intended that the verifier functions will mark the buffer with an EFSCORRUPTED error when verification fails. This allows the reading context to distinguish a verification error from an IO error, and potentially take further actions on the buffer (e.g. attempt repair) based on the error reported. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is *not* dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...
2012-07-31xfs: Convert to new freezing codeJan Kara
Generic code now blocks all writers from standard write paths. So we add blocking of all writers coming from ioctl (we get a protection of ioctl against racing remount read-only as a bonus) and convert xfs_file_aio_write() to a non-racy freeze protection. We also keep freeze protection on transaction start to block internal filesystem writes such as removal of preallocated blocks. CC: Ben Myers <bpm@sgi.com> CC: Alex Elder <elder@kernel.org> CC: xfs@oss.sgi.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-01xfs: add discontiguous buffer support to transactionsDave Chinner
Now that the buffer cache supports discontiguous buffers, add support to the transaction buffer interface for getting and reading buffers. Note that this patch does not convert the buffer item logging to support discontiguous buffers. That will be done as a separate commit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-29xfs: switch to proper __bitwise type for KM_... flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-14xfs: on-stack delayed write buffer listsChristoph Hellwig
Queue delwri buffers on a local on-stack list instead of a per-buftarg one, and write back the buffers per-process instead of by waking up xfsbufd. This is now easily doable given that we have very few places left that write delwri buffers: - log recovery: Only done at mount time, and already forcing out the buffers synchronously using xfs_flush_buftarg - quotacheck: Same story. - dquot reclaim: Writes out dirty dquots on the LRU under memory pressure. We might want to look into doing more of this via xfsaild, but it's already more optimal than the synchronous inode reclaim that writes each buffer synchronously. - xfsaild: This is the main beneficiary of the change. By keeping a local list of buffers to write we reduce latency of writing out buffers, and more importably we can remove all the delwri list promotions which were hitting the buffer cache hard under sustained metadata loads. The implementation is very straight forward - xfs_buf_delwri_queue now gets a new list_head pointer that it adds the delwri buffers to, and all callers need to eventually submit the list using xfs_buf_delwi_submit or xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are skipped in xfs_buf_delwri_queue, assuming they already are on another delwri list. The biggest change to pass down the buffer list was done to the AIL pushing. Now that we operate on buffers the trylock, push and pushbuf log item methods are merged into a single push routine, which tries to lock the item, and if possible add the buffer that needs writeback to the buffer list. This leads to much simpler code than the previous split but requires the individual IOP_PUSH instances to unlock and reacquire the AIL around calls to blocking routines. Given that xfsailds now also handle writing out buffers, the conditions for log forcing and the sleep times needed some small changes. The most important one is that we consider an AIL busy as long we still have buffers to push, and the other one is that we do increment the pushed LSN for buffers that are under flushing at this moment, but still count them towards the stuck items for restart purposes. Without this we could hammer on stuck items without ever forcing the log and not make progress under heavy random delete workloads on fast flash storage devices. [ Dave Chinner: - rebase on previous patches. - improved comments for XBF_DELWRI_Q handling - fix XBF_ASYNC handling in queue submission (test 106 failure) - rename delwri submit function buffer list parameters for clarity - xfs_efd_item_push() should return XFS_ITEM_PINNED ] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>