summaryrefslogtreecommitdiffstats
path: root/fs/ocfs2/suballoc.c
AgeCommit message (Collapse)Author
2016-09-19ocfs2: fix double unlock in case retry after free truncate logJoseph Qi
If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to free truncate log and then retry. Since ocfs2_try_to_free_truncate_log will lock/unlock global bitmap inode, we have to unlock it before calling this function. But when retry reserve and it fails with no global bitmap inode lock taken, it will unlock again in error handling branch and BUG. This issue also exists if no need retry and then ocfs2_inode_lock fails. So fix it. Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log") Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02ocfs2: retry on ENOSPC if sufficient space in truncate logEric Ren
The testcase "mmaptruncate" in ocfs2 test suite always fails with ENOSPC error on small volume (say less than 10G). This testcase repeatedly performs "extend" and "truncate" on a file. Continuously, it truncates the file to 1/2 of the size, and then extends to 100% of the size. The main bitmap will quickly run out of space because the "truncate" code prevent truncate log from being flushed by ocfs2_schedule_truncate_log_flush(osb, 1), while truncate log may have cached lots of clusters. So retry to allocate after flushing truncate log when ENOSPC is returned. And we cannot reuse the deleted blocks before the transaction committed. Fortunately, we already have a function to do this - ocfs2_try_to_free_truncate_log(). Just need to remove the "static" modifier and put it into the right place. The "unlock"/"lock" code isn't elegant, but there seems to be no better option. [zren@suse.com: locking fix] Link: http://lkml.kernel.org/r/1468031546-4797-1-git-send-email-zren@suse.com Link: http://lkml.kernel.org/r/1466586469-5541-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22wrappers for ->i_mutex accessAl Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-05ocfs2: improve performance for localallocJoseph Qi
Currently cluster allocation is always trying to find a victim chain (a chian has most space), and this may lead to poor performance because of discontiguous allocation in some scenarios. Our test case is block size 4k, cluster size 1M and mount option with localalloc=2048 (2G), since a gd is 32256M (about 31.5G) and a localalloc window is only 2G, creating 50G file will result in 2G from gd0, 2G from gd1, ... One way to improve performance is enlarge localalloc window size (max 31104M), but this will make end user feel that about 30G is suddenly "missing", and localalloc currently do not support steal, which means one node cannot use another node's localalloc even it is not used in fact. So using the last gd to record the allocation and continues with the gd if it has enough space for a localalloc window can make the allocation as more contiguous as possible. Our test result is below (evaluated in IOPS), which is using iometer running in VM, dynamic vhd virtual disk stored in ocfs2. IO model Original After Improved(%) 16K60%Write100%Random 703 876 24.59% 8K90%Write100%Random 735 827 12.59% 4K100%Write100%Random 859 915 6.52% 4K100%Read100%Random 2092 2600 24.30% Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Tested-by: Norton Zhu <norton.zhu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04ocfs2: clean up redundant NULL checks before kfreeJoseph Qi
NULL check before kfree is redundant and so clean them up. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04ocfs2: neaten do_error, ocfs2_error and ocfs2_abortJoe Perches
These uses sometimes do and sometimes don't have '\n' terminations. Make the uses consistently use '\n' terminations and remove the newline from the functions. Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04ocfs2: acknowledge return value of ocfs2_error()Goldwyn Rodrigues
Caveat: This may return -EROFS for a read case, which seems wrong. This is happening even without this patch series though. Should we convert EROFS to EIO? Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: rollback the cleared bits if error occurs after ↵Joseph Qi
ocfs2_block_group_clear_bits ocfs2_block_group_clear_bits will clear bits in block group bitmap. Once it succeeds but fails in the following step, it will cause block group bitmap mismatch the corresponding count recorded in dinode. So rollback the cleared bits if error occurs. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: iput inode alloc when failed locallyjiangyiwen
In ocfs2_info_handle_freeinode() and ocfs2_test_inode_bit() func, after calls ocfs2_get_system_file_inode() to get inode ref, if calls ocfs2_info_scan_inode_alloc() or ocfs2_inode_lock() failed, we should iput inode alloc to avoid leaking the inode. Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: rollback alloc_dinode counts when ocfs2_block_group_set_bits() failedYounger Liu
After updating alloc_dinode counts in ocfs2_alloc_dinode_update_counts(), if ocfs2_alloc_dinode_update_bitmap() failed, there is a rare case that some space may be lost. So, roll back alloc_dinode counts when ocfs2_block_group_set_bits() failed. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Younger Liu <younger.liucn@gmail.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: call ocfs2_update_inode_fsync_trans when updating any inodeDarrick J. Wong
Ensure that ocfs2_update_inode_fsync_trans() is called any time we touch an inode in a given transaction. This is a follow-on to the previous patch to reduce lock contention and deadlocking during an fsync operation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Wengang <wen.gang.wang@oracle.com> Cc: Greg Marsden <greg.marsden@oracle.com> Cc: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ↵Younger Liu
ocfs2_block_group_set_bits() ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() are already provided in suballoc.c. So, the same functions in move_extents.c are not needed any more. Declare the functions in suballoc.h and remove redundant functions in move_extents.c. Signed-off-by: Younger Liu <liuyiyang@hisense.com> Cc: Younger Liu <younger.liucn@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: return ENOMEM when sb_getblk() failsRui Xiang
The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So return ENOMEM instead when it fails. [joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03ocfs2: remove duplicated mlog_errno() in ocfs2_relink_block_groupAndrew Morton
Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Younger Liu <younger.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03ocfs2: rework transaction rollback in ocfs2_relink_block_group()Jie Liu
In ocfs2_relink_block_group(), we roll back all those changes if notify intent to modify buffers for metadata update failed even if the relevant buffer has not yet been modified/got dirty at that point, that are not quite right because of: - None buffer has been modified/dirty if failed to call ocfs2_journal_access_gd() against the previous block group buffer - Only the previous block group buffer has got dirty if failed to call ocfs2_journal_access_gd() against the block group buffer - There is no need to roll back the change for file entry buffer at all Those problems will not cause anything wrong but unnecessary. This patch fix them and kill the useless bg_ptr variable as well. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Younger Liu <younger.liu@huawei.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27ocfs2: ac->ac_allow_chain_relink=0 won't disable group relinkXiaowei.Hu
ocfs2_block_group_alloc_discontig() disables chain relink by setting ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple cluster groups. It doesn't keep the credits for all chain relink,but ocfs2_claim_suballoc_bits overrides this in this call trace: ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()-> __ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits() ocfs2_claim_suballoc_bits set ac->ac_allow_chain_relink = 1; then call ocfs2_search_chain() one time and disable it again, and then we run out of credits. Fix is to allow relink by default and disable it in ocfs2_block_group_alloc_discontig. Without this patch, End-users will run into a crash due to run out of credits, backtrace like this: RIP: 0010:[<ffffffffa0808b14>] [<ffffffffa0808b14>] jbd2_journal_dirty_metadata+0x164/0x170 [jbd2] RSP: 0018:ffff8801b919b5b8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0 RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8 RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0 R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800 FS: 00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0) Call Trace: ocfs2_journal_dirty+0x2f/0x70 [ocfs2] ocfs2_relink_block_group+0x111/0x480 [ocfs2] ocfs2_search_chain+0x455/0x9a0 [ocfs2] ... Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-13ocfs2: ->e_leaf_clusters endianness breakageAl Viro
le16, not le32... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-02-22ocfs2: Remove DISK_ALLOC from masklog.Tao Ma
Since all 4 files, localalloc.c, suballoc.c, alloc.c and resize.c, which use DISK_ALLOC are changed to trace events, Remove masklog DISK_ALLOC totally. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-22ocfs2: Remove mlog(0) from fs/ocfs2/suballoc.cTao Ma
This is the 3rd step to remove the debug info of DISK_ALLOC. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-03-07ocfs2: Remove EXIT from masklog.Tao Ma
mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. This patch just try to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-21ocfs2: Remove ENTRY from masklog.Tao Ma
ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. So for mlog_entry_void, we just remove it. for mlog_entry(...), we replace it with mlog(0,...), and they will be replace by trace event later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2010-11-01tree-wide: fix comment/printk typosUwe Kleine-König
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-15Merge branch 'globalheartbeat-2' of ↵Joel Becker
git://oss.oracle.com/git/smushran/linux-2.6 into ocfs2-merge-window Conflicts: fs/ocfs2/ocfs2.h
2010-10-11ocfs2: validate bg_free_bits_count after updateSrinivas Eeda
This patch adds a safe check to ensure bg_free_bits_count doesn't exceed bg_bits in a group descriptor. This is to avoid on disk corruption that was seen recently. debugfs: group <52803072> Group Chain: 179 Parent Inode: 11 Generation: 2959379682 CRC32: 00000000 ECC: 0000 ## Block# Total Used Free Contig Size 0 52803072 32256 4294965350 34202 18207 4032 ...... Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent.Tao Ma
e_leaf_clusters is a le16, so use cpu_to_le16 instead of cpu_to_le32. What's more, we change 'clusters' to unsigned int to signify that the size of 'clusters' isn't important here. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-08ocfs2: allow return of new inode block location before allocation of the inodeMark Fasheh
This allows code which needs to know the eventual block number of an inode but can't allocate it yet due to transaction or lock ordering. For example, ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation of the orphan dir because it can't yet know where the actual inode is placed - that code is actually in ocfs2_mknod_locked. This is a problem when the orphan dirs are indexed as the junk inode number will create an index entry which goes unused (and fails the later removal from the orphan dir). Now with these interfaces, ocfs2_create_inode_in_orphan() can run the block group search (and get back the inode block number) *before* any actual allocation occurs. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08ocfs2: use ocfs2_alloc_dinode_update_counts() instead of open codingMark Fasheh
ocfs2_search_chain() makes the same updates as ocfs2_alloc_dinode_update_counts to the alloc inode. Instead of open coding the bitmap update, use our helper function. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08ocfs2: properly set and use inode group alloc hintMark Fasheh
We were setting ac->ac_last_group in ocfs2_claim_suballoc_bits from res->sr_bg_blkno. Unfortunately, res->sr_bg_blkno is going to be zero under normal (non-fragmented) circumstances. The discontig block group patches effectively turned off that feature. Fix this by correctly calculating what the next group hint should be. Acked-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-09-08ocfs2: Use the right group in nfs sync check.Tao Ma
We have added discontig block group now, and now an inode can be allocated in an discontig block group. So get it in ocfs2_get_suballoc_slot_bit. The old ocfs2_test_suballoc_bit gets group block no from the allocation inode which is wrong. Fix it by passing the right group. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-07-12ocfs2: Remove the redundant cpu_to_le64.Tao Ma
In ocfs2_block_group_alloc, we set c_blkno by bg->bg_blkno. But actually bg->bg_blkno is already changed to little endian in ocfs2_block_group_fill. So remove the extra cpu_to_le64. Reported-by: Marcos Matsunaga <Marcos.Matsunaga@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-18ocfs2: Silence a gcc warning.Joel Becker
ocfs2_block_group_claim_bits() is never called with min_bits=0, but we shouldn't leave status undefined if it ever is. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-04-27ocfs2: Set ac_last_group properly with discontig group.Tao Ma
ac_last_group is used to record the last block group we used during allocation. But the initialization process only calls ocfs2_which_suballoc_group and fails to use suballoc_loc properly. So let us do it. Another function ocfs2_test_suballoc_bit also needs fix. I have searched all the callers of ocfs2_which_suballoc_group, and all the callers notices suballoc_loc now. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-03-22ocfs2: Free block to the right block group.Tao Ma
In case the block we are going to free is allocated from a discontiguous block group, we have to use suballoc_loc to be the right group. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: ocfs2_group_bitmap_size has to handle old volume.Tao Ma
ocfs2_group_bitmap_size has to handle the case when the volume don't have discontiguous block group support. So pass the feature_incompat in and check it. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-22ocfs2: Some tiny bug fixes for discontiguous block allocation.Tao Ma
The fixes include: 1. some endian problems. 2. we should use bit/bpc in ocfs2_block_group_grow_discontig to allocate clusters. 3. set num_clusters properly in __ocfs2_claim_clusters. 4. change name from ocfs2_supports_discontig_bh to ocfs2_supports_discontig_bg. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-03-26ocfs2: Don't relink cluster groups when allocating discontig block groupsJoel Becker
We don't have enough credits, and the filesystem is in a full state anyway. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-26ocfs2: Grow discontig block groups in one transaction.Joel Becker
Rather than extending the transaction every time we add an extent to a discontiguous block group, we grab enough credits to fill the extent list up front. This means we can free the bits in the same transaction if we end up not getting enough space. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-26ocfs2: Set suballoc_loc on allocated metadata.Joel Becker
Get the suballoc_loc from ocfs2_claim_new_inode() or ocfs2_claim_metadata(). Store it on the appropriate field of the block we just allocated. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-26ocfs2: Return allocated metadata blknos on the ocfs2_suballoc_result.Joel Becker
Rather than calculating the resulting block number, return it on the ocfs2_suballoc_result structure. This way we can calculate block numbers for discontiguous block groups. Cluster groups keep doing it the old way. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-06ocfs2: ocfs2_claim_*() don't need an ocfs2_super argument.Joel Becker
They all take an ocfs2_alloc_context, which has the allocation inode. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-03-26ocfs2: Trim suballocations if they cross discontiguous regionsJoel Becker
A discontiguous block group can find a range of free bits that straddle more than one region of its space. Callers can't handle that, so we trim the returned bits until they fit within one region. Only cluster allocations ask for min_bits>1. Discontiguous block groups are only for block allocations. So min_bits doesn't matter here. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-26ocfs2: ocfs2_claim_suballoc_bits() doesn't need an osb argument.Joel Becker
It's contained on ac->ac_inode->i_sb anyway. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-04-13ocfs2: Pass suballocation results back via a structure.Joel Becker
We're going to be adding more info to a suballocator allocation. Rather than growing every function in the chain, let's pass a result structure around. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: Allocate discontiguous block groups.Joel Becker
If we cannot get a contiguous region for a block group, allocate a discontiguous one when the filesystem supports it. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-04-13ocfs2: Define data structures for discontiguous block groups.Joel Becker
Defines the OCFS2_FEATURE_INCOMPAT_DISCONTIG_BG feature bit and modifies struct ocfs2_group_desc for the feature. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
2010-05-05ocfs2: remove ocfs2_local_alloc_in_range()Mark Fasheh
Inodes are always allocated from the global bitmap now so we don't need this any more. Also, the existing implementation bounces reservations around needlessly. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05ocfs2: allocate btree internal block groups from the global bitmapMark Fasheh
Otherwise, the need for a very large contiguous allocation tends to wreak havoc on many inode allocation reservations on the local alloc, thus ruining any chances for contiguousness. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05ocfs2: use allocation reservations for directory dataMark Fasheh
Use the reservations system for unindexed dir tree allocations. We don't bother with the indexed tree as reads from it are mostly random anyway. Directory reservations are marked seperately, to allow the reservations code a chance to optimize their window sizes. This patch allocates only 8 bits for directory windows as they generally are not expected to grow as quickly as file data. Future improvements to dir window sizing can trivially be made. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05ocfs2: Make ocfs2_journal_dirty() void.Joel Becker
jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>