aboutsummaryrefslogtreecommitdiffstats
path: root/fs/gfs2/glock.c
AgeCommit message (Collapse)Author
2020-04-17gfs2: Don't demote a glock until its revokes are writtenBob Peterson
[ Upstream commit df5db5f9ee112e76b5202fbc331f990a0fc316d6 ] Before this patch, run_queue would demote glocks based on whether there are any more holders. But if the glock has pending revokes that haven't been written to the media, giving up the glock might end in file system corruption if the revokes never get written due to io errors, node crashes and fences, etc. In that case, another node will replay the metadata blocks associated with the glock, but because the revoke was never written, it could replay that block even though the glock had since been granted to another node who might have made changes. This patch changes the logic in run_queue so that it never demotes a glock until its count of pending revokes reaches zero. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-04gfs2: Use async glocks for renameBob Peterson
Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can reverse the roles of which directories are "old" and which are "new" for the purposes of rename. This can cause deadlocks where two nodes end up waiting for each other. There can be several layers of directory dependencies across many nodes. This patch fixes the problem by acquiring all gfs2_rename's inode glocks asychronously and waiting for all glocks to be acquired. That way all inodes are locked regardless of the order. The timeout value for multiple asynchronous glocks is calculated to be the total of the individual wait times for each glock times two. Since gfs2_exchange is very similar to gfs2_rename, both functions are patched in the same way. A new async glock wait queue, sd_async_glock_wait, keeps a list of waiters for these events. If gfs2's holder_wake function detects an async holder, it wakes up any waiters for the event. The waiter only tests whether any of its requests are still pending. Since the glocks are sent to dlm asychronously, the wait function needs to check to see which glocks, if any, were granted. If a glock is granted by dlm (and therefore held), its minimum hold time is checked and adjusted as necessary, as other glock grants do. If the event times out, all glocks held thus far must be dequeued to resolve any existing deadlocks. Then, if there are any outstanding locking requests, we need to loop around and wait for dlm to respond to those requests too. After we release all requests, we return -ESTALE to the caller (vfs rename) which loops around and retries the request. Node1 Node2 --------- --------- 1. Enqueue A Enqueue B 2. Enqueue B Enqueue A 3. A granted 6. B granted 7. Wait for B 8. Wait for A 9. A times out (since Node 1 holds A) 10. Dequeue B (since it was granted) 11. Wait for all requests from DLM 12. B Granted (since Node2 released it in step 10) 13. Rename 14. Dequeue A 15. DLM Grants A 16. Dequeue A (due to the timeout and since we no longer have B held for our task). 17. Dequeue B 18. Return -ESTALE to vfs 19. VFS retries the operation, goto step 1. This release-all-locks / acquire-all-locks may slow rename / exchange down as both nodes struggle in the same way and do the same thing. However, this will only happen when there is contention for the same inodes, which ought to be rare. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-04gfs2: create function gfs2_glock_update_hold_timeAndreas Gruenbacher
This patch moves the code that updates glock minimum hold time to a separate function. This will be called by a future patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-09-03gfs2: Fix possible fs name overflowsBob Peterson
This patch fixes three places in which temporary character buffers could overflow due to the addition of the file system id from patch 3792ce973f07. Thanks to Dan Carpenter for pointing it out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: dump fsid when dumping glock problemsBob Peterson
Before this patch, if a glock error was encountered, the glock with the problem was dumped. But sometimes you may have lots of file systems mounted, and that doesn't tell you which file system it was for. This patch adds a new boolean parameter fsid to the dump_glock family of functions. For non-error cases, such as dumping the glocks debugfs file, the fsid is not dumped in order to keep lock dumps and glocktop as clean as possible. For all error cases, such as GLOCK_BUG_ON, the file system id is now printed. This will make it easier to debug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWNBob Peterson
Before this patch, the superblock flag indicating when a file system is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to the more obvious SDF_WITHDRAWN. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Use IS_ERR_OR_NULLKefeng Wang
Use IS_ERR_OR_NULL where appropriate. (Several more places converted by Andreas.) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-08Merge tag 'spdx-5.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Another round of SPDX header file fixes for 5.2-rc4 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being added, based on the text in the files. We are slowly chipping away at the 700+ different ways people tried to write the license text. All of these were reviewed on the spdx mailing list by a number of different people. We now have over 60% of the kernel files covered with SPDX tags: $ ./scripts/spdxcheck.py -v 2>&1 | grep Files Files checked: 64533 Files with SPDX: 40392 Files with errors: 0 I think the majority of the "easy" fixups are now done, it's now the start of the longer-tail of crazy variants to wade through" * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429 ...
2019-06-06Revert "gfs2: Replace gl_revokes with a GLF flag"Bob Peterson
Commit 73118ca8baf7 introduced a glock reference counting bug in gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is no longer useful, so revert that commit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398Thomas Gleixner
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07gfs2: Replace gl_revokes with a GLF flagBob Peterson
The gl_revokes value determines how many outstanding revokes a glock has on the superblock revokes list; this is used to avoid unnecessary log flushes. However, gl_revokes is only ever tested for being zero, and it's only decremented in revoke_lo_after_commit, which removes all revokes from the list, so we know that the gl_revoke values of all the glocks on the list will reach zero. Therefore, we can replace gl_revokes with a bit flag. This saves an atomic counter in struct gfs2_glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix occasional glock use-after-freeAndreas Gruenbacher
This patch has to do with the life cycle of glocks and buffers. When gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata object is assigned to track the buffer, and that is queued to various lists, including the glock's gl_ail_list to indicate it's on the active items list. Once the page associated with the buffer has been written, it is removed from the ail list, but its life isn't over until a revoke has been successfully written. So after the block is written, its bufdata object is moved from the glock's gl_ail_list to a file-system-wide list of pending revokes, sd_log_le_revoke. At that point the glock still needs to track how many revokes it contributed to that list (in gl_revokes) so that things like glock go_sync can ensure all the metadata has been not only written, but also revoked before the glock is granted to a different node. This is to guarantee journal replay doesn't replay the block once the glock has been granted to another node. Ross Lagerwall recently discovered a race in which an inode could be evicted, and its glock freed after its ail list had been synced, but while it still had unwritten revokes on the sd_log_le_revoke list. The evict decremented the glock reference count to zero, which allowed the glock to be freed. After the revoke was written, function revoke_lo_after_commit tried to adjust the glock's gl_revokes counter and clear its GLF_LFLUSH flag, at which time it referenced the freed glock. This patch fixes the problem by incrementing the glock reference count in gfs2_add_revoke when the glock's first bufdata object is moved from the glock to the global revokes list. Later, when the glock's last such bufdata object is freed, the reference count is decremented. This guarantees that whichever process finishes last (the revoke writing or the evict) will properly free the glock, and neither will reference the glock after it has been freed. Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-05-07gfs2: Fix lru_count going negativeRoss Lagerwall
Under certain conditions, lru_count may drop below zero resulting in a large amount of log spam like this: vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \ negative objects to delete nr=-1 This happens as follows: 1) A glock is moved from lru_list to the dispose list and lru_count is decremented. 2) The dispose function calls cond_resched() and drops the lru lock. 3) Another thread takes the lru lock and tries to add the same glock to lru_list, checking if the glock is on an lru list. 4) It is on a list (actually the dispose list) and so it avoids incrementing lru_count. 5) The glock is moved to lru_list. 5) The original thread doesn't dispose it because it has been re-added to the lru list but the lru_count has still decreased by one. Fix by checking if the LRU flag is set on the glock rather than checking if the glock is on some list and rearrange the code so that the LRU flag is added/removed precisely when the glock is added/removed from lru_list. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-03-08gfs2: Fix missed wakeups in find_insert_glockAndreas Gruenbacher
Mark Syms has reported seeing tasks that are stuck waiting in find_insert_glock. It turns out that struct lm_lockname contains four padding bytes on 64-bit architectures that function glock_waitqueue doesn't skip when hashing the glock name. As a result, we can end up waking up the wrong waitqueue, and the waiting tasks may be stuck forever. Fix that by using ht_parms.key_len instead of sizeof(struct lm_lockname) for the key length. Reported-by: Mark Syms <mark.syms@citrix.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-01-23gfs: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. There is no need to save the dentries for the debugfs files, so drop those variables to save a bit of space and make the code simpler. Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-12gfs2: Dump nrpages for inodes and their glocksBob Peterson
This patch is based on an idea from Steve Whitehouse. The idea is to dump the number of pages for inodes in the glock dumps. The additional locking required me to drop const from quite a few places. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-05gfs2: Use fs_* functions instead of pr_* function where we canBob Peterson
Before this patch, various errors and messages were reported using the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does not tell you which gfs2 mount had the problem, which is often vital to debugging. This patch changes the calls from pr_* to fs_* in most of the messages so that the file system id is printed along with the message. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-04-12gfs2: Stop using rhashtable_walk_peekAndreas Gruenbacher
Function rhashtable_walk_peek is problematic because there is no guarantee that the glock previously returned still exists; when that key is deleted, rhashtable_walk_peek can end up returning a different key, which will cause an inconsistent glock dump. Fix this by keeping track of the current glock in the seq file iterator functions instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01gfs2: Glock dump performance regression fixAndreas Gruenbacher
Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks dump": keep the glock hash table iterator active while the glock dump file is held open. This avoids having to rescan the hash table from the start for each read, with quadratically rising runtime. In addition, use rhastable_walk_peek for resuming a glock dump at the current position: when a glock doesn't fit in the provided buffer anymore, the next read must revisit the same glock. Finally, also restart the dump from the first entry when we notice that the hash table has been resized in gfs2_glock_seq_start. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-11rhashtable: Change rhashtable_walk_start to return voidTom Herbert
Most callers of rhashtable_walk_start don't care about a resize event which is indicated by a return value of -EAGAIN. So calls to rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something like this is common: ret = rhashtable_walk_start(rhiter); if (ret && ret != -EAGAIN) goto out; Since zero and -EAGAIN are the only possible return values from the function this check is pointless. The condition never evaluates to true. This patch changes rhashtable_walk_start to return void. This simplifies code for the callers that ignore -EAGAIN. For the few cases where the caller cares about the resize event, particularly where the table can be walked in mulitple parts for netlink or seq file dump, the function rhashtable_walk_start_check has been added that returns -EAGAIN on a resize event. Signed-off-by: Tom Herbert <tom@quantonium.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-25gfs2: Fix debugfs glocks dumpAndreas Gruenbacher
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a single buffer: the right function for restarting an rhashtable iteration from the beginning of the hash table is rhashtable_walk_enter; rhashtable_walk_stop + rhashtable_walk_start will just resume from the current position. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: stable@vger.kernel.org # v4.3+
2017-08-30gfs2: constify rhashtable_paramsArvind Yadav
rhashtable_params are not supposed to change at runtime. All Functions rhashtable_* working with const rhashtable_params provided by <linux/rhashtable.h>. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25GFS2: Fix up some sparse warningsBob Peterson
This patch cleans up various pieces of GFS2 to avoid sparse errors. This doesn't fix them all, but it fixes several. The first error, in function glock_hash_walk was a genuine bug where the rhashtable could be started and not stopped. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: Clean up waiting on glocksAndreas Gruenbacher
The prepare_to_wait_on_glock and finish_wait_on_glock functions introduced in commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing glocks" are better removed, resulting in cleaner code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: gfs2_evict_inode: Put glocks asynchronouslyAndreas Gruenbacher
gfs2_evict_inode is called to free inodes under memory pressure. The function calls into DLM when an inode's last cluster-wide reference goes away (remote unlink) and to release the glock and associated DLM lock before finally destroying the inode. However, if DLM is blocked on memory to become available, calling into DLM again will deadlock. Avoid that by decoupling releasing glocks from destroying inodes in that case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously in work queue context, when the associated inodes have likely already been destroyed. With this change, inodes can end up being unlinked, remote-unlink can be triggered, and then the inode can be reallocated before all remote-unlink callbacks are processed. To detect that, revalidate the link count in gfs2_evict_inode to make sure we're not deleting an allocated, referenced inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: gfs2_glock_get: Wait on freeing glocksAndreas Gruenbacher
Keep glocks in their hash table until they are freed instead of removing them when their last reference is dropped. This allows to wait for any previous instances of a glock to go away in gfs2_glock_get before creating a new glocks. Special thanks to Andy Price for finding and fixing a problem which also required us to delete the rcu_read_unlock from the error case in function gfs2_glock_get. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09GFS2: Don't waste time locking lru_lock for non-lru glocksBob Peterson
Before this patch, glock_dq would call gfs2_glock_remove_from_lru. For glocks that are never put on the LRU, such as the transaction glock, this just takes the spin_lock, determines there's nothing to be done because the list is empty, then unlocks again. This was causing unnecessary lock contention on the lru_lock spin_lock. This patch adds a check for GLOF_LRU in the glops before taking the spin_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-07gfs2: Fix glock rhashtable rcu bugAndreas Gruenbacher
Before commit 88ffbf3e03 "GFS2: Use resizable hash table for glocks", glocks were freed via call_rcu to allow reading the glock hashtable locklessly using rcu. This was then changed to free glocks immediately, which made reading the glock hashtable unsafe. Bring back the original code for freeing glocks via call_rcu. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: stable@vger.kernel.org # 4.3+
2017-07-05gfs2: Clean up glock work enqueuingAndreas Gruenbacher
This patch adds a standardized queueing mechanism for glock work with spin_lock protection to prevent races. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-05-05GFS2: Allow glocks to be unlocked after withdrawBob Peterson
This bug fixes a regression introduced by patch 0d1c7ae9d8. The intent of the patch was to stop promoting glocks after a file system is withdrawn due to a variety of errors, because doing so results in a BUG(). (You should be able to unmount after a withdraw rather than having the kernel panic.) Unfortunately, it also stopped demotions, so glocks could not be unlocked after withdraw, which means the unmount would hang. This patch allows function do_xmote to demote locks to an unlocked state after a withdraw, but not promote them. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-03gfs2: Switch to rhashtable_lookup_get_insert_fastAndreas Gruenbacher
Switch from rhashtable_lookup_insert_fast + rhashtable_lookup_fast to rhashtable_lookup_get_insert_fast, which is cleaner and avoids an extra rhashtable lookup. At the same time, turn the retry loop in gfs2_glock_get into an infinite loop. The lookup or insert will eventually succeed, usually very fast, but there is no reason to give up trying at a fixed number of iterations. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16gfs2: Don't pack struct lm_locknameAndreas Gruenbacher
As per a suggestion by Linus, don't pack struct lm_lockname: we did that because the struct is used as a rhashtable key, but packing tells the compiler that the 64-bit fields in the struct may be unaligned, causing it to generate worse code on some architectures. Instead, rearrange the fields in the struct so that there is no padding between fields, and exclude any tail padding from the hash key size. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16gfs2: Deduplicate gfs2_{glocks,glstats}_openAndreas Gruenbacher
Both functions are identical except for the seq_operations used. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16gfs2: Replace rhashtable_walk_init with rhashtable_walk_enterAndreas Gruenbacher
Function rhashtable_walk_init is deprecated. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16GFS2: Prevent BUG from occurring when normal Withdraws occurBob Peterson
When the GFS2 file system withdraws due to metadata corruption, it often has outstanding transactions in the journal and delayed work queued for its glocks. This patch adds some new checks for a withdrawn file system before proceeding with operations that would obviously cause a BUG() to be triggered. That allows GFS2 to be safely unmounted rather than cause the system to go down. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-02-23Merge tag 'gfs2-4.11.addendum' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 fix from Bob Peterson: "This is an addendum for the 4.11 merge window. Andy Price wrote this patch to close a nasty race condition that allows access to glocks that are being destroyed. Without this patch, GFS2 is vulnerable to random corruption and kernel panic" * tag 'gfs2-4.11.addendum' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Add missing rcu locking for glock lookup
2017-02-23gfs2: Add missing rcu locking for glock lookupAndrew Price
We must hold the rcu read lock across looking up glocks and trying to bump their refcount to prevent the glocks from being freed in between. Cc: <stable@vger.kernel.org> # 4.3+ Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini Varadhan. 2) Simplify classifier state on sk_buff in order to shrink it a bit. From Willem de Bruijn. 3) Introduce SIPHASH and it's usage for secure sequence numbers and syncookies. From Jason A. Donenfeld. 4) Reduce CPU usage for ICMP replies we are going to limit or suppress, from Jesper Dangaard Brouer. 5) Introduce Shared Memory Communications socket layer, from Ursula Braun. 6) Add RACK loss detection and allow it to actually trigger fast recovery instead of just assisting after other algorithms have triggered it. From Yuchung Cheng. 7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot. 8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert. 9) Export MPLS packet stats via netlink, from Robert Shearman. 10) Significantly improve inet port bind conflict handling, especially when an application is restarted and changes it's setting of reuseport. From Josef Bacik. 11) Implement TX batching in vhost_net, from Jason Wang. 12) Extend the dummy device so that VF (virtual function) features, such as configuration, can be more easily tested. From Phil Sutter. 13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric Dumazet. 14) Add new bpf MAP, implementing a longest prefix match trie. From Daniel Mack. 15) Packet sample offloading support in mlxsw driver, from Yotam Gigi. 16) Add new aquantia driver, from David VomLehn. 17) Add bpf tracepoints, from Daniel Borkmann. 18) Add support for port mirroring to b53 and bcm_sf2 drivers, from Florian Fainelli. 19) Remove custom busy polling in many drivers, it is done in the core networking since 4.5 times. From Eric Dumazet. 20) Support XDP adjust_head in virtio_net, from John Fastabend. 21) Fix several major holes in neighbour entry confirmation, from Julian Anastasov. 22) Add XDP support to bnxt_en driver, from Michael Chan. 23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan. 24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi. 25) Support GRO in IPSEC protocols, from Steffen Klassert" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits) Revert "ath10k: Search SMBIOS for OEM board file extension" net: socket: fix recvmmsg not returning error from sock_error bnxt_en: use eth_hw_addr_random() bpf: fix unlocking of jited image when module ronx not set arch: add ARCH_HAS_SET_MEMORY config net: napi_watchdog() can use napi_schedule_irqoff() tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" net/hsr: use eth_hw_addr_random() net: mvpp2: enable building on 64-bit platforms net: mvpp2: switch to build_skb() in the RX path net: mvpp2: simplify MVPP2_PRS_RI_* definitions net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT net: mvpp2: remove unused register definitions net: mvpp2: simplify mvpp2_bm_bufs_add() net: mvpp2: drop useless fields in mvpp2_bm_pool and related code net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue' net: mvpp2: release reference to txq_cpu[] entry after unmapping net: mvpp2: handle too large value in mvpp2_rx_time_coal_set() net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set() net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set ...
2017-02-21Merge tag 'gfs2-4.11.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Robert Peterson: "We've got eight GFS2 patches for this merge window: - Andy Price submitted a patch to make gfs2_write_full_page a static function. - Dan Carpenter submitted a patch to fix a ERR_PTR thinko. Three patches fix bugs related to deleting very large files, which cause GFS2 to run out of journal space: - The first one prevents GFS2 delete operation from requesting too much journal space. - The second one fixes a problem whereby GFS2 can hang because it wasn't taking journal space demand into its calculations. - The third one wakes up IO waiters when a flush is done to restart processes stuck waiting for journal space to become available. The final three patches are a performance improvement related to spin_lock contention between multiple writers: - The "tr_touched" variable was switched to a flag to be more atomic and eliminate the possibility of some races. - Function meta_lo_add was moved inline with its only caller to make the code more readable and efficient. - Contention on the gfs2_log_lock spinlock was greatly reduced by avoiding the lock altogether in cases where we don't really need it: buffers that already appear in the appropriate metadata list for the journal. Many thanks to Steve Whitehouse for the ideas and principles behind these patches" * tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Make gfs2_write_full_page static GFS2: Reduce contention on gfs2_log_lock GFS2: Inline function meta_lo_add GFS2: Switch tr_touched to flag in transaction GFS2: Wake up io waiters whenever a flush is done GFS2: Made logd daemon take into account log demand GFS2: Limit number of transaction blocks requested for truncates GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
2017-02-17gfs2: Use rhashtable walk interface in glock_hash_walkHerbert Xu
The function glock_hash_walk walks the rhashtable by hand. This is broken because if it catches the hash table in the middle of a rehash, then it will miss entries. This patch replaces the manual walk by using the rhashtable walk interface. Fixes: 88ffbf3e037e ("GFS2: Use resizable hash table for glocks") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-25ktime: Cleanup ktime_set() usageThomas Gleixner
ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_nextDan Carpenter
This patch fixes a place where function gfs2_glock_iter_next can reference an invalid error pointer. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-09-21gfs2: fix to detect failure of register_shrinkerChao Yu
register_shrinker can fail after commit 1d3d4437eae1 ("vmscan: per-node deferred work"), we should detect the failure of it, otherwise we may fail to register shrinker after gfs2 module was been inited successfully. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-08-02GFS2: use BIT() macroFabian Frederick
Replace 1 << value shift by more explicit BIT() macro Also fixes two bare unsigned definitions: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' + unsigned hsize = BIT(ip->i_depth); Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27gfs2: Lock holder cleanupAndreas Gruenbacher
Make the code more readable by cleaning up the different ways of initializing lock holders and checking for initialized lock holders: mark lock holders as uninitialized by setting the holder's glock to NULL (gfs2_holder_mark_uninitialized) instead of zeroing out the entire object or using a separate flag. Recognize initialized holders by their non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects which are immeditiately initialized via gfs2_holder_init or gfs2_glock_nq_init. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27gfs2: Get rid of gfs2_ilookupAndreas Gruenbacher
Now that gfs2_lookup_by_inum only takes the inode glock for new inodes (and not for cached inodes anymore), there no longer is a need to optimize the cached-inode case in gfs2_get_dentry or delete_work_func, and gfs2_ilookup can be removed. In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this inconsistency goes away as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27gfs2: Fix gfs2_lookup_by_inum lock inversionAndreas Gruenbacher
The current gfs2_lookup_by_inum takes the glock of a presumed inode identified by block number, verifies that the block is indeed an inode, and then instantiates and reads the new inode via gfs2_inode_lookup. However, instantiating a new inode may block on freeing a previous instance of that inode (__wait_on_freeing_inode), and freeing an inode requires to take the glock already held, leading to lock inversion and deadlock. Fix this by first instantiating the new inode, then verifying that the block is an inode (if required), and then reading in the new inode, all in gfs2_inode_lookup. If the block we are looking for is not an inode, we discard the new inode via iget_failed, which marks inodes as bad and unhashes them. Other tasks waiting on that inode will get back a bad inode back from ilookup or iget_locked; in that case, retry the lookup. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-05-20Merge tag 'gfs2-4.7.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We've got nine patches this time: - Abhi Das has two patches that fix a GFS2 splice issue (and an adjustment). - Ben Marzinski has a patch which allows the proper unmount of a GFS2 file system after hitting a withdraw error. - I have a patch to fix a problem where GFS2 would dereference an error value, plus three cosmetic / refactoring patches. - Daniel DeFreez has a patch to fix two glock reference count problems, where GFS2 was not properly "uninitializing" its glock holder on error paths. - Denys Vlasenko has a patch to change a function to not be inlined, thus reducing the memory footprint of the GFS2 module" * tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Refactor gfs2_remove_from_journal GFS2: Remove allocation parms from gfs2_rbm_find gfs2: use inode_lock/unlock instead of accessing i_mutex directly GFS2: Add calls to gfs2_holder_uninit in two error handlers GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read() GFS2: Get rid of dead code in inode_go_demote_ok GFS2: ignore unlock failures after withdraw