aboutsummaryrefslogtreecommitdiffstats
path: root/fs/gfs2/ops_fstype.c
AgeCommit message (Collapse)Author
2022-11-25gfs2: Switch from strlcpy to strscpyAndreas Gruenbacher
commit 204c0300c4e99707e9fb6e57840aa1127060e63f upstream. Switch from strlcpy to strscpy and make sure that @count is the size of the smaller of the source and destination buffers. This prevents reading beyond the end of the source buffer when the source string isn't null terminated. Found by a modified version of syzkaller. Suggested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25gfs2: Check sb_bsize_shift after reading superblockAndrew Price
commit 670f8ce56dd0632dc29a0322e188cc73ce3c6b92 upstream. Fuzzers like to scribble over sb_bsize_shift but in reality it's very unlikely that this field would be corrupted on its own. Nevertheless it should be checked to avoid the possibility of messy mount errors due to bad calculations. It's always a fixed value based on the block size so we can just check that it's the expected value. Tested with: mkfs.gfs2 -O -p lock_nolock /dev/vdb for i in 0 -1 64 65 32 33; do gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb mount /dev/vdb /mnt/test && umount /mnt/test done Before this patch we get a withdraw after [ 76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block [ 76.413681] bh = 19 (type: exp=5, found=4) [ 76.413681] function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492 and with UBSAN configured we also get complaints like [ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19 [ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int' After the patch, these complaints don't appear, mount fails immediately and we get an explanation in dmesg. Reported-by: syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-05gfs2: add validation checks for size of superblockAnant Thazhemadam
[ Upstream commit 0ddc5154b24c96f20e94d653b0a814438de6032b ] In gfs2_check_sb(), no validation checks are performed with regards to the size of the superblock. syzkaller detected a slab-out-of-bounds bug that was primarily caused because the block size for a superblock was set to zero. A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE. Performing validation checks and ensuring that the size of the superblock is valid fixes this bug. Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com Suggested-by: Andrew Price <anprice@redhat.com> Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> [Minor code reordering.] Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05gfs2: use-after-free in sysfs deregistrationJamie Iles
[ Upstream commit c2a04b02c060c4858762edce4674d5cba3e5a96f ] syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y: Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228 CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S 5.9.0-rc8+ #101 Hardware name: linux,dummy-virt (DT) Workqueue: events kobject_delayed_cleanup Call trace: dump_backtrace+0x0/0x4d8 show_stack+0x34/0x48 dump_stack+0x174/0x1f8 print_address_description.constprop.0+0x5c/0x550 kasan_report+0x13c/0x1c0 __asan_report_load1_noabort+0x34/0x60 memcmp+0xd0/0xd8 gfs2_uevent+0xc4/0x188 kobject_uevent_env+0x54c/0x1240 kobject_uevent+0x2c/0x40 __kobject_del+0x190/0x1d8 kobject_delayed_cleanup+0x2bc/0x3b8 process_one_work+0x96c/0x18c0 worker_thread+0x3f0/0xc30 kthread+0x390/0x498 ret_from_fork+0x10/0x18 Allocated by task 1110: kasan_save_stack+0x28/0x58 __kasan_kmalloc.isra.0+0xc8/0xe8 kasan_kmalloc+0x10/0x20 kmem_cache_alloc_trace+0x1d8/0x2f0 alloc_super+0x64/0x8c0 sget_fc+0x110/0x620 get_tree_bdev+0x190/0x648 gfs2_get_tree+0x50/0x228 vfs_get_tree+0x84/0x2e8 path_mount+0x1134/0x1da8 do_mount+0x124/0x138 __arm64_sys_mount+0x164/0x238 el0_svc_common.constprop.0+0x15c/0x598 do_el0_svc+0x60/0x150 el0_svc+0x34/0xb0 el0_sync_handler+0xc8/0x5b4 el0_sync+0x15c/0x180 Freed by task 228: kasan_save_stack+0x28/0x58 kasan_set_track+0x28/0x40 kasan_set_free_info+0x24/0x48 __kasan_slab_free+0x118/0x190 kasan_slab_free+0x14/0x20 slab_free_freelist_hook+0x6c/0x210 kfree+0x13c/0x460 Use the same pattern as f2fs + ext4 where the kobject destruction must complete before allowing the FS itself to be freed. This means that we need an explicit free_sbd in the callers. Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Jamie Iles <jamie@nuviainc.com> [Also go to fail_free when init_names fails.] Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22gfs2: read-only mounts should grab the sd_freeze_gl glockBob Peterson
[ Upstream commit b780cc615ba4795a7ef0e93b19424828a5ad456a ] Before this patch, only read-write mounts would grab the freeze glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze glock was never initialized. That meant requests to freeze, which request the glock in EX, were granted without any state transition. That meant you could mount a gfs2 file system, which is currently frozen on a different cluster node, in read-only mode. This patch makes read-only mounts lock the freeze glock in SH mode, which will block for file systems that are frozen on another node. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24gfs2: Allow lock_nolock mount to specify jid=XBob Peterson
[ Upstream commit ea22eee4e6027d8927099de344f7fff43c507ef9 ] Before this patch, a simple typo accidentally added \n to the jid= string for lock_nolock mounts. This made it impossible to mount a gfs2 file system with a journal other than journal0. Thus: mount -tgfs2 -o hostdata="jid=1" <device> <mount pt> Resulted in: mount: wrong fs type, bad option, bad superblock on <device> In most cases this is not a problem. However, for debugging and testing purposes we sometimes want to test the integrity of other journals. This patch removes the unnecessary \n and thus allows lock_nolock users to specify an alternate journal. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-30gfs2: Fix initialisation of args for remountAndrew Price
When gfs2 was converted to use fs_context, the initialisation of the mount args structure to the currently active args was lost with the removal of gfs2_remount_fs(), so the checks of the new args on remount became checks against the default values instead of the current ones. This caused unexpected remount behaviour and test failures (xfstests generic/294, generic/306 and generic/452). Reinstate the args initialisation, this time in gfs2_init_fs_context() and conditional upon fc->purpose, as that's the only time we get control before the mount args are parsed in the remount process. Fixes: 1f52aa08d12f ("gfs2: Convert gfs2 to fs_context") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-10-24gfs2: Fix memory leak when gfs2meta's fs_context is freedAndrew Price
gfs2 and gfs2meta share an ->init_fs_context function which allocates an args structure stored in fc->fs_private. gfs2 registers a ->free function to free this memory when the fs_context is cleaned up, but there was not one registered for gfs2meta, causing a leak. Register a ->free function for gfs2meta. The existing gfs2_fc_free function does what we need. Reported-by: syzbot+c2fdfd2b783754878fb6@syzkaller.appspotmail.com Fixes: 1f52aa08d12f ("gfs2: Convert gfs2 to fs_context") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-24Merge branch 'work.mount3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more mount API conversions from Al Viro: "Assorted conversions of options parsing to new API. gfs2 is probably the most serious one here; the rest is trivial stuff. Other things in what used to be #work.mount are going to wait for the next cycle (and preferably go via git trees of the filesystems involved)" * 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: gfs2: Convert gfs2 to fs_context vfs: Convert spufs to use the new mount API vfs: Convert hypfs to use the new mount API hypfs: Fix error number left in struct pointer member vfs: Convert functionfs to use the new mount API vfs: Convert bpf to use the new mount API
2019-09-18gfs2: Convert gfs2 to fs_contextAndrew Price
Convert gfs2 and gfs2meta to fs_context. Removes the duplicated vfs code from gfs2_mount and instead uses the new vfs_get_block_super() before switching the ->root to the appropriate dentry. The mount option parsing has been converted to the new API and error reporting for invalid options has been made more precise at the same time. All of the mount/remount code has been moved into ops_fstype.c Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: cluster-devel@redhat.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-04gfs2: Use async glocks for renameBob Peterson
Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can reverse the roles of which directories are "old" and which are "new" for the purposes of rename. This can cause deadlocks where two nodes end up waiting for each other. There can be several layers of directory dependencies across many nodes. This patch fixes the problem by acquiring all gfs2_rename's inode glocks asychronously and waiting for all glocks to be acquired. That way all inodes are locked regardless of the order. The timeout value for multiple asynchronous glocks is calculated to be the total of the individual wait times for each glock times two. Since gfs2_exchange is very similar to gfs2_rename, both functions are patched in the same way. A new async glock wait queue, sd_async_glock_wait, keeps a list of waiters for these events. If gfs2's holder_wake function detects an async holder, it wakes up any waiters for the event. The waiter only tests whether any of its requests are still pending. Since the glocks are sent to dlm asychronously, the wait function needs to check to see which glocks, if any, were granted. If a glock is granted by dlm (and therefore held), its minimum hold time is checked and adjusted as necessary, as other glock grants do. If the event times out, all glocks held thus far must be dequeued to resolve any existing deadlocks. Then, if there are any outstanding locking requests, we need to loop around and wait for dlm to respond to those requests too. After we release all requests, we return -ESTALE to the caller (vfs rename) which loops around and retries the request. Node1 Node2 --------- --------- 1. Enqueue A Enqueue B 2. Enqueue B Enqueue A 3. A granted 6. B granted 7. Wait for B 8. Wait for A 9. A times out (since Node 1 holds A) 10. Dequeue B (since it was granted) 11. Wait for all requests from DLM 12. B Granted (since Node2 released it in step 10) 13. Rename 14. Dequeue A 15. DLM Grants A 16. Dequeue A (due to the timeout and since we no longer have B held for our task). 17. Dequeue B 18. Return -ESTALE to vfs 19. VFS retries the operation, goto step 1. This release-all-locks / acquire-all-locks may slow rename / exchange down as both nodes struggle in the same way and do the same thing. However, this will only happen when there is contention for the same inodes, which ought to be rare. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWNBob Peterson
Before this patch, the superblock flag indicating when a file system is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to the more obvious SDF_WITHDRAWN. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Use IS_ERR_OR_NULLKefeng Wang
Use IS_ERR_OR_NULL where appropriate. (Several more places converted by Andreas.) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Clean up freeing struct gfs2_sbdAndreas Gruenbacher
Add a free_sbd function for freeing a struct gfs2_sbd. Use that for freeing a super-block descriptor, either directly or via kobject_put. Free sd_lkstats inside the kobject release function: that way, gfs2_put_super will no longer leak sd_lkstats. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398Thomas Gleixner
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07gfs2: read journal in large chunksAbhi Das
Use bios to read in the journal into the address space of the journal inode (jd_inode), sequentially and in large chunks. This is faster for locating the journal head that the previous binary search approach. When performing recovery, we keep the journal in the address space until recovery is done, which further speeds up things. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Rename sd_log_le_{revoke,ordered}Andreas Gruenbacher
Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to sd_log_ordered: not sure what le stands for here, but it doesn't add clarity, and if it stands for list entry, it's actually confusing as those are both list heads but not list entries. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-02-14Revert "gfs2: read journal in large chunks to locate the head"Bob Peterson
This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-11gfs2: read journal in large chunks to locate the headAbhi Das
Use bio(s) to read in the journal sequentially in large chunks and locate the head of the journal. This version addresses the issues Christoph pointed out w.r.t error handling and using deprecated API. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>
2018-10-24Merge tag 'gfs2-4.20.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "We've got 18 patches for this merge window, none of which are very major: - clean up the gfs2 block allocator to prepare for future performance enhancements (Andreas Gruenbacher) - fix a use-after-free problem (Andy Price) - patches that fix gfs2's broken rgrplvb mount option (me) - cleanup patches and error message improvements (me) - enable getlabel support (Steve Whitehouse and Abhi Das) - flush the glock delete workqueue at exit (Tim Smith)" * tag 'gfs2-4.20.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix minor typo: couln't versus couldn't. gfs2: write revokes should traverse sd_ail1_list in reverse gfs2: Pass resource group to rgblk_free gfs2: Remove unnecessary gfs2_rlist_alloc parameter gfs2: Fix marking bitmaps non-full gfs2: Fix some minor typos gfs2: Rename bitmap.bi_{len => bytes} gfs2: Remove unused RGRP_RSRV_MINBYTES definition gfs2: Move rs_{sizehint, rgd_gh} fields into the inode gfs2: Clean up out-of-bounds check in gfs2_rbm_from_block gfs2: Always check the result of gfs2_rbm_from_block gfs2: getlabel support GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads gfs2: Don't leave s_fs_info pointing to freed memory in init_sbd gfs2: Use fs_* functions instead of pr_* function where we can gfs2: slow the deluge of io error messages gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updated gfs2: improve debug information when lvb mismatches are found
2018-10-13gfs2_meta: ->mount() can get NULL dev_nameAl Viro
get in sync with mount_bdev() handling of the same Reported-by: syzbot+c54f8e94e6bba03b04e9@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-10-08gfs2: Don't leave s_fs_info pointing to freed memory in init_sbdAndrew Price
When alloc_percpu() fails, sdp gets freed but sb->s_fs_info still points to the same address. Move the assignment after that error check so that s_fs_info can only point to a valid sdp or NULL, which is checked for later in the error path, in gfs2_kill_super(). Reported-by: syzbot+dcb8b3587445007f5808@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-16gfs2: Remove sdp->sd_jheightsizeAndreas Gruenbacher
GFS2 keeps two arrarys in the superblock that define the maximum size of an inode depending on the inode's height: sdp->sd_heightsize defines the heights in units of sb->s_blocksize; sdp->sd_jheightsize defines them in units of sb->s_blocksize - sizeof(struct gfs2_meta_header). These arrays are used to determine when additional layers of indirect blocks are needed. The second array is used for directories which have an additional gfs2_meta_header at the beginning of each block. Distinguishing between these two cases makes no sense: the height required for representing N blocks will come out the same no matter if the calculation is done in gross (sb->s_blocksize) or net (sb->s_blocksize - sizeof(struct gfs2_meta_header)) units. Stuffed directories don't have an additional gfs2_meta_header, but the stuffed case is handled separately for both files and directories, anyway. Remove the unncessary sdp->sd_jheightsize array. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-12GFS2: Minor improvements to comments and documentationBob Peterson
This patch simply fixes some comments and the gfs2-glocks.txt file: Places where i_rwsem was called i_mutex, and adding i_rw_mutex. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-23GFS2: Log the reason for log flushes in every log headerBob Peterson
This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23GFS2: Introduce new gfs2_log_header_v2Bob Peterson
This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-11-27Rename superblock flags (MS_xyz -> SB_xyz)Linus Torvalds
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-14Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-07Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ...
2017-08-25gfs2: Silence gcc format-truncation warningAndreas Gruenbacher
Enlarge sd_fsname to be big enough for the longest long lock table name and an arbitrary journal number. This silences two -Wformat-truncation warnings with gcc 7.1.1. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09GFS2: Delete debugfs files only after we evict the glocksBob Peterson
This patch moves the call to gfs2_delete_debugfs_file so that it comes after the glock hash table has been cleared. This way we can query the debugfs files if umount hangs. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-17VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)David Howells
Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>
2017-06-13Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into ↵Christoph Hellwig
nvme-base
2017-06-09block: switch bios to blk_status_tChristoph Hellwig
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-05fs: switch ->s_uuid to uuid_tChristoph Hellwig
For some file systems we still memcpy into it, but in various places this already allows us to use the proper uuid helpers. More to come.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM) Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-04-20fs: Remove SB_I_DYNBDI flagJan Kara
Now that all bdi structures filesystems use are properly refcounted, we can remove the SB_I_DYNBDI flag. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20gfs2: Convert to properly refcounting bdiJan Kara
Similarly to set_bdev_super() GFS2 just used block device reference to bdi. Convert it to properly getting bdi reference. The reference will get automatically dropped on superblock destruction. CC: Steven Whitehouse <swhiteho@redhat.com> CC: Bob Peterson <rpeterso@redhat.com> CC: cluster-devel@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-21Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: - blk-mq scheduling framework from me and Omar, with a port of the deadline scheduler for this framework. A port of BFQ from Paolo is in the works, and should be ready for 4.12. - Various fixups and improvements to the above scheduling framework from Omar, Paolo, Bart, me, others. - Cleanup of the exported sysfs blk-mq data into debugfs, from Omar. This allows us to export more information that helps debug hangs or performance issues, without cluttering or abusing the sysfs API. - Fixes for the sbitmap code, the scalable bitmap code that was migrated from blk-mq, from Omar. - Removal of the BLOCK_PC support in struct request, and refactoring of carrying SCSI payloads in the block layer. This cleans up the code nicely, and enables us to kill the SCSI specific parts of struct request, shrinking it down nicely. From Christoph mainly, with help from Hannes. - Support for ranged discard requests and discard merging, also from Christoph. - Support for OPAL in the block layer, and for NVMe as well. Mainly from Scott Bauer, with fixes/updates from various others folks. - Error code fixup for gdrom from Christophe. - cciss pci irq allocation cleanup from Christoph. - Making the cdrom device operations read only, from Kees Cook. - Fixes for duplicate bdi registrations and bdi/queue life time problems from Jan and Dan. - Set of fixes and updates for lightnvm, from Matias and Javier. - A few fixes for nbd from Josef, using idr to name devices and a workqueue deadlock fix on receive. Also marks Josef as the current maintainer of nbd. - Fix from Josef, overwriting queue settings when the number of hardware queues is updated for a blk-mq device. - NVMe fix from Keith, ensuring that we don't repeatedly mark and IO aborted, if we didn't end up aborting it. - SG gap merging fix from Ming Lei for block. - Loop fix also from Ming, fixing a race and crash between setting loop status and IO. - Two block race fixes from Tahsin, fixing request list iteration and fixing a race between device registration and udev device add notifiations. - Double free fix from cgroup writeback, from Tejun. - Another double free fix in blkcg, from Hou Tao. - Partition overflow fix for EFI from Alden Tondettar. * tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits) nvme: Check for Security send/recv support before issuing commands. block/sed-opal: allocate struct opal_dev dynamically block/sed-opal: tone down not supported warnings block: don't defer flushes on blk-mq + scheduling blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers blk-mq: don't special case flush inserts for blk-mq-sched blk-mq-sched: don't add flushes to the head of requeue queue blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not block: do not allow updates through sysfs until registration completes lightnvm: set default lun range when no luns are specified lightnvm: fix off-by-one error on target initialization Maintainers: Modify SED list from nvme to block Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN uapi: sed-opal fix IOW for activate lsp to use correct struct cdrom: Make device operations read-only elevator: fix loading wrong elevator type for blk-mq devices cciss: switch to pci_irq_alloc_vectors block/loop: fix race between I/O and set_status blk-mq-sched: don't hold queue_lock when calling exit_icq block: set make_request_fn manually in blk_mq_update_nr_hw_queues ...
2017-02-21Merge tag 'gfs2-4.11.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Robert Peterson: "We've got eight GFS2 patches for this merge window: - Andy Price submitted a patch to make gfs2_write_full_page a static function. - Dan Carpenter submitted a patch to fix a ERR_PTR thinko. Three patches fix bugs related to deleting very large files, which cause GFS2 to run out of journal space: - The first one prevents GFS2 delete operation from requesting too much journal space. - The second one fixes a problem whereby GFS2 can hang because it wasn't taking journal space demand into its calculations. - The third one wakes up IO waiters when a flush is done to restart processes stuck waiting for journal space to become available. The final three patches are a performance improvement related to spin_lock contention between multiple writers: - The "tr_touched" variable was switched to a flag to be more atomic and eliminate the possibility of some races. - Function meta_lo_add was moved inline with its only caller to make the code more readable and efficient. - Contention on the gfs2_log_lock spinlock was greatly reduced by avoiding the lock altogether in cases where we don't really need it: buffers that already appear in the appropriate metadata list for the journal. Many thanks to Steve Whitehouse for the ideas and principles behind these patches" * tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Make gfs2_write_full_page static GFS2: Reduce contention on gfs2_log_lock GFS2: Inline function meta_lo_add GFS2: Switch tr_touched to flag in transaction GFS2: Wake up io waiters whenever a flush is done GFS2: Made logd daemon take into account log demand GFS2: Limit number of transaction blocks requested for truncates GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
2017-02-02block: Use pointer to backing_dev_info from request_queueJan Kara
We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-05GFS2: Made logd daemon take into account log demandBob Peterson
Before this patch, the logd daemon only tried to flush things when the log blocks pinned exceeded a certain threshold. But when we're deleting very large files, it may require a huge number of journal blocks, and that, in turn, may exceed the threshold. This patch factors that into account. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-11-01block,fs: use REQ_* flags directlyChristoph Hellwig
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-02GFS2: use BIT() macroFabian Frederick
Replace 1 << value shift by more explicit BIT() macro Also fixes two bare unsigned definitions: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' + unsigned hsize = BIT(ip->i_depth); Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-07-26Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
2016-06-27gfs2: Fix gfs2_lookup_by_inum lock inversionAndreas Gruenbacher
The current gfs2_lookup_by_inum takes the glock of a presumed inode identified by block number, verifies that the block is indeed an inode, and then instantiates and reads the new inode via gfs2_inode_lookup. However, instantiating a new inode may block on freeing a previous instance of that inode (__wait_on_freeing_inode), and freeing an inode requires to take the glock already held, leading to lock inversion and deadlock. Fix this by first instantiating the new inode, then verifying that the block is an inode (if required), and then reading in the new inode, all in gfs2_inode_lookup. If the block we are looking for is not an inode, we discard the new inode via iget_failed, which marks inodes as bad and unhashes them. Other tasks waiting on that inode will get back a bad inode back from ilookup or iget_locked; in that case, retry the lookup. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-07gfs2: use bio op accessorsMike Christie
Separate the op from the rq_flag_bits and have gfs2 set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block/fs/drivers: remove rw argument from submit_bioMike Christie
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02parallel lookups: actual switch to rwsemAl Viro
ta-da! The main issue is the lack of down_write_killable(), so the places like readdir.c switched to plain inode_lock(); once killable variants of rwsem primitives appear, that'll be dealt with. lockdep side also might need more work Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10don't bother with ->d_inode->i_sb - it's always equal to ->d_sbAl Viro
... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>