aboutsummaryrefslogtreecommitdiffstats
path: root/fs/orangefs
AgeCommit message (Collapse)Author
2023-01-18orangefs: Fix kmemleak in orangefs_{kernel,client}_debug_init()Zhang Xiaoxu
[ Upstream commit 31720a2b109b3080eb77e97b8f6f50a27b4ae599 ] When insert and remove the orangefs module, there are memory leaked as below: unreferenced object 0xffff88816b0cc000 (size 2048): comm "insmod", pid 783, jiffies 4294813439 (age 65.512s) hex dump (first 32 bytes): 6e 6f 6e 65 0a 00 00 00 00 00 00 00 00 00 00 00 none............ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000031ab7788>] kmalloc_trace+0x27/0xa0 [<000000005b405fee>] orangefs_debugfs_init.cold+0xaf/0x17f [<00000000e5a0085b>] 0xffffffffa02780f9 [<000000004232d9f7>] do_one_initcall+0x87/0x2a0 [<0000000054f22384>] do_init_module+0xdf/0x320 [<000000003263bdea>] load_module+0x2f98/0x3330 [<0000000052cd4153>] __do_sys_finit_module+0x113/0x1b0 [<00000000250ae02b>] do_syscall_64+0x35/0x80 [<00000000f11c03c7>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Use the golbal variable as the buffer rather than dynamic allocate to slove the problem. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18orangefs: Fix kmemleak in orangefs_prepare_debugfs_help_string()Zhang Xiaoxu
[ Upstream commit d23417a5bf3a3afc55de5442eb46e1e60458b0a1 ] When insert and remove the orangefs module, then debug_help_string will be leaked: unreferenced object 0xffff8881652ba000 (size 4096): comm "insmod", pid 1701, jiffies 4294893639 (age 13218.530s) hex dump (first 32 bytes): 43 6c 69 65 6e 74 20 44 65 62 75 67 20 4b 65 79 Client Debug Key 77 6f 72 64 73 20 61 72 65 20 75 6e 6b 6e 6f 77 words are unknow backtrace: [<0000000004e6f8e3>] kmalloc_trace+0x27/0xa0 [<0000000006f75d85>] orangefs_prepare_debugfs_help_string+0x5e/0x480 [orangefs] [<0000000091270a2a>] _sub_I_65535_1+0x57/0xf70 [crc_itu_t] [<000000004b1ee1a3>] do_one_initcall+0x87/0x2a0 [<000000001d0614ae>] do_init_module+0xdf/0x320 [<00000000efef068c>] load_module+0x2f98/0x3330 [<000000006533b44d>] __do_sys_finit_module+0x113/0x1b0 [<00000000a0da6f99>] do_syscall_64+0x35/0x80 [<000000007790b19b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 When remove the module, should always free debug_help_string. Should always free the allocated buffer when change the free_debug_help_string. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18orangefs: Fix sysfs not cleanup when dev init failedZhang Xiaoxu
[ Upstream commit ea60a4ad0cf88b411cde6888b8c890935686ecd7 ] When the dev init failed, should cleanup the sysfs, otherwise, the module will never be loaded since can not create duplicate sysfs directory: sysfs: cannot create duplicate filename '/fs/orangefs' CPU: 1 PID: 6549 Comm: insmod Tainted: G W 6.0.0+ #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 sysfs_warn_dup.cold+0x17/0x24 sysfs_create_dir_ns+0x16d/0x180 kobject_add_internal+0x156/0x3a0 kobject_init_and_add+0xcf/0x120 orangefs_sysfs_init+0x7e/0x3a0 [orangefs] orangefs_init+0xfe/0x1000 [orangefs] do_one_initcall+0x87/0x2a0 do_init_module+0xdf/0x320 load_module+0x2f98/0x3330 __do_sys_finit_module+0x113/0x1b0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 kobject_add_internal failed for orangefs with -EEXIST, don't try to register things with the same name in the same directory. Fixes: 2f83ace37181 ("orangefs: put register_chrdev immediately before register_filesystem") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-20orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc()Christophe JAILLET
commit 40a74870b2d1d3d44e13b3b73c6571dd34f5614d upstream. 'buffer_index_array' really looks like a bitmap. So it should be allocated as such. When kzalloc is called, a number of bytes is expected, but a number of longs is passed instead. In get(), if not enough memory is allocated, un-allocated memory may be read or written. So use bitmap_zalloc() to safely allocate the correct memory size and avoid un-expected behavior. While at it, change the corresponding kfree() into bitmap_free() to keep the semantic. Fixes: ea2c9c9f6574 ("orangefs: bufmap rewrite") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17fs: orangefs: fix error return code of orangefs_revalidate_lookup()Jia-Ju Bai
[ Upstream commit 4c2b46c824a78fc8190d8eafaaea5a9078fe7479 ] When op_alloc() returns NULL to new_op, no error return code of orangefs_revalidate_lookup() is assigned. To fix this bug, ret is assigned with -ENOMEM in this case. Fixes: 8bb8aefd5afb ("OrangeFS: Change almost all instances of the string PVFS2 to OrangeFS.") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20orangefs: fix orangefs df output.Mike Marshall
[ Upstream commit 0fdec1b3c9fbb5e856a40db5993c9eaf91c74a83 ] Orangefs df output is whacky. Walt Ligon suggested this might fix it. It seems way more in line with reality now... Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21orangefs: get rid of knob code...Mike Marshall
commit ec95f1dedc9c64ac5a8b0bdb7c276936c70fdedd upstream. Christoph Hellwig sent in a reversion of "orangefs: remember count when reading." because: ->read_iter calls can race with each other and one or more ->flush calls. Remove the the scheme to store the read count in the file private data as is is completely racy and can cause use after free or double free conditions Christoph's reversion caused Orangefs not to work or to compile. I added a patch that fixed that, but intel's kbuild test robot pointed out that sending Christoph's patch followed by my patch upstream, it would break bisection because of the failure to compile. So I have combined the reversion plus my patch... here's the commit message that was in my patch: Logically, optimal Orangefs "pages" are 4 megabytes. Reading large Orangefs files 4096 bytes at a time is like trying to kick a dead whale down the beach. Before Christoph's "Revert orangefs: remember count when reading." I tried to give users a knob whereby they could, for example, use "count" in read(2) or bs with dd(1) to get whatever they considered an appropriate amount of bytes at a time from Orangefs and fill as many page cache pages as they could at once. Without the racy code that Christoph reverted Orangefs won't even compile, much less work. So this replaces the logic that used the private file data that Christoph reverted with a static number of bytes to read from Orangefs. I ran tests like the following to determine what a reasonable static number of bytes might be: dd if=/pvfsmnt/asdf of=/dev/null count=128 bs=4194304 dd if=/pvfsmnt/asdf of=/dev/null count=256 bs=2097152 dd if=/pvfsmnt/asdf of=/dev/null count=512 bs=1048576 . . . dd if=/pvfsmnt/asdf of=/dev/null count=4194304 bs=128 Reads seem faster using the static number, so my "knob code" wasn't just racy, it wasn't even a good idea... Signed-off-by: Mike Marshall <hubcap@omnibond.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24help_next should increase position indexVasily Averin
[ Upstream commit 9f198a2ac543eaaf47be275531ad5cbd50db3edf ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-19Merge tag 'for-linus-5.4-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "A fix and a cleanup. The fix: way back in the stone age (2003) mode was set to the magic number "755" in what is now fs/orangefs/namei.c(orangefs_symlink). Łukasz Wrochna reported it and Artur Świgoń sent in a patch to change it to octal. Maybe it shouldn't be a magic number at all but rather something like "S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH"... cleanup: Colin Ian King found a redundant assignment and sent in a patch to remove it" [ And no, octal numbers for permissions are a lot more legible than a binary 'or' of some line noise macros. So 0755 is preferred over trying to spell it out using "helpful" macros - Linus ] * tag 'for-linus-5.4-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: remove redundant assignment to err orangefs: Add octal zero prefix
2019-09-12orangefs: remove redundant assignment to errColin Ian King
Variable err is initialized to a value that is never read and it is re-assigned later. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-09-12orangefs: Add octal zero prefixArtur Świgoń
This patch adds a missing zero to mode 755 specification required to express it in octal numeral system. Reported-by: Łukasz Wrochna <l.wrochna@samsung.com> Signed-off-by: Artur Świgoń <a.swigon@partner.samsung.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-07-31docs: fs: convert porting to ReSTMauro Carvalho Chehab
This file has its own proper style, except that, after a while, the coding style gets violated and whitespaces are placed on different ways. As Sphinx and ReST are very sentitive to whitespace differences, I had to opt if each entry after required/mandatory/... fields should start with zero spaces or with a tab. I opted to start them all from the zero position, in order to avoid needing to break lines with more than 80 columns, with would make harder for review. Most of the other changes at porting.rst were made to use an unified notation with works nice as a text file while also produce a good html output after being parsed. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31docs: fs: convert docs without extension to ReSTMauro Carvalho Chehab
There are 3 remaining files without an extension inside the fs docs dir. Manually convert them to ReST. In the case of the nfs/exporting.rst file, as the nfs docs aren't ported yet, I opted to convert and add a :orphan: there, with should be removed when it gets added into a nfs-specific part of the fs documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-16Merge tag 'for-linus-5.3-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Two small fixes. This is just a fix for an unused value that Colin King sent me and a related fix I added" * tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: eliminate needless variable assignments orangefs: remove redundant assignment to variable buffer_index
2019-07-12Merge tag 'vfs-fix-ioctl-checking-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong: "Here's a patch series that sets up common parameter checking functions for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations. The goal here is to reduce the amount of behaviorial variance between the filesystems where those ioctls originated (ext2 and XFS, respectively) and everybody else. - Standardize parameter checking for the SETFLAGS and FSSETXATTR ioctls (which were the file attribute setters for ext4 and xfs and have now been hoisted to the vfs) - Only allow the DAX flag to be set on files and directories" * tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: vfs: only allow FSSETXATTR to set DAX flag on files and dirs vfs: teach vfs_ioc_fssetxattr_check to check extent size hints vfs: teach vfs_ioc_fssetxattr_check to check project id info vfs: create a generic checking function for FS_IOC_FSSETXATTR vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
2019-07-11orangefs: eliminate needless variable assignmentsMike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-07-11orangefs: remove redundant assignment to variable buffer_indexColin Ian King
The variable buffer_index is being initialized however this is never read and later it is being reassigned to a new value. The initialization is redundant and hence can be removed. Addresses-Coverity: ("Unused Value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-07-04orangefs: fix build warning from debugfs cleanup patchGreg Kroah-Hartman
Stephen writes: After merging the driver-core tree, today's linux-next build (x86_64 allmodconfig) produced this warning: fs/orangefs/orangefs-debugfs.c: In function 'orangefs_debugfs_init': fs/orangefs/orangefs-debugfs.c:193:1: warning: label 'out' defined but not used [-Wunused-label] out: ^~~ fs/orangefs/orangefs-debugfs.c: In function 'orangefs_kernel_debug_init': fs/orangefs/orangefs-debugfs.c:204:17: warning: unused variable 'ret' [-Wunused-variable] struct dentry *ret; ^~~ Fix this up and change the return type of the function to void as it can not fail, which cleans up some more code and variables as well. Cc: Mike Marshall <hubcap@omnibond.com> Cc: Martin Brandenburg <martin@omnibond.com> Cc: devel@lists.orangefs.org Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: f095adba36bb ("orangefs: no need to check return value of debugfs_create functions") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-03orangefs: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Mike Marshall <hubcap@omnibond.com> Cc: Martin Brandenburg <martin@omnibond.com> Cc: devel@lists.orangefs.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20190612152204.GA17511@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-01vfs: create a generic checking and prep function for FS_IOC_SETFLAGSDarrick J. Wong
Create a generic function to check incoming FS_IOC_SETFLAGS flag values and later prepare the inode for updates so that we can standardize the implementations that follow ext4's flag values. Note that the efivarfs implementation no longer fails a no-op SETFLAGS without CAP_LINUX_IMMUTABLE since that's the behavior in ext*. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-14mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-09Merge tag 'for-linus-5.2-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "This includes one fix and our "Orangefs through the pagecache" patch series which greatly improves our small IO performance and helps us pass more xfstests than before. Fix: - orangefs: truncate before updating size Pagecache series: - all the rest" * tag 'for-linus-5.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (23 commits) orangefs: truncate before updating size orangefs: copy Orangefs-sized blocks into the pagecache if possible. orangefs: pass slot index back to readpage. orangefs: remember count when reading. orangefs: add orangefs_revalidate_mapping orangefs: implement writepages orangefs: write range tracking orangefs: avoid fsync service operation on flush orangefs: skip inode writeout if nothing to write orangefs: move do_readv_writev to direct_IO orangefs: do not return successful read when the client-core disappeared orangefs: implement writepage orangefs: migrate to generic_file_read_iter orangefs: service ops done for writeback are not killable orangefs: remove orangefs_readpages orangefs: reorganize setattr functions to track attribute changes orangefs: let setattr write to cached inode orangefs: set up and use backing_dev_info orangefs: hold i_lock during inode_getattr orangefs: update attributes rather than relying on server ...
2019-05-03orangefs: truncate before updating sizeMartin Brandenburg
Otherwise we race with orangefs_writepage/orangefs_writepages which and does not expect i_size < page_offset. Fixes xfstests generic/129. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: copy Orangefs-sized blocks into the pagecache if possible.Mike Marshall
->readpage looks in file->private_data to try and find out how the userspace program set "count" in read(2) or with "dd bs=" or whatever. ->readpage uses "count" and inode->i_size to calculate how much data Orangefs should deposit in the Orangefs shared buffer, and remembers which slot the data is in. After copying data from the Orangefs shared buffer slot into "the page", readpage tries to increment through the pagecache index and fill as many pages as it can from the extra data in the shared buffer. Hopefully these extra pages will soon be needed by the vfs, and they'll be in the pagecache already. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2019-05-03orangefs: pass slot index back to readpage.Mike Marshall
When userspace deposits more than a page of data into the shared buffer, we'll need to know which slot it is in when we get back to readpage so that we can try to use the extra data to fill some extra pages. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2019-05-03orangefs: remember count when reading.Mike Marshall
Orangefs wins when it can do IO on large (up to four meg) blocks at a time, and looses when it has to do tiny "small io" reads and writes. Accessing Orangefs through the pagecache with the kernel module helps with small io, both reading and writing, a great deal. Readpage generally tries to fetch a page (four k) at a time. We'll let users use "count" (as in read(2) or pread(2) for example) as a knob to control how much data they get from Orangefs at a time and we'll try to use the data to fill extra pagecache pages when we get to ->readpage, hopefully resulting in fewer calls to readpage and Orangefs userspace. We need a way to remember how they set count so that we can still have it available when we get to ->readpage. - We'll use file->private_data to keep track of "count". We'll wrap generic_file_open with orangefs_file_open and initialize private_data to NULL there. - In ->read_iter we have access to both "count" and file, so we'll kmalloc some space onto file->private_data and store "count" there. - We'll kfree file->private_data each time we visit ->flush and reinitialize it to NULL. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2019-05-03orangefs: add orangefs_revalidate_mappingMartin Brandenburg
This is modeled after NFS, except our method is different. We use a simple timer to determine whether to invalidate the page cache. This is bound to perform. This addes a sysfs parameter cache_timeout_msecs which controls the time between page cache invalidations. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: implement writepagesMartin Brandenburg
Go through pages and look for a consecutive writable region. After finding a number of consecutive writable pages or when finding that the next page's dirty range is not contiguous and cannot be written as one request, send the write to the server. The number of pages is determined by the client-core's buffer size. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: write range trackingMartin Brandenburg
Attach the actual range of bytes written to plus the responsible uid/gid to each dirty page. This information must be sent to the server when the page is written out. Now write_begin, page_mkwrite, and invalidatepage keep up with this information. There are several conditions where they must write out the page immediately to store the new range. Two non-contiguous ranges cannot be stored on a single page. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: avoid fsync service operation on flushMartin Brandenburg
Without this, an fsync call is sent to the server even if no data changed. This resulted in a rather severe (50%) performance regression under certain metadata-heavy workloads. In the past, everything was direct IO. Nothing happend on a close call. An explicit fsync call would send an fsync request to the server which in turn fsynced the underlying file. Now there are cached writes. Then fsync began writing out dirty pages in addition to making an fsync request to the server, and close began calling fsync. With this commit, close only writes out dirty pages, and does not make the fsync request. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: skip inode writeout if nothing to writeMartin Brandenburg
Would happen if an inode is dirty but whatever happened is not something that can be written out to OrangeFS. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: move do_readv_writev to direct_IOMartin Brandenburg
direct_IO was the only caller and all direct_IO did was call it, so there's no use in having the code spread out into so many functions. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: do not return successful read when the client-core disappearedMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: implement writepageMartin Brandenburg
Now orangefs_inode_getattr fills from cache if an inode has dirty pages. also if attr_valid and dirty pages and !flags, we spin on inode writeback before returning if pages still dirty after: should it be other way Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: migrate to generic_file_read_iterMartin Brandenburg
Remove orangefs_inode_read. It was used by readpage. Calling wait_for_direct_io directly serves the purpose just as well. There is now no check of the bufmap size in the readpage path. There are already other places the bufmap size is assumed to be greater than PAGE_SIZE. Important to call truncate_inode_pages now in the write path so a subsequent read sees the new data. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: service ops done for writeback are not killableMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: remove orangefs_readpagesMartin Brandenburg
It's a copy of the loop which would run in read_pages from mm/readahead.c. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: reorganize setattr functions to track attribute changesMartin Brandenburg
OrangeFS accepts a mask indicating which attributes were changed. The kernel must not set any bits except those that were actually changed. The kernel must set the uid/gid of the request to the actual uid/gid responsible for the change. Code path for notify_change initiated setattrs is orangefs_setattr(dentry, iattr) -> __orangefs_setattr(inode, iattr) In kernel changes are initiated by calling __orangefs_setattr. Code path for writeback is orangefs_write_inode -> orangefs_inode_setattr attr_valid and attr_uid and attr_gid change together under i_lock. I_DIRTY changes separately. __orangefs_setattr lock if needs to be cleaned first, unlock and retry set attr_valid copy data in unlock mark_inode_dirty orangefs_inode_setattr lock copy attributes out unlock clear getattr_time # __writeback_single_inode clears dirty orangefs_inode_getattr # possible to get here with attr_valid set and not dirty lock if getattr_time ok or attr_valid set, unlock and return unlock do server operation # another thread may getattr or setattr, so check for that lock if getattr_time ok or attr_valid, unlock and return else, copy in update getattr_time unlock Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: let setattr write to cached inodeMartin Brandenburg
This is a fairly big change, but ultimately it's not a lot of code. Implement write_inode and then avoid the call to orangefs_inode_setattr within orangefs_setattr. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: set up and use backing_dev_infoMartin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: hold i_lock during inode_getattrMartin Brandenburg
This should be a no-op now. When inode writeback works, this will prevent a getattr from overwriting inode data while an inode is transitioning to dirty. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: update attributes rather than relying on serverMartin Brandenburg
This should be a no-op now, but once inode writeback works, it'll be necessary to have the correct attribute in the dirty inode. Previously the attribute fetch timeout was marked invalid and the server provided the updated attribute. When the inode is dirty, the server cannot be consulted since it does not yet know the pending setattr. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: simplify orangefs_inode_getattr interfaceMartin Brandenburg
No need to store the received mask. It is either STATX_BASIC_STATS or STATX_BASIC_STATS & ~STATX_SIZE. If STATX_SIZE is requested, the cache is bypassed anyway, so the cached mask is unnecessary to decide whether to do a real getattr. This is a change. Previously a getattr would want size and use the cached size. All of the in-kernel callers that wanted size did not want a cached size. Now a getattr cannot use the cached size if it wants size at all. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: do not invalidate attributes on inode createMartin Brandenburg
When an inode is created, we fetch attributes from the server. There is no need to turn around and invalidate them. No need to initialize attributes after the getattr either. Either it'll be exactly the same, or it'll be something else and wrong. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03orangefs: implement xattr cacheMartin Brandenburg
This uses the same timeout as the getattr cache. This substantially increases performance when writing files with smaller buffer sizes. When writing, the size is (often) changed, which causes a call to notify_change which calls security_inode_need_killpriv which needs a getxattr. Caching it reduces traffic to the server. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-01orangefs: make use of ->free_inode()Al Viro
Acked-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-03-12Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted fixes (really no common topic here)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Make __vfs_write() static vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1 pipe: stop using ->can_merge splice: don't merge into linked buffers fs: move generic stat response attr handling to vfs_getattr_nosec orangefs: don't reinitialize result_mask in ->getattr fs/devpts: always delete dcache dentry-s in dput()
2019-02-20orangefs: remove two un-needed BUG_ONs...Mike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>