path: root/fs/fuse/file.c
Age | Commit message | Author
2019-05-25 | fuse: Add FOPEN_STREAM to use stream_open() | Kirill Smelkov
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream. Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. 
This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
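The flag handling described above can be sketched as a small user-space model. This is only an illustration, not the kernel code: `finish_open_mode`, `enum open_mode` and the `kernel_knows_stream` parameter are hypothetical names, and the flag bit values are believed to match include/uapi/linux/fuse.h but should be checked against the header in use.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits believed to match include/uapi/linux/fuse.h. */
#define FOPEN_NONSEEKABLE (1u << 2)
#define FOPEN_STREAM      (1u << 4)

/* Hypothetical names for the three ways an open can be finished. */
enum open_mode { OPEN_SEEKABLE, OPEN_NONSEEKABLE, OPEN_STREAM };

/*
 * Model of the decision: FOPEN_STREAM takes priority over
 * FOPEN_NONSEEKABLE, and a kernel that does not know FOPEN_STREAM
 * never tests that bit, so the flag is silently ignored there.
 */
static enum open_mode finish_open_mode(uint32_t open_flags,
				       int kernel_knows_stream)
{
	if (kernel_knows_stream && (open_flags & FOPEN_STREAM))
		return OPEN_STREAM;
	if (open_flags & FOPEN_NONSEEKABLE)
		return OPEN_NONSEEKABLE;
	return OPEN_SEEKABLE;
}

/* Usage sketch: a server replying with both flags, as the text suggests. */
static void check_finish_open_mode(void)
{
	uint32_t reply = FOPEN_STREAM | FOPEN_NONSEEKABLE;

	assert(finish_open_mode(reply, 1) == OPEN_STREAM);
	assert(finish_open_mode(reply, 0) == OPEN_NONSEEKABLE);
	assert(finish_open_mode(FOPEN_NONSEEKABLE, 1) == OPEN_NONSEEKABLE);
}
```

Returning both flags keeps one server binary usable on old and new kernels, which is the compatibility argument the commit message makes.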
2019-05-25 | fuse: honor RLIMIT_FSIZE in fuse_file_fallocate | Liu Bo
commit 0cbade024ba501313da3b7e5dd2a188a6bc491b5 upstream. fstests generic/228 reported this failure that fuse fallocate does not honor what 'ulimit -f' has set. This adds the necessary inode_newsize_ok() check. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation") Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
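A rough user-space model of the kind of check this adds is shown here. It assumes the limit only matters when the call would grow the file and FALLOC_FL_KEEP_SIZE is not set; `fallocate_size_check` and `NO_LIMIT` are hypothetical names. The kernel's inode_newsize_ok() is understood to do more (raise SIGXFSZ, check the filesystem's maximum file size), which this sketch omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FALLOC_FL_KEEP_SIZE 0x01       /* value from linux/falloc.h */
#define NO_LIMIT            UINT64_MAX /* hypothetical stand-in for RLIM_INFINITY */

/*
 * Returns 0 if the allocation may proceed, -EFBIG if it would grow the
 * file past the file size limit. Overflow of offset + length is not
 * handled in this sketch.
 */
static int fallocate_size_check(uint64_t i_size, uint64_t offset,
				uint64_t length, int mode,
				uint64_t fsize_limit)
{
	uint64_t new_size = offset + length;

	if (mode & FALLOC_FL_KEEP_SIZE)
		return 0;		/* file size will not change */
	if (new_size <= i_size)
		return 0;		/* allocation inside the current size */
	if (fsize_limit != NO_LIMIT && new_size > fsize_limit)
		return -EFBIG;
	return 0;
}

/* Usage sketch: a 1024-byte limit, similar in spirit to 'ulimit -f'. */
static void check_fallocate_size(void)
{
	assert(fallocate_size_check(0, 0, 2048, 0, 1024) == -EFBIG);
	assert(fallocate_size_check(0, 0, 512, 0, 1024) == 0);
	assert(fallocate_size_check(0, 0, 2048, FALLOC_FL_KEEP_SIZE, 1024) == 0);
}
```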
2019-05-25 | fuse: fix writepages on 32bit | Miklos Szeredi
commit 9de5be06d0a89ca97b5ab902694d42dfd2bb77d2 upstream. Writepage requests were cropped to i_size & 0xffffffff, which meant that mmaped writes to any file larger than 4G might be silently discarded. Fix by storing the file size in a properly sized variable (loff_t instead of size_t). Reported-by: Antonio SJ Musumeci <trapexit@spawn.link> Fixes: 6eaf4782eb09 ("fuse: writepages: crop secondary requests") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
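The effect of keeping the file size in a 32-bit variable can be reproduced in user space. The sketch below is a simplified model, not the kernel code: `crop_to_32bit` and `cropped_write_len` are hypothetical helpers that mimic cropping a write request at the (possibly truncated) file size.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit size_t keeps of a 64-bit file size: i_size & 0xffffffff. */
static uint64_t crop_to_32bit(uint64_t i_size)
{
	return (uint32_t)i_size;
}

/*
 * Number of bytes of a request [offset, offset + data_size) that survive
 * cropping at 'size'. A result of 0 means the write is dropped entirely.
 */
static uint64_t cropped_write_len(uint64_t offset, uint64_t data_size,
				  uint64_t size)
{
	if (offset + data_size <= size)
		return data_size;
	if (offset < size)
		return size - offset;
	return 0;
}

/* Usage sketch: a page written at 2 GiB into a 5 GiB file. */
static void check_cropping(void)
{
	uint64_t five_gib = 5ULL << 30;
	uint64_t two_gib = 2ULL << 30;

	/* The truncated size is only 1 GiB, so the write is discarded. */
	assert(crop_to_32bit(five_gib) == (1ULL << 30));
	assert(cropped_write_len(two_gib, 4096, crop_to_32bit(five_gib)) == 0);

	/* With a 64-bit size the whole page is written. */
	assert(cropped_write_len(two_gib, 4096, five_gib) == 4096);
}
```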
2019-01-16 | fuse: decrement NR_WRITEBACK_TEMP on the right page | Miklos Szeredi
NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not the page cache page. Fixes: 8b284dc47291 ("fuse: writepages: handle same page rewrites") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-11 | fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS | Chad Austin
When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection. Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent to userspace. Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR inside of fuse_file_put. Fixes: 7678ac50615d ("fuse: support clients that don't implement 'open'") Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
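The rule this fix restores can be written as a single predicate. The sketch is a model with a hypothetical function name (`must_send_release`), not the body of fuse_file_put(); it only captures the condition described in the message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * no_open: the server answered FUSE_OPEN with ENOSYS at some point.
 * isdir:   the handle being released is a directory handle.
 *
 * Only the release of a regular file may be skipped when no_open is set;
 * directory handles were opened with a separate request, so their release
 * must still reach userspace.
 */
static bool must_send_release(bool no_open, bool isdir)
{
	return isdir || !no_open;
}

/* Usage sketch covering the case that was broken before the fix. */
static void check_must_send_release(void)
{
	assert(must_send_release(true, true));   /* RELEASEDIR is still sent */
	assert(!must_send_release(true, false)); /* RELEASE may be skipped */
	assert(must_send_release(false, false));
}
```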
2018-12-03 | fuse: fix fsync on directory | Miklos Szeredi
Commit ab2257e9941b ("fuse: reduce size of struct fuse_inode") moved parts of fields related to writeback on regular file and to directory caching into a union. However fuse_fsync_common() called from fuse_dir_fsync() touches some writeback related fields, resulting in a crash. Move writeback related parts from fuse_fsync_common() to fuse_fsync(). Reported-by: Brett Girton <btgirton@gmail.com> Tested-by: Brett Girton <btgirton@gmail.com> Fixes: ab2257e9941b ("fuse: reduce size of struct fuse_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-11-09 | fuse: fix use-after-free in fuse_direct_IO() | Lukas Czerner
In async IO blocking case the additional reference to the io is taken for it to survive fuse_aio_complete(). In non blocking case this additional reference is not needed, however we still reference io to figure out whether to wait for completion or not. This is wrong and will lead to use-after-free. Fix it by storing blocking information in separate variable. This was spotted by KASAN when running generic/208 fstest. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 744742d692e3 ("fuse: Add reference counting for fuse_io_priv") Cc: <stable@vger.kernel.org> # v4.6
2018-11-01 | Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included"

* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  missing bits of "iov_iter: Separate type from direction and use accessor functions"
  afs: Probe multiple fileservers simultaneously
  afs: Fix callback handling
  afs: Eliminate the address pointer from the address list cursor
  afs: Allow dumping of server cursor on operation failure
  afs: Implement YFS support in the fs client
  afs: Expand data structure fields to support YFS
  afs: Get the target vnode in afs_rmdir() and get a callback on it
  afs: Calc callback expiry in op reply delivery
  afs: Fix FS.FetchStatus delivery from updating wrong vnode
  afs: Implement the YFS cache manager service
  afs: Remove callback details from afs_callback_break struct
  afs: Commit the status on a new file/dir/symlink
  afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
  afs: Don't invoke the server to read data beyond EOF
  afs: Add a couple of tracepoints to log I/O errors
  afs: Handle EIO from delivery function
  afs: Fix TTL on VL server and address lists
  afs: Implement VL server rotation
  afs: Improve FS server rotation error handling
  ...
2018-10-24 | iov_iter: Use accessor function | David Howells
Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-15 | fuse: only invalidate atime in direct read | Miklos Szeredi
After sending a synchronous READ request from __fuse_direct_read() we only need to invalidate atime; none of the other attributes should be changed by a read(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01 | fuse: realloc page array | Miklos Szeredi
Writeback caching currently allocates requests with the maximum number of possible pages, while the actual number of pages per request depends on a couple of factors that cannot be determined when the request is allocated (whether page is already under writeback, whether page is contiguous with previous pages already added to a request). This patch allows such requests to start with no page allocation (all pages inline) and grow the page array on demand. If the max_pages tunable remains the default value, then this will mean just one allocation that is the same size as before. If the tunable is larger, then this adds at most 3 additional memory allocations (which is generously compensated by the improved performance from the larger request). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
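The allocation counts quoted above can be checked with a small model of a grow-on-demand page array. The growth policy here (one inline page, doubling, a floor of 32 pages, capped at max_pages) is an assumption chosen because it reproduces the numbers in the message; the names `next_capacity` and `allocations_to_fill` are hypothetical.

```c
#include <assert.h>

#define DEFAULT_MAX_PAGES 32u	/* assumed default of the max_pages tunable */

/* Next array capacity: double, but at least the default, at most max_pages. */
static unsigned int next_capacity(unsigned int cur, unsigned int max_pages)
{
	unsigned int n = cur * 2;

	if (n < DEFAULT_MAX_PAGES)
		n = DEFAULT_MAX_PAGES;
	if (n > max_pages)
		n = max_pages;
	return n;
}

/* Count the allocations needed to grow from one inline page to max_pages. */
static unsigned int allocations_to_fill(unsigned int max_pages)
{
	unsigned int cap = 1;	/* request starts with inline pages only */
	unsigned int count = 0;

	while (cap < max_pages) {
		cap = next_capacity(cap, max_pages);
		count++;
	}
	return count;
}

/* Usage sketch: default tunable versus a 256-page tunable. */
static void check_allocation_counts(void)
{
	assert(allocations_to_fill(32) == 1);	/* same single allocation as before */
	assert(allocations_to_fill(256) == 4);	/* 1 + at most 3 additional */
}
```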
2018-10-01 | fuse: add max_pages to init_out | Constantine Shulyupin
Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to improve performance.

Old RFC with detailed description of the problem and many fixes by Mitsuo Hayasaka (mitsuo.hayasaka.hu@hitachi.com):
 - https://lkml.org/lkml/2012/7/5/136

We've encountered performance degradation and fixed it on a big and complex virtual environment.

Environment to reproduce degradation and improvement:

1. Add lag to user mode FUSE
   Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in passthrough_fh.c

2. patch UM fuse with configurable max_pages parameter. The patch will be provided later.

3. run test script and perform test on tmpfs

    fuse_test()
    {
        cd /tmp
        mkdir -p fusemnt
        passthrough_fh -o max_pages=$1 /tmp/fusemnt
        grep fuse /proc/self/mounts
        dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \
            count=1K bs=1M 2>&1 | grep -v records
        rm fusemnt/tmp/tmp
        killall passthrough_fh
    }

Test results:

    passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
        rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0
    1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s

    passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
        rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0
    1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s

Obviously with bigger lag the difference between 'before' and 'after' will be more significant.

Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136), observed improvement from 400-550 to 520-740.

Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01 | fuse: reduce size of struct fuse_inode | Miklos Szeredi
Do this by grouping fields used for cached writes and putting them into a union with fields used for cached readdir (with obviously no overlap, since we don't have hybrid objects). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01 | fuse: allow using readdir cache | Miklos Szeredi
The cache is only used if it's completed, not while it's still being filled; this constraint could be lifted later, if it turns out to be useful. Introduce state in struct fuse_file that indicates the position within the cache. After a seek, reset the position to the beginning of the cache and search the cache for the current position. If the current position is not found in the cache, then fall back to uncached readdir. It can also happen that page(s) disappear from the cache, in which case we must also fall back to uncached readdir. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28 | fuse: do not take fc->lock in fuse_request_send_background() | Kirill Tkhai
Currently, we take fc->lock there only to check for fc->connected. But this flag is changed only on connection abort, which is very rare operation. So allow checking fc->connected under just fc->bg_lock and use this lock (as well as fc->lock) when resetting fc->connected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28 | fuse: introduce fc->bg_lock | Kirill Tkhai
To reduce contention of fc->lock, this patch introduces bg_lock for protection of fields related to background queue. These are: max_background, congestion_threshold, num_background, active_background, bg_queue and blocked. This allows next patch to make async reads not requiring fc->lock, so async reads and writes will have better performance executed in parallel. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28 | fuse: add support for copy_file_range() | Niels de Vos
There are several FUSE filesystems that can implement server-side copy or other efficient copy/duplication/clone methods. The copy_file_range() syscall is the standard interface that users have access to while not depending on external libraries that bypass FUSE. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-08-21 | Merge tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds
Pull fuse update from Miklos Szeredi: "Various bug fixes and cleanups"

* tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: reduce allocation size for splice_write
  fuse: use kvmalloc to allocate array of pipe_buffer structs.
  fuse: convert last timespec use to timespec64
  fs: fuse: Adding new return type vm_fault_t
  fuse: simplify fuse_abort_conn()
  fuse: Add missed unlock_page() to fuse_readpages_fill()
  fuse: Don't access pipe->buffers without pipe_lock()
  fuse: fix initial parallel dirops
  fuse: Fix oops at process_init_reply()
  fuse: umount should wait for all requests
  fuse: fix unlocked access to processing queue
  fuse: fix double request_end()
2018-07-26 | fs: fuse: Adding new return type vm_fault_t | Souptick Joarder
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26 | fuse: Add missed unlock_page() to fuse_readpages_fill() | Kirill Tkhai
The above error path returns with page unlocked, so this place seems also to behave the same. Fixes: f8dbdf81821b ("fuse: rework fuse_readpages()") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-21 | pids: Compute task_tgid using signal->leader_pid | Eric W. Biederman
The cost is the same and this removes the need to worry about complications that come from de_thread and group_leader changing. __task_pid_nr_ns has been updated to take advantage of this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-02-11 | vfs: do bulk POLL* -> EPOLL* replacement | Linus Torvalds
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But the keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 | make kernel-side POLL... arch-independent | Al Viro
mangle/demangle on the way to/from userland Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 | fs: annotate ->poll() instances | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-13 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds
Pull fuse updates from Miklos Szeredi: "This fixes a regression (spotted by the Sandstorm.io folks) in the pid namespace handling introduced in 4.12. There's also a fix for honoring sync/dsync flags for pwritev2()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: getattr cleanup
  fuse: honor iocb sync flags on write
  fuse: allow server to run in different pid_ns
2017-09-12 | fuse: getattr cleanup | Miklos Szeredi
The refreshed argument isn't used by any caller, get rid of it. Use a helper for just updating the inode (no need to fill in a kstat). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12 | fuse: honor iocb sync flags on write | Miklos Szeredi
If the IOCB_DSYNC flag is set a sync is not being performed by fuse_file_write_iter. Honor IOCB_DSYNC/IOCB_SYNC by setting O_DSYNC/O_SYNC respectively in the flags field of the write request. We don't need to sync data or metadata, since fuse_perform_write() does write-through and the filesystem is responsible for updating file times. Original patch by Vitaly Zolotusky. Reported-by: Nate Clark <nate@neworld.us> Cc: Vitaly Zolotusky <vitaly@unitc.com>. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
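The mapping from per-call sync flags to the flags sent with the write request can be sketched as below. All constants are hypothetical stand-ins (prefixed MODEL_) rather than the real IOCB_* and O_* values, and `write_request_flags` is an illustrative name, not a kernel function.

```c
#include <assert.h>

/* Hypothetical stand-ins for the per-call (iocb) sync bits. */
#define MODEL_IOCB_DSYNC (1u << 0)
#define MODEL_IOCB_SYNC  (1u << 1)

/* Hypothetical stand-ins for the open-file flag bits sent to the server. */
#define MODEL_O_DSYNC    (1u << 8)
#define MODEL_O_SYNC     (1u << 9)

/*
 * Start from the flags of the open file and add the sync bits requested
 * for this particular write, so the server sees them in the request.
 */
static unsigned int write_request_flags(unsigned int file_flags,
					unsigned int iocb_flags)
{
	unsigned int flags = file_flags;

	if (iocb_flags & MODEL_IOCB_DSYNC)
		flags |= MODEL_O_DSYNC;
	if (iocb_flags & MODEL_IOCB_SYNC)
		flags |= MODEL_O_SYNC;
	return flags;
}

/* Usage sketch: a plain write versus a write with a per-call dsync flag. */
static void check_write_request_flags(void)
{
	assert(write_request_flags(0, 0) == 0);
	assert(write_request_flags(0, MODEL_IOCB_DSYNC) == MODEL_O_DSYNC);
	assert(write_request_flags(MODEL_O_DSYNC, MODEL_IOCB_SYNC) ==
	       (MODEL_O_DSYNC | MODEL_O_SYNC));
}
```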
2017-09-12 | fuse: allow server to run in different pid_ns | Miklos Szeredi
Commit 0b6e9ea041e6 ("fuse: Add support for pid namespaces") broke Sandstorm.io development tools, which have been sending FUSE file descriptors across PID namespace boundaries since early 2014.

The above patch added a check that prevented I/O on the fuse device file descriptor if the pid namespace of the reader/writer was different from the pid namespace of the mounter. With this change passing the device file descriptor to a different pid namespace simply doesn't work. The check was added because pids are transferred to/from the fuse userspace server in the namespace registered at mount time.

To fix this regression, remove the checks and do the following:

1) the pid in the request header (the pid of the task that initiated the filesystem operation) is translated to the reader's pid namespace. If a mapping doesn't exist for this pid, then a zero pid is used. Note: even if a mapping would exist between the initiator task's pid namespace and the reader's pid namespace the pid will be zero if either mapping from initiator's to mounter's namespace or mapping from mounter's to reader's namespace doesn't exist.

2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone. Userspace should not interpret this value anyway. Also allow the setlk/setlkw operations if the pid of the task cannot be represented in the mounter's namespace (pid being zero in that case).

Reported-by: Kenton Varda <kenton@sandstorm.io>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 0b6e9ea041e6 ("fuse: Add support for pid namespaces")
Cc: <stable@vger.kernel.org> # v4.12+
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
2017-09-06 | Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux | Linus Torvalds
Pull writeback error handling updates from Jeff Layton: "This pile continues the work from last cycle on better tracking writeback errors. In v4.13 we added some basic errseq_t infrastructure and converted a few filesystems to use it. This set continues refining that infrastructure, adds documentation, and converts most of the other filesystems to use it. The main exception at this point is the NFS client"

* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  ecryptfs: convert to file_write_and_wait in ->fsync
  mm: remove optimizations based on i_size in mapping writeback waits
  fs: convert a pile of fsync routines to errseq_t based reporting
  gfs2: convert to errseq_t based writeback error reporting for fsync
  fs: convert sync_file_range to use errseq_t based error-tracking
  mm: add file_fdatawait_range and file_write_and_wait
  fuse: convert to errseq_t based error tracking for fsync
  mm: consolidate dax / non-dax checks for writeback
  Documentation: add some docs for errseq_t
  errseq: rename __errseq_set to errseq_set
2017-09-06 | Merge tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux | Linus Torvalds
Pull file locking updates from Jeff Layton: "This pile just has a few file locking fixes from Ben Coddington. There are a couple of cleanup patches + an attempt to bring sanity to the l_pid value that is reported back to userland on an F_GETLK request. After a few gyrations, he came up with a way for filesystems to communicate to the VFS layer code whether the pid should be translated according to the namespace or presented as-is to userland"

* tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: restore a warn for leaked locks on close
  fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks
  fs/locks: Use allocation rather than the stack in fcntl_getlk()
2017-08-11 | fuse: set mapping error in writepage_locked when it fails | Jeff Layton
This ensures that we see errors on fsync when writeback fails. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-08-03 | fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio | Ashish Samant
Commit 8fba54aebbdf ("fuse: direct-io: don't dirty ITER_BVEC pages") fixes the ITER_BVEC page deadlock for direct io in fuse by checking in fuse_direct_io(), whether the page is a bvec page or not, before locking it. However, this check is missed when the "async_dio" mount option is enabled. In this case, set_page_dirty_lock() is called from the req->end callback in request_end(), when the fuse thread is returning from userspace to respond to the read request. This will cause the same deadlock because the bvec condition is not checked in this path.

Here is the stack of the deadlocked thread, while returning from userspace:

[13706.656686] INFO: task glusterfs:3006 blocked for more than 120 seconds.
[13706.657808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13706.658788] glusterfs D ffffffff816c80f0 0 3006 1 0x00000080
[13706.658797] ffff8800d6713a58 0000000000000086 ffff8800d9ad7000 ffff8800d9ad5400
[13706.658799] ffff88011ffd5cc0 ffff8800d6710008 ffff88011fd176c0 7fffffffffffffff
[13706.658801] 0000000000000002 ffffffff816c80f0 ffff8800d6713a78 ffffffff816c790e
[13706.658803] Call Trace:
[13706.658809] [<ffffffff816c80f0>] ? bit_wait_io_timeout+0x80/0x80
[13706.658811] [<ffffffff816c790e>] schedule+0x3e/0x90
[13706.658813] [<ffffffff816ca7e5>] schedule_timeout+0x1b5/0x210
[13706.658816] [<ffffffff81073ffb>] ? gup_pud_range+0x1db/0x1f0
[13706.658817] [<ffffffff810668fe>] ? kvm_clock_read+0x1e/0x20
[13706.658819] [<ffffffff81066909>] ? kvm_clock_get_cycles+0x9/0x10
[13706.658822] [<ffffffff810f5792>] ? ktime_get+0x52/0xc0
[13706.658824] [<ffffffff816c6f04>] io_schedule_timeout+0xa4/0x110
[13706.658826] [<ffffffff816c8126>] bit_wait_io+0x36/0x50
[13706.658828] [<ffffffff816c7d06>] __wait_on_bit_lock+0x76/0xb0
[13706.658831] [<ffffffffa0545636>] ? lock_request+0x46/0x70 [fuse]
[13706.658834] [<ffffffff8118800a>] __lock_page+0xaa/0xb0
[13706.658836] [<ffffffff810c8500>] ? wake_atomic_t_function+0x40/0x40
[13706.658838] [<ffffffff81194d08>] set_page_dirty_lock+0x58/0x60
[13706.658841] [<ffffffffa054d968>] fuse_release_user_pages+0x58/0x70 [fuse]
[13706.658844] [<ffffffffa0551430>] ? fuse_aio_complete+0x190/0x190 [fuse]
[13706.658847] [<ffffffffa0551459>] fuse_aio_complete_req+0x29/0x90 [fuse]
[13706.658849] [<ffffffffa05471e9>] request_end+0xd9/0x190 [fuse]
[13706.658852] [<ffffffffa0549126>] fuse_dev_do_write+0x336/0x490 [fuse]
[13706.658854] [<ffffffffa054963e>] fuse_dev_write+0x6e/0xa0 [fuse]
[13706.658857] [<ffffffff812a9ef3>] ? security_file_permission+0x23/0x90
[13706.658859] [<ffffffff81205300>] do_iter_readv_writev+0x60/0x90
[13706.658862] [<ffffffffa05495d0>] ? fuse_dev_splice_write+0x350/0x350 [fuse]
[13706.658863] [<ffffffff812062a1>] do_readv_writev+0x171/0x1f0
[13706.658866] [<ffffffff810b3d00>] ? try_to_wake_up+0x210/0x210
[13706.658868] [<ffffffff81206361>] vfs_writev+0x41/0x50
[13706.658870] [<ffffffff81206496>] SyS_writev+0x56/0xf0
[13706.658872] [<ffffffff810257a1>] ? syscall_trace_leave+0xf1/0x160
[13706.658874] [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71

Fix this by making should_dirty a fuse_io_priv parameter that can be checked in fuse_aio_complete_req().

Reported-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-31 | fuse: convert to errseq_t based error tracking for fsync | Jeff Layton
Change to file_write_and_wait_range and file_check_and_advance_wb_err Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-16 | fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks | Benjamin Coddington
Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be atomic with the stateid update", NFSv4 has been inserting locks in rpciod worker context. The result is that the file_lock's fl_nspid is the kworker's pid instead of the original userspace pid.

The fl_nspid is only used to represent the namespaced virtual pid number when displaying locks or returning from F_GETLK. There's no reason to set it for every inserted lock, since we can usually just look it up from fl_pid. So, instead of looking up and holding struct pid for every lock, let's just look up the virtual pid number from fl_pid when it is needed. That means we can remove fl_nspid entirely.

The translation and presentation of fl_pid should handle the following four cases:

1 - F_GETLK on a remote file with a remote lock: In this case, the filesystem should determine the l_pid to return here. Filesystems should indicate that the fl_pid represents a non-local pid value that should not be translated by returning an fl_pid <= 0.

2 - F_GETLK on a local file with a remote lock: This should be the l_pid of the lock manager process, and translated.

3 - F_GETLK on a remote file with a local lock, and
4 - F_GETLK on a local file with a local lock: These should be the translated l_pid of the local locking process.

Fuse was already doing the correct thing by translating the pid into the caller's namespace. With this change we must update fuse to translate to init's pid namespace, so that the locks API can then translate from init's pid namespace into the pid namespace of the caller.

With this change, the locks API will expect that if a filesystem returns a remote pid as opposed to a local pid for F_GETLK, that remote pid will be <= 0. This signifies that the pid is remote, and the locks API will forego translating that pid into the pid namespace of the local calling process.

Finally, we convert remote filesystems to present remote pids using negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote pid returned for F_GETLK lock requests.

Since local pids will never be larger than PID_MAX_LIMIT (which is currently defined as <= 4 million), but pid_t is a signed int, we should have plenty of room to represent remote pids with negative numbers if we assume that remote pid numbers are similarly limited.

If this is not the case, then we run the risk of having a remote pid returned for which there is also a corresponding local pid. This is a problem we have now, but this patch should reduce the chances of that occurring, while also returning those remote pid numbers, for whatever that may be worth.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-06-09 | fuse: initialize the flock flag in fuse_file on allocation | Mateusz Jurczyk
Before the patch, the flock flag could remain uninitialized for the lifespan of the fuse_file allocation. Unless set to true in fuse_file_flock(), it would remain in an indeterminate state until read in an if statement in fuse_release_common(). This could consequently lead to taking an unexpected branch in the code. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Fixes: 37fb3a30b462 ("fuse: fix flock") Cc: <stable@vger.kernel.org> # v3.1+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
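The class of bug described here, a flag read before anything ever wrote it, is usually avoided by zero-initializing the allocation. The sketch below shows the idea in user space with calloc(); `struct file_handle` and `file_handle_alloc` are hypothetical names, and whether the kernel fix uses a zeroing allocator or an explicit assignment is not stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for a per-open-file structure. */
struct file_handle {
	unsigned long id;
	bool flock;	/* set to true only when a flock lock is taken */
};

/*
 * Allocate a handle with every field zeroed, so 'flock' starts out false
 * even if no code path assigns it before release time.
 */
static struct file_handle *file_handle_alloc(void)
{
	return calloc(1, sizeof(struct file_handle));
}

/* Usage sketch: the flag is well defined immediately after allocation. */
static void check_file_handle_alloc(void)
{
	struct file_handle *fh = file_handle_alloc();

	assert(fh != NULL);
	assert(fh->flock == false);
	free(fh);
}
```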
2017-05-10 | Merge tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - Fix use after free in write error path - Use GFP_NOIO for two allocations in writeback - Fix a hang in OPEN related to server reboot - Check the result of nfs4_pnfs_ds_connect - Fix an rcu lock leak Features: - Removal of the unmaintained and unused OSD pNFS layout - Cleanup and removal of lots of unnecessary dprintk()s - Cleanup and removal of some memory failure paths now that GFP_NOFS is guaranteed to never fail. - Remove the v3-only data server limitation on pNFS/flexfiles Bugfixes: - RPC/RDMA connection handling bugfixes - Copy offload: fixes to ensure the copied data is COMMITed to disk. - Readdir: switch back to using the ->iterate VFS interface - File locking fixes from Ben Coddington - Various use-after-free and deadlock issues in pNFS - Write path bugfixes" * tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (89 commits) pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled NFSv4.1: Work around a Linux server bug... 
NFS append COMMIT after synchronous COPY NFSv4: Fix exclusive create attributes encoding NFSv4: Fix an rcu lock leak nfs: use kmap/kunmap directly NFS: always treat the invocation of nfs_getattr as cache hit when noac is on Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits pNFS: Fix a typo in pnfs_generic_alloc_ds_commits pNFS: Fix a deadlock when coalescing writes and returning the layout pNFS: Don't clear the layout return info if there are segments to return pNFS: Ensure we commit the layout if it has been invalidated pNFS: Don't send COMMITs to the DSes if the server invalidated our layout pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path pNFS: Ensure we check layout validity before marking it for return NFS4.1 handle interrupted slot reuse from ERR_DELAY NFSv4: check return value of xdr_inline_decode nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout() ...
2017-04-21 | locks: Set FL_CLOSE when removing flock locks on close() | Benjamin Coddington
Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks. NFS will check for this flag to ensure an unlock is sent in a following patch. Fuse handles flock and posix locks differently for FL_CLOSE, and so requires a fixup to retain the existing behavior for flock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-18 | fuse: Add support for pid namespaces | Seth Forshee
When the userspace process servicing fuse requests is running in a pid namespace then pids passed via the fuse fd are not being translated into that process' namespace. Translation is necessary for the pid to be useful to that process. Since no use case currently exists for changing namespaces all translations can be done relative to the pid namespace in use when fuse_conn_init() is called. For fuse this translates to mount time, and for cuse this is when /dev/cuse is opened. IO for this connection from another namespace will return errors. Requests from processes whose pid cannot be translated into the target namespace will have a value of 0 for in.h.pid. File locking changes based on previous work done by Eric Biederman. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-04-18fuse: convert fuse_file.count from atomic_t to refcount_tElena Reshetova
The refcount_t type and its corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This helps avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
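As a rough illustration of why a dedicated reference-count type helps, the following is a minimal userspace sketch of a counter that saturates instead of wrapping. It is not the kernel's refcount_t implementation. The demo_refcount type, its helper names, and the choice of UINT_MAX as the saturation value are all assumptions made for this example.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; NOT the kernel's refcount_t. */
struct demo_refcount {
	atomic_uint refs;
};

/* Assumed saturation value for this sketch only. */
#define DEMO_REFCOUNT_SATURATED UINT_MAX

static void demo_refcount_set(struct demo_refcount *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

static unsigned int demo_refcount_read(struct demo_refcount *r)
{
	return atomic_load(&r->refs);
}

/* Take a reference; stick at the saturation value instead of wrapping to 0. */
static void demo_refcount_inc(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == DEMO_REFCOUNT_SATURATED)
			return;	/* saturated: leak the object rather than free it early */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Drop a reference; return true only when the last reference went away. */
static bool demo_refcount_dec_and_test(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == DEMO_REFCOUNT_SATURATED || old == 0)
			return false;	/* saturated or already zero: never report "free now" */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

A plain atomic counter that wraps from its maximum back to zero can make a live object look unreferenced. In this sketch a saturated counter stays saturated, so the object is leaked rather than freed while still in use.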
2017-03-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
"A bugfix and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: release: private_data cannot be NULL
  fuse: cleanup fuse_file refcounting
  fuse: add missing FR_FORCE
2017-02-24mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmfDave Jiang
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22fuse: release: private_data cannot be NULLMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-22fuse: cleanup fuse_file refcountingMiklos Szeredi
struct fuse_file is stored in file->private_data. Make this always be a counting reference for consistency. This also allows fuse_sync_release() to call fuse_file_put() instead of partially duplicating its functionality. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-22fuse: add missing FR_FORCEMiklos Szeredi
fuse_file_put() was missing the "force" flag for the RELEASE request when sending synchronously (fuseblk). If this flag is not set, then a sync request may be interrupted before it is dequeued by the userspace filesystem. In this case the OPEN won't be balanced with a RELEASE. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 5a18ec176c93 ("fuse: fix hang of single threaded fuseblk filesystem") Cc: <stable@vger.kernel.org> # v2.6.38+
2016-11-15fuse: fix fuse_write_end() if zero bytes were copiedMiklos Szeredi
If pos is at the beginning of a page and copied is zero then page is not zeroed but is marked uptodate. Fix by skipping everything except unlock/put of page if zero bytes were copied. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 6b12c1b37e55 ("fuse: Implement write_begin/write_end callbacks") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
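The zero-copy case can be modelled in userspace as follows. This is only a sketch of the logic described above, not the code in fs/fuse/file.c: struct demo_page, demo_write_end() and DEMO_PAGE_SIZE are hypothetical names, and the page lock and reference handling of the real function are reduced to a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical stand-in for a page cache page. */
struct demo_page {
	unsigned char data[DEMO_PAGE_SIZE];
	bool uptodate;
	bool dirty;
};

/*
 * Model of a write_end step: 'copied' bytes were copied into the page
 * at file offset 'pos'. Returns the number of bytes accepted.
 */
static size_t demo_write_end(struct demo_page *page, unsigned long long pos,
			     size_t copied)
{
	/*
	 * Nothing copied: skip zeroing, the uptodate flag and dirtying.
	 * Without this check, pos at the start of a page gives endoff == 0
	 * below, so no bytes are zeroed, yet the page would still be marked
	 * uptodate with stale contents.
	 * (The real code would still unlock and put the page here.)
	 */
	if (copied == 0)
		return 0;

	if (!page->uptodate) {
		/* Zero any unwritten bytes at the end of the page. */
		size_t endoff = (size_t)((pos + copied) % DEMO_PAGE_SIZE);

		if (endoff)
			memset(page->data + endoff, 0, DEMO_PAGE_SIZE - endoff);
		page->uptodate = true;
	}

	page->dirty = true;
	return copied;
}
```

With the early return, a page that received no data keeps its not-uptodate state, so a later read has to fetch or zero its contents properly instead of exposing whatever was in the page.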
2016-10-10Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from me) and I'd prefer to send those separately"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-08Merge remote-tracking branch 'jk/vfs' into work.miscAl Viro
2016-10-01fuse: don't use fuse_ioctl_copy_user() helperMiklos Szeredi
The two invocations share little code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-22fuse: Propagate dentry down to inode_change_ok()Jan Kara
To avoid clearing of capabilities or security related extended attributes too early, inode_change_ok() will need to take dentry instead of inode. Propagate it down to fuse_do_setattr(). Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>