summaryrefslogtreecommitdiffstats
path: root/fs/fuse
AgeCommit message (Collapse)Author
2020-10-01fuse: don't check refcount after stealing pageMiklos Szeredi
[ Upstream commit 32f98877c57bee6bc27f443a96f49678a2cd6a50 ] page_count() is unstable. Unless there has been an RCU grace period between when the page was removed from the page cache and now, a speculative reference may exist from the page cache. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29fuse: fix weird page warningMiklos Szeredi
commit a5005c3cda6eeb6b95645e6cc32f58dafeffc976 upstream. When PageWaiters was added, updating this check was missed. Reported-by: Nikolaus Rath <Nikolaus@rath.org> Reported-by: Hugh Dickins <hughd@google.com> Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22fuse: Fix parameter for FS_IOC_{GET,SET}FLAGSChirantan Ekbote
commit 31070f6ccec09f3bd4f1e28cd1e592fa4f3ba0b6 upstream. The ioctl encoding for this parameter is a long but the documentation says it should be an int and the kernel drivers expect it to be an int. If the fuse driver treats this as a long it might end up scribbling over the stack of a userspace process that only allocated enough space for an int. This was previously discussed in [1] and a patch for fuse was proposed in [2]. From what I can tell the patch in [2] was nacked in favor of adding new, "fixed" ioctls and using those from userspace. However there is still no "fixed" version of these ioctls and the fact is that it's sometimes infeasible to change all userspace to use the new one. Handling the ioctls specially in the fuse driver seems like the most pragmatic way for fuse servers to support them without causing crashes in userspace applications that call them. [1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/ [2]: https://sourceforge.net/p/fuse/mailman/message/31771759/ Signed-off-by: Chirantan Ekbote <chirantan@chromium.org> Fixes: 59efec7b9039 ("fuse: implement ioctl support") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13fuse: verify attributesMiklos Szeredi
commit eb59bd17d2fa6e5e84fba61a5ebdea984222e6d5 upstream. If a filesystem returns negative inode sizes, future reads on the file were causing the cpu to spin on truncate_pagecache. Create a helper to validate the attributes. This now does two things: - check the file mode - check if the file size fits in i_size without overflowing Reported-by: Arijit Banerjee <arijit@rubrik.com> Fixes: d8a5ba45457e ("[PATCH] FUSE - core") Cc: <stable@vger.kernel.org> # v2.6.14 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-13fuse: verify nlinkMiklos Szeredi
commit c634da718db9b2fac201df2ae1b1b095344ce5eb upstream. When adding a new hard link, make sure that i_nlink doesn't overflow. Fixes: ac45d61357e8 ("fuse: fix nlink after unlink") Cc: <stable@vger.kernel.org> # v3.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-20fuse: use READ_ONCE on congestion_threshold and max_backgroundKirill Tkhai
[ Upstream commit 2a23f2b8adbe4bd584f936f7ac17a99750eed9d7 ] Since they are of unsigned int type, it's allowed to read them unlocked during reporting to userspace. Let's underline this fact with READ_ONCE() macroses. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06fuse: truncate pending writes on O_TRUNCMiklos Szeredi
commit e4648309b85a78f8c787457832269a8712a8673e upstream. Make sure cached writes are not reordered around open(..., O_TRUNC), with the obvious wrong results. Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-06fuse: flush dirty data/metadata before non-truncate setattrMiklos Szeredi
commit b24e7598db62386a95a3c8b9c75630c5d56fe077 upstream. If writeback cache is enabled, then writes might get reordered with chmod/chown/utimes. The problem with this is that performing the write in the fuse daemon might itself change some of these attributes. In such case the following sequence of operations will result in file ending up with the wrong mode, for example: int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL); write (fd, "1", 1); fchown (fd, 0, 0); fchmod (fd, 04755); close (fd); This patch fixes this by flushing pending writes before performing chown/chmod/utimes. Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11fuse: fix memleak in cuse_channel_openzhengbin
[ Upstream commit 9ad09b1976c562061636ff1e01bfc3a57aebe56b ] If cuse_send_init fails, need to fuse_conn_put cc->fc. cuse_channel_open->fuse_conn_init->refcount_set(&fc->count, 1) ->fuse_dev_alloc->fuse_conn_get ->fuse_dev_free->fuse_conn_put Fixes: cc080e9e9be1 ("fuse: introduce per-instance fuse_dev structure") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lockEric Biggers
[ Upstream commit 76e43c8ccaa35c30d5df853013561145a0f750a5 ] When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock. This may have to wait for fuse_iqueue::waitq.lock to be released by one of many places that take it with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by protecting the state of struct fuse_iqueue with a separate spinlock, and only accessing fuse_iqueue::waitq using the versions of the waitqueue functions which do IRQ-safe locking internally. Reproducer: #include <fcntl.h> #include <stdio.h> #include <sys/mount.h> #include <sys/stat.h> #include <sys/syscall.h> #include <unistd.h> #include <linux/aio_abi.h> int main() { char opts[128]; int fd = open("/dev/fuse", O_RDWR); aio_context_t ctx = 0; struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd }; struct iocb *cbp = &cb; sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd); mkdir("mnt", 0700); mount("foo", "mnt", "fuse", 0, opts); syscall(__NR_io_setup, 1, &ctx); syscall(__NR_io_submit, ctx, 1, &cbp); } Beginning of lockdep output: ===================================================== WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 5.3.0-rc5 #9 Not tainted ----------------------------------------------------- syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: 000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline] 000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline] 000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825 and this task is already holding: 0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline] 0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline] 0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825 which would create a new lock dependency: (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.} but this new dependency connects a SOFTIRQ-irq-safe lock: (&(&ctx->ctx_lock)->rlock){..-.} [...] Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Cc: <stable@vger.kernel.org> # v4.19+ Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05fuse: fix missing unlock_page in fuse_writepage()Vasily Averin
commit d5880c7a8620290a6c90ced7a0e8bd0ad9419601 upstream. unlock_page() was missing in case of an already in-flight write against the same page. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Fixes: ff17be086477 ("fuse: writepage: skip already in flight") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-15fuse: retrieve: cap requested size to negotiated max_writeKirill Smelkov
[ Upstream commit 7640682e67b33cab8628729afec8ca92b851394f ] FUSE filesystem server and kernel client negotiate during initialization phase, what should be the maximum write size the client will ever issue. Correspondingly the filesystem server then queues sys_read calls to read requests with buffer capacity large enough to carry request header + that max_write bytes. A filesystem server is free to set its max_write in anywhere in the range between [1*page, fc->max_pages*page]. In particular go-fuse[2] sets max_write by default as 64K, wheres default fc->max_pages corresponds to 128K. Libfuse also allows users to configure max_write, but by default presets it to possible maximum. If max_write is < fc->max_pages*page, and in NOTIFY_RETRIEVE handler we allow to retrieve more than max_write bytes, corresponding prepared NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the filesystem server, in full correspondence with server/client contract, will be only queuing sys_read with ~max_write buffer capacity, and fuse_dev_do_read throws away requests that cannot fit into server request buffer. In turn the filesystem server could get stuck waiting indefinitely for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK which is understood by clients as that NOTIFY_REPLY was queued and will be sent back. Cap requested size to negotiate max_write to avoid the problem. This aligns with the way NOTIFY_RETRIEVE handler works, which already unconditionally caps requested retrieve size to fuse_conn->max_pages. This way it should not hurt NOTIFY_RETRIEVE semantic if we return less data than was originally requested. Please see [1] for context where the problem of stuck filesystem was hit for real, how the situation was traced and for more involving patch that did not make it into the tree. [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2 [2] https://github.com/hanwen/go-fuse Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Cc: Han-Wen Nienhuys <hanwen@google.com> Cc: Jakob Unterwurzacher <jakobunt@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11fuse: fallocate: fix return with locked inodeMiklos Szeredi
commit 35d6fcbb7c3e296a52136347346a698a35af3fda upstream. Do the proper cleanup in case the size check fails. Tested with xfstests:generic/228 Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate") Cc: Liu Bo <bo.liu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25fuse: Add FOPEN_STREAM to use stream_open()Kirill Smelkov
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream. Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25fuse: honor RLIMIT_FSIZE in fuse_file_fallocateLiu Bo
commit 0cbade024ba501313da3b7e5dd2a188a6bc491b5 upstream. fstests generic/228 reported this failure that fuse fallocate does not honor what 'ulimit -f' has set. This adds the necessary inode_newsize_ok() check. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation") Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25fuse: fix writepages on 32bitMiklos Szeredi
commit 9de5be06d0a89ca97b5ab902694d42dfd2bb77d2 upstream. Writepage requests were cropped to i_size & 0xffffffff, which meant that mmaped writes to any file larger than 4G might be silently discarded. Fix by storing the file size in a properly sized variable (loff_t instead of size_t). Reported-by: Antonio SJ Musumeci <trapexit@spawn.link> Fixes: 6eaf4782eb09 ("fuse: writepages: crop secondary requests") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-04fs: prevent page refcount overflow in pipe_buf_getMatthew Wilcox
commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12fuse: handle zero sized retrieve correctlyMiklos Szeredi
commit 97e1532ef81acb31c30f9e75bf00306c33a77812 upstream. Dereferencing req->page_descs[0] will Oops if req->max_pages is zero. Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com Fixes: b2430d7567a3 ("fuse: add per-page descriptor <offset, length> to fuse_req") Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12fuse: decrement NR_WRITEBACK_TEMP on the right pageMiklos Szeredi
commit a2ebba824106dabe79937a9f29a875f837e1b6d4 upstream. NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not the page cache page. Fixes: 8b284dc47291 ("fuse: writepages: handle same page rewrites") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12fuse: call pipe_buf_release() under pipe lockJann Horn
commit 9509941e9c534920ccc4771ae70bd6cbbe79df1c upstream. Some of the pipe_buf_release() handlers seem to assume that the pipe is locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page without taking any extra locks. From a glance through the callers of pipe_buf_release(), it looks like FUSE is the only one that calls pipe_buf_release() without having the pipe locked. This bug should only lead to a memory leak, nothing terrible. Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYSChad Austin
commit 2e64ff154ce6ce9a8dc0f9556463916efa6ff460 upstream. When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection. Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent to userspace. Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR inside of fuse_file_put. Fixes: 7678ac50615d ("fuse: support clients that don't implement 'open'") Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix possibly missed wake-up after abortMiklos Szeredi
commit 2d84a2d19b6150c6dbac1e6ebad9c82e4c123772 upstream. In current fuse_drop_waiting() implementation it's possible that fuse_wait_aborted() will not be woken up in the unlikely case that fuse_abort_conn() + fuse_wait_aborted() runs in between checking fc->connected and calling atomic_dec(&fc->num_waiting). Do the atomic_dec_and_test() unconditionally, which also provides the necessary barrier against reordering with the fc->connected check. The explicit smp_mb() in fuse_wait_aborted() is not actually needed, since the spin_unlock() in fuse_abort_conn() provides the necessary RELEASE barrier after resetting fc->connected. However, this is not a performance sensitive path, and adding the explicit barrier makes it easier to document. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests") Cc: <stable@vger.kernel.org> #v4.19 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix leaked notify replyMiklos Szeredi
commit 7fabaf303458fcabb694999d6fa772cc13d4e217 upstream. fuse_request_send_notify_reply() may fail if the connection was reset for some reason (e.g. fs was unmounted). Don't leak request reference in this case. Besides leaking memory, this resulted in fc->num_waiting not being decremented and hence fuse_wait_aborted() left in a hanging and unkillable state. Fixes: 2d45ba381a74 ("fuse: add retrieve request") Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests") Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> #v2.6.36 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix use-after-free in fuse_direct_IO()Lukas Czerner
commit ebacb81273599555a7a19f7754a1451206a5fc4f upstream. In async IO blocking case the additional reference to the io is taken for it to survive fuse_aio_complete(). In non blocking case this additional reference is not needed, however we still reference io to figure out whether to wait for completion or not. This is wrong and will lead to use-after-free. Fix it by storing blocking information in separate variable. This was spotted by KASAN when running generic/208 fstest. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 744742d692e3 ("fuse: Add reference counting for fuse_io_priv") Cc: <stable@vger.kernel.org> # v4.6 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: set FR_SENT while lockedMiklos Szeredi
commit 4c316f2f3ff315cb48efb7435621e5bfb81df96d upstream. Otherwise fuse_dev_do_write() could come in and finish off the request, and the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...)) in request_end(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix blocked_waitq wakeupMiklos Szeredi
commit 908a572b80f6e9577b45e81b3dfe2e22111286b8 upstream. Using waitqueue_active() is racy. Make sure we issue a wake_up() unconditionally after storing into fc->blocked. After that it's okay to optimize with waitqueue_active() since the first wake up provides the necessary barrier for all waiters, not the just the woken one. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3c18ef8117f0 ("fuse: optimize wake_up") Cc: <stable@vger.kernel.org> # v3.10 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: Fix use-after-free in fuse_dev_do_write()Kirill Tkhai
commit d2d2d4fb1f54eff0f3faa9762d84f6446a4bc5d0 upstream. After we found req in request_find() and released the lock, everything may happen with the req in parallel: cpu0 cpu1 fuse_dev_do_write() fuse_dev_do_write() req = request_find(fpq, ...) ... spin_unlock(&fpq->lock) ... ... req = request_find(fpq, oh.unique) ... spin_unlock(&fpq->lock) queue_interrupt(&fc->iq, req); ... ... ... ... ... request_end(fc, req); fuse_put_request(fc, req); ... queue_interrupt(&fc->iq, req); Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: Fix use-after-free in fuse_dev_do_read()Kirill Tkhai
commit bc78abbd55dd28e2287ec6d6502b842321a17c87 upstream. We may pick freed req in this way: [cpu0] [cpu1] fuse_dev_do_read() fuse_dev_do_write() list_move_tail(&req->list, ...); ... spin_unlock(&fpq->lock); ... ... request_end(fc, req); ... fuse_put_request(fc, req); if (test_bit(FR_INTERRUPTED, ...)) queue_interrupt(fiq, req); Fix that by keeping req alive until we finish all manipulations. Reported-by: syzbot+4e975615ca01f2277bdd@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-21Merge tag 'fuse-update-4.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: "Various bug fixes and cleanups" * tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: reduce allocation size for splice_write fuse: use kvmalloc to allocate array of pipe_buffer structs. fuse: convert last timespec use to timespec64 fs: fuse: Adding new return type vm_fault_t fuse: simplify fuse_abort_conn() fuse: Add missed unlock_page() to fuse_readpages_fill() fuse: Don't access pipe->buffers without pipe_lock() fuse: fix initial parallel dirops fuse: Fix oops at process_init_reply() fuse: umount should wait for all requests fuse: fix unlocked access to processing queue fuse: fix double request_end()
2018-08-21Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ...
2018-08-13Merge branch 'work.mkdir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs icache updates from Al Viro: - NFS mkdir/open_by_handle race fix - analogous solution for FUSE, replacing the one currently in mainline - new primitive to be used when discarding halfway set up inodes on failed object creation; gives sane warranties re icache lookups not returning such doomed by still not freed inodes. A bunch of filesystems switched to that animal. - Miklos' fix for last cycle regression in iget5_locked(); -stable will need a slightly different variant, unfortunately. - misc bits and pieces around things icache-related (in adfs and jfs). * 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: jfs: don't bother with make_bad_inode() in ialloc() adfs: don't put inodes into icache new helper: inode_fake_hash() vfs: don't evict uninitialized inode jfs: switch to discard_new_inode() ext2: make sure that partially set up inodes won't be returned by ext2_iget() udf: switch to discard_new_inode() ufs: switch to discard_new_inode() btrfs: switch to discard_new_inode() new primitive: discard_new_inode() kill d_instantiate_no_diralias() nfs_instantiate(): prevent multiple aliases for directory inode
2018-08-01kill d_instantiate_no_diralias()Al Viro
The only user is fuse_create_new_entry(), and there it's used to mitigate the same mkdir/open-by-handle race as in nfs_mkdir(). The same solution applies - unhash the mkdir argument, then call d_splice_alias() and if that returns a reference to preexisting alias, dput() and report success. ->mkdir() argument left unhashed negative with the preexisting alias moved in the right place is just fine from the ->mkdir() callers point of view. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-26fuse: reduce allocation size for splice_writeAndrey Ryabinin
The 'bufs' array contains 'pipe->buffers' elements, but the fuse_dev_splice_write() uses only 'pipe->nrbufs' elements. So reduce the allocation size to 'pipe->nrbufs' elements. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: use kvmalloc to allocate array of pipe_buffer structs.Andrey Ryabinin
The amount of pipe->buffers is basically controlled by userspace by fcntl(... F_SETPIPE_SZ ...) so it could be large. High order allocations could be slow (if memory is heavily fragmented) or may fail if the order is larger than PAGE_ALLOC_COSTLY_ORDER. Since the 'bufs' doesn't need to be physically contiguous, use the kvmalloc_array() to allocate memory. If high order page isn't available, the kvamalloc*() will fallback to 0-order. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: convert last timespec use to timespec64Arnd Bergmann
All of fuse uses 64-bit timestamps with the exception of the fuse_change_attributes(), so let's convert this one as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fs: fuse: Adding new return type vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: simplify fuse_abort_conn()Miklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: Add missed unlock_page() to fuse_readpages_fill()Kirill Tkhai
The above error path returns with page unlocked, so this place seems also to behave the same. Fixes: f8dbdf81821b ("fuse: rework fuse_readpages()") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: Don't access pipe->buffers without pipe_lock()Andrey Ryabinin
fuse_dev_splice_write() reads pipe->buffers to determine the size of 'bufs' array before taking the pipe_lock(). This is not safe as another thread might change the 'pipe->buffers' between the allocation and taking the pipe_lock(). So we end up with too small 'bufs' array. Move the bufs allocations inside pipe_lock()/pipe_unlock() to fix this. Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> # v2.6.35 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: fix initial parallel diropsMiklos Szeredi
If parallel dirops are enabled in FUSE_INIT reply, then first operation may leave fi->mutex held. Reported-by: syzbot <syzbot+3f7b29af1baa9d0a55be@syzkaller.appspotmail.com> Fixes: 5c672ab3f0ee ("fuse: serialize dirops by default") Cc: <stable@vger.kernel.org> # v4.7 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: Fix oops at process_init_reply()Miklos Szeredi
syzbot is hitting NULL pointer dereference at process_init_reply(). This is because deactivate_locked_super() is called before response for initial request is processed. Fix this by aborting and waiting for all requests (including FUSE_INIT) before resetting fc->sb. Original patch by Tetsuo Handa <penguin-kernel@I-love.SKAURA.ne.jp>. Reported-by: syzbot <syzbot+b62f08f4d5857755e3bc@syzkaller.appspotmail.com> Fixes: e27c9d3877a0 ("fuse: fuse: add time_gran to INIT_OUT") Cc: <stable@vger.kernel.org> # v3.19 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: umount should wait for all requestsMiklos Szeredi
fuse_abort_conn() does not guarantee that all async requests have actually finished aborting (i.e. their ->end() function is called). This could actually result in still used inodes after umount. Add a helper to wait until all requests are fully done. This is done by looking at the "num_waiting" counter. When this counter drops to zero, we can be sure that no more requests are outstanding. Fixes: 0d8e84b0432b ("fuse: simplify request abort") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: fix unlocked access to processing queueMiklos Szeredi
fuse_dev_release() assumes that it's the only one referencing the fpq->processing list, but that's not true, since fuse_abort_conn() can be doing the same without any serialization between the two. Fixes: c3696046beb3 ("fuse: separate pqueue for clones") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: fix double request_end()Miklos Szeredi
Refcounting of request is broken when fuse_abort_conn() is called and request is on the fpq->io list: - ref is taken too late - then it is not dropped Fixes: 0d8e84b0432b ("fuse: simplify request abort") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-21pids: Compute task_tgid using signal->leader_pidEric W. Biederman
The cost is the the same and this removes the need to worry about complications that come from de_thread and group_leader changing. __task_pid_nr_ns has been updated to take advantage of this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-12get rid of 'opened' argument of ->atomic_open() - part 3Al Viro
now it can be done... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12getting rid of 'opened' argument of ->atomic_open() - part 2Al Viro
__gfs2_lookup(), gfs2_create_inode(), nfs_finish_open() and fuse_create_open() don't need 'opened' anymore. Get rid of that argument in those. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-12getting rid of 'opened' argument of ->atomic_open() - part 1Al Viro
'opened' argument of finish_open() is unused. Kill it. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>
2018-07-12introduce FMODE_CREATED and switch to itAl Viro
Parallel to FILE_CREATED, goes into ->f_mode instead of *opened. NFS is a bit of a wart here - it doesn't have file at the point where FILE_CREATED used to be set, so we need to propagate it there (for now). IMA is another one (here and everywhere)... Note that this needs do_dentry_open() to leave old bits in ->f_mode alone - we want it to preserve FMODE_CREATED if it had been already set (no other bit can be there). Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()