summaryrefslogtreecommitdiffstats
path: root/fs/fuse
AgeCommit message (Collapse)Author
2018-12-28mm: convert totalram_pages and totalhigh_pages variables to atomicArun KS
totalram_pages and totalhigh_pages are made static inline function. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-11fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYSChad Austin
When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection. Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent to userspace. Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR inside of fuse_file_put. Fixes: 7678ac50615d ("fuse: support clients that don't implement 'open'") Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-10fuse: Fix memory leak in fuse_dev_free()Takeshi Misawa
When ntfs is unmounted, the following leak is reported by kmemleak. kmemleak report: unreferenced object 0xffff880052bf4400 (size 4096): comm "mount.ntfs", pid 16530, jiffies 4294861127 (age 3215.836s) hex dump (first 32 bytes): 00 44 bf 52 00 88 ff ff 00 44 bf 52 00 88 ff ff .D.R.....D.R.... 10 44 bf 52 00 88 ff ff 10 44 bf 52 00 88 ff ff .D.R.....D.R.... backtrace: [<00000000bf4a2f8d>] fuse_fill_super+0xb22/0x1da0 [fuse] [<000000004dde0f0c>] mount_bdev+0x263/0x320 [<0000000025aebc66>] mount_fs+0x82/0x2bf [<0000000042c5a6be>] vfs_kern_mount.part.33+0xbf/0x480 [<00000000ed10cd5b>] do_mount+0x3de/0x2ad0 [<00000000d59ff068>] ksys_mount+0xba/0xd0 [<000000001bda1bcc>] __x64_sys_mount+0xba/0x150 [<00000000ebe26304>] do_syscall_64+0x151/0x490 [<00000000d25f2b42>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<000000002e0abd2c>] 0xffffffffffffffff fuse_dev_alloc() allocate fud->pq.processing. But this hash table is not freed. Fix this by freeing fud->pq.processing. Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: be2ff42c5d6e ("fuse: Use hash table to link processing request")
2018-12-03fuse: fix revalidation of attributes for permission checkMiklos Szeredi
fuse_invalidate_attr() now sets fi->inval_mask instead of fi->i_time, hence we need to check the inval mask in fuse_permission() as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 2f1e81965fd0 ("fuse: allow fine grained attr cache invaldation")
2018-12-03fuse: fix fsync on directoryMiklos Szeredi
Commit ab2257e9941b ("fuse: reduce size of struct fuse_inode") moved parts of fields related to writeback on regular file and to directory caching into a union. However fuse_fsync_common() called from fuse_dir_fsync() touches some writeback related fields, resulting in a crash. Move writeback related parts from fuse_fsync_common() to fuse_fysnc(). Reported-by: Brett Girton <btgirton@gmail.com> Tested-by: Brett Girton <btgirton@gmail.com> Fixes: ab2257e9941b ("fuse: reduce size of struct fuse_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-11-22fuse: Add bad inode check in fuse_destroy_inode()Myungho Jung
make_bad_inode() sets inode->i_mode to S_IFREG if I/O error is detected in fuse_do_getattr()/fuse_do_setattr(). If the inode is not a regular file, write_files and queued_writes in fuse_inode are not initialized and have NULL or invalid pointers written by other members in a union. So, list_empty() returns false in fuse_destroy_inode(). Add is_bad_inode() to check if make_bad_inode() was called. Reported-by: syzbot+b9c89b84423073226299@syzkaller.appspotmail.com Fixes: ab2257e9941b ("fuse: reduce size of struct fuse_inode") Signed-off-by: Myungho Jung <mhjungk@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-11-09fuse: fix use-after-free in fuse_direct_IO()Lukas Czerner
In async IO blocking case the additional reference to the io is taken for it to survive fuse_aio_complete(). In non blocking case this additional reference is not needed, however we still reference io to figure out whether to wait for completion or not. This is wrong and will lead to use-after-free. Fix it by storing blocking information in separate variable. This was spotted by KASAN when running generic/208 fstest. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 744742d692e3 ("fuse: Add reference counting for fuse_io_priv") Cc: <stable@vger.kernel.org> # v4.6
2018-11-09fuse: fix possibly missed wake-up after abortMiklos Szeredi
In current fuse_drop_waiting() implementation it's possible that fuse_wait_aborted() will not be woken up in the unlikely case that fuse_abort_conn() + fuse_wait_aborted() runs in between checking fc->connected and calling atomic_dec(&fc->num_waiting). Do the atomic_dec_and_test() unconditionally, which also provides the necessary barrier against reordering with the fc->connected check. The explicit smp_mb() in fuse_wait_aborted() is not actually needed, since the spin_unlock() in fuse_abort_conn() provides the necessary RELEASE barrier after resetting fc->connected. However, this is not a performance sensitive path, and adding the explicit barrier makes it easier to document. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests") Cc: <stable@vger.kernel.org> #v4.19
2018-11-09fuse: fix leaked notify replyMiklos Szeredi
fuse_request_send_notify_reply() may fail if the connection was reset for some reason (e.g. fs was unmounted). Don't leak request reference in this case. Besides leaking memory, this resulted in fc->num_waiting not being decremented and hence fuse_wait_aborted() left in a hanging and unkillable state. Fixes: 2d45ba381a74 ("fuse: add retrieve request") Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests") Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> #v2.6.36
2018-11-01Merge branch 'work.afs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...
2018-10-24iov_iter: Use accessor functionDavid Howells
Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-15fuse: enable caching of symlinksDan Schatzberg
FUSE file reads are cached in the page cache, but symlink reads are not. This patch enables FUSE READLINK operations to be cached which can improve performance of some FUSE workloads. In particular, I'm working on a FUSE filesystem for access to source code and discovered that about a 10% improvement to build times is achieved with this patch (there are a lot of symlinks in the source tree). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-15fuse: only invalidate atime in direct readMiklos Szeredi
After sending a synchronous READ request from __fuse_direct_read() we only need to invalidate atime; none of the other attributes should be changed by a read(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-15fuse: don't need GETATTR after every READMiklos Szeredi
If 'auto_inval_data' mode is active, then fuse_file_read_iter() will call fuse_update_attributes(), which will check the attribute validity and send a GETATTR request if some of the attributes are no longer valid. The page cache is then invalidated if the size or mtime have changed. Then, if a READ request was sent and reply received (which is the case if the data wasn't cached yet, or if the file is opened for O_DIRECT), the atime attribute is invalidated. This will result in the next read() also triggering a GETATTR, ... This can be fixed by only sending GETATTR if the mode or size are invalid, we don't need to do a refresh if only atime is invalid. More generally, none of the callers of fuse_update_attributes() need an up-to-date atime value, so for now just remove STATX_ATIME from the request mask when attributes are updated for internal use. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-15fuse: allow fine grained attr cache invaldationMiklos Szeredi
This patch adds the infrastructure for more fine grained attribute invalidation. Currently only 'atime' is invalidated separately. The use of this infrastructure is extended to the statx(2) interface, which for now means that if only 'atime' is invalid and STATX_ATIME is not specified in the mask argument, then no GETATTR request will be generated. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: realloc page arrayMiklos Szeredi
Writeback caching currently allocates requests with the maximum number of possible pages, while the actual number of pages per request depends on a couple of factors that cannot be determined when the request is allocated (whether page is already under writeback, whether page is contiguous with previous pages already added to a request). This patch allows such requests to start with no page allocation (all pages inline) and grow the page array on demand. If the max_pages tunable remains the default value, then this will mean just one allocation that is the same size as before. If the tunable is larger, then this adds at most 3 additional memory allocations (which is generously compensated by the improved performance from the larger request). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: add max_pages to init_outConstantine Shulyupin
Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to improve performance. Old RFC with detailed description of the problem and many fixes by Mitsuo Hayasaka (mitsuo.hayasaka.hu@hitachi.com): - https://lkml.org/lkml/2012/7/5/136 We've encountered performance degradation and fixed it on a big and complex virtual environment. Environment to reproduce degradation and improvement: 1. Add lag to user mode FUSE Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in passthrough_fh.c 2. patch UM fuse with configurable max_pages parameter. The patch will be provided latter. 3. run test script and perform test on tmpfs fuse_test() { cd /tmp mkdir -p fusemnt passthrough_fh -o max_pages=$1 /tmp/fusemnt grep fuse /proc/self/mounts dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \ count=1K bs=1M 2>&1 | grep -v records rm fusemnt/tmp/tmp killall passthrough_fh } Test results: passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0 1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s passthrough_fh /tmp/fusemnt fuse.passthrough_fh \ rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0 1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s Obviously with bigger lag the difference between 'before' and 'after' will be more significant. Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136), observed improvement from 400-550 to 520-740. Signed-off-by: Constantine Shulyupin <const@MakeLinux.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: allocate page array more efficientlyMiklos Szeredi
When allocating page array for a request the array for the page pointers and the array for page descriptors are allocated by two separate kmalloc() calls. Merge these into one allocation. Also instead of initializing the request and the page arrays with memset(), use the zeroing allocation variants. Reserved requests never carry pages (page array size is zero). Make that explicit by initializing the page array pointers to NULL and make sure the assumption remains true by adding a WARN_ON(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: reduce size of struct fuse_inodeMiklos Szeredi
Do this by grouping fields used for cached writes and putting them into a union with fileds used for cached readdir (with obviously no overlap, since we don't have hybrid objects). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: use iversion for readdir cache verificationMiklos Szeredi
Use the internal iversion counter to make sure modifications of the directory through this filesystem are not missed by the mtime check (due to mtime granularity). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: use mtime for readdir cache verificationMiklos Szeredi
Store the modification time of the directory in the cache, obtained before starting to fill the cache. When reading the cache, verify that the directory hasn't changed, by checking if current modification time is the same as the one stored in the cache. This only needs to be done when the current file position is at the beginning of the directory, as mandated by POSIX. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: add readdir cache versionMiklos Szeredi
Allow the cache to be invalidated when page(s) have gone missing. In this case increment the version of the cache and reset to an empty state. Add a version number to the directory stream in struct fuse_file as well, indicating the version of the cache it's supposed to be reading. If the cache version doesn't match the stream's version, then reset the stream to the beginning of the cache. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: allow using readdir cacheMiklos Szeredi
The cache is only used if it's completed, not while it's still being filled; this constraint could be lifted later, if it turns out to be useful. Introduce state in struct fuse_file that indicates the position within the cache. After a seek, reset the position to the beginning of the cache and search the cache for the current position. If the current position is not found in the cache, then fall back to uncached readdir. It can also happen that page(s) disappear from the cache, in which case we must also fall back to uncached readdir. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-01fuse: allow caching readdirMiklos Szeredi
This patch just adds the cache filling functions, which are invoked if FOPEN_CACHE_DIR flag is set in the OPENDIR reply. Cache reading and cache invalidation are added by subsequent patches. The directory cache uses the page cache. Directory entries are packed into a page in the same format as in the READDIR reply. A page only contains whole entries, the space at the end of the page is cleared. The page is locked while being modified. Multiple parallel readdirs on the same directory can fill the cache; the only constraint is that continuity must be maintained (d_off of last entry points to position of current entry). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: extract fuse_emit() helperMiklos Szeredi
Prepare for cache filling by introducing a helper for emitting a single directory entry. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: split out readdir.cMiklos Szeredi
Directory reading code is about to grow larger, so split it out from dir.c into a new source file. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: Use hash table to link processing requestKirill Tkhai
We noticed the performance bottleneck in FUSE running our Virtuozzo storage over rdma. On some types of workload we observe 20% of times spent in request_find() in profiler. This function is iterating over long requests list, and it scales bad. The patch introduces hash table to reduce the number of iterations, we do in this function. Hash generating algorithm is taken from hash_add() function, while 256 lines table is used to store pending requests. This fixes problem and improves the performance. Reported-by: Alexey Kuznetsov <kuznet@virtuozzo.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: kill req->intr_uniqueKirill Tkhai
This field is not needed after the previous patch, since we can easily convert request ID to interrupt request ID and vice versa. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: change interrupt requests allocation algorithmKirill Tkhai
Using of two unconnected IDs req->in.h.unique and req->intr_unique does not allow to link requests to a hash table. We need can't use none of them as a key to calculate hash. This patch changes the algorithm of allocation of IDs for a request. Plain requests obtain even ID, while interrupt requests are encoded in the low bit. So, in next patches we will be able to use the rest of ID bits to calculate hash, and the hash will be the same for plain and interrupt requests. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: do not take fc->lock in fuse_request_send_background()Kirill Tkhai
Currently, we take fc->lock there only to check for fc->connected. But this flag is changed only on connection abort, which is very rare operation. So allow checking fc->connected under just fc->bg_lock and use this lock (as well as fc->lock) when resetting fc->connected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: introduce fc->bg_lockKirill Tkhai
To reduce contention of fc->lock, this patch introduces bg_lock for protection of fields related to background queue. These are: max_background, congestion_threshold, num_background, active_background, bg_queue and blocked. This allows next patch to make async reads not requiring fc->lock, so async reads and writes will have better performance executed in parallel. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: add locking to max_background and congestion_threshold changesKirill Tkhai
Functions sequences like request_end()->flush_bg_queue() require that max_background and congestion_threshold are constant during their execution. Otherwise, checks like if (fc->num_background == fc->max_background) made in different time may behave not like expected. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: use READ_ONCE on congestion_threshold and max_backgroundKirill Tkhai
Since they are of unsigned int type, it's allowed to read them unlocked during reporting to userspace. Let's underline this fact with READ_ONCE() macroses. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: use list_first_entry() in flush_bg_queue()Kirill Tkhai
This cleanup patch makes the function to use the primitive instead of direct dereferencing. Also, move fiq dereferencing out of cycle, since it's always constant. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: add support for copy_file_range()Niels de Vos
There are several FUSE filesystems that can implement server-side copy or other efficient copy/duplication/clone methods. The copy_file_range() syscall is the standard interface that users have access to while not depending on external libraries that bypass FUSE. Signed-off-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28fuse: fix blocked_waitq wakeupMiklos Szeredi
Using waitqueue_active() is racy. Make sure we issue a wake_up() unconditionally after storing into fc->blocked. After that it's okay to optimize with waitqueue_active() since the first wake up provides the necessary barrier for all waiters, not the just the woken one. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3c18ef8117f0 ("fuse: optimize wake_up") Cc: <stable@vger.kernel.org> # v3.10
2018-09-28fuse: set FR_SENT while lockedMiklos Szeredi
Otherwise fuse_dev_do_write() could come in and finish off the request, and the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...)) in request_end(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2
2018-09-28fuse: Fix use-after-free in fuse_dev_do_write()Kirill Tkhai
After we found req in request_find() and released the lock, everything may happen with the req in parallel: cpu0 cpu1 fuse_dev_do_write() fuse_dev_do_write() req = request_find(fpq, ...) ... spin_unlock(&fpq->lock) ... ... req = request_find(fpq, oh.unique) ... spin_unlock(&fpq->lock) queue_interrupt(&fc->iq, req); ... ... ... ... ... request_end(fc, req); fuse_put_request(fc, req); ... queue_interrupt(&fc->iq, req); Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2
2018-09-28fuse: Fix use-after-free in fuse_dev_do_read()Kirill Tkhai
We may pick freed req in this way: [cpu0] [cpu1] fuse_dev_do_read() fuse_dev_do_write() list_move_tail(&req->list, ...); ... spin_unlock(&fpq->lock); ... ... request_end(fc, req); ... fuse_put_request(fc, req); if (test_bit(FR_INTERRUPTED, ...)) queue_interrupt(fiq, req); Fix that by keeping req alive until we finish all manipulations. Reported-by: syzbot+4e975615ca01f2277bdd@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2
2018-08-21Merge tag 'fuse-update-4.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: "Various bug fixes and cleanups" * tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: reduce allocation size for splice_write fuse: use kvmalloc to allocate array of pipe_buffer structs. fuse: convert last timespec use to timespec64 fs: fuse: Adding new return type vm_fault_t fuse: simplify fuse_abort_conn() fuse: Add missed unlock_page() to fuse_readpages_fill() fuse: Don't access pipe->buffers without pipe_lock() fuse: fix initial parallel dirops fuse: Fix oops at process_init_reply() fuse: umount should wait for all requests fuse: fix unlocked access to processing queue fuse: fix double request_end()
2018-08-21Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ...
2018-08-13Merge branch 'work.mkdir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs icache updates from Al Viro: - NFS mkdir/open_by_handle race fix - analogous solution for FUSE, replacing the one currently in mainline - new primitive to be used when discarding halfway set up inodes on failed object creation; gives sane warranties re icache lookups not returning such doomed by still not freed inodes. A bunch of filesystems switched to that animal. - Miklos' fix for last cycle regression in iget5_locked(); -stable will need a slightly different variant, unfortunately. - misc bits and pieces around things icache-related (in adfs and jfs). * 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: jfs: don't bother with make_bad_inode() in ialloc() adfs: don't put inodes into icache new helper: inode_fake_hash() vfs: don't evict uninitialized inode jfs: switch to discard_new_inode() ext2: make sure that partially set up inodes won't be returned by ext2_iget() udf: switch to discard_new_inode() ufs: switch to discard_new_inode() btrfs: switch to discard_new_inode() new primitive: discard_new_inode() kill d_instantiate_no_diralias() nfs_instantiate(): prevent multiple aliases for directory inode
2018-08-01kill d_instantiate_no_diralias()Al Viro
The only user is fuse_create_new_entry(), and there it's used to mitigate the same mkdir/open-by-handle race as in nfs_mkdir(). The same solution applies - unhash the mkdir argument, then call d_splice_alias() and if that returns a reference to preexisting alias, dput() and report success. ->mkdir() argument left unhashed negative with the preexisting alias moved in the right place is just fine from the ->mkdir() callers point of view. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-26fuse: reduce allocation size for splice_writeAndrey Ryabinin
The 'bufs' array contains 'pipe->buffers' elements, but the fuse_dev_splice_write() uses only 'pipe->nrbufs' elements. So reduce the allocation size to 'pipe->nrbufs' elements. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: use kvmalloc to allocate array of pipe_buffer structs.Andrey Ryabinin
The amount of pipe->buffers is basically controlled by userspace by fcntl(... F_SETPIPE_SZ ...) so it could be large. High order allocations could be slow (if memory is heavily fragmented) or may fail if the order is larger than PAGE_ALLOC_COSTLY_ORDER. Since the 'bufs' doesn't need to be physically contiguous, use the kvmalloc_array() to allocate memory. If high order page isn't available, the kvamalloc*() will fallback to 0-order. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: convert last timespec use to timespec64Arnd Bergmann
All of fuse uses 64-bit timestamps with the exception of the fuse_change_attributes(), so let's convert this one as well. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fs: fuse: Adding new return type vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: simplify fuse_abort_conn()Miklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: Add missed unlock_page() to fuse_readpages_fill()Kirill Tkhai
The above error path returns with page unlocked, so this place seems also to behave the same. Fixes: f8dbdf81821b ("fuse: rework fuse_readpages()") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-07-26fuse: Don't access pipe->buffers without pipe_lock()Andrey Ryabinin
fuse_dev_splice_write() reads pipe->buffers to determine the size of 'bufs' array before taking the pipe_lock(). This is not safe as another thread might change the 'pipe->buffers' between the allocation and taking the pipe_lock(). So we end up with too small 'bufs' array. Move the bufs allocations inside pipe_lock()/pipe_unlock() to fix this. Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> # v2.6.35 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>