path: root/fs
Age | Commit message | Author
2019-05-04 | ceph: fix use-after-free on symlink traversal | Al Viro
[ Upstream commit daf5cc27eed99afdea8d96e71b89ba41f5406ef6 ] free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk wouldn't step into freed memory. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-05-04 | NFS: Fix a typo in nfs_init_timeout_values() | Trond Myklebust
[ Upstream commit 5a698243930c441afccec04e4d5dc8febfd2b775 ] Specifying a retrans=0 mount parameter to a NFS/TCP mount, is inadvertently causing the NFS client to rewrite any specified timeout parameter to the default of 60 seconds. Fixes: a956beda19a6 ("NFS: Allow the mount option retrans=0") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
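The commit message does not show the diff, so the following is only a minimal userspace sketch of how such a typo could produce the described symptom. It assumes the faulty test looked at the retry count where the timeout value was meant; `UNSPEC`, `DEF_TCP_TIMEO`, `DEF_TCP_RETRANS`, `struct timeout_model` and `init_timeout()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, chosen for illustration only. */
#define UNSPEC          (-1)  /* option not given at mount time */
#define DEF_TCP_TIMEO   600   /* default timeout in tenths of a second (60 s) */
#define DEF_TCP_RETRANS 2     /* assumed default retry count */

struct timeout_model {
    int initval;  /* initial timeout, tenths of a second */
    int retries;  /* retransmission count */
};

/* Model of the TCP branch; with_typo selects the assumed faulty check. */
static struct timeout_model init_timeout(int timeo, int retrans, bool with_typo)
{
    struct timeout_model to;
    int checked;

    assert(timeo == UNSPEC || timeo >= 0);

    to.initval = (timeo == UNSPEC) ? 0 : timeo;
    to.retries = (retrans == UNSPEC) ? DEF_TCP_RETRANS : retrans;

    /* Assumed typo: the retry count is tested where the timeout was meant. */
    checked = with_typo ? to.retries : to.initval;
    if (timeo == UNSPEC || checked == 0)
        to.initval = DEF_TCP_TIMEO;

    return to;
}
```

In this model, `init_timeout(300, 0, true)` overwrites the requested 30-second timeout with the 60-second default, while `init_timeout(300, 0, false)` keeps it.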
2019-05-04 | fs: prevent page refcount overflow in pipe_buf_get | Matthew Wilcox
commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream. Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | Fix aio_poll() races | Al Viro
commit af5c72b1fc7a00aa484e90b0c4e0eeb582545634 upstream. aio_poll() has to cope with several unpleasant problems: * requests that might stay around indefinitely need to be made visible for io_cancel(2); that must not be done to a request already completed, though. * in cases when ->poll() has placed us on a waitqueue, wakeup might have happened (and request completed) before ->poll() returns. * worse, in some early wakeup cases request might end up re-added into the queue later - we can't treat "woken up and currently not in the queue" as "it's not going to stick around indefinitely" * ... moreover, ->poll() might have decided not to put it on any queues to start with, and that needs to be distinguished from the previous case * ->poll() might have tried to put us on more than one queue. Only the first will succeed for aio poll, so we might end up missing wakeups. OTOH, we might very well notice that only after the wakeup hits and request gets completed (all before ->poll() gets around to the second poll_wait()). In that case it's too late to decide that we have an error. req->woken was an attempt to deal with that. Unfortunately, it was broken. What we need to keep track of is not that wakeup has happened - the thing might come back after that. It's that async reference is already gone and won't come back, so we can't (and needn't) put the request on the list of cancellables. The easiest case is "request hadn't been put on any waitqueues"; we can tell by seeing NULL apt.head, and in that case there won't be anything async. We should either complete the request ourselves (if vfs_poll() reports anything of interest) or return an error. In all other cases we get exclusion with wakeups by grabbing the queue lock. If request is currently on queue and we have something interesting from vfs_poll(), we can steal it and complete the request ourselves. 
If it's on queue and vfs_poll() has not reported anything interesting, we either put it on the cancellable list, or, if we know that it hadn't been put on all queues ->poll() wanted it on, we steal it and return an error. If it's _not_ on queue, it's either been already dealt with (in which case we do nothing), or there's aio_poll_complete_work() about to be executed. In that case we either put it on the cancellable list, or, if we know it hadn't been put on all queues ->poll() wanted it on, simulate what cancel would've done. It's a lot more convoluted than I'd like it to be. Single-consumer APIs suck, and unfortunately aio is not an exception... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: store event at final iocb_put() | Al Viro
commit 2bb874c0d873d13bd9b9b9c6d7b7c4edab18c8b4 upstream. Instead of having aio_complete() set ->ki_res.{res,res2}, do that explicitly in its callers, drop the reference (as aio_complete() used to do) and delay the rest until the final iocb_put(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: keep io_event in aio_kiocb | Al Viro
commit a9339b7855094ba11a97e8822ae038135e879e79 upstream. We want to separate forming the resulting io_event from putting it into the ring buffer. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: fold lookup_kiocb() into its sole caller | Al Viro
commit 833f4154ed560232120bc475935ee1d6a20e159f upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | pin iocb through aio. | Linus Torvalds
commit b53119f13a04879c3bf502828d99d13726639ead upstream. aio_poll() is not the only case that needs file pinned; worse, while aio_read()/aio_write() can live without pinning iocb itself, the proof is rather brittle and can easily break on later changes. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: simplify - and fix - fget/fput for io_submit() | Linus Torvalds
commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream. Al Viro root-caused a race where the IOCB_CMD_POLL handling of fget/fput() could cause us to access the file pointer after it had already been freed: "In more details - normally IOCB_CMD_POLL handling looks so: 1) io_submit(2) allocates aio_kiocb instance and passes it to aio_poll() 2) aio_poll() resolves the descriptor to struct file by req->file = fget(iocb->aio_fildes) 3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that aio_kiocb to 2 (bumps by 1, that is). 4) aio_poll() calls vfs_poll(). After sanity checks (basically, "poll_wait() had been called and only once") it locks the queue. That's what the extra reference to iocb had been for - we know we can safely access it. 5) With queue locked, we check if ->woken has already been set to true (by aio_poll_wake()) and, if it had been, we unlock the queue, drop a reference to aio_kiocb and bugger off - at that point it's a responsibility to aio_poll_wake() and the stuff called/scheduled by it. That code will drop the reference to file in req->file, along with the other reference to our aio_kiocb. 6) otherwise, we see whether we need to wait. If we do, we unlock the queue, drop one reference to aio_kiocb and go away - eventual wakeup (or cancel) will deal with the reference to file and with the other reference to aio_kiocb 7) otherwise we remove ourselves from waitqueue (still under the queue lock), so that wakeup won't get us. No async activity will be happening, so we can safely drop req->file and iocb ourselves. If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb won't get freed under us, so we can do all the checks and locking safely. And we don't touch ->file if we detect that case. However, vfs_poll() most certainly *does* touch the file it had been given. So wakeup coming while we are still in ->poll() might end up doing fput() on that file. 
That case is not too rare, and usually we are saved by the still present reference from descriptor table - that fput() is not the final one. But if another thread closes that descriptor right after our fget() and wakeup does happen before ->poll() returns, we are in trouble - final fput() done while we are in the middle of a method: Al also wrote a patch to take an extra reference to the file descriptor to fix this, but I instead suggested we just streamline the whole file pointer handling by submit_io() so that the generic aio submission code simply keeps the file pointer around until the aio has completed. Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: initialize kiocb private in case any filesystems expect it. | Mike Marshall
commit ec51f8ee1e63498e9f521ec0e5a6d04622bb2c67 upstream. A recent optimization had left private uninitialized. Fixes: 2bc4ca9bb600 ("aio: don't zero entire aio_kiocb aio_get_req()") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: abstract out io_event filler helper | Jens Axboe
commit 875736bb3f3ded168469f6a14df7a938416a99d5 upstream. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: split out iocb copy from io_submit_one() | Jens Axboe
commit 88a6f18b950e2e4dce57d31daa151105f4f3dcff upstream. In preparation of handing in iocbs in a different fashion as well. Also make it clear that the iocb being passed in isn't modified, by marking it const throughout. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: use iocb_put() instead of open coding it | Jens Axboe
commit 71ebc6fef0f53459f37fb39e1466792232fa52ee upstream. Replace the percpu_ref_put() + kmem_cache_free() with a call to iocb_put() instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: don't zero entire aio_kiocb aio_get_req() | Jens Axboe
commit 2bc4ca9bb600cbe36941da2b2a67189fc4302a04 upstream. It's 192 bytes, fairly substantial. Most items don't need to be cleared, especially not upfront. Clear the ones we do need to clear, and leave the other ones for setup when the iocb is prepared and submitted. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: separate out ring reservation from req allocation | Christoph Hellwig
commit 432c79978c33ecef91b1b04cea6936c20810da29 upstream. This is in preparation for certain types of IO not needing a ring reservation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: use assigned completion handler | Jens Axboe
commit bc9bff61624ac33b7c95861abea1af24ee7a94fc upstream. We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | aio: clear IOCB_HIPRI | Christoph Hellwig
commit 154989e45fd8de9bfb52bbd6e5ea763e437e54c5 upstream. No one is going to poll for aio (yet), so we must clear the HIPRI flag, as we would otherwise send it down the poll queues, where no one will be polling for completions. Signed-off-by: Christoph Hellwig <hch@lst.de> IOCB_HIPRI, not RWF_HIPRI. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. | Tetsuo Handa
commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream. syzbot is reporting an uninitialized value at rpc_sockaddr2uaddr() [1]. This is because syzbot is setting AF_INET6 in "struct sockaddr_in"->sin_family (which is embedded in the user-visible "struct nfs_mount_data" structure) even though nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6) bytes of AF_INET6 address to rpc_sockaddr2uaddr(). Since the "struct nfs_mount_data" structure is user-visible, we can't change "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore, assuming that everybody is using the AF_INET family when passing an address via "struct nfs_mount_data"->addr, reject the mount if its sin_family is not AF_INET. [1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
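The validation described above amounts to refusing any address family that the fixed-size field cannot hold. A minimal userspace sketch follows; `struct mount_data_model` and `validate_mount_addr()` are hypothetical names, and returning `-EINVAL` is an assumption about the error code rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, cut-down stand-in for the user-visible mount data. */
struct mount_data_model {
    struct sockaddr_in addr; /* only has room for an IPv4 address */
};

/* Returns 0 if the address can be used, -EINVAL otherwise (assumed code). */
static int validate_mount_addr(const struct mount_data_model *data)
{
    assert(data != NULL);

    /*
     * Treating this field as sockaddr_in6 would read past the bytes the
     * caller actually supplied, so anything but AF_INET is refused.
     */
    if (data->addr.sin_family != AF_INET)
        return -EINVAL;
    return 0;
}
```

Rejecting the input early keeps later code from interpreting the 16-byte `sockaddr_in` as a larger `sockaddr_in6`.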
2019-05-02 | fs/proc/proc_sysctl.c: Fix a NULL pointer dereference | YueHaibing
commit 89189557b47b35683a27c80ee78aef18248eefb4 upstream. Syzkaller report this: sysctl could not get directory: /net//bridge -12 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline] RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline] RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline] RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459 Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48 RSP: 0018:ffff8881bb507778 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568 RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4 R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: erase_entry fs/proc/proc_sysctl.c:178 [inline] erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207 start_unregistering fs/proc/proc_sysctl.c:331 [inline] drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631 get_subdir fs/proc/proc_sysctl.c:1022 [inline] __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335 br_netfilter_init+0x68/0x1000 [br_netfilter] 
do_one_initcall+0xbc/0x47d init/main.c:901 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter] Dumping ftrace buffer: (ftrace 
buffer empty) ---[ end trace 68741688d5fbfe85 ]--- commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") forgot to handle start_unregistering() case, while header->parent is NULL, it calls erase_header() and as seen in the above syzkaller call trace, accessing &header->parent->root will trigger a NULL pointer dereference. As that commit explained, there is also no need to call start_unregistering() if header->parent is NULL. Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links") Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
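The guard described above (skip the unregistering step when the header has no parent) can be modelled in userspace C. The structures and function names below are hypothetical simplifications; in particular the `entries` counter merely stands in for the rb-tree that the real code walks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dir_model;

struct header_model {
    struct dir_model *parent; /* NULL if the header was never inserted */
    int nreg;                 /* registrations still outstanding */
    bool unregistered;        /* set once the model "erases" the header */
};

struct dir_model {
    struct header_model header;
    int entries;              /* stands in for the tree of child entries */
};

/* Dereferences header->parent, so it must only run for inserted headers. */
static void start_unregistering_model(struct header_model *h)
{
    assert(h->parent != NULL);
    h->parent->entries--;
    h->unregistered = true;
}

static void drop_table_model(struct header_model *h)
{
    struct dir_model *parent = h->parent;

    if (--h->nreg)
        return;

    /* The guard: with no parent there is nothing to unregister from. */
    if (parent) {
        start_unregistering_model(h);
        drop_table_model(&parent->header);
    }
}
```

Dropping a header whose `parent` is NULL now simply returns, instead of reaching code that dereferences the missing parent.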
2019-05-02 | nfsd: Don't release the callback slot unless it was actually held | Trond Myklebust
commit e6abc8caa6deb14be2a206253f7e1c5e37e9515b upstream. If there are multiple callbacks queued, waiting for the callback slot when the callback gets shut down, then they all currently end up acting as if they hold the slot, and call nfsd4_cb_sequence_done() resulting in interesting side-effects. In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done() causes a loop back to nfsd4_cb_prepare() without first freeing the slot, which causes a deadlock when nfsd41_cb_get_slot() gets called a second time. This patch therefore adds a boolean to track whether or not the callback did pick up the slot, so that it can do the right thing in these 2 cases. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
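The boolean described above can be illustrated with a small userspace model of a single callback slot shared by several callbacks. `struct session_model`, `struct callback_model`, `cb_get_slot()` and `cb_release_slot()` are hypothetical names; the sketch ignores waiting and concurrency and shows only the ownership bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct session_model {
    bool slot_busy;              /* the single callback slot */
};

struct callback_model {
    struct session_model *session;
    bool holds_slot;             /* did this callback pick up the slot? */
};

/* Returns true if the callback may proceed, false if it has to wait. */
static bool cb_get_slot(struct callback_model *cb)
{
    assert(cb != NULL && cb->session != NULL);

    if (cb->holds_slot)          /* retry path: the slot is already ours */
        return true;
    if (cb->session->slot_busy)  /* held by another callback */
        return false;
    cb->session->slot_busy = true;
    cb->holds_slot = true;
    return true;
}

/* Releases the slot only if this callback actually holds it. */
static void cb_release_slot(struct callback_model *cb)
{
    if (!cb->holds_slot)
        return;
    cb->holds_slot = false;
    cb->session->slot_busy = false;
}
```

With the flag in place, a callback that never obtained the slot cannot free it on behalf of the real owner, and a retry by the owner does not block on its own slot.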
2019-05-02 | ceph: fix ci->i_head_snapc leak | Yan, Zheng
commit 37659182bff1eeaaeadcfc8f853c6d2b6dbc3f47 upstream. We missed two places that i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps and i_flushing_caps may change. When they are all zeros, we should free i_head_snapc. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/38224 Reported-and-tested-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | ceph: ensure d_name stability in ceph_dentry_hash() | Jeff Layton
commit 76a495d666e5043ffc315695f8241f5e94a98849 upstream. Take the d_lock here to ensure that d_name doesn't change. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | ceph: only use d_name directly when parent is locked | Jeff Layton
commit 1bcb344086f3ecf8d6705f6d708441baa823beb3 upstream. Ben reported tripping the BUG_ON in create_request_message during some performance testing. Analysis of the vmcore showed that the length of the r_dentry->d_name string changed after we allocated the buffer, but before we encoded it. build_dentry_path returns pointers to d_name in the common case of non-snapped dentries, but this optimization isn't safe unless the parent directory is locked. When it isn't, have the code make a copy of the d_name while holding the d_lock. Cc: stable@vger.kernel.org Reported-by: Ben England <bengland@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
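Copying a name while holding a lock, rather than keeping a pointer to it, can be sketched in userspace C as follows. This is a model under stated assumptions: `struct dentry_model`, `model_lock()`, `model_unlock()` and `snapshot_name()` are hypothetical, and a C11 `atomic_flag` spinlock stands in for `d_lock`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

struct dentry_model {
    atomic_flag lock;   /* stands in for d_lock */
    const char *name;   /* may be swapped by a concurrent rename */
    size_t len;
};

static void model_lock(struct dentry_model *d)
{
    while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void model_unlock(struct dentry_model *d)
{
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/*
 * Copy the name into a caller-owned buffer while holding the lock, so the
 * length and the bytes always belong to the same version of the name.
 * Returns the number of bytes copied (truncated to fit the buffer).
 */
static size_t snapshot_name(struct dentry_model *d, char *buf, size_t bufsz)
{
    size_t n;

    assert(bufsz > 0);

    model_lock(d);
    n = d->len < bufsz - 1 ? d->len : bufsz - 1;
    memcpy(buf, d->name, n);
    model_unlock(d);

    buf[n] = '\0';
    return n;
}
```

Once the snapshot is taken, a later rename changes the model's `name` and `len` but not the caller's copy, which is the property the encoded request relies on.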
2019-05-02 | tracing: Fix buffer_ref pipe ops | Jann Horn
commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream. This fixes multiple issues in buffer_pipe_buf_ops: - The ->steal() handler must not return zero unless the pipe buffer has the only reference to the page. But generic_pipe_buf_steal() assumes that every reference to the pipe is tracked by the page's refcount, which isn't true for these buffers - buffer_pipe_buf_get(), which duplicates a buffer, doesn't touch the page's refcount. Fix it by using generic_pipe_buf_nosteal(), which refuses every attempted theft. It should be easy to actually support ->steal, but the only current users of pipe_buf_steal() are the virtio console and FUSE, and they also only use it as an optimization. So it's probably not worth the effort. - The ->get() and ->release() handlers can be invoked concurrently on pipe buffers backed by the same struct buffer_ref. Make them safe against concurrency by using refcount_t. - The pointers stored in ->private were only zeroed out when the last reference to the buffer_ref was dropped. As far as I know, this shouldn't be necessary anyway, but if we do it, let's always do it. Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
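Two of the points above, a reference count that is safe under concurrent get/release and a steal handler that always refuses, can be sketched with C11 atomics. The names `struct buffer_ref_model`, `ref_new()`, `ref_get()`, `ref_put()` and `ref_steal()` are hypothetical, and unlike the kernel's refcount_t this model does not saturate on overflow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct buffer_ref_model {
    atomic_int refcount;
    void *page;          /* the shared page this reference tracks */
};

static struct buffer_ref_model *ref_new(void *page)
{
    struct buffer_ref_model *ref = malloc(sizeof(*ref));

    if (!ref)
        return NULL;
    atomic_init(&ref->refcount, 1);
    ref->page = page;
    return ref;
}

/* Duplicate a reference; safe to call concurrently with ref_put(). */
static void ref_get(struct buffer_ref_model *ref)
{
    assert(ref != NULL);
    atomic_fetch_add_explicit(&ref->refcount, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this call freed the object. */
static bool ref_put(struct buffer_ref_model *ref)
{
    if (atomic_fetch_sub_explicit(&ref->refcount, 1,
                                  memory_order_acq_rel) != 1)
        return false;
    free(ref);
    return true;
}

/* Refuse every steal attempt: non-zero means "not stolen" in this model. */
static int ref_steal(struct buffer_ref_model *ref)
{
    (void)ref;
    return 1;
}
```

Because the decrement and the "was this the last one" check are a single atomic operation, two concurrent releases cannot both conclude that they hold the final reference.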
2019-05-02 | cifs: do not attempt cifs operation on smb2+ rename error | Frank Sorenson
commit 652727bbe1b17993636346716ae5867627793647 upstream. A path-based rename returning EBUSY will incorrectly try opening the file with a cifs (NT Create AndX) operation on an smb2+ mount, which causes the server to force a session close. If the mount is smb2+, skip the fallback. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 | cifs: fix memory leak in SMB2_read | Ronnie Sahlberg
commit 05fd5c2c61732152a6bddc318aae62d7e436629b upstream. Commit 088aaf17aa79300cab14dbee2569c58cfafd7d6e introduced a leak where if SMB2_read() returned an error we would return without freeing the request buffer. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
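The ordering constraint behind this fix is that the request buffer has to outlive the trace that reads it, yet still be freed on every exit path. The userspace sketch below models that; `struct request_model`, `read_model()`, the `live_requests` counter and `last_traced_id` are hypothetical and exist only to make the behaviour observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_requests;                 /* buffers allocated and not yet freed */
static unsigned long long last_traced_id; /* what the "tracepoint" recorded */

struct request_model {
    unsigned long long file_id;
};

static struct request_model *req_alloc(unsigned long long id)
{
    struct request_model *req = malloc(sizeof(*req));

    if (req) {
        req->file_id = id;
        live_requests++;
    }
    return req;
}

static void req_free(struct request_model *req)
{
    assert(req != NULL);
    free(req);
    live_requests--;
}

/* send_ok simulates whether the transport call succeeded. */
static int read_model(unsigned long long id, bool send_ok)
{
    struct request_model *req = req_alloc(id);

    if (!req)
        return -ENOMEM;

    if (!send_ok) {
        last_traced_id = req->file_id; /* error trace still reads req */
        req_free(req);                 /* the free the error path needs */
        return -EIO;
    }

    last_traced_id = req->file_id;     /* success trace reads req */
    req_free(req);                     /* freed only after the last use */
    return 0;
}
```

After either outcome `live_requests` returns to zero, which is the invariant a leak on the error path would break.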
2019-05-02 | ext4: fix some error pointer dereferences | Dan Carpenter
[ Upstream commit 7159a986b4202343f6cca3bb8079ecace5816fd6 ] We can't pass error pointers to brelse(). Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
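Why an error pointer must not reach a release function can be shown with a userspace imitation of the kernel's error-pointer convention. Everything here is a model: `err_ptr()`, `is_err()`, `ptr_err()`, `struct buffer_model`, `release_buffer()`, `read_block()` and `use_block()` are hypothetical, and the integer-to-pointer casts rely on behaviour that is implementation-defined in C, though common on mainstream platforms.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_MODEL 4095

/* Encode a negative errno value in a pointer, and decode it again. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_MODEL;
}

struct buffer_model {
    int refcount;
};

/* Accepts NULL, but would dereference an error pointer as if it were real. */
static void release_buffer(struct buffer_model *bh)
{
    assert(!is_err(bh));
    if (bh)
        bh->refcount--;
}

/* Returns a referenced buffer, or an error pointer on failure. */
static struct buffer_model *read_block(struct buffer_model *backing, bool fail)
{
    if (fail)
        return err_ptr(-EIO);
    backing->refcount++;
    return backing;
}

static long use_block(struct buffer_model *backing, bool fail)
{
    struct buffer_model *bh = read_block(backing, fail);
    long err = 0;

    if (is_err(bh)) {
        err = ptr_err(bh);
        bh = NULL;        /* make the shared cleanup below safe */
    }

    release_buffer(bh);   /* common exit path */
    return err;
}
```

Resetting the pointer to NULL before the shared cleanup is one way to keep a single exit path; returning early before the release call would work as well.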
2019-04-27 | coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping | Andrea Arcangeli
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream. The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. 
In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 | cifs: fix handle leak in smb2_query_symlink() | Ronnie Sahlberg
commit e6d0fb7b34f264f72c33053558a360a6a734905e upstream. If we enter smb2_query_symlink() for something that is not a symlink and where the SMB2_open() would succeed we would never end up closing this handle and would thus leak a handle on the server. Fix this by immediately calling SMB2_close() on successful open. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 | cifs: Fix use-after-free in SMB2_read | ZhangXiaoxu
commit 088aaf17aa79300cab14dbee2569c58cfafd7d6e upstream. There is a KASAN use-after-free: BUG: KASAN: use-after-free in SMB2_read+0x1136/0x1190 Read of size 8 at addr ffff8880b4e45e50 by task ln/1009 The 'req' buffer should not be released at that point because the trace still uses it. Fixes: eccb4422cf97 ("smb3: Add ftrace tracepoints for improved SMB3 debugging") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> 4.18+ Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 | cifs: Fix use-after-free in SMB2_write | ZhangXiaoxu
commit 6a3eb3360667170988f8a6477f6686242061488a upstream. There is a KASAN use-after-free: BUG: KASAN: use-after-free in SMB2_write+0x1342/0x1580 Read of size 8 at addr ffff8880b6a8e450 by task ln/4196 The 'req' buffer should not be released at that point because the trace still uses it. Fixes: eccb4422cf97 ("smb3: Add ftrace tracepoints for improved SMB3 debugging") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> 4.18+ Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 | CIFS: keep FileInfo handle live during oplock break | Aurelien Aptel
commit b98749cac4a695f084a5ff076f4510b23e353ecd upstream. In the oplock break handler, writing pending changes from pages puts the FileInfo handle. If the refcount reaches zero it closes the handle and waits for any oplock break handler to return, thus causing a deadlock. To prevent this situation: * We add a wait flag to cifsFileInfo_put() to decide whether we should wait for running/pending oplock break handlers * We keep an additional reference of the SMB FileInfo handle so that for the rest of the handler putting the handle won't close it. - The ref is bumped every time we queue the handler via the cifs_queue_oplock_break() helper. - The ref is decremented at the end of the handler This bug was triggered by xfstest 464. Also important fix to address the various reports of oops in smb2_push_mandatory_locks Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-20 | f2fs: fix to dirty inode for i_mode recovery | Chao Yu
[ Upstream commit ca597bddedd94906cd761d8be6a3ad21292725de ] As Seulbae Kim reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202637 The permission field was not recovered correctly after a sudden power cut. The reason is that setattr did not add the inode to the global dirty list once i_mode was changed, so a later checkpoint triggered by fsync would not flush the latest i_mode to disk. Fix it. Reported-by: Seulbae Kim <seulbae@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 cifs: fallback to older infolevels on findfirst queryinfo retry Steve French
[ Upstream commit 3b7960caceafdfc2cdfe2850487f8d091eb41144 ] In cifs (vers=1.0), when queryinfo fails on backupuid mounts, we retry the query info with findfirst. This doesn't work with some NetApp servers, which don't support the WindowsXP (and later) infolevel 261 (SMB_FIND_FILE_ID_FULL_DIR_INFO), so in this case use other info levels (usually level 257, SMB_FIND_FILE_DIRECTORY_INFO). (Also fixes some indentation) See kernel bugzilla 201435 Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 f2fs: cleanup dirty pages if recover failed Sheng Yong
[ Upstream commit 26b5a079197c8cb6725565968b7fd3299bd1877b ] During recover, we will try to create new dentries for inodes with dentry_mark. But if the parent is missing (e.g. killed by fsck), recover will break. But those recovered dirty pages are not cleanup. This will hit f2fs_bug_on: [ 53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint [ 53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3 [ 53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2 [ 53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2 [ 53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295 [ 53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320 [ 53.546174] Modules linked in: [ 53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1 [ 53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320 [ 53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8 [ 53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282 [ 53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33 [ 53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590 [ 53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3 [ 53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff [ 53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009 [ 53.546226] FS: 00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000 [ 53.546229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0 [ 53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 53.546240] DR3: 
0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 53.546242] Call Trace: [ 53.546248] f2fs_submit_page_bio+0x95/0x740 [ 53.546253] read_node_page+0x161/0x1e0 [ 53.546271] ? truncate_node+0x650/0x650 [ 53.546283] ? add_to_page_cache_lru+0x12c/0x170 [ 53.546288] ? pagecache_get_page+0x262/0x2d0 [ 53.546292] __get_node_page+0x200/0x660 [ 53.546302] f2fs_update_inode_page+0x4a/0x160 [ 53.546306] f2fs_write_inode+0x86/0xb0 [ 53.546317] __writeback_single_inode+0x49c/0x620 [ 53.546322] writeback_single_inode+0xe4/0x1e0 [ 53.546326] sync_inode_metadata+0x93/0xd0 [ 53.546330] ? sync_inode+0x10/0x10 [ 53.546342] ? do_raw_spin_unlock+0xed/0x100 [ 53.546347] f2fs_sync_inode_meta+0xe0/0x130 [ 53.546351] f2fs_fill_super+0x287d/0x2d10 [ 53.546367] ? vsnprintf+0x742/0x7a0 [ 53.546372] ? f2fs_commit_super+0x180/0x180 [ 53.546379] ? up_write+0x20/0x40 [ 53.546385] ? set_blocksize+0x5f/0x140 [ 53.546391] ? f2fs_commit_super+0x180/0x180 [ 53.546402] mount_bdev+0x181/0x200 [ 53.546406] mount_fs+0x94/0x180 [ 53.546411] vfs_kern_mount+0x6c/0x1e0 [ 53.546415] do_mount+0xe5e/0x1510 [ 53.546420] ? fs_reclaim_release+0x9/0x30 [ 53.546424] ? copy_mount_string+0x20/0x20 [ 53.546428] ? fs_reclaim_acquire+0xd/0x30 [ 53.546435] ? __might_sleep+0x2c/0xc0 [ 53.546440] ? ___might_sleep+0x53/0x170 [ 53.546453] ? __might_fault+0x4c/0x60 [ 53.546468] ? _copy_from_user+0x95/0xa0 [ 53.546474] ? memdup_user+0x39/0x60 [ 53.546478] ksys_mount+0x88/0xb0 [ 53.546482] __x64_sys_mount+0x5d/0x70 [ 53.546495] do_syscall_64+0x65/0x130 [ 53.546503] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.547639] ---[ end trace b804d1ea2fec893e ]--- So if recover fails, we need to drop all recovered data. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 f2fs: fix to do sanity check with current segment number Chao Yu
[ Upstream commit 042be0f849e5fc24116d0afecfaf926eed5cac63 ] https://bugzilla.kernel.org/show_bug.cgi?id=200219 Reproduction way: - mount image - run poc code - umount image F2FS-fs (loop1): Bitmap was wrongly set, blk:15364 ------------[ cut here ]------------ kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 2 PID: 17686 Comm: umount Tainted: G W O 4.18.0-rc2+ #39 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 EIP: update_sit_entry+0x459/0x4e0 [f2fs] Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44 EAX: 00000032 EBX: 000000f8 ECX: 00000002 EDX: 00000001 ESI: d7177000 EDI: f520fe68 EBP: d6477c6c ESP: d6477c34 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282 CR0: 80050033 CR2: b7fbe000 CR3: 2a99b3c0 CR4: 000406f0 Call Trace: f2fs_allocate_data_block+0x124/0x580 [f2fs] do_write_page+0x78/0x150 [f2fs] f2fs_do_write_node_page+0x25/0xa0 [f2fs] __write_node_page+0x2bf/0x550 [f2fs] f2fs_sync_node_pages+0x60e/0x6d0 [f2fs] ? sync_inode_metadata+0x2f/0x40 ? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs] ? up_write+0x1e/0x80 f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs] ? mark_held_locks+0x5d/0x80 ? 
_raw_spin_unlock_irq+0x27/0x50 kill_f2fs_super+0x68/0x90 [f2fs] deactivate_locked_super+0x3d/0x70 deactivate_super+0x40/0x60 cleanup_mnt+0x39/0x70 __cleanup_mnt+0x10/0x20 task_work_run+0x81/0xa0 exit_to_usermode_loop+0x59/0xa7 do_fast_syscall_32+0x1f5/0x22c entry_SYSENTER_32+0x53/0x86 EIP: 0xb7f95c51 Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76 EAX: 00000000 EBX: 0871ab90 ECX: bfb2cd00 EDX: 00000000 ESI: 00000000 EDI: 0871ab90 EBP: 0871ab90 ESP: bfb2cd7c DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246 Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs] ---[ end trace d423f83982cfcdc5 ]--- The reason is, different log headers using the same segment, once one log's next block address is used by another log, it will cause panic as above. Main area: 24 segs, 24 secs 24 zones - COLD data: 0, 0, 0 - WARM data: 1, 1, 1 - HOT data: 20, 20, 20 - Dir dnode: 22, 22, 22 - File dnode: 22, 22, 22 - Indir nodes: 21, 21, 21 So this patch adds sanity check to detect such condition to avoid this issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
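The condition described above, where two log headers point at the same segment (in the dump, "Dir dnode" and "File dnode" both sit on segment 22), can be detected with a pairwise comparison. The following is a minimal user-space sketch of that idea, not the f2fs code itself; `NR_LOGS` and `cursegs_are_distinct()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: six active logs (three data, three node), as in the dump above. */
#define NR_LOGS 6

/* Return true when no two logs share a segment number. */
static bool cursegs_are_distinct(const unsigned int segno[NR_LOGS])
{
    for (size_t i = 0; i < NR_LOGS; i++)
        for (size_t j = i + 1; j < NR_LOGS; j++)
            if (segno[i] == segno[j])
                return false;   /* two logs would allocate from one segment */
    return true;
}
```

With the segment numbers from the report (0, 1, 20, 22, 22, 21) the check fails, which is the situation a mount-time sanity check would presumably reject instead of hitting the BUG later.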
2019-04-20 9p locks: add mount option for lock retry interval Dinu-Razvan Chis-Serban
[ Upstream commit 5e172f75e51e3de1b4274146d9b990f803cb5c2a ] The default P9_LOCK_TIMEOUT can be too long for some users exporting a local file system to a guest VM (30s), make this configurable at mount time. Link: http://lkml.kernel.org/r/1536295827-3181-1-git-send-email-asmadeus@codewreck.org Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195727 Signed-off-by: Dinu-Razvan Chis-Serban <justcsdr@gmail.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 9p: do not trust pdu content for stat item size Gertjan Halkes
[ Upstream commit 2803cf4379ed252894f046cb8812a48db35294e3 ] v9fs_dir_readdir() could deadloop if a struct was sent with a size set to -2 Link: http://lkml.kernel.org/r/1536134432-11997-1-git-send-email-asmadeus@codewreck.org Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88021 Signed-off-by: Gertjan Halkes <gertjan@google.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Sasha Levin <sashal@kernel.org>
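To illustrate why trusting a size field from the wire is dangerous, here is a toy model in user-space C. It is not the v9fs parser: the record layout (a 2-byte little-endian size followed by that many payload bytes) and the names `parse_record()` and `count_records()` are assumptions made for this sketch. One plausible way for a loop to stall is to advance by `size + 2` computed in 16-bit arithmetic, where a size of 0xFFFE (that is, -2) yields an advance of zero. The sketch instead advances by the number of bytes the parser actually consumed and rejects records that do not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse one record; return bytes consumed, or -1 if it does not fit in len. */
static long parse_record(const uint8_t *buf, size_t len)
{
    if (len < 2)
        return -1;
    size_t size = (size_t)buf[0] | ((size_t)buf[1] << 8);
    if (size > len - 2)
        return -1;              /* size field claims more than we received */
    return (long)(size + 2);    /* always >= 2, so the caller makes progress */
}

/* Count the records in a buffer, or return -1 on a malformed record. */
static int count_records(const uint8_t *buf, size_t len)
{
    size_t head = 0;
    int n = 0;

    while (head < len) {
        long used = parse_record(buf + head, len - head);
        if (used < 0)
            return -1;
        head += (size_t)used;   /* advance by what was consumed, not by the raw field */
        n++;
    }
    return n;
}
```

Because every successful parse consumes at least two bytes, the loop terminates for any input, including a record whose size field is 0xFFFE.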
2019-04-20 f2fs: fix to avoid NULL pointer dereference on se->discard_map Chao Yu
[ Upstream commit 7d20c8abb2edcf962ca857d51f4d0f9cd4b19053 ] https://bugzilla.kernel.org/show_bug.cgi?id=200951 There is a NULL pointer dereference issue reported in bugzilla: Hi, in the setup there is a SATA SSD connected to a SATA-to-USB bridge. The disc is "Samsung SSD 850 PRO 256G" which supports TRIM. There are four partitions: sda1: FAT /boot sda2: F2FS / sda3: F2FS /home sda4: F2FS The bridge is ASMT1153e which uses the "uas" driver. There is no TRIM pass-through, so, when mounting it reports: mounting with "discard" option, but the device does not support discard The USB host is USB3.0 and UASP capable. It is the one on RK3399. Given this everything works fine, except there is no TRIM support. In order to enable TRIM a new UDEV rule is added [1]: /etc/udev/rules.d/10-sata-bridge-trim.rules: ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap" After reboot any F2FS write hangs forever and dmesg reports: Unable to handle kernel NULL pointer dereference Also tested on a x86_64 system: works fine even with TRIM enabled. same disc same bridge different usb host controller different cpu architecture not root filesystem Regards, Vicenç. 
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280 Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122 [000000000000003e] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] SMP Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1 Hardware name: Sapphire-RK3399 Board (DT) pstate: 00000005 (nzcv daif -PAN -UAO) pc : update_sit_entry+0x304/0x4b0 lr : update_sit_entry+0x108/0x4b0 sp : ffff00000ca13bd0 x29: ffff00000ca13bd0 x28: 000000000000003e x27: 0000000000000020 x26: 0000000000080000 x25: 0000000000000048 x24: ffff8000ebb85cf8 x23: 0000000000000253 x22: 00000000ffffffff x21: 00000000000535f2 x20: 00000000ffffffdf x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8 x17: 0000000007ce6926 x16: 000000001c83ffa8 x15: 0000000000000000 x14: ffff8000f602df90 x13: 0000000000000006 x12: 0000000000000040 x11: 0000000000000228 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 00000000000535f2 x6 : ffff8000ebff3440 x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8 x3 : 00000000ffffffff x2 : 0000000000000020 x1 : 0000000000000000 x0 : ffff8000eb9e5800 Process nvim (pid: 957, stack limit = 0x0000000063a78320) Call trace: 
update_sit_entry+0x304/0x4b0 f2fs_invalidate_blocks+0x98/0x140 truncate_node+0x90/0x400 f2fs_remove_inode_page+0xe8/0x340 f2fs_evict_inode+0x2b0/0x408 evict+0xe0/0x1e0 iput+0x160/0x260 do_unlinkat+0x214/0x298 __arm64_sys_unlinkat+0x3c/0x68 el0_svc_handler+0x94/0x118 el0_svc+0x8/0xc Code: f9400800 b9488400 36080140 f9400f01 (387c4820) ---[ end trace a0f21a307118c477 ]--- The reason is that the discard flag on the block queue can be enabled via UDEV after mount, but f2fs initializes se->discard_map during mount only if this flag is already set. Once the flag is set after mount, f2fs may dereference a NULL se->discard_map. This patch makes the following changes to fix the issue: - initialize and update se->discard_map all the time. - don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag during mount. - don't issue small discard on zoned block device. - introduce some functions to enhance the readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Vicente Bergas <vicencb@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 ext4: prohibit fstrim in norecovery mode Darrick J. Wong
[ Upstream commit 18915b5873f07e5030e6fb108a050fa7c71c59fb ] The ext4 fstrim implementation uses the block bitmaps to find free space that can be discarded. If we haven't replayed the journal, the bitmaps will be stale and we absolutely *cannot* use stale metadata to zap the underlying storage. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 x86/gart: Exclude GART aperture from kcore Kairui Song
[ Upstream commit ffc8599aa9763f39f6736a79da4d1575e7006f9a ] On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore"), which leveraged the existing hook infrastructure in vmcore to make /proc/vmcore return zeroes when the aperture region is read, so that it does not read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix, with some minor adjustments: rename some functions for more general usage, and simplify the hook infrastructure a bit, as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 fix incorrect error code mapping for OBJECTID_NOT_FOUND Steve French
[ Upstream commit 85f9987b236cf46e06ffdb5c225cf1f3c0acb789 ] It was mapped to EIO, which can be confusing when user space queries the object GUID of an object for which the server file system doesn't support one (or hasn't saved one). As Amir Goldstein suggested, this is similar to ENOATTR (equivalently ENODATA in Linux errno definitions), so change the NT STATUS code mapping for OBJECTID_NOT_FOUND to ENODATA. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
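The effect of such a remapping can be shown with a small lookup table. This is a sketch only: the enum values and the names `demo_status`, `demo_map` and `demo_map_status()` are hypothetical and do not come from the cifs headers. Only the EIO to ENODATA change is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative status values; hypothetical names, not the cifs definitions. */
enum demo_status {
    DEMO_STATUS_OK = 0,
    DEMO_STATUS_OBJECTID_NOT_FOUND,
    DEMO_STATUS_UNKNOWN,
};

struct demo_status_map {
    enum demo_status status;
    int posix_err;
};

static const struct demo_status_map demo_map[] = {
    { DEMO_STATUS_OK, 0 },
    { DEMO_STATUS_OBJECTID_NOT_FOUND, -ENODATA },   /* was -EIO before the fix */
};

/* Translate a server status into a negative errno value. */
static int demo_map_status(enum demo_status s)
{
    for (size_t i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++)
        if (demo_map[i].status == s)
            return demo_map[i].posix_err;
    return -EIO;    /* generic fallback for unmapped codes */
}
```

A caller that sees -ENODATA can treat "no object ID stored" as an expected outcome, while -EIO stays reserved for codes the table does not know about.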
2019-04-20 ext4: report real fs size after failed resize Lukas Czerner
[ Upstream commit 6c7328400e0488f7d49e19e02290ba343b6811b2 ] Currently when the file system resize using ext4_resize_fs() fails it will report into log that "resized filesystem to <requested block count>". However this may not be true in the case of failure. Use the current block count as returned by ext4_blocks_count() to report the block count. Additionally, report a warning that "error occurred during file system resize" Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 ext4: add missing brelse() in add_new_gdb_meta_bg() Lukas Czerner
[ Upstream commit d64264d6218e6892edd832dc3a5a5857c2856c53 ] Currently in add_new_gdb_meta_bg() there is a missing brelse of gdb_bh in case ext4_journal_get_write_access() fails. Additionally kvfree() is missing in the same error path. Fix it by moving the ext4_journal_get_write_access() before the ext4 sb update as Ted suggested and release n_group_desc and gdb_bh in case it fails. Fixes: 61a9c11e5e7a ("ext4: add missing brelse() add_new_gdb_meta_bg()'s error path") Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 ext4: avoid panic during forced reboot Jan Kara
[ Upstream commit 1dc1097ff60e4105216da7cd0aa99032b039a994 ] When admin calls "reboot -f" - i.e., does a hard system reboot by directly calling reboot(2) - an ext4 filesystem mounted with errors=panic can panic the system. This happens because the underlying device gets disabled without unmounting the filesystem and thus some syscall running in parallel to reboot(2) can result in the filesystem getting IO errors. This is somewhat surprising to users, so try to improve the behavior by switching to errors=remount-ro behavior when the system is running reboot(2). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch() ZhangXiaoxu
[ Upstream commit 62c9d2674b31d4c8a674bee86b7edc6da2803aea ] Commit 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") forgot to call fsnotify_put_mark() with IN_MASK_CREATE after fsnotify_find_mark() Fixes: 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
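The pattern behind this kind of leak is a lookup that hands the caller a reference, followed by an early return that skips the matching put. Below is a minimal user-space sketch of the corrected shape. All names (`demo_mark`, `demo_find_mark()`, `demo_put_mark()`, `demo_update_existing_watch()`) are hypothetical stand-ins rather than the fsnotify API, and the refcount is a plain integer for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_mark {
    int refcount;
};

/* Lookup hands the caller its own reference, which the caller must drop. */
static struct demo_mark *demo_find_mark(struct demo_mark *entry)
{
    if (entry)
        entry->refcount++;
    return entry;
}

static void demo_put_mark(struct demo_mark *m)
{
    m->refcount--;
}

static int demo_update_existing_watch(struct demo_mark *entry, bool create_only)
{
    struct demo_mark *m = demo_find_mark(entry);
    int ret = 0;

    if (!m)
        return -ENOENT;     /* nothing found, so no reference to drop */
    if (create_only) {
        ret = -EEXIST;      /* the watch already exists */
        goto out;           /* still drop the lookup reference */
    }
    /* ... update the existing watch here ... */
out:
    demo_put_mark(m);
    return ret;
}
```

Routing every exit after a successful lookup through the single `out:` label keeps the reference count balanced, whichever branch is taken.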
2019-04-17 block: fix the return errno for direct IO Jason Yan
commit a89afe58f1a74aac768a5eb77af95ef4ee15beaa upstream. If the last bio returned is not dio->bio, the status of that bio is not assigned to dio->bio when it is an error. This causes the status of the whole IO to be wrong. ksoftirqd/21-117 [021] ..s. 4017.966090: 8,0 C N 4883648 [0] <idle>-0 [018] ..s. 4017.970888: 8,0 C WS 4924800 + 1024 [0] <idle>-0 [018] ..s. 4017.970909: 8,0 D WS 4935424 + 1024 [<idle>] <idle>-0 [018] ..s. 4017.970924: 8,0 D WS 4936448 + 321 [<idle>] ksoftirqd/21-117 [021] ..s. 4017.995033: 8,0 C R 4883648 + 336 [65475] ksoftirqd/21-117 [021] d.s. 4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7 ksoftirqd/21-117 [021] d.s. 4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0 We always have to assign bio->bi_status to dio->bio.bi_status because we will only check dio->bio.bi_status when we return the whole IO to the upper layer. Fixes: 542ff7bf18c6 ("block: new direct I/O implementation") Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
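The rule stated above, that a child's error must always be propagated to the parent because only the parent's status is reported upward, can be modelled in a few lines. This is a simplified user-space sketch and not the block layer code; `demo_dio` and `demo_child_end_io()` are hypothetical names, and keeping the first error seen is a choice made for this sketch.

```c
#include <assert.h>

struct demo_dio {
    int status;     /* first error seen, 0 if none */
    int pending;    /* child I/Os still in flight */
};

/* Called once per finished child; returns 1 when the whole request is done. */
static int demo_child_end_io(struct demo_dio *dio, int child_status)
{
    /* Record the error regardless of which child happens to finish last. */
    if (child_status && !dio->status)
        dio->status = child_status;
    return --dio->pending == 0;
}
```

In the trace above, one bio completes with bi_status=7 while the overall result is still reported as a successful byte count (res=0x12a000). With unconditional propagation, the parent ends up carrying the error whichever child completes last.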
2019-04-17 btrfs: prop: fix vanished compression property after failed set Anand Jain
commit 272e5326c7837697882ce3162029ba893059b616 upstream. The compression property resets to NULL, instead of the old value if we fail to set the new compression parameter. $ btrfs prop get /btrfs compression compression=lzo $ btrfs prop set /btrfs compression zli ERROR: failed to set compression for /btrfs: Invalid argument $ btrfs prop get /btrfs compression This is because the compression property ->validate() is successful for 'zli' as the strncmp() used the length passed from the userspace. Fix it by using the expected string length in strncmp(). Fixes: 63541927c8d1 ("Btrfs: add support for inode properties") Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
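The strncmp() pitfall described above is easy to reproduce in user space. The sketch below contrasts a validator that compares only as many bytes as the caller supplied with one that insists on the full expected name. The function names are hypothetical and this is not the btrfs code; in particular, the exact acceptance rules of the real ->validate() are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Buggy: compares only len bytes, so a prefix such as "zli" matches "zlib". */
static bool demo_validate_buggy(const char *value, size_t len)
{
    return strncmp("lzo", value, len) == 0 ||
           strncmp("zlib", value, len) == 0 ||
           strncmp("zstd", value, len) == 0;
}

/* Fixed: the input must be at least as long as the name it is compared to. */
static bool demo_validate_fixed(const char *value, size_t len)
{
    return (len >= 3 && strncmp("lzo", value, 3) == 0) ||
           (len >= 4 && strncmp("zlib", value, 4) == 0) ||
           (len >= 4 && strncmp("zstd", value, 4) == 0);
}
```

With the buggy version, `"zli"` (length 3) passes validation because its three bytes equal the first three bytes of `"zlib"`, so an invalid value reaches the point where setting the property fails. The fixed version rejects it up front.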
2019-04-17 btrfs: prop: fix zstd compression parameter validation Anand Jain
commit 50398fde997f6be8faebdb5f38e9c9c467370f51 upstream. We let the zstd compression parameter pass even if it is not fully valid. For example: $ btrfs prop set /btrfs compression zst $ btrfs prop get /btrfs compression compression=zst zlib and lzo are fine. Fix it by checking the correct prefix length. Fixes: 5c1aab1dd544 ("btrfs: Add zstd support") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17 Btrfs: do not allow trimming when a fs is mounted with the nologreplay option Filipe Manana
commit f35f06c35560a86e841631f0243b83a984dc11a9 upstream. When a filesystem is mounted with the nologreplay mount option, which requires it to be mounted in RO mode as well, we cannot allow discard on free space inside block groups, because log trees refer to extents that are not pinned in a block group's free space cache (pinning the extents is precisely the first phase of replaying a log tree). So do not allow the fitrim ioctl to do anything when the filesystem is mounted with the nologreplay option, because later it can be mounted RW without that option, which causes log replay to happen and results in either a failure to replay the log trees (leading to a mount failure), a crash or some silent corruption. Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Fixes: 96da09192cda ("btrfs: Introduce new mount option to disable tree log replay") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>