aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block
AgeCommit message (Collapse)Author
2019-10-05nbd: add missing config putMike Christie
[ Upstream commit 887e975c4172d0d5670c39ead2f18ba1e4ec8133 ] Fix bug added with the patch: commit 8f3ea35929a0806ad1397db99a89ffee0140822a Author: Josef Bacik <josef@toxicpanda.com> Date: Mon Jul 16 12:11:35 2018 -0400 nbd: handle unexpected replies better where if the timeout handler runs when the completion path is and we fail to grab the mutex in the timeout handler we will leave a config reference and cannot free the config later. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05loop: Add LOOP_SET_DIRECT_IO to compat ioctlAlessio Balsini
[ Upstream commit fdbe4eeeb1aac219b14f10c0ed31ae5d1123e9b8 ] Enabling Direct I/O with loop devices helps reducing memory usage by avoiding double caching. 32 bit applications running on 64 bits systems are currently not able to request direct I/O because is missing from the lo_compat_ioctl. This patch fixes the compatibility issue mentioned above by exporting LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry. The input argument for this ioctl is a single long converted to a 1-bit boolean, so compatibility is preserved. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21floppy: fix usercopy directionJann Horn
commit 52f6f9d74f31078964ca1574f7bb612da7877ac8 upstream. As sparse points out, these two copy_from_user() should actually be copy_to_user(). Fixes: 229b53c9bf4e ("take floppy compat ioctls to sodding floppy.c") Cc: stable@vger.kernel.org Acked-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-06xen/blkback: fix memory leaksWenwen Wang
[ Upstream commit ae78ca3cf3d9e9f914bfcd0bc5c389ff18b9c2e0 ] In read_per_ring_refs(), after 'req' and related memory regions are allocated, xen_blkif_map() is invoked to map the shared frame, irq, and etc. However, if this mapping process fails, no cleanup is performed, leading to memory leaks. To fix this issue, invoke the cleanup before returning the error. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16drbd: dynamically allocate shash descriptorArnd Bergmann
[ Upstream commit 77ce56e2bfaa64127ae5e23ef136c0168b818777 ] Building with clang and KASAN, we get a warning about an overly large stack frame on 32-bit architectures: drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect' [-Werror,-Wframe-larger-than=] We already allocate other data dynamically in this function, so just do the same for the shash descriptor, which makes up most of this memory. Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/ Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16loop: set PF_MEMALLOC_NOIO for the worker threadMikulas Patocka
commit d0a255e795ab976481565f6ac178314b34fbf891 upstream. A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06nbd: replace kill_bdev() with __invalidate_device() againMunehisa Kamata
commit 2b5c8f0063e4b263cf2de82029798183cf85c320 upstream. Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()") once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere") resurrected kill_bdev() and it has been there since then. So buffer_head mappings still get killed on a server disconnection, and we can still hit the BUG_ON on a filesystem on the top of the nbd device. EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null) block nbd0: Receive control failed (result -32) block nbd0: shutting down sockets print_req_error: I/O error, dev nbd0, sector 66264 flags 3000 EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block print_req_error: I/O error, dev nbd0, sector 2264 flags 3000 EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure ------------[ cut here ]------------ kernel BUG at fs/buffer.c:3057! invalid opcode: 0000 [#1] SMP PTI CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4 Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:submit_bh_wbc+0x18b/0x190 ... Call Trace: jbd2_write_superblock+0xf1/0x230 [jbd2] ? account_entity_enqueue+0xc5/0xf0 jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2] jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2] ? __switch_to_asm+0x40/0x70 ... ? lock_timer_base+0x67/0x80 kjournald2+0x121/0x360 [jbd2] ? remove_wait_queue+0x60/0x60 kthread+0xf8/0x130 ? commit_timeout+0x10/0x10 [jbd2] ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 With __invalidate_device(), I no longer hit the BUG_ON with sync or unmount on the disconnected device. Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere") Cc: linux-block@vger.kernel.org Cc: Ratna Manoj Bolla <manoj.br@gmail.com> Cc: nbd@other.debian.org Cc: stable@vger.kernel.org Cc: David Woodhouse <dwmw@amazon.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26floppy: fix out-of-bounds read in copy_bufferDenis Efremov
[ Upstream commit da99466ac243f15fbba65bd261bfc75ffa1532b6 ] This fixes a global out-of-bounds read access in the copy_buffer function of the floppy driver. The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect and head fields (unsigned int) of the floppy_drive structure are used to compute the max_sector (int) in the make_raw_rw_request function. It is possible to overflow the max_sector. Next, max_sector is passed to the copy_buffer function and used in one of the memcpy calls. An unprivileged user could trigger the bug if the device is accessible, but requires a floppy disk to be inserted. The patch adds the check for the .sect * .head multiplication for not overflowing in the set_geometry function. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26floppy: fix invalid pointer dereference in drive_nameDenis Efremov
[ Upstream commit 9b04609b784027968348796a18f601aed9db3789 ] This fixes the invalid pointer dereference in the drive_name function of the floppy driver. The native_format field of the struct floppy_drive_params is used as floppy_type array index in the drive_name function. Thus, the field should be checked the same way as the autodetect field. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should be used to call the drive_name. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for a value of the native_format field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26floppy: fix out-of-bounds read in next_valid_formatDenis Efremov
[ Upstream commit 5635f897ed83fd539df78e98ba69ee91592f9bb8 ] This fixes a global out-of-bounds read access in the next_valid_format function of the floppy driver. The values from autodetect field of the struct floppy_drive_params are used as indices for the floppy_type array in the next_valid_format function 'floppy_type[DP->autodetect[probed_format]].sect'. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for values of the autodetect field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26floppy: fix div-by-zero in setup_format_paramsDenis Efremov
[ Upstream commit f3554aeb991214cbfafd17d55e2bfddb50282e32 ] This fixes a divide by zero error in the setup_format_params function of the floppy driver. Two consecutive ioctls can trigger the bug: The first one should set the drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK to become zero. Next, the floppy format operation should be called. A floppy disk is not required to be inserted. An unprivileged user could trigger the bug if the device is accessible. The patch checks F_SECT_PER_TRACK for a non-zero value in the set_geometry function. The proper check should involve a reasonable upper limit for the .sect and .rate fields, but it could change the UAPI. The patch also checks F_SECT_PER_TRACK in the setup_format_params, and cancels the formatting operation in case of zero. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26block: null_blk: fix race condition for null_del_devBob Liu
[ Upstream commit 7602843fd873cae43a444b83b14dfdd114a9659c ] Dulicate call of null_del_dev() will trigger null pointer error like below. The reason is a race condition between nullb_device_power_store() and nullb_group_drop_item(). CPU#0 CPU#1 ---------------- ----------------- do_rmdir() >configfs_rmdir() >client_drop_item() >nullb_group_drop_item() nullb_device_power_store() >null_del_dev() >test_and_clear_bit(NULLB_DEV_FL_UP >null_del_dev() ^^^^^ Duplicated null_dev_dev() triger null pointer error >clear_bit(NULLB_DEV_FL_UP The fix could be keep the sequnce of clear NULLB_DEV_FL_UP and null_del_dev(). [ 698.613600] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 698.613608] #PF error: [normal kernel read fault] [ 698.613611] PGD 0 P4D 0 [ 698.613619] Oops: 0000 [#1] SMP PTI [ 698.613627] CPU: 3 PID: 6382 Comm: rmdir Not tainted 5.0.0+ #35 [ 698.613631] Hardware name: LENOVO 20LJS2EV08/20LJS2EV08, BIOS R0SET33W (1.17 ) 07/18/2018 [ 698.613644] RIP: 0010:null_del_dev+0xc/0x110 [null_blk] [ 698.613649] Code: 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b eb 97 e8 47 bb 2a e8 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <8b> 77 18 48 89 fb 4c 8b 27 48 c7 c7 40 57 1e c1 e8 bf c7 cb e8 48 [ 698.613654] RSP: 0018:ffffb887888bfde0 EFLAGS: 00010286 [ 698.613659] RAX: 0000000000000000 RBX: ffff9d436d92bc00 RCX: ffff9d43a9184681 [ 698.613663] RDX: ffffffffc11e5c30 RSI: 0000000068be6540 RDI: 0000000000000000 [ 698.613667] RBP: ffffb887888bfdf0 R08: 0000000000000001 R09: 0000000000000000 [ 698.613671] R10: ffffb887888bfdd8 R11: 0000000000000f16 R12: ffff9d436d92bc08 [ 698.613675] R13: ffff9d436d94e630 R14: ffffffffc11e5088 R15: ffffffffc11e5000 [ 698.613680] FS: 00007faa68be6540(0000) GS:ffff9d43d14c0000(0000) knlGS:0000000000000000 [ 698.613685] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 698.613689] CR2: 0000000000000018 CR3: 000000042f70c002 CR4: 00000000003606e0 [ 698.613693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 698.613697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 698.613700] Call Trace: [ 698.613712] nullb_group_drop_item+0x50/0x70 [null_blk] [ 698.613722] client_drop_item+0x29/0x40 [ 698.613728] configfs_rmdir+0x1ed/0x300 [ 698.613738] vfs_rmdir+0xb2/0x130 [ 698.613743] do_rmdir+0x1c7/0x1e0 [ 698.613750] __x64_sys_rmdir+0x17/0x20 [ 698.613759] do_syscall_64+0x5a/0x110 [ 698.613768] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11xen-blkfront: switch kcalloc to kvcalloc for large array allocationRoger Pau Monne
commit 1d5c76e66433382a1e170d1d5845bb0fed7467aa upstream. There's no reason to request physically contiguous memory for those allocations. [boris: added CC to stable] Cc: stable@vger.kernel.org Reported-by: Ian Jackson <ian.jackson@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25brd: re-enable __GFP_HIGHMEM in brd_insert_page()Hou Tao
commit f6b50160a06d4a0d6a3999ab0c5aec4f52dba248 upstream. __GFP_HIGHMEM is disabled if dax is enabled on brd, however dax support for brd has been removed since commit (7a862fbbdec6 "brd: remove dax support"), so restore __GFP_HIGHMEM in brd_insert_page(). Also remove the no longer applicable comments about DAX and highmem. Cc: stable@vger.kernel.org Fixes: 7a862fbbdec6 ("brd: remove dax support") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-10virtio-blk: limit number of hw queues by nr_cpu_idsDongli Zhang
[ Upstream commit bf348f9b78d413e75bb079462751a1d86b6de36c ] When tag_set->nr_maps is 1, the block layer limits the number of hw queues by nr_cpu_ids. No matter how many hw queues are used by virtio-blk, as it has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues. In addition, specifically for pci scenario, when the 'num-queues' specified by qemu is more than maxcpus, virtio-blk would not be able to allocate more than maxcpus vectors in order to have a vector for each queue. As a result, it falls back into MSI-X with one vector for config and one shared for queues. Considering above reasons, this patch limits the number of hw queues used by virtio-blk by nr_cpu_ids. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08xsysace: Fix error handling in ace_setupGuenter Roeck
[ Upstream commit 47b16820c490149c2923e8474048f2c6e7557cab ] If xace hardware reports a bad version number, the error handling code in ace_setup() calls put_disk(), followed by queue cleanup. However, since the disk data structure has the queue pointer set, put_disk() also cleans and releases the queue. This results in blk_cleanup_queue() accessing an already released data structure, which in turn may result in a crash such as the following. [ 10.681671] BUG: Kernel NULL pointer dereference at 0x00000040 [ 10.681826] Faulting instruction address: 0xc0431480 [ 10.682072] Oops: Kernel access of bad area, sig: 11 [#1] [ 10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440 [ 10.682387] Modules linked in: [ 10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G W 5.0.0-rc6-next-20190218+ #2 [ 10.682733] NIP: c0431480 LR: c043147c CTR: c0422ad8 [ 10.682863] REGS: cf82fbe0 TRAP: 0300 Tainted: G W (5.0.0-rc6-next-20190218+) [ 10.683065] MSR: 00029000 <CE,EE,ME> CR: 22000222 XER: 00000000 [ 10.683236] DEAR: 00000040 ESR: 00000000 [ 10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000 [ 10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000 [ 10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000 [ 10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800 [ 10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114 [ 10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114 [ 10.684602] Call Trace: [ 10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable) [ 10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c [ 10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68 [ 10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c [ 10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508 [ 10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8 [ 10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c [ 10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464 [ 10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4 [ 10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc [ 10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0 [ 10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234 [ 10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c [ 10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac [ 10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330 [ 10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478 [ 10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114 [ 10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c [ 10.687349] Instruction dump: [ 10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008 [ 10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008 [ 10.688056] ---[ end trace 13c9ff51d41b9d40 ]--- Fix the problem by setting the disk queue pointer to NULL before calling put_disk(). A more comprehensive fix might be to rearrange the code to check the hardware version before initializing data structures, but I don't know if this would have undesirable side effects, and it would increase the complexity of backporting the fix to older kernels. Fixes: 74489a91dd43a ("Add support for Xilinx SystemACE CompactFlash interface") Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-02zram: pass down the bvec we need to read into in the work structJérôme Glisse
commit e153abc0739ff77bd89c9ba1688cdb963464af97 upstream. When scheduling work item to read page we need to pass down the proper bvec struct which points to the page to read into. Before this patch it uses a randomly initialized bvec (only if PAGE_SIZE != 4096) which is wrong. Note that without this patch on arch/kernel where PAGE_SIZE != 4096 userspace could read random memory through a zram block device (thought userspace probably would have no control on the address being read). Link: http://lkml.kernel.org/r/20190408183219.26377-1-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02loop: do not print warn message if partition scan is successfulDongli Zhang
[ Upstream commit 40853d6fc619a6fd3d3177c3973a2eac9b598a80 ] Do not print warn message when the partition scan returns 0. Fixes: d57f3374ba48 ("loop: Move special partition reread handling in loop_clr_fd()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()Dongli Zhang
[ Upstream commit 758a58d0bc67457f1215321a536226654a830eeb ] Commit 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") moves blkdev_reread_part() out of the loop_ctl_mutex. However, GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result, __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and will not rescan the loop device to delete all partitions. Below are steps to reproduce the issue: step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100 step2 # losetup -P /dev/loop0 tmp.raw step3 # parted /dev/loop0 mklabel gpt step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1 step5 # losetup -d /dev/loop0 Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and there is below kernel warning message: [ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22) This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part(). Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-27loop: access lo_backing_file only when the loop device is Lo_boundDongli Zhang
commit f7c8a4120eedf24c36090b7542b179ff7a649219 upstream. Commit 758a58d0bc67 ("loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()") separates "lo->lo_backing_file = NULL" and "lo->lo_state = Lo_unbound" into different critical regions protected by loop_ctl_mutex. However, there is below race that the NULL lo->lo_backing_file would be accessed when the backend of a loop is another loop device, e.g., loop0's backend is a file, while loop1's backend is loop0. loop0's backend is file loop1's backend is loop0 __loop_clr_fd() mutex_lock(&loop_ctl_mutex); lo->lo_backing_file = NULL; --> set to NULL mutex_unlock(&loop_ctl_mutex); loop_set_fd() mutex_lock_killable(&loop_ctl_mutex); loop_validate_file() f = l->lo_backing_file; --> NULL access if loop0 is not Lo_unbound mutex_lock(&loop_ctl_mutex); lo->lo_state = Lo_unbound; mutex_unlock(&loop_ctl_mutex); lo->lo_backing_file should be accessed only when the loop device is Lo_bound. In fact, the problem has been introduced already in commit 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()") after which loop_validate_file() could see devices in Lo_rundown state with which it did not count. It was harmless at that point but still. Fixes: 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()") Reported-by: syzbot+9bdc1adc1c55e7fe765b@syzkaller.appspotmail.com Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23floppy: check_events callback should not return a negative numberYufen Yu
[ Upstream commit 96d7cb932e826219ec41ac02e5af037ffae6098c ] floppy_check_events() is supposed to return bit flags to say which events occured. We should return zero to say that no event flags are set. Only BIT(0) and BIT(1) are used in the caller. And .check_events interface also expect to return an unsigned int value. However, after commit a0c80efe5956, it may return -EINTR (-4u). Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't affect runtime, but it obviously is still worth fixing. Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: a0c80efe5956 ("floppy: fix lock_fdc() signal handling") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12block/swim3: Fix -EBUSY error when re-opening device after unmountFinn Thain
[ Upstream commit 296dcc40f2f2e402facf7cd26cf3f2c8f4b17d47 ] When the block device is opened with FMODE_EXCL, ref_count is set to -1. This value doesn't get reset when the device is closed which means the device cannot be opened again. Fix this by checking for refcount <= 0 in the release method. Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12zram: fix lockdep warning of free block handlingMinchan Kim
[ Upstream commit 3c9959e025472122a61faebb208525cf26b305d1 ] Patch series "zram idle page writeback", v3. Inherently, swap device has many idle pages which are rare touched since it was allocated. It is never problem if we use storage device as swap. However, it's just waste for zram-swap. This patchset supports zram idle page writeback feature. * Admin can define what is idle page "no access since X time ago" * Admin can define when zram should writeback them * Admin can define when zram should stop writeback to prevent wearout Details are in each patch's description. This patch (of 7): ================================ WARNING: inconsistent lock state 4.19.0+ #390 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes: 00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50 {SOFTIRQ-ON-W} state was registered at: _raw_spin_lock+0x2c/0x40 zram_make_request+0x755/0xdc9 generic_make_request+0x373/0x6a0 submit_bio+0x6c/0x140 __swap_writepage+0x3a8/0x480 shrink_page_list+0x1102/0x1a60 shrink_inactive_list+0x21b/0x3f0 shrink_node_memcg.constprop.99+0x4f8/0x7e0 shrink_node+0x7d/0x2f0 do_try_to_free_pages+0xe0/0x300 try_to_free_pages+0x116/0x2b0 __alloc_pages_slowpath+0x3f4/0xf80 __alloc_pages_nodemask+0x2a2/0x2f0 __handle_mm_fault+0x42e/0xb50 handle_mm_fault+0x55/0xb0 __do_page_fault+0x235/0x4b0 page_fault+0x1e/0x30 irq event stamp: 228412 hardirqs last enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600 hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600 softirqs last enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427 softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&zram->bitmap_lock)->rlock); <Interrupt> lock(&(&zram->bitmap_lock)->rlock); *** DEADLOCK *** no locks held by zram_verify/2095. stack backtrace: CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: <IRQ> dump_stack+0x67/0x9b print_usage_bug+0x1bd/0x1d3 mark_lock+0x4aa/0x540 __lock_acquire+0x51d/0x1300 lock_acquire+0x90/0x180 _raw_spin_lock+0x2c/0x40 put_entry_bdev+0x1e/0x50 zram_free_page+0xf6/0x110 zram_slot_free_notify+0x42/0xa0 end_swap_bio_read+0x5b/0x170 blk_update_request+0x8f/0x340 scsi_end_request+0x2c/0x1e0 scsi_io_completion+0x98/0x650 blk_done_softirq+0x9e/0xd0 __do_softirq+0xcc/0x427 irq_exit+0xd1/0xe0 do_IRQ+0x93/0x120 common_interrupt+0xf/0xf </IRQ> With writeback feature, zram_slot_free_notify could be called in softirq context by end_swap_bio_read. However, bitmap_lock is not aware of that so lockdep yell out: get_entry_bdev spin_lock(bitmap->lock); irq softirq end_swap_bio_read zram_slot_free_notify zram_slot_lock <-- deadlock prone zram_free_page put_entry_bdev spin_lock(bitmap->lock); <-- deadlock prone With akpm's suggestion (i.e. bitmap operation is already atomic), we could remove bitmap lock. It might fail to find a empty slot if serious contention happens. However, it's not severe problem because huge page writeback has already possiblity to fail if there is severe memory pressure. Worst case is just keeping the incompressible in memory, not storage. The other problem is zram_slot_lock in zram_slot_slot_free_notify. To make it safe is this patch introduces zram_slot_trylock where zram_slot_free_notify uses it. Although it's rare to be contented, this patch adds new debug stat "miss_free" to keep monitoring how often it happens. Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12drbd: skip spurious timeout (ping-timeo) when failing promoteLars Ellenberg
[ Upstream commit 9848b6ddd8c92305252f94592c5e278574e7a6ac ] If you try to promote a Secondary while connected to a Primary and allow-two-primaries is NOT set, we will wait for "ping-timeout" to give this node a chance to detect a dead primary, in case the cluster manager noticed faster than we did. But if we then are *still* connected to a Primary, we fail (after an additional timeout of ping-timout). This change skips the spurious second timeout. Most people won't notice really, since "ping-timeout" by default is half a second. But in some installations, ping-timeout may be 10 or 20 seconds or more, and spuriously delaying the error return becomes annoying. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12drbd: disconnect, if the wrong UUIDs are attached on a connected peerLars Ellenberg
[ Upstream commit b17b59602b6dcf8f97a7dc7bc489a48388d7063a ] With "on-no-data-accessible suspend-io", DRBD requires the next attach or connect to be to the very same data generation uuid tag it lost last. If we first lost connection to the peer, then later lost connection to our own disk, we would usually refuse to re-connect to the peer, because it presents the wrong data set. However, if the peer first connects without a disk, and then attached its disk, we accepted that same wrong data set, which would be "unexpected" by any user of that DRBD and cause "undefined results" (read: very likely data corruption). The fix is to forcefully disconnect as soon as we notice that the peer attached to the "wrong" dataset. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12drbd: narrow rcu_read_lock in drbd_sync_handshakeRoland Kammerer
[ Upstream commit d29e89e34952a9ad02c77109c71a80043544296e ] So far there was the possibility that we called genlmsg_new(GFP_NOIO)/mutex_lock() while holding an rcu_read_lock(). This included cases like: drbd_sync_handshake (acquire the RCU lock) drbd_asb_recover_1p drbd_khelper drbd_bcast_event genlmsg_new(GFP_NOIO) --> may sleep drbd_sync_handshake (acquire the RCU lock) drbd_asb_recover_1p drbd_khelper notify_helper genlmsg_new(GFP_NOIO) --> may sleep drbd_sync_handshake (acquire the RCU lock) drbd_asb_recover_1p drbd_khelper notify_helper mutex_lock --> may sleep While using GFP_ATOMIC whould have been possible in the first two cases, the real fix is to narrow the rcu_read_lock. Reported-by: Jia-Ju Bai <baijiaju1990@163.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAINYoung Xiao
[ Upstream commit a11f6ca9aef989b56cd31ff4ee2af4fb31a172ec ] __vdc_tx_trigger should only loop on EAGAIN a finite number of times. See commit adddc32d6fde ("sunvnet: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN") for detail. Signed-off-by: Young Xiao <YangX92@hotmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-22nbd: Use set_blocksize() to set device blocksizeJan Kara
commit c8a83a6b54d0ca078de036aafb3f6af58c1dc5eb upstream. NBD can update block device block size implicitely through bd_set_size(). Make it explicitely set blocksize with set_blocksize() as this behavior of bd_set_size() is going away. CC: Josef Bacik <jbacik@fb.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: drop caches if offset or block_size are changedJaegeuk Kim
commit 5db470e229e22b7eda6e23b5566e532c96fb5bc3 upstream. If we don't drop caches used in old offset or block_size, we can get old data from new offset/block_size, which gives unexpected data to user. For example, Martijn found a loopback bug in the below scenario. 1) LOOP_SET_FD loads first two pages on loop file 2) LOOP_SET_STATUS64 changes the offset on the loop file 3) mount is failed due to the cached pages having wrong superblock Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Reported-by: Martijn Coenen <maco@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()Tetsuo Handa
commit 628bd85947091830a8c4872adfd5ed1d515a9cf2 upstream. Commit 0a42e99b58a20883 ("loop: Get rid of loop_index_mutex") forgot to remove mutex_unlock(&loop_ctl_mutex) from loop_control_ioctl() when replacing loop_index_mutex with loop_ctl_mutex. Fixes: 0a42e99b58a20883 ("loop: Get rid of loop_index_mutex") Reported-by: syzbot <syzbot+c0138741c2290fc5e63f@syzkaller.appspotmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Get rid of 'nested' acquisition of loop_ctl_mutexJan Kara
commit c28445fa06a3a54e06938559b9514c5a7f01c90f upstream. The nested acquisition of loop_ctl_mutex (->lo_ctl_mutex back then) has been introduced by commit f028f3b2f987e "loop: fix circular locking in loop_clr_fd()" to fix lockdep complains about bd_mutex being acquired after lo_ctl_mutex during partition rereading. Now that these are properly fixed, let's stop fooling lockdep. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutexJan Kara
commit 1dded9acf6dc9a34cd27fcf8815507e4e65b3c4f upstream. Code in loop_change_fd() drops reference to the old file (and also the new file in a failure case) under loop_ctl_mutex. Similarly to a situation in loop_set_fd() this can create a circular locking dependency if this was the last reference holding the file open. Delay dropping of the file reference until we have released loop_ctl_mutex. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Fix deadlock when calling blkdev_reread_part()Jan Kara
commit 0da03cab87e6323ff2e05b14bc7d5c6fcc531efd upstream. Calling blkdev_reread_part() under loop_ctl_mutex causes lockdep to complain about circular lock dependency between bdev->bd_mutex and lo->lo_ctl_mutex. The problem is that on loop device open or close lo_open() and lo_release() get called with bdev->bd_mutex held and they need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is called with loop_ctl_mutex held, it will call blkdev_reread_part() which acquires bdev->bd_mutex. See syzbot report for details [1]. Move call to blkdev_reread_part() in __loop_clr_fd() from under loop_ctl_mutex to finish fixing of the lockdep warning and the possible deadlock. [1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588 Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Move loop_reread_partitions() out of loop_ctl_mutexJan Kara
commit 85b0a54a82e4fbceeb1aebb7cb6909edd1a24668 upstream. Calling loop_reread_partitions() under loop_ctl_mutex causes lockdep to complain about circular lock dependency between bdev->bd_mutex and lo->lo_ctl_mutex. The problem is that on loop device open or close lo_open() and lo_release() get called with bdev->bd_mutex held and they need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is called with loop_ctl_mutex held, it will call blkdev_reread_part() which acquires bdev->bd_mutex. See syzbot report for details [1]. Move all calls of loop_rescan_partitions() out of loop_ctl_mutex to avoid lockdep warning and fix deadlock possibility. [1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588 Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Move special partition reread handling in loop_clr_fd()Jan Kara
commit d57f3374ba4817f7c8d26fae8a13d20ac8d31b92 upstream. The call of __blkdev_reread_part() from loop_reread_partition() happens only when we need to invalidate partitions from loop_release(). Thus move a detection for this into loop_clr_fd() and simplify loop_reread_partition(). This makes loop_reread_partition() safe to use without loop_ctl_mutex because we use only lo->lo_number and lo->lo_file_name in case of error for reporting purposes (thus possibly reporting outdate information is not a big deal) and we are safe from 'lo' going away under us by elevated lo->lo_refcnt. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Push loop_ctl_mutex down to loop_change_fd()Jan Kara
commit c371077000f4138ee3c15fbed50101ff24bdc91d upstream. Push loop_ctl_mutex down to loop_change_fd(). We will need this to be able to call loop_reread_partitions() without loop_ctl_mutex. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Push loop_ctl_mutex down to loop_set_fd()Jan Kara
commit 757ecf40b7e029529768eb5f9562d5eeb3002106 upstream. Push lo_ctl_mutex down to loop_set_fd(). We will need this to be able to call loop_reread_partitions() without lo_ctl_mutex. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Push loop_ctl_mutex down to loop_set_status()Jan Kara
commit 550df5fdacff94229cde0ed9b8085155654c1696 upstream. Push loop_ctl_mutex down to loop_set_status(). We will need this to be able to call loop_reread_partitions() without loop_ctl_mutex. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Push loop_ctl_mutex down to loop_get_status()Jan Kara
commit 4a5ce9ba5877e4640200d84a735361306ad1a1b8 upstream. Push loop_ctl_mutex down to loop_get_status() to avoid the unusual convention that the function gets called with loop_ctl_mutex held and releases it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Push loop_ctl_mutex down into loop_clr_fd()Jan Kara
commit 7ccd0791d98531df7cd59e92d55e4f063d48a070 upstream. loop_clr_fd() has a weird locking convention that is expects loop_ctl_mutex held, releases it on success and keeps it on failure. Untangle the mess by moving locking of loop_ctl_mutex into loop_clr_fd(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Split setting of lo_state from loop_clr_fdJan Kara
commit a2505b799a496b7b84d9a4a14ec870ff9e42e11b upstream. Move setting of lo_state to Lo_rundown out into the callers. That will allow us to unlock loop_ctl_mutex while the loop device is protected from other changes by its special state. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Push lo_ctl_mutex down into individual ioctlsJan Kara
commit a13165441d58b216adbd50252a9cc829d78a6bce upstream. Push acquisition of lo_ctl_mutex down into individual ioctl handling branches. This is a preparatory step for pushing the lock down into individual ioctl handling functions so that they can release the lock as they need it. We also factor out some simple ioctl handlers that will not need any special handling to reduce unnecessary code duplication. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Get rid of loop_index_mutexJan Kara
commit 0a42e99b58a208839626465af194cfe640ef9493 upstream. Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as there is no good reason to keep these two separate and it just complicates the locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22loop: Fold __loop_release into loop_releaseJan Kara
commit 967d1dc144b50ad005e5eecdfadfbcfb399ffff6 upstream. __loop_release() has a single call site. Fold it there. This is currently not a huge win but it will make following replacement of loop_index_mutex more obvious. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22block/loop: Use global lock for ioctl() operation.Tetsuo Handa
commit 310ca162d779efee8a2dc3731439680f3e9c1e86 upstream. syzbot is reporting NULL pointer dereference [1] which is caused by race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other loop devices at loop_validate_file() without holding corresponding lo->lo_ctl_mutex locks. Since ioctl() request on loop devices is not frequent operation, we don't need fine grained locking. Let's use global lock in order to allow safe traversal at loop_validate_file(). Note that syzbot is also reporting circular locking dependency between bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling blkdev_reread_part() with lock held. This patch does not address it. [1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3 [2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+bf89c128e05dd6c62523@syzkaller.appspotmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22block/loop: Don't grab "struct file" for vfs_getattr() operation.Tetsuo Handa
commit b1ab5fa309e6c49e4e06270ec67dd7b3e9971d04 upstream. vfs_getattr() needs "struct path" rather than "struct file". Let's use path_get()/path_put() rather than get_file()/fput(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is setIlya Dryomov
commit 85f5a4d666fd9be73856ed16bb36c5af5b406b29 upstream. There is a window between when RBD_DEV_FLAG_REMOVING is set and when the device is removed from rbd_dev_list. During this window, we set "already" and return 0. Returning 0 from write(2) can confuse userspace tools because 0 indicates that nothing was written. In particular, "rbd unmap" will retry the write multiple times a second: 10:28:05.463299 write(4, "0", 1) = 0 10:28:05.463509 write(4, "0", 1) = 0 10:28:05.463720 write(4, "0", 1) = 0 10:28:05.463942 write(4, "0", 1) = 0 10:28:05.464155 write(4, "0", 1) = 0 Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13zram: fix double free backing deviceMinchan Kim
commit 5547932dc67a48713eece4fa4703bfdf0cfcb818 upstream. If blkdev_get fails, we shouldn't do blkdev_put. Otherwise, kernel emits below log. This patch fixes it. WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120 Modules linked in: CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 RIP: 0010:blkdev_put+0x105/0x120 Call Trace: __x64_sys_swapoff+0x46d/0x490 do_syscall_64+0x5a/0x190 entry_SYSCALL_64_after_hwframe+0x49/0xbe irq event stamp: 4466 hardirqs last enabled at (4465): __free_pages_ok+0x1e3/0x490 hardirqs last disabled at (4466): trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (3420): __do_softirq+0x333/0x446 softirqs last disabled at (3407): irq_exit+0xd1/0xe0 Link: http://lkml.kernel.org/r/20181127055429.251614-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01floppy: fix race condition in __floppy_read_block_0()Jens Axboe
[ Upstream commit de7b75d82f70c5469675b99ad632983c50b6f7e7 ] LKP recently reported a hang at bootup in the floppy code: [ 245.678853] INFO: task mount:580 blocked for more than 120 seconds. [ 245.679906] Tainted: G T 4.19.0-rc6-00172-ga9f38e1 #1 [ 245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 245.682181] mount D 6372 580 1 0x00000004 [ 245.683023] Call Trace: [ 245.683425] __schedule+0x2df/0x570 [ 245.683975] schedule+0x2d/0x80 [ 245.684476] schedule_timeout+0x19d/0x330 [ 245.685090] ? wait_for_common+0xa5/0x170 [ 245.685735] wait_for_common+0xac/0x170 [ 245.686339] ? do_sched_yield+0x90/0x90 [ 245.686935] wait_for_completion+0x12/0x20 [ 245.687571] __floppy_read_block_0+0xfb/0x150 [ 245.688244] ? floppy_resume+0x40/0x40 [ 245.688844] floppy_revalidate+0x20f/0x240 [ 245.689486] check_disk_change+0x43/0x60 [ 245.690087] floppy_open+0x1ea/0x360 [ 245.690653] __blkdev_get+0xb4/0x4d0 [ 245.691212] ? blkdev_get+0x1db/0x370 [ 245.691777] blkdev_get+0x1f3/0x370 [ 245.692351] ? path_put+0x15/0x20 [ 245.692871] ? lookup_bdev+0x4b/0x90 [ 245.693539] blkdev_get_by_path+0x3d/0x80 [ 245.694165] mount_bdev+0x2a/0x190 [ 245.694695] squashfs_mount+0x10/0x20 [ 245.695271] ? squashfs_alloc_inode+0x30/0x30 [ 245.695960] mount_fs+0xf/0x90 [ 245.696451] vfs_kern_mount+0x43/0x130 [ 245.697036] do_mount+0x187/0xc40 [ 245.697563] ? memdup_user+0x28/0x50 [ 245.698124] ksys_mount+0x60/0xc0 [ 245.698639] sys_mount+0x19/0x20 [ 245.699167] do_int80_syscall_32+0x61/0x130 [ 245.699813] entry_INT80_32+0xc7/0xc7 showing that we never complete that read request. The reason is that the completion setup is racy - it initializes the completion event AFTER submitting the IO, which means that the IO could complete before/during the init. If it does, we are passing garbage to complete() and we may sleep forever waiting for the event to occur. Fixes: 7b7b68bba5ef ("floppy: bail out in open() if drive is not responding to block0 read") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27block: brd: associate with queue until adding diskMing Lei
[ Upstream commit 153fcd5f6d93b8e1e4040b1337f564a10f8d93af ] brd_free() may be called in failure path on one brd instance which disk isn't added yet, so release handler of gendisk may free the associated request_queue early and causes the following use-after-free[1]. This patch fixes this issue by associating gendisk with request_queue just before adding disk. [1] KASAN: use-after-free Read in del_timer_syncNon-volatile memory driver v1.3 Linux agpgart interface v0.103 [drm] Initialized vgem 1.0.0 20120112 for virtual device on minor 0 usbcore: registered new interface driver udl ================================================================== BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 Read of size 8 at addr ffff8801d1b6b540 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844 del_timer_sync+0xb7/0x270 kernel/time/timer.c:1283 blk_cleanup_queue+0x413/0x710 block/blk-core.c:809 brd_free+0x5d/0x71 drivers/block/brd.c:422 brd_init+0x2eb/0x393 drivers/block/brd.c:518 do_one_initcall+0x145/0x957 init/main.c:890 do_initcall_level init/main.c:958 [inline] do_initcalls init/main.c:966 [inline] do_basic_setup init/main.c:984 [inline] kernel_init_freeable+0x5c6/0x6b9 init/main.c:1148 kernel_init+0x11/0x1ae init/main.c:1068 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:350 Reported-by: syzbot+3701447012fe951dabb2@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>