summaryrefslogtreecommitdiffstats
path: root/drivers/block
AgeCommit message (Collapse)Author
2020-01-27signal: Allow cifs and drbd to receive their terminating signalsEric W. Biederman
[ Upstream commit 33da8e7c814f77310250bb54a9db36a44c5de784 ] My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com> Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Steve French <smfrench@gmail.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: David Laight <David.Laight@ACULAB.COM> Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27rbd: clear ->xferred on error from rbd_obj_issue_copyup()Ilya Dryomov
[ Upstream commit 356889c49d84f11f446ec235bd52ca1a7d581aa0 ] Otherwise the assert in rbd_obj_end_request() is triggered. Fixes: 3da691bf4366 ("rbd: new request handling code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23xen/blkfront: Adjust indentation in xlvbd_alloc_gendiskNathan Chancellor
commit 589b72894f53124a39d1bb3c0cecaf9dcabac417 upstream. Clang warns: ../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] nr_parts = PARTS_PER_DISK; ^ ../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here if (err) ^ This is because there is a space at the beginning of this line; remove it so that the indentation is consistent according to the Linux kernel coding style and clang no longer warns. While we are here, the previous line has some trailing whitespace; clean that up as well. Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD") Link: https://github.com/ClangBuiltLinux/linux/issues/791 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09xen/blkback: Avoid unmapping unmapped grant pagesSeongJae Park
[ Upstream commit f9bd84a8a845d82f9b5a081a7ae68c98a11d2e84 ] For each I/O request, blkback first maps the foreign pages for the request to its local pages. If an allocation of a local page for the mapping fails, it should unmap every mapping already made for the request. However, blkback's handling mechanism for the allocation failure does not mark the remaining foreign pages as unmapped. Therefore, the unmap function merely tries to unmap every valid grant page for the request, including the pages not mapped due to the allocation failure. On a system that fails the allocation frequently, this problem leads to following kernel crash. [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40 [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 [ 372.012562] Oops: 0002 [#1] SMP [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 ... [ 372.012746] Call Trace: [ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40 [ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback] ... [ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback] ... Decompressing Linux... Parsing ELF... done. Booting the kernel. [ 0.000000] Initializing cgroup subsys cpuset This commit fixes this problem by marking the grant pages of the given request that didn't mapped due to the allocation failure as invalid. Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings") Reviewed-by: David Woodhouse <dwmw@amazon.de> Reviewed-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09xen-blkback: prevent premature module unloadPaul Durrant
[ Upstream commit fa2ac657f9783f0891b2935490afe9a7fd29d3fa ] Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem cache. This cache is destoyed when xen-blkif is unloaded so it is necessary to wait for the deferred free routine used for such objects to complete. This necessity was missed in commit 14855954f636 "xen-blkback: allow module to be cleanly unloaded". This patch fixes the problem by taking/releasing extra module references in xen_blkif_alloc/free() respectively. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31nbd: fix shutdown and recv work deadlock v2Mike Christie
commit 1c05839aa973cfae8c3db964a21f9c0eef8fcc21 upstream. This fixes a regression added with: commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 Author: Mike Christie <mchristi@redhat.com> Date: Sun Aug 4 14:10:06 2019 -0500 nbd: fix max number of supported devs where we can deadlock during device shutdown. The problem occurs if the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has returned and the userspace app has droppped its reference via closing the device and running nbd_release. The recv_work nbd_config_put call would then drop the refcount to zero and try to destroy the config which would try to do destroy_workqueue from the recv work. This patch just has nbd_start_device_ioctl do a flush_workqueue when it wakes so we know after the ioctl returns running works have exited. This also fixes a possible race where we could try to reuse the device while old recv_works are still running. Cc: stable@vger.kernel.org Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31loop: fix no-unmap write-zeroes request behaviorDarrick J. Wong
[ Upstream commit efcfec579f6139528c9e6925eca2bc4a36da65c6 ] Currently, if the loop device receives a WRITE_ZEROES request, it asks the underlying filesystem to punch out the range. This behavior is correct if unmapping is allowed. However, a NOUNMAP request means that the caller doesn't want us to free the storage backing the range, so punching out the range is incorrect behavior. To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to the fallocate documentation) required to ensure that the entire range is backed by real storage, which suffices for our purposes. Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-17drbd: Change drbd_request_detach_interruptible's return type to intNathan Chancellor
[ Upstream commit 5816a0932b4fd74257b8cc5785bc8067186a8723 ] Clang warns when an implicit conversion is done between enumerated types: drivers/block/drbd/drbd_state.c:708:8: warning: implicit conversion from enumeration type 'enum drbd_ret_code' to different enumeration type 'enum drbd_state_rv' [-Wenum-conversion] rv = ERR_INTR; ~ ^~~~~~~~ drbd_request_detach_interruptible's only call site is in the return statement of adm_detach, which returns an int. Change the return type of drbd_request_detach_interruptible to match, silencing Clang's warning. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-13rsxx: add missed destroy_workqueue calls in removeChuhong Yuan
[ Upstream commit dcb77e4b274b8f13ac6482dfb09160cd2fae9a40 ] The driver misses calling destroy_workqueue in remove like what is done when probe fails. Add the missed calls to fix it. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05drbd: fix print_st_err()'s prototype to match the definitionLuc Van Oostenryck
[ Upstream commit 2c38f035117331eb78d0504843c79ea7c7fabf37 ] print_st_err() is defined with its 4th argument taking an 'enum drbd_state_rv' but its prototype use an int for it. Fix this by using 'enum drbd_state_rv' in the prototype too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05drbd: do not block when adjusting "disk-options" while IO is frozenLars Ellenberg
[ Upstream commit f708bd08ecbdc23d03aaedf5b3311ebe44cfdb50 ] "suspending" IO is overloaded. It can mean "do not allow new requests" (obviously), but it also may mean "must not complete pending IO", for example while the fencing handlers do their arbitration. When adjusting disk options, we suspend io (disallow new requests), then wait for the activity-log to become unused (drain all IO completions), and possibly replace it with a new activity log of different size. If the other "suspend IO" aspect is active, pending IO completions won't happen, and we would block forever (unkillable drbdsetup process). Fix this by skipping the activity log adjustment if the "al-extents" setting did not change. Also, in case it did change, fail early without blocking if it looks like we would block forever. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05drbd: reject attach of unsuitable uuids even if connectedLars Ellenberg
[ Upstream commit fe43ed97bba3b11521abd934b83ed93143470e4f ] Multiple failure scenario: a) all good Connected Primary/Secondary UpToDate/UpToDate b) lose disk on Primary, Connected Primary/Secondary Diskless/UpToDate c) continue to write to the device, changes only make it to the Secondary storage. d) lose disk on Secondary, Connected Primary/Secondary Diskless/Diskless e) now try to re-attach on Primary This would have succeeded before, even though that is clearly the wrong data set to attach to (missing the modifications from c). Because we only compared our "effective" and the "to-be-attached" data generation uuid tags if (device->state.conn < C_CONNECTED). Fix: change that constraint to (device->state.pdsk != D_UP_TO_DATE) compare the uuids, and reject the attach. This patch also tries to improve the reverse scenario: first lose Secondary, then Primary disk, then try to attach the disk on Secondary. Before this patch, the attach on the Secondary succeeds, but since commit drbd: disconnect, if the wrong UUIDs are attached on a connected peer the Primary will notice unsuitable data, and drop the connection hard. Though unfortunately at a point in time during the handshake where we cannot easily abort the attach on the peer without more refactoring of the handshake. We now reject any attach to "unsuitable" uuids, as long as we can see a Primary role, unless we already have access to "good" data. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05drbd: ignore "all zero" peer volume sizes in handshakeLars Ellenberg
[ Upstream commit 94c43a13b8d6e3e0dd77b3536b5e04a84936b762 ] During handshake, if we are diskless ourselves, we used to accept any size presented by the peer. Which could be zero if that peer was just brought up and connected to us without having a disk attached first, in which case both peers would just "flip" their volume sizes. Now, even a diskless node will ignore "zero" sizes presented by a diskless peer. Also a currently Diskless Primary will refuse to shrink during handshake: it may be frozen, and waiting for a "suitable" local disk or peer to re-appear (on-no-data-accessible suspend-io). If the peer is smaller than what we used to be, it is not suitable. The logic for a diskless node during handshake is now supposed to be: believe the peer, if - I don't have a current size myself - we agree on the size anyways - I do have a current size, am Secondary, and he has the only disk - I do have a current size, am Primary, and he has the only disk, which is larger than my current size Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05block: drbd: remove a stray unlock in __drbd_send_protocol()Dan Carpenter
[ Upstream commit 8e9c523016cf9983b295e4bc659183d1fa6ef8e0 ] There are two callers of this function and they both unlock the mutex so this ends up being a double unlock. Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01nbd: prevent memory leakNavid Emamdoost
commit 03bf73c315edca28f47451913177e14cd040a216 upstream. In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the reallocted memory is leak. The correct behaviour should be assigning the reallocted memory to config->socks right after success. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01amiflop: clean up on errors during setupOmar Sandoval
[ Upstream commit 53d0f8dbde89cf6c862c7a62e00c6123e02cba41 ] The error handling in fd_probe_drives() doesn't clean up at all. Fix it up in preparation for converting to blk-mq. While we're here, get rid of the commented out amiga_floppy_remove(). Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01skd: fixup usage of legacy IO APIJens Axboe
[ Upstream commit 6d1f9dfde7343c4ebfb8f84dcb333af571bb3b22 ] We need to be using the mq variant of request requeue here. Fixes: ca33dd92968b ("skd: Convert to blk-mq") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01nbd:fix memory leak in nbd_get_socket()Sun Ke
commit dff10bbea4be47bdb615b036c834a275b7c68133 upstream. Before returning NULL, put the sock first. Cc: stable@vger.kernel.org Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10nbd: handle racing with error'ed out commandsJosef Bacik
[ Upstream commit 7ce23e8e0a9cd38338fc8316ac5772666b565ca9 ] We hit the following warning in production print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700 ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60 Workqueue: knbd-recv recv_work [nbd] RIP: 0010:refcount_sub_and_test_checked+0x53/0x60 Call Trace: blk_mq_free_request+0xb7/0xf0 blk_mq_complete_request+0x62/0xf0 recv_work+0x29/0xa1 [nbd] process_one_work+0x1f5/0x3f0 worker_thread+0x2d/0x3d0 ? rescuer_thread+0x340/0x340 kthread+0x111/0x130 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x1f/0x30 ---[ end trace b079c3c67f98bb7c ]--- This was preceded by us timing out everything and shutting down the sockets for the device. The problem is we had a request in the queue at the same time, so we completed the request twice. This can actually happen in a lot of cases, we fail to get a ref on our config, we only have one connection and just error out the command, etc. Fix this by checking cmd->status in nbd_read_stat. We only change this under the cmd->lock, so we are safe to check this here and see if we've already error'ed this command out, which would indicate that we've completed it as well. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-10nbd: protect cmd->status with cmd->lockJosef Bacik
[ Upstream commit de6346ecbc8f5591ebd6c44ac164e8b8671d71d7 ] We already do this for the most part, except in timeout and clear_req. For the timeout case we take the lock after we grab a ref on the config, but that isn't really necessary because we're safe to touch the cmd at this point, so just move the order around. For the clear_req cause this is initiated by the user, so again is safe. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06nbd: verify socket is supported during setupMike Christie
[ Upstream commit cf1b2326b734896734c6e167e41766f9cee7686a ] nbd requires socket families to support the shutdown method so the nbd recv workqueue can be woken up from its sock_recvmsg call. If the socket does not support the callout we will leave recv works running or get hangs later when the device or module is removed. This adds a check during socket connection/reconnection to make sure the socket being passed in supports the needed callout. Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06nbd: fix possible sysfs duplicate warningXiubo Li
[ Upstream commit 862488105b84ca744b3d8ff131e0fcfe10644be1 ] 1. nbd_put takes the mutex and drops nbd->ref to 0. It then does idr_remove and drops the mutex. 2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails to find an existing device, so it does nbd_dev_add. 3. just before the nbd_put could call nbd_dev_remove or not finished totally, but if nbd_dev_add try to add_disk, we can hit: debugfs: Directory 'nbd1' with parent 'block' already present! This patch will make sure all the disk add/remove stuff are done by holding the nbd_index_mutex lock. Reported-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06zram: fix race between backing_dev_show and backing_dev_storeSasha Levin
[ Upstream commit f7daefe4231e57381d92c2e2ad905a899c28e402 ] CPU0: CPU1: backing_dev_show backing_dev_store ...... ...... file = zram->backing_dev; down_read(&zram->init_lock); down_read(&zram->init_init_lock) file_path(file, ...); zram->backing_dev = backing_dev; up_read(&zram->init_lock); up_read(&zram->init_lock); gets the value of zram->backing_dev too early in backing_dev_show, which resultin the value being NULL at the beginning, and not NULL later. backtrace: d_path+0xcc/0x174 file_path+0x10/0x18 backing_dev_show+0x40/0xb4 dev_attr_show+0x20/0x54 sysfs_kf_seq_show+0x9c/0x10c kernfs_seq_show+0x28/0x30 seq_read+0x184/0x488 kernfs_fop_read+0x5c/0x1a4 __vfs_read+0x44/0x128 vfs_read+0xa0/0x138 SyS_read+0x54/0xb4 Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com Signed-off-by: Chenwandun <chenwandun@huawei.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11nbd: fix crash when the blksize is zeroXiubo Li
[ Upstream commit 553768d1169a48c0cd87c4eb4ab57534ee663415 ] This will allow the blksize to be set zero and then use 1024 as default. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> [fix to use goto out instead of return in genl_connect] Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11nbd: fix max number of supported devsMike Christie
commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 upstream. This fixes a bug added in 4.10 with commit: commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024 Author: Josef Bacik <jbacik@fb.com> Date: Tue Nov 22 14:04:40 2016 -0500 nbd: add multi-connection support that limited the number of devices to 256. Before the patch we could create 1000s of devices, but the patch switched us from using our own thread to using a work queue which has a default limit of 256 active works. The problem is that our recv_work function sits in a loop until disconnection but only handles IO for one connection. The work is started when the connection is started/restarted, but if we end up creating 257 or more connections, the queue_work call just queues connection257+'s recv_work and that waits for connection 1 - 256's recv_work to be disconnected and that work instance completing. Instead of reverting back to kthreads, this has us allocate a workqueue_struct per device, so we can block in the work. Cc: stable@vger.kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07pktcdvd: remove warning on attempting to register non-passthrough devJens Axboe
[ Upstream commit eb09b3cc464d2c3bbde9a6648603c8d599ea8582 ] Anatoly reports that he gets the below warning when booting -git on a sparc64 box on debian unstable: ... [ 13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES implementation [ 13.428002] ------------[ cut here ]------------ [ 13.428081] WARNING: CPU: 21 PID: 586 at drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd] [ 13.428147] Attempt to register a non-SCSI queue [ 13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64 n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear md_mod crc32c_sparc64 [ 13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted 5.3.0-10169-g574cc4539762 #1234 [ 13.428507] Call Trace: [ 13.428542] [00000000004635c0] __warn+0xc0/0x100 [ 13.428582] [0000000000463634] warn_slowpath_fmt+0x34/0x60 [ 13.428626] [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd] [ 13.428674] [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd] [ 13.428724] [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0 [ 13.428764] [00000000006b96c8] ksys_ioctl+0x48/0x80 [ 13.428803] [00000000006b9714] sys_ioctl+0x14/0x40 [ 13.428847] [0000000000406294] linux_sparc_syscall+0x34/0x44 [ 13.428890] irq event stamp: 4181 [ 13.428924] hardirqs last enabled at (4189): [<00000000004e0a74>] console_unlock+0x634/0x6c0 [ 13.428984] hardirqs last disabled at (4196): [<00000000004e0540>] console_unlock+0x100/0x6c0 [ 13.429048] softirqs last enabled at (3978): [<0000000000b2e2d8>] __do_softirq+0x498/0x520 [ 13.429110] softirqs last disabled at (3967): [<000000000042cfb4>] do_softirq_own_stack+0x34/0x60 [ 13.429172] ---[ end trace 2220ca468f32967d ]--- [ 13.430018] pktcdvd: setup of pktcdvd device failed [ 13.455589] des_sparc64: Using sparc64 des opcodes optimized DES implementation [ 13.515334] camellia_sparc64: Using sparc64 camellia opcodes optimized CAMELLIA implementation [ 13.522856] pktcdvd: setup of pktcdvd device failed [ 13.529327] pktcdvd: setup of pktcdvd device failed [ 13.532932] pktcdvd: setup of pktcdvd device failed [ 13.536165] pktcdvd: setup of pktcdvd device failed [ 13.539372] pktcdvd: setup of pktcdvd device failed [ 13.542834] pktcdvd: setup of pktcdvd device failed [ 13.546536] pktcdvd: setup of pktcdvd device failed [ 15.431071] XFS (dm-0): Mounting V5 Filesystem ... Apparently debian auto-attaches any cdrom like device to pktcdvd, which can lead to the above warning. There's really no reason to warn for this situation, kill it. Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05nbd: add missing config putMike Christie
[ Upstream commit 887e975c4172d0d5670c39ead2f18ba1e4ec8133 ] Fix bug added with the patch: commit 8f3ea35929a0806ad1397db99a89ffee0140822a Author: Josef Bacik <josef@toxicpanda.com> Date: Mon Jul 16 12:11:35 2018 -0400 nbd: handle unexpected replies better where if the timeout handler runs when the completion path is and we fail to grab the mutex in the timeout handler we will leave a config reference and cannot free the config later. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05loop: Add LOOP_SET_DIRECT_IO to compat ioctlAlessio Balsini
[ Upstream commit fdbe4eeeb1aac219b14f10c0ed31ae5d1123e9b8 ] Enabling Direct I/O with loop devices helps reducing memory usage by avoiding double caching. 32 bit applications running on 64 bits systems are currently not able to request direct I/O because is missing from the lo_compat_ioctl. This patch fixes the compatibility issue mentioned above by exporting LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry. The input argument for this ioctl is a single long converted to a 1-bit boolean, so compatibility is preserved. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21floppy: fix usercopy directionJann Horn
commit 52f6f9d74f31078964ca1574f7bb612da7877ac8 upstream. As sparse points out, these two copy_from_user() should actually be copy_to_user(). Fixes: 229b53c9bf4e ("take floppy compat ioctls to sodding floppy.c") Cc: stable@vger.kernel.org Acked-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-06xen/blkback: fix memory leaksWenwen Wang
[ Upstream commit ae78ca3cf3d9e9f914bfcd0bc5c389ff18b9c2e0 ] In read_per_ring_refs(), after 'req' and related memory regions are allocated, xen_blkif_map() is invoked to map the shared frame, irq, and etc. However, if this mapping process fails, no cleanup is performed, leading to memory leaks. To fix this issue, invoke the cleanup before returning the error. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16drbd: dynamically allocate shash descriptorArnd Bergmann
[ Upstream commit 77ce56e2bfaa64127ae5e23ef136c0168b818777 ] Building with clang and KASAN, we get a warning about an overly large stack frame on 32-bit architectures: drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect' [-Werror,-Wframe-larger-than=] We already allocate other data dynamically in this function, so just do the same for the shash descriptor, which makes up most of this memory. Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/ Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-16loop: set PF_MEMALLOC_NOIO for the worker threadMikulas Patocka
commit d0a255e795ab976481565f6ac178314b34fbf891 upstream. A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06nbd: replace kill_bdev() with __invalidate_device() againMunehisa Kamata
commit 2b5c8f0063e4b263cf2de82029798183cf85c320 upstream. Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()") once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere") resurrected kill_bdev() and it has been there since then. So buffer_head mappings still get killed on a server disconnection, and we can still hit the BUG_ON on a filesystem on the top of the nbd device. EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null) block nbd0: Receive control failed (result -32) block nbd0: shutting down sockets print_req_error: I/O error, dev nbd0, sector 66264 flags 3000 EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block print_req_error: I/O error, dev nbd0, sector 2264 flags 3000 EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure ------------[ cut here ]------------ kernel BUG at fs/buffer.c:3057! invalid opcode: 0000 [#1] SMP PTI CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4 Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:submit_bh_wbc+0x18b/0x190 ... Call Trace: jbd2_write_superblock+0xf1/0x230 [jbd2] ? account_entity_enqueue+0xc5/0xf0 jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2] jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2] ? __switch_to_asm+0x40/0x70 ... ? lock_timer_base+0x67/0x80 kjournald2+0x121/0x360 [jbd2] ? remove_wait_queue+0x60/0x60 kthread+0xf8/0x130 ? commit_timeout+0x10/0x10 [jbd2] ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 With __invalidate_device(), I no longer hit the BUG_ON with sync or unmount on the disconnected device. Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere") Cc: linux-block@vger.kernel.org Cc: Ratna Manoj Bolla <manoj.br@gmail.com> Cc: nbd@other.debian.org Cc: stable@vger.kernel.org Cc: David Woodhouse <dwmw@amazon.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26floppy: fix out-of-bounds read in copy_bufferDenis Efremov
[ Upstream commit da99466ac243f15fbba65bd261bfc75ffa1532b6 ] This fixes a global out-of-bounds read access in the copy_buffer function of the floppy driver. The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect and head fields (unsigned int) of the floppy_drive structure are used to compute the max_sector (int) in the make_raw_rw_request function. It is possible to overflow the max_sector. Next, max_sector is passed to the copy_buffer function and used in one of the memcpy calls. An unprivileged user could trigger the bug if the device is accessible, but requires a floppy disk to be inserted. The patch adds the check for the .sect * .head multiplication for not overflowing in the set_geometry function. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26floppy: fix invalid pointer dereference in drive_nameDenis Efremov
[ Upstream commit 9b04609b784027968348796a18f601aed9db3789 ] This fixes the invalid pointer dereference in the drive_name function of the floppy driver. The native_format field of the struct floppy_drive_params is used as floppy_type array index in the drive_name function. Thus, the field should be checked the same way as the autodetect field. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should be used to call the drive_name. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for a value of the native_format field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26floppy: fix out-of-bounds read in next_valid_formatDenis Efremov
[ Upstream commit 5635f897ed83fd539df78e98ba69ee91592f9bb8 ] This fixes a global out-of-bounds read access in the next_valid_format function of the floppy driver. The values from autodetect field of the struct floppy_drive_params are used as indices for the floppy_type array in the next_valid_format function 'floppy_type[DP->autodetect[probed_format]].sect'. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for values of the autodetect field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26floppy: fix div-by-zero in setup_format_paramsDenis Efremov
[ Upstream commit f3554aeb991214cbfafd17d55e2bfddb50282e32 ] This fixes a divide by zero error in the setup_format_params function of the floppy driver. Two consecutive ioctls can trigger the bug: The first one should set the drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK to become zero. Next, the floppy format operation should be called. A floppy disk is not required to be inserted. An unprivileged user could trigger the bug if the device is accessible. The patch checks F_SECT_PER_TRACK for a non-zero value in the set_geometry function. The proper check should involve a reasonable upper limit for the .sect and .rate fields, but it could change the UAPI. The patch also checks F_SECT_PER_TRACK in the setup_format_params, and cancels the formatting operation in case of zero. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26block: null_blk: fix race condition for null_del_devBob Liu
[ Upstream commit 7602843fd873cae43a444b83b14dfdd114a9659c ] Dulicate call of null_del_dev() will trigger null pointer error like below. The reason is a race condition between nullb_device_power_store() and nullb_group_drop_item(). CPU#0 CPU#1 ---------------- ----------------- do_rmdir() >configfs_rmdir() >client_drop_item() >nullb_group_drop_item() nullb_device_power_store() >null_del_dev() >test_and_clear_bit(NULLB_DEV_FL_UP >null_del_dev() ^^^^^ Duplicated null_dev_dev() triger null pointer error >clear_bit(NULLB_DEV_FL_UP The fix could be keep the sequnce of clear NULLB_DEV_FL_UP and null_del_dev(). [ 698.613600] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 698.613608] #PF error: [normal kernel read fault] [ 698.613611] PGD 0 P4D 0 [ 698.613619] Oops: 0000 [#1] SMP PTI [ 698.613627] CPU: 3 PID: 6382 Comm: rmdir Not tainted 5.0.0+ #35 [ 698.613631] Hardware name: LENOVO 20LJS2EV08/20LJS2EV08, BIOS R0SET33W (1.17 ) 07/18/2018 [ 698.613644] RIP: 0010:null_del_dev+0xc/0x110 [null_blk] [ 698.613649] Code: 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b eb 97 e8 47 bb 2a e8 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <8b> 77 18 48 89 fb 4c 8b 27 48 c7 c7 40 57 1e c1 e8 bf c7 cb e8 48 [ 698.613654] RSP: 0018:ffffb887888bfde0 EFLAGS: 00010286 [ 698.613659] RAX: 0000000000000000 RBX: ffff9d436d92bc00 RCX: ffff9d43a9184681 [ 698.613663] RDX: ffffffffc11e5c30 RSI: 0000000068be6540 RDI: 0000000000000000 [ 698.613667] RBP: ffffb887888bfdf0 R08: 0000000000000001 R09: 0000000000000000 [ 698.613671] R10: ffffb887888bfdd8 R11: 0000000000000f16 R12: ffff9d436d92bc08 [ 698.613675] R13: ffff9d436d94e630 R14: ffffffffc11e5088 R15: ffffffffc11e5000 [ 698.613680] FS: 00007faa68be6540(0000) GS:ffff9d43d14c0000(0000) knlGS:0000000000000000 [ 698.613685] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 698.613689] CR2: 0000000000000018 CR3: 000000042f70c002 CR4: 00000000003606e0 [ 698.613693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 698.613697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 698.613700] Call Trace: [ 698.613712] nullb_group_drop_item+0x50/0x70 [null_blk] [ 698.613722] client_drop_item+0x29/0x40 [ 698.613728] configfs_rmdir+0x1ed/0x300 [ 698.613738] vfs_rmdir+0xb2/0x130 [ 698.613743] do_rmdir+0x1c7/0x1e0 [ 698.613750] __x64_sys_rmdir+0x17/0x20 [ 698.613759] do_syscall_64+0x5a/0x110 [ 698.613768] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11xen-blkfront: switch kcalloc to kvcalloc for large array allocationRoger Pau Monne
commit 1d5c76e66433382a1e170d1d5845bb0fed7467aa upstream. There's no reason to request physically contiguous memory for those allocations. [boris: added CC to stable] Cc: stable@vger.kernel.org Reported-by: Ian Jackson <ian.jackson@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25brd: re-enable __GFP_HIGHMEM in brd_insert_page()Hou Tao
commit f6b50160a06d4a0d6a3999ab0c5aec4f52dba248 upstream. __GFP_HIGHMEM is disabled if dax is enabled on brd, however dax support for brd has been removed since commit (7a862fbbdec6 "brd: remove dax support"), so restore __GFP_HIGHMEM in brd_insert_page(). Also remove the no longer applicable comments about DAX and highmem. Cc: stable@vger.kernel.org Fixes: 7a862fbbdec6 ("brd: remove dax support") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-10virtio-blk: limit number of hw queues by nr_cpu_idsDongli Zhang
[ Upstream commit bf348f9b78d413e75bb079462751a1d86b6de36c ] When tag_set->nr_maps is 1, the block layer limits the number of hw queues by nr_cpu_ids. No matter how many hw queues are used by virtio-blk, as it has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues. In addition, specifically for pci scenario, when the 'num-queues' specified by qemu is more than maxcpus, virtio-blk would not be able to allocate more than maxcpus vectors in order to have a vector for each queue. As a result, it falls back into MSI-X with one vector for config and one shared for queues. Considering above reasons, this patch limits the number of hw queues used by virtio-blk by nr_cpu_ids. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08xsysace: Fix error handling in ace_setupGuenter Roeck
[ Upstream commit 47b16820c490149c2923e8474048f2c6e7557cab ] If xace hardware reports a bad version number, the error handling code in ace_setup() calls put_disk(), followed by queue cleanup. However, since the disk data structure has the queue pointer set, put_disk() also cleans and releases the queue. This results in blk_cleanup_queue() accessing an already released data structure, which in turn may result in a crash such as the following. [ 10.681671] BUG: Kernel NULL pointer dereference at 0x00000040 [ 10.681826] Faulting instruction address: 0xc0431480 [ 10.682072] Oops: Kernel access of bad area, sig: 11 [#1] [ 10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440 [ 10.682387] Modules linked in: [ 10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G W 5.0.0-rc6-next-20190218+ #2 [ 10.682733] NIP: c0431480 LR: c043147c CTR: c0422ad8 [ 10.682863] REGS: cf82fbe0 TRAP: 0300 Tainted: G W (5.0.0-rc6-next-20190218+) [ 10.683065] MSR: 00029000 <CE,EE,ME> CR: 22000222 XER: 00000000 [ 10.683236] DEAR: 00000040 ESR: 00000000 [ 10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000 [ 10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000 [ 10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000 [ 10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800 [ 10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114 [ 10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114 [ 10.684602] Call Trace: [ 10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable) [ 10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c [ 10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68 [ 10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c [ 10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508 [ 10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8 [ 10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c [ 10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464 [ 10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4 [ 10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc [ 10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0 [ 10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234 [ 10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c [ 10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac [ 10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330 [ 10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478 [ 10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114 [ 10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c [ 10.687349] Instruction dump: [ 10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008 [ 10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008 [ 10.688056] ---[ end trace 13c9ff51d41b9d40 ]--- Fix the problem by setting the disk queue pointer to NULL before calling put_disk(). A more comprehensive fix might be to rearrange the code to check the hardware version before initializing data structures, but I don't know if this would have undesirable side effects, and it would increase the complexity of backporting the fix to older kernels. Fixes: 74489a91dd43a ("Add support for Xilinx SystemACE CompactFlash interface") Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-02zram: pass down the bvec we need to read into in the work structJérôme Glisse
commit e153abc0739ff77bd89c9ba1688cdb963464af97 upstream. When scheduling work item to read page we need to pass down the proper bvec struct which points to the page to read into. Before this patch it uses a randomly initialized bvec (only if PAGE_SIZE != 4096) which is wrong. Note that without this patch on arch/kernel where PAGE_SIZE != 4096 userspace could read random memory through a zram block device (thought userspace probably would have no control on the address being read). Link: http://lkml.kernel.org/r/20190408183219.26377-1-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02loop: do not print warn message if partition scan is successfulDongli Zhang
[ Upstream commit 40853d6fc619a6fd3d3177c3973a2eac9b598a80 ] Do not print warn message when the partition scan returns 0. Fixes: d57f3374ba48 ("loop: Move special partition reread handling in loop_clr_fd()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()Dongli Zhang
[ Upstream commit 758a58d0bc67457f1215321a536226654a830eeb ] Commit 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") moves blkdev_reread_part() out of the loop_ctl_mutex. However, GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result, __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and will not rescan the loop device to delete all partitions. Below are steps to reproduce the issue: step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100 step2 # losetup -P /dev/loop0 tmp.raw step3 # parted /dev/loop0 mklabel gpt step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1 step5 # losetup -d /dev/loop0 Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and there is below kernel warning message: [ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22) This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part(). Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-27loop: access lo_backing_file only when the loop device is Lo_boundDongli Zhang
commit f7c8a4120eedf24c36090b7542b179ff7a649219 upstream. Commit 758a58d0bc67 ("loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()") separates "lo->lo_backing_file = NULL" and "lo->lo_state = Lo_unbound" into different critical regions protected by loop_ctl_mutex. However, there is below race that the NULL lo->lo_backing_file would be accessed when the backend of a loop is another loop device, e.g., loop0's backend is a file, while loop1's backend is loop0. loop0's backend is file loop1's backend is loop0 __loop_clr_fd() mutex_lock(&loop_ctl_mutex); lo->lo_backing_file = NULL; --> set to NULL mutex_unlock(&loop_ctl_mutex); loop_set_fd() mutex_lock_killable(&loop_ctl_mutex); loop_validate_file() f = l->lo_backing_file; --> NULL access if loop0 is not Lo_unbound mutex_lock(&loop_ctl_mutex); lo->lo_state = Lo_unbound; mutex_unlock(&loop_ctl_mutex); lo->lo_backing_file should be accessed only when the loop device is Lo_bound. In fact, the problem has been introduced already in commit 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()") after which loop_validate_file() could see devices in Lo_rundown state with which it did not count. It was harmless at that point but still. Fixes: 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()") Reported-by: syzbot+9bdc1adc1c55e7fe765b@syzkaller.appspotmail.com Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23floppy: check_events callback should not return a negative numberYufen Yu
[ Upstream commit 96d7cb932e826219ec41ac02e5af037ffae6098c ] floppy_check_events() is supposed to return bit flags to say which events occured. We should return zero to say that no event flags are set. Only BIT(0) and BIT(1) are used in the caller. And .check_events interface also expect to return an unsigned int value. However, after commit a0c80efe5956, it may return -EINTR (-4u). Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't affect runtime, but it obviously is still worth fixing. Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: a0c80efe5956 ("floppy: fix lock_fdc() signal handling") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12block/swim3: Fix -EBUSY error when re-opening device after unmountFinn Thain
[ Upstream commit 296dcc40f2f2e402facf7cd26cf3f2c8f4b17d47 ] When the block device is opened with FMODE_EXCL, ref_count is set to -1. This value doesn't get reset when the device is closed which means the device cannot be opened again. Fix this by checking for refcount <= 0 in the release method. Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12zram: fix lockdep warning of free block handlingMinchan Kim
[ Upstream commit 3c9959e025472122a61faebb208525cf26b305d1 ] Patch series "zram idle page writeback", v3. Inherently, swap device has many idle pages which are rare touched since it was allocated. It is never problem if we use storage device as swap. However, it's just waste for zram-swap. This patchset supports zram idle page writeback feature. * Admin can define what is idle page "no access since X time ago" * Admin can define when zram should writeback them * Admin can define when zram should stop writeback to prevent wearout Details are in each patch's description. This patch (of 7): ================================ WARNING: inconsistent lock state 4.19.0+ #390 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes: 00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50 {SOFTIRQ-ON-W} state was registered at: _raw_spin_lock+0x2c/0x40 zram_make_request+0x755/0xdc9 generic_make_request+0x373/0x6a0 submit_bio+0x6c/0x140 __swap_writepage+0x3a8/0x480 shrink_page_list+0x1102/0x1a60 shrink_inactive_list+0x21b/0x3f0 shrink_node_memcg.constprop.99+0x4f8/0x7e0 shrink_node+0x7d/0x2f0 do_try_to_free_pages+0xe0/0x300 try_to_free_pages+0x116/0x2b0 __alloc_pages_slowpath+0x3f4/0xf80 __alloc_pages_nodemask+0x2a2/0x2f0 __handle_mm_fault+0x42e/0xb50 handle_mm_fault+0x55/0xb0 __do_page_fault+0x235/0x4b0 page_fault+0x1e/0x30 irq event stamp: 228412 hardirqs last enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600 hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600 softirqs last enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427 softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&zram->bitmap_lock)->rlock); <Interrupt> lock(&(&zram->bitmap_lock)->rlock); *** DEADLOCK *** no locks held by zram_verify/2095. stack backtrace: CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: <IRQ> dump_stack+0x67/0x9b print_usage_bug+0x1bd/0x1d3 mark_lock+0x4aa/0x540 __lock_acquire+0x51d/0x1300 lock_acquire+0x90/0x180 _raw_spin_lock+0x2c/0x40 put_entry_bdev+0x1e/0x50 zram_free_page+0xf6/0x110 zram_slot_free_notify+0x42/0xa0 end_swap_bio_read+0x5b/0x170 blk_update_request+0x8f/0x340 scsi_end_request+0x2c/0x1e0 scsi_io_completion+0x98/0x650 blk_done_softirq+0x9e/0xd0 __do_softirq+0xcc/0x427 irq_exit+0xd1/0xe0 do_IRQ+0x93/0x120 common_interrupt+0xf/0xf </IRQ> With writeback feature, zram_slot_free_notify could be called in softirq context by end_swap_bio_read. However, bitmap_lock is not aware of that so lockdep yell out: get_entry_bdev spin_lock(bitmap->lock); irq softirq end_swap_bio_read zram_slot_free_notify zram_slot_lock <-- deadlock prone zram_free_page put_entry_bdev spin_lock(bitmap->lock); <-- deadlock prone With akpm's suggestion (i.e. bitmap operation is already atomic), we could remove bitmap lock. It might fail to find a empty slot if serious contention happens. However, it's not severe problem because huge page writeback has already possiblity to fail if there is severe memory pressure. Worst case is just keeping the incompressible in memory, not storage. The other problem is zram_slot_lock in zram_slot_slot_free_notify. To make it safe is this patch introduces zram_slot_trylock where zram_slot_free_notify uses it. Although it's rare to be contented, this patch adds new debug stat "miss_free" to keep monitoring how often it happens. Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12drbd: skip spurious timeout (ping-timeo) when failing promoteLars Ellenberg
[ Upstream commit 9848b6ddd8c92305252f94592c5e278574e7a6ac ] If you try to promote a Secondary while connected to a Primary and allow-two-primaries is NOT set, we will wait for "ping-timeout" to give this node a chance to detect a dead primary, in case the cluster manager noticed faster than we did. But if we then are *still* connected to a Primary, we fail (after an additional timeout of ping-timout). This change skips the spurious second timeout. Most people won't notice really, since "ping-timeout" by default is half a second. But in some installations, ping-timeout may be 10 or 20 seconds or more, and spuriously delaying the error return becomes annoying. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>