aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md/dm.c
AgeCommit message (Collapse)Author
2020-08-18dm: use bio_uninit instead of bio_disassociate_blkgChristoph Hellwig
commit 382761dc6312965a11f82f2217e16ec421bf17ae upstream. bio_uninit is the proper API to clean up a BIO that has been allocated on stack or inside a structure that doesn't come from the BIO allocator. Switch dm to use that instead of bio_disassociate_blkg, which really is an implementation detail. Note that the bio_uninit calls are also moved to the two callers of __send_empty_flush, so that they better pair with the bio_init calls used to initialize them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-03dm integrity: fix integrity recalculation that is improperly skippedMikulas Patocka
commit 5df96f2b9f58a5d2dc1f30fe7de75e197f2c25f2 upstream. Commit adc0daad366b62ca1bce3e2958a40b0b71a8b8b3 ("dm: report suspended device during destroy") broke integrity recalculation. The problem is dm_suspended() returns true not only during suspend, but also during resume. So this race condition could occur: 1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work) 2. integrity_recalc (&ic->recalc_work) preempts the current thread 3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret; 4. integrity_recalc exits and no recalculating is done. To fix this race condition, add a function dm_post_suspending that is only true during the postsuspend phase and use it instead of dm_suspended(). Signed-off-by: Mikulas Patocka <mpatocka redhat com> Fixes: adc0daad366b ("dm: report suspended device during destroy") Cc: stable vger kernel org # v4.18+ Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-08-03dm: use noio when sending kobject eventMikulas Patocka
commit 6958c1c640af8c3f40fa8a2eee3b5b905d95b677 upstream. kobject_uevent may allocate memory and it may be called while there are dm devices suspended. The allocation may recurse into a suspended device, causing a deadlock. We must set the noio flag when sending a uevent. The observed deadlock was reported here: https://www.redhat.com/archives/dm-devel/2020-March/msg00025.html Reported-by: Khazhismel Kumykov <khazhy@google.com> Reported-by: Tahsin Erdogan <tahsin@google.com> Reported-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-06-01Revert "dm: always call blk_queue_split() in dm_process_bio()"Mike Snitzer
commit 120c9257f5f19e5d1e87efcbb5531b7cd81b7d74 upstream. This reverts commit effd58c95f277744f75d6e08819ac859dbcbd351. blk_queue_split() is causing excessive IO splitting -- because blk_max_size_offset() depends on 'chunk_sectors' limit being set and if it isn't (as is the case for DM targets!) it falls back to splitting on a 'max_sectors' boundary regardless of offset. "Fix" this by reverting back to _not_ using blk_queue_split() in dm_process_bio() for normal IO (reads and writes). Long-term fix is still TBD but it should focus on training blk_max_size_offset() to call into a DM provided hook (to call DM's max_io_len()). Test results from simple misaligned IO test on 4-way dm-striped device with chunksize of 128K and stripesize of 512K: xfs_io -d -c 'pread -b 2m 224s 4072s' /dev/mapper/stripe_dev before this revert: 253,0 21 1 0.000000000 2206 Q R 224 + 4072 [xfs_io] 253,0 21 2 0.000008267 2206 X R 224 / 480 [xfs_io] 253,0 21 3 0.000010530 2206 X R 224 / 256 [xfs_io] 253,0 21 4 0.000027022 2206 X R 480 / 736 [xfs_io] 253,0 21 5 0.000028751 2206 X R 480 / 512 [xfs_io] 253,0 21 6 0.000033323 2206 X R 736 / 992 [xfs_io] 253,0 21 7 0.000035130 2206 X R 736 / 768 [xfs_io] 253,0 21 8 0.000039146 2206 X R 992 / 1248 [xfs_io] 253,0 21 9 0.000040734 2206 X R 992 / 1024 [xfs_io] 253,0 21 10 0.000044694 2206 X R 1248 / 1504 [xfs_io] 253,0 21 11 0.000046422 2206 X R 1248 / 1280 [xfs_io] 253,0 21 12 0.000050376 2206 X R 1504 / 1760 [xfs_io] 253,0 21 13 0.000051974 2206 X R 1504 / 1536 [xfs_io] 253,0 21 14 0.000055881 2206 X R 1760 / 2016 [xfs_io] 253,0 21 15 0.000057462 2206 X R 1760 / 1792 [xfs_io] 253,0 21 16 0.000060999 2206 X R 2016 / 2272 [xfs_io] 253,0 21 17 0.000062489 2206 X R 2016 / 2048 [xfs_io] 253,0 21 18 0.000066133 2206 X R 2272 / 2528 [xfs_io] 253,0 21 19 0.000067507 2206 X R 2272 / 2304 [xfs_io] 253,0 21 20 0.000071136 2206 X R 2528 / 2784 [xfs_io] 253,0 21 21 0.000072764 2206 X R 2528 / 2560 [xfs_io] 253,0 21 22 0.000076185 2206 X R 2784 / 3040 [xfs_io] 253,0 21 23 0.000077486 2206 X R 2784 / 2816 [xfs_io] 253,0 21 24 0.000080885 2206 X R 3040 / 3296 [xfs_io] 253,0 21 25 0.000082316 2206 X R 3040 / 3072 [xfs_io] 253,0 21 26 0.000085788 2206 X R 3296 / 3552 [xfs_io] 253,0 21 27 0.000087096 2206 X R 3296 / 3328 [xfs_io] 253,0 21 28 0.000093469 2206 X R 3552 / 3808 [xfs_io] 253,0 21 29 0.000095186 2206 X R 3552 / 3584 [xfs_io] 253,0 21 30 0.000099228 2206 X R 3808 / 4064 [xfs_io] 253,0 21 31 0.000101062 2206 X R 3808 / 3840 [xfs_io] 253,0 21 32 0.000104956 2206 X R 4064 / 4096 [xfs_io] 253,0 21 33 0.001138823 0 C R 4096 + 200 [0] after this revert: 253,0 18 1 0.000000000 4430 Q R 224 + 3896 [xfs_io] 253,0 18 2 0.000018359 4430 X R 224 / 256 [xfs_io] 253,0 18 3 0.000028898 4430 X R 256 / 512 [xfs_io] 253,0 18 4 0.000033535 4430 X R 512 / 768 [xfs_io] 253,0 18 5 0.000065684 4430 X R 768 / 1024 [xfs_io] 253,0 18 6 0.000091695 4430 X R 1024 / 1280 [xfs_io] 253,0 18 7 0.000098494 4430 X R 1280 / 1536 [xfs_io] 253,0 18 8 0.000114069 4430 X R 1536 / 1792 [xfs_io] 253,0 18 9 0.000129483 4430 X R 1792 / 2048 [xfs_io] 253,0 18 10 0.000136759 4430 X R 2048 / 2304 [xfs_io] 253,0 18 11 0.000152412 4430 X R 2304 / 2560 [xfs_io] 253,0 18 12 0.000160758 4430 X R 2560 / 2816 [xfs_io] 253,0 18 13 0.000183385 4430 X R 2816 / 3072 [xfs_io] 253,0 18 14 0.000190797 4430 X R 3072 / 3328 [xfs_io] 253,0 18 15 0.000197667 4430 X R 3328 / 3584 [xfs_io] 253,0 18 16 0.000218751 4430 X R 3584 / 3840 [xfs_io] 253,0 18 17 0.000226005 4430 X R 3840 / 4096 [xfs_io] 253,0 18 18 0.000250404 4430 Q R 4120 + 176 [xfs_io] 253,0 18 19 0.000847708 0 C R 4096 + 24 [0] 253,0 18 20 0.000855783 0 C R 4120 + 176 [0] Fixes: effd58c95f27774 ("dm: always call blk_queue_split() in dm_process_bio()") Cc: stable@vger.kernel.org Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Tested-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21dm: fix congested_fn for request-based deviceHou Tao
commit 974f51e8633f0f3f33e8f86bbb5ae66758aa63c7 upstream. We neither assign congested_fn for requested-based blk-mq device nor implement it correctly. So fix both. Also, remove incorrect comment from dm_init_normal_md_queue and rename it to dm_init_congested_fn. Fixes: 4aa9c692e052 ("bdi: separate out congested state into a separate struct") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-21dm: report suspended device during destroyMikulas Patocka
commit adc0daad366b62ca1bce3e2958a40b0b71a8b8b3 upstream. The function dm_suspended returns true if the target is suspended. However, when the target is being suspended during unload, it returns false. An example where this is a problem: the test "!dm_suspended(wc->ti)" in writecache_writeback is not sufficient, because dm_suspended returns zero while writecache_suspend is in progress. As is, without an enhanced dm_suspended, simply switching from flush_workqueue to drain_workqueue still emits warnings: workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries writecache_suspend calls flush_workqueue(wc->writeback_wq) - this function flushes the current work. However, the workqueue may re-queue itself and flush_workqueue doesn't wait for re-queued works to finish. Because of this - the function writecache_writeback continues execution after the device was suspended and then concurrently with writecache_dtr, causing a crash in writecache_writeback. We must use drain_workqueue - that waits until the work and all re-queued works finish. As a prereq for switching to drain_workqueue, this commit fixes dm_suspended to return true after the presuspend hook and before the postsuspend hook - just like during a normal suspend. It allows simplifying the dm-integrity and dm-writecache targets so that they don't have to maintain suspended flags on their own. With this change use of drain_workqueue() can be used effectively. This change was tested with the lvm2 testsuite and cryptsetup testsuite and the are no regressions. Fixes: 48debafe4f2f ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18+ Reported-by: Corey Marthaler <cmarthal@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-04dm: fix potential for q->make_request_fn NULL pointerMike Snitzer
commit 47ace7e012b9f7ad71d43ac9063d335ea3d6820b upstream. Move blk_queue_make_request() to dm.c:alloc_dev() so that q->make_request_fn is never NULL during the lifetime of a DM device (even one that is created without a DM table). Otherwise generic_make_request() will crash simply by doing: dmsetup create -n test mount /dev/dm-N /mnt While at it, move ->congested_data initialization out of dm.c:alloc_dev() and into the bio-based specific init method. Reported-by: Stefan Bader <stefan.bader@canonical.com> BugLink: https://bugs.launchpad.net/bugs/1860231 Fixes: ff36ab34583a ("dm: remove request-based logic from make_request_fn wrapper") Depends-on: c12c9a3c3860c ("dm: various cleanups to md->queue initialization code") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2019-05-25Merge tag 'libnvdimm-fixes-5.2-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - Fix a regression that disabled device-mapper dax support - Remove unnecessary hardened-user-copy overhead (>30%) for dax read(2)/write(2). - Fix some compilation warnings. * tag 'libnvdimm-fixes-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead dax: Arrange for dax_supported check to span multiple devices libnvdimm: Fix compilation warnings with W=1
2019-05-21dm: make sure to obey max_io_len_target_boundaryMichael Lass
Commit 61697a6abd24 ("dm: eliminate 'split_discard_bios' flag from DM target interface") incorrectly removed code from __send_changing_extent_only() that is required to impose a per-target IO boundary on IO that exceeds max_io_len_target_boundary(). Otherwise "special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond where allowed. Fix this by restoring the max_io_len_target_boundary() limit in __send_changing_extent_only() Fixes: 61697a6abd24 ("dm: eliminate 'split_discard_bios' flag from DM target interface") Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Michael Lass <bevan@bi-co.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-05-20dax: Arrange for dax_supported check to span multiple devicesDan Williams
Pankaj reports that starting with commit ad428cdb525a "dax: Check the end of the block-device capacity with dax_direct_access()" device-mapper no longer allows dax operation. This results from the stricter checks in __bdev_dax_supported() that validate that the start and end of a block-device map to the same 'pagemap' instance. Teach the dax-core and device-mapper to validate the 'pagemap' on a per-target basis. This is accomplished by refactoring the bdev_dax_supported() internals into generic_fsdax_supported() which takes a sector range to validate. Consequently generic_fsdax_supported() is suitable to be used in a device-mapper ->iterate_devices() callback. A new ->dax_supported() operation is added to allow composite devices to split and route upper-level bdev_dax_supported() requests. Fixes: ad428cdb525a ("dax: Check the end of the block-device...") Cc: <stable@vger.kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-05-16dm: fix a couple brace coding style issuesSheetal Singala
Signed-off-by: Sheetal Singala <2396sheetal@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-26dm: only initialize md->dax_dev if CONFIG_DAX_DRIVER is enabledPeng Wang
md->dax_dev defaults to NULL and there is no need to initialize it if CONFIG_DAX_DRIVER is disabled. Signed-off-by: Peng Wang <rocking@whu.edu.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-04dm: disable DISCARD if the underlying storage no longer supports itMike Snitzer
Storage devices which report supporting discard commands like WRITE_SAME_16 with unmap, but reject discard commands sent to the storage device. This is a clear storage firmware bug but it doesn't change the fact that should a program cause discards to be sent to a multipath device layered on this buggy storage, all paths can end up failed at the same time from the discards, causing possible I/O loss. The first discard to a path will fail with Illegal Request, Invalid field in cdb, e.g.: kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current] kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00 kernel: blk_update_request: critical target error, dev sdfn, sector 10487808 The SCSI layer converts this to the BLK_STS_TARGET error number, the sd device disables its support for discard on this path, and because of the BLK_STS_TARGET error multipath fails the discard without failing any path or retrying down a different path. But subsequent discards can cause path failures. Any discards sent to the path which already failed a discard ends up failing with EIO from blk_cloned_rq_check_limits with an "over max size limit" error since the discard limit was set to 0 by the sd driver for the path. As the error is EIO, this now fails the path and multipath tries to send the discard down the next path. This cycle continues as discards are sent until all paths fail. Fix this by training DM core to disable DISCARD if the underlying storage already did so. Also, fix branching in dm_done() and clone_endio() to reflect the mutually exclussive nature of the IO operations in question. Cc: stable@vger.kernel.org Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm: revert 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * ↵Mikulas Patocka
PAGE_SIZE") The limit was already incorporated to dm-crypt with commit 4e870e948fba ("dm crypt: fix error with too large bios"), so we don't need to apply it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is wrong anyway because the variable ti->max_io_len it is supposed to be in the units of 512-byte sectors not in bytes. Reduction of the limit to 1048576 sectors could even cause data corruption in rare cases - suppose that we have a dm-striped device with stripe size 768MiB. The target will call dm_set_target_max_io_len with the value 1572864. The buggy code would reduce it to 1048576. Now, the dm-core will errorneously split the bios on 1048576-sector boundary insetad of 1572864-sector boundary and pass these stripe-crossing bios to the striped target. Cc: stable@vger.kernel.org # v4.16+ Fixes: 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-03-05dm: always call blk_queue_split() in dm_process_bio()Mike Snitzer
Do not just call blk_queue_split() if the bio is_abnormal_io(). Fixes: 568c73a355e ("dm: update dm_process_bio() to split bio if in ->make_request_fn()") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-03-05dm: remove unused _rq_tio_cache and _rq_cacheMike Snitzer
Also move dm_rq_target_io structure definition from dm-rq.h to dm-rq.c Fixes: 6a23e05c2fe3c6 ("dm: remove legacy request-based IO path") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-20dm: eliminate 'split_discard_bios' flag from DM target interfaceMike Snitzer
There is no need to have DM core split discards on behalf of a DM target now that blk_queue_split() handles splitting discards based on the queue_limits. A DM target just needs to set max_discard_sectors, discard_granularity, etc, in queue_limits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-19dm: update dm_process_bio() to split bio if in ->make_request_fn()Mike Snitzer
Must call blk_queue_split() otherwise queue_limits for abnormal requests (e.g. discard, writesame, etc) won't be imposed. In addition, add dm_queue_split() to simplify DM specific splitting that is needed for targets that impose ti->max_io_len. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-06dm: don't use bio_trim() afterallMike Snitzer
bio_trim() has an early return, which makes it _not_ idempotent, if the offset is 0 and the bio's bi_size already matches the requested size. Prior to DM, all users of bio_trim() were fine with this. But DM has exposed the fact that bio_trim()'s early return is incompatible with a cloned bio whose integrity payload must be trimmed via bio_integrity_trim(). Fix this by reverting DM back to doing the equivalent of bio_trim() but in an idempotent manner (so bio_integrity_trim is always performed). Follow-on work is needed to assess what benefit bio_trim()'s early return is providing to its existing callers. Reported-by: Milan Broz <gmazyland@gmail.com> Fixes: 57c36519e4b94 ("dm: fix clone_bio() to trigger blk_recount_segments()") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-06dm: add memory barrier before waitqueue_activeMikulas Patocka
Block core changes to switch bio-based IO accounting to be percpu had a side-effect of altering DM core to now rely on calling waitqueue_active (in both bio-based and request-based) to check if another task is in dm_wait_for_completion(). A memory barrier is needed before calling waitqueue_active(). DM core doesn't piggyback on a preceding memory barrier so it must explicitly use its own. For more details on why using waitqueue_active() without a preceding barrier is unsafe, please see the comment before the waitqueue_active() definition in include/linux/wait.h. Add the missing memory barrier by switching to using wq_has_sleeper(). Fixes: 6f75723190d8 ("dm: remove the pending IO accounting") Fixes: c4576aed8d85 ("dm: fix request-based dm's use of dm_wait_for_completion") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-01-22dm: add missing trace_block_split() to __split_and_process_bio()Mike Snitzer
Provides useful context about bio splits in blktrace. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-01-22dm: fix dm_wq_work() to only use __split_and_process_bio() if appropriateMike Snitzer
Otherwise targets that don't support/expect IO splitting could resubmit bios using code paths with unnecessary IO splitting complexity. Depends-on: 24113d487843 ("dm: avoid indirect call in __dm_make_request") Fixes: 978e51ba38e00 ("dm: optimize bio-based NVMe IO submission") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-01-21dm: fix redundant IO accounting for bios that need splittingMike Snitzer
The risk of redundant IO accounting was not taken into consideration when commit 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk") introduced IO splitting in terms of recursion via generic_make_request(). Fix this by subtracting the split bio's payload from the IO stats that were already accounted for by start_io_acct() upon dm_make_request() entry. This repeat oscillation of the IO accounting, up then down, isn't ideal but refactoring DM core's IO splitting to pre-split bios _before_ they are accounted turned out to be an excessive amount of change that will need a full development cycle to refine and verify. Before this fix: /dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so bios are split on 32k boundaries. # fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \ --iodepth=1 --ioengine=libaio --direct=1 --refill_buffers with debugging added: [103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128 [103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio: [103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64 ... 16M written yet 136M (278528 * 512b) accounted: # cat /sys/block/dm-2/stat | awk '{ print $7 }' 278528 After this fix: 16M written and 16M (32768 * 512b) accounted: # cat /sys/block/dm-2/stat | awk '{ print $7 }' 32768 Fixes: 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk") Cc: stable@vger.kernel.org # 4.16+ Reported-by: Bryan Gurney <bgurney@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-01-21dm: fix clone_bio() to trigger blk_recount_segments()Mike Snitzer
DM's clone_bio() now benefits from using bio_trim() by fixing the fact that clone_bio() wasn't clearing BIO_SEG_VALID like bio_trim() does; which triggers blk_recount_segments() via bio_phys_segments(). Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-28Merge tag 'for-4.21/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Eliminate a couple indirect calls from bio-based DM core. - Fix DM to allow reads that exceed readahead limits by setting io_pages in the backing_dev_info. - A couple code cleanups in request-based DM. - Fix various DM targets to check for device sector overflow if CONFIG_LBDAF is not set. - Use u64 instead of sector_t to store iv_offset in DM crypt; sector_t isn't large enough on 32bit when CONFIG_LBDAF is not set. - Performance fixes to DM's kcopyd and the snapshot target focused on limiting memory use and workqueue stalls. - Fix typos in the integrity and writecache targets. - Log which algorithm is used for dm-crypt's encryption and dm-integrity's hashing. - Fix false -EBUSY errors in DM raid target's handling of check/repair messages. - Fix DM flakey target's corrupt_bio_byte feature to reliably corrupt the Nth byte in a bio's payload. * tag 'for-4.21/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: do not allow readahead to limit IO size dm raid: fix false -EBUSY when handling check/repair message dm rq: cleanup leftover code from recently removed q->mq_ops branching dm verity: log the hash algorithm implementation dm crypt: log the encryption algorithm implementation dm integrity: fix spelling mistake in workqueue name dm flakey: Properly corrupt multi-page bios. dm: Check for device sector overflow if CONFIG_LBDAF is not set dm crypt: use u64 instead of sector_t to store iv_offset dm kcopyd: Fix bug causing workqueue stalls dm snapshot: Fix excessive memory usage and workqueue stalls dm bufio: update comment in dm-bufio.c dm writecache: fix typo in error msg for creating writecache_flush_thread dm: remove indirect calls from __send_changing_extent_only() dm mpath: only flush workqueue when needed dm rq: remove unused arguments from rq_completed() dm: avoid indirect call in __dm_make_request
2018-12-28Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "This is the main pull request for block/storage for 4.21. Larger than usual, it was a busy round with lots of goodies queued up. Most notable is the removal of the old IO stack, which has been a long time coming. No new features for a while, everything coming in this week has all been fixes for things that were previously merged. This contains: - Use atomic counters instead of semaphores for mtip32xx (Arnd) - Cleanup of the mtip32xx request setup (Christoph) - Fix for circular locking dependency in loop (Jan, Tetsuo) - bcache (Coly, Guoju, Shenghui) * Optimizations for writeback caching * Various fixes and improvements - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith) * host and target support for NVMe over TCP * Error log page support * Support for separate read/write/poll queues * Much improved polling * discard OOM fallback * Tracepoint improvements - lightnvm (Hans, Hua, Igor, Matias, Javier) * Igor added packed metadata to pblk. Now drives without metadata per LBA can be used as well. * Fix from Geert on uninitialized value on chunk metadata reads. * Fixes from Hans and Javier to pblk recovery and write path. * Fix from Hua Su to fix a race condition in the pblk recovery code. * Scan optimization added to pblk recovery from Zhoujie. * Small geometry cleanup from me. - Conversion of the last few drivers that used the legacy path to blk-mq (me) - Removal of legacy IO path in SCSI (me, Christoph) - Removal of legacy IO stack and schedulers (me) - Support for much better polling, now without interrupts at all. blk-mq adds support for multiple queue maps, which enables us to have a map per type. This in turn enables nvme to have separate completion queues for polling, which can then be interrupt-less. Also means we're ready for async polled IO, which is hopefully coming in the next release. - Killing of (now) unused block exports (Christoph) - Unification of the blk-rq-qos and blk-wbt wait handling (Josef) - Support for zoned testing with null_blk (Masato) - sx8 conversion to per-host tag sets (Christoph) - IO priority improvements (Damien) - mq-deadline zoned fix (Damien) - Ref count blkcg series (Dennis) - Lots of blk-mq improvements and speedups (me) - sbitmap scalability improvements (me) - Make core inflight IO accounting per-cpu (Mikulas) - Export timeout setting in sysfs (Weiping) - Cleanup the direct issue path (Jianchao) - Export blk-wbt internals in block debugfs for easier debugging (Ming) - Lots of other fixes and improvements" * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits) kyber: use sbitmap add_wait_queue/list_del wait helpers sbitmap: add helpers for add/del wait queue handling block: save irq state in blkg_lookup_create() dm: don't reuse bio for flushes nvme-pci: trace SQ status on completions nvme-rdma: implement polling queue map nvme-fabrics: allow user to pass in nr_poll_queues nvme-fabrics: allow nvmf_connect_io_queue to poll nvme-core: optionally poll sync commands block: make request_to_qc_t public nvme-tcp: fix spelling mistake "attepmpt" -> "attempt" nvme-tcp: fix endianess annotations nvmet-tcp: fix endianess annotations nvme-pci: refactor nvme_poll_irqdisable to make sparse happy nvme-pci: only set nr_maps to 2 if poll queues are supported nvmet: use a macro for default error location nvmet: fix comparison of a u16 with -1 blk-mq: enable IO poll if .nr_queues of type poll > 0 blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() blk-mq: skip zero-queue maps in blk_mq_map_swqueue ...
2018-12-19dm: don't reuse bio for flushesJens Axboe
DM currently has a statically allocated bio that it uses to issue empty flushes. It doesn't submit this bio, it just uses it for maintaining state while setting up clones. Multiple users can access this bio at the same time. This wasn't previously an issue, even if it was a bit iffy, but with the blkg associations it can become one. We setup the blkg association, then clone bio's and submit, then remove the blkg assocation again. But since we can have multiple tasks doing this at the same time, against multiple blkg's, then we can either lose references to a blkg, or put it twice. The latter causes complaints on the percpu ref being <= 0 when released, and can cause use-after-free as well. Ming reports that xfstest generic/475 triggers this: ------------[ cut here ]------------ percpu ref (blkg_release) <= 0 (0) after switching to atomic WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0 Switch to just using an on-stack bio for this, and get rid of the embedded bio. Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") Reported-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18dm: remove indirect calls from __send_changing_extent_only()Mike Snitzer
No need to be so fancy. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-18dm: avoid indirect call in __dm_make_requestMikulas Patocka
Indirect calls are inefficient because of retpolines that are used for spectre workaround. This patch replaces an indirect call with a condition (that can be predicted by the branch predictor). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-17blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()Jens Axboe
There's a single user of this function, dm, and dm just wants to check if IO is inflight, not that it's just allocated. This fixes a hang with srp/002 in blktests with dm, where it tries to suspend but waits for inflight IO to finish first. As it checks for just allocated requests, this fails. Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11dm: fix request-based dm's use of dm_wait_for_completionMike Snitzer
The md->wait waitqueue is used by both bio-based and request-based DM. Commit dbd3bbd291 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO") lost sight of the requirement that dm_wait_for_completion() must work with all types of DM devices. Fix md_in_flight() to call the blk-mq or bio-based method accordingly. Fixes: dbd3bbd291 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10dm: fix inflight IO checkJens Axboe
After switching to percpu inflight counters, the inflight check is totally buggy. It's perfectly valid for some counters to be non-zero while having a total inflight IO count of 0, that's how these kinds of counters work (inc on one CPU, dec on another). Fix the md_in_flight() check to sum all counters before returning a false positive, potentially. While at it, remove the inflight read for IO completion. We don't need it, just wake anyone that's waiting for the IO count to drop to zero. The caller needs to re-check that value anyway when woken, which it does. Fixes: 6f75723190d8 ("dm: remove the pending IO accounting") Acked-by: Mike Snitzer <snitzer@redhat.com> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10dm: remove the pending IO accountingMikulas Patocka
Remove the "pending" atomic counters, that duplicate block-core's in_flight counters, and update md_in_flight() to look at percpu in_flight counters. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10dm: dont rewrite dm_disk(md)->part0.in_flightMikulas Patocka
generic_start_io_acct and generic_end_io_acct already update the variable in_flight using atomic operations, so we don't have to overwrite them again. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07dm: set the static flush bio device on demandDennis Zhou
The next patch changes the macro bio_set_dev() to associate a bio with a blkg based on the device set. However, dm creates a static bio to be used as the basis for cloning empty flush bios on creation. The bio_set_dev() call in alloc_dev() will cause problems with the next patch adding association to bio_set_dev() because the call is before the bdev is associated with a gendisk (bd_disk is %NULL). To get around this, set the device on the static bio every time and use that to clone to the other bios. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07dm: call blk_queue_split() to impose device limits on biosMike Snitzer
Otherwise the incoming bios, of various types, won't be shaped based on the DM device's advertised limits. Depends-on: af67c31fba ("blk: remove bio_set arg from blk_queue_split()") Fixes: 744889b7cb ("block: don't deal with discard limit in blkdev_issue_discard()") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-11-15block: remove the lock argument to blk_alloc_queue_nodeChristoph Hellwig
With the legacy request path gone there is no real need to override the queue_lock. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-26Merge tag 'for-4.20/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - The biggest change this cycle is to remove support for the legacy IO path (.request_fn) from request-based DM. Jens has already started preparing for complete removal of the legacy IO path in 4.21 but this earlier removal of support from DM has been coordinated with Jens (as evidenced by the commit being attributed to him). Making request-based DM exclussively blk-mq only cleans up that portion of DM core quite nicely. - Convert the thinp and zoned targets over to using refcount_t where applicable. - A couple fixes to the DM zoned target for refcounting and other races buried in the implementation of metadata block creation and use. - Small cleanups to remove redundant unlikely() around a couple WARN_ON_ONCE(). - Simplify how dm-ioctl copies from userspace, eliminating some potential for a malicious user trying to change the executed ioctl after its processing has begun. - Tweaked DM crypt target to use the DM device name when naming the various workqueues created for a particular DM crypt device (makes the N workqueues for a DM crypt device more easily understood and enhances user's accounting capabilities at a glance via "ps") - Small fixup to remove dead branch in DM writecache's memory_entry(). * tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm writecache: remove disabled code in memory_entry() dm zoned: fix various dmz_get_mblock() issues dm zoned: fix metadata block ref counting dm raid: avoid bitmap with raid4/5/6 journal device dm crypt: make workqueue names device-specific dm: add dm_table_device_name() dm ioctl: harden copy_params()'s copy_from_user() from malicious users dm: remove unnecessary unlikely() around WARN_ON_ONCE() dm zoned: target: use refcount_t for dm zoned reference counters dm thin: use refcount_t for thin_c reference counting dm table: require that request-based DM be layered on blk-mq devices dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED dm: remove legacy request-based IO path
2018-10-25block: add a report_zones methodChristoph Hellwig
Dispatching a report zones command through the request queue is a major pain due to the command reply payload rewriting necessary. Given that blkdev_report_zones() is executing everything synchronously, implement report zones as a block device file operation instead, allowing major simplification of the code in many places. sd, null-blk, dm-linear and dm-flakey being the only block device drivers supporting exposing zoned block devices, these drivers are modified to provide the device side implementation of the report_zones() block device file operation. For device mappers, a new report_zones() target type operation is defined so that the upper block layer calls blkdev_report_zones() can be propagated down to the underlying devices of the dm targets. Implementation for this new operation is added to the dm-linear and dm-flakey targets. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [Damien] * Changed method block_device argument to gendisk * Various bug fixes and improvements * Added support for null_blk, dm-linear and dm-flakey. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-16dm: remove unnecessary unlikely() around WARN_ON_ONCE()Igor Stoppa
WARN_ON() already contains an unlikely(), so it's not necessary to wrap it into another. Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-11dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASEDMike Snitzer
Now that request-based DM is only using blk-mq, there is no need to differentiate between legacy "rq" and new "mq". We're back to a single request-based DM -- and there was much rejoicing! Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-11dm: remove legacy request-based IO pathJens Axboe
dm supports both, and since we're killing off the legacy path in general, get rid of it in dm. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-09dm: fix report zone remapping to account for partition offsetDamien Le Moal
If dm-linear or dm-flakey are layered on top of a partition of a zoned block device, remapping of the start sector and write pointer position of the zones reported by a report zones BIO must be modified to account for the target table entry mapping (start offset within the device and entry mapping with the dm device). If the target's backing device is a partition of a whole disk, the start sector on the physical device of the partition must also be accounted for when modifying the zone information. However, dm_remap_zone_report() was not considering this last case, resulting in incorrect zone information remapping with targets using disk partitions. Fix this by calculating the target backing device start sector using the position of the completed report zones BIO and the unchanged position and size of the original report zone BIO. With this value calculated, the start sector and write pointer position of the target zones can be correctly remapped. Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-07-18block: Add and use op_stat_group() for indexing disk_stat fields.Michael Callahan
Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should et updated. In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit and it uses op_stat_group() on the parameter to determine the stat group. Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. It's now indexed by op_is_write(). tj: Refreshed on top of v4.17. Updated to pass around REQ_OP. Signed-off-by: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-28dm: prevent DAX mounts if not supportedRoss Zwisler
Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX flag is set on the device's request queue to decide whether or not the device supports filesystem DAX. Really we should be using bdev_dax_supported() like filesystems do at mount time. This performs other tests like checking to make sure the dax_direct_access() path works. We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if any of the underlying devices do not support DAX. This makes the handling of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags in dm_table_set_restrictions(). Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this will ensure that filesystems built upon DM devices will only be able to mount with DAX if all underlying devices also support DAX. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support") Cc: stable@vger.kernel.org Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-22dm: use bio_split() when splitting out the already processed bioMike Snitzer
Use of bio_clone_bioset() is inefficient if there is no need to clone the original bio's bio_vec array. Best to use the bio_clone_fast() variant. Also, just using bio_advance() is only part of what is needed to properly setup the clone -- it doesn't account for the various bio_integrity() related work that also needs to be performed (see bio_split). Address both of these issues by switching from bio_clone_bioset() to bio_split(). Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk") Cc: stable@vger.kernel.org # 4.15+, requires removal of '&' before md->queue->bio_split Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-08Merge tag 'libnvdimm-for-4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This adds a user for the new 'bytes-remaining' updates to memcpy_mcsafe() that you already received through Ingo via the x86-dax- for-linus pull. Not included here, but still targeting this cycle, is support for handling memory media errors (poison) consumed via userspace dax mappings. Summary: - DAX broke a fundamental assumption of truncate of file mapped pages. The truncate path assumed that it is safe to disconnect a pinned page from a file and let the filesystem reclaim the physical block. With DAX the page is equivalent to the filesystem block. Introduce dax_layout_busy_page() to enable filesystems to wait for pinned DAX pages to be released. Without this wait a filesystem could allocate blocks under active device-DMA to a new file. - DAX arranges for the block layer to be bypassed and uses dax_direct_access() + copy_to_iter() to satisfy read(2) calls. However, the memcpy_mcsafe() facility is available through the pmem block driver. In order to safely handle media errors, via the DAX block-layer bypass, introduce copy_to_iter_mcsafe(). - Fix cache management policy relative to the ACPI NFIT Platform Capabilities Structure to properly elide cache flushes when they are not necessary. The table indicates whether CPU caches are power-fail protected. Clarify that a deep flush is always performed on REQ_{FUA,PREFLUSH} requests" * tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) dax: Use dax_write_cache* helpers libnvdimm, pmem: Do not flush power-fail protected CPU caches libnvdimm, pmem: Unconditionally deep flush on *sync libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH acpi, nfit: Remove ecc_unit_size dax: dax_insert_mapping_entry always succeeds libnvdimm, e820: Register all pmem resources libnvdimm: Debug probe times linvdimm, pmem: Preserve read-only setting for pmem devices x86, nfit_test: Add unit test for memcpy_mcsafe() pmem: Switch to copy_to_iter_mcsafe() dax: Report bytes remaining in dax_iomap_actor() dax: Introduce a ->copy_to_iter dax operation uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation xfs, dax: introduce xfs_break_dax_layouts() xfs: prepare xfs_break_layouts() for another layout type xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL mm, fs, dax: handle layout changes to pinned dax mappings mm: fix __gup_device_huge vs unmap mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS ...
2018-06-08dm: use bioset_init_from_src() to copy bio_setJens Axboe
We can't just copy and clear a bio_set, use the bio helper to setup a new bio_set with the settings from another one. Fixes: 6f1c819c219f ("dm: convert to bioset_init()/mempool_init()") Reported-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com> Tested-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com> Tested-by: Li Wang <liwang@redhat.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-04Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - clean up how we pass around gfp_t and blk_mq_req_flags_t (Christoph) - prepare us to defer scheduler attach (Christoph) - clean up drivers handling of bounce buffers (Christoph) - fix timeout handling corner cases (Christoph/Bart/Keith) - bcache fixes (Coly) - prep work for bcachefs and some block layer optimizations (Kent). - convert users of bio_sets to using embedded structs (Kent). - fixes for the BFQ io scheduler (Paolo/Davide/Filippo) - lightnvm fixes and improvements (Matias, with contributions from Hans and Javier) - adding discard throttling to blk-wbt (me) - sbitmap blk-mq-tag handling (me/Omar/Ming). - remove the sparc jsflash block driver, acked by DaveM. - Kyber scheduler improvement from Jianchao, making it more friendly wrt merging. - conversion of symbolic proc permissions to octal, from Joe Perches. Previously the block parts were a mix of both. - nbd fixes (Josef and Kevin Vigor) - unify how we handle the various kinds of timestamps that the block core and utility code uses (Omar) - three NVMe pull requests from Keith and Christoph, bringing AEN to feature completeness, file backed namespaces, cq/sq lock split, and various fixes - various little fixes and improvements all over the map * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits) blk-mq: update nr_requests when switching to 'none' scheduler block: don't use blocking queue entered for recursive bio submits dm-crypt: fix warning in shutdown path lightnvm: pblk: take bitmap alloc. out of critical section lightnvm: pblk: kick writer on new flush points lightnvm: pblk: only try to recover lines with written smeta lightnvm: pblk: remove unnecessary bio_get/put lightnvm: pblk: add possibility to set write buffer size manually lightnvm: fix partial read error path lightnvm: proper error handling for pblk_bio_add_pages lightnvm: pblk: fix smeta write error path lightnvm: pblk: garbage collect lines with failed writes lightnvm: pblk: rework write error recovery path lightnvm: pblk: remove dead function lightnvm: pass flag on graceful teardown to targets lightnvm: pblk: check for chunk size before allocating it lightnvm: pblk: remove unnecessary argument lightnvm: pblk: remove unnecessary indirection lightnvm: pblk: return NVM_ error on failed submission lightnvm: pblk: warn in case of corrupted write buffer ...
2018-05-30dm: convert to bioset_init()/mempool_init()Kent Overstreet
Convert dm to embedded bio sets. Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>