aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-table.c
AgeCommit message (Collapse)Author
2024-05-14Merge tag 'for-6.10/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add a dm-crypt optional "high_priority" flag that enables the crypt workqueues to use WQ_HIGHPRI. - Export dm-crypt workqueues via sysfs (by enabling WQ_SYSFS) to allow for improved visibility and controls over IO and crypt workqueues. - Fix dm-crypt to no longer constrain max_segment_size to PAGE_SIZE. This limit isn't needed given that the block core provides late bio splitting if bio exceeds underlying limits (e.g. max_segment_size). - Fix dm-crypt crypt_queue's use of WQ_UNBOUND to not use WQ_CPU_INTENSIVE because it is meaningless with WQ_UNBOUND. - Fix various issues with dm-delay target (ranging from a resource teardown fix, a fix for hung task when using kthread mode, and other improvements that followed from code inspection). * tag 'for-6.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-delay: remove timer_lock dm-delay: change locking to avoid contention dm-delay: fix max_delay calculations dm-delay: fix hung task introduced by kthread mode dm-delay: fix workqueue delay_timer race dm-crypt: don't set WQ_CPU_INTENSIVE for WQ_UNBOUND crypt_queue dm: use queue_limits_set dm-crypt: stop constraining max_segment_size to PAGE_SIZE dm-crypt: export sysfs of all workqueues dm-crypt: add the optional "high_priority" flag
2024-05-01dm: Check that a zoned table leads to a valid mapped deviceDamien Le Moal
Using targets such as dm-linear, a mapped device can be created to contain only conventional zones. Such device should not be treated as zoned as it does not contain any mandatory sequential write required zone. Since such device can be randomly written, we can modify dm_set_zones_restrictions() to set the mapped device zoned queue limit to false to expose it as a regular block device. The function dm_check_zoned() does this after counting the number of conventional zones of the mapped device and comparing it to the total number of zones reported. The special dm_check_zoned_cb() report zones callback function is used to count conventional zones. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Link: https://lore.kernel.org/r/20240501110907.96950-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-23dm: use queue_limits_setChristoph Hellwig
Use queue_limits_set which validates the limits and takes care of updating the readahead settings instead of directly assigning them to the queue. For that make sure all limits are actually updated before the assignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-11Revert "dm: use queue_limits_set"Linus Torvalds
This reverts commit 8e0ef412869430d114158fc3b9b1fb111e247bd3. It's broken, and causes the boot to fail on encrypted volumes. Reported-and-bisected-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/all/20240311235023.GA1205@cmpxchg.org/ Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-01dm: use queue_limits_setChristoph Hellwig
Use queue_limits_set which validates the limits and takes care of updating the readahead settings instead of directly assigning them to the queue. For that make sure all limits are actually updated before the assignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20240228225653.947152-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-01-30dm: limit the number of targets and parameter size areaMikulas Patocka
The kvmalloc function fails with a warning if the size is larger than INT_MAX. The warning was triggered by a syscall testing robot. In order to avoid the warning, this commit limits the number of targets to 1048576 and the size of the parameter area to 1073741824. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-12-19block: remove support for the host aware zone modelChristoph Hellwig
When zones were first added the SCSI and ATA specs, two different models were supported (in addition to the drive managed one that is invisible to the host): - host managed where non-conventional zones there is strict requirement to write at the write pointer, or else an error is returned - host aware where a write point is maintained if writes always happen at it, otherwise it is left in an under-defined state and the sequential write preferred zones behave like conventional zones (probably very badly performing ones, though) Not surprisingly this lukewarm model didn't prove to be very useful and was finally removed from the ZBC and SBC specs (NVMe never implemented it). Due to to the easily disappearing write pointer host software could never rely on the write pointer to actually be useful for say recovery. Fortunately only a few HDD prototypes shipped using this model which never made it to mass production. Drop the support before it is too late. Note that any such host aware prototype HDD can still be used with Linux as we'll now treat it as a conventional HDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-31dm error: Add support for zoned block devicesDamien Le Moal
dm-error is used in several test cases in the xfstests test suite to check the handling of IO errors in file systems. However, with several file systems getting native support for zoned block devices (e.g. btrfs and f2fs), dm-error's lack of zoned block device support creates problems as the file system attempts executing zone commands (e.g. a zone append operation) against a dm-error non-zoned block device, which causes various issues in the block layer (e.g. WARN_ON triggers). This commit adds supports for zoned block devices to dm-error, allowing a DM device table containing an error target to be exposed as a zoned block device (if all targets have a compatible zoned model support and mapping). This is done as follows: 1) Allow passing 2 arguments to an error target, similar to dm-linear: a backing device and a start sector. These arguments are optional and dm-error retains its characteristics if the arguments are not specified. 2) Implement the iterate_devices method so that dm-core can normally check the zone support and restrictions (e.g. zone alignment of the targets). When the backing device arguments are not specified, the iterate_devices method never calls the fn() argument. When no backing device is specified, as before, we assume that the DM device is not zoned. When the backing device arguments are specified, the zoned model of the DM device will depend on the backing device type: - If the backing device is zoned and its model and mapping is compatible with other targets of the device, the resulting device will be zoned, with the dm-error mapped portion always returning errors (similar to the default non-zoned case). - If the backing device is not zoned, then the DM device will not be either. This zone support for dm-error requires the definition of a functional report_zones operation so that dm_revalidate_zones() can operate correctly and resources for emulating zone append operations initialized. This is necessary for cases where dm-error is used to partially map a device and have an overall correct handling of zone append. This means that dm-error does not fail report zones operations. Two changes that are not obvious are included to avoid issues: 1) dm_table_supports_zoned_model() is changed to directly check if the backing device of a wildcard target (= dm-error target) is zoned. Otherwise, we wouldn't be able to catch the invalid setup of dm-error without a backing device (non zoned case) being combined with zoned targets. 2) dm_table_supports_dax() is modified to return false if the wildcard target is found. Otherwise, when dm-error is set without a backing device, we end up with a NULL pointer dereference in set_dax_synchronous (dax_dev is NULL). This is consistent with the current behavior because dm_table_supports_dax() always returned false for targets that do not define the iterate_devices method. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-09-14dm: fix a race condition in retrieve_depsMikulas Patocka
There's a race condition in the multipath target when retrieve_deps races with multipath_message calling dm_get_device and dm_put_device. retrieve_deps walks the list of open devices without holding any lock but multipath may add or remove devices to the list while it is running. The end result may be memory corruption or use-after-free memory access. See this description of a UAF with multipath_message(): https://listman.redhat.com/archives/dm-devel/2022-October/052373.html Fix this bug by introducing a new rw semaphore "devices_lock". We grab devices_lock for read in retrieve_deps and we grab it for write in dm_get_device and dm_put_device. Reported-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Tested-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05dm: only call early_lookup_bdev from early boot contextChristoph Hellwig
early_lookup_bdev is supposed to only be called from the early boot code, but dm_get_device calls it as a general fallback when lookup_bdev fails, which is problematic because early_lookup_bdev bypasses all normal path based permission checking, and might cause problems with certain container environments renaming devices. Switch to only call early_lookup_bdev when dm is built-in and the system state in not running yet. This means it is still available when tables are constructed by dm-init.c from the kernel command line, but not otherwise. Note that this strictly speaking changes the kernel ABI as the PARTUUID= and PARTLABEL= style syntax is now not available during a running systems. They never were intended for that, but this breaks things we'll have to figure out a way to make them available again. But if avoidable in any way I'd rather avoid that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20230531125535.676098-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05dm: remove dm_get_dev_tChristoph Hellwig
Open code dm_get_dev_t in the only remaining caller, and propagate the exact error code from lookup_bdev and early_lookup_bdev. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05init: improve the name_to_dev_t interfaceChristoph Hellwig
name_to_dev_t has a very misleading name, that doesn't make clear it should only be used by the early init code, and also has a bad calling convention that doesn't allow returning different kinds of errors. Rename it to early_lookup_bdev to make the use case clear, and return an errno, where -EINVAL means the string could not be parsed, and -ENODEV means it the string was valid, but there was no device found for it. Also stub out the whole call for !CONFIG_BLOCK as all the non-block root cases are always covered in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-26Merge tag 'for-6.4/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Split dm-bufio's rw_semaphore and rbtree. Offers improvements to dm-bufio's locking to allow increased concurrent IO -- particularly for read access for buffers already in dm-bufio's cache. - Also split dm-bio-prison-v1's spinlock and rbtree with comparable aim at improving concurrent IO (for the DM thinp target). - Both the dm-bufio and dm-bio-prison-v1 scaling of the number of locks and rbtrees used are managed by dm_num_hash_locks(). And the hash function used by both is dm_hash_locks_index(). - Allow DM targets to require DISCARD, WRITE_ZEROES and SECURE_ERASE to be split at the target specified boundary (in terms of max_discard_sectors, max_write_zeroes_sectors and max_secure_erase_sectors respectively). - DM verity error handling fix for check_at_most_once on FEC. - Update DM verity target to emit audit events on verification failure and more. - DM core ->io_hints improvements needed in support of new discard support that is added to the DM "zero" and "error" targets. - Fix missing kmem_cache_destroy() call in initialization error path of both the DM integrity and DM clone targets. - A couple fixes for DM flakey, also add "error_reads" feature. - Fix DM core's resume to not lock FS when the DM map is NULL; otherwise initial table load can race with FS mount that takes superblock's ->s_umount rw_semaphore. - Various small improvements to both DM core and DM targets. * tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits) dm: don't lock fs when the map is NULL in process of resume dm flakey: add an "error_reads" option dm flakey: remove trailing space in the table line dm flakey: fix a crash with invalid table line dm ioctl: fix nested locking in table_clear() to remove deadlock concern dm: unexport dm_get_queue_limits() dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE dm: add helper macro for simple DM target module init and exit dm raid: remove unused d variable dm: remove unnecessary (void*) conversions dm mirror: add DMERR message if alloc_workqueue fails dm: push error reporting down to dm_register_target() dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path dm clone: call kmem_cache_destroy() in dm_clone_init() error path dm error: add discard support dm zero: add discard support dm table: allow targets without devices to set ->io_hints dm verity: emit audit events on verification failure and more dm verity: fix error handling for check_at_most_once on FEC dm: improve hash_locks sizing and hash function ...
2023-04-04dm table: allow targets without devices to set ->io_hintsMikulas Patocka
In dm_calculate_queue_limits, add call to ->io_hints hook if the target doesn't provide ->iterate_devices. This is needed so the "error" and "zero" targets may support discards. The 2 following commits will add their respective discard support. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-03-16blk-crypto: make blk_crypto_evict_key() return voidEric Biggers
blk_crypto_evict_key() is only called in contexts such as inode eviction where failure is not an option. So there is nothing the caller can do with errors except log them. (dm-table.c does "use" the error code, but only to pass on to upper layers, so it doesn't really count.) Just make blk_crypto_evict_key() return void and log errors itself. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-14dm: avoid split of quoted strings where possibleHeinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: fix trailing statementsHeinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: fix undue/missing spacesHeinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: address indent/space issuesHeinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: avoid assignment in if conditionsHeinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: change "unsigned" to "unsigned int"Heinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: add missing SPDX-License-IndentifiersHeinz Mauelshagen
'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has deprecated its use. Suggested-by: John Wiele <jwiele@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-12dm table: check that a dm device doesn't reference itselfBenjamin Marzinski
If a DM device's table references itself, it will crash the kernel with an infinite recursion. Check for a self-reference in dm_get_device(). This is a quick check, but it won't catch more complicated circular references. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-11-21blk-crypto: don't use struct request_queue for public interfacesChristoph Hellwig
Switch all public blk-crypto interfaces to use struct block_device arguments to specify the device they operate on instead of th request_queue, which is a block layer implementation detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221114042944.1009870-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-18dm: change from DMWARN to DMERR or DMCRIT for fatal errorsMikulas Patocka
Change DMWARN to DMERR in cases when there is an unrecoverable error. Change DMWARN to DMCRIT when handling of a case is unimplemented. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-09-27block: replace blk_queue_nowait with bdev_nowaitChristoph Hellwig
Replace blk_queue_nowait with a bdev_nowait helpers that takes the block_device given that the I/O submission path should not have to look into the request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20220927075815.269694-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02Merge tag 'for-6.0/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Refactor DM core's mempool allocation so that it clearer by not being split acorss files. - Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling. - Optimize DM core's more common bio splitting by eliminating the use of bio cloning with bio_split+bio_chain. Shift that cloning cost to the relatively unlikely dm_io requeue case that only occurs during error handling. Introduces dm_io_rewind() that will clone a bio that reflects the subset of the original bio that must be requeued. - Remove DM core's dm_table_get_num_targets() wrapper and audit all dm_table_get_target() callers. - Fix potential for OOM with DM writecache target by setting a default MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory, whichever is smaller). - Fix DM writecache target's stats that are reported through DM-specific table info. - Fix use-after-free crash in dm_sm_register_threshold_callback(). - Refine DM core's Persistent Reservation handling in preparation for broader work Mike Christie is doing to add compatibility with Microsoft Windows Failover Cluster. - Fix various KASAN reported bugs in the DM raid target. - Fix DM raid target crash due to md_handle_request() bio splitting that recurses to block core without properly initializing the bio's bi_dev. - Fix some code comment typos and fix some Documentation formatting. * tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits) dm: fix dm-raid crash if md_handle_request() splits bio dm raid: fix address sanitizer warning in raid_resume dm raid: fix address sanitizer warning in raid_status dm: Start pr_preempt from the same starting path dm: Fix PR release handling for non All Registrants dm: Start pr_reserve from the same starting path dm: Allow dm_call_pr to be used for path searches dm: return early from dm_pr_call() if DM device is suspended dm thin: fix use-after-free crash in dm_sm_register_threshold_callback dm writecache: count number of blocks discarded, not number of discard bios dm writecache: count number of blocks written, not number of write bios dm writecache: count number of blocks read, not number of read bios dm writecache: return void from functions dm kcopyd: use __GFP_HIGHMEM when allocating pages dm writecache: set a default MAX_WRITEBACK_JOBS Documentation: dm writecache: Render status list as list Documentation: dm writecache: add blank line before optional parameters dm snapshot: fix typo in snapshot_map() comment dm raid: remove redundant "the" in parse_raid_params() comment dm cache: fix typo in 2 comment blocks ...
2022-07-07dm table: rename dm_target variable in dm_table_add_target()Mike Snitzer
Rename from "tgt" to "ti" so that all of dm-table.c code uses the same naming for dm_target variables. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm table: audit all dm_table_get_target() callersMike Snitzer
All callers of dm_table_get_target() are expected to do proper bounds checking on the index they pass. Move dm_table_get_target() to dm-core.h to make it extra clear that only DM core code should be using it. Switch it to be inlined while at it. Standardize all DM core callers to use the same for loop pattern and make associated variables as local as possible. Rename some variables (e.g. s/table/t/ and s/tgt/ti/) along the way. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-07dm table: remove dm_table_get_num_targets() wrapperMike Snitzer
More efficient and readable to just access table->num_targets directly. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-06block: remove blk_queue_zone_sectorsChristoph Hellwig
Always use bdev_zone_sectors instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: use bdev_is_zoned instead of open coding itChristoph Hellwig
Use bdev_is_zoned in all places where a block_device is available instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29dm: refactor dm_md_mempool allocationChristoph Hellwig
The current split between dm_table_alloc_md_mempools and dm_alloc_md_mempools is rather arbitrary, so merge the two into one easy to follow function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-08dm: fix bio_set allocationChristoph Hellwig
The use of bioset_init_from_src mean that the pre-allocated pools weren't used for anything except parameter passing, and the integrity pool creation got completely lost for the actual live mapped_device. Fix that by assigning the actual preallocated dm_md_mempools to the mapped_device and using that for I/O instead of creating new mempools. Fixes: 2a2a4c510b76 ("dm: use bioset_init_from_src() to copy bio_set") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-05-31dm table: fix dm_table_supports_poll to return false if no data devicesMike Snitzer
It was reported that the "generic/250" test in xfstests (which uses the dm-error target) demonstrates a regression where the kernel crashes in bioset_exit(). Since commit cfc97abcbe0b ("dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset") the bioset_init() for the dm_io bioset will setup the bioset's per-cpu alloc cache if all devices have QUEUE_FLAG_POLL set. But there was an bug where a target that doesn't have any data devices (and that doesn't even set the .iterate_devices dm target callback) will incorrectly return true from dm_table_supports_poll(). Fix this by updating dm_table_supports_poll() to follow dm-table.c's well-worn pattern for testing that _all_ targets in a DM table do in fact have underlying devices that set QUEUE_FLAG_POLL. NOTE: An additional block fix is still needed so that bio_alloc_cache_destroy() clears the bioset's ->cache member. Otherwise, a DM device's table reload that transitions the DM device's bioset from using a per-cpu alloc cache to _not_ using one will result in bioset_exit() crashing in bio_alloc_cache_destroy() because dm's dm_io bioset ("io_bs") was left with a stale ->cache member. Fixes: cfc97abcbe0b ("dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset") Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-05-05dm: conditionally enable branching for less used featuresMike Snitzer
Use jump_labels to further reduce cost of unlikely branches for zoned block devices, dm-stats and swap_bios throttling. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-05-05dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io biosetMike Snitzer
A bioset's per-cpu alloc cache may have broader utility in the future but for now constrain it to being tightly coupled to QUEUE_FLAG_POLL. Also change dm_io_complete() to use bio_clear_polled() so that it properly clears all associated bio state on requeue. This commit improves DM's hipri bio polling (REQ_POLLED) perf by 7 - 20% depending on the system. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-17block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARDChristoph Hellwig
Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: remove QUEUE_FLAG_DISCARDChristoph Hellwig
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_stable_writes helperChristoph Hellwig
Add a helper to check the stable writes flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17block: add a bdev_nonrot helperChristoph Hellwig
Add a helper to check the nonrot flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-24Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (qla2xxx, pm8001, libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates and bug fixes. The high blast radius core update is the removal of write same, which affects block and several non-SCSI devices. The other big change, which is more local, is the removal of the SCSI pointer" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits) scsi: scsi_ioctl: Drop needless assignment in sg_io() scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn() scsi: lpfc: Copyright updates for 14.2.0.0 patches scsi: lpfc: Update lpfc version to 14.2.0.0 scsi: lpfc: SLI path split: Refactor BSG paths scsi: lpfc: SLI path split: Refactor Abort paths scsi: lpfc: SLI path split: Refactor SCSI paths scsi: lpfc: SLI path split: Refactor CT paths scsi: lpfc: SLI path split: Refactor misc ELS paths scsi: lpfc: SLI path split: Refactor VMID paths scsi: lpfc: SLI path split: Refactor FDISC paths scsi: lpfc: SLI path split: Refactor LS_RJT paths scsi: lpfc: SLI path split: Refactor LS_ACC paths scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4 scsi: lpfc: SLI path split: Refactor lpfc_iocbq scsi: lpfc: Use kcalloc() ...
2022-03-09dm: support bio pollingMing Lei
Support bio polling (REQ_POLLED) in the following approach: 1) only support io polling on normal READ/WRITE, and other abnormal IOs still fallback to IRQ mode, so the target io (and DM's clone bio) is exactly inside the dm io. 2) hold one refcnt on io->io_count after submitting this dm bio with REQ_POLLED 3) support dm native bio splitting, any dm io instance associated with current bio will be added into one list which head is bio->bi_private which will be recovered before ending this bio 4) implement .poll_bio() callback, call bio_poll() on the single target bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call dm_io_dec_pending() after the target io is done in .poll_bio() 5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL, which is based on Jeffle's previous patch. These changes are good for a 30-35% IOPS improvement for polled IO. For detailed test results please see (Jens, thanks for testing!): https://listman.redhat.com/archives/dm-devel/2022-March/049868.html or https://marc.info/?l=linux-block&m=164684246214700&w=2 Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-02dm: stop using bdevnameChristoph Hellwig
Just use the %pg format specifier instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22scsi: dm: Remove WRITE_SAME supportChristoph Hellwig
There are no more end-users of REQ_OP_WRITE_SAME left, so we can start deleting it. Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-04dax: remove dax_capableChristoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-09Merge tag 'for-5.16/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM core support for emitting audit events through the audit subsystem. Also enhance both the integrity and crypt targets to emit events to via dm-audit. - Various other simple code improvements and cleanups. * tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: log table creation error code dm: make workqueue names device-specific dm writecache: Make use of the helper macro kthread_run() dm crypt: Make use of the helper macro kthread_run() dm verity: use bvec_kmap_local in verity_for_bv_block dm log writes: use memcpy_from_bvec in log_writes_map dm integrity: use bvec_kmap_local in __journal_read_write dm integrity: use bvec_kmap_local in integrity_metadata dm: add add_disk() error handling dm: Remove redundant flush_workqueue() calls dm crypt: log aead integrity violations to audit subsystem dm integrity: log audit events for dm-integrity target dm: introduce audit event module for device mapper
2021-11-01dm table: log table creation error codeMichał Mirosław
Help debugging table creation errors by adding the error name in the log. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-11-01Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull bdev size cleanups from Jens Axboe: "Clean up the bdev size handling with new bdev_nr_bytes() helper" * tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits) partitions/ibm: use bdev_nr_sectors instead of open coding it partitions/efi: use bdev_nr_bytes instead of open coding it block/ioctl: use bdev_nr_sectors and bdev_nr_bytes block: cache inode size in bdev udf: use sb_bdev_nr_blocks reiserfs: use sb_bdev_nr_blocks ntfs: use sb_bdev_nr_blocks jfs: use sb_bdev_nr_blocks ext4: use sb_bdev_nr_blocks block: add a sb_bdev_nr_blocks helper block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate squashfs: use bdev_nr_bytes instead of open coding it reiserfs: use bdev_nr_bytes instead of open coding it pstore/blk: use bdev_nr_bytes instead of open coding it ntfs3: use bdev_nr_bytes instead of open coding it nilfs2: use bdev_nr_bytes instead of open coding it nfs/blocklayout: use bdev_nr_bytes instead of open coding it jfs: use bdev_nr_bytes instead of open coding it hfsplus: use bdev_nr_sectors instead of open coding it hfs: use bdev_nr_sectors instead of open coding it ...