aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-raid.c
AgeCommit message (Collapse)Author
2024-04-13dm-raid: fix lockdep waring in "pers->hot_add_disk"Yu Kuai
[ Upstream commit 95009ae904b1e9dca8db6f649f2d7c18a6e42c75 ] The lockdep assert is added by commit a448af25becf ("md/raid10: remove rcu protection to access rdev from conf") in print_conf(). And I didn't notice that dm-raid is calling "pers->hot_add_disk" without holding 'reconfig_mutex'. "pers->hot_add_disk" read and write many fields that is protected by 'reconfig_mutex', and raid_resume() already grab the lock in other contex. Hence fix this problem by protecting "pers->host_add_disk" with the lock. Fixes: 9092c02d9435 ("DM RAID: Add ability to restore transiently failed devices on resume") Fixes: a448af25becf ("md/raid10: remove rcu protection to access rdev from conf") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiao Ni <xni@redhat.com> Acked-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240305072306.2562024-10-yukuai1@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26dm raid: fix false positive for requeue needed during reshapeMing Lei
[ Upstream commit b25b8f4b8ecef0f48c05f0c3572daeabefe16526 ] An empty flush doesn't have a payload, so it should never be looked at when considering to possibly requeue a bio for the case when a reshape is in progress. Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target") Reported-by: Patrick Plenefisch <simonpatp@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11dm raid: fix missing reconfig_mutex unlock in raid_ctr() error pathsYu Kuai
[ Upstream commit bae3028799dc4f1109acc4df37c8ff06f2d8f1a0 ] In the error paths 'bad_stripe_cache' and 'bad_check_reshape', 'reconfig_mutex' is still held after raid_ctr() returns. Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25dm raid: fix address sanitizer warning in raid_statusMikulas Patocka
commit 1fbeea217d8f297fe0e0956a1516d14ba97d0396 upstream. There is this warning when using a kernel with the address sanitizer and running this testsuite: https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/scsi_raid ================================================================== BUG: KASAN: slab-out-of-bounds in raid_status+0x1747/0x2820 [dm_raid] Read of size 4 at addr ffff888079d2c7e8 by task lvcreate/13319 CPU: 0 PID: 13319 Comm: lvcreate Not tainted 5.18.0-0.rc3.<snip> #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <TASK> dump_stack_lvl+0x6a/0x9c print_address_description.constprop.0+0x1f/0x1e0 print_report.cold+0x55/0x244 kasan_report+0xc9/0x100 raid_status+0x1747/0x2820 [dm_raid] dm_ima_measure_on_table_load+0x4b8/0xca0 [dm_mod] table_load+0x35c/0x630 [dm_mod] ctl_ioctl+0x411/0x630 [dm_mod] dm_ctl_ioctl+0xa/0x10 [dm_mod] __x64_sys_ioctl+0x12a/0x1a0 do_syscall_64+0x5b/0x80 The warning is caused by reading conf->max_nr_stripes in raid_status. The code in raid_status reads mddev->private, casts it to struct r5conf and reads the entry max_nr_stripes. However, if we have different raid type than 4/5/6, mddev->private doesn't point to struct r5conf; it may point to struct r0conf, struct r1conf, struct r10conf or struct mpconf. If we cast a pointer to one of these structs to struct r5conf, we will be reading invalid memory and KASAN warns about it. Fix this bug by reading struct r5conf only if raid type is 4, 5 or 6. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25dm raid: fix address sanitizer warning in raid_resumeMikulas Patocka
commit 7dad24db59d2d2803576f2e3645728866a056dab upstream. There is a KASAN warning in raid_resume when running the lvm test lvconvert-raid.sh. The reason for the warning is that mddev->raid_disks is greater than rs->raid_disks, so the loop touches one entry beyond the allocated length. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07dm raid: fix accesses beyond end of raid member arrayHeinz Mauelshagen
commit 332bd0778775d0cf105c4b9e03e460b590749916 upstream. On dm-raid table load (using raid_ctr), dm-raid allocates an array rs->devs[rs->raid_disks] for the raid device members. rs->raid_disks is defined by the number of raid metadata and image tupples passed into the target's constructor. In the case of RAID layout changes being requested, that number can be different from the current number of members for existing raid sets as defined in their superblocks. Example RAID layout changes include: - raid1 legs being added/removed - raid4/5/6/10 number of stripes changed (stripe reshaping) - takeover to higher raid level (e.g. raid5 -> raid6) When accessing array members, rs->raid_disks must be used in control loops instead of the potentially larger value in rs->md.raid_disks. Otherwise it will cause memory access beyond the end of the rs->devs array. Fix this by changing code that is prone to out-of-bounds access. Also fix validate_raid_redundancy() to validate all devices that are added. Also, use braces to help clean up raid_iterate_devices(). The out-of-bounds memory accesses was discovered using KASAN. This commit was verified to pass all LVM2 RAID tests (with KASAN enabled). Cc: stable@vger.kernel.org Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload ↵Heinz Mauelshagen
sequences commit f99a8e4373eeacb279bc9696937a55adbff7a28a upstream. If fast table reloads occur during an ongoing reshape of raid4/5/6 devices the target may race reading a superblock vs the the MD resync thread; causing an inconclusive reshape state to be read in its constructor. lvm2 test lvconvert-raid-reshape-stripes-load-reload.sh can cause BUG_ON() to trigger in md_run(), e.g.: "kernel BUG at drivers/md/raid5.c:7567!". Scenario triggering the bug: 1. the MD sync thread calls end_reshape() from raid5_sync_request() when done reshaping. However end_reshape() _only_ updates the reshape position to MaxSector keeping the changed layout configuration though (i.e. any delta disks, chunk sector or RAID algorithm changes). That inconclusive configuration is stored in the superblock. 2. dm-raid constructs a mapping, loading named inconsistent superblock as of step 1 before step 3 is able to finish resetting the reshape state completely, and calls md_run() which leads to mentioned bug in raid5.c. 3. the MD RAID personality's finish_reshape() is called; which resets the reshape information on chunk sectors, delta disks, etc. This explains why the bug is rarely seen on multi-core machines, as MD's finish_reshape() superblock update races with the dm-raid constructor's superblock load in step 2. Fix identifies inconclusive superblock content in the dm-raid constructor and resets it before calling md_run(), factoring out identifying checks into rs_is_layout_change() to share in existing rs_reshape_requested() and new rs_reset_inclonclusive_reshape(). Also enhance a comment and remove an empty line. Cc: stable@vger.kernel.org Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19dm raid: fix discard limits for raid1Mike Snitzer
commit cc07d72bf350b77faeffee1c37bc52197171473f upstream. Block core warned that discard_granularity was 0 for dm-raid with personality of raid1. Reason is that raid_io_hints() was incorrectly special-casing raid1 rather than raid0. Fix raid_io_hints() by removing discard limits settings for raid1. Check for raid0 instead. Fixes: 61697a6abd24a ("dm: eliminate 'split_discard_bios' flag from DM target interface") Cc: stable@vger.kernel.org Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Stephan Bärwolf <stephan@matrixstorm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-11dm raid: fix updating of max_discard_sectors limitMing Lei
Unit of 'chunk_size' is byte, instead of sector, so fix it by setting the queue_limits' max_discard_sectors to rs->md.chunk_sectors. Also, rename chunk_size to chunk_size_bytes. Without this fix, too big max_discard_sectors is applied on the request queue of dm-raid, finally raid code has to split the bio again. This re-split done by raid causes the following nested clone_endio: 1) one big bio 'A' is submitted to dm queue, and served as the original bio 2) one new bio 'B' is cloned from the original bio 'A', and .map() is run on this bio of 'B', and B's original bio points to 'A' 3) raid code sees that 'B' is too big, and split 'B' and re-submit the remainded part of 'B' to dm-raid queue via generic_make_request(). 4) now dm will handle 'B' as new original bio, then allocate a new clone bio of 'C' and run .map() on 'C'. Meantime C's original bio points to 'B'. 5) suppose now 'C' is completed by raid directly, then the following clone_endio() is called recursively: clone_endio(C) ->clone_endio(B) #B is original bio of 'C' ->bio_endio(A) 'A' can be big enough to make hundreds of nested clone_endio(), then stack can be corrupted easily. Fixes: 61697a6abd24a ("dm: eliminate 'split_discard_bios' flag from DM target interface") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-21dm raid: add missing cleanup in raid_ctr()Wenwen Wang
If rs_prepare_reshape() fails, no cleanup is executed, leading to leak of the raid_set structure allocated at the beginning of raid_ctr(). To fix this issue, go to the label 'bad' if the error occurs. Fixes: 11e4723206683 ("dm raid: stop keeping raid set frozen altogether") Cc: stable@vger.kernel.org Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-15docs: device-mapper: move it to the admin-guideMauro Carvalho Chehab
The DM support describes lots of aspects related to mapped disk partitions from the userspace PoV. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-06-14docs: convert docs to ReST and rename to *.rstMauro Carvalho Chehab
The conversion is actually: - add blank lines and indentation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-02-20dm: eliminate 'split_discard_bios' flag from DM target interfaceMike Snitzer
There is no need to have DM core split discards on behalf of a DM target now that blk_queue_split() handles splitting discards based on the queue_limits. A DM target just needs to set max_discard_sectors, discard_granularity, etc, in queue_limits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-18dm raid: fix false -EBUSY when handling check/repair messageHeinz Mauelshagen
Sending a check/repair message infrequently leads to -EBUSY instead of properly identifying an active resync. This occurs because raid_message() is testing recovery bits in a racy way. Fix by calling decipher_sync_action() from raid_message() to properly identify the idle state of the RAID device. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-18dm raid: avoid bitmap with raid4/5/6 journal deviceHeinz Mauelshagen
With raid4/5/6, journal device and write intent bitmap are mutually exclusive. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-17dm raid: remove bogus const from decipher_sync_action() return typeGeert Uytterhoeven
With gcc-4.1.2: drivers/md/dm-raid.c:3357: warning: type qualifiers ignored on function return type Remove the "const" keyword to fix this. Fixes: 36a240a706d43383 ("dm raid: fix RAID leg rebuild errors") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: bump target version, update comments and documentationHeinz Mauelshagen
Bump target version to reflect the documented fixes are available. Also fix some code comments (typos and clarity). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: fix RAID leg rebuild errorsHeinz Mauelshagen
On fast devices such as NVMe, a flaw in rs_get_progress() results in false target status output when userspace lvm2 requests leg rebuilds (symptom of the failure is device health chars 'aaaaaaaa' instead of expected 'aAaAAAAA' causing lvm2 to fail). The correct sync action state definitions already exist in decipher_sync_action() so fix rs_get_progress() to use it. Change decipher_sync_action() to return an enum rather than a string for the sync states and call it from rs_get_progress(). Introduce sync_str() to translate from enum to the string that is needed by raid_status(). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: fix rebuild of specific devices by updating superblockHeinz Mauelshagen
Update superblock when particular devices are requested via rebuild (e.g. lvconvert --replace ...) to avoid spurious failure with the "New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified" error message. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: fix stripe adding reshape deadlockHeinz Mauelshagen
When initiating a stripe adding reshape, a deadlock between md_stop_writes() waiting for the sync thread to stop and the running sync thread waiting for inactive stripes occurs (this frequently happens on single-core but rarely on multi-core systems). Fix this deadlock by setting MD_RECOVERY_WAIT to have the main MD resynchronization thread worker (md_do_sync()) bail out when initiating the reshape via constructor arguments. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06dm raid: fix reshape race on small devicesHeinz Mauelshagen
Loading a new mapping table, the dm-raid target's constructor retrieves the volatile reshaping state from the raid superblocks. When the new table is activated in a following resume, the actual reshape position is retrieved. The reshape driven by the previous mapping can already have finished on small and/or fast devices thus updating raid superblocks about the new raid layout. This causes the actual array state (e.g. stripe size reshape finished) to be inconsistent with the one in the new mapping, causing hangs with left behind devices. This race does not occur with usual raid device sizes but with small ones (e.g. those created by the lvm2 test suite). Fix by no longer transferring stale/inconsistent raid_set state during preresume. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-08-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a new driver for Rohm BU21029 touch controller - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free - updates to Atmel, eeti. pxrc and iforce drivers - assorted driver cleanups and fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits) MAINTAINERS: Add PhoenixRC Flight Controller Adapter Input: do not use WARN() in input_alloc_absinfo() Input: mark expected switch fall-throughs Input: raydium_i2c_ts - use true and false for boolean values Input: evdev - switch to bitmap API Input: gpio-keys - switch to bitmap_zalloc() Input: elan_i2c_smbus - cast sizeof to int for comparison bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free() md: Avoid namespace collision with bitmap API dm: Avoid namespace collision with bitmap API Input: pm8941-pwrkey - add resin entry Input: pm8941-pwrkey - abstract register offsets and event code Input: iforce - reorganize joystick configuration lists Input: atmel_mxt_ts - move completion to after config crc is updated Input: atmel_mxt_ts - don't report zero pressure from T9 Input: atmel_mxt_ts - zero terminate config firmware file Input: atmel_mxt_ts - refactor config update code to add context struct Input: atmel_mxt_ts - config CRC may start at T71 Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE ...
2018-08-01md: Avoid namespace collision with bitmap APIAndy Shevchenko
bitmap API (include/linux/bitmap.h) has 'bitmap' prefix for its methods. On the other hand MD bitmap API is special case. Adding 'md' prefix to it to avoid name space collision. No functional changes intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Shaohua Li <shli@kernel.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-06-22dm raid: don't use 'const' in function returnArnd Bergmann
A newly introduced function has 'const int' as the return type, but as "make W=1" reports, that has no meaning: drivers/md/dm-raid.c:510:18: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] This changes the return type to plain 'int'. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 33e53f06850f ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: 552aa679f2657431 ("dm raid: use rs_is_raid*()") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-06treewide: Use struct_size() for kmalloc()-familyKees Cook
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script: // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len * // sizeof *pkey_cache->table, GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // Same pattern, but can't trivially locate the trailing element name, // or variable name. @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; expression SOMETHING, COUNT, ELEMENT; @@ - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP) + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-04-04dm raid: fix parse_raid_params() variable range issueHeinz Mauelshagen
parse_raid_params() compares variable "int value" with INT_MAX. E.g. related Coverity report excerpt: CID 1364818 (#2 of 3): Operands don't affect result (CONSTANT_EXPRESSION_RESULT) [select issue] 1433 if (value > INT_MAX) { Fix by changing checks to avoid INT_MAX. Whilst on it, avoid unnecessary checks against constants and add check for sane recovery speed min/max. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03dm raid: fix nosync statusHeinz Mauelshagen
Fix a race for "nosync" activations providing "aa.." device health characters and "0/N" sync ratio rather than "AA..." and "N/N". Occurs when status for the raid set is retrieved during resume before the MD sync thread starts and clears the MD_RECOVERY_NEEDED flag. Cc: stable@vger.kernel.org # 4.16+ Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03dm: allow targets to return output from messages they are sentMike Snitzer
Could be useful for a target to return stats or other information. If a target does DMEMIT() anything to @result from its .message method then it must return 1 to the caller. Signed-off-By: Mike Snitzer <snitzer@redhat.com>
2018-03-06dm raid: fix incorrect sync_ratio when degradedJonathan Brassow
Upstream commit 4102d9de6d375 ("dm raid: fix rs_get_progress() synchronization state/ratio") in combination with commit 7c29744ecce ("dm raid: simplify rs_get_progress()") introduced a regression by incorrectly reporting a sync_ratio of 0 for degraded raid sets. This caused lvm2 to fail to repair raid legs automatically. Fix by identifying the degraded state by checking the MD_RECOVERY_INTR flag and returning mddev->recovery_cp in case it is set. MD sets recovery = [ MD_RECOVERY_RECOVER MD_RECOVERY_INTR MD_RECOVERY_NEEDED ] when a RAID member fails. It then shuts down any sync thread that is running and leaves us with all MD_RECOVERY_* flags cleared. The bug occurs if a status is requested in the short time it takes to shut down any sync thread and clear the flags, because we were keying in on the MD_RECOVERY_NEEDED - understanding it to be the initial phase of a “recover” sync thread. However, this is an incorrect interpretation if MD_RECOVERY_INTR is also set. This also explains why the bug only happened when automatic repair was enabled and not a normal ‘manual’ method. It is impossible to react quick enough to hit the problematic window without it being automated. Fix passes automatic repair tests. Fixes: 7c29744ecce ("dm raid: simplify rs_get_progress()") Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-31Merge tag 'for-4.16/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - DM core fixes to ensure that bio submission follows a depth-first tree walk; this is critical to allow forward progress without the need to use the bioset's BIOSET_NEED_RESCUER. - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure. - DM core cleanups and improvements to make bio-based DM more efficient (e.g. reduced memory footprint as well leveraging per-bio-data more). - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages the more direct IO submission path in the block layer; this mode is used by DM multipath and also optimizes targets like DM thin-pool that stack directly on NVMe data device. - DM multipath improvements to factor out legacy SCSI-only (e.g. scsi_dh) code paths to allow for more optimized support for NVMe multipath. - A fix for DM multipath path selectors (service-time and queue-length) to select paths in a more balanced way; largely academic but doesn't hurt. - Numerous DM raid target fixes and improvements. - Add a new DM "unstriped" target that enables Intel to workaround firmware limitations in some NVMe drives that are striped internally (this target also works when stacked above the DM "striped" target). - Various Documentation fixes and improvements. - Misc cleanups and fixes across various DM infrastructure and targets (e.g. bufio, flakey, log-writes, snapshot). * tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits) dm cache: Documentation: update default migration_throttling value dm mpath selector: more evenly distribute ties dm unstripe: fix target length versus number of stripes size check dm thin: fix trailing semicolon in __remap_and_issue_shared_cell dm table: fix NVMe bio-based dm_table_determine_type() validation dm: various cleanups to md->queue initialization code dm mpath: delay the retry of a request if the target responded as busy dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure dm log writes: fix max length used for kstrndup dm: backfill missing calls to mutex_destroy() dm snapshot: use mutex instead of rw_semaphore dm flakey: check for null arg_name in parse_features() dm thin: extend thinpool status format string with omitted fields dm thin: fixes in thin-provisioning.txt dm thin: document representation of <highest mapped sector> when there is none dm thin: fix documentation relative to low water mark threshold dm cache: be consistent in specifying sectors and SI units in cache.txt dm cache: delete obsoleted paragraph in cache.txt dm cache: fix grammar in cache-policies.txt ...
2018-01-17dm raid: make raid_sets symbol staticWei Yongjun
Fixes the following sparse warning: drivers/md/dm-raid.c:33:1: warning: symbol 'raid_sets' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13dm raid: use rs_is_raid*()Heinz Mauelshagen
Cleanup, no functional change. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13dm raid: simplify rs_get_progress()Heinz Mauelshagen
No need to calculate the reshaping progress because mddev->curr_resync_completed holds it. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13dm raid: ensure 'a' chars during reshapeHeinz Mauelshagen
During reshape, 'A' chars were reported in status rather than 'a'. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13dm raid: stop keeping raid set frozen altogetherHeinz Mauelshagen
In order to avoid redoing synchronization/recovery/reshape partially, the raid set got frozen until after all passed in table line flags had been cleared. The related table reload sequence had to be precisely followed, or reshaping may lead to data corruption caused by the active mapping carrying on with a reshape when the inactive mapping already had retrieved a stale reshape position. Harden by retrieving the actual resync/recovery/reshape position during resume whilst the active table is suspended thus avoiding to keep the raid set frozen altogether. This prevents superfluous redoing of an already resynchronized or recovered segment and, most importantly, potential for redoing of an already reshaped segment causing data corruption. Fixes: d39f0010e ("dm raid: fix raid_resume() to keep raid set frozen as needed") Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13dm raid: validate current raid sets redundancyHeinz Mauelshagen
Verifying the current raid sets redundancy based on retrieved superblock content has to use the superblock's raid level (e.g. raid0), not the constructor requested one (e.g. raid10). Using the requested raid level of raid10 lead to a "divide error" on raid0 which defines data copies divided by to be zero. Also check for bogus data copies. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-11md: introduce new personality funciton start()Song Liu
In do_md_run(), md threads should not wake up until the array is fully initialized in md_run(). However, in raid5_run(), raid5-cache may wake up mddev->thread to flush stripes that need to be written back. This design doesn't break badly right now. But it could lead to bad bug in the future. This patch tries to resolve this problem by splitting start up work into two personality functions, run() and start(). Tasks that do not require the md threads should go into run(), while task that require the md threads go into start(). r5l_load_log() is moved to raid5_start(), so it is not called until the md threads are started in do_md_run(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-12-08dm raid: bump target version to reflect numerous fixesMike Snitzer
Also update Documentation accordingly. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: small cleanup and remove unsed "struct raid_set" memberHeinz Mauelshagen
Move raid_resume()'s setting of 'rw' and 'in_sync' to just prior to mddev_resume(). Also, remove unused 'bitmap_loaded' member from "struct raid_set". No functional changes. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: fix rs_get_progress() synchronization state/ratioHeinz Mauelshagen
Fix various sync state issues causing racy/bogus sync ratio, sync_action ad health chars in dm_status() info output. Sync ratio could be N/N (i.e. 100%) shortly after raid set creation, i.e. creating a new RaidLV or upconverting a linear LV to raid1 thus: "0 2097152 raid raid1 2 Aa 2097162/2097152 recover 0 0 -" instead of: "0 2097152 raid raid1 2 Aa 0/2097152 idle 0 0 -" Sync action could be non-idle, when the MD thread was done with io. Health chars could be 'A' when they should be 'a' for a short time before a resynchonization started. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: avoid passing array_in_sync variable to raid_status() calleesHeinz Mauelshagen
The raid_status() function passes the bool array_in_sync variable around providing synchronization state of the MD array. Replace it with a runtime flag. This will avoid a pattern of having to pass discrete variables to various functions. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: display a consistent copy of the MD status via raid_status()Heinz Mauelshagen
The MD sync thread updates recovery flags providing state of any running, idle, frozen, recovering, reshaping, ... activity it performs and updates respective flags asynchronously versus dm processing raid_status(). To close that race window, take a single copy of the flags and pass it into its callees. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: fix raid_resume() to keep raid set frozen as neededHeinz Mauelshagen
During a reshape request: if userspace reloads a "raid" table multiple times, resulting in multiple superblock reads, the raid set needs to stay frozen until all config changes (chunk size, layout data_offset, delta_disks) have been stored in the superblocks and respective flags cleared. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: add component device size checks to avoid runtime failureHeinz Mauelshagen
Check all component data device sizes versus calculated size. Reject if device(s) are too small. Otherwise, MD will fail the operation by accessing beyond the end of the data device. An example use-case is that growing bitmap won't fit any more and the MD runtime will report an error when DM raid should catch this earlier. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: fix raid set size revalidationHeinz Mauelshagen
The raid set size is being revalidated unconditionally before a reshaping conversion is started. MD requires the size to only be reduced in case of a stripe removing (i.e. shrinking) reshape but not when growing because the raid array has to stay small until after the growing reshape finishes. Fix by avoiding the size revalidation in preresume unless a shrinking reshape is requested. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: correct resizing state relative to reshape space in ctrHeinz Mauelshagen
Pay attention to existing reshape space to define if a raid set needs resizing. Otherwise we can hit "Can't resize a reshaping raid set" when a reshape is being requested. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: consume sizes after md_finish_reshape() completes changing themHeinz Mauelshagen
The md raid personalities call md_finish_reshape() at the end of a reshape conversion which adjusts rdev->sectors. Correct/check rdev->sectors before initiating a reshape and raise the recovery pointer accordingly. Otherwise, the DM raid coordinated reshape will fail. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm raid: fix deadlock caused by premature md_stop_writes()Heinz Mauelshagen
md_stop_writes() is called in raid_presuspend() causing deadlocks on bios submitted afterwards -- which happens on loaded raid sets with conversion requests. Fix by moving md_stop_writes() to raid_postsuspend(). NOTE: when the recovery's frozen (MD_RECOVERY_FROZEN), writes haven't been started (or are already stopped) so don't stop them again. Also remove superfluous readonly setting. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-17Merge tag 'for-4.15/dm-changes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper updates from Mike Snitzer: "Given your expected travel I figured I'd get these fixes to you sooner rather than later. - a DM multipath stable@ fix to silence an annoying error message that isn't _really_ an error - a DM core @stable fix for discard support that was enabled for an entire DM device despite only having partial support for discards due to a mix of discard capabilities across the underlying devices. - a couple other DM core discard fixes. - a DM bufio @stable fix that resolves a 32-bit overflow" * tag 'for-4.15/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm bufio: fix integer overflow when limiting maximum cache size dm: clear all discard attributes in queue_limits when discards are disabled dm: do not set 'discards_supported' in targets that do not need it dm: discard support requires all targets in a table support discards dm mpath: remove annoying message of 'blk_get_request() returned -11'
2017-11-16dm: do not set 'discards_supported' in targets that do not need itMike Snitzer
The DM target's 'discards_supported' flag is intended to act as an override. Meaning, even if the underlying storage doesn't support discards the DM target will. Signed-off-by: Mike Snitzer <snitzer@redhat.com>