Age | Commit message (Collapse) | Author |
|
commit 763303a83a095a88c3a8a0d1abf97165db2e8bf5 upstream.
nvme_mpath_clear_ctrl_paths() iterates through
the ctrl->namespaces list while holding ctrl->scan_lock.
This does not seem to be the correct way of protecting
from concurrent list modification.
Specifically, nvme_scan_work() sorts ctrl->namespaces
AFTER unlocking scan_lock.
This may result in the following (rare) crash in ctrl disconnect
during scan_work:
BUG: kernel NULL pointer dereference, address: 0000000000000050
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
...
Call Trace:
nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
nvme_remove_namespaces+0x35/0xe0 [nvme_core]
nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
nvme_sysfs_delete+0x49/0x60 [nvme_core]
dev_attr_store+0x17/0x30
sysfs_kf_write+0x3e/0x50
kernfs_fop_write+0x11e/0x1a0
__vfs_write+0x1b/0x40
vfs_write+0xb9/0x1a0
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x5a/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f8d02bfb154
Fix:
After taking scan_lock in nvme_mpath_clear_ctrl_paths()
down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe.
This will not cause deadlocks because taking scan_lock never happens
while holding the namespaces_rwsem.
Moreover, scan work downs namespaces_rwsem in the same order.
Alternative: sort ctrl->namespaces in nvme_scan_work()
while still holding the scan_lock.
This would leave nvme_mpath_clear_ctrl_paths() without correct protection
against ctrl->namespaces modification by anyone other than scan_work.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 9ad9e8d6ca29c1446d81c6518ae634a2141dfd22 upstream.
In case there are controllers that are not associated with any RDMA
device (e.g. during unsuccessful reconnection) and the user will unload
the module, these controllers will not be freed and will access already
freed memory. The same logic appears in other fabric drivers as well.
Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit af8fd0424713a2adb812d10d55e86718152cf656 upstream.
The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
This fails because ctrl disconnects.
Therefore nvme_update_ns_ana_state() is not called
and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
nvme_read_ana_log(ctrl, groups_only=true).
However, nvme_update_ana_state() does not update namespaces
because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.
Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.
More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.
Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.
Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit a4f40484e7f1dff56bb9f286cc59ffa36e0259eb upstream.
In the current code, the nvme is using a fixed 4k PRP entry size,
but if the kernel use a page size which is more than 4k, we should
consider the situation that the bv_offset may be larger than the
dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
cause the command can't be executed correctly.
Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 67b483dd03c4cd9e90e4c3943132dce514ea4e88 upstream.
If the connect times out, we may have already destroyed the
queue in the timeout handler, so test if the queue is still
allocated in the connect error handler.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit 65e68edce0db433aa0c2b26d7dc14fbbbeb89fbb upstream.
It is not possible to get 64-bit results from the passthru commands,
what prevents from getting for the Capabilities (CAP) property value.
As a result, it is not possible to implement IOL's NVMe Conformance
test 4.3 Case 1 for Fabrics targets [1] (page 123).
This issue has been already discussed [2], but without a solution.
This patch solves the problem by adding new ioctls with a new
passthru structure, including 64-bit results. The older ioctls stay
unchanged.
[1] https://www.iol.unh.edu/sites/default/files/testsuites/nvme/UNH-IOL_NVMe_Conformance_Test_Suite_v11.0.pdf
[2] http://lists.infradead.org/pipermail/linux-nvme/2018-June/018791.html
Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit f03e42c6af60f778a6d1ccfb857db9b2ec835279 upstream.
Booting with default_ps_max_latency_us >6000 makes the device fail.
Also SUBNQN is NULL and gives a warning on each boot/resume.
$ nvme id-ctrl /dev/nvme0 | grep ^subnqn
subnqn : (null)
I use this device with an Acer Nitro 5 (AN515-43-R8BF) Laptop.
To be sure is not a Laptop issue only, I tested the device on
my server board with the same results.
( with 2x,4x link on the board and 4x link on a PCI-E card ).
Signed-off-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit bc4f6e06a90ea016855fc67212b4d500145f0b8a upstream.
"ret" should be a negative error code here, but it's either success or
possibly uninitialized.
Fixes: 32fd90c40768 ("nvme: change locking for the per-subsystem controller list")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit ddef29578a81a1d4d8f2b26a7adbfe21407ee3ea upstream.
Allow the do/while statement to continue if current time
is not after the proposed time 'deadline'. Intent is to
allow loop to proceed for a specific time period. Currently
the loop, as coded, will exit after first pass.
Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
commit b224726de5e496dbf78147a66755c3d81e28bdd2 upstream.
User space programs like udevd may try to read to partitions at the
same time the driver detects a namespace is unusable, and may deadlock
if revalidate_disk() is called while such a process is waiting to
enter the frozen queue. On detecting a dead namespace, move the disk
revalidate after unblocking dispatchers that may be holding bd_butex.
changelog Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Balbir Singh <sblbir@amzn.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
[ Upstream commit e01f91dff91c7b16a6e3faf2565017d497a73f83 ]
ANA log parsing invokes nvme_update_ana_state() per ANA group desc.
This updates the state of namespaces with nsids in desc->nsids[].
Both ctrl->namespaces list and desc->nsids[] array are sorted by nsid.
Hence nvme_update_ana_state() performs a single walk over ctrl->namespaces:
- if current namespace matches the current desc->nsids[n],
this namespace is updated, and n is incremented.
- the process stops when it encounters the end of either
ctrl->namespaces end or desc->nsids[]
In case desc->nsids[n] does not match any of ctrl->namespaces,
the remaining nsids following desc->nsids[n] will not be updated.
Such situation was considered abnormal and generated WARN_ON_ONCE.
However ANA log MAY contain nsids not (yet) found in ctrl->namespaces.
For example, lets consider the following scenario:
- nvme0 exposes namespaces with nsids = [2, 3] to the host
- a new namespace nsid = 1 is added dynamically
- also, a ANA topology change is triggered
- NS_CHANGED aen is generated and triggers scan_work
- before scan_work discovers nsid=1 and creates a namespace, a NOTICE_ANA
aen was issues and ana_work receives ANA log with nsids=[1, 2, 3]
Result: ana_work fails to update ANA state on existing namespaces [2, 3]
Solution:
Change the way nvme_update_ana_state() namespace list walk
checks the current namespace against desc->nsids[n] as follows:
a) ns->head->ns_id < desc->nsids[n]: keep walking ctrl->namespaces.
b) ns->head->ns_id == desc->nsids[n]: match, update the namespace
c) ns->head->ns_id >= desc->nsids[n]: skip to desc->nsids[n+1]
This enables correct operation in the scenario described above.
This also allows ANA log to contain nsids currently invisible
to the host, i.e. inactive nsids.
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3bec2e3754becebd4c452999adb49bc62c575ea4 ]
In nvme spec 1.3 there is a definition for data write/read counters
from SMART log, (See section 5.14.1.2):
This value is reported in thousands (i.e., a value of 1
corresponds to 1000 units of 512 bytes read) and is rounded up.
However, in nvme target where value is reported with actual units,
but not thousands of units as the spec requires.
Signed-off-by: Tom Wu <tomwu@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a89fcca8185633993018dc081d6b021d005e6d0b ]
Commit 1b1031ca63b2 ("nvme: validate cntlid during controller initialisation")
introduced a validation for controllers with duplicate cntlid that runs
on nvme_init_subsystem(). The problem is that the validation relies on
ctrl->cntlid, and this value is assigned (from id_ctrl value) after the
call for nvme_init_subsystem() in nvme_init_identify() for non-fabrics
scenario. That leads to ctrl->cntlid always being 0 in case we have a
physical set of controllers in the same subsystem.
This patch fixes that by loading the discovered cntlid id_ctrl value into
ctrl->cntlid before the subsystem initialization, only for the non-fabrics
case. The patch was tested with emulated nvme devices (qemu) having two
controllers in a single subsystem. Without the patch, we couldn't make
it work failing in the duplicate check; when running with the patch, we
could see the subsystem holding both controllers.
For the fabrics case we see ctrl->cntlid has a more intricate relation
with the admin connect, so we didn't change that.
Fixes: 1b1031ca63b2 ("nvme: validate cntlid during controller initialisation")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 504db087aaccdb32af61539916409f7dca31ceb5 ]
nvme_state_set_live() making a path available triggers requeue_work
in order to resubmit requests that ended up on requeue_list when no
paths were available.
This requeue_work may race with concurrent nvme_ns_head_make_request()
that do not observe the live path yet.
Such concurrent requests may by made by either:
- New IO submission.
- Requeue_work triggered by nvme_failover_req() or another ana_work.
A race may cause requeue_work capture the state of requeue_list before
more requests get onto the list. These requests will stay on the list
forever unless requeue_work is triggered again.
In order to prevent such race, nvme_state_set_live() should
synchronize_srcu(&head->srcu) before triggering the requeue_work and
prevent nvme_ns_head_make_request referencing an old snapshot of the
path list.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bd46a90634302bfe791e93ad5496f98f165f7ae0 ]
Ensure the controller is not in the NEW state when nvme_probe() exits.
This will always allow a subsequent nvme_remove() to set the state to
DELETING, fixing a potential race between the initial asynchronous probe
and device removal.
Reported-by: Li Zhong <lizhongfs@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0157ec8dad3c8fc9bc9790f76e0831ffdaf2e7f0 ]
With multipath enabled, nvme_scan_work() can read from the device
(through nvme_mpath_add_disk()) and hang [1]. However, with fabrics,
once ctrl->state is set to NVME_CTRL_DELETING, the reads will hang
(see nvmf_check_ready()) and the mpath stack device make_request
will block if head->list is not empty. However, when the head->list
consistst of only DELETING/DEAD controllers, we should actually not
block, but rather fail immediately.
In addition, before we go ahead and remove the namespaces, make sure
to clear the current path and kick the requeue list so that the
request will fast fail upon requeuing.
[1]:
--
INFO: task kworker/u4:3:166 blocked for more than 120 seconds.
Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:3 D 0 166 2 0x80004000
Workqueue: nvme-wq nvme_scan_work
Call Trace:
__schedule+0x851/0x1400
schedule+0x99/0x210
io_schedule+0x21/0x70
do_read_cache_page+0xa57/0x1330
read_cache_page+0x4a/0x70
read_dev_sector+0xbf/0x380
amiga_partition+0xc4/0x1230
check_partition+0x30f/0x630
rescan_partitions+0x19a/0x980
__blkdev_get+0x85a/0x12f0
blkdev_get+0x2a5/0x790
__device_add_disk+0xe25/0x1250
device_add_disk+0x13/0x20
nvme_mpath_set_live+0x172/0x2b0
nvme_update_ns_ana_state+0x130/0x180
nvme_set_ns_ana_state+0x9a/0xb0
nvme_parse_ana_log+0x1c3/0x4a0
nvme_mpath_add_disk+0x157/0x290
nvme_validate_ns+0x1017/0x1bd0
nvme_scan_work+0x44d/0x6a0
process_one_work+0x7d7/0x1240
worker_thread+0x8e/0xff0
kthread+0x2c3/0x3b0
ret_from_fork+0x35/0x40
INFO: task kworker/u4:1:1034 blocked for more than 120 seconds.
Not tainted 5.2.0-rc6-vmlocalyes-00005-g808c8c2dc0cf #316
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:1 D 0 1034 2 0x80004000
Workqueue: nvme-delete-wq nvme_delete_ctrl_work
Call Trace:
__schedule+0x851/0x1400
schedule+0x99/0x210
schedule_timeout+0x390/0x830
wait_for_completion+0x1a7/0x310
__flush_work+0x241/0x5d0
flush_work+0x10/0x20
nvme_remove_namespaces+0x85/0x3d0
nvme_do_delete_ctrl+0xb4/0x1e0
nvme_delete_ctrl_work+0x15/0x20
process_one_work+0x7d7/0x1240
worker_thread+0x8e/0xff0
kthread+0x2c3/0x3b0
ret_from_fork+0x35/0x40
--
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d94211b8bad3787e0655a67284105f57db728cb1 ]
When start_queue fails, we need to make sure to drain the
queue cq before freeing the rdma resources because we might
still race with the completion path. Have start_queue() error
path safely stop the queue.
--
[30371.808111] nvme nvme1: Failed reconnect attempt 11
[30371.808113] nvme nvme1: Reconnecting in 10 seconds...
[...]
[30382.069315] nvme nvme1: creating 4 I/O queues.
[30382.257058] nvme nvme1: Connect Invalid SQE Parameter, qid 4
[30382.257061] nvme nvme1: failed to connect queue: 4 ret=386
[30382.305001] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[30382.305022] IP: qedr_poll_cq+0x8a3/0x1170 [qedr]
[30382.305028] PGD 0 P4D 0
[30382.305037] Oops: 0000 [#1] SMP PTI
[...]
[30382.305153] Call Trace:
[30382.305166] ? __switch_to_asm+0x34/0x70
[30382.305187] __ib_process_cq+0x56/0xd0 [ib_core]
[30382.305201] ib_poll_handler+0x26/0x70 [ib_core]
[30382.305213] irq_poll_softirq+0x88/0x110
[30382.305223] ? sort_range+0x20/0x20
[30382.305232] __do_softirq+0xde/0x2c6
[30382.305241] ? sort_range+0x20/0x20
[30382.305249] run_ksoftirqd+0x1c/0x60
[30382.305258] smpboot_thread_fn+0xef/0x160
[30382.305265] kthread+0x113/0x130
[30382.305273] ? kthread_create_worker_on_cpu+0x50/0x50
[30382.305281] ret_from_fork+0x35/0x40
--
Reported-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin@suse.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b9156daeb1601d69007b7e50efcf89d69d72ec1d ]
When the user issues a command with side effects, we will end up freezing
the namespace request queue when updating disk info (and the same for
the corresponding mpath disk node).
However, we are not freezing the mpath node request queue,
which means that mpath I/O can still come in and block on blk_queue_enter
(called from nvme_ns_head_make_request -> direct_make_request).
This is a deadlock, because blk_queue_enter will block until the inner
namespace request queue is unfroze, but that process is blocked because
the namespace revalidation is trying to update the mpath disk info
and freeze its request queue (which will never complete because
of the I/O that is blocked on blk_queue_enter).
Fix this by freezing all the subsystem nsheads request queues before
executing the passthru command. Given that these commands are infrequent
we should not worry about this temporary I/O freeze to keep things sane.
Here is the matching hang traces:
--
[ 374.465002] INFO: task systemd-udevd:17994 blocked for more than 122 seconds.
[ 374.472975] Not tainted 5.2.0-rc3-mpdebug+ #42
[ 374.478522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 374.487274] systemd-udevd D 0 17994 1 0x00000000
[ 374.493407] Call Trace:
[ 374.496145] __schedule+0x2ef/0x620
[ 374.500047] schedule+0x38/0xa0
[ 374.503569] blk_queue_enter+0x139/0x220
[ 374.507959] ? remove_wait_queue+0x60/0x60
[ 374.512540] direct_make_request+0x60/0x130
[ 374.517219] nvme_ns_head_make_request+0x11d/0x420 [nvme_core]
[ 374.523740] ? generic_make_request_checks+0x307/0x6f0
[ 374.529484] generic_make_request+0x10d/0x2e0
[ 374.534356] submit_bio+0x75/0x140
[ 374.538163] ? guard_bio_eod+0x32/0xe0
[ 374.542361] submit_bh_wbc+0x171/0x1b0
[ 374.546553] block_read_full_page+0x1ed/0x330
[ 374.551426] ? check_disk_change+0x70/0x70
[ 374.556008] ? scan_shadow_nodes+0x30/0x30
[ 374.560588] blkdev_readpage+0x18/0x20
[ 374.564783] do_read_cache_page+0x301/0x860
[ 374.569463] ? blkdev_writepages+0x10/0x10
[ 374.574037] ? prep_new_page+0x88/0x130
[ 374.578329] ? get_page_from_freelist+0xa2f/0x1280
[ 374.583688] ? __alloc_pages_nodemask+0x179/0x320
[ 374.588947] read_cache_page+0x12/0x20
[ 374.593142] read_dev_sector+0x2d/0xd0
[ 374.597337] read_lba+0x104/0x1f0
[ 374.601046] find_valid_gpt+0xfa/0x720
[ 374.605243] ? string_nocheck+0x58/0x70
[ 374.609534] ? find_valid_gpt+0x720/0x720
[ 374.614016] efi_partition+0x89/0x430
[ 374.618113] ? string+0x48/0x60
[ 374.621632] ? snprintf+0x49/0x70
[ 374.625339] ? find_valid_gpt+0x720/0x720
[ 374.629828] check_partition+0x116/0x210
[ 374.634214] rescan_partitions+0xb6/0x360
[ 374.638699] __blkdev_reread_part+0x64/0x70
[ 374.643377] blkdev_reread_part+0x23/0x40
[ 374.647860] blkdev_ioctl+0x48c/0x990
[ 374.651956] block_ioctl+0x41/0x50
[ 374.655766] do_vfs_ioctl+0xa7/0x600
[ 374.659766] ? locks_lock_inode_wait+0xb1/0x150
[ 374.664832] ksys_ioctl+0x67/0x90
[ 374.668539] __x64_sys_ioctl+0x1a/0x20
[ 374.672732] do_syscall_64+0x5a/0x1c0
[ 374.676828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 374.738474] INFO: task nvmeadm:49141 blocked for more than 123 seconds.
[ 374.745871] Not tainted 5.2.0-rc3-mpdebug+ #42
[ 374.751419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 374.760170] nvmeadm D 0 49141 36333 0x00004080
[ 374.766301] Call Trace:
[ 374.769038] __schedule+0x2ef/0x620
[ 374.772939] schedule+0x38/0xa0
[ 374.776452] blk_mq_freeze_queue_wait+0x59/0x100
[ 374.781614] ? remove_wait_queue+0x60/0x60
[ 374.786192] blk_mq_freeze_queue+0x1a/0x20
[ 374.790773] nvme_update_disk_info.isra.57+0x5f/0x350 [nvme_core]
[ 374.797582] ? nvme_identify_ns.isra.50+0x71/0xc0 [nvme_core]
[ 374.804006] __nvme_revalidate_disk+0xe5/0x110 [nvme_core]
[ 374.810139] nvme_revalidate_disk+0xa6/0x120 [nvme_core]
[ 374.816078] ? nvme_submit_user_cmd+0x11e/0x320 [nvme_core]
[ 374.822299] nvme_user_cmd+0x264/0x370 [nvme_core]
[ 374.827661] nvme_dev_ioctl+0x112/0x1d0 [nvme_core]
[ 374.833114] do_vfs_ioctl+0xa7/0x600
[ 374.837117] ? __audit_syscall_entry+0xdd/0x130
[ 374.842184] ksys_ioctl+0x67/0x90
[ 374.845891] __x64_sys_ioctl+0x1a/0x20
[ 374.850082] do_syscall_64+0x5a/0x1c0
[ 374.854178] entry_SYSCALL_64_after_hwframe+0x44/0xa9
--
Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Tested-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8c36e66fb407ce076535a7db98ab9f6d720b866a ]
In the error path for nvme_init_subsystem(), nvme_put_subsystem()
will call device_put(), but it will get called again after the
mutex_unlock().
The device_put() only needs to be called if device_add() fails.
This bug caused a KASAN use-after-free error when adding and
removing subsytems in a loop:
BUG: KASAN: use-after-free in device_del+0x8d9/0x9a0
Read of size 8 at addr ffff8883cdaf7120 by task multipathd/329
CPU: 0 PID: 329 Comm: multipathd Not tainted 5.2.0-rc6-vmlocalyes-00019-g70a2b39005fd-dirty #314
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0x7b/0xb5
print_address_description+0x6f/0x280
? device_del+0x8d9/0x9a0
__kasan_report+0x148/0x199
? device_del+0x8d9/0x9a0
? class_release+0x100/0x130
? device_del+0x8d9/0x9a0
kasan_report+0x12/0x20
__asan_report_load8_noabort+0x14/0x20
device_del+0x8d9/0x9a0
? device_platform_notify+0x70/0x70
nvme_destroy_subsystem+0xf9/0x150
nvme_free_ctrl+0x280/0x3a0
device_release+0x72/0x1d0
kobject_put+0x144/0x410
put_device+0x13/0x20
nvme_free_ns+0xc4/0x100
nvme_release+0xb3/0xe0
__blkdev_put+0x549/0x6e0
? kasan_check_write+0x14/0x20
? bd_set_size+0xb0/0xb0
? kasan_check_write+0x14/0x20
? mutex_lock+0x8f/0xe0
? __mutex_lock_slowpath+0x20/0x20
? locks_remove_file+0x239/0x370
blkdev_put+0x72/0x2c0
blkdev_close+0x8d/0xd0
__fput+0x256/0x770
? _raw_read_lock_irq+0x40/0x40
____fput+0xe/0x10
task_work_run+0x10c/0x180
? filp_close+0xf7/0x140
exit_to_usermode_loop+0x151/0x170
do_syscall_64+0x240/0x2e0
? prepare_exit_to_usermode+0xd5/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f5a79af05d7
Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 c4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 06 fc ff ff 8b 44 24
RSP: 002b:00007f5a7799c810 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007f5a79af05d7
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008
RBP: 00007f5a58000f98 R08: 0000000000000002 R09: 00007f5a7935ee80
R10: 0000000000000000 R11: 0000000000000293 R12: 000055e432447240
R13: 0000000000000000 R14: 0000000000000001 R15: 000055e4324a9cf0
Allocated by task 1236:
save_stack+0x21/0x80
__kasan_kmalloc.constprop.6+0xab/0xe0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x102/0x210
nvme_init_identify+0x13c3/0x3820
nvme_loop_configure_admin_queue+0x4fa/0x5e0
nvme_loop_create_ctrl+0x469/0xf40
nvmf_dev_write+0x19a3/0x21ab
__vfs_write+0x66/0x120
vfs_write+0x154/0x490
ksys_write+0x104/0x240
__x64_sys_write+0x73/0xb0
do_syscall_64+0xa5/0x2e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 329:
save_stack+0x21/0x80
__kasan_slab_free+0x129/0x190
kasan_slab_free+0xe/0x10
kfree+0xa7/0x200
nvme_release_subsystem+0x49/0x60
device_release+0x72/0x1d0
kobject_put+0x144/0x410
put_device+0x13/0x20
klist_class_dev_put+0x31/0x40
klist_put+0x8f/0xf0
klist_del+0xe/0x10
device_del+0x3a7/0x9a0
nvme_destroy_subsystem+0xf9/0x150
nvme_free_ctrl+0x280/0x3a0
device_release+0x72/0x1d0
kobject_put+0x144/0x410
put_device+0x13/0x20
nvme_free_ns+0xc4/0x100
nvme_release+0xb3/0xe0
__blkdev_put+0x549/0x6e0
blkdev_put+0x72/0x2c0
blkdev_close+0x8d/0xd0
__fput+0x256/0x770
____fput+0xe/0x10
task_work_run+0x10c/0x180
exit_to_usermode_loop+0x151/0x170
do_syscall_64+0x240/0x2e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 32fd90c40768 ("nvme: change locking for the per-subsystem controller list")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cfc1a1af56200362d1508b82b9a3cc3acb2eae0c ]
Presently, nvmet_file_flush() always returns a call to
errno_to_nvme_status() but that helper doesn't take into account the
case when errno=0. So nvmet_file_flush() always returns an error code.
All other callers of errno_to_nvme_status() check for success before
calling it.
To fix this, ensure errno_to_nvme_status() returns success if the
errno is zero. This should prevent future mistakes like this from
happening.
Fixes: c6aa3542e010 ("nvmet: add error log support for file backend")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 86b9a63e595ff03f9d0a7b92b6acc231fecefc29 ]
After calling nvme_loop_delete_ctrl(), the controllers will not
yet be deleted because nvme_delete_ctrl() only schedules work
to do the delete.
This means a race can occur if a port is removed but there
are still active controllers trying to access that memory.
To fix this, flush the nvme_delete_wq before returning from
nvme_loop_remove_port() so that any controllers that might
be in the process of being deleted won't access a freed port.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3aed86731ee2b23e4dc4d2c6d943d33992cd551b ]
When a port is removed through configfs, any connected controllers
are still active and can still send commands. This causes a
use-after-free bug which is detected by KASAN for any admin command
that dereferences req->port (like in nvmet_execute_identify_ctrl).
To fix this, disconnect all active controllers when a subsystem is
removed from a port. This ensures there are no active controllers
when the port is eventually removed.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fab7772bfbcfe8fb8e3e352a6a8fcaf044cded17 ]
When CONFIG_NVME_MULTIPATH is set, only the hidden gendisk associated
with the per-controller ns is run through revalidate_disk when a
rescan is triggered, while the visible blockdev never gets its size
(bdev->bd_inode->i_size) updated to reflect any capacity changes that
may have occurred.
This prevents online resizing of nvme block devices and in extension of
any filesystems atop that will are unable to expand while mounted, as
userspace relies on the blockdev size for obtaining the disk capacity
(via BLKGETSIZE/64 ioctls).
Fix this by explicitly revalidating the actual namespace gendisk in
addition to the per-controller gendisk, when multipath is enabled.
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e654dfd38c1ecf58d8d019f3c053189413484a5b ]
When freeing the subsystem after finding another match with
__nvme_find_get_subsystem(), use put_device() instead of
__nvme_release_subsystem() which calls kfree() directly.
Per the documentation, put_device() should always be used
after device_initialization() is called. Otherwise, leaks
like the one below which was detected by kmemleak may occur.
Once the call of __nvme_release_subsystem() is removed it no
longer makes sense to keep the helper, so fold it back
into nvme_release_subsystem().
unreferenced object 0xffff8883d12bfbc0 (size 16):
comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s)
hex dump (first 16 bytes):
6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff nvme-subsys2....
backtrace:
[<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0
[<0000000081169e5f>] kvasprintf+0xad/0x130
[<0000000025626f25>] kvasprintf_const+0x47/0x120
[<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120
[<000000004881f8b3>] dev_set_name+0x98/0xc0
[<000000007124dae3>] nvme_init_identify+0x1995/0x38e0
[<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0
[<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80
[<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220
[<000000002259b3d5>] __vfs_write+0x66/0x120
[<000000002f6df81e>] vfs_write+0x154/0x490
[<000000007e8cfc19>] ksys_write+0x10a/0x240
[<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0
[<00000000fee6d692>] do_syscall_64+0xaa/0x470
[<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: ab9e00cc72fa ("nvme: track subsystems")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 08b903b5fd0c49e5f224a9bf085b6329ec3c55c0 ]
The ADATA SX6000LNP NVMe SSDs have the same subnqn and, due to this, a
system with more than one of these SSDs will only have one usable.
[ 0.942706] nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).
[ 0.943017] nvme nvme1: Removing after probe failure status: -22
02:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
71:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
There are no firmware updates available from the vendor, unfortunately.
Applying the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk for these SSDs resolves
the issue, and they all work after this patch:
/dev/nvme0n1 2J1120050420 ADATA SX6000LNP [...]
/dev/nvme1n1 2J1120050540 ADATA SX6000LNP [...]
Signed-off-by: Misha Nasledov <misha@nasledov.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 66b20ac0a1a10769d059d6903202f53494e3d902 upstream.
Fix a crash with multipath activated. It happends when ANA log
page is larger than MDTS and because of that ANA is disabled.
The driver then tries to access unallocated buffer when connecting
to a nvme target. The signature is as follows:
[ 300.433586] nvme nvme0: ANA log page size (8208) larger than MDTS (8192).
[ 300.435387] nvme nvme0: disabling ANA support.
[ 300.437835] nvme nvme0: creating 4 I/O queues.
[ 300.459132] nvme nvme0: new ctrl: NQN "nqn.0.0.0", addr 10.91.0.1:8009
[ 300.464609] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 300.466342] #PF error: [normal kernel read fault]
[ 300.467385] PGD 0 P4D 0
[ 300.467987] Oops: 0000 [#1] SMP PTI
[ 300.468787] CPU: 3 PID: 50 Comm: kworker/u8:1 Not tainted 5.0.20kalray+ #4
[ 300.470264] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 300.471532] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 300.472724] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[ 300.474038] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[ 300.477374] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
[ 300.478334] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
[ 300.479784] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
[ 300.481488] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
[ 300.483203] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
[ 300.484928] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
[ 300.486626] FS: 0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
[ 300.488538] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 300.489907] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
[ 300.491612] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 300.493303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 300.494991] Call Trace:
[ 300.495645] nvme_mpath_add_disk+0x5c/0xb0 [nvme_core]
[ 300.496880] nvme_validate_ns+0x2ef/0x550 [nvme_core]
[ 300.498105] ? nvme_identify_ctrl.isra.45+0x6a/0xb0 [nvme_core]
[ 300.499539] nvme_scan_work+0x2b4/0x370 [nvme_core]
[ 300.500717] ? __switch_to_asm+0x35/0x70
[ 300.501663] process_one_work+0x171/0x380
[ 300.502340] worker_thread+0x49/0x3f0
[ 300.503079] kthread+0xf8/0x130
[ 300.503795] ? max_active_store+0x80/0x80
[ 300.504690] ? kthread_bind+0x10/0x10
[ 300.505502] ret_from_fork+0x35/0x40
[ 300.506280] Modules linked in: nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm ib_core nvme_fabrics nvme_core xt_physdev ip6table_raw ip6table_mangle ip6table_filter ip6_tables xt_comment iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle iptable_filter veth ebtable_filter ebtable_nat ebtables iptable_raw vxlan ip6_udp_tunnel udp_tunnel sunrpc joydev pcspkr virtio_balloon br_netfilter bridge stp llc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console net_failover virtio_blk failover ata_piix serio_raw libata virtio_pci virtio_ring virtio
[ 300.514984] CR2: 0000000000000008
[ 300.515569] ---[ end trace faa2eefad7e7f218 ]---
[ 300.516354] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[ 300.517330] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[ 300.520353] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
[ 300.521229] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
[ 300.522399] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
[ 300.523560] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
[ 300.524734] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
[ 300.525915] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
[ 300.527084] FS: 0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
[ 300.528396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 300.529440] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
[ 300.530739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 300.531989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 300.533264] Kernel panic - not syncing: Fatal exception
[ 300.534338] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 300.536227] ---[ end Kernel panic - not syncing: Fatal exception ]---
Condition check refactoring from Christoph Hellwig.
Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Tested-by: Jean-Baptiste Riaux <jbriaux@kalray.eu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7d30c81b80ea9b0812d27030a46a5bf4c4e328f5 ]
git://git.infradead.org/nvme.git nvme-5.3 branch now causes the
following NULL deref oops. Check the ctrl->opts first before the deref.
[ 16.337581] BUG: kernel NULL pointer dereference, address: 0000000000000056
[ 16.338551] #PF: supervisor read access in kernel mode
[ 16.338551] #PF: error_code(0x0000) - not-present page
[ 16.338551] PGD 0 P4D 0
[ 16.338551] Oops: 0000 [#1] SMP PTI
[ 16.338551] CPU: 2 PID: 1035 Comm: kworker/u16:5 Not tainted 5.2.0-rc6+ #1
[ 16.338551] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[ 16.338551] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
[ 16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
[ 16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
[ 16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
[ 16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
[ 16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
[ 16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
[ 16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
[ 16.338551] FS: 0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
[ 16.338551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
[ 16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.338551] Call Trace:
[ 16.338551] nvme_scan_work+0x2c0/0x340 [nvme_core]
[ 16.338551] ? __switch_to_asm+0x40/0x70
[ 16.338551] ? _raw_spin_unlock_irqrestore+0x18/0x30
[ 16.338551] ? try_to_wake_up+0x408/0x450
[ 16.338551] process_one_work+0x20b/0x3e0
[ 16.338551] worker_thread+0x1f9/0x3d0
[ 16.338551] ? cancel_delayed_work+0xa0/0xa0
[ 16.338551] kthread+0x117/0x120
[ 16.338551] ? kthread_stop+0xf0/0xf0
[ 16.338551] ret_from_fork+0x3a/0x50
[ 16.338551] Modules linked in: nvme nvme_core
[ 16.338551] CR2: 0000000000000056
[ 16.338551] ---[ end trace b9bf761a93e62d84 ]---
[ 16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
[ 16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
[ 16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
[ 16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
[ 16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
[ 16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
[ 16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
[ 16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
[ 16.338551] FS: 0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
[ 16.338551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
[ 16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 958f2a0f8121 ("nvme-tcp: set the STABLE_WRITES flag when data digests are enabled")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 958f2a0f8121ae36a5cbff383ab94fadf1fba5eb ]
There was a few false alarms sighted on target side about wrong data
digest while performing high throughput load to XFS filesystem shared
through NVMoF TCP.
This flag tells the rest of the kernel to ensure that the data buffer
does not change while the write is in flight. It incurs a performance
penalty, so only enable it when it is actually needed, i.e. when we are
calculating data digests.
Although even with this change in place, ext2 users can steel experience
false positives, as ext2 is not respecting this flag. This may be apply
to vfat as well.
Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Signed-off-by: Mike Playle <mplayle@solarflare.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 37c15219599f7a4baa73f6e3432afc69ba7cc530 ]
According to commit a10674bf2406 ("tcp: detecting the misuse of
.sendpage for Slab objects") and previous discussion, tcp_sendpage
should not be used for pages that is managed by SLAB, as SLAB is not
taking page reference counters into consideration.
Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7637de311bd2124b298a072852448b940d8a34b9 ]
When running a NVMe device that is attached to a addressing
challenged PCIe root port that requires bounce buffering, our
request sizes can easily overflow the swiotlb bounce buffer
size. Limit the maximum I/O size to the limit exposed by
the DMA mapping subsystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Atish Patra <Atish.Patra@wdc.com>
Tested-by: Atish Patra <Atish.Patra@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bfac8e9f55cf62a000b643a0081488badbe92d96 ]
Modify nvme_alloc_sq_cmds() to call pci_free_p2pmem() to free the memory
it allocated using pci_alloc_p2pmem() in case pci_p2pmem_virt_to_bus()
returns null.
Makes sure not to call pci_free_p2pmem() if pci_alloc_p2pmem() returned
NULL, which can happen if CONFIG_PCI_P2PDMA is not configured.
The current implementation is not expected to leak since
pci_p2pmem_virt_to_bus() is expected to fail only if pci_alloc_p2pmem()
returns null. However, checking the return value of pci_alloc_p2pmem()
is more explicit.
Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dad77d63903e91a2e97a0c984cabe5d36e91ba60 ]
If the "irq_queues" are greater than num_possible_cpus(),
nvme_calc_irq_sets() can have irq set_size for HCTX_TYPE_DEFAULT greater
than it can be afforded.
2039 affd->set_size[HCTX_TYPE_DEFAULT] = nrirqs - nr_read_queues;
It might cause a WARN() from the irq_build_affinity_masks() like [1]:
220 if (nr_present < numvecs)
221 WARN_ON(nr_present + nr_others < numvecs);
This patch prevents it from the WARN() by adjusting the max_vector value
from the nvme_setup_irqs().
[1] WARN messages when modprobe nvme write_queues=32 poll_queues=0:
root@target:~/nvme# nproc
8
root@target:~/nvme# modprobe nvme write_queues=32 poll_queues=0
[ 17.925326] nvme nvme0: pci function 0000:00:04.0
[ 17.940601] WARNING: CPU: 3 PID: 1030 at kernel/irq/affinity.c:221 irq_create_affinity_masks+0x222/0x330
[ 17.940602] Modules linked in: nvme nvme_core [last unloaded: nvme]
[ 17.940605] CPU: 3 PID: 1030 Comm: kworker/u17:4 Tainted: G W 5.1.0+ #156
[ 17.940605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 17.940608] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 17.940609] RIP: 0010:irq_create_affinity_masks+0x222/0x330
[ 17.940611] Code: 4c 8d 4c 24 28 4c 8d 44 24 30 e8 c9 fa ff ff 89 44 24 18 e8 c0 38 fa ff 8b 44 24 18 44 8b 54 24 1c 5a 44 01 d0 41 39 c4 76 02 <0f> 0b 48 89 df 44 01 e5 e8 f1 ce 10 00 48 8b 34 24 44 89 f0 44 01
[ 17.940611] RSP: 0018:ffffc90002277c50 EFLAGS: 00010216
[ 17.940612] RAX: 0000000000000008 RBX: ffff88807ca48860 RCX: 0000000000000000
[ 17.940612] RDX: ffff88807bc03800 RSI: 0000000000000020 RDI: 0000000000000000
[ 17.940613] RBP: 0000000000000001 R08: ffffc90002277c78 R09: ffffc90002277c70
[ 17.940613] R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000020
[ 17.940614] R13: 0000000000025d08 R14: 0000000000000001 R15: ffff88807bc03800
[ 17.940614] FS: 0000000000000000(0000) GS:ffff88807db80000(0000) knlGS:0000000000000000
[ 17.940616] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 17.940617] CR2: 00005635e583f790 CR3: 000000000240a000 CR4: 00000000000006e0
[ 17.940617] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 17.940618] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 17.940618] Call Trace:
[ 17.940622] __pci_enable_msix_range+0x215/0x540
[ 17.940623] ? kernfs_put+0x117/0x160
[ 17.940625] pci_alloc_irq_vectors_affinity+0x74/0x110
[ 17.940626] nvme_reset_work+0xc30/0x1397 [nvme]
[ 17.940628] ? __switch_to_asm+0x34/0x70
[ 17.940628] ? __switch_to_asm+0x40/0x70
[ 17.940629] ? __switch_to_asm+0x34/0x70
[ 17.940630] ? __switch_to_asm+0x40/0x70
[ 17.940630] ? __switch_to_asm+0x34/0x70
[ 17.940631] ? __switch_to_asm+0x40/0x70
[ 17.940632] ? nvme_irq_check+0x30/0x30 [nvme]
[ 17.940633] process_one_work+0x20b/0x3e0
[ 17.940634] worker_thread+0x1f9/0x3d0
[ 17.940635] ? cancel_delayed_work+0xa0/0xa0
[ 17.940636] kthread+0x117/0x120
[ 17.940637] ? kthread_stop+0xf0/0xf0
[ 17.940638] ret_from_fork+0x3a/0x50
[ 17.940639] ---[ end trace aca8a131361cd42a ]---
[ 17.942124] nvme nvme0: 7/1/0 default/read/poll queues
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e71afda49335620e3d9adf56015676db33a3bd86 ]
This patch removes the confusing assignment of the variable result at
the time of declaration and sets the value in error cases next to the
places where the actual error is happening.
Here we also set the result value to -ENODEV when we fail at the final
ctrl state transition in nvme_reset_work(). Without this assignment
result will hold 0 from nvme_setup_io_queue() and on failure 0 will be
passed to he nvme_remove_dead_ctrl() from final state transition.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cee6c269b016ba89c62e34d6bccb103ee2c7de4f ]
If the state change to NVME_CTRL_CONNECTING fails, the dmesg is going to
be like:
[ 293.689160] nvme nvme0: failed to mark controller CONNECTING
[ 293.689160] nvme nvme0: Removing after probe failure status: 0
Even it prints the first line to indicate the situation, the second line
is not proper because the status is 0 which means normally success of
the previous operation.
This patch makes it indicate the proper error value when it fails.
[ 25.932367] nvme nvme0: failed to mark controller CONNECTING
[ 25.932369] nvme nvme0: Removing after probe failure status: -16
This situation is able to be easily reproduced by:
root@target:~# rmmod nvme && modprobe nvme && rmmod nvme
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2181e455612a8db2761eabbf126640552a451e96 ]
When a shared namespace is removed, we call blk_cleanup_queue()
when the device can still be accessed as the current path and this can
result in submission to a dying queue. Hence, direct_make_request()
called by our mpath device may fail (propagating the failure to userspace).
Instead, we want to failover this I/O to a different path if one exists.
Thus, before we cleanup the request queue, we make sure that the device is
cleared from the current path nor it can be selected again as such.
Fix this by:
- clear the ns from the head->list and synchronize rcu to make sure there is
no concurrent path search that restores it as the current path
- clear the mpath current path in order to trigger a subsequent path search
and sync srcu to wait for any ongoing request submissions
- safely continue to namespace removal and blk_cleanup_queue
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Pull NVMe fixes from Sagi.
* 'nvme-5.2-rc-next' of git://git.infradead.org/nvme:
nvme-rdma: use dynamic dma mapping per command
nvme: Fix u32 overflow in the number of namespace list calculation
nvmet: fix data_len to 0 for bdev-backed write_zeroes
nvme-tcp: fix queue mapping when queue count is limited
nvme-rdma: fix queue mapping when queue count is limited
|
|
Commit 87fd125344d6 ("nvme-rdma: remove redundant reference between
ib_device and tagset") caused a kernel panic when disconnecting from an
inaccessible controller (disconnect during re-connection).
--
nvme nvme0: Removing ctrl: NQN "testnqn1"
nvme_rdma: nvme_rdma_exit_request: hctx 0 queue_idx 1
BUG: unable to handle kernel paging request at 0000000080000228
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
...
Call Trace:
blk_mq_exit_hctx+0x5c/0xf0
blk_mq_exit_queue+0xd4/0x100
blk_cleanup_queue+0x9a/0xc0
nvme_rdma_destroy_io_queues+0x52/0x60 [nvme_rdma]
nvme_rdma_shutdown_ctrl+0x3e/0x80 [nvme_rdma]
nvme_do_delete_ctrl+0x53/0x80 [nvme_core]
nvme_sysfs_delete+0x45/0x60 [nvme_core]
kernfs_fop_write+0x105/0x180
vfs_write+0xad/0x1a0
ksys_write+0x5a/0xd0
do_syscall_64+0x55/0x110
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa215417154
--
The reason for this crash is accessing an already freed ib_device for
performing dma_unmap during exit_request commands. The root cause for
that is that during re-connection all the queues are destroyed and
re-created (and the ib_device is reference counted by the queues and
freed as well) but the tagset stays alive and all the DMA mappings (that
we perform in init_request) kept in the request context. The original
commit fixed a different bug that was introduced during bonding (aka nic
teaming) tests that for some scenarios change the underlying ib_device
and caused memory leakage and possible segmentation fault. This commit
is a complementary commit that also changes the wrong DMA mappings that
were saved in the request context and making the request sqe dma
mappings dynamic with the command lifetime (i.e. mapped in .queue_rq and
unmapped in .complete). It also fixes the above crash of accessing freed
ib_device during destruction of the tagset.
Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reported-by: Jim Harris <james.r.harris@intel.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
The Number of Namespaces (nn) field in the identify controller data structure is
defined as u32 and the maximum allowed value in NVMe specification is
0xFFFFFFFEUL. This change fixes the possible overflow of the DIV_ROUND_UP()
operation used in nvme_scan_ns_list() by casting the nn to u64.
Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
NVMe uses PRPs (or optionally unlimited SGLs) for data transfers and
has no specific limit for a single DMA segement. Limiting the size
will cause problems because the block layer assumes PRP-ish devices
using a virt boundary mask don't have a segment limit. And while this
is true, we also really need to tell the DMA mapping layer about it,
otherwise dma-debug will trip over it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The WRITE ZEROES command has no data transfer so that we need to
initialize the struct (nvmet_req *req)->data_len to 0x0. While
(nvmet_req *req)->transfer_len is initialized in nvmet_req_init(),
data_len will be initialized by nowhere which might cause the failure
with status code NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR randomly. It's
because nvmet_req_execute() checks like:
if (unlikely(req->data_len != req->transfer_len)) {
req->error_loc = offsetof(struct nvme_common_command, dptr);
nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR);
} else
req->execute(req);
This patch fixes req->data_len not to be a randomly assigned by
initializing it to 0x0 when preparing the command in
nvmet_bdev_parse_io_cmd().
nvmet_file_parse_io_cmd() which is for file-backed I/O has already
initialized the data_len field to 0x0, though.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
When the controller supports less queues than requested, we
should make sure that queue mapping does the right thing and
not assume that all queues are available. This fixes a crash
when the controller supports less queues than requested.
The rules are:
1. if no write queues are requested, we assign the available queues
to the default queue map. The default and read queue maps share the
existing queues.
2. if write queues are requested:
- first make sure that read queue map gets the requested
nr_io_queues count
- then grant the default queue map the minimum between the requested
nr_write_queues and the remaining queues. If there are no available
queues to dedicate to the default queue map, fallback to (1) and
share all the queues in the existing queue map.
Also, provide a log indication on how we constructed the different
queue maps.
Reported-by: Harris, James R <james.r.harris@intel.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
Cc: <stable@vger.kernel.org> # v5.0+
Suggested-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
When the controller supports less queues than requested, we
should make sure that queue mapping does the right thing and
not assume that all queues are available. This fixes a crash
when the controller supports less queues than requested.
The rules are:
1. if no write/poll queues are requested, we assign the available queues
to the default queue map. The default and read queue maps share the
existing queues.
2. if write queues are requested:
- first make sure that read queue map gets the requested
nr_io_queues count
- then grant the default queue map the minimum between the requested
nr_write_queues and the remaining queues. If there are no available
queues to dedicate to the default queue map, fallback to (1) and
share all the queues in the existing queue map.
3. if poll queues are requested:
- map the remaining queues to the poll queue map.
Also, provide a log indication on how we constructed the different
queue maps.
Reported-by: Harris, James R <james.r.harris@intel.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Jim Harris <james.r.harris@intel.com>
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request from Keith, with fixes from a few folks.
- bio and sbitmap before atomic barrier fixes (Andrea)
- Hang fix for blk-mq freeze and unfreeze (Bob)
- Single segment count regression fix (Christoph)
- AoE now has a new maintainer
- tools/io_uring/ Makefile fix, and sync with liburing (me)
* tag 'for-linus-20190524' of git://git.kernel.dk/linux-block: (23 commits)
tools/io_uring: sync with liburing
tools/io_uring: fix Makefile for pthread library link
blk-mq: fix hang caused by freeze/unfreeze sequence
block: remove the bi_seg_{front,back}_size fields in struct bio
block: remove the segment size check in bio_will_gap
block: force an unlimited segment size on queues with a virt boundary
block: don't decrement nr_phys_segments for physically contigous segments
sbitmap: fix improper use of smp_mb__before_atomic()
bio: fix improper use of smp_mb__before_atomic()
aoe: list new maintainer for aoe driver
nvme-pci: use blk-mq mapping for unmanaged irqs
nvme: update MAINTAINERS
nvme: copy MTFA field from identify controller
nvme: fix memory leak for power latency tolerance
nvme: release namespace SRCU protection before performing controller ioctls
nvme: merge nvme_ns_ioctl into nvme_ioctl
nvme: remove the ifdef around nvme_nvm_ioctl
nvme: fix srcu locking on error return in nvme_get_ns_from_disk
nvme: Fix known effects
nvme-pci: Sync queues on reset
...
|
|
If a device is providing a single IRQ vector, the IO queue will share
that vector with the admin queue. This is an unmanaged vector, so does
not have a valid PCI IRQ affinity. Avoid trying to extract a managed
affinity in this case and let blk-mq set up the cpu:queue mapping instead.
Otherwise we'd hit the following warning when the device is using MSI:
WARNING: CPU: 4 PID: 7 at drivers/pci/msi.c:1272 pci_irq_get_affinity+0x66/0x80
Modules linked in: nvme nvme_core serio_raw
CPU: 4 PID: 7 Comm: kworker/u16:0 Tainted: G W 5.2.0-rc1+ #494
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
RIP: 0010:pci_irq_get_affinity+0x66/0x80
Code: 0b 31 c0 c3 83 e2 10 48 c7 c0 b0 83 35 91 74 2a 48 8b 87 d8 03 00 00 48 85 c0 74 0e 48 8b 50 30 48 85 d2 74 05 39 70 14 77 05 <0f> 0b 31 c0 c3 48 63 f6 48 8d 04 76 48 8d 04 c2 f3 c3 48 8b 40 30
RSP: 0000:ffffb5abc01d3cc8 EFLAGS: 00010246
RAX: ffff9536786a39c0 RBX: 0000000000000000 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9536781ed000
RBP: ffff95367346a008 R08: ffff95367d43f080 R09: ffff953678c07800
R10: ffff953678164800 R11: 0000000000000000 R12: 0000000000000000
R13: ffff9536781ed000 R14: 00000000ffffffff R15: ffff95367346a008
FS: 0000000000000000(0000) GS:ffff95367d400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdf814a3ff0 CR3: 000000001a20f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
blk_mq_pci_map_queues+0x37/0xd0
nvme_pci_map_queues+0x80/0xb0 [nvme]
blk_mq_alloc_tag_set+0x133/0x2f0
nvme_reset_work+0x105d/0x1590 [nvme]
process_one_work+0x291/0x530
worker_thread+0x218/0x3d0
? process_one_work+0x530/0x530
kthread+0x111/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x30
---[ end trace 74587339d93c83c0 ]---
Fixes: 22b5560195bd6 ("nvme-pci: Separate IO and admin queue IRQ vectors")
Reported-by: Iván Chavero <ichavero@chavero.com.mx>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
We use the controller's reported maximum firmware activation time as our
timeout before resetting a controller for a failed activation notice,
but this value was never being read so we could only use the default
timeout. Copy the Identify Controller MTFA field to the corresponding
nvme_ctrl's mtfa field.
Fixes: b6dccf7fae433 (“nvme: add support for FW activation without reset”).
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Signed-off-by: Laine Walker-Avina <laine.walker-avina@intel.com>
[changelog, fix endian]
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Unconditionally hide device pm latency tolerance when uninitializing
the controller to ensure all qos resources are released so that we're
not leaking this memory. This is safe to call if none were allocated in
the first place, or were previously freed.
Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions")
Suggested-by: Keith Busch <keith.busch@intel.com>
Tested-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
[changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
|
Holding the SRCU critical section protecting the namespace list can
cause deadlocks when using the per-namespace admin passthrough ioctl to
delete as namespace. Release it earlier when performing per-controller
ioctls to avoid that.
Reported-by: Kenneth Heitke <kenneth.heitke@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Merge the two functions to make future changes a little easier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
|
We already have a proper stub if lightnvm is not enabled, so don't bother
with the ifdef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|