summaryrefslogtreecommitdiffstats
path: root/drivers/scsi
AgeCommit message (Collapse)Author
2022-03-23scsi: mpt3sas: Page fault in reply q processingMatt Lupfer
commit 69ad4ef868c1fc7609daa235dfa46d28ba7a3ba3 upstream. A page fault was encountered in mpt3sas on a LUN reset error path: [ 145.763216] mpt3sas_cm1: Task abort tm failed: handle(0x0002),timeout(30) tr_method(0x0) smid(3) msix_index(0) [ 145.778932] scsi 1:0:0:0: task abort: FAILED scmd(0x0000000024ba29a2) [ 145.817307] scsi 1:0:0:0: attempting device reset! scmd(0x0000000024ba29a2) [ 145.827253] scsi 1:0:0:0: [sg1] tag#2 CDB: Receive Diagnostic 1c 01 01 ff fc 00 [ 145.837617] scsi target1:0:0: handle(0x0002), sas_address(0x500605b0000272b9), phy(0) [ 145.848598] scsi target1:0:0: enclosure logical id(0x500605b0000272b8), slot(0) [ 149.858378] mpt3sas_cm1: Poll ReplyDescriptor queues for completion of smid(0), task_type(0x05), handle(0x0002) [ 149.875202] BUG: unable to handle page fault for address: 00000007fffc445d [ 149.885617] #PF: supervisor read access in kernel mode [ 149.894346] #PF: error_code(0x0000) - not-present page [ 149.903123] PGD 0 P4D 0 [ 149.909387] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 149.917417] CPU: 24 PID: 3512 Comm: scsi_eh_1 Kdump: loaded Tainted: G S O 5.10.89-altav-1 #1 [ 149.934327] Hardware name: DDN 200NVX2 /200NVX2-MB , BIOS ATHG2.2.02.01 09/10/2021 [ 149.951871] RIP: 0010:_base_process_reply_queue+0x4b/0x900 [mpt3sas] [ 149.961889] Code: 0f 84 22 02 00 00 8d 48 01 49 89 fd 48 8d 57 38 f0 0f b1 4f 38 0f 85 d8 01 00 00 49 8b 45 10 45 31 e4 41 8b 55 0c 48 8d 1c d0 <0f> b6 03 83 e0 0f 3c 0f 0f 85 a2 00 00 00 e9 e6 01 00 00 0f b7 ee [ 149.991952] RSP: 0018:ffffc9000f1ebcb8 EFLAGS: 00010246 [ 150.000937] RAX: 0000000000000055 RBX: 00000007fffc445d RCX: 000000002548f071 [ 150.011841] RDX: 00000000ffff8881 RSI: 0000000000000001 RDI: ffff888125ed50d8 [ 150.022670] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffff7fff [ 150.033445] R10: ffffc9000f1ebb68 R11: ffffc9000f1ebb60 R12: 0000000000000000 [ 150.044204] R13: ffff888125ed50d8 R14: 0000000000000080 R15: 34cdc00034cdea80 [ 150.054963] FS: 0000000000000000(0000) GS:ffff88dfaf200000(0000) knlGS:0000000000000000 [ 150.066715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 150.076078] CR2: 00000007fffc445d CR3: 000000012448a006 CR4: 0000000000770ee0 [ 150.086887] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 150.097670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 150.108323] PKRU: 55555554 [ 150.114690] Call Trace: [ 150.120497] ? printk+0x48/0x4a [ 150.127049] mpt3sas_scsih_issue_tm.cold.114+0x2e/0x2b3 [mpt3sas] [ 150.136453] mpt3sas_scsih_issue_locked_tm+0x86/0xb0 [mpt3sas] [ 150.145759] scsih_dev_reset+0xea/0x300 [mpt3sas] [ 150.153891] scsi_eh_ready_devs+0x541/0x9e0 [scsi_mod] [ 150.162206] ? __scsi_host_match+0x20/0x20 [scsi_mod] [ 150.170406] ? scsi_try_target_reset+0x90/0x90 [scsi_mod] [ 150.178925] ? blk_mq_tagset_busy_iter+0x45/0x60 [ 150.186638] ? scsi_try_target_reset+0x90/0x90 [scsi_mod] [ 150.195087] scsi_error_handler+0x3a5/0x4a0 [scsi_mod] [ 150.203206] ? __schedule+0x1e9/0x610 [ 150.209783] ? scsi_eh_get_sense+0x210/0x210 [scsi_mod] [ 150.217924] kthread+0x12e/0x150 [ 150.224041] ? kthread_worker_fn+0x130/0x130 [ 150.231206] ret_from_fork+0x1f/0x30 This is caused by mpt3sas_base_sync_reply_irqs() using an invalid reply_q pointer outside of the list_for_each_entry() loop. At the end of the full list traversal the pointer is invalid. Move the _base_process_reply_queue() call inside of the loop. Link: https://lore.kernel.org/r/d625deae-a958-0ace-2ba3-0888dd0a415b@ddn.com Fixes: 711a923c14d9 ("scsi: mpt3sas: Postprocessing of target and LUN reset") Cc: stable@vger.kernel.org Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Matt Lupfer <mlupfer@ddn.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16block: drop unused includes in <linux/genhd.h>Christoph Hellwig
commit b81e0c2372e65e5627864ba034433b64b2fc73f5 upstream. Drop various include not actually used in genhd.h itself, and move the remaning includes closer together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>a Reported-by: "H. Nikolaus Schaller" <hns@goldelico.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: "Maciej W. Rozycki" <macro@orcam.me.uk> [ resolves MIPS build failure by luck, root cause needs to be fixed in Linus's tree properly, but this is needed for now to fix the build - gregkh ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-11xen/scsifront: don't use gnttab_query_foreign_access() for mapped statusJuergen Gross
Commit 33172ab50a53578a95691310f49567c9266968b0 upstream. It isn't enough to check whether a grant is still being in use by calling gnttab_query_foreign_access(), as a mapping could be realized by the other side just after having called that function. In case the call was done in preparation of revoking a grant it is better to do so via gnttab_try_end_foreign_access() and check the success of that operation instead. This is CVE-2022-23038 / part of XSA-396. Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and ↵Mike Christie
qedi_process_cmd_cleanup_resp() commit f10f582d28220f50099d3f561116256267821429 upstream. This fixes a deadlock added with commit b40f3894e39e ("scsi: qedi: Complete TMF works before disconnect") Bug description from Jia-Ju Bai: qedi_process_tmf_resp() spin_lock(&session->back_lock); --> Line 201 (Lock A) spin_lock(&qedi_conn->tmf_work_lock); --> Line 230 (Lock B) qedi_process_cmd_cleanup_resp() spin_lock_bh(&qedi_conn->tmf_work_lock); --> Line 752 (Lock B) spin_lock_bh(&conn->session->back_lock); --> Line 784 (Lock A) When qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp() are concurrently executed, the deadlock can occur. This patch fixes the deadlock by not holding the tmf_work_lock in qedi_process_cmd_cleanup_resp while holding the back_lock. The tmf_work_lock is only needed while we remove the tmf_work from the work_list. Link: https://lore.kernel.org/r/20220208185448.6206-1-michael.christie@oracle.com Fixes: b40f3894e39e ("scsi: qedi: Complete TMF works before disconnect") Cc: Manish Rangankar <mrangankar@marvell.com> Cc: Nilesh Javali <njavali@marvell.com> Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loopJames Smart
commit 7f4c5a26f735dea4bbc0eb8eb9da99cda95a8563 upstream. When connected point to point, the driver does not know the FC4's supported by the other end. In Fabrics, it can query the nameserver. Thus the driver must send PRLIs for the FC4s it supports and enable support based on the acc(ept) or rej(ect) of the respective FC4 PRLI. Currently the driver supports SCSI and NVMe PRLIs. Unfortunately, although the behavior is per standard, many devices have come to expect only SCSI PRLIs. In this particular example, the NVMe PRLI is properly RJT'd but the target decided that it must LOGO after seeing the unexpected NVMe PRLI. The LOGO causes the sequence to restart and login is now in an infinite failure loop. Fix the problem by having the driver, on a pt2pt link, remember NVMe PRLI accept or reject status across logout as long as the link stays "up". When retrying login, if the prior NVMe PRLI was rejected, it will not be sent on the next login. Link: https://lore.kernel.org/r/20220212163120.15385-1-jsmart2021@gmail.com Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23scsi: ufs: Fix a deadlock in the error handlerBart Van Assche
commit 945c3cca05d78351bba29fa65d93834cb7934c7b upstream. The following deadlock has been observed on a test setup: - All tags allocated - The SCSI error handler calls ufshcd_eh_host_reset_handler() - ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() - ufshcd_err_handler() locks up as follows: Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt Call trace: __switch_to+0x298/0x5d8 __schedule+0x6cc/0xa94 schedule+0x12c/0x298 blk_mq_get_tag+0x210/0x480 __blk_mq_alloc_request+0x1c8/0x284 blk_get_request+0x74/0x134 ufshcd_exec_dev_cmd+0x68/0x640 ufshcd_verify_dev_init+0x68/0x35c ufshcd_probe_hba+0x12c/0x1cb8 ufshcd_host_reset_and_restore+0x88/0x254 ufshcd_reset_and_restore+0xd0/0x354 ufshcd_err_handler+0x408/0xc58 process_one_work+0x24c/0x66c worker_thread+0x3e8/0xa4c kthread+0x150/0x1b4 ret_from_fork+0x10/0x30 Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request. Link: https://lore.kernel.org/r/20211203231950.193369-10-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23scsi: ufs: Remove dead codeBart Van Assche
commit d77ea8226b3be23b0b45aa42851243b62a27bda1 upstream. Commit 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag conflicts") guarantees that 'tag' is not in use by any SCSI command. Remove the check that returns early if a conflict occurs. Link: https://lore.kernel.org/r/20211203231950.193369-6-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_taskJohn Garry
[ Upstream commit df7abcaa1246e2537ab4016077b5443bb3c09378 ] Currently a use-after-free may occur if a sas_task is aborted by the upper layer before we handle the I/O completion in mpi_ssp_completion() or mpi_sata_completion(). In this case, the following are the two steps in handling those I/O completions: - Call complete() to inform the upper layer handler of completion of the I/O. - Release driver resources associated with the sas_task in pm8001_ccb_task_free() call. When complete() is called, the upper layer may free the sas_task. As such, we should not touch the associated sas_task afterwards, but we do so in the pm8001_ccb_task_free() call. Fix by swapping the complete() and pm8001_ccb_task_free() calls ordering. Link: https://lore.kernel.org/r/1643289172-165636-4-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23scsi: pm8001: Fix use-after-free for aborted TMF sas_taskJohn Garry
[ Upstream commit 61f162aa4381845acbdc7f2be4dfb694d027c018 ] Currently a use-after-free may occur if a TMF sas_task is aborted before we handle the IO completion in mpi_ssp_completion(). The abort occurs due to timeout. When the timeout occurs, the SAS_TASK_STATE_ABORTED flag is set and the sas_task is freed in pm8001_exec_internal_tmf_task(). However, if the I/O completion occurs later, the I/O completion still thinks that the sas_task is available. Fix this by clearing the ccb->task if the TMF times out - the I/O completion handler does nothing if this pointer is cleared. Link: https://lore.kernel.org/r/1643289172-165636-3-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23scsi: core: Reallocate device's budget map on queue depth changeMing Lei
[ Upstream commit edb854a3680bacc9ef9b91ec0c5ff6105886f6f3 ] We currently use ->cmd_per_lun as initial queue depth for setting up the budget_map. Martin Wilck reported that it is common for the queue_depth to be subsequently updated in slave_configure() based on detected hardware characteristics. As a result, for some drivers, the static host template settings for cmd_per_lun and can_queue won't actually get used in practice. And if the default values are used to allocate the budget_map, memory may be consumed unnecessarily. Fix the issue by reallocating the budget_map after ->slave_configure() returns. At that time the device queue_depth should accurately reflect what the hardware needs. Link: https://lore.kernel.org/r/20220127153733.409132-1-ming.lei@redhat.com Cc: Bart Van Assche <bvanassche@acm.org> Reported-by: Martin Wilck <martin.wilck@suse.com> Suggested-by: Martin Wilck <martin.wilck@suse.com> Tested-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23scsi: pm80xx: Fix double completion for SATA devicesAjish Koshy
[ Upstream commit c26b85ea16365079be8d206b20556a60a0c69ad4 ] Current code handles completions for SATA devices in mpi_sata_completion() and mpi_sata_event(). However, at the time when any SATA event happens, for almost all the event types, the command is still in the target. It is therefore incorrect to complete the task in sata_event(). There are some events for which we get sata_completions, some need recovery procedure and others abort. All the tasks must be completed via sata_completion() path. Removed the task done related code from sata_events(). For tasks where we don't get completions, let top layer call abort() to abort the command post timeout. Link: https://lore.kernel.org/r/20220124082255.86223-1-Ajish.Koshy@microchip.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Co-developed-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Viswas G <Viswas.G@microchip.com> Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23scsi: lpfc: Fix mailbox command failure during driver initializationJames Smart
commit efe1dc571a5b808baa26682eef16561be2e356fd upstream. Contention for the mailbox interface may occur during driver initialization (immediately after a function reset), between mailbox commands initiated via ioctl (bsg) and those driver requested by the driver. After setting SLI_ACTIVE flag for a port, there is a window in which the driver will allow an ioctl to be initiated while the adapter is initializing and issuing mailbox commands via polling. The polling logic then gets confused. Correct by having thread setting SLI_ACTIVE spot an active mailbox command and allow it complete before proceeding. Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16scsi: lpfc: Reduce log messages seen after firmware downloadJames Smart
commit 5852ed2a6a39c862c8a3fdf646e1f4e01b91d710 upstream. Messages around firmware download were incorrectly tagged as being related to discovery trace events. Thus, firmware download status ended up dumping the trace log as well as the firmware update message. As there were a couple of log messages in this state, the trace log was dumped multiple times. Resolve this by converting from trace events to SLI events. Link: https://lore.kernel.org/r/20220207180442.72836-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabledJames Smart
commit c80b27cfd93ba9f5161383f798414609e84729f3 upstream. The driver is initiating NVMe PRLIs to determine device NVMe support. This should not be occurring if CONFIG_NVME_FC support is disabled. Correct this by changing the default value for FC4 support. Currently it defaults to FCP and NVMe. With change, when NVME_FC support is not enabled in the kernel, the default value is just FCP. Link: https://lore.kernel.org/r/20220207180516.73052-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16scsi: myrs: Fix crash in error caseTong Zhang
[ Upstream commit 4db09593af0b0b4d7d4805ebb3273df51d7cc30d ] In myrs_detect(), cs->disable_intr is NULL when privdata->hw_init() fails with non-zero. In this case, myrs_cleanup(cs) will call a NULL ptr and crash the kernel. [ 1.105606] myrs 0000:00:03.0: Unknown Initialization Error 5A [ 1.105872] myrs 0000:00:03.0: Failed to initialize Controller [ 1.106082] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 1.110774] Call Trace: [ 1.110950] myrs_cleanup+0xe4/0x150 [myrs] [ 1.111135] myrs_probe.cold+0x91/0x56a [myrs] [ 1.111302] ? DAC960_GEM_intr_handler+0x1f0/0x1f0 [myrs] [ 1.111500] local_pci_probe+0x48/0x90 Link: https://lore.kernel.org/r/20220123225717.1069538-1-ztong0001@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16scsi: ufs: Treat link loss as fatal errorKiwoong Kim
[ Upstream commit c99b9b2301492b665b6e51ba6c06ec362eddcd10 ] This event is raised when link is lost as specified in UFSHCI spec and that means communication is not possible. Thus initializing UFS interface needs to be done. Make UFS driver considers Link Lost as fatal in the INT_FATAL_ERRORS mask. This will trigger a host reset whenever a link lost interrupt occurs. Link: https://lore.kernel.org/r/1642743475-54275-1-git-send-email-kwmad.kim@samsung.com Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16scsi: ufs: Use generic error code in ufshcd_set_dev_pwr_mode()Kiwoong Kim
[ Upstream commit ad6c8a426446873febc98140d81d5353f8c0825b ] The return value of ufshcd_set_dev_pwr_mode() is passed to device PM core. However, the function currently returns a SCSI result which the PM core doesn't understand. This might lead to unexpected behaviors in userland; a platform reset was observed in Android. Use a generic error code for SSU failures. Link: https://lore.kernel.org/r/1642743182-54098-1-git-send-email-kwmad.kim@samsung.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16scsi: pm8001: Fix bogus FW crash for maxcpus=1John Garry
[ Upstream commit 62afb379a0fee7e9c2f9f68e1abeb85ceddf51b9 ] According to the comment in check_fw_ready() we should not check the IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers. However we check this very field in process_oq() for processing the highest index interrupt vector. The highest interrupt vector is checked as the FW is programmed to signal fatal errors through this irq. Change that function to not check IOP1_READY for those mentioned controllers, but do check ILA_READY in both cases. The reason I assume that this was not hit earlier was because we always allocated 64 MSI(X), and just did not pass the vector index check in process_oq(), i.e. the handler never ran for vector index 63. Link: https://lore.kernel.org/r/1642508105-95432-1-git-send-email-john.garry@huawei.com Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16scsi: qedf: Change context reset messages to ratelimitedSaurav Kashyap
[ Upstream commit 64fd4af6274eb0f49d29772c228fffcf6bde1635 ] If FCoE is not configured, libfc/libfcoe keeps on retrying FLOGI and after 3 retries driver does a context reset and tries fipvlan again. This leads to context reset message flooding the logs. Hence ratelimit the message to prevent flooding the logs. Link: https://lore.kernel.org/r/20220117135311.6256-4-njavali@marvell.com Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16scsi: qedf: Fix refcount issue when LOGO is received during TMFSaurav Kashyap
[ Upstream commit 5239ab63f17cee643bd4bf6addfedebaa7d4f41e ] Hung task call trace was seen during LOGO processing. [ 974.309060] [0000:00:00.0]:[qedf_eh_device_reset:868]: 1:0:2:0: LUN RESET Issued... [ 974.309065] [0000:00:00.0]:[qedf_initiate_tmf:2422]: tm_flags 0x10 sc_cmd 00000000c16b930f op = 0x2a target_id = 0x2 lun=0 [ 974.309178] [0000:00:00.0]:[qedf_initiate_tmf:2431]: portid=016900 tm_flags =LUN RESET [ 974.309222] [0000:00:00.0]:[qedf_initiate_tmf:2438]: orig io_req = 00000000ec78df8f xid = 0x180 ref_cnt = 1. [ 974.309625] host1: rport 016900: Received LOGO request while in state Ready [ 974.309627] host1: rport 016900: Delete port [ 974.309642] host1: rport 016900: work event 3 [ 974.309644] host1: rport 016900: lld callback ev 3 [ 974.313243] [0000:61:00.2]:[qedf_execute_tmf:2383]:1: fcport is uploading, not executing flush. [ 974.313295] [0000:61:00.2]:[qedf_execute_tmf:2400]:1: task mgmt command success... [ 984.031088] INFO: task jbd2/dm-15-8:7645 blocked for more than 120 seconds. [ 984.031136] Not tainted 4.18.0-305.el8.x86_64 #1 [ 984.031166] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 984.031209] jbd2/dm-15-8 D 0 7645 2 0x80004080 [ 984.031212] Call Trace: [ 984.031222] __schedule+0x2c4/0x700 [ 984.031230] ? unfreeze_partials.isra.83+0x16e/0x1a0 [ 984.031233] ? bit_wait_timeout+0x90/0x90 [ 984.031235] schedule+0x38/0xa0 [ 984.031238] io_schedule+0x12/0x40 [ 984.031240] bit_wait_io+0xd/0x50 [ 984.031243] __wait_on_bit+0x6c/0x80 [ 984.031248] ? free_buffer_head+0x21/0x50 [ 984.031251] out_of_line_wait_on_bit+0x91/0xb0 [ 984.031257] ? init_wait_var_entry+0x50/0x50 [ 984.031268] jbd2_journal_commit_transaction+0x112e/0x19f0 [jbd2] [ 984.031280] kjournald2+0xbd/0x270 [jbd2] [ 984.031284] ? finish_wait+0x80/0x80 [ 984.031291] ? commit_timeout+0x10/0x10 [jbd2] [ 984.031294] kthread+0x116/0x130 [ 984.031300] ? kthread_flush_work_fn+0x10/0x10 [ 984.031305] ret_from_fork+0x1f/0x40 There was a ref count issue when LOGO is received during TMF. This leads to one of the I/Os hanging with the driver. Fix the ref count. Link: https://lore.kernel.org/r/20220117135311.6256-3-njavali@marvell.com Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16scsi: qedf: Add stag_work to all the vportsSaurav Kashyap
[ Upstream commit b70a99fd13282d7885f69bf1372e28b7506a1613 ] Call trace seen when creating NPIV ports, only 32 out of 64 show online. stag work was not initialized for vport, hence initialize the stag work. WARNING: CPU: 8 PID: 645 at kernel/workqueue.c:1635 __queue_delayed_work+0x68/0x80 CPU: 8 PID: 645 Comm: kworker/8:1 Kdump: loaded Tainted: G IOE --------- -- 4.18.0-348.el8.x86_64 #1 Hardware name: Dell Inc. PowerEdge MX740c/0177V9, BIOS 2.12.2 07/09/2021 Workqueue: events fc_lport_timeout [libfc] RIP: 0010:__queue_delayed_work+0x68/0x80 Code: 89 b2 88 00 00 00 44 89 82 90 00 00 00 48 01 c8 48 89 42 50 41 81 f8 00 20 00 00 75 1d e9 60 24 07 00 44 89 c7 e9 98 f6 ff ff <0f> 0b eb c5 0f 0b eb a1 0f 0b eb a7 0f 0b eb ac 44 89 c6 e9 40 23 RSP: 0018:ffffae514bc3be40 EFLAGS: 00010006 RAX: ffff8d25d6143750 RBX: 0000000000000202 RCX: 0000000000000002 RDX: ffff8d2e31383748 RSI: ffff8d25c000d600 RDI: ffff8d2e31383788 RBP: ffff8d2e31380de0 R08: 0000000000002000 R09: ffff8d2e31383750 R10: ffffffffc0c957e0 R11: ffff8d2624800000 R12: ffff8d2e31380a58 R13: ffff8d2d915eb000 R14: ffff8d25c499b5c0 R15: ffff8d2e31380e18 FS: 0000000000000000(0000) GS:ffff8d2d1fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055fd0484b8b8 CR3: 00000008ffc10006 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: queue_delayed_work_on+0x36/0x40 qedf_elsct_send+0x57/0x60 [qedf] fc_lport_enter_flogi+0x90/0xc0 [libfc] fc_lport_timeout+0xb7/0x140 [libfc] process_one_work+0x1a7/0x360 ? create_worker+0x1a0/0x1a0 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x116/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x35/0x40 ---[ end trace 008f00f722f2c2ff ]-- Initialize stag work for all the vports. Link: https://lore.kernel.org/r/20220117135311.6256-2-njavali@marvell.com Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16scsi: ufs: ufshcd-pltfrm: Check the return value of devm_kstrdup()Xiaoke Wang
[ Upstream commit a65b32748f4566f986ba2495a8236c141fa42a26 ] devm_kstrdup() returns pointer to allocated string on success, NULL on failure. So it is better to check the return value of it. Link: https://lore.kernel.org/r/tencent_4257E15D4A94FF9020DDCC4BB9B21C041408@qq.com Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-08scsi: bnx2fc: Make bnx2fc_recv_frame() mp safeJohn Meneghini
commit 936bd03405fc83ba039d42bc93ffd4b88418f1d3 upstream. Running tests with a debug kernel shows that bnx2fc_recv_frame() is modifying the per_cpu lport stats counters in a non-mpsafe way. Just boot a debug kernel and run the bnx2fc driver with the hardware enabled. [ 1391.699147] BUG: using smp_processor_id() in preemptible [00000000] code: bnx2fc_ [ 1391.699160] caller is bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc] [ 1391.699174] CPU: 2 PID: 4355 Comm: bnx2fc_l2_threa Kdump: loaded Tainted: G B [ 1391.699180] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 [ 1391.699183] Call Trace: [ 1391.699188] dump_stack_lvl+0x57/0x7d [ 1391.699198] check_preemption_disabled+0xc8/0xd0 [ 1391.699205] bnx2fc_recv_frame+0xbf9/0x1760 [bnx2fc] [ 1391.699215] ? do_raw_spin_trylock+0xb5/0x180 [ 1391.699221] ? bnx2fc_npiv_create_vports.isra.0+0x4e0/0x4e0 [bnx2fc] [ 1391.699229] ? bnx2fc_l2_rcv_thread+0xb7/0x3a0 [bnx2fc] [ 1391.699240] bnx2fc_l2_rcv_thread+0x1af/0x3a0 [bnx2fc] [ 1391.699250] ? bnx2fc_ulp_init+0xc0/0xc0 [bnx2fc] [ 1391.699258] kthread+0x364/0x420 [ 1391.699263] ? _raw_spin_unlock_irq+0x24/0x50 [ 1391.699268] ? set_kthread_struct+0x100/0x100 [ 1391.699273] ret_from_fork+0x22/0x30 Restore the old get_cpu/put_cpu code with some modifications to reduce the size of the critical section. Link: https://lore.kernel.org/r/20220124145110.442335-1-jmeneghi@redhat.com Fixes: d576a5e80cd0 ("bnx2fc: Improve stats update mechanism") Tested-by: Guangwu Zhang <guazhang@redhat.com> Acked-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()John Meneghini
commit 847f9ea4c5186fdb7b84297e3eeed9e340e83fce upstream. The bnx2fc_destroy() functions are removing the interface before calling destroy_work. This results multiple WARNings from sysfs_remove_group() as the controller rport device attributes are removed too early. Replace the fcoe_port's destroy_work queue. It's not needed. The problem is easily reproducible with the following steps. Example: $ dmesg -w & $ systemctl enable --now fcoe $ fipvlan -s -c ens2f1 $ fcoeadm -d ens2f1.802 [ 583.464488] host2: libfc: Link down on port (7500a1) [ 583.472651] bnx2fc: 7500a1 - rport not created Yet!! [ 583.490468] ------------[ cut here ]------------ [ 583.538725] sysfs group 'power' not found for kobject 'rport-2:0-0' [ 583.568814] WARNING: CPU: 3 PID: 192 at fs/sysfs/group.c:279 sysfs_remove_group+0x6f/0x80 [ 583.607130] Modules linked in: dm_service_time 8021q garp mrp stp llc bnx2fc cnic uio rpcsec_gss_krb5 auth_rpcgss nfsv4 ... [ 583.942994] CPU: 3 PID: 192 Comm: kworker/3:2 Kdump: loaded Not tainted 5.14.0-39.el9.x86_64 #1 [ 583.984105] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 [ 584.016535] Workqueue: fc_wq_2 fc_rport_final_delete [scsi_transport_fc] [ 584.050691] RIP: 0010:sysfs_remove_group+0x6f/0x80 [ 584.074725] Code: ff 5b 48 89 ef 5d 41 5c e9 ee c0 ff ff 48 89 ef e8 f6 b8 ff ff eb d1 49 8b 14 24 48 8b 33 48 c7 c7 ... [ 584.162586] RSP: 0018:ffffb567c15afdc0 EFLAGS: 00010282 [ 584.188225] RAX: 0000000000000000 RBX: ffffffff8eec4220 RCX: 0000000000000000 [ 584.221053] RDX: ffff8c1586ce84c0 RSI: ffff8c1586cd7cc0 RDI: ffff8c1586cd7cc0 [ 584.255089] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffb567c15afc00 [ 584.287954] R10: ffffb567c15afbf8 R11: ffffffff8fbe7f28 R12: ffff8c1486326400 [ 584.322356] R13: ffff8c1486326480 R14: ffff8c1483a4a000 R15: 0000000000000004 [ 584.355379] FS: 0000000000000000(0000) GS:ffff8c1586cc0000(0000) knlGS:0000000000000000 [ 584.394419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 584.421123] CR2: 00007fe95a6f7840 CR3: 0000000107674002 CR4: 00000000000606e0 [ 584.454888] Call Trace: [ 584.466108] device_del+0xb2/0x3e0 [ 584.481701] device_unregister+0x13/0x60 [ 584.501306] bsg_unregister_queue+0x5b/0x80 [ 584.522029] bsg_remove_queue+0x1c/0x40 [ 584.541884] fc_rport_final_delete+0xf3/0x1d0 [scsi_transport_fc] [ 584.573823] process_one_work+0x1e3/0x3b0 [ 584.592396] worker_thread+0x50/0x3b0 [ 584.609256] ? rescuer_thread+0x370/0x370 [ 584.628877] kthread+0x149/0x170 [ 584.643673] ? set_kthread_struct+0x40/0x40 [ 584.662909] ret_from_fork+0x22/0x30 [ 584.680002] ---[ end trace 53575ecefa942ece ]--- Link: https://lore.kernel.org/r/20220115040044.1013475-1-jmeneghi@redhat.com Fixes: 0cbf32e1681d ("[SCSI] bnx2fc: Avoid calling bnx2fc_if_destroy with unnecessary locks") Tested-by: Guangwu Zhang <guazhang@redhat.com> Co-developed-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01scsi: elx: efct: Don't use GFP_KERNEL under spin lockYang Yingliang
commit 61263b3a11a2594b4e898f166c31162236182b5c upstream. GFP_KERNEL/GFP_DMA can't be used under a spin lock. According the comment, els_ios_lock is used to protect els ios list so we can move down the spin lock to avoid using this flag under the lock. Link: https://lore.kernel.org/r/20220111012441.3232527-1-yangyingliang@huawei.com Fixes: 8f406ef72859 ("scsi: elx: libefc: Extended link Service I/O handling") Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27scsi: ufs: ufs-mediatek: Fix error checking in ufs_mtk_init_va09_pwr_ctrl()Miaoqian Lin
commit 3ba880a12df5aa4488c18281701b5b1bc3d4531a upstream. The function regulator_get() returns an error pointer. Use IS_ERR() to validate the return value. Link: https://lore.kernel.org/r/20211222070930.9449-1-linmq006@gmail.com Fixes: cf137b3ea49a ("scsi: ufs-mediatek: Support VA09 regulator operations") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27scsi: core: Show SCMD_LAST in text formBart Van Assche
commit 3369046e54ca8f82e0cb17740643da2d80d3cfa8 upstream. The SCSI debugfs code supports showing information about pending commands, including translating SCSI command flags from numeric into text format. Also convert the SCMD_LAST flag from numeric into text form. Link: https://lore.kernel.org/r/20211129194609.3466071-4-bvanassche@acm.org Fixes: 8930a6c20791 ("scsi: core: add support for request batching") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27scsi: lpfc: Fix lpfc_force_rscn ndlp kref imbalanceJames Smart
commit 7576d48c64f36f6fea9df2882f710a474fa35f40 upstream. Issuing lpfc_force_rscn twice results in an ndlp kref use-after-free call trace. A prior patch reworked the get/put handling by ensuring nlp_get was done before WQE submission and a put was done in the completion path. Unfortunately, the issue_els_rscn path had a piece of legacy code that did a nlp_put, causing an imbalance on the ref counts. Fixed by removing the unnecessary legacy code snippet. Link: https://lore.kernel.org/r/20211204002644.116455-4-jsmart2021@gmail.com Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking") Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27scsi: mpi3mr: Fixes around reply request queuesSreekanth Reddy
[ Upstream commit 243bcc8efdb1f44b1a1d415e6821a246714c68ce ] Set reply queue depth of 1K for B0 and 4K for A0. While freeing the segmented request queues use the actual queue depth that is used while creating them. Link: https://lore.kernel.org/r/20211220141159.16117-25-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: sr: Don't use GFP_DMAChristoph Hellwig
[ Upstream commit d94d94969a4ba07a43d62429c60372320519c391 ] The allocated buffers are used as a command payload, for which the block layer and/or DMA API do the proper bounce buffering if needed. Link: https://lore.kernel.org/r/20211222090842.920724-1-hch@lst.de Reported-by: Baoquan He <bhe@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanupJames Smart
[ Upstream commit 7dd2e2a923173d637c272e483966be8e96a72b64 ] Extraneous teardown routines are present in the firmware dump path causing altered states in firmware captures. When a firmware dump is requested via sysfs, trigger the dump immediately without tearing down structures and changing adapter state. The driver shall rely on pre-existing firmware error state clean up handlers to restore the adapter. Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIVJames Smart
[ Upstream commit f0d3919697492950f57a26a1093aee53880d669d ] During rmmod testing, messages appeared indicating lpfc_mbuf_pool entries were still busy. This situation was only seen doing rmmod after at least 1 vport (NPIV) instance was created and destroyed. The number of messages scaled with the number of vports created. When a vport is created, it can receive a PLOGI from another initiator Nport. When this happens, the driver prepares to ack the PLOGI and prepares an RPI for registration (via mbx cmd) which includes an mbuf allocation. During the unsolicited PLOGI processing and after the RPI preparation, the driver recognizes it is one of the vport instances and decides to reject the PLOGI. During the LS_RJT preparation for the PLOGI, the mailbox struct allocated for RPI registration is freed, but the mbuf that was also allocated is not released. Fix by freeing the mbuf with the mailbox struct in the LS_RJT path. As part of the code review to figure the issue out a couple of other areas where found that also would not have released the mbuf. Those are cleaned up as well. Link: https://lore.kernel.org/r/20211204002644.116455-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: ufs: Fix a kernel crash during shutdownBart Van Assche
[ Upstream commit 3489c34bd02b73a72646037d673a122a53cee174 ] Fix the following kernel crash: Unable to handle kernel paging request at virtual address ffffffc91e735000 Call trace: __queue_work+0x26c/0x624 queue_work_on+0x6c/0xf0 ufshcd_hold+0x12c/0x210 __ufshcd_wl_suspend+0xc0/0x400 ufshcd_wl_shutdown+0xb8/0xcc device_shutdown+0x184/0x224 kernel_restart+0x4c/0x124 __arm64_sys_reboot+0x194/0x264 el0_svc_common+0xc8/0x1d4 do_el0_svc+0x30/0x8c el0_svc+0x20/0x30 el0_sync_handler+0x84/0xe4 el0_sync+0x1bc/0x1c0 Fix this crash by ungating the clock before destroying the work queue on which clock gating work is queued. Link: https://lore.kernel.org/r/20211203231950.193369-15-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: block: pm: Always set request queue runtime active in ↵Alan Stern
blk_post_runtime_resume() [ Upstream commit 6e1fcab00a23f7fe9f4fe9704905a790efa1eeab ] John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device. The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock. The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure. This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock. Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove") Reported-and-tested-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: ufs: Fix race conditions related to driver dataBart Van Assche
[ Upstream commit 21ad0e49085deb22c094f91f9da57319a97188e4 ] The driver data pointer must be set before any callbacks are registered that use that pointer. Hence move the initialization of that pointer from after the ufshcd_init() call to inside ufshcd_init(). Link: https://lore.kernel.org/r/20211203231950.193369-7-bvanassche@acm.org Fixes: 3b1d05807a9a ("[SCSI] ufs: Segregate PCI Specific Code") Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: core: Fix scsi_device_max_queue_depth()Bart Van Assche
[ Upstream commit 4bc3bffc1a885eb5cb259e4a25146a4c7b1034e3 ] The comment above scsi_device_max_queue_depth() and also the description of commit ca4453213951 ("scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024)") contradict the implementation of the function scsi_device_max_queue_depth(). Additionally, the maximum queue depth of a SCSI LUN never exceeds host->can_queue. Fix scsi_device_max_queue_depth() by changing max_t() into min_t(). Link: https://lore.kernel.org/r/20211203231950.193369-2-bvanassche@acm.org Fixes: ca4453213951 ("scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024)") Cc: Hannes Reinecke <hare@suse.de> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27scsi: pm80xx: Update WARN_ON check in pm8001_mpi_build_cmd()Igor Pylypiv
[ Upstream commit 606c54ae975ad3af540b505b46b55a687501711f ] Starting from commit 05c6c029a44d ("scsi: pm80xx: Increase number of supported queues") driver initializes only max_q_num queues. Do not use an invalid queue if the WARN_ON condition is true. Link: https://lore.kernel.org/r/20211101232825.2350233-4-ipylypiv@google.com Fixes: 7640e1eb8c5d ("scsi: pm80xx: Make mpi_build_cmd locking consistent") Reviewed-by: Vishakha Channapattan <vishakhavc@google.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-11scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()Lixiaokeng
[ Upstream commit 1b8d0300a3e9f216ae4901bab886db7299899ec6 ] |- iscsi_if_destroy_conn |-dev_attr_show |-iscsi_conn_teardown |-spin_lock_bh |-iscsi_sw_tcp_conn_get_param |-kfree(conn->persistent_address) |-iscsi_conn_get_param |-kfree(conn->local_ipaddr) ==>|-read persistent_address ==>|-read local_ipaddr |-spin_unlock_bh When iscsi_conn_teardown() and iscsi_conn_get_param() happen in parallel, a UAF may be triggered. Link: https://lore.kernel.org/r/046ec8a0-ce95-d3fc-3235-666a7c65b224@huawei.com Reported-by: Lu Tixiong <lutianxiong@huawei.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Lixiaokeng <lixiaokeng@huawei.com> Signed-off-by: Linfeilong <linfeilong@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-05scsi: vmw_pvscsi: Set residual data length conditionallyAlexey Makhalov
commit 142c779d05d1fef75134c3cb63f52ccbc96d9e1f upstream. The PVSCSI implementation in the VMware hypervisor under specific configuration ("SCSI Bus Sharing" set to "Physical") returns zero dataLen in the completion descriptor for READ CAPACITY(16). As a result, the kernel can not detect proper disk geometry. This can be recognized by the kernel message: [ 0.776588] sd 1:0:0:0: [sdb] Sector size 0 reported, assuming 512. The PVSCSI implementation in QEMU does not set dataLen at all, keeping it zeroed. This leads to a boot hang as was reported by Shmulik Ladkani. It is likely that the controller returns the garbage at the end of the buffer. Residual length should be set by the driver in that case. The SCSI layer will erase corresponding data. See commit bdb2b8cab439 ("[SCSI] erase invalid data returned by device") for details. Commit e662502b3a78 ("scsi: vmw_pvscsi: Set correct residual data length") introduced the issue by setting residual length unconditionally, causing the SCSI layer to erase the useful payload beyond dataLen when this value is returned as 0. As a result, considering existing issues in implementations of PVSCSI controllers, we do not want to call scsi_set_resid() when dataLen == 0. Calling scsi_set_resid() has no effect if dataLen equals buffer length. Link: https://lore.kernel.org/lkml/20210824120028.30d9c071@blondie/ Link: https://lore.kernel.org/r/20211220190514.55935-1-amakhalov@vmware.com Fixes: e662502b3a78 ("scsi: vmw_pvscsi: Set correct residual data length") Cc: Matt Wang <wwentao@vmware.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Vishal Bhakta <vbhakta@vmware.com> Cc: VMware PV-Drivers <pv-drivers@vmware.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: linux-scsi@vger.kernel.org Cc: stable@vger.kernel.org Reported-and-suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()Dan Carpenter
[ Upstream commit 9020be114a47bf7ff33e179b3bb0016b91a098e6 ] The "mybuf" string comes from the user, so we need to ensure that it is NUL terminated. Link: https://lore.kernel.org/r/20211214070527.GA27934@kili Fixes: bd2cdd5e400f ("scsi: lpfc: NVME Initiator: Add debugfs support") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()George Kennedy
commit e0a2c28da11e2c2b963fc01d50acbf03045ac732 upstream. In resp_mode_select() sanity check the block descriptor len to avoid UAF. BUG: KASAN: use-after-free in resp_mode_select+0xa4c/0xb40 drivers/scsi/scsi_debug.c:2509 Read of size 1 at addr ffff888026670f50 by task scsicmd/15032 CPU: 1 PID: 15032 Comm: scsicmd Not tainted 5.15.0-01d0625 #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Call Trace: <TASK> dump_stack_lvl+0x89/0xb5 lib/dump_stack.c:107 print_address_description.constprop.9+0x28/0x160 mm/kasan/report.c:257 kasan_report.cold.14+0x7d/0x117 mm/kasan/report.c:443 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report_generic.c:306 resp_mode_select+0xa4c/0xb40 drivers/scsi/scsi_debug.c:2509 schedule_resp+0x4af/0x1a10 drivers/scsi/scsi_debug.c:5483 scsi_debug_queuecommand+0x8c9/0x1e70 drivers/scsi/scsi_debug.c:7537 scsi_queue_rq+0x16b4/0x2d10 drivers/scsi/scsi_lib.c:1521 blk_mq_dispatch_rq_list+0xb9b/0x2700 block/blk-mq.c:1640 __blk_mq_sched_dispatch_requests+0x28f/0x590 block/blk-mq-sched.c:325 blk_mq_sched_dispatch_requests+0x105/0x190 block/blk-mq-sched.c:358 __blk_mq_run_hw_queue+0xe5/0x150 block/blk-mq.c:1762 __blk_mq_delay_run_hw_queue+0x4f8/0x5c0 block/blk-mq.c:1839 blk_mq_run_hw_queue+0x18d/0x350 block/blk-mq.c:1891 blk_mq_sched_insert_request+0x3db/0x4e0 block/blk-mq-sched.c:474 blk_execute_rq_nowait+0x16b/0x1c0 block/blk-exec.c:63 sg_common_write.isra.18+0xeb3/0x2000 drivers/scsi/sg.c:837 sg_new_write.isra.19+0x570/0x8c0 drivers/scsi/sg.c:775 sg_ioctl_common+0x14d6/0x2710 drivers/scsi/sg.c:941 sg_ioctl+0xa2/0x180 drivers/scsi/sg.c:1166 __x64_sys_ioctl+0x19d/0x220 fs/ioctl.c:52 do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:50 entry_SYSCALL_64_after_hwframe+0x44/0xae arch/x86/entry/entry_64.S:113 Link: https://lore.kernel.org/r/1637262208-28850-1-git-send-email-george.kennedy@oracle.com Reported-by: syzkaller <syzkaller@googlegroups.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22scsi: scsi_debug: Fix type in min_t to avoid stack OOBGeorge Kennedy
commit 36e07d7ede88a1f1ef8f0f209af5b7612324ac2c upstream. Change min_t() to use type "u32" instead of type "int" to avoid stack out of bounds. With min_t() type "int" the values get sign extended and the larger value gets used causing stack out of bounds. BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:191 [inline] BUG: KASAN: stack-out-of-bounds in sg_copy_buffer+0x1de/0x240 lib/scatterlist.c:976 Read of size 127 at addr ffff888072607128 by task syz-executor.7/18707 CPU: 1 PID: 18707 Comm: syz-executor.7 Not tainted 5.15.0-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x89/0xb5 lib/dump_stack.c:106 print_address_description.constprop.9+0x28/0x160 mm/kasan/report.c:256 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold.14+0x7d/0x117 mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x1a3/0x210 mm/kasan/generic.c:189 memcpy+0x23/0x60 mm/kasan/shadow.c:65 memcpy include/linux/fortify-string.h:191 [inline] sg_copy_buffer+0x1de/0x240 lib/scatterlist.c:976 sg_copy_from_buffer+0x33/0x40 lib/scatterlist.c:1000 fill_from_dev_buffer.part.34+0x82/0x130 drivers/scsi/scsi_debug.c:1162 fill_from_dev_buffer drivers/scsi/scsi_debug.c:1888 [inline] resp_readcap16+0x365/0x3b0 drivers/scsi/scsi_debug.c:1887 schedule_resp+0x4d8/0x1a70 drivers/scsi/scsi_debug.c:5478 scsi_debug_queuecommand+0x8c9/0x1ec0 drivers/scsi/scsi_debug.c:7533 scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1520 [inline] scsi_queue_rq+0x16b0/0x2d40 drivers/scsi/scsi_lib.c:1699 blk_mq_dispatch_rq_list+0xb9b/0x2700 block/blk-mq.c:1639 __blk_mq_sched_dispatch_requests+0x28f/0x590 block/blk-mq-sched.c:325 blk_mq_sched_dispatch_requests+0x105/0x190 block/blk-mq-sched.c:358 __blk_mq_run_hw_queue+0xe5/0x150 block/blk-mq.c:1761 __blk_mq_delay_run_hw_queue+0x4f8/0x5c0 block/blk-mq.c:1838 blk_mq_run_hw_queue+0x18d/0x350 block/blk-mq.c:1891 blk_mq_sched_insert_request+0x3db/0x4e0 block/blk-mq-sched.c:474 blk_execute_rq_nowait+0x16b/0x1c0 block/blk-exec.c:62 sg_common_write.isra.18+0xeb3/0x2000 drivers/scsi/sg.c:836 sg_new_write.isra.19+0x570/0x8c0 drivers/scsi/sg.c:774 sg_ioctl_common+0x14d6/0x2710 drivers/scsi/sg.c:939 sg_ioctl+0xa2/0x180 drivers/scsi/sg.c:1165 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x19d/0x220 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lore.kernel.org/r/1636484247-21254-1-git-send-email-george.kennedy@oracle.com Reported-by: syzkaller <syzkaller@googlegroups.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22scsi: scsi_debug: Don't call kcalloc() if size arg is zeroGeorge Kennedy
commit 3344b58b53a76199dae48faa396e9fc37bf86992 upstream. If the size arg to kcalloc() is zero, it returns ZERO_SIZE_PTR. Because of that, for a following NULL pointer check to work on the returned pointer, kcalloc() must not be called with the size arg equal to zero. Return early without error before the kcalloc() call if size arg is zero. BUG: KASAN: null-ptr-deref in memcpy include/linux/fortify-string.h:191 [inline] BUG: KASAN: null-ptr-deref in sg_copy_buffer+0x138/0x240 lib/scatterlist.c:974 Write of size 4 at addr 0000000000000010 by task syz-executor.1/22789 CPU: 1 PID: 22789 Comm: syz-executor.1 Not tainted 5.15.0-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x89/0xb5 lib/dump_stack.c:106 __kasan_report mm/kasan/report.c:446 [inline] kasan_report.cold.14+0x112/0x117 mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x1a3/0x210 mm/kasan/generic.c:189 memcpy+0x3b/0x60 mm/kasan/shadow.c:66 memcpy include/linux/fortify-string.h:191 [inline] sg_copy_buffer+0x138/0x240 lib/scatterlist.c:974 do_dout_fetch drivers/scsi/scsi_debug.c:2954 [inline] do_dout_fetch drivers/scsi/scsi_debug.c:2946 [inline] resp_verify+0x49e/0x930 drivers/scsi/scsi_debug.c:4276 schedule_resp+0x4d8/0x1a70 drivers/scsi/scsi_debug.c:5478 scsi_debug_queuecommand+0x8c9/0x1ec0 drivers/scsi/scsi_debug.c:7533 scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1520 [inline] scsi_queue_rq+0x16b0/0x2d40 drivers/scsi/scsi_lib.c:1699 blk_mq_dispatch_rq_list+0xb9b/0x2700 block/blk-mq.c:1639 __blk_mq_sched_dispatch_requests+0x28f/0x590 block/blk-mq-sched.c:325 blk_mq_sched_dispatch_requests+0x105/0x190 block/blk-mq-sched.c:358 __blk_mq_run_hw_queue+0xe5/0x150 block/blk-mq.c:1761 __blk_mq_delay_run_hw_queue+0x4f8/0x5c0 block/blk-mq.c:1838 blk_mq_run_hw_queue+0x18d/0x350 block/blk-mq.c:1891 blk_mq_sched_insert_request+0x3db/0x4e0 block/blk-mq-sched.c:474 blk_execute_rq_nowait+0x16b/0x1c0 block/blk-exec.c:62 blk_execute_rq+0xdb/0x360 block/blk-exec.c:102 sg_scsi_ioctl drivers/scsi/scsi_ioctl.c:621 [inline] scsi_ioctl+0x8bb/0x15c0 drivers/scsi/scsi_ioctl.c:930 sg_ioctl_common+0x172d/0x2710 drivers/scsi/sg.c:1112 sg_ioctl+0xa2/0x180 drivers/scsi/sg.c:1165 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x19d/0x220 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lore.kernel.org/r/1636056397-13151-1-git-send-email-george.kennedy@oracle.com Reported-by: syzkaller <syzkaller@googlegroups.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22scsi: ufs: core: Retry START_STOP on UNIT_ATTENTIONJaegeuk Kim
commit af21c3fd5b3ec540a97b367a70b26616ff7e0c55 upstream. Commit 57d104c153d3 ("ufs: add UFS power management support") made the UFS driver submit a REQUEST SENSE command before submitting a power management command to a WLUN to clear the POWER ON unit attention. Instead of submitting a REQUEST SENSE command before submitting a power management command, retry the power management command until it succeeds. This is the preparation to get rid of all UNIT ATTENTION code which should be handled by users. Link: https://lore.kernel.org/r/20211001182015.1347587-2-jaegeuk@kernel.org Cc: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14scsi: scsi_debug: Fix buffer size of REPORT ZONES commandShin'ichiro Kawasaki
commit 7db0e0c8190a086ef92ce5bb960836cde49540aa upstream. According to ZBC and SPC specifications, the unit of ALLOCATION LENGTH field of REPORT ZONES command is byte. However, current scsi_debug implementation handles it as number of zones to calculate buffer size to report zones. When the ALLOCATION LENGTH has a large number, this results in too large buffer size and causes memory allocation failure. Fix the failure by handling ALLOCATION LENGTH as byte unit. Link: https://lore.kernel.org/r/20211207010638.124280-1-shinichiro.kawasaki@wdc.com Fixes: f0d1cf9378bd ("scsi: scsi_debug: Add ZBC zone commands") Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14scsi: pm80xx: Do not call scsi_remove_host() in pm8001_alloc()Igor Pylypiv
commit 653926205741add87a6cf452e21950eebc6ac10b upstream. Calling scsi_remove_host() before scsi_add_host() results in a crash: BUG: kernel NULL pointer dereference, address: 0000000000000108 RIP: 0010:device_del+0x63/0x440 Call Trace: device_unregister+0x17/0x60 scsi_remove_host+0xee/0x2a0 pm8001_pci_probe+0x6ef/0x1b90 [pm80xx] local_pci_probe+0x3f/0x90 We cannot call scsi_remove_host() in pm8001_alloc() because scsi_add_host() has not been called yet at that point in time. Function call tree: pm8001_pci_probe() | `- pm8001_pci_alloc() | | | `- pm8001_alloc() | | | `- scsi_remove_host() | `- scsi_add_host() Link: https://lore.kernel.org/r/20211201041627.1592487-1-ipylypiv@google.com Fixes: 05c6c029a44d ("scsi: pm80xx: Increase number of supported queues") Reviewed-by: Vishakha Channapattan <vishakhavc@google.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14scsi: qla2xxx: Format log strings only if neededRoman Bolshakov
commit 69002c8ce914ef0ae22a6ea14b43bb30b9a9a6a8 upstream. Commit 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug logs") introduced unconditional log string formatting to ql_dbg() even if ql_dbg_log event is disabled. It harms performance because some strings are formatted in fastpath and/or interrupt context. Link: https://lore.kernel.org/r/20211112145446.51210-1-r.bolshakov@yadro.com Fixes: 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug logs") Cc: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08scsi: ufs: ufs-pci: Add support for Intel ADLAdrian Hunter
commit 7dc9fb47bc9a95f1cc6c5655341860c5e50f91d4 upstream. Add PCI ID and callbacks to support Intel Alder Lake. Link: https://lore.kernel.org/r/20211124204218.1784559-1-adrian.hunter@intel.com Cc: stable@vger.kernel.org # v5.15+ Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGOJames Smart
commit 0956ba63bd94355bf38cd40f7eb9104577739ab8 upstream. A commit introduced formal regstration of all Fabric nodes to the SCSI transport as well as REG/UNREG RPI mailbox requests. The commit introduced the NLP_RELEASE_RPI flag for rports set in the lpfc_cmpl_els_logo_acc() routine to help clean up the RPIs. This new code caused the driver to release the RPI value used for the remote port and marked the RPI invalid. When the driver later attempted to re-login, it would use the invalid RPI and the adapter rejected the PLOGI request. As no login occurred, the devloss timer on the rport expired and connectivity was lost. This patch corrects the code by removing the snippet that requests the rpi to be unregistered. This change only occurs on a node that is already marked to be rediscovered. This puts the code back to its original behavior, preserving the already-assigned rpi value (registered or not) which can be used on the re-login attempts. Link: https://lore.kernel.org/r/20211123165646.62740-1-jsmart2021@gmail.com Fixes: fe83e3b9b422 ("scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller") Cc: <stable@vger.kernel.org> # v5.14+ Co-developed-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08scsi: iscsi: Unblock session then wake up error handlerMike Christie
[ Upstream commit a0c2f8b6709a9a4af175497ca65f93804f57b248 ] We can race where iscsi_session_recovery_timedout() has woken up the error handler thread and it's now setting the devices to offline, and session_recovery_timedout()'s call to scsi_target_unblock() is also trying to set the device's state to transport-offline. We can then get a mix of states. For the case where we can't relogin we want the devices to be in transport-offline so when we have repaired the connection __iscsi_unblock_session() can set the state back to running. Set the device state then call into libiscsi to wake up the error handler. Link: https://lore.kernel.org/r/20211105221048.6541-2-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>