aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/s390
AgeCommit message (Collapse)Author
2022-02-01scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devicesSteffen Maier
commit 8c9db6679be4348b8aae108e11d4be2f83976e30 upstream. Suppose we have an environment with a number of non-NPIV FCP devices (virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical FCP channel (HBA port) and its I_T nexus. Plus a number of storage target ports zoned to such shared channel. Now one target port logs out of the fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port recovery depending on the ADISC result. This happens on all such FCP devices (in different Linux images) concurrently as they all receive a copy of this RSCN. In the following we look at one of those FCP devices. Requests other than FSF_QTCB_FCP_CMND can be slow until they get a response. Depending on which requests are affected by slow responses, there are different recovery outcomes. Here we want to fix failed recoveries on port or adapter level by avoiding recovery requests that can be slow. We need the cached N_Port_ID for the remote port "link" test with ADISC. Just before sending the ADISC, we now intentionally forget the old cached N_Port_ID. The idea is that on receiving an RSCN for a port, we have to assume that any cached information about this port is stale. This forces a fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for the same port. Since we typically can still communicate with the nameserver efficiently, we now reach steady state quicker: Either the nameserver still does not know about the port so we stop recovery, or the nameserver already knows the port potentially with a new N_Port_ID and we can successfully and quickly perform open port recovery. For the one case, where ADISC returns successfully, we re-initialize port->d_id because that case does not involve any port recovery. This also solves a problem if the storage WWPN quickly logs into the fabric again but with a different N_Port_ID. Such as on virtual WWPN takeover during target NPIV failover. [https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the RSCN from the storage FDISC was ignored by zfcp and we could not successfully recover the failover. On some later failback on the storage, we could have been lucky if the virtual WWPN got the same old N_Port_ID from the SAN switch as we still had cached. Then the related RSCN triggered a successful port reopen recovery. However, there is no guarantee to get the same N_Port_ID on NPIV FDISC. Even though NPIV-enabled FCP devices are not affected by this problem, this code change optimizes recovery time for gone remote ports as a side effect. The timely drop of cached N_Port_IDs prevents unnecessary slow open port attempts. While the problem might have been in code before v2.6.32 commit 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") this fix depends on the gid_pn_work introduced with that commit, so we mark it as culprit to satisfy fix dependencies. Note: Point-to-point remote port is already handled separately and gets its N_Port_ID from the cached peer_d_id. So resetting port->d_id in general does not affect PtP. Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com Fixes: 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") Cc: <stable@vger.kernel.org> #2.6.32+ Suggested-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17s390/cio: make ccw_device_dma_* more robustHalil Pasic
commit ad9a14517263a16af040598c7920c09ca9670a31 upstream. Since commit 48720ba56891 ("virtio/s390: use DMA memory for ccw I/O and classic notifiers") we were supposed to make sure that virtio_ccw_release_dev() completes before the ccw device and the attached dma pool are torn down, but unfortunately we did not. Before that commit it used to be OK to delay cleaning up the memory allocated by virtio-ccw indefinitely (which isn't really intuitive for guys used to destruction happens in reverse construction order), but now we trigger a BUG_ON if the genpool is destroyed before all memory allocated from it is deallocated. Which brings down the guest. We can observe this problem, when unregister_virtio_device() does not give up the last reference to the virtio_device (e.g. because a virtio-scsi attached scsi disk got removed without previously unmounting its previously mounted partition). To make sure that the genpool is only destroyed after all the necessary freeing is done let us take a reference on the ccw device on each ccw_device_dma_zalloc() and give it up on each ccw_device_dma_free(). Actually there are multiple approaches to fixing the problem at hand that can work. The upside of this one is that it is the safest one while remaining simple. We don't crash the guest even if the driver does not pair allocations and frees. The downside is the reference counting overhead, that the reference counting for ccw devices becomes more complex, in a sense that we need to pair the calls to the aforementioned functions for it to be correct, and that if we happen to leak, we leak more than necessary (the whole ccw device instead of just the genpool). Some alternatives to this approach are taking a reference in virtio_ccw_online() and giving it up in virtio_ccw_release_dev() or making sure virtio_ccw_release_dev() completes its work before virtio_ccw_remove() returns. The downside of these approaches is that these are less safe against programming errors. Cc: <stable@vger.kernel.org> # v5.3 Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 48720ba56891 ("virtio/s390: use DMA memory for ccw I/O and classic notifiers") Reported-by: bfu@redhat.com Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17s390/tape: fix timer initialization in tape_std_assign()Sven Schnelle
commit 213fca9e23b59581c573d558aa477556f00b8198 upstream. commit 9c6c273aa424 ("timer: Remove init_timer_on_stack() in favor of timer_setup_on_stack()") changed the timer setup from init_timer_on_stack(() to timer_setup(), but missed to change the mod_timer() call. And while at it, use msecs_to_jiffies() instead of the open coded timeout calculation. Cc: stable@vger.kernel.org Fixes: 9c6c273aa424 ("timer: Remove init_timer_on_stack() in favor of timer_setup_on_stack()") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17s390/cio: check the subchannel validity for dev_busidVineeth Vijayan
commit a4751f157c194431fae9e9c493f456df8272b871 upstream. Check the validity of subchanel before reading other fields in the schib. Fixes: d3683c055212 ("s390/cio: add dev_busid sysfs entry for each subchannel") CC: <stable@vger.kernel.org> Reported-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20211105154451.847288-1-vneethv@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22s390/sclp: fix Secure-IPL facility detectionAlexander Egorenkov
commit d76b14f3971a0638b6cd0da289f8b48acee287d0 upstream. Prevent out-of-range access if the returned SCLP SCCB response is smaller in size than the address of the Secure-IPL flag. Fixes: c9896acc7851 ("s390/ipl: Provide has_secure sysfs attribute") Cc: stable@vger.kernel.org # 5.2+ Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15s390/cio: add dev_busid sysfs entry for each subchannelVineeth Vijayan
[ Upstream commit d3683c055212bf910d4e318f7944910ce10dbee6 ] Introduce dev_busid, which exports the device-id associated with the io-subchannel (and message-subchannel). The dev_busid indicates that of the device which may be physically installed on the corrosponding subchannel. The dev_busid value "none" indicates that the subchannel is not valid, there is no I/O device currently associated with the subchannel. The dev_busid information would be helpful to write device-specific udev-rules associated with the subchannel. The dev_busid interface would be available even when the sch is not bound to any driver or if there is no operational device connected on it. Hence this attribute can be used to write udev-rules which are specific to the device associated with the subchannel. Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20s390/sclp_vt220: fix console name to match deviceValentin Vidic
[ Upstream commit b7d91d230a119fdcc334d10c9889ce9c5e15118b ] Console name reported in /proc/consoles: ttyS1 -W- (EC p ) 4:65 does not match the char device name: crw--w---- 1 root root 4, 65 May 17 12:18 /dev/ttysclp0 so debian-installer inside a QEMU s390x instance gets confused and fails to start with the following error: steal-ctty: No such file or directory Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr> Link: https://lore.kernel.org/r/20210427194010.9330-1-vvidic@valentin-vidic.from.hr Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14s390/cio: dont call css_wait_for_slow_path() inside a lockVineeth Vijayan
commit c749d8c018daf5fba6dfac7b6c5c78b27efd7d65 upstream. Currently css_wait_for_slow_path() gets called inside the chp->lock. The path-verification-loop of slowpath inside this lock could lead to deadlock as reported by the lockdep validator. The ccw_device_get_chp_desc() during the instance of a device-set-online would try to acquire the same 'chp->lock' to read the chp->desc. The instance of this function can get called from multiple scenario, like probing or setting-device online manually. This could, in some corner-cases lead to the deadlock. lockdep validator reported this as, CPU0 CPU1 ---- ---- lock(&chp->lock); lock(kn->active#43); lock(&chp->lock); lock((wq_completion)cio); The chp->lock was introduced to serialize the access of struct channel_path. This lock is not needed for the css_wait_for_slow_path() function, so invoke the slow-path function outside this lock. Fixes: b730f3a93395 ("[S390] cio: add lock to struct channel_path") Cc: <stable@vger.kernel.org> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16vfio-ccw: Serialize FSM IDLE state with I/O completionEric Farman
[ Upstream commit 2af7a834a435460d546f0cf0a8b8e4d259f1d910 ] Today, the stacked call to vfio_ccw_sch_io_todo() does three things: 1) Update a solicited IRB with CP information, and release the CP if the interrupt was the end of a START operation. 2) Copy the IRB data into the io_region, under the protection of the io_mutex 3) Reset the vfio-ccw FSM state to IDLE to acknowledge that vfio-ccw can accept more work. The trouble is that step 3 is (A) invoked for both solicited and unsolicited interrupts, and (B) sitting after the mutex for step 2. This second piece becomes a problem if it processes an interrupt for a CLEAR SUBCHANNEL while another thread initiates a START, thus allowing the CP and FSM states to get out of sync. That is: CPU 1 CPU 2 fsm_do_clear() fsm_irq() fsm_io_request() vfio_ccw_sch_io_todo() fsm_io_helper() Since the FSM state and CP should be kept in sync, let's make a note when the CP is released, and rely on that as an indication that the FSM should also be reset at the end of this routine and open up the device for more work. Signed-off-by: Eric Farman <farman@linux.ibm.com> Acked-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20210511195631.3995081-4-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03vfio-ccw: Check initialized flag in cp_init()Eric Farman
[ Upstream commit c6c82e0cd8125d30f2f1b29205c7e1a2f1a6785b ] We have a really nice flag in the channel_program struct that indicates if it had been initialized by cp_init(), and use it as a guard in the other cp accessor routines, but not for a duplicate call into cp_init(). The possibility of this occurring is low, because that flow is protected by the private->io_mutex and FSM CP_PROCESSING state. But then why bother checking it in (for example) cp_prefetch() then? Let's just be consistent and check for that in cp_init() too. Fixes: 71189f263f8a3 ("vfio-ccw: make it safe to access channel programs") Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Matthew Rosato <mjrosato@linux.ibm.com> Message-Id: <20210511195631.3995081-2-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17s390/dasd: fix hanging IO request during DASD driver unbindStefan Haberland
commit 66f669a272898feb1c69b770e1504aa2ec7723d1 upstream. Prevent that an IO request is build during device shutdown initiated by a driver unbind. This request will never be able to be processed or canceled and will hang forever. This will lead also to a hanging unbind. Fix by checking not only if the device is in READY state but also check that there is no device offline initiated before building a new IO request. Fixes: e443343e509a ("s390/dasd: blk-mq conversion") Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Tested-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17s390/dasd: fix hanging DASD driver unbindStefan Haberland
commit 7d365bd0bff3c0310c39ebaffc9a8458e036d666 upstream. In case of an unbind of the DASD device driver the function dasd_generic_remove() is called which shuts down the device. Among others this functions removes the int_handler from the cdev. During shutdown the device cancels all outstanding IO requests and waits for completion of the clear request. Unfortunately the clear interrupt will never be received when there is no interrupt handler connected. Fix by moving the int_handler removal after the call to the state machine where no request or interrupt is outstanding. Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Tested-by: Bjoern Walk <bwalk@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17s390/crypto: return -EFAULT if copy_to_user() failsWang Qing
commit 942df4be7ab40195e2a839e9de81951a5862bc5b upstream. The copy_to_user() function returns the number of bytes remaining to be copied, but we want to return -EFAULT if the copy doesn't complete. Fixes: e06670c5fe3b ("s390: vfio-ap: implement VFIO_DEVICE_GET_INFO ioctl") Signed-off-by: Wang Qing <wangqing@vivo.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/1614600502-16714-1-git-send-email-wangqing@vivo.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17s390/cio: return -EFAULT if copy_to_user() failsEric Farman
commit d9c48a948d29bcb22f4fe61a81b718ef6de561a0 upstream. Fixes: 120e214e504f ("vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls") Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17s390/cio: return -EFAULT if copy_to_user() fails againWang Qing
commit 51c44babdc19aaf882e1213325a0ba291573308f upstream. The copy_to_user() function returns the number of bytes remaining to be copied, but we want to return -EFAULT if the copy doesn't complete. Fixes: e01bcdd61320 ("vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl") Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/1614600093-13992-1-git-send-email-wangqing@vivo.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04virtio/s390: implement virtio-ccw revision 2 correctlyCornelia Huck
commit 182f709c5cff683e6732d04c78e328de0532284f upstream. CCW_CMD_READ_STATUS was introduced with revision 2 of virtio-ccw, and drivers should only rely on it being implemented when they negotiated at least that revision with the device. However, virtio_ccw_get_status() issued READ_STATUS for any device operating at least at revision 1. If the device accepts READ_STATUS regardless of the negotiated revision (which some implementations like QEMU do, even though the spec currently does not allow it), everything works as intended. While a device rejecting the command should also be handled gracefully, we will not be able to see any changes the device makes to the status, such as setting NEEDS_RESET or setting the status to zero after a completed reset. We negotiated the revision to at most 1, as we never bumped the maximum revision; let's do that now and properly send READ_STATUS only if we are operating at least at revision 2. Cc: stable@vger.kernel.org Fixes: 7d3ce5ab9430 ("virtio/s390: support READ_STATUS command for virtio-ccw") Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20210216110645.1087321-1-cohuck@redhat.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03s390/vfio-ap: No need to disable IRQ after queue resetTony Krowiak
commit 6c12a6384e0c0b96debd88b24028e58f2ebd417b upstream. The queues assigned to a matrix mediated device are currently reset when: * The VFIO_DEVICE_RESET ioctl is invoked * The mdev fd is closed by userspace (QEMU) * The mdev is removed from sysfs. Immediately after the reset of a queue, a call is made to disable interrupts for the queue. This is entirely unnecessary because the reset of a queue disables interrupts, so this will be removed. Furthermore, vfio_ap_irq_disable() does an unconditional PQAP/AQIC which can result in a specification exception (when the corresponding facility is not available), so this is actually a bugfix. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> [pasic@linux.ibm.com: minor rework before merging] Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel") Cc: <stable@vger.kernel.org> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17s390/qeth: fix L2 header access in qeth_l3_osa_features_check()Julian Wiedmann
[ Upstream commit f9c4845385c8f6631ebd5dddfb019ea7a285fba4 ] ip_finish_output_gso() may call .ndo_features_check() even before the skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt to inspect the L2 header via vlan_eth_hdr(). Switch to vlan_get_protocol(), as already used further down in the common qeth_features_check() path. Fixes: f13ade199391 ("s390/qeth: run non-offload L3 traffic over common xmit path") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30s390/dasd: fix list corruption of lcu listStefan Haberland
commit 53a7f655834c7c335bf683f248208d4fbe4b47bc upstream. In dasd_alias_disconnect_device_from_lcu the device is removed from any list on the LCU. Afterwards the LCU is removed from the lcu list if it does not contain devices any longer. The lcu->lock protects the lcu from parallel updates. But to cancel all workers and wait for completion the lcu->lock has to be unlocked. If two devices are removed in parallel and both are removed from the LCU the first device that takes the lcu->lock again will delete the LCU because it is already empty but the second device also tries to free the LCU which leads to a list corruption of the lcu list. Fix by removing the device right before the lcu is checked without unlocking the lcu->lock in between. Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30s390/dasd: fix list corruption of pavgroup group listStefan Haberland
commit 0ede91f83aa335da1c3ec68eb0f9e228f269f6d8 upstream. dasd_alias_add_device() moves devices to the active_devices list in case of a scheduled LCU update regardless if they have previously been in a pavgroup or not. Example: device A and B are in the same pavgroup. Device A has already been in a pavgroup and the private->pavgroup pointer is set and points to a valid pavgroup. While going through dasd_add_device it is moved from the pavgroup to the active_devices list. In parallel device B might be removed from the same pavgroup in remove_device_from_lcu() which in turn checks if the group is empty and deletes it accordingly because device A has already been removed from there. When now device A enters remove_device_from_lcu() it is tried to remove it from the pavgroup again because the pavgroup pointer is still set and again the empty group will be cleaned up which leads to a list corruption. Fix by setting private->pavgroup to NULL in dasd_add_device. If the device has been the last device on the pavgroup an empty pavgroup remains but this will be cleaned up by the scheduled lcu_update which iterates over all existing pavgroups. Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30s390/dasd: prevent inconsistent LCU device dataStefan Haberland
commit a29ea01653493b94ea12bb2b89d1564a265081b6 upstream. Prevent _lcu_update from adding a device to a pavgroup if the LCU still requires an update. The data is not reliable any longer and in parallel devices might have been moved on the lists already. This might lead to list corruptions or invalid PAV grouping. Only add devices to a pavgroup if the LCU is up to date. Additional steps are taken by the scheduled lcu update. Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1") Cc: stable@vger.kernel.org Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30s390/dasd: fix hanging device offline processingStefan Haberland
commit 658a337a606f48b7ebe451591f7681d383fa115e upstream. For an LCU update a read unit address configuration IO is required. This is started using sleep_on(), which has early exit paths in case the device is not usable for IO. For example when it is in offline processing. In those cases the LCU update should fail and not be retried. Therefore lcu_update_work checks if EOPNOTSUPP is returned or not. Commit 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration") accidentally removed the EOPNOTSUPP return code from read_unit_address_configuration(), which in turn might lead to an endless loop of the LCU update in offline processing. Fix by returning EOPNOTSUPP again if the device is not able to perform the request. Fixes: 41995342b40c ("s390/dasd: fix endless loop after read unit address configuration") Cc: stable@vger.kernel.org #5.3 Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30s390/cio: fix use-after-free in ccw_device_destroy_consoleQinglang Miao
[ Upstream commit 14d4c4fa46eeaa3922e8e1c4aa727eb0a1412804 ] Use of sch->dev reference after the put_device() call could trigger the use-after-free bugs. Fix this by simply adjusting the position of put_device. Fixes: 37db8985b211 ("s390/cio: add basic protected virtualization support") Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> [vneethv@linux.ibm.com: Slight modification in the commit-message] Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-02s390/qeth: fix tear down of async TX buffersJulian Wiedmann
[ Upstream commit 7ed10e16e50daf74460f54bc922e27c6863c8d61 ] When qeth_iqd_tx_complete() detects that a TX buffer requires additional async completion via QAOB, it might fail to replace the queue entry's metadata (and ends up triggering recovery). Assume now that the device gets torn down, overruling the recovery. If the QAOB notification then arrives before the tear down has sufficiently progressed, the buffer state is changed to QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob(). The tear down code calls qeth_drain_output_queue(), where qeth_cleanup_handled_pending() will then attempt to replace such a buffer _again_. If it succeeds this time, the buffer ends up dangling in its replacement's ->next_pending list ... where it will never be freed, since there's no further call to qeth_cleanup_handled_pending(). But the second attempt isn't actually needed, we can simply leave the buffer on the queue and re-use it after a potential recovery has completed. The qeth_clear_output_buffer() in qeth_drain_output_queue() will ensure that it's in a clean state again. Fixes: 72861ae792c2 ("qeth: recovery through asynchronous delivery") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-02s390/qeth: fix af_iucv notification raceJulian Wiedmann
[ Upstream commit 8908f36d20d8ba610d3a7d110b3049b5853b9bb1 ] The two expected notification sequences are 1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when our TX completion code first observed the pending TX and the QAOB then completes at a later time; or 2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB completion before our TX completion code even noticed that the TX was pending. But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run concurrently, we may end up with a race that results in a sequence of TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse the af_iucv code in its tracking of pending transmits. Rework the notification code, so that qeth_qdio_handle_aob() defers its notification if the TX completion code is still active. Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-02s390/qeth: make af_iucv TX notification call more robustJulian Wiedmann
[ Upstream commit 34c7f50f7d0d36fa663c74aee39e25e912505320 ] Calling into socket code is ugly already, at least check whether we are dealing with the expected sk_family. Only looking at skb->protocol is bound to cause troubles (consider eg. af_packet). Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24s390/dasd: fix null pointer dereference for ERP requestsStefan Haberland
commit 6f117cb854a44a79898d844e6ae3fd23bd94e786 upstream. When requeueing all requests on the device request queue to the blocklayer we might get to an ERP (error recovery) request that is a copy of an original CQR. Those requests do not have blocklayer request information or a pointer to the dasd_queue set. When trying to access those data it will lead to a null pointer dereference in dasd_requeue_all_requests(). Fix by checking if the request is an ERP request that can simply be ignored. The blocklayer request will be requeued by the original CQR that is on the device queue right behind the ERP request. Fixes: 9487cfd3430d ("s390/dasd: fix handling of internal requests") Cc: <stable@vger.kernel.org> #4.16 Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-10s390/pkey: fix paes selftest failure with paes and pkey static buildHarald Freudenberger
commit 5b35047eb467c8cdd38a31beb9ac109221777843 upstream. When both the paes and the pkey kernel module are statically build into the kernel, the paes cipher selftests run before the pkey kernel module is initialized. So a static variable set in the pkey init function and used in the pkey_clr2protkey function is not initialized when the paes cipher's selftests request to call pckmo for transforming a clear key value into a protected key. This patch moves the initial setup of the static variable into the function pck_clr2protkey. So it's possible, to use the function for transforming a clear to a protected key even before the pkey init function has been called and the paes selftests may run successful. Reported-by: Alexander Egorenkov <Alexander.Egorenkov@ibm.com> Cc: <stable@vger.kernel.org> # 4.20 Fixes: f822ad2c2c03 ("s390/pkey: move pckmo subfunction available checks away from module init") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29s390/qeth: don't let HW override the configured port roleJulian Wiedmann
[ Upstream commit a04f0ecacdb0639d416614619225a39de3927e22 ] The only time that our Bridgeport role should change is when we change the configuration ourselves. In which case we also adjust our internal state tracking, no need to do it again when we receive the corresponding event. Removing the locked section helps a subsequent patch that needs to flush the workqueue while under sbp_lock. It would be nice to raise a warning here in case HW does weird things after all, but this could end up generating false-positives when we change the configuration ourselves. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctlChristian Borntraeger
commit f7e80983f0cf470bb82036e73bff4d5a7daf8fc2 upstream. reqcnt is an u32 pointer but we do copy sizeof(reqcnt) which is the size of the pointer. This means we only copy 8 byte. Let us copy the full monty. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Fixes: af4a72276d49 ("s390/zcrypt: Support up to 256 crypto adapters.") Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01s390/dasd: Fix zero write for FBA devicesJan Höppner
commit 709192d531e5b0a91f20aa14abfe2fc27ddd47af upstream. A discard request that writes zeros using the global kernel internal ZERO_PAGE will fail for machines with more than 2GB of memory due to the location of the ZERO_PAGE. Fix this by using a driver owned global zero page allocated with GFP_DMA flag set. Fixes: 28b841b3a7cb ("s390/dasd: Add discard support for FBA devices") Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01s390/irq: replace setup_irq() by request_irq()afzal mohammed
[ Upstream commit 8719b6d29d2851fa84c4074bb2e5adc022911ab8 ] request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. Per tglx[1], setup_irq() existed in olden days when allocators were not ready by the time early interrupts were initialized. Hence replace setup_irq() by request_irq(). [1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Message-Id: <20200304005049.5291-1-afzal.mohd.ma@gmail.com> [heiko.carstens@de.ibm.com: replace pr_err with panic] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-23s390/zcrypt: fix kmalloc 256k failureHarald Freudenberger
commit b6186d7fb53349efd274263a45f0b08749ccaa2d upstream. Tests showed that under stress conditions the kernel may temporary fail to allocate 256k with kmalloc. However, this fix reworks the related code in the cca_findcard2() function to use kvmalloc instead. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Cc: Stable <stable@vger.kernel.org> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03s390/cio: add cond_resched() in the slow_eval_known_fn() loopVineeth Vijayan
[ Upstream commit 0b8eb2ee9da1e8c9b8082f404f3948aa82a057b2 ] The scanning through subchannels during the time of an event could take significant amount of time in case of platforms with lots of known subchannels. This might result in higher scheduling latencies for other tasks especially on systems with a single CPU. Add cond_resched() call, as the loop in slow_eval_known_fn() can be executed for a longer duration. Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26scsi: zfcp: Fix use-after-free in request timeout handlersSteffen Maier
commit 2d9a2c5f581be3991ba67fa9e7497c711220ea8e upstream. Before v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use timer_setup()"), we intentionally only passed zfcp_adapter as context argument to zfcp_fsf_request_timeout_handler(). Since we only trigger adapter recovery, it was unnecessary to sync against races between timeout and (late) completion. Likewise, we only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Since we only wakeup an ERP action, it was unnecessary to sync against races between timeout and (late) completion. Meanwhile the timeout handlers get timer_list as context argument and do a timer-specific container-of to zfcp_fsf_req which can have been freed. Fix it by making sure that any request timeout handlers, that might just have started before del_timer(), are completed by using del_timer_sync() instead. This ensures the request free happens afterwards. Space time diagram of potential use-after-free: Basic idea is to have 2 or more pending requests whose timeouts run out at almost the same time. req 1 timeout ERP thread req 2 timeout ---------------- ---------------- --------------------------------------- zfcp_fsf_request_timeout_handler fsf_req = from_timer(fsf_req, t, timer) adapter = fsf_req->adapter zfcp_qdio_siosl(adapter) zfcp_erp_adapter_reopen(adapter,...) zfcp_erp_strategy ... zfcp_fsf_req_dismiss_all list_for_each_entry_safe zfcp_fsf_req_complete 1 del_timer 1 zfcp_fsf_req_free 1 zfcp_fsf_req_complete 2 zfcp_fsf_request_timeout_handler del_timer 2 fsf_req = from_timer(fsf_req, t, timer) zfcp_fsf_req_free 2 adapter = fsf_req->adapter ^^^^^^^ already freed Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()") Cc: <stable@vger.kernel.org> #4.15+ Suggested-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19s390/dasd: fix inability to use DASD with DIAG driverStefan Haberland
commit 9f4aa52387c68049403b59939df5c0dd8e3872cc upstream. During initialization of the DASD DIAG driver a request is issued that has a bio structure that resides on the stack. With virtually mapped kernel stacks this bio address might be in virtual storage which is unsuitable for usage with the diag250 call. In this case the device can not be set online using the DIAG discipline and fails with -EOPNOTSUP. In the system journal the following error message is presented: dasd: X.X.XXXX Setting the DASD online with discipline DIAG failed with rc=-95 Fix by allocating the bio structure instead of having it on the stack. Fixes: ce3dc447493f ("s390: add support for virtually mapped kernel stacks") Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: stable@vger.kernel.org #4.20 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19s390/qeth: don't process empty bridge port eventsJulian Wiedmann
[ Upstream commit 02472e28b9a45471c6d8729ff2c7422baa9be46a ] Discard events that don't contain any entries. This shouldn't happen, but subsequent code relies on being able to use entry 0. So better be safe than accessing garbage. Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30s390/qeth: fix error handling for isolation mode cmdsJulian Wiedmann
[ Upstream commit e2dfcfba00ba4a414617ef4c5a8501fe21567eb3 ] Current(?) OSA devices also store their cmd-specific return codes for SET_ACCESS_CONTROL cmds into the top-level cmd->hdr.return_code. So once we added stricter checking for the top-level field a while ago, none of the error logic that rolls back the user's configuration to its old state is applied any longer. For this specific cmd, go back to the old model where we peek into the cmd structure even though the top-level field indicated an error. Fixes: 686c97ee29c8 ("s390/qeth: fix error handling in adapter command callbacks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP actionSteffen Maier
commit 936e6b85da0476dd2edac7c51c68072da9fb4ba2 upstream. Suppose that, for unrelated reasons, FSF requests on behalf of recovery are very slow and can run into the ERP timeout. In the case at hand, we did adapter recovery to a large degree. However due to the slowness a LUN open is pending so the corresponding fc_rport remains blocked. After fast_io_fail_tmo we trigger close physical port recovery for the port under which the LUN should have been opened. The new higher order port recovery dismisses the pending LUN open ERP action and dismisses the pending LUN open FSF request. Such dismissal decouples the ERP action from the pending corresponding FSF request by setting zfcp_fsf_req->erp_action to NULL (among other things) [zfcp_erp_strategy_check_fsfreq()]. If now the ERP timeout for the pending open LUN request runs out, we must not use zfcp_fsf_req->erp_action in the ERP timeout handler. This is a problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use timer_setup()"). Before that we intentionally only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Note: The lifetime of the corresponding zfcp_fsf_req object continues until a (late) response or an (unrelated) adapter recovery. Just like the regular response path ignores dismissed requests [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the ERP timeout handler now needs to ignore dismissed requests. So simply return early in the ERP timeout handler if the FSF request is marked as dismissed in its status flags. To protect against the race where zfcp_erp_strategy_check_fsfreq() dismisses and sets zfcp_fsf_req->erp_action to NULL after our previous status flag check, return early if zfcp_fsf_req->erp_action is NULL. After all, the former ERP action does not need to be woken up as that was already done as part of the dismissal above [zfcp_erp_action_dismiss()]. This fixes the following panic due to kernel page fault in IRQ context: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d Oops: 0004 ilc:2 [#1] SMP Modules linked in: ... CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G E X ... Hardware name: IBM 8561 T01 701 (LPAR) Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp]) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000 000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02 0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000 0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8 Krnl Code: 001fffff80549bd4: a7180000 lhi %r1,0 001fffff80549bd8: 4120a0f0 la %r2,240(%r10) #001fffff80549bdc: a53e0003 llilh %r3,3 >001fffff80549be0: ba132000 cs %r1,%r3,0(%r2) 001fffff80549be4: a7740037 brc 7,1fffff80549c52 001fffff80549be8: e320b0180004 lg %r2,24(%r11) 001fffff80549bee: e31020e00004 lg %r1,224(%r2) 001fffff80549bf4: 412020e0 la %r2,224(%r2) Call Trace: [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp] [<00000985915e26f0>] call_timer_fn+0x38/0x190 [<00000985915e2944>] expire_timers+0xfc/0x190 [<00000985915e2ac4>] run_timer_softirq+0xec/0x218 [<0000098591ca7c4c>] __do_softirq+0x144/0x398 [<00000985915110aa>] do_softirq_own_stack+0x72/0x88 [<0000098591551b58>] irq_exit+0xb0/0xb8 [<0000098591510c6a>] do_IRQ+0x82/0xb0 [<0000098591ca7140>] ext_int_handler+0x128/0x12c [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60 ([<000009859172ae4c>] clear_huge_page+0xec/0x250) [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768 [<000009859172a712>] __handle_mm_fault+0x88a/0x900 [<000009859172a860>] handle_mm_fault+0xd8/0x1b0 [<0000098591529ef6>] do_dat_exception+0x136/0x3e8 [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220 Last Breaking-Event-Address: [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp] Kernel panic - not syncing: Fatal exception in interrupt Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()") Cc: <stable@vger.kernel.org> #4.15+ Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24s390/qdio: put thinint indicator after early errorJulian Wiedmann
[ Upstream commit 75e82bec6b2622c6f455b7a543fb5476a5d0eed7 ] qdio_establish() calls qdio_setup_thinint() via qdio_setup_irq(). If the subsequent qdio_establish_thinint() fails, we miss to put the DSCI again. Thus the DSCI isn't available for re-use. Given enough of such errors, we could end up with having only the shared DSCI available. Merge qdio_setup_thinint() into qdio_establish_thinint(), and deal with such an error internally. Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20s390/ism: fix error return code in ism_probe()Wei Yongjun
[ Upstream commit 29b74cb75e3572d83708745e81e24d37837415f9 ] Fix to return negative error code -ENOMEM from the smcd_alloc_dev() error handling case instead of 0, as done elsewhere in this function. Fixes: 684b89bc39ce ("s390/ism: add device driver for internal shared memory") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29s390/cio: avoid duplicated 'ADD' ueventsCornelia Huck
[ Upstream commit 05ce3e53f375295c2940390b2b429e506e07655c ] The common I/O layer delays the ADD uevent for subchannels and delegates generating this uevent to the individual subchannel drivers. The io_subchannel driver will do so when the associated ccw_device has been registered -- but unconditionally, so more ADD uevents will be generated if a subchannel has been unbound from the io_subchannel driver and later rebound. To fix this, only generate the ADD event if uevents were still suppressed for the device. Fixes: fa1a8c23eb7d ("s390: cio: Delay uevents for subchannels") Message-Id: <20200327124503.9794-2-cohuck@redhat.com> Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Boris Fiuczynski <fiuczy@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29s390/cio: generate delayed uevent for vfio-ccw subchannelsCornelia Huck
[ Upstream commit 2bc55eaeb88d30accfc1b6ac2708d4e4b81ca260 ] The common I/O layer delays the ADD uevent for subchannels and delegates generating this uevent to the individual subchannel drivers. The vfio-ccw I/O subchannel driver, however, did not do that, and will not generate an ADD uevent for subchannels that had not been bound to a different driver (or none at all, which also triggers the uevent). Generate the ADD uevent at the end of the probe function if uevents were still suppressed for the device. Message-Id: <20200327124503.9794-3-cohuck@redhat.com> Fixes: 63f1934d562d ("vfio: ccw: basic implementation for vfio_ccw driver") Reviewed-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-pointSteffen Maier
commit 819732be9fea728623e1ed84eba28def7384ad1f upstream. v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports") introduced zfcp automatic port scan. Before that, the user had to use the sysfs attribute "port_add" of an FCP device (adapter) to add and open remote (target) ports, even for the remote peer port in point-to-point topology. That code path did a proper port open recovery trigger taking the erp_lock. Since above commit, a new helper function zfcp_erp_open_ptp_port() performed an UNlocked port open recovery trigger. This can race with other parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt e.g. adapter->erp_total_count or adapter->erp_ready_head. As already found for fabric topology in v4.17 commit fa89adba1941 ("scsi: zfcp: fix infinite iteration on ERP ready list"), there was an endless loop during tracing of rport (un)block. A subsequent v4.18 commit 9e156c54ace3 ("scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger") introduced a lockdep assertion for that case. As a side effect, that lockdep assertion now uncovered the unlocked code path for PtP. It is from within an adapter ERP action: zfcp_erp_strategy[1479] intentionally DROPs erp lock around zfcp_erp_strategy_do_action() zfcp_erp_strategy_do_action[1441] NO erp lock zfcp_erp_adapter_strategy[876] NO erp lock zfcp_erp_adapter_strategy_open[855] NO erp lock zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock zfcp_erp_adapter_strat_fsf_xconf[772] erp lock only around zfcp_erp_action_to_running(), BUT *_not_* around zfcp_erp_enqueue_ptp_port() zfcp_erp_enqueue_ptp_port[728] BUG: *_not_* taking erp lock _zfcp_erp_port_reopen[432] assumes to be called with erp lock zfcp_erp_action_enqueue[314] assumes to be called with erp lock zfcp_dbf_rec_trig[288] _checks_ to be called with erp lock: lockdep_assert_held(&adapter->erp_lock); It causes the following lockdep warning: WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288 zfcp_dbf_rec_trig+0x16a/0x188 no locks held by zfcperp0.0.17c0/775. Fix this by using the proper locked recovery trigger helper function. Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com Fixes: cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports") Cc: <stable@vger.kernel.org> #v2.6.27+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01s390/qeth: handle error when backing RX bufferJulian Wiedmann
[ Upstream commit 17413852804d7e86e6f0576cca32c1541817800e ] qeth_init_qdio_queues() fills the RX ring with an initial set of RX buffers. If qeth_init_input_buffer() fails to back one of the RX buffers with memory, we need to bail out and report the error. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-01s390/qeth: don't reset default_out_queueJulian Wiedmann
[ Upstream commit 240c1948491b81cfe40f84ea040a8f2a4966f101 ] When an OSA device in prio-queue setup is reduced to 1 TX queue due to HW restrictions, we reset its the default_out_queue to 0. In the old code this was needed so that qeth_get_priority_queue() gets the queue selection right. But with proper multiqueue support we already reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic on txq 0 without even calling .ndo_select_queue. Thus we can preserve the user's configuration, and apply it if the OSA device later re-gains support for multiple TX queues. Fixes: 73dc2daf110f ("s390/qeth: add TX multiqueue support for OSA devices") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-18s390/dasd: fix data corruption for thin provisioned devicesStefan Haberland
commit 5e6bdd37c5526ef01326df5dabb93011ee89237e upstream. Devices are formatted in multiple of tracks. For an Extent Space Efficient (ESE) volume we get errors when accessing unformatted tracks. In this case the driver either formats the track on the flight for write requests or returns zero data for read requests. In case a request spans multiple tracks, the indication of an unformatted track presented for the first track is incorrectly applied to all tracks covered by the request. As a result, tracks containing data will be handled as empty, resulting in zero data being returned on read, or overwriting existing data with zero on write. Fix by determining the track that gets the NRF error. For write requests only format the track that is surely not formatted. For Read requests all tracks before have returned valid data and should not be touched. All tracks after the unformatted track might be formatted or not. Those are returned to the blocklayer to build a new request. When using alias devices there is a chance that multiple write requests trigger a format of the same track which might lead to data loss. Ensure that a track is formatted only once by maintaining a list of currently processed tracks. Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-12s390/qdio: fill SL with absolute addressesJulian Wiedmann
[ Upstream commit e9091ffd6a0aaced111b5d6ead5eaab5cd7101bc ] As the comment says, sl->sbal holds an absolute address. qeth currently solves this through wild casting, while zfcp doesn't care. Handle this properly in the code that actually builds the SL. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> [for qdio] Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-12s390/cio: cio_ignore_proc_seq_next should increase position indexVasily Averin
[ Upstream commit 8b101a5e14f2161869636ff9cb4907b7749dc0c2 ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283 Link: https://lore.kernel.org/r/d44c53a7-9bc1-15c7-6d4a-0c10cb9dffce@virtuozzo.com Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-05s390/qeth: vnicc Fix EOPNOTSUPP precedenceAlexandra Winter
commit 6f3846f0955308b6d1b219419da42b8de2c08845 upstream. When getting or setting VNICC parameters, the error code EOPNOTSUPP should have precedence over EBUSY. EBUSY is used because vnicc feature and bridgeport feature are mutually exclusive, which is a temporary condition. Whereas EOPNOTSUPP indicates that the HW does not support all or parts of the vnicc feature. This issue causes the vnicc sysfs params to show 'blocked by bridgeport' for HW that does not support VNICC at all. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>