aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/pci
AgeCommit message (Collapse)Author
2021-07-20PCI: iproc: Support multi-MSI only on uniprocessor kernelSandor Bodo-Merle
[ Upstream commit 2dc0a201d0f59e6818ef443609f0850a32910844 ] The interrupt affinity scheme used by this driver is incompatible with multi-MSI as it implies moving the doorbell address to that of another MSI group. This isn't possible for multi-MSI, as all the MSIs must have the same doorbell address. As such it is restricted to systems with a single CPU. Link: https://lore.kernel.org/r/20210622152630.40842-2-sbodomerle@gmail.com Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sandor Bodo-Merle <sbodomerle@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Pali Rohár <pali@kernel.org> Acked-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20PCI: iproc: Fix multi-MSI base vector number allocationSandor Bodo-Merle
[ Upstream commit e673d697b9a234fc3544ac240e173cef8c82b349 ] Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs") introduced multi-MSI support with a broken allocation mechanism (it failed to reserve the proper number of bits from the inner domain). Natural alignment of the base vector number was also not guaranteed. Link: https://lore.kernel.org/r/20210622152630.40842-1-sbodomerle@gmail.com Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs") Reported-by: Pali Rohár <pali@kernel.org> Signed-off-by: Sandor Bodo-Merle <sbodomerle@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Pali Rohár <pali@kernel.org> Acked-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20PCI/sysfs: Fix dsm_label_utf16s_to_utf8s() buffer overrunKrzysztof Wilczyński
[ Upstream commit bdcdaa13ad96f1a530711c29e6d4b8311eff767c ] "utf16s_to_utf8s(..., buf, PAGE_SIZE)" puts up to PAGE_SIZE bytes into "buf" and returns the number of bytes it actually put there. If it wrote PAGE_SIZE bytes, the newline added by dsm_label_utf16s_to_utf8s() would overrun "buf". Reduce the size available for utf16s_to_utf8s() to use so there is always space for the newline. [bhelgaas: reorder patch in series, commit log] Fixes: 6058989bad05 ("PCI: Export ACPI _DSM provided firmware instance number and string name to sysfs") Link: https://lore.kernel.org/r/20210603000112.703037-7-kw@linux.com Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20PCI: tegra: Add missing MODULE_DEVICE_TABLEZou Wei
[ Upstream commit 7bf475a4614a9722b9b989e53184a02596cf16d1 ] Add missing MODULE_DEVICE_TABLE definition so we generate correct modalias for automatic loading of this driver when it is built as a module. Link: https://lore.kernel.org/r/1620792422-16535-1-git-send-email-zou_wei@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20PCI: aardvark: Fix kernel panic during PIO transferPali Rohár
commit f18139966d072dab8e4398c95ce955a9742e04f7 upstream. Trying to start a new PIO transfer by writing value 0 in PIO_START register when previous transfer has not yet completed (which is indicated by value 1 in PIO_START) causes an External Abort on CPU, which results in kernel panic: SError Interrupt on CPU0, code 0xbf000002 -- SError Kernel panic - not syncing: Asynchronous SError Interrupt To prevent kernel panic, it is required to reject a new PIO transfer when previous one has not finished yet. If previous PIO transfer is not finished yet, the kernel may issue a new PIO request only if the previous PIO transfer timed out. In the past the root cause of this issue was incorrectly identified (as it often happens during link retraining or after link down event) and special hack was implemented in Trusted Firmware to catch all SError events in EL3, to ignore errors with code 0xbf000002 and not forwarding any other errors to kernel and instead throw panic from EL3 Trusted Firmware handler. Links to discussion and patches about this issue: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50 https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/ https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/ https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541 But the real cause was the fact that during link retraining or after link down event the PIO transfer may take longer time, up to the 1.44s until it times out. This increased probability that a new PIO transfer would be issued by kernel while previous one has not finished yet. After applying this change into the kernel, it is possible to revert the mentioned TF-A hack and SError events do not have to be caught in TF-A EL3. Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Marek Behún <kabel@kernel.org> Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-20PCI: aardvark: Don't rely on jiffies while holding spinlockRemi Pommarel
commit 7fbcb5da811be7d47468417c7795405058abb3da upstream. advk_pcie_wait_pio() can be called while holding a spinlock (from pci_bus_read_config_dword()), then depends on jiffies in order to timeout while polling on PIO state registers. In the case the PIO transaction failed, the timeout will never happen and will also cause the cpu to stall. This decrements a variable and wait instead of using jiffies. Signed-off-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-20PCI: aardvark: Fix checking for PIO Non-posted RequestPali Rohár
commit 8ceeac307a79f68c0d0c72d6e48b82fa424204ec upstream. PIO_NON_POSTED_REQ for PIO_STAT register is incorrectly defined. Bit 10 in register PIO_STAT indicates the response is to a non-posted request. Link: https://lore.kernel.org/r/20210624213345.3617-2-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Marek Behún <kabel@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-20PCI: Leave Apple Thunderbolt controllers on for s2idle or standbyKonstantin Kharlamov
commit 4694ae373dc2114f9a82f6ae15737e65af0c6dea upstream. On Macbook 2013, resuming from suspend-to-idle or standby resulted in the external monitor no longer being detected, a stacktrace, and errors like this in dmesg: pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible) The reason is that we know how to turn power to the Thunderbolt controller *off* via the SXIO/SXFP/SXLF methods, but we don't know how to turn power back on. We have to rely on firmware to turn the power back on. When going to the "suspend-to-idle" or "standby" system sleep states, firmware is not involved either on the suspend side or the resume side, so we can't use SXIO/SXFP/SXLF to turn the power off. Skip SXIO/SXFP/SXLF when firmware isn't involved in suspend, e.g., when we're going to the "suspend-to-idle" or "standby" system sleep states. Fixes: 1df5172c5c25 ("PCI: Suspend/resume quirks for Apple thunderbolt") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767 Link: https://lore.kernel.org/r/20210520235501.917397-1-Hi-Angel@yandex.ru Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30Revert "PCI: PM: Do not read power state in pci_enable_device_flags()"Rafael J. Wysocki
[ Upstream commit 4d6035f9bf4ea12776322746a216e856dfe46698 ] Revert commit 4514d991d992 ("PCI: PM: Do not read power state in pci_enable_device_flags()") that is reported to cause PCI device initialization issues on some systems. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=213481 Link: https://lore.kernel.org/linux-acpi/YNDoGICcg0V8HhpQ@eldamar.lan Reported-by: Michael <phyre@rogers.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Fixes: 4514d991d992 ("PCI: PM: Do not read power state in pci_enable_device_flags()") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30PCI: Work around Huawei Intelligent NIC VF FLR erratumChiqijun
commit ce00322c2365e1f7b0312f2f493539c833465d97 upstream. pcie_flr() starts a Function Level Reset (FLR), waits 100ms (the maximum time allowed for FLR completion by PCIe r5.0, sec 6.6.2), and waits for the FLR to complete. It assumes the FLR is complete when a config read returns valid data. When we do an FLR on several Huawei Intelligent NIC VFs at the same time, firmware on the NIC processes them serially. The VF may respond to config reads before the firmware has completed its reset processing. If we bind a driver to the VF (e.g., by assigning the VF to a virtual machine) in the interval between the successful config read and completion of the firmware reset processing, the NIC VF driver may fail to load. Prevent this driver failure by waiting for the NIC firmware to complete its reset processing. Not all NIC firmware supports this feature. [bhelgaas: commit log] Link: https://support.huawei.com/enterprise/en/doc/EDOC1100063073/87950645/vm-oss-occasionally-fail-to-load-the-in200-driver-when-the-vf-performs-flr Link: https://lore.kernel.org/r/20210414132301.1793-1-chiqijun@huawei.com Signed-off-by: Chiqijun <chiqijun@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30PCI: Add ACS quirk for Broadcom BCM57414 NICSriharsha Basavapatna
commit db2f77e2bd99dbd2fb23ddde58f0fae392fe3338 upstream. The Broadcom BCM57414 NIC may be a multi-function device. While it does not advertise an ACS capability, peer-to-peer transactions are not possible between the individual functions, so it is safe to treat them as fully isolated. Add an ACS quirk for this device so the functions can be in independent IOMMU groups and attached individually to userspace applications using VFIO. [bhelgaas: commit log] Link: https://lore.kernel.org/r/1621645997-16251-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30PCI: Mark some NVIDIA GPUs to avoid bus resetShanker Donthineni
commit 4c207e7121fa92b66bf1896bf8ccb9edfb0f9731 upstream. Some NVIDIA GPU devices do not work with SBR. Triggering SBR leaves the device inoperable for the current system boot. It requires a system hard-reboot to get the GPU device back to normal operating condition post-SBR. For the affected devices, enable NO_BUS_RESET quirk to avoid the issue. This issue will be fixed in the next generation of hardware. Link: https://lore.kernel.org/r/20210608054857.18963-8-ameynarkhede03@gmail.com Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Sinan Kaya <okaya@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30PCI: Mark TI C667X to avoid bus resetAntti Järvinen
commit b5cf198e74a91073d12839a3e2db99994a39995d upstream. Some TI KeyStone C667X devices do not support bus/hot reset. The PCIESS automatically disables LTSSM when Secondary Bus Reset is received and device stops working. Prevent bus reset for these devices. With this change, the device can be assigned to VMs with VFIO, but it will leak state between VMs. Reference: https://e2e.ti.com/support/processors/f/791/t/954382 Link: https://lore.kernel.org/r/20210315102606.17153-1-antti.jarvinen@gmail.com Signed-off-by: Antti Järvinen <antti.jarvinen@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-22ACPI / hotplug / PCI: Fix reference count leak in enable_slot()Feilong Lin
[ Upstream commit 3bbfd319034ddce59e023837a4aa11439460509b ] In enable_slot(), if pci_get_slot() returns NULL, we clear the SLOT_ENABLED flag. When pci_get_slot() finds a device, it increments the device's reference count. In this case, we did not call pci_dev_put() to decrement the reference count, so the memory of the device (struct pci_dev type) will eventually leak. Call pci_dev_put() to decrement its reference count when pci_get_slot() returns a PCI device. Link: https://lore.kernel.org/r/b411af88-5049-a1c6-83ac-d104a1f429be@huawei.com Signed-off-by: Feilong Lin <linfeilong@huawei.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22PCI: thunder: Fix compile testingArnd Bergmann
[ Upstream commit 16f7ae5906dfbeff54f74ec75d0563bb3a87ab0b ] Compile-testing these drivers is currently broken. Enabling it causes a couple of build failures though: drivers/pci/controller/pci-thunder-ecam.c:119:30: error: shift count >= width of type [-Werror,-Wshift-count-overflow] drivers/pci/controller/pci-thunder-pem.c:54:2: error: implicit declaration of function 'writeq' [-Werror,-Wimplicit-function-declaration] drivers/pci/controller/pci-thunder-pem.c:392:8: error: implicit declaration of function 'acpi_get_rc_resources' [-Werror,-Wimplicit-function-declaration] Fix them with the obvious one-line changes. Link: https://lore.kernel.org/r/20210308152501.2135937-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Robert Richter <rric@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22PCI: endpoint: Fix missing destroy_workqueue()Yang Yingliang
[ Upstream commit acaef7981a218813e3617edb9c01837808de063c ] Add the missing destroy_workqueue() before return from pci_epf_test_init() in the error handling case and add destroy_workqueue() in pci_epf_test_exit(). Link: https://lore.kernel.org/r/20210331084012.2091010-1-yangyingliang@huawei.com Fixes: 349e7a85b25fa ("PCI: endpoint: functions: Add an EP function to test PCI") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22PCI: Release OF node in pci_scan_device()'s error pathDmitry Baryshkov
[ Upstream commit c99e755a4a4c165cad6effb39faffd0f3377c02d ] In pci_scan_device(), if pci_setup_device() fails for any reason, the code will not release device's of_node by calling pci_release_of_node(). Fix that by calling the release function. Fixes: 98d9f30c820d ("pci/of: Match PCI devices to OF nodes dynamically") Link: https://lore.kernel.org/r/20210124232826.1879-1-dmitry.baryshkov@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()Pali Rohár
[ Upstream commit 1e83130f01b04c16579ed5a5e03d729bcffc4c5d ] IRQ domain alloc function should return zero on success. Non-zero value indicates failure. Link: https://lore.kernel.org/r/20210303142202.25780-1-pali@kernel.org Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Krzysztof Wilczyński <kw@linux.com> Acked-by: Ray Jui <ray.jui@broadcom.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22PCI: PM: Do not read power state in pci_enable_device_flags()Rafael J. Wysocki
[ Upstream commit 4514d991d99211f225d83b7e640285f29f0755d0 ] It should not be necessary to update the current_state field of struct pci_dev in pci_enable_device_flags() before calling do_pci_enable_device() for the device, because none of the code between that point and the pci_set_power_state() call in do_pci_enable_device() invoked later depends on it. Moreover, doing that is actively harmful in some cases. For example, if the given PCI device depends on an ACPI power resource whose _STA method initially returns 0 ("off"), but the config space of the PCI device is accessible and the power state retrieved from the PCI_PM_CTRL register is D0, the current_state field in the struct pci_dev representing that device will get out of sync with the power.state of its ACPI companion object and that will lead to power management issues going forward. To avoid such issues it is better to leave the current_state value as is until it is changed to PCI_D0 by do_pci_enable_device() as appropriate. However, the power state of the device is not changed to PCI_D0 if it is already enabled when pci_enable_device_flags() gets called for it, so update its current_state in that case, but use pci_update_current_state() covering platform PM too for that. Link: https://lore.kernel.org/lkml/20210314000439.3138941-1-luzmaximilian@gmail.com/ Reported-by: Maximilian Luz <luzmaximilian@gmail.com> Tested-by: Maximilian Luz <luzmaximilian@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-24PCI: rpadlpar: Fix potential drc_name corruption in store functionsTyrel Datwyler
commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: stable@vger.kernel.org Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210315214821.452959-1-tyreld@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17PCI: Fix pci_register_io_range() memory leakGeert Uytterhoeven
[ Upstream commit f6bda644fa3a7070621c3bf12cd657f69a42f170 ] Kmemleak reports: unreferenced object 0xc328de40 (size 64): comm "kworker/1:1", pid 21, jiffies 4294938212 (age 1484.670s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 e0 d8 fc eb 00 00 00 00 ................ 00 00 10 fe 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ad758d10>] pci_register_io_range+0x3c/0x80 [<2c7f139e>] of_pci_range_to_resource+0x48/0xc0 [<f079ecc8>] devm_of_pci_get_host_bridge_resources.constprop.0+0x2ac/0x3ac [<e999753b>] devm_of_pci_bridge_init+0x60/0x1b8 [<a895b229>] devm_pci_alloc_host_bridge+0x54/0x64 [<e451ddb0>] rcar_pcie_probe+0x2c/0x644 In case a PCI host driver's probe is deferred, the same I/O range may be allocated again, and be ignored, causing a memory leak. Fix this by (a) letting logic_pio_register_range() return -EEXIST if the passed range already exists, so pci_register_io_range() will free it, and by (b) making pci_register_io_range() not consider -EEXIST an error condition. Link: https://lore.kernel.org/r/20210202100332.829047-1-geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17PCI: mediatek: Add missing of_node_put() to fix reference leakKrzysztof Wilczyński
[ Upstream commit 42814c438aac79746d310f413a27d5b0b959c5de ] The for_each_available_child_of_node helper internally makes use of the of_get_next_available_child() which performs an of_node_get() on each iteration when searching for next available child node. Should an available child node be found, then it would return a device node pointer with reference count incremented, thus early return from the middle of the loop requires an explicit of_node_put() to prevent reference count leak. To stop the reference leak, explicitly call of_node_put() before returning after an error occurred. Link: https://lore.kernel.org/r/20210120184810.3068794-1-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17PCI: xgene-msi: Fix race in installing chained irq handlerMartin Kaiser
[ Upstream commit a93c00e5f975f23592895b7e83f35de2d36b7633 ] Fix a race where a pending interrupt could be received and the handler called before the handler's data has been setup, by converting to irq_set_chained_handler_and_data(). See also 2cf5a03cb29d ("PCI/keystone: Fix race in installing chained IRQ handler"). Based on the mail discussion, it seems ok to drop the error handling. Link: https://lore.kernel.org/r/20210115212435.19940-3-martin@kaiser.cx Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-11PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controllerBjorn Helgaas
[ Upstream commit 059983790a4c963d92943e55a61fca55be427d55 ] Add function 1 DMA alias quirk for Marvell 88SS9215 PCIe SSD Controller. Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679#c135 Link: https://lore.kernel.org/r/20201110220516.697934-1-helgaas@kernel.org Reported-by: John Smith <LK7S2ED64JHGLKj75shg9klejHWG49h5hk@protonmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-07PCI: Add a REBAR size quirk for Sapphire RX 5600 XT PulseNirmoy Das
[ Upstream commit 907830b0fc9e374d00f3c83de5e426157b482c01 ] RX 5600 XT Pulse advertises support for BAR 0 being 256MB, 512MB, or 1GB, but it also supports 2GB, 4GB, and 8GB. Add a rebar size quirk so that the BAR 0 is big enough to cover complete VARM. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20210107175017.15893-5-nirmoy.das@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04PCI: Align checking of syscall user config accessorsHeiner Kallweit
[ Upstream commit ef9e4005cbaf022c6251263aa27836acccaef65d ] After 34e3207205ef ("PCI: handle positive error codes"), pci_user_read_config_*() and pci_user_write_config_*() return 0 or negative errno values, not PCIBIOS_* values like PCIBIOS_SUCCESSFUL or PCIBIOS_BAD_REGISTER_NUMBER. Remove comparisons with PCIBIOS_SUCCESSFUL and check only for non-zero. It happens that PCIBIOS_SUCCESSFUL is zero, so this is not a functional change, but it aligns this code with the user accessors. [bhelgaas: commit log] Fixes: 34e3207205ef ("PCI: handle positive error codes") Link: https://lore.kernel.org/r/f1220314-e518-1e18-bf94-8e6f8c703758@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064Ansuel Smith
commit 2cfef1971aea6119ee27429181d6cb3383031ac2 upstream. The use of PHY_REFCLK_USE_PAD introduced a regression for apq8064 devices. It was tested that while apq doesn't require the padding, ipq SoC must use it or the kernel hangs on boot. Link: https://lore.kernel.org/r/20201019165555.8269-1-ansuelsmth@gmail.com Fixes: de3c4bf64897 ("PCI: qcom: Add support for tx term offset for rev 2.1.0") Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30PCI: Fix pci_slot_release() NULL pointer dereferenceJubin Zhong
commit 4684709bf81a2d98152ed6b610e3d5c403f9bced upstream. If kobject_init_and_add() fails, pci_slot_release() is called to delete slot->list from parent->slots. But slot->list hasn't been initialized yet, so we dereference a NULL pointer: Unable to handle kernel NULL pointer dereference at virtual address 00000000 ... CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.4.240 #197 task: ffffeb398a45ef10 task.stack: ffffeb398a470000 PC is at __list_del_entry_valid+0x5c/0xb0 LR is at pci_slot_release+0x84/0xe4 ... __list_del_entry_valid+0x5c/0xb0 pci_slot_release+0x84/0xe4 kobject_put+0x184/0x1c4 pci_create_slot+0x17c/0x1b4 __pci_hp_initialize+0x68/0xa4 pciehp_probe+0x1a4/0x2fc pcie_port_probe_service+0x58/0x84 driver_probe_device+0x320/0x470 Initialize slot->list before calling kobject_init_and_add() to avoid this. Fixes: 8a94644b440e ("PCI: Fix pci_create_slot() reference count leak") Link: https://lore.kernel.org/r/1606876422-117457-1-git-send-email-zhongjubin@huawei.com Signed-off-by: Jubin Zhong <zhongjubin@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup()Rafael J. Wysocki
commit 7482c5cb90e5a7f9e9e12dd154d405e0219656e3 upstream. The idea behind acpi_pm_set_bridge_wakeup() was to allow bridges to be reference counted for wakeup enabling, because they may be enabled to signal wakeup on behalf of their subordinate devices and that may happen for multiple times in a row, whereas for the other devices it only makes sense to enable wakeup signaling once. However, this becomes problematic if the bridge itself is suspended, because it is treated as a "regular" device in that case and the reference counting doesn't work. For instance, suppose that there are two devices below a bridge and they both can signal wakeup. Every time one of them is suspended, wakeup signaling is enabled for the bridge, so when they both have been suspended, the bridge's wakeup reference counter value is 2. Say that the bridge is suspended subsequently and acpi_pci_wakeup() is called for it. Because the bridge can signal wakeup, that function will invoke acpi_pm_set_device_wakeup() to configure it and __acpi_pm_set_device_wakeup() will be called with the last argument equal to 1. This causes __acpi_device_wakeup_enable() invoked by it to omit the reference counting, because the reference counter of the target device (the bridge) is 2 at that time. Now say that the bridge resumes and one of the device below it resumes too, so the bridge's reference counter becomes 0 and wakeup signaling is disabled for it, but there is still the other suspended device which may need the bridge to signal wakeup on its behalf and that is not going to work. To address this scenario, use wakeup enable reference counting for all devices, not just for bridges, so drop the last argument from __acpi_device_wakeup_enable() and __acpi_pm_set_device_wakeup(), which causes acpi_pm_set_device_wakeup() and acpi_pm_set_bridge_wakeup() to become identical, so drop the latter and use the former instead of it everywhere. Fixes: 1ba51a7c1496 ("ACPI / PCI / PM: Rework acpi_pci_propagate_wakeup()") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30PCI: iproc: Fix out-of-bound array accessesBharat Gooty
[ Upstream commit a3ff529f5d368a17ff35ada8009e101162ebeaf9 ] Declare the full size array for all revisions of PAX register sets to avoid potentially out of bound access of the register array when they are being initialized in iproc_pcie_rev_init(). Link: https://lore.kernel.org/r/20201001060054.6616-2-srinath.mannam@broadcom.com Fixes: 06324ede76cdf ("PCI: iproc: Improve core register population") Signed-off-by: Bharat Gooty <bharat.gooty@broadcom.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30PCI: Fix overflow in command-line resource alignment requestsColin Ian King
[ Upstream commit cc73eb321d246776e5a9f7723d15708809aa3699 ] The shift of 1 by align_order is evaluated using 32 bit arithmetic and the result is assigned to a resource_size_t type variable that is a 64 bit unsigned integer on 64 bit platforms. Fix an overflow before widening issue by making the 1 a ULL. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 32a9a682bef2 ("PCI: allow assignment of memory resources with a specified alignment") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30PCI: Bounds-check command-line resource alignment requestsBjorn Helgaas
[ Upstream commit 6534aac198b58309ff2337981d3f893e0be1d19d ] 32-bit BARs are limited to 2GB size (2^31). By extension, I assume 64-bit BARs are limited to 2^63 bytes. Limit the alignment requested by the "pci=resource_alignment=" command-line parameter to 2^63. Link: https://lore.kernel.org/r/20201007123045.GS4282@kadam Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30PCI: qcom: Add missing reset for ipq806xAnsuel Smith
commit ee367e2cdd2202b5714982739e684543cd2cee0e upstream Add missing ext reset used by ipq8064 SoC in PCIe qcom driver. Link: https://lore.kernel.org/r/20200615210608.21469-5-ansuelsmth@gmail.com Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Sham Muthayyan <smuthayy@codeaurora.org> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com> Cc: stable@vger.kernel.org # v4.5+ [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-30PCI: iproc: Set affinity mask on MSI interruptsMark Tomlinson
[ Upstream commit eb7eacaa5b9e4f665bd08d416c8f88e63d2f123c ] The core interrupt code expects the irq_set_affinity call to update the effective affinity for the interrupt. This was not being done, so update iproc_msi_irq_set_affinity() to do so. Link: https://lore.kernel.org/r/20200803035241.7737-1-mark.tomlinson@alliedtelesis.co.nz Fixes: 3bc2b2348835 ("PCI: iproc: Add iProc PCIe MSI support") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01PCI: tegra: Fix runtime PM imbalance on errorDinghao Liu
[ Upstream commit fcee90cdf6f3a3a371add04d41528d5ba9c3b411 ] pm_runtime_get_sync() increments the runtime PM usage counter even when it returns an error code. Thus a pairing decrement is needed on the error handling path to keep the counter balanced. Also, call pm_runtime_disable() when pm_runtime_get_sync() returns an error code. Link: https://lore.kernel.org/r/20200521024709.2368-1-dinghao.liu@zju.edu.cn Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01PCI: pciehp: Fix MSI interrupt raceStuart Hayes
[ Upstream commit 8edf5332c39340b9583cf9cba659eb7ec71f75b5 ] Without this commit, a PCIe hotplug port can stop generating interrupts on hotplug events, so device adds and removals will not be seen: The pciehp interrupt handler pciehp_isr() reads the Slot Status register and then writes back to it to clear the bits that caused the interrupt. If a different interrupt event bit gets set between the read and the write, pciehp_isr() returns without having cleared all of the interrupt event bits. If this happens when the MSI isn't masked (which by default it isn't in handle_edge_irq(), and which it will never be when MSI per-vector masking is not supported), we won't get any more hotplug interrupts from that device. That is expected behavior, according to the PCIe Base Spec r5.0, section 6.7.3.4, "Software Notification of Hot-Plug Events". Because the Presence Detect Changed and Data Link Layer State Changed event bits can both get set at nearly the same time when a device is added or removed, this is more likely to happen than it might seem. The issue was found (and can be reproduced rather easily) by connecting and disconnecting an NVMe storage device on at least one system model where the NVMe devices were being connected to an AMD PCIe port (PCI device 0x1022/0x1483). Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot Status register immediately after writing to it, until it sees that all of the event status bits have been cleared. [lukas: drop loop count limitation, write "events" instead of "status", don't loop back in INTx and poll modes, tweak code comment & commit msg] Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01PCI: Use ioremap(), not phys_to_virt() for platform ROMMikel Rychliski
[ Upstream commit 72e0ef0e5f067fd991f702f0b2635d911d0cf208 ] On some EFI systems, the video BIOS is provided by the EFI firmware. The boot stub code stores the physical address of the ROM image in pdev->rom. Currently we attempt to access this pointer using phys_to_virt(), which doesn't work with CONFIG_HIGHMEM. On these systems, attempting to load the radeon module on a x86_32 kernel can result in the following: BUG: unable to handle page fault for address: 3e8ed03c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 317 Comm: systemd-udevd Not tainted 5.6.0-rc3-next-20200228 #2 Hardware name: Apple Computer, Inc. MacPro1,1/Mac-F4208DC8, BIOS MP11.88Z.005C.B08.0707021221 07/02/07 EIP: radeon_get_bios+0x5ed/0xe50 [radeon] Code: 00 00 84 c0 0f 85 12 fd ff ff c7 87 64 01 00 00 00 00 00 00 8b 47 08 8b 55 b0 e8 1e 83 e1 d6 85 c0 74 1a 8b 55 c0 85 d2 74 13 <80> 38 55 75 0e 80 78 01 aa 0f 84 a4 03 00 00 8d 74 26 00 68 dc 06 EAX: 3e8ed03c EBX: 00000000 ECX: 3e8ed03c EDX: 00010000 ESI: 00040000 EDI: eec04000 EBP: eef3fc60 ESP: eef3fbe0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010206 CR0: 80050033 CR2: 3e8ed03c CR3: 2ec77000 CR4: 000006d0 Call Trace: r520_init+0x26/0x240 [radeon] radeon_device_init+0x533/0xa50 [radeon] radeon_driver_load_kms+0x80/0x220 [radeon] drm_dev_register+0xa7/0x180 [drm] radeon_pci_probe+0x10f/0x1a0 [radeon] pci_device_probe+0xd4/0x140 Fix the issue by updating all drivers which can access a platform provided ROM. Instead of calling the helper function pci_platform_rom() which uses phys_to_virt(), call ioremap() directly on the pdev->rom. radeon_read_platform_bios() previously directly accessed an __iomem pointer. Avoid this by calling memcpy_fromio() instead of kmemdup(). pci_platform_rom() now has no remaining callers, so remove it. Link: https://lore.kernel.org/r/20200319021623.5426-1-mikel@mikelr.com Signed-off-by: Mikel Rychliski <mikel@mikelr.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03PCI: Fix pci_create_slot() reference count leakQiushi Wu
[ Upstream commit 8a94644b440eef5a7b9c104ac8aa7a7f413e35e5 ] kobject_init_and_add() takes a reference even when it fails. If it returns an error, kobject_put() must be called to clean up the memory associated with the object. When kobject_init_and_add() fails, call kobject_put() instead of kfree(). b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject") fixed a similar problem. Link: https://lore.kernel.org/r/20200528021322.1984-1-wu000273@umn.edu Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21PCI: Probe bridge window attributes once at enumeration-timeBjorn Helgaas
commit 51c48b310183ab6ba5419edfc6a8de889cc04521 upstream. pci_bridge_check_ranges() determines whether a bridge supports the optional I/O and prefetchable memory windows and sets the flag bits in the bridge resources. This *could* be done once during enumeration except that the resource allocation code completely clears the flag bits, e.g., in the pci_assign_unassigned_bridge_resources() path. The problem with pci_bridge_check_ranges() in the resource allocation path is that we may allocate resources after devices have been claimed by drivers, and pci_bridge_check_ranges() *changes* the window registers to determine whether they're writable. This may break concurrent accesses to devices behind the bridge. Add a new pci_read_bridge_windows() to determine whether a bridge supports the optional windows, call it once during enumeration, remember the results, and change pci_bridge_check_ranges() so it doesn't touch the bridge windows but sets the flag bits based on those remembered results. Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzhou1@hisilicon.com Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html Reported-by: Yandong Xu <xuyandong2@huawei.com> Tested-by: Yandong Xu <xuyandong2@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Ofer Hayut <ofer@lightbitslabs.com> Cc: Roy Shterman <roys@lightbitslabs.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Zhou Wang <wangzhou1@hisilicon.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208371 Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21PCI: qcom: Add support for tx term offset for rev 2.1.0Ansuel Smith
commit de3c4bf648975ea0b1d344d811e9b0748907b47c upstream. Add tx term offset support to pcie qcom driver need in some revision of the ipq806x SoC. Ipq8064 needs tx term offset set to 7. Link: https://lore.kernel.org/r/20200615210608.21469-9-ansuelsmth@gmail.com Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Sham Muthayyan <smuthayy@codeaurora.org> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21PCI: qcom: Define some PARF params needed for ipq8064 SoCAnsuel Smith
commit 5149901e9e6deca487c01cc434a3ac4125c7b00b upstream. Set some specific value for Tx De-Emphasis, Tx Swing and Rx equalization needed on some ipq8064 based device (Netgear R7800 for example). Without this the system locks on kernel load. Link: https://lore.kernel.org/r/20200615210608.21469-8-ansuelsmth@gmail.com Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21PCI: Add device even if driver attach failedRajat Jain
commit 2194bc7c39610be7cabe7456c5f63a570604f015 upstream. device_attach() returning failure indicates a driver error while trying to probe the device. In such a scenario, the PCI device should still be added in the system and be visible to the user. When device_attach() fails, merely warn about it and keep the PCI device in the system. This partially reverts ab1a187bba5c ("PCI: Check device_attach() return value always"). Link: https://lore.kernel.org/r/20200706233240.3245512-1-rajatja@google.com Signed-off-by: Rajat Jain <rajatja@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21PCI: Mark AMD Navi10 GPU rev 0x00 ATS as brokenKai-Heng Feng
commit 45beb31d3afb651bb5c41897e46bd4fa9980c51c upstream. We are seeing AMD Radeon Pro W5700 doesn't work when IOMMU is enabled: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01a0] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01c0] The error also makes graphics driver fail to probe the device. It appears to be the same issue as commit 5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken") addresses, and indeed the same ATS quirk can workaround the issue. See-also: 5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken") See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken") See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208725 Link: https://lore.kernel.org/r/20200728104554.28927-1-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()Rafael J. Wysocki
commit dae68d7fd4930315389117e9da35b763f12238f9 upstream. If context is not NULL in acpiphp_grab_context(), but the is_going_away flag is set for the device's parent, the reference counter of the context needs to be decremented before returning NULL or the context will never be freed, so make that happen. Fixes: edf5bf34d408 ("ACPI / dock: Use callback pointers from devices' ACPI hotplug contexts") Reported-by: Vasily Averin <vvs@virtuozzo.com> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19irqdomain/treewide: Free firmware node after domain removalJon Derrick
commit ec0160891e387f4771f953b888b1fe951398e5d9 upstream. Commit 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") unintentionally caused a dangling pointer page fault issue on firmware nodes that were freed after IRQ domain allocation. Commit e3beca48a45b fixed that dangling pointer issue by only freeing the firmware node after an IRQ domain allocation failure. That fix no longer frees the firmware node immediately, but leaves the firmware node allocated after the domain is removed. The firmware node must be kept around through irq_domain_remove, but should be freed it afterwards. Add the missing free operations after domain removal where where appropriate. Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19PCI: Release IVRS table in AMD ACS quirkHanjun Guo
[ Upstream commit 090688fa4e448284aaa16136372397d7d10814db ] The acpi_get_table() should be coupled with acpi_put_table() if the mapped table is not used at runtime to release the table mapping. In pci_quirk_amd_sb_acs(), IVRS table is just used for checking AMD IOMMU is supported, not used at runtime, so put the table after using it. Fixes: 15b100dfd1c9 ("PCI: Claim ACS support for AMD southbridge devices") Link: https://lore.kernel.org/r/1595411068-15440-1-git-send-email-guohanjun@huawei.com Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19PCI: cadence: Fix updating Vendor ID and Subsystem Vendor ID registerKishon Vijay Abraham I
[ Upstream commit e3bca37d15dca118f2ef1f0a068bb6e07846ea20 ] Commit 1b79c5284439 ("PCI: cadence: Add host driver for Cadence PCIe controller") in order to update Vendor ID, directly wrote to PCI_VENDOR_ID register. However PCI_VENDOR_ID in root port configuration space is read-only register and writing to it will have no effect. Use local management register to configure Vendor ID and Subsystem Vendor ID. Link: https://lore.kernel.org/r/20200722110317.4744-10-kishon@ti.com Fixes: 1b79c5284439 ("PCI: cadence: Add host driver for Cadence PCIe controller") Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19PCI/ASPM: Add missing newline in sysfs 'policy'Xiongfeng Wang
[ Upstream commit 3167e3d340c092fd47924bc4d23117a3074ef9a9 ] When I cat ASPM parameter 'policy' by sysfs, it displays as follows. Add a newline for easy reading. Other sysfs attributes already include a newline. [root@localhost ~]# cat /sys/module/pcie_aspm/parameters/policy [default] performance powersave powersupersave [root@localhost ~]# Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support") Link: https://lore.kernel.org/r/1594972765-10404-1-git-send-email-wangxiongfeng2@huawei.com Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19PCI: Fix pci_cfg_wait queue locking problemBjorn Helgaas
[ Upstream commit 2a7e32d0547f41c5ce244f84cf5d6ca7fccee7eb ] The pci_cfg_wait queue is used to prevent user-space config accesses to devices while they are recovering from reset. Previously we used these operations on pci_cfg_wait: __add_wait_queue(&pci_cfg_wait, ...) __remove_wait_queue(&pci_cfg_wait, ...) wake_up_all(&pci_cfg_wait) The wake_up acquires the wait queue lock, but the add and remove do not. Originally these were all protected by the pci_lock, but cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and p->pi_lock"), moved wake_up_all() outside pci_lock, so it could race with add/remove operations, which caused occasional kernel panics, e.g., during vfio-pci hotplug/unplug testing: Unable to handle kernel read from unreadable memory at virtual address ffff802dac469000 Resolve this by using wait_event() instead of __add_wait_queue() and __remove_wait_queue(). The wait queue lock is held by both wait_event() and wake_up_all(), so it provides mutual exclusion. Fixes: cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and p->pi_lock") Link: https://lore.kernel.org/linux-pci/79827f2f-9b43-4411-1376-b9063b67aee3@huawei.com/T/#u Based-on: https://lore.kernel.org/linux-pci/20191210031527.40136-1-zhengxiang9@huawei.com/ Based-on-patch-by: Xiang Zheng <zhengxiang9@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiang Zheng <zhengxiang9@huawei.com> Cc: Heyi Guo <guoheyi@huawei.com> Cc: Biaoxiang Ye <yebiaoxiang@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-05PCI/ASPM: Disable ASPM on ASMedia ASM1083/1085 PCIe-to-PCI bridgeRobert Hancock
commit b361663c5a40c8bc758b7f7f2239f7a192180e7c upstream. Recently ASPM handling was changed to allow ASPM on PCIe-to-PCI/PCI-X bridges. Unfortunately the ASMedia ASM1083/1085 PCIe to PCI bridge device doesn't seem to function properly with ASPM enabled. On an Asus PRIME H270-PRO motherboard, it causes errors like these: pcieport 0000:00:1c.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) pcieport 0000:00:1c.0: AER: device [8086:a292] error status/mask=00003000/00002000 pcieport 0000:00:1c.0: AER: [12] Timeout pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0 pcieport 0000:00:1c.0: AER: can't find device of ID00e0 In addition to flooding the kernel log, this also causes the machine to wake up immediately after suspend is initiated. The device advertises ASPM L0s and L1 support in the Link Capabilities register, but the ASMedia web page for ASM1083 [1] claims "No PCIe ASPM support". Windows 10 (build 2004) enables L0s, but it also logs correctable PCIe errors. Add a quirk to disable ASPM for this device. [1] https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114 [bhelgaas: commit log] Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208667 Link: https://lore.kernel.org/r/20200722021803.17958-1-hancockrwd@gmail.com Signed-off-by: Robert Hancock <hancockrwd@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>