path: root/drivers/iommu
Age  Commit message  Author
2020-10-14  iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()  [Lu Baolu]
[ Upstream commit 1a3f2fd7fc4e8f24510830e265de2ffb8e3300d2 ] Lock(&iommu->lock) without disabling irq causes lockdep warnings. [ 12.703950] ======================================================== [ 12.703962] WARNING: possible irq lock inversion dependency detected [ 12.703975] 5.9.0-rc6+ #659 Not tainted [ 12.703983] -------------------------------------------------------- [ 12.703995] systemd-udevd/284 just changed the state of lock: [ 12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at: iommu_flush_dev_iotlb.part.57+0x2e/0x90 [ 12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 12.704043] (&iommu->lock){+.+.}-{2:2} [ 12.704045] and interrupts could create inverse lock ordering between them. [ 12.704073] other info that might help us debug this: [ 12.704085] Possible interrupt unsafe locking scenario: [ 12.704097] CPU0 CPU1 [ 12.704106] ---- ---- [ 12.704115] lock(&iommu->lock); [ 12.704123] local_irq_disable(); [ 12.704134] lock(device_domain_lock); [ 12.704146] lock(&iommu->lock); [ 12.704158] <Interrupt> [ 12.704164] lock(device_domain_lock); [ 12.704174] *** DEADLOCK *** Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-07  iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()  [Yu Kuai]
[ Upstream commit 1a26044954a6d1f4d375d5e62392446af663be7a ] if of_find_device_by_node() succeed, exynos_iommu_of_xlate() doesn't have a corresponding put_device(). Thus add put_device() to fix the exception handling for this function implementation. Fixes: aa759fd376fb ("iommu/exynos: Add callback for initializing devices from device tree") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20200918011335.909141-1-yukuai3@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
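As a sketch of the pattern behind this fix (a kernel-style fragment with approximate names, not the literal patch): a successful of_find_device_by_node() takes a device reference, so every early return afterwards must drop it with put_device().

  struct platform_device *sysmmu = of_find_device_by_node(spec->np); /* spec->np: phandle argument, assumed */
  struct sysmmu_drvdata *data;                                       /* driver-private type, assumed */

  if (!sysmmu)
          return -ENODEV;

  data = platform_get_drvdata(sysmmu);
  if (!data) {
          /* Balance the reference taken by of_find_device_by_node(). */
          put_device(&sysmmu->dev);
          return -ENODEV;
  }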
2020-09-26  iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE  [Suravee Suthikulpanit]
commit e52d58d54a321d4fe9d0ecdabe4f8774449f0d6e upstream. When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode), current driver disables interrupt remapping when it updates the IRTE so that the upper and lower 64-bit values can be updated safely. However, this creates a small window, where the interrupt could arrive and result in IO_PAGE_FAULT (for interrupt) as shown below. IOMMU Driver Device IRQ ============ =========== irte.RemapEn=0 ... change IRTE IRQ from device ==> IO_PAGE_FAULT !! ... irte.RemapEn=1 This scenario has been observed when changing irq affinity on a system running I/O-intensive workload, in which the destination APIC ID in the IRTE is updated. Instead, use cmpxchg_double() to update the 128-bit IRTE at once without disabling the interrupt remapping. However, this means several features, which require GA (128-bit IRTE) support will also be affected if cmpxchg16b is not supported (which is unprecedented for AMD processors w/ IOMMU). Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure") Reported-by: Sean Osborne <sean.m.osborne@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Erik Rockstrom <erik.rockstrom@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20200903093822.52012-3-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
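The idea can be sketched in ordinary user space (a hypothetical stand-in, not the driver code): pack the two 64-bit halves of the entry into a single 128-bit value and swap it with one compare-and-exchange, so no reader ever sees a half-updated entry and the valid bit never needs to be cleared first. Built with something like gcc -mcx16 (plus -latomic on some toolchains):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  typedef unsigned __int128 u128;

  /* Pack the two 64-bit halves of an IRTE-like entry into one 128-bit value. */
  static u128 pack(uint64_t lo, uint64_t hi)
  {
          return ((u128)hi << 64) | lo;
  }

  /* One-shot 128-bit compare-and-swap: a concurrent reader never observes a
   * half-updated entry, so the valid bit never has to be cleared first. */
  static bool cas128(u128 *entry, u128 old, u128 new)
  {
          return __atomic_compare_exchange_n(entry, &old, new, false,
                                             __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  }

  int main(void)
  {
          u128 irte = pack(0x1, 0x00aa);                               /* valid, old destination */
          bool ok = cas128(&irte, pack(0x1, 0x00aa), pack(0x1, 0x00bb));

          printf("swap %s, high half now 0x%llx\n", ok ? "ok" : "failed",
                 (unsigned long long)(irte >> 64));
          return 0;
  }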
2020-09-23  iommu/amd: Fix potential @entry null deref  [Joao Martins]
[ Upstream commit 14c4acc5ed22c21f9821103be7c48efdf9763584 ] After commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE"), smatch warns: drivers/iommu/amd/iommu.c:3870 amd_iommu_deactivate_guest_mode() warn: variable dereferenced before check 'entry' (see line 3867) Fix this by moving the @valid assignment to after @entry has been checked for NULL. Fixes: 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20200910171621.12879-1-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
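The fix is pure ordering: read the field only after the pointer has been checked. A trivial stand-alone illustration (names invented):

  #include <stdbool.h>
  #include <stdio.h>

  struct entry { bool valid; };

  /* Dereference only after the NULL check, mirroring the reordered @valid read. */
  static bool entry_is_valid(const struct entry *e)
  {
          if (!e)
                  return false;
          return e->valid;
  }

  int main(void)
  {
          struct entry ok = { .valid = true };
          printf("%d %d\n", entry_is_valid(&ok), entry_is_valid(NULL));
          return 0;
  }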
2020-09-17  iommu/amd: Do not use IOMMUv2 functionality when SME is active  [Joerg Roedel]
[ Upstream commit 2822e582501b65707089b097e773e6fd70774841 ] When memory encryption is active the device is likely not in a direct mapped domain. Forbid using IOMMUv2 functionality for now until finer grained checks for this have been implemented. Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200824105415.21000-3-joro@8bytes.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09  iommu/vt-d: Handle 36bit addressing for x86-32  [Chris Wilson]
commit 29aaebbca4abc4cceb38738483051abefafb6950 upstream. Beware that the address size for x86-32 may exceed unsigned long. [ 0.368971] UBSAN: shift-out-of-bounds in drivers/iommu/intel/iommu.c:128:14 [ 0.369055] shift exponent 36 is too large for 32-bit type 'long unsigned int' If we don't handle the wide addresses, the pages are mismapped and the device read/writes go astray, detected as DMAR faults and leading to device failure. The behaviour changed (from working to broken) in commit fa954e683178 ("iommu/vt-d: Delegate the dma domain to upper layer"), but the error looks older. Fixes: fa954e683178 ("iommu/vt-d: Delegate the dma domain to upper layer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: James Sewart <jamessewart@arista.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: <stable@vger.kernel.org> # v5.3+ Link: https://lore.kernel.org/r/20200822160209.28512-1-chris@chris-wilson.co.uk Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09  iommu/amd: Restore IRTE.RemapEn bit after programming IRTE  [Suravee Suthikulpanit]
[ Upstream commit 26e495f341075c09023ba16dee9a7f37a021e745 ] Currently, the RemapEn (valid) bit is accidentally cleared when programming IRTE w/ guestMode=0. It should be restored to the prior state. Fixes: b9fc6b56f478 ("iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20200903093822.52012-2-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09  iommu/vt-d: Serialize IOMMU GCMD register modifications  [Lu Baolu]
[ Upstream commit 6e4e9ec65078093165463c13d4eb92b3e8d7b2e8 ] The VT-d spec requires (10.4.4 Global Command Register, GCMD_REG General Description) that: If multiple control fields in this register need to be modified, software must serialize the modifications through multiple writes to this register. However, in irq_remapping.c, modifications of IRE and CFI are done in one write. We need to do two separate writes with STS checking after each. It also checks the status register before writing command register to avoid unnecessary register write. Fixes: af8d102f999a4 ("x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20200828000615.8281-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03  iommu/iova: Don't BUG on invalid PFNs  [Robin Murphy]
[ Upstream commit d3e3d2be688b4b5864538de61e750721a311e4fc ] Unlike the other instances which represent a complete loss of consistency within the rcache mechanism itself, or a fundamental and obvious misconfiguration by an IOMMU driver, the BUG_ON() in iova_magazine_free_pfns() can be provoked at more or less any time in a "spooky action-at-a-distance" manner by any old device driver passing nonsense to dma_unmap_*() which then propagates through to queue_iova(). Not only is this well outside the IOVA layer's control, it's also nowhere near fatal enough to justify panicking anyway - all that really achieves is to make debugging the offending driver more difficult. Let's simply WARN and otherwise ignore bogus PFNs. Reported-by: Prakash Gupta <guptap@codeaurora.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Prakash Gupta <guptap@codeaurora.org> Link: https://lore.kernel.org/r/acbd2d092b42738a03a21b417ce64e27f8c91c86.1591103298.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21  iommu/omap: Check for failure of a call to omap_iommu_dump_ctx  [Colin Ian King]
[ Upstream commit dee9d154f40c58d02f69acdaa5cfd1eae6ebc28b ] It is possible for the call to omap_iommu_dump_ctx to return a negative error number, so check for the failure and return the error number rather than pass the negative value to simple_read_from_buffer. Fixes: 14e0e6796a0d ("OMAP: iommu: add initial debugfs support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200714192211.744776-1-colin.king@canonical.com Addresses-Coverity: ("Improper use of negative value") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19  irqdomain/treewide: Free firmware node after domain removal  [Jon Derrick]
commit ec0160891e387f4771f953b888b1fe951398e5d9 upstream. Commit 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") unintentionally caused a dangling pointer page fault issue on firmware nodes that were freed after IRQ domain allocation. Commit e3beca48a45b fixed that dangling pointer issue by only freeing the firmware node after an IRQ domain allocation failure. That fix no longer frees the firmware node immediately, but leaves the firmware node allocated after the domain is removed. The firmware node must be kept around through irq_domain_remove, but should be freed it afterwards. Add the missing free operations after domain removal where where appropriate. Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29  irqdomain/treewide: Keep firmware node unconditionally allocated  [Thomas Gleixner]
[ Upstream commit e3beca48a45b5e0e6e6a4e0124276b8248dcc9bb ] Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after creating the irqdomain. The only purpose of these FW nodes is to convey name information. When this was introduced the core code did not store the pointer to the node in the irqdomain. A recent change stored the firmware node pointer in irqdomain for other reasons and missed to notice that the usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence are broken by this. Storing a dangling pointer is dangerous itself, but in case that the domain is destroyed later on this leads to a double free. Remove the freeing of the firmware node after creating the irqdomain from all affected call sites to cure this. Fixes: 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22  iommu/vt-d: Make Intel SVM code 64-bit only  [Lu Baolu]
commit 9486727f5981a5ec5c0b699fb1777451bd6786e4 upstream. Current Intel SVM is designed by setting the pgd_t of the processor page table to FLPTR field of the PASID entry. The first level translation only supports 4 and 5 level paging structures, hence it's infeasible for the IOMMU to share a processor's page table when it's running in 32-bit mode. Let's disable 32bit support for now and claim support only when all the missing pieces are ready in the future. Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode") Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-16  iommu/vt-d: Don't apply gfx quirks to untrusted devices  [Rajat Jain]
[ Upstream commit 67e8a5b18d41af9298db5c17193f671f235cce01 ] Currently, an external malicious PCI device can masquerade the VID:PID of faulty gfx devices, and thus apply iommu quirks to effectively disable the IOMMU restrictions for itself. Thus we need to ensure that the device we are applying quirks to, is indeed an internal trusted device. Signed-off-by: Rajat Jain <rajatja@google.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30  iommu/vt-d: Update scalable mode paging structure coherency  [Lu Baolu]
[ Upstream commit 04c00956ee3cd138fd38560a91452a804a8c5550 ] The Scalable-mode Page-walk Coherency (SMPWC) field in the VT-d extended capability register indicates the hardware coherency behavior on paging structures accessed through the pasid table entry. This is ignored in current code and using ECAP.C instead which is only valid in legacy mode. Fix this so that paging structure updates could be manually flushed from the cache line if hardware page walking is not snooped. Fixes: 765b6a98c1de3 ("iommu/vt-d: Enumerate the scalable mode capability") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30  iommu/vt-d: Enable PCI ACS for platform opt in hint  [Lu Baolu]
[ Upstream commit 50310600ebda74b9988467e2e6128711c7ba56fc ] PCI ACS is disabled if Intel IOMMU is off by default or intel_iommu=off is used in command line. Unfortunately, Intel IOMMU will be forced on if there're devices sitting on an external facing PCI port that is marked as untrusted (for example, thunderbolt peripherals). That means, PCI ACS is disabled while Intel IOMMU is forced on to isolate those devices. As the result, the devices of an MFD will be grouped by a single group even the ACS is supported on device. [ 0.691263] pci 0000:00:07.1: Adding to iommu group 3 [ 0.691277] pci 0000:00:07.2: Adding to iommu group 3 [ 0.691292] pci 0000:00:07.3: Adding to iommu group 3 Fix it by requesting PCI ACS when Intel IOMMU is detected with platform opt in hint. Fixes: 89a6079df791a ("iommu/vt-d: Force IOMMU on for platform opt in hint") Co-developed-by: Lalithambika Krishnakumar <lalithambika.krishnakumar@intel.com> Signed-off-by: Lalithambika Krishnakumar <lalithambika.krishnakumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03  iommu: Fix reference count leak in iommu_group_alloc.  [Qiushi Wu]
[ Upstream commit 7cc31613734c4870ae32f5265d576ef296621343 ] kobject_init_and_add() takes reference even when it fails. Thus, when kobject_init_and_add() returns an error, kobject_put() must be called to properly clean up the kobject. Fixes: d72e31c93746 ("iommu: IOMMU Groups") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Link: https://lore.kernel.org/r/20200527210020.6522-1-wu000273@umn.edu Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
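The rule being applied, sketched here with approximate names rather than the exact patch: kobject_init_and_add() takes a reference even when it returns an error, so the failure path must end with kobject_put() so the kobject release path can clean up, instead of freeing the object directly.

  ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
                             NULL, "%d", group->id);
  if (ret) {
          /* The reference was taken despite the failure; drop it so the
           * release callback frees the group. */
          kobject_put(&group->kobj);
          return ERR_PTR(ret);
  }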
2020-05-27  iommu/amd: Call domain_flush_complete() in update_domain()  [Joerg Roedel]
[ Upstream commit f44a4d7e4f1cdef73c90b1dc749c4d8a7372a8eb ] The update_domain() function is expected to also inform the hardware about domain changes. This needs a COMPLETION_WAIT command to be sent to all IOMMUs which use the domain. Signed-off-by: Joerg Roedel <jroedel@suse.de> Tested-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/20200504125413.16798-4-joro@8bytes.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27  iommu/amd: Fix over-read of ACPI UID from IVRS table  [Alexander Monakov]
[ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ] IVRS parsing code always tries to read 255 bytes from memory when retrieving ACPI device path, and makes an assumption that firmware provides a zero-terminated string. Both of those are bugs: the entry is likely to be shorter than 255 bytes, and zero-termination is not guaranteed. With Acer SF314-42 firmware these issues manifest visibly in dmesg: AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1... The first three lines show how the code over-reads adjacent table entries into the UID, and in the last line it even reads garbage data beyond the end of the IVRS table itself. Since each entry has the length of the UID (uidl member of ivhd_entry struct), use that for memcpy, and manually add a zero terminator. Avoid zero-filling hid and uid arrays up front, and instead ensure the uid array is always zero-terminated. No change needed for the hid array, as it was already properly zero-terminated. Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID") Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Cc: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Link: https://lore.kernel.org/r/20200511102352.1831-1-amonakov@ispras.ru Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
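The essence of the fix is to copy only the length the IVHD entry declares and terminate the string by hand, rather than trusting the firmware to provide a NUL within 255 bytes. A stand-alone illustration (UID_MAX and the sample bytes are invented for the sketch):

  #include <stdio.h>
  #include <string.h>

  #define UID_MAX 255

  /* Copy only the length the table entry declares, then terminate explicitly. */
  static void copy_uid(char *dst, const char *src, size_t uidl)
  {
          size_t n = uidl < UID_MAX ? uidl : UID_MAX;

          memcpy(dst, src, n);
          dst[n] = '\0';
  }

  int main(void)
  {
          const char raw[] = { '\\', '_', 'S', 'B', '.', 'F', 'U', 'R', '0' }; /* not NUL-terminated */
          char uid[UID_MAX + 1];

          copy_uid(uid, raw, sizeof(raw));
          printf("uid: %s\n", uid);
          return 0;
  }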
2020-05-14  iommu/virtio: Reverse arguments to list_add  [Julia Lawall]
commit fb3637a113349f53830f7d6ca45891b7192cd28f upstream. Elsewhere in the file, there is a list_for_each_entry with &vdev->resv_regions as the second argument, suggesting that &vdev->resv_regions is the list head. So exchange the arguments on the list_add call to put the list head in the second argument. Fixes: 2a5a31487445 ("iommu/virtio: Add probe request") Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/1588704467-13431-1-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Joerg Roedel <jroedel@suse.de>
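For reference, the kernel's list_add() takes the new entry first and the list head second; swapping them splices the head into the entry's own empty list instead. A self-contained miniature of the helper makes the argument order visible:

  #include <stdio.h>

  struct list_head { struct list_head *next, *prev; };

  #define LIST_HEAD_INIT(name) { &(name), &(name) }

  /* Same convention as the kernel helper: new entry first, list head second. */
  static void list_add(struct list_head *new, struct list_head *head)
  {
          head->next->prev = new;
          new->next = head->next;
          new->prev = head;
          head->next = new;
  }

  int main(void)
  {
          struct list_head resv_regions = LIST_HEAD_INIT(resv_regions);
          struct list_head region = LIST_HEAD_INIT(region);

          list_add(&region, &resv_regions);   /* entry, then head; not the reverse */

          printf("head reaches the new entry: %s\n",
                 resv_regions.next == &region ? "yes" : "no");
          return 0;
  }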
2020-05-06  iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system  [Suravee Suthikulpanit]
commit b74aa02d7a30ee5e262072a7d6e8deff10b37924 upstream. Currently, system fails to boot because the legacy interrupt remapping mode does not enable 128-bit IRTE (GA), which is required for x2APIC support. Fix by using AMD_IOMMU_GUEST_IR_LEGACY_GA mode when booting with kernel option amd_iommu_intr=legacy instead. The initialization logic will check GASup and automatically fallback to using AMD_IOMMU_GUEST_IR_LEGACY if GA mode is not supported. Fixes: 3928aa3f5775 ("iommu/amd: Detect and enable guest vAPIC support") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/1587562202-14183-1-git-send-email-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-06  iommu/qcom: Fix local_base status check  [Tang Bin]
commit b52649aee6243ea661905bdc5fbe28cc5f6dec76 upstream. The function qcom_iommu_device_probe() does not perform sufficient error checking after executing devm_ioremap_resource(), which can result in crashes if a critical error path is encountered. Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20200418134703.1760-1-tangbin@cmss.chinamobile.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23  iommu/amd: Fix the configuration of GCR3 table root pointer  [Adrian Huang]
[ Upstream commit c20f36534666e37858a14e591114d93cc1be0d34 ] The SPA of the GCR3 table root pointer[51:31] masks 20 bits. However, this requires 21 bits (Please see the AMD IOMMU specification). This leads to the potential failure when the bit 51 of SPA of the GCR3 table root pointer is 1'. Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Fixes: 52815b75682e2 ("iommu/amd: Add support for IOMMUv2 domain mode") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
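The arithmetic behind the fix: bits [51:31] span 51 - 31 + 1 = 21 bits, so a 20-bit mask silently drops bit 51 whenever the GCR3 table sits at or above 2^51. A quick stand-alone check (mask names invented):

  #include <stdint.h>
  #include <stdio.h>

  /* Bits [51:31] of the root pointer need 21 mask bits, not 20. */
  #define GCR3_MASK_20  (((1ULL << 20) - 1) << 31)
  #define GCR3_MASK_21  (((1ULL << 21) - 1) << 31)

  int main(void)
  {
          uint64_t spa = 1ULL << 51;   /* root table with bit 51 of the SPA set */

          printf("20-bit mask keeps bit 51: %d\n", !!(spa & GCR3_MASK_20));
          printf("21-bit mask keeps bit 51: %d\n", !!(spa & GCR3_MASK_21));
          return 0;
  }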
2020-04-23  iommu/vt-d: Fix page request descriptor size  [Jacob Pan]
[ Upstream commit 52355fb1919ef7ed9a38e0f3de6e928de1f57217 ] Intel VT-d might support PRS (Page Request Support) when it's running in scalable mode. Each page request descriptor occupies 32 bytes and is 32-byte aligned. The page request descriptor offset mask should therefore be 32-byte aligned. Fixes: 5b438f4ba315d ("iommu/vt-d: Support page request in scalable mode") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23  iommu/vt-d: Silence RCU-list debugging warning in dmar_find_atsr()  [Qian Cai]
[ Upstream commit c6f4ebdeba4cff590594df931ff1ee610c426431 ] dmar_find_atsr() calls list_for_each_entry_rcu() outside of an RCU read side critical section but with dmar_global_lock held. Silence this false positive. drivers/iommu/intel-iommu.c:4504 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff9755bee8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1a6/0xe19 Call Trace: dump_stack+0xa4/0xfe lockdep_rcu_suspicious+0xeb/0xf5 dmar_find_atsr+0x1ab/0x1c0 dmar_parse_one_atsr+0x64/0x220 dmar_walk_remapping_entries+0x130/0x380 dmar_table_init+0x166/0x243 intel_iommu_init+0x1ab/0xe19 pci_iommu_init+0x1a/0x44 do_one_initcall+0xae/0x4d0 kernel_init_freeable+0x412/0x4c5 kernel_init+0x19/0x193 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23  iommu/vt-d: Fix mm reference leak  [Jacob Pan]
[ Upstream commit 902baf61adf6b187f0a6b789e70d788ea71ff5bc ] Move canonical address check before mmget_not_zero() to avoid mm reference leak. Fixes: 9d8c3af31607 ("iommu/vt-d: IOMMU Page Request needs to check if address is canonical.") Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23  iommu/virtio: Fix freeing of incomplete domains  [Jean-Philippe Brucker]
[ Upstream commit 7062af3ed2ba451029e3733d9f677c68f5ea9e77 ] Calling viommu_domain_free() on a domain that hasn't been finalised (not attached to any device, for example) can currently cause an Oops, because we attempt to call ida_free() on ID 0, which may either be unallocated or used by another domain. Only initialise the vdomain->viommu pointer, which denotes a finalised domain, at the end of a successful viommu_domain_finalise(). Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver") Reported-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20200326093558.2641019-3-jean-philippe@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-13  iommu/vt-d: Allow devices with RMRRs to use identity domain  [Lu Baolu]
commit 9235cb13d7d17baba0b3a9277381258361e95c16 upstream. Since commit ea2447f700cab ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain"), the Intel IOMMU driver doesn't allow any devices with RMRR locked to use the identity domain. This was added to fix the issue where the RMRR info for devices being placed in and out of the identity domain gets lost. This change identity-maps all RMRRs when setting up the identity domain, so that devices with RMRRs can also use it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: John Donnelly <john.p.donnelly@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01  iommu/vt-d: Populate debugfs if IOMMUs are detected  [Megha Dey]
[ Upstream commit 1da8347d8505c137fb07ff06bbcd3f2bf37409bc ] Currently, the intel iommu debugfs directory(/sys/kernel/debug/iommu/intel) gets populated only when DMA remapping is enabled (dmar_disabled = 0) irrespective of whether interrupt remapping is enabled or not. Instead, populate the intel iommu debugfs directory if any IOMMUs are detected. Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: ee2636b8670b1 ("iommu/vt-d: Enable base Intel IOMMU debugfs support") Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-01  iommu/vt-d: Fix debugfs register reads  [Megha Dey]
[ Upstream commit ba3b01d7a6f4ab9f8a0557044c9a7678f64ae070 ] Commit 6825d3ea6cde ("iommu/vt-d: Add debugfs support to show register contents") dumps the register contents for all IOMMU devices. Currently, a 64 bit read(dmar_readq) is done for all the IOMMU registers, even though some of the registers are 32 bits, which is incorrect. Use the correct read function variant (dmar_readl/dmar_readq) while reading the contents of 32/64 bit registers respectively. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Link: https://lore.kernel.org/r/1583784587-26126-2-git-send-email-megha.dey@linux.intel.com Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
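Why the access width matters, modelled in plain memory rather than real MMIO: a 64-bit read of a 32-bit register also drags in whatever occupies the next four bytes (shown here for a little-endian layout):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          /* Two adjacent 32-bit "registers" modelled in ordinary memory. */
          union { uint64_t q; uint32_t l[2]; } regs = { .l = { 0x89abcdefu, 0xffffffffu } };

          uint64_t wide   = regs.q;      /* dmar_readq-style: drags in the neighbour */
          uint32_t narrow = regs.l[0];   /* dmar_readl-style: just the register */

          printf("64-bit read: 0x%016llx\n", (unsigned long long)wide);
          printf("32-bit read: 0x%08x\n", narrow);
          return 0;
  }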
2020-04-01  iommu/vt-d: Silence RCU-list debugging warnings  [Qian Cai]
[ Upstream commit f5152416528c2295f35dd9c9bd4fb27c4032413d ] Similar to the commit 02d715b4a818 ("iommu/vt-d: Fix RCU list debugging warnings"), there are several other places that call list_for_each_entry_rcu() outside of an RCU read side critical section but with dmar_global_lock held. Silence those false positives as well. drivers/iommu/intel-iommu.c:4288 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1ad/0xb97 drivers/iommu/dmar.c:366 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x125/0xb97 drivers/iommu/intel-iommu.c:5057 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffffa71892c8 (dmar_global_lock){++++}, at: intel_iommu_init+0x61a/0xb13 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-18  iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE  [Suravee Suthikulpanit]
commit 730ad0ede130015a773229573559e97ba0943065 upstream. Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") accidentally left out the ir_data pointer when calling modify_irte_ga(), which causes the function amd_iommu_update_ga() to return prematurely because struct amd_ir_data.ref is NULL, so the "is_run" bit of the IRTE does not get updated properly. This results in bad I/O performance since IOMMU AVIC always generates a GA Log entry and notifies the IOMMU driver and KVM when it receives an interrupt from the PCI pass-through device, instead of directly injecting the interrupt to the vCPU. Fix this by passing ir_data when calling modify_irte_ga() as done previously. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18  iommu/vt-d: Ignore devices with out-of-spec domain number  [Daniel Drake]
commit da72a379b2ec0bad3eb265787f7008bead0b040c upstream. VMD subdevices are created with a PCI domain ID of 0x10000 or higher. These subdevices are also handled like all other PCI devices by dmar_pci_bus_notifier(). However, when dmar_alloc_pci_notify_info() take records of such devices, it will truncate the domain ID to a u16 value (in info->seg). The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if it is 0000:00:02.0. In the unlucky event that a real device also exists at 0000:00:02.0 and also has a device-specific entry in the DMAR table, dmar_insert_dev_scope() will crash on:   BUG_ON(i >= devices_cnt); That's basically a sanity check that only one PCI device matches a single DMAR entry; in this case we seem to have two matching devices. Fix this by ignoring devices that have a domain number higher than what can be looked up in the DMAR table. This problem was carefully diagnosed by Jian-Hong Pan. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Fixes: 59ce0515cdaf3 ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
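The underlying failure is plain integer truncation: a VMD segment number of 0x10000 stored into a 16-bit field collides with segment 0000. A minimal demonstration of the truncation and the resulting "ignore out-of-range domains" rule:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          int domain = 0x10000;             /* VMD sub-device domain */
          uint16_t seg = (uint16_t)domain;  /* what a u16 'seg' field ends up holding */

          printf("domain 0x%x truncates to segment 0x%04x\n", domain, seg);

          /* The fix: skip devices whose domain cannot be represented. */
          if (domain > 0xffff)
                  printf("-> ignore this device for DMAR scope matching\n");
          return 0;
  }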
2020-03-18  iommu/vt-d: Fix the wrong printing in RHSA parsing  [Zhenzhong Duan]
commit b0bb0c22c4db623f2e7b1a471596fbf1c22c6dc5 upstream. When base address in RHSA structure doesn't match base address in each DRHD structure, the base address in last DRHD is printed out. This doesn't make sense when there are multiple DRHD units, fix it by printing the buggy RHSA's base address. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Fixes: fd0c8894893cb ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18  iommu/vt-d: Fix RCU-list bugs in intel_iommu_init()  [Qian Cai]
commit 2d48ea0efb8887ebba3e3720bb5b738aced4e574 upstream. There are several places traverse RCU-list without holding any lock in intel_iommu_init(). Fix them by acquiring dmar_global_lock. WARNING: suspicious RCU usage ----------------------------- drivers/iommu/intel-iommu.c:5216 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/0/1. Call Trace: dump_stack+0xa0/0xea lockdep_rcu_suspicious+0x102/0x10b intel_iommu_init+0x947/0xb13 pci_iommu_init+0x26/0x62 do_one_initcall+0xfe/0x500 kernel_init_freeable+0x45a/0x4f8 kernel_init+0x11/0x139 ret_from_fork+0x3a/0x50 DMAR: Intel(R) Virtualization Technology for Directed I/O Fixes: d8190dc63886 ("iommu/vt-d: Enable DMA remapping after rmrr mapped") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18  iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page  [Yonghyun Hwang]
commit 77a1bce84bba01f3f143d77127b72e872b573795 upstream. intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge page into its corresponding physical address. This commit fixes the bug by accommodating the level of the page entry for the IOVA and adding the IOVA's lower address bits to the physical address. Cc: <stable@vger.kernel.org> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: Yonghyun Hwang <yonghyun@google.com> Fixes: 3871794642579 ("VT-d: Changes to support KVM") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
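A sketch of the corrected arithmetic (page size and addresses invented for the example): for a huge-page leaf entry, the physical address is the frame taken from the PTE plus the IOVA's offset within that huge page, not the frame alone.

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t iova      = 0x40123456;          /* address inside a 2 MiB mapping */
          uint64_t pte_frame = 0x80000000;          /* 2 MiB-aligned physical base from the PTE */
          uint64_t huge_mask = (1ULL << 21) - 1;    /* offset bits for a 2 MiB page */

          uint64_t wrong = pte_frame;                        /* old behaviour: frame only */
          uint64_t right = pte_frame | (iova & huge_mask);   /* base + in-page offset */

          printf("without offset: 0x%llx\nwith offset:    0x%llx\n",
                 (unsigned long long)wrong, (unsigned long long)right);
          return 0;
  }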
2020-03-18  iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint  [Hans de Goede]
commit 59833696442c674acbbd297772ba89e7ad8c753d upstream. Quoting from the comment describing the WARN functions in include/asm-generic/bug.h: * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report * significant kernel issues that need prompt attention if they should ever * appear at runtime. * * Do not use these macros when checking for invalid external inputs The (buggy) firmware tables which the dmar code was calling WARN_TAINT for really are invalid external inputs. They are not under the kernel's control and the issues in them cannot be fixed by a kernel update. So logging a backtrace, which invites bug reports to be filed about this, is not helpful. Some distros, e.g. Fedora, have tools watching for the kernel backtraces logged by the WARN macros and offer the user an option to file a bug for this when these are encountered. The WARN_TAINT in warn_invalid_dmar() + another iommu WARN_TAINT, addressed in another patch, have lead to over a 100 bugs being filed this way. This commit replaces the WARN_TAINT("...") calls, with pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) calls avoiding the backtrace and thus also avoiding bug-reports being filed about this against the kernel. Fixes: fd0c8894893c ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables") Fixes: e625b4a95d50 ("iommu/vt-d: Parse ANDD records") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200309140138.3753-2-hdegoede@redhat.com BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1564895 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18  iommu/dma: Fix MSI reservation allocation  [Marc Zyngier]
commit 65ac74f1de3334852fb7d9b1b430fa5a06524276 upstream. The way cookie_init_hw_msi_region() allocates the iommu_dma_msi_page structures doesn't match the way iommu_put_dma_cookie() frees them. The former performs a single allocation of all the required structures, while the latter tries to free them one at a time. It doesn't quite work for the main use case (the GICv3 ITS where the range is 64kB) when the base granule size is 4kB. This leads to a nice slab corruption on teardown, which is easily observable by simply creating a VF on a SRIOV-capable device, and tearing it down immediately (no need to even make use of it). Fortunately, this only affects systems where the ITS isn't translated by the SMMU, which are both rare and non-standard. Fix it by allocating iommu_dma_msi_page structures one at a time. Fixes: 7c1b058c8b5a3 ("iommu/dma: Handle IOMMU API reserved regions") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18  iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint  [Hans de Goede]
commit 81ee85d0462410de8eeeec1b9761941fd6ed8c7b upstream. Quoting from the comment describing the WARN functions in include/asm-generic/bug.h: * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report * significant kernel issues that need prompt attention if they should ever * appear at runtime. * * Do not use these macros when checking for invalid external inputs The (buggy) firmware tables which the dmar code was calling WARN_TAINT for really are invalid external inputs. They are not under the kernel's control and the issues in them cannot be fixed by a kernel update. So logging a backtrace, which invites bug reports to be filed about this, is not helpful. Fixes: 556ab45f9a77 ("ioat2: catch and recover from broken vtd configurations v6") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200309182510.373875-1-hdegoede@redhat.com BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=701847 Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-12  iommu/amd: Disable IOMMU on Stoney Ridge systems  [Kai-Heng Feng]
[ Upstream commit 3dfee47b215e49788cfc80e474820ea2e948c031 ] Serious screen flickering when Stoney Ridge outputs to a 4K monitor. Use identity-mapping and PCI ATS doesn't help this issue. According to Alex Deucher, IOMMU isn't enabled on Windows, so let's do the same here to avoid screen flickering on 4K monitor. Cc: Alex Deucher <alexander.deucher@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/issues/961 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-28  iommu/qcom: Fix bogus detach logic  [Robin Murphy]
commit faf305c51aeabd1ea2d7131e798ef5f55f4a7750 upstream. Currently, the implementation of qcom_iommu_domain_free() is guaranteed to do one of two things: WARN() and leak everything, or dereference NULL and crash. That alone is terrible, but in fact the whole idea of trying to track the liveness of a domain via the qcom_domain->iommu pointer as a sanity check is full of fundamentally flawed assumptions. Make things robust and actually functional by not trying to be quite so clever. Reported-by: Brian Masney <masneyb@onstation.org> Tested-by: Brian Masney <masneyb@onstation.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Stephan Gerhold <stephan@gerhold.net> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24  iommu/vt-d: Remove unnecessary WARN_ON_ONCE()  [Lu Baolu]
[ Upstream commit 857f081426e5aa38313426c13373730f1345fe95 ] The Address field in the device TLB invalidation descriptor is qualified by the S field. If the S field is zero, a single page at the page address specified by address [63:12] is requested to be invalidated. If the S field is set, the least significant bit in the address field with value 0b (say bit N) indicates the invalidation address range. The spec doesn't require the address [N - 1, 0] to be cleared, hence remove the unnecessary WARN_ON_ONCE(). Otherwise, the caller might set "mask = MAX_AGAW_PFN_WIDTH" in order to invalidate all the cached mappings on an endpoint, and the overflow error below will be triggered. [...] UBSAN: Undefined behaviour in drivers/iommu/dmar.c:1354:3 shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] Reported-and-tested-by: Frank <fgndev@posteo.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24  iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE  [Will Deacon]
[ Upstream commit d71e01716b3606a6648df7e5646ae12c75babde4 ] If, for some bizarre reason, the compiler decided to split up the write of STE DWORD 0, we could end up making a partial structure valid. Although this probably won't happen, follow the example of the context-descriptor code and use WRITE_ONCE() to ensure atomicity of the write. Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
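The point of WRITE_ONCE() here is that the 64-bit word reaches memory as exactly one store, so a concurrent table walker can never observe it half-written. A user-space approximation of that guarantee (not the SMMUv3 code itself):

  #include <stdint.h>
  #include <stdio.h>

  /* Publish the descriptor word in a single store; the compiler may not split
   * or re-issue the write. */
  static void publish_word(uint64_t *dst, uint64_t val)
  {
          __atomic_store_n(dst, val, __ATOMIC_RELAXED);
  }

  int main(void)
  {
          uint64_t ste_word0 = 0;            /* invalid entry */

          publish_word(&ste_word0, 0x1);     /* set the valid bit in one shot */
          printf("word0 = 0x%llx\n", (unsigned long long)ste_word0);
          return 0;
  }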
2020-02-24  iommu/vt-d: Avoid sending invalid page response  [Jacob Pan]
[ Upstream commit 5f75585e19cc7018bf2016aa771632081ee2f313 ] Page responses should only be sent when last page in group (LPIG) or private data is present in the page request. This patch avoids sending invalid descriptors. Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor support") Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24  iommu/vt-d: Match CPU and IOMMU paging mode  [Jacob Pan]
[ Upstream commit 79db7e1b4cf2a006f556099c13de3b12970fc6e3 ] When setting up first level page tables for sharing with CPU, we need to ensure IOMMU can support no less than the levels supported by the CPU. It is not adequate, as in the current code, to set up 5-level paging in PASID entry First Level Paging Mode(FLPM) solely based on CPU. Currently, intel_pasid_setup_first_level() is only used by native SVM code which already checks paging mode matches. However, future use of this helper function may not be limited to native SVM. https://lkml.org/lkml/2019/11/18/1037 Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table interface") Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24  iommu/iova: Silence warnings under memory pressure  [Qian Cai]
[ Upstream commit 944c9175397476199d4dd1028d87ddc582c35ee8 ] When running heavy memory pressure workloads, this 5+ old system is throwing endless warnings below because disk IO is too slow to recover from swapping. Since the volume from alloc_iova_fast() could be large, once it calls printk(), it will trigger disk IO (writing to the log files) and pending softirqs which could cause an infinite loop and make no progress for days by the ongoimng memory reclaim. This is the counter part for Intel where the AMD part has already been merged. See the commit 3d708895325b ("iommu/amd: Silence warnings under memory pressure"). Since the allocation failure will be reported in intel_alloc_iova(), so just call dev_err_once() there because even the "ratelimited" is too much, and silence the one in alloc_iova_mem() to avoid the expensive warn_alloc(). hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed slab_out_of_memory: 66 callbacks suppressed SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 node 0: slabs: 1822, objs: 16398, free: 0 node 1: slabs: 2051, objs: 18459, free: 31 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 1: slabs: 381, objs: 2286, free: 27 node 0: slabs: 1822, objs: 16398, free: 0 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 1: slabs: 2051, objs: 18459, free: 31 node 0: slabs: 697, objs: 4182, free: 0 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) node 1: slabs: 381, objs: 2286, free: 27 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 node 1: slabs: 381, objs: 2286, free: 27 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed warn_alloc: 96 
callbacks suppressed kworker/11:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G B Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015 Workqueue: kblockd blk_mq_run_work_fn Call Trace: dump_stack+0xa0/0xea warn_alloc.cold.94+0x8a/0x12d __alloc_pages_slowpath+0x1750/0x1870 __alloc_pages_nodemask+0x58a/0x710 alloc_pages_current+0x9c/0x110 alloc_slab_page+0xc9/0x760 allocate_slab+0x48f/0x5d0 new_slab+0x46/0x70 ___slab_alloc+0x4ab/0x7b0 __slab_alloc+0x43/0x70 kmem_cache_alloc+0x2dd/0x450 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) alloc_iova+0x33/0x210 cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0 node 0: slabs: 697, objs: 4182, free: 0 alloc_iova_fast+0x62/0x3d1 node 1: slabs: 381, objs: 2286, free: 27 intel_alloc_iova+0xce/0xe0 intel_map_sg+0xed/0x410 scsi_dma_map+0xd7/0x160 scsi_queue_rq+0xbf7/0x1310 blk_mq_dispatch_rq_list+0x4d9/0xbc0 blk_mq_sched_dispatch_requests+0x24a/0x300 __blk_mq_run_hw_queue+0x156/0x230 blk_mq_run_work_fn+0x3b/0x40 process_one_work+0x579/0xb90 worker_thread+0x63/0x5b0 kthread+0x1e6/0x210 ret_from_fork+0x3a/0x50 Mem-Info: active_anon:2422723 inactive_anon:361971 isolated_anon:34403 active_file:2285 inactive_file:1838 isolated_file:0 unevictable:0 dirty:1 writeback:5 unstable:0 slab_reclaimable:13972 slab_unreclaimable:453879 mapped:2380 shmem:154 pagetables:6948 bounce:0 free:19133 free_pcp:7363 free_cma:0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24  iommu/amd: Only support x2APIC with IVHD type 11h/40h  [Suravee Suthikulpanit]
[ Upstream commit 966b753cf3969553ca50bacd2b8c4ddade5ecc9e ] Current implementation for IOMMU x2APIC support makes use of the MMIO access to MSI capability block registers, which requires checking EFR[MsiCapMmioSup]. However, only IVHD type 11h/40h contain the information, and not in the IVHD type 10h IOMMU feature reporting field. Since the BIOS in newer systems, which supports x2APIC, would normally contain IVHD type 11h/40h, remove the IOMMU_FEAT_XTSUP_SHIFT check for IVHD type 10h, and only support x2APIC with IVHD type 11h/40h. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24  iommu/amd: Check feature support bit before accessing MSI capability registers  [Suravee Suthikulpanit]
[ Upstream commit 813071438e83d338ba5cfe98b3b26c890dc0a6c0 ] The IOMMU MMIO access to MSI capability registers is available only if the EFR[MsiCapMmioSup] is set. Current implementation assumes this bit is set if the EFR[XtSup] is set, which might not be the case. Fix by checking the EFR[MsiCapMmioSup] before accessing the MSI address low/high and MSI data registers via the MMIO. Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts') Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24  PCI: Add nr_devfns parameter to pci_add_dma_alias()  [James Sewart]
[ Upstream commit 09298542cd891b43778db1f65aa3613aa5a562eb ] Add a "nr_devfns" parameter to pci_add_dma_alias() so it can be used to create DMA aliases for a range of devfns. [bhelgaas: incorporate nr_devfns fix from James, update quirk_pex_vca_alias() and setup_aliases()] Signed-off-by: James Sewart <jamessewart@arista.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24  iommu/vt-d: Fix off-by-one in PASID allocation  [Jacob Pan]
[ Upstream commit 39d630e332144028f56abba83d94291978e72df1 ] PASID allocator uses IDR which is exclusive for the end of the allocation range. There is no need to decrement pasid_max. Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA") Reported-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>