aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/iommu
AgeCommit message (Collapse)Author
2022-04-15iommu/arm-smmu-v3: fix event handling soft lockupZhou Guanghui
[ Upstream commit 30de2b541af98179780054836b48825fcfba4408 ] During event processing, events are read from the event queue one by one until the queue is empty.If the master device continuously requests address access at the same time and the SMMU generates events, the cyclic processing of the event takes a long time and softlockup warnings may be reported. arm-smmu-v3 arm-smmu-v3.34.auto: event 0x0a received: arm-smmu-v3 arm-smmu-v3.34.auto: 0x00007f220000280a arm-smmu-v3 arm-smmu-v3.34.auto: 0x000010000000007e arm-smmu-v3 arm-smmu-v3.34.auto: 0x00000000034e8670 watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [irq/268-arm-smm:247] Call trace: _dev_info+0x7c/0xa0 arm_smmu_evtq_thread+0x1c0/0x230 irq_thread_fn+0x30/0x80 irq_thread+0x128/0x210 kthread+0x134/0x138 ret_from_fork+0x10/0x1c Kernel panic - not syncing: softlockup: hung tasks Fix this by calling cond_resched() after the event information is printed. Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Link: https://lore.kernel.org/r/20220119070754.26528-1-zhouguanghui1@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15iommu/ipmmu-vmsa: Check for error num after setting maskJiasheng Jiang
[ Upstream commit 1fdbbfd5099f797a4dac05e7ef0192ba4a9c39b4 ] Because of the possible failure of the dma_supported(), the dma_set_mask_and_coherent() may return error num. Therefore, it should be better to check it and return the error if fails. Fixes: 1c894225bf5b ("iommu/ipmmu-vmsa: IPMMU device is 40-bit bus master") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Link: https://lore.kernel.org/r/20220106024302.2574180-1-jiasheng@iscas.ac.cn Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15iommu/iova: Improve 32-bit free space estimateRobin Murphy
commit 5b61343b50590fb04a3f6be2cdc4868091757262 upstream. For various reasons based on the allocator behaviour and typical use-cases at the time, when the max32_alloc_size optimisation was introduced it seemed reasonable to couple the reset of the tracked size to the update of cached32_node upon freeing a relevant IOVA. However, since subsequent optimisations focused on helping genuine 32-bit devices make best use of even more limited address spaces, it is now a lot more likely for cached32_node to be anywhere in a "full" 32-bit address space, and as such more likely for space to become available from IOVAs below that node being freed. At this point, the short-cut in __cached_rbnode_delete_update() really doesn't hold up any more, and we need to fix the logic to reliably provide the expected behaviour. We still want cached32_node to only move upwards, but we should reset the allocation size if *any* 32-bit space has become available. Reported-by: Yunfei Wang <yf.wang@mediatek.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Miles Chen <miles.chen@mediatek.com> Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()Joerg Roedel
commit 9b45a7738eec52bf0f5d8d3d54e822962781c5f2 upstream. The polling loop for the register change in iommu_ga_log_enable() needs to have a udelay() in it. Otherwise the CPU might be faster than the IOMMU hardware and wrongly trigger the WARN_ON() further down the code stream. Use a 10us for udelay(), has there is some hardware where activation of the GA log can take more than a 100ms. A future optimization should move the activation check of the GA log to the point where it gets used for the first time. But that is a bigger change and not suitable for a fix. Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log") Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20220204115537.3894-1-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()Guoqing Jiang
commit 99e675d473eb8cf2deac1376a0f840222fc1adcf upstream. After commit e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated"). For tear down scenario, fn is only freed after fail to allocate ir_domain, though it also should be freed in case dmar_enable_qi returns error. Besides free fn, irq_domain and ir_msi_domain need to be removed as well if intel_setup_irq_remapping fails to enable queued invalidation. Improve the rewinding path by add out_free_ir_domain and out_free_fwnode lables per Baolu's suggestion. Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated") Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20220119063640.16864-1-guoqing.jiang@linux.dev Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220128031002.2219155-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27iommu/iova: Fix race between FQ timeout and teardownXiongfeng Wang
[ Upstream commit d7061627d701c90e1cac1e1e60c45292f64f3470 ] It turns out to be possible for hotplugging out a device to reach the stage of tearing down the device's group and default domain before the domain's flush queue has drained naturally. At this point, it is then possible for the timeout to expire just before the del_timer() call in free_iova_flush_queue(), such that we then proceed to free the FQ resources while fq_flush_timeout() is still accessing them on another CPU. Crashes due to this have been observed in the wild while removing NVMe devices. Close the race window by using del_timer_sync() to safely wait for any active timeout handler to finish before we start to free things. We already avoid any locking in free_iova_flush_queue() since the FQ is supposed to be inactive anyway, so the potential deadlock scenario does not apply. Fixes: 9a005a800ae8 ("iommu/iova: Add flush timer") Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> [ rm: rewrite commit message ] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/0a365e5b07f14b7344677ad6a9a734966a8422ce.1639753638.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27iommu/io-pgtable-arm: Fix table descriptor paddr formattingHector Martin
[ Upstream commit 9abe2ac834851a7d0b0756e295cf7a292c45ca53 ] Table descriptors were being installed without properly formatting the address using paddr_to_iopte, which does not match up with the iopte_deref in __arm_lpae_map. This is incorrect for the LPAE pte format, as it does not handle the high bits properly. This was found on Apple T6000 DARTs, which require a new pte format (different shift); adding support for that to paddr_to_iopte/iopte_to_paddr caused it to break badly, as even <48-bit addresses would end up incorrect in that case. Fixes: 6c89928ff7a0 ("iommu/io-pgtable-arm: Support 52-bit physical address") Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Hector Martin <marcan@marcan.st> Link: https://lore.kernel.org/r/20211120031343.88034-1-marcan@marcan.st Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27iommu/io-pgtable-arm-v7s: Add error handle for page table allocation failureYunfei Wang
commit a556cfe4cabc6d79cbb7733f118bbb420b376fe6 upstream. In __arm_v7s_alloc_table function: iommu call kmem_cache_alloc to allocate page table, this function allocate memory may fail, when kmem_cache_alloc fails to allocate table, call virt_to_phys will be abnomal and return unexpected phys and goto out_free, then call kmem_cache_free to release table will trigger KE, __get_free_pages and free_pages have similar problem, so add error handle for page table allocation failure. Fixes: 29859aeb8a6e ("iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows the PTE") Signed-off-by: Yunfei Wang <yf.wang@mediatek.com> Cc: <stable@vger.kernel.org> # 5.10.* Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20211207113315.29109-1-yf.wang@mediatek.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-26iommu: Check if group is NULL before remove deviceFrank Wunderlich
[ Upstream commit 5aa95d8834e07907e64937d792c12ffef7fb271f ] If probe_device is failing, iommu_group is not initialized because iommu_group_add_device is not reached, so freeing it will result in NULL pointer access. iommu_bus_init ->bus_iommu_probe ->probe_iommu_group in for each:/* return -22 in fail case */ ->iommu_probe_device ->__iommu_probe_device /* return -22 here.*/ -> ops->probe_device /* return -22 here.*/ -> iommu_group_get_for_dev -> ops->device_group -> iommu_group_add_device //good case ->remove_iommu_group //in fail case, it will remove group ->iommu_release_device ->iommu_group_remove_device // here we don't have group In my case ops->probe_device (mtk_iommu_probe_device from mtk_iommu_v1.c) is due to failing fwspec->ops mismatch. Fixes: d72e31c93746 ("iommu: IOMMU Groups") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Link: https://lore.kernel.org/r/20210731074737.4573-1-linux@fw-web.de Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-18iommu/vt-d: Fix agaw for a supported 48 bit guest address widthSaeed Mirzamohammadi
[ Upstream commit 327d5b2fee91c404a3956c324193892cf2cc9528 ] The IOMMU driver calculates the guest addressability for a DMA request based on the value of the mgaw reported from the IOMMU. However, this is a fused value and as mentioned in the spec, the guest width should be calculated based on the minimum of supported adjusted guest address width (SAGAW) and MGAW. This is from specification: "Guest addressability for a given DMA request is limited to the minimum of the value reported through this field and the adjusted guest address width of the corresponding page-table structure. (Adjusted guest address widths supported by hardware are reported through the SAGAW field)." This causes domain initialization to fail and following errors appear for EHCI PCI driver: [ 2.486393] ehci-pci 0000:01:00.4: EHCI Host Controller [ 2.486624] ehci-pci 0000:01:00.4: new USB bus registered, assigned bus number 1 [ 2.489127] ehci-pci 0000:01:00.4: DMAR: Allocating domain failed [ 2.489350] ehci-pci 0000:01:00.4: DMAR: 32bit DMA uses non-identity mapping [ 2.489359] ehci-pci 0000:01:00.4: can't setup: -12 [ 2.489531] ehci-pci 0000:01:00.4: USB bus 1 deregistered [ 2.490023] ehci-pci 0000:01:00.4: init 0000:01:00.4 fail, -12 [ 2.490358] ehci-pci: probe of 0000:01:00.4 failed with error -12 This issue happens when the value of the sagaw corresponds to a 48-bit agaw. This fix updates the calculation of the agaw based on the minimum of IOMMU's sagaw value and MGAW. This issue happens on the code path of getting a private domain for a device. A private domain was needed when the domain of an iommu group couldn't meet the requirement of a device. The IOMMU core has been evolved to eliminate the need for private domain, hence this code path has alreay been removed from the upstream since commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA domain"). Instead of back porting all patches that are required for removing the private domain, this simply fixes it in the affected stable kernel between v4.16 and v5.7. [baolu: The orignal patch could be found here https://lore.kernel.org/linux-iommu/20210412202736.70765-1-saeed.mirzamohammadi@oracle.com/. I added commit message according to Greg's comments at https://lore.kernel.org/linux-iommu/YHZ%2FT9x7Xjf1r6fI@kroah.com/.] Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org #v4.16+ Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Tested-by: Camille Lu <camille.lu@hpe.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-20iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translationXiyu Yang
[ Upstream commit 7c8f176d6a3fa18aa0f8875da6f7c672ed2a8554 ] The reference counting issue happens in several exception handling paths of arm_smmu_iova_to_phys_hard(). When those error scenarios occur, the function forgets to decrease the refcount of "smmu" increased by arm_smmu_rpm_get(), causing a refcount leak. Fix this issue by jumping to "out" label when those error scenarios occur. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/1623293391-17261-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get failsXiyu Yang
[ Upstream commit 1adf30f198c26539a62d761e45af72cde570413d ] arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the refcount of the "smmu" even though the return value is less than 0. The reference counting issue happens in some error handling paths of arm_smmu_rpm_get() in its caller functions. When arm_smmu_rpm_get() fails, the caller functions forget to decrease the refcount of "smmu" increased by arm_smmu_rpm_get(), causing a refcount leak. Fix this issue by calling pm_runtime_resume_and_get() instead of pm_runtime_get_sync() in arm_smmu_rpm_get(), which can keep the refcount balanced in case of failure. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Link: https://lore.kernel.org/r/1623293672-17954-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14iommu/dma: Fix compile warning in 32-bit buildsJoerg Roedel
commit 7154cbd31c2069726cf730b0ed94e2e79a221602 upstream. Compiling the recent dma-iommu changes under 32-bit x86 triggers this compile warning: drivers/iommu/dma-iommu.c:249:5: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ {aka ‘unsigned int’} [-Wformat=] The reason is that %llx is used to print a variable of type phys_addr_t. Fix it by using the correct %pa format specifier for phys_addr_t. Cc: Srinath Mannam <srinath.mannam@broadcom.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Oza Pawandeep <poza@codeaurora.org> Fixes: 571f316074a20 ("iommu/dma: Fix IOVA reserve dma ranges") Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20210607124905.27525-1-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14iommu/dma: Fix IOVA reserve dma rangesSrinath Mannam
[ Upstream commit 571f316074a203e979ea90211d9acf423dfe5f46 ] Fix IOVA reserve failure in the case when address of first memory region listed in dma-ranges is equal to 0x0. Fixes: aadad097cd46f ("iommu/dma: Reserve IOVA for PCIe inaccessible DMA address") Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20200914072319.6091-1-srinath.mannam@broadcom.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03iommu/vt-d: Fix sysfs leak in alloc_iommu()Rolf Eike Beer
commit 0ee74d5a48635c848c20f152d0d488bf84641304 upstream. iommu_device_sysfs_add() is called before, so is has to be cleaned on subsequent errors. Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device") Cc: stable@vger.kernel.org # 4.11.x Signed-off-by: Rolf Eike Beer <eb@emlix.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/17411490.HIIP88n32C@mobilepool36.emlix.com Link: https://lore.kernel.org/r/20210525070802.361755-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19iommu/amd: Remove performance counter pre-initialization testSuravee Suthikulpanit
[ Upstream commit 994d6608efe4a4c8834bdc5014c86f4bc6aceea6 ] In early AMD desktop/mobile platforms (during 2013), when the IOMMU Performance Counter (PMC) support was first introduced in commit 30861ddc9cca ("perf/x86/amd: Add IOMMU Performance Counter resource management"), there was a HW bug where the counters could not be accessed. The result was reading of the counter always return zero. At the time, the suggested workaround was to add a test logic prior to initializing the PMC feature to check if the counters can be programmed and read back the same value. This has been working fine until the more recent desktop/mobile platforms start enabling power gating for the PMC, which prevents access to the counters. This results in the PMC support being disabled unnecesarily. Unfortunatly, there is no documentation of since which generation of hardware the original PMC HW bug was fixed. Although, it was fixed soon after the first introduction of the PMC. Base on this, we assume that the buggy platforms are less likely to be in used, and it should be relatively safe to remove this legacy logic. Link: https://lore.kernel.org/linux-iommu/alpine.LNX.3.20.13.2006030935570.3181@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) <ml.linux@elloe.vision> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Alexander Monakov <amonakov@ispras.ru> Cc: David Coe <david.coe@live.co.uk> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20210409085848.3908-3-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19Revert "iommu/amd: Fix performance counter initialization"Paul Menzel
[ Upstream commit 715601e4e36903a653cd4294dfd3ed0019101991 ] This reverts commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b. The original commit tries to address an issue, where PMC power-gating causing the IOMMU PMC pre-init test to fail on certain desktop/mobile platforms where the power-gating is normally enabled. There have been several reports that the workaround still does not guarantee to work, and can add up to 100 ms (on the worst case) to the boot process on certain platforms such as the MSI B350M MORTAR with AMD Ryzen 3 2200G. Therefore, revert this commit as a prelude to removing the pre-init test. Link: https://lore.kernel.org/linux-iommu/alpine.LNX.3.20.13.2006030935570.3181@monopod.intra.ispras.ru/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Cc: Tj (Elloe Linux) <ml.linux@elloe.vision> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Alexander Monakov <amonakov@ispras.ru> Cc: David Coe <david.coe@live.co.uk> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20210409085848.3908-2-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17iommu/amd: Fix performance counter initializationSuravee Suthikulpanit
[ Upstream commit 6778ff5b21bd8e78c8bd547fd66437cf2657fd9b ] Certain AMD platforms enable power gating feature for IOMMU PMC, which prevents the IOMMU driver from updating the counter while trying to validate the PMC functionality in the init_iommu_perf_ctr(). This results in disabling PMC support and the following error message: "AMD-Vi: Unable to read/write to IOMMU perf counter" To workaround this issue, disable power gating temporarily by programming the counter source to non-zero value while validating the counter, and restore the prior state afterward. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Tj (Elloe Linux) <ml.linux@elloe.vision> Link: https://lore.kernel.org/r/20210208122712.5048-1-suravee.suthikulpanit@amd.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753 Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-11iommu/amd: Fix sleeping in atomic in increase_address_space()Andrey Ryabinin
commit 140456f994195b568ecd7fc2287a34eadffef3ca upstream. increase_address_space() calls get_zeroed_page(gfp) under spin_lock with disabled interrupts. gfp flags passed to increase_address_space() may allow sleeping, so it comes to this: BUG: sleeping function called from invalid context at mm/page_alloc.c:4342 in_atomic(): 1, irqs_disabled(): 1, pid: 21555, name: epdcbbf1qnhbsd8 Call Trace: dump_stack+0x66/0x8b ___might_sleep+0xec/0x110 __alloc_pages_nodemask+0x104/0x300 get_zeroed_page+0x15/0x40 iommu_map_page+0xdd/0x3e0 amd_iommu_map+0x50/0x70 iommu_map+0x106/0x220 vfio_iommu_type1_ioctl+0x76e/0x950 [vfio_iommu_type1] do_vfs_ioctl+0xa3/0x6f0 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4e/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by moving get_zeroed_page() out of spin_lock/unlock section. Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Acked-by: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210217143004.19165-1-arbn@yandex-team.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10iommu/vt-d: Do not use flush-queue when caching-mode is onNadav Amit
commit 29b32839725f8c89a41cb6ee054c85f3116ea8b5 upstream. When an Intel IOMMU is virtualized, and a physical device is passed-through to the VM, changes of the virtual IOMMU need to be propagated to the physical IOMMU. The hypervisor therefore needs to monitor PTE mappings in the IOMMU page-tables. Intel specifications provide "caching-mode" capability that a virtual IOMMU uses to report that the IOMMU is virtualized and a TLB flush is needed after mapping to allow the hypervisor to propagate virtual IOMMU mappings to the physical IOMMU. To the best of my knowledge no real physical IOMMU reports "caching-mode" as turned on. Synchronizing the virtual and the physical IOMMU tables is expensive if the hypervisor is unaware which PTEs have changed, as the hypervisor is required to walk all the virtualized tables and look for changes. Consequently, domain flushes are much more expensive than page-specific flushes on virtualized IOMMUs with passthrough devices. The kernel therefore exploited the "caching-mode" indication to avoid domain flushing and use page-specific flushing in virtualized environments. See commit 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching mode.") This behavior changed after commit 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing"). Now, when batched TLB flushing is used (the default), full TLB domain flushes are performed frequently, requiring the hypervisor to perform expensive synchronization between the virtual TLB and the physical one. Getting batched TLB flushes to use page-specific invalidations again in such circumstances is not easy, since the TLB invalidation scheme assumes that "full" domain TLB flushes are performed for scalability. Disable batched TLB flushes when caching-mode is on, as the performance benefit from using batched TLB invalidations is likely to be much smaller than the overhead of the virtual-to-physical IOMMU page-tables synchronization. Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing") Signed-off-by: Nadav Amit <namit@vmware.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210127175317.1600473-1-namit@vmware.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not builtBartosz Golaszewski
commit 9def3b1a07c41e21c68a0eb353e3e569fdd1d2b1 upstream. Since commit c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") dmar.c needs struct iommu_device to be selected. We can drop this dependency by not dereferencing struct iommu_device if IOMMU_API is not selected and by reusing the information stored in iommu->drhd->ignored instead. This fixes the following build error when IOMMU_API is not selected: drivers/iommu/dmar.c: In function ‘free_iommu’: drivers/iommu/dmar.c:1139:41: error: ‘struct iommu_device’ has no member named ‘ops’ 1139 | if (intel_iommu_enabled && iommu->iommu.ops) { ^ Fixes: c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20201013073055.11262-1-brgl@bgdev.pl Signed-off-by: Joerg Roedel <jroedel@suse.de> [ - context change due to moving drivers/iommu/dmar.c to drivers/iommu/intel/dmar.c - set the drhr in the iommu like in upstream commit b1012ca8dc4f ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu") ] Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03iommu/vt-d: Gracefully handle DMAR units with no supported address widthsDavid Woodhouse
commit c40aaaac1018ff1382f2d35df5129a6bcea3df6b upstream. Instead of bailing out completely, such a unit can still be used for interrupt remapping. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/linux-iommu/549928db2de6532117f36c9c810373c14cf76f51.camel@infradead.org/ Signed-off-by: Joerg Roedel <jroedel@suse.de> [ context change due to moving drivers/iommu/dmar.c to drivers/iommu/intel/dmar.c ] Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19iommu/vt-d: Fix unaligned addresses for intel_flush_svm_range_dev()Lu Baolu
commit 2d6ffc63f12417b979955a5b22ad9a76d2af5de9 upstream. The VT-d hardware will ignore those Addr bits which have been masked by the AM field in the PASID-based-IOTLB invalidation descriptor. As the result, if the starting address in the descriptor is not aligned with the address mask, some IOTLB caches might not invalidate. Hence people will see below errors. [ 1093.704661] dmar_fault: 29 callbacks suppressed [ 1093.704664] DMAR: DRHD: handling fault status reg 3 [ 1093.712738] DMAR: [DMA Read] Request device [7a:02.0] PASID 2 fault addr 7f81c968d000 [fault reason 113] SM: Present bit in first-level paging entry is clear Fix this by using aligned address for PASID-based-IOTLB invalidation. Fixes: 1c4f88b7f1f9 ("iommu/vt-d: Shared virtual address in scalable mode") Reported-and-tested-by: Guo Kaijie <Kaijie.Guo@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201231005323.2178523-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17iommu/intel: Fix memleak in intel_irq_remapping_allocDinghao Liu
commit ff2b46d7cff80d27d82f7f3252711f4ca1666129 upstream. When irq_domain_get_irq_data() or irqd_cfg() fails at i == 0, data allocated by kzalloc() has not been freed before returning, which leads to memleak. Fixes: b106ee63abcc ("irq_remapping/vt-d: Enhance Intel IR driver to support hierarchical irqdomains") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210105051837.32118-1-dinghao.liu@zju.edu.cn Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-11iommu/amd: Set DTE[IntTabLen] to represent 512 IRTEsSuravee Suthikulpanit
commit 4165bf015ba9454f45beaad621d16c516d5c5afe upstream. According to the AMD IOMMU spec, the commit 73db2fc595f3 ("iommu/amd: Increase interrupt remapping table limit to 512 entries") also requires the interrupt table length (IntTabLen) to be set to 9 (power of 2) in the device table mapping entry (DTE). Fixes: 73db2fc595f3 ("iommu/amd: Increase interrupt remapping table limit to 512 entries") Reported-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20201207091920.3052-1-suravee.suthikulpanit@amd.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24iommu/vt-d: Avoid panic if iommu init fails in tboot systemZhenzhong Duan
[ Upstream commit 4d213e76a359e540ca786ee937da7f35faa8e5f8 ] "intel_iommu=off" command line is used to disable iommu but iommu is force enabled in a tboot system for security reason. However for better performance on high speed network device, a new option "intel_iommu=tboot_noforce" is introduced to disable the force on. By default kernel should panic if iommu init fail in tboot for security reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off". Fix the code setting force_on and move intel_iommu_tboot_noforce from tboot code to intel iommu code. Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Tested-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-18iommu/amd: Increase interrupt remapping table limit to 512 entriesSuravee Suthikulpanit
[ Upstream commit 73db2fc595f358460ce32bcaa3be1f0cce4a2db1 ] Certain device drivers allocate IO queues on a per-cpu basis. On AMD EPYC platform, which can support up-to 256 cpu threads, this can exceed the current MAX_IRQ_PER_TABLE limit of 256, and result in the error message: AMD-Vi: Failed to allocate IRTE This has been observed with certain NVME devices. AMD IOMMU hardware can actually support upto 512 interrupt remapping table entries. Therefore, update the driver to match the hardware limit. Please note that this also increases the size of interrupt remapping table to 8KB per device when using the 128-bit IRTE format. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20201015025002.87997-1-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-18iommu/vt-d: Fix a bug for PDP check in prq_event_threadLiu, Yi L
[ Upstream commit 71cd8e2d16703a9df5c86a9e19f4cba99316cc53 ] In prq_event_thread(), the QI_PGRP_PDP is wrongly set by 'req->pasid_present' which should be replaced to 'req->priv_data_present'. Fixes: 5b438f4ba315 ("iommu/vt-d: Support page request in scalable mode") Signed-off-by: Liu, Yi L <yi.l.liu@intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1604025444-6954-3-git-send-email-yi.y.sun@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-14iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()Lu Baolu
[ Upstream commit 1a3f2fd7fc4e8f24510830e265de2ffb8e3300d2 ] Lock(&iommu->lock) without disabling irq causes lockdep warnings. [ 12.703950] ======================================================== [ 12.703962] WARNING: possible irq lock inversion dependency detected [ 12.703975] 5.9.0-rc6+ #659 Not tainted [ 12.703983] -------------------------------------------------------- [ 12.703995] systemd-udevd/284 just changed the state of lock: [ 12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at: iommu_flush_dev_iotlb.part.57+0x2e/0x90 [ 12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 12.704043] (&iommu->lock){+.+.}-{2:2} [ 12.704045] and interrupts could create inverse lock ordering between them. [ 12.704073] other info that might help us debug this: [ 12.704085] Possible interrupt unsafe locking scenario: [ 12.704097] CPU0 CPU1 [ 12.704106] ---- ---- [ 12.704115] lock(&iommu->lock); [ 12.704123] local_irq_disable(); [ 12.704134] lock(device_domain_lock); [ 12.704146] lock(&iommu->lock); [ 12.704158] <Interrupt> [ 12.704164] lock(device_domain_lock); [ 12.704174] *** DEADLOCK *** Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-07iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()Yu Kuai
[ Upstream commit 1a26044954a6d1f4d375d5e62392446af663be7a ] if of_find_device_by_node() succeed, exynos_iommu_of_xlate() doesn't have a corresponding put_device(). Thus add put_device() to fix the exception handling for this function implementation. Fixes: aa759fd376fb ("iommu/exynos: Add callback for initializing devices from device tree") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20200918011335.909141-1-yukuai3@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-26iommu/amd: Use cmpxchg_double() when updating 128-bit IRTESuravee Suthikulpanit
commit e52d58d54a321d4fe9d0ecdabe4f8774449f0d6e upstream. When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode), current driver disables interrupt remapping when it updates the IRTE so that the upper and lower 64-bit values can be updated safely. However, this creates a small window, where the interrupt could arrive and result in IO_PAGE_FAULT (for interrupt) as shown below. IOMMU Driver Device IRQ ============ =========== irte.RemapEn=0 ... change IRTE IRQ from device ==> IO_PAGE_FAULT !! ... irte.RemapEn=1 This scenario has been observed when changing irq affinity on a system running I/O-intensive workload, in which the destination APIC ID in the IRTE is updated. Instead, use cmpxchg_double() to update the 128-bit IRTE at once without disabling the interrupt remapping. However, this means several features, which require GA (128-bit IRTE) support will also be affected if cmpxchg16b is not supported (which is unprecedented for AMD processors w/ IOMMU). Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure") Reported-by: Sean Osborne <sean.m.osborne@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Erik Rockstrom <erik.rockstrom@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20200903093822.52012-3-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-23iommu/amd: Fix potential @entry null derefJoao Martins
[ Upstream commit 14c4acc5ed22c21f9821103be7c48efdf9763584 ] After commit 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE"), smatch warns: drivers/iommu/amd/iommu.c:3870 amd_iommu_deactivate_guest_mode() warn: variable dereferenced before check 'entry' (see line 3867) Fix this by moving the @valid assignment to after @entry has been checked for NULL. Fixes: 26e495f34107 ("iommu/amd: Restore IRTE.RemapEn bit after programming IRTE") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20200910171621.12879-1-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-17iommu/amd: Do not use IOMMUv2 functionality when SME is activeJoerg Roedel
[ Upstream commit 2822e582501b65707089b097e773e6fd70774841 ] When memory encryption is active the device is likely not in a direct mapped domain. Forbid using IOMMUv2 functionality for now until finer grained checks for this have been implemented. Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200824105415.21000-3-joro@8bytes.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09iommu/vt-d: Handle 36bit addressing for x86-32Chris Wilson
commit 29aaebbca4abc4cceb38738483051abefafb6950 upstream. Beware that the address size for x86-32 may exceed unsigned long. [ 0.368971] UBSAN: shift-out-of-bounds in drivers/iommu/intel/iommu.c:128:14 [ 0.369055] shift exponent 36 is too large for 32-bit type 'long unsigned int' If we don't handle the wide addresses, the pages are mismapped and the device read/writes go astray, detected as DMAR faults and leading to device failure. The behaviour changed (from working to broken) in commit fa954e683178 ("iommu/vt-d: Delegate the dma domain to upper layer"), but the error looks older. Fixes: fa954e683178 ("iommu/vt-d: Delegate the dma domain to upper layer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: James Sewart <jamessewart@arista.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: <stable@vger.kernel.org> # v5.3+ Link: https://lore.kernel.org/r/20200822160209.28512-1-chris@chris-wilson.co.uk Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09iommu/amd: Restore IRTE.RemapEn bit after programming IRTESuravee Suthikulpanit
[ Upstream commit 26e495f341075c09023ba16dee9a7f37a021e745 ] Currently, the RemapEn (valid) bit is accidentally cleared when programming IRTE w/ guestMode=0. It should be restored to the prior state. Fixes: b9fc6b56f478 ("iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20200903093822.52012-2-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09iommu/vt-d: Serialize IOMMU GCMD register modificationsLu Baolu
[ Upstream commit 6e4e9ec65078093165463c13d4eb92b3e8d7b2e8 ] The VT-d spec requires (10.4.4 Global Command Register, GCMD_REG General Description) that: If multiple control fields in this register need to be modified, software must serialize the modifications through multiple writes to this register. However, in irq_remapping.c, modifications of IRE and CFI are done in one write. We need to do two separate writes with STS checking after each. It also checks the status register before writing command register to avoid unnecessary register write. Fixes: af8d102f999a4 ("x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20200828000615.8281-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03iommu/iova: Don't BUG on invalid PFNsRobin Murphy
[ Upstream commit d3e3d2be688b4b5864538de61e750721a311e4fc ] Unlike the other instances which represent a complete loss of consistency within the rcache mechanism itself, or a fundamental and obvious misconfiguration by an IOMMU driver, the BUG_ON() in iova_magazine_free_pfns() can be provoked at more or less any time in a "spooky action-at-a-distance" manner by any old device driver passing nonsense to dma_unmap_*() which then propagates through to queue_iova(). Not only is this well outside the IOVA layer's control, it's also nowhere near fatal enough to justify panicking anyway - all that really achieves is to make debugging the offending driver more difficult. Let's simply WARN and otherwise ignore bogus PFNs. Reported-by: Prakash Gupta <guptap@codeaurora.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Prakash Gupta <guptap@codeaurora.org> Link: https://lore.kernel.org/r/acbd2d092b42738a03a21b417ce64e27f8c91c86.1591103298.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21iommu/omap: Check for failure of a call to omap_iommu_dump_ctxColin Ian King
[ Upstream commit dee9d154f40c58d02f69acdaa5cfd1eae6ebc28b ] It is possible for the call to omap_iommu_dump_ctx to return a negative error number, so check for the failure and return the error number rather than pass the negative value to simple_read_from_buffer. Fixes: 14e0e6796a0d ("OMAP: iommu: add initial debugfs support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200714192211.744776-1-colin.king@canonical.com Addresses-Coverity: ("Improper use of negative value") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19irqdomain/treewide: Free firmware node after domain removalJon Derrick
commit ec0160891e387f4771f953b888b1fe951398e5d9 upstream. Commit 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") unintentionally caused a dangling pointer page fault issue on firmware nodes that were freed after IRQ domain allocation. Commit e3beca48a45b fixed that dangling pointer issue by only freeing the firmware node after an IRQ domain allocation failure. That fix no longer frees the firmware node immediately, but leaves the firmware node allocated after the domain is removed. The firmware node must be kept around through irq_domain_remove, but should be freed it afterwards. Add the missing free operations after domain removal where where appropriate. Fixes: e3beca48a45b ("irqdomain/treewide: Keep firmware node unconditionally allocated") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29irqdomain/treewide: Keep firmware node unconditionally allocatedThomas Gleixner
[ Upstream commit e3beca48a45b5e0e6e6a4e0124276b8248dcc9bb ] Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after creating the irqdomain. The only purpose of these FW nodes is to convey name information. When this was introduced the core code did not store the pointer to the node in the irqdomain. A recent change stored the firmware node pointer in irqdomain for other reasons and missed to notice that the usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence are broken by this. Storing a dangling pointer is dangerous itself, but in case that the domain is destroyed later on this leads to a double free. Remove the freeing of the firmware node after creating the irqdomain from all affected call sites to cure this. Fixes: 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22iommu/vt-d: Make Intel SVM code 64-bit onlyLu Baolu
commit 9486727f5981a5ec5c0b699fb1777451bd6786e4 upstream. Current Intel SVM is designed by setting the pgd_t of the processor page table to FLPTR field of the PASID entry. The first level translation only supports 4 and 5 level paging structures, hence it's infeasible for the IOMMU to share a processor's page table when it's running in 32-bit mode. Let's disable 32bit support for now and claim support only when all the missing pieces are ready in the future. Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode") Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-16iommu/vt-d: Don't apply gfx quirks to untrusted devicesRajat Jain
[ Upstream commit 67e8a5b18d41af9298db5c17193f671f235cce01 ] Currently, an external malicious PCI device can masquerade the VID:PID of faulty gfx devices, and thus apply iommu quirks to effectively disable the IOMMU restrictions for itself. Thus we need to ensure that the device we are applying quirks to, is indeed an internal trusted device. Signed-off-by: Rajat Jain <rajatja@google.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30iommu/vt-d: Update scalable mode paging structure coherencyLu Baolu
[ Upstream commit 04c00956ee3cd138fd38560a91452a804a8c5550 ] The Scalable-mode Page-walk Coherency (SMPWC) field in the VT-d extended capability register indicates the hardware coherency behavior on paging structures accessed through the pasid table entry. This is ignored in current code and using ECAP.C instead which is only valid in legacy mode. Fix this so that paging structure updates could be manually flushed from the cache line if hardware page walking is not snooped. Fixes: 765b6a98c1de3 ("iommu/vt-d: Enumerate the scalable mode capability") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30iommu/vt-d: Enable PCI ACS for platform opt in hintLu Baolu
[ Upstream commit 50310600ebda74b9988467e2e6128711c7ba56fc ] PCI ACS is disabled if Intel IOMMU is off by default or intel_iommu=off is used in command line. Unfortunately, Intel IOMMU will be forced on if there're devices sitting on an external facing PCI port that is marked as untrusted (for example, thunderbolt peripherals). That means, PCI ACS is disabled while Intel IOMMU is forced on to isolate those devices. As the result, the devices of an MFD will be grouped by a single group even the ACS is supported on device. [ 0.691263] pci 0000:00:07.1: Adding to iommu group 3 [ 0.691277] pci 0000:00:07.2: Adding to iommu group 3 [ 0.691292] pci 0000:00:07.3: Adding to iommu group 3 Fix it by requesting PCI ACS when Intel IOMMU is detected with platform opt in hint. Fixes: 89a6079df791a ("iommu/vt-d: Force IOMMU on for platform opt in hint") Co-developed-by: Lalithambika Krishnakumar <lalithambika.krishnakumar@intel.com> Signed-off-by: Lalithambika Krishnakumar <lalithambika.krishnakumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20200622231345.29722-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03iommu: Fix reference count leak in iommu_group_alloc.Qiushi Wu
[ Upstream commit 7cc31613734c4870ae32f5265d576ef296621343 ] kobject_init_and_add() takes reference even when it fails. Thus, when kobject_init_and_add() returns an error, kobject_put() must be called to properly clean up the kobject. Fixes: d72e31c93746 ("iommu: IOMMU Groups") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Link: https://lore.kernel.org/r/20200527210020.6522-1-wu000273@umn.edu Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27iommu/amd: Call domain_flush_complete() in update_domain()Joerg Roedel
[ Upstream commit f44a4d7e4f1cdef73c90b1dc749c4d8a7372a8eb ] The update_domain() function is expected to also inform the hardware about domain changes. This needs a COMPLETION_WAIT command to be sent to all IOMMUs which use the domain. Signed-off-by: Joerg Roedel <jroedel@suse.de> Tested-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/20200504125413.16798-4-joro@8bytes.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27iommu/amd: Fix over-read of ACPI UID from IVRS tableAlexander Monakov
[ Upstream commit e461b8c991b9202b007ea2059d953e264240b0c9 ] IVRS parsing code always tries to read 255 bytes from memory when retrieving ACPI device path, and makes an assumption that firmware provides a zero-terminated string. Both of those are bugs: the entry is likely to be shorter than 255 bytes, and zero-termination is not guaranteed. With Acer SF314-42 firmware these issues manifest visibly in dmesg: AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR0\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR1\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR2\xf0\xa5, rdevid:160 AMD-Vi: ivrs, add hid:AMDI0020, uid:\_SB.FUR3>\x83e\x8d\x9a\xd1... The first three lines show how the code over-reads adjacent table entries into the UID, and in the last line it even reads garbage data beyond the end of the IVRS table itself. Since each entry has the length of the UID (uidl member of ivhd_entry struct), use that for memcpy, and manually add a zero terminator. Avoid zero-filling hid and uid arrays up front, and instead ensure the uid array is always zero-terminated. No change needed for the hid array, as it was already properly zero-terminated. Fixes: 2a0cb4e2d423c ("iommu/amd: Add new map for storing IVHD dev entry type HID") Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Cc: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Link: https://lore.kernel.org/r/20200511102352.1831-1-amonakov@ispras.ru Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14iommu/virtio: Reverse arguments to list_addJulia Lawall
commit fb3637a113349f53830f7d6ca45891b7192cd28f upstream. Elsewhere in the file, there is a list_for_each_entry with &vdev->resv_regions as the second argument, suggesting that &vdev->resv_regions is the list head. So exchange the arguments on the list_add call to put the list head in the second argument. Fixes: 2a5a31487445 ("iommu/virtio: Add probe request") Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/1588704467-13431-1-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-06iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled systemSuravee Suthikulpanit
commit b74aa02d7a30ee5e262072a7d6e8deff10b37924 upstream. Currently, system fails to boot because the legacy interrupt remapping mode does not enable 128-bit IRTE (GA), which is required for x2APIC support. Fix by using AMD_IOMMU_GUEST_IR_LEGACY_GA mode when booting with kernel option amd_iommu_intr=legacy instead. The initialization logic will check GASup and automatically fallback to using AMD_IOMMU_GUEST_IR_LEGACY if GA mode is not supported. Fixes: 3928aa3f5775 ("iommu/amd: Detect and enable guest vAPIC support") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/1587562202-14183-1-git-send-email-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-06iommu/qcom: Fix local_base status checkTang Bin
commit b52649aee6243ea661905bdc5fbe28cc5f6dec76 upstream. The function qcom_iommu_device_probe() does not perform sufficient error checking after executing devm_ioremap_resource(), which can result in crashes if a critical error path is encountered. Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20200418134703.1760-1-tangbin@cmss.chinamobile.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>