summaryrefslogtreecommitdiffstats
path: root/drivers/iommu
AgeCommit message (Collapse)Author
2019-11-24iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()Robin Murphy
[ Upstream commit 85c7a0f1ef624ef58173ef52ea77780257bdfe04 ] In removing the pagetable-wide lock, we gained the possibility of the vanishingly unlikely case where we have a race between two concurrent unmappers splitting the same block entry. The logic to handle this is fairly straightforward - whoever loses the race frees their partial next-level table and instead dereferences the winner's newly-installed entry in order to fall back to a regular unmap, which intentionally echoes the pre-existing case of recursively splitting a 1GB block down to 4KB pages by installing a full table of 2MB blocks first. Unfortunately, the chump who implemented that logic failed to update the condition check for that fallback, meaning that if said race occurs at the last level (where the loser's unmap_idx is valid) then the unmap won't actually happen. Fix that to properly account for both the race and recursive cases. Fixes: 2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation") Signed-off-by: Robin Murphy <robin.murphy@arm.com> [will: re-jig control flow to avoid duplicate cmpxchg test] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41Takashi Iwai
[ Upstream commit ad3e8da2d422c63c13819a53d3c5ea9312cc0b9d ] Acer Aspire A315-41 requires the very same workaround as the existing quirk for Dell Latitude 5495. Add the new entry for that. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1137799 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systemsKai-Heng Feng
[ Upstream commit 93d051550ee02eaff9a2541d825605a7bd778027 ] Raven Ridge systems may have malfunction touchpad or hang at boot if incorrect IVRS IOAPIC is provided by BIOS. Users already found correct "ivrs_ioapic=" values, let's put them inside kernel to workaround buggy BIOS. BugLink: https://bugs.launchpad.net/bugs/1795292 BugLink: https://bugs.launchpad.net/bugs/1837688 Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05iommu/iova: Avoid false sharing on fq_timer_onEric Dumazet
[ Upstream commit 0d87308cca2c124f9bce02383f1d9632c9be89c4 ] In commit 14bd9a607f90 ("iommu/iova: Separate atomic variables to improve performance") Jinyu Qi identified that the atomic_cmpxchg() in queue_iova() was causing a performance loss and moved critical fields so that the false sharing would not impact them. However, avoiding the false sharing in the first place seems easy. We should attempt the atomic_cmpxchg() no more than 100 times per second. Adding an atomic_read() will keep the cache line mostly shared. This false sharing came with commit 9a005a800ae8 ("iommu/iova: Add flush timer"). Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 9a005a800ae8 ('iommu/iova: Add flush timer') Cc: Jinyu Qi <jinyuqi@huawei.com> Cc: Joerg Roedel <jroedel@suse.de> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05iommu/amd: Silence warnings under memory pressureQian Cai
[ Upstream commit 3d708895325b78506e8daf00ef31549476e8586a ] When running heavy memory pressure workloads, the system is throwing endless warnings, smartpqi 0000:23:00.0: AMD-Vi: IOMMU mapping error in map_sg (io-pages: 5 reason: -12) Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 swapper/10: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0,4 Call Trace: <IRQ> dump_stack+0x62/0x9a warn_alloc.cold.43+0x8a/0x148 __alloc_pages_nodemask+0x1a5c/0x1bb0 get_zeroed_page+0x16/0x20 iommu_map_page+0x477/0x540 map_sg+0x1ce/0x2f0 scsi_dma_map+0xc6/0x160 pqi_raid_submit_scsi_cmd_with_io_request+0x1c3/0x470 [smartpqi] do_IRQ+0x81/0x170 common_interrupt+0xf/0xf </IRQ> because the allocation could fail from iommu_map_page(), and the volume of this call could be huge which may generate a lot of serial console output and cosumes all CPUs. Fix it by silencing the warning in this call site, and there is still a dev_err() later to notify the failure. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21iommu/amd: Fix race in increase_address_space()Joerg Roedel
[ Upstream commit 754265bcab78a9014f0f99cd35e0d610fcd7dfa7 ] After the conversion to lock-less dma-api call the increase_address_space() function can be called without any locking. Multiple CPUs could potentially race for increasing the address space, leading to invalid domain->mode settings and invalid page-tables. This has been happening in the wild under high IO load and memory pressure. Fix the race by locking this operation. The function is called infrequently so that this does not introduce a performance regression in the dma-api path again. Reported-by: Qian Cai <cai@lca.pw> Fixes: 256e4621c21a ('iommu/amd: Make use of the generic IOVA allocator') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21iommu/amd: Flush old domains in kdump kernelStuart Hayes
[ Upstream commit 36b7200f67dfe75b416b5281ed4ace9927b513bc ] When devices are attached to the amd_iommu in a kdump kernel, the old device table entries (DTEs), which were copied from the crashed kernel, will be overwritten with a new domain number. When the new DTE is written, the IOMMU is told to flush the DTE from its internal cache--but it is not told to flush the translation cache entries for the old domain number. Without this patch, AMD systems using the tg3 network driver fail when kdump tries to save the vmcore to a network system, showing network timeouts and (sometimes) IOMMU errors in the kernel log. This patch will flush IOMMU translation cache entries for the old domain when a DTE gets overwritten with a new domain number. Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Fixes: 3ac3e5ee5ed5 ('iommu/amd: Copy old trans table from old kernel') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16iommu/iova: Remove stale cached32_nodeChris Wilson
[ Upstream commit 9eed17d37c77171cf5ffb95c4257f87df3cd4c8f ] Since the cached32_node is allowed to be advanced above dma_32bit_pfn (to provide a shortcut into the limited range), we need to be careful to remove the to be freed node if it is the cached32_node. [ 48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110 [ 48.477812] Read of size 8 at addr ffff88870fc19020 by task kworker/u8:1/37 [ 48.477843] [ 48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G U 5.2.0+ #735 [ 48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 [ 48.478047] Workqueue: i915 __i915_gem_free_work [i915] [ 48.478075] Call Trace: [ 48.478111] dump_stack+0x5b/0x90 [ 48.478137] print_address_description+0x67/0x237 [ 48.478178] ? __cached_rbnode_delete_update+0x68/0x110 [ 48.478212] __kasan_report.cold.3+0x1c/0x38 [ 48.478240] ? __cached_rbnode_delete_update+0x68/0x110 [ 48.478280] ? __cached_rbnode_delete_update+0x68/0x110 [ 48.478308] __cached_rbnode_delete_update+0x68/0x110 [ 48.478344] private_free_iova+0x2b/0x60 [ 48.478378] iova_magazine_free_pfns+0x46/0xa0 [ 48.478403] free_iova_fast+0x277/0x340 [ 48.478443] fq_ring_free+0x15a/0x1a0 [ 48.478473] queue_iova+0x19c/0x1f0 [ 48.478597] cleanup_page_dma.isra.64+0x62/0xb0 [i915] [ 48.478712] __gen8_ppgtt_cleanup+0x63/0x80 [i915] [ 48.478826] __gen8_ppgtt_cleanup+0x42/0x80 [i915] [ 48.478940] __gen8_ppgtt_clear+0x433/0x4b0 [i915] [ 48.479053] __gen8_ppgtt_clear+0x462/0x4b0 [i915] [ 48.479081] ? __sg_free_table+0x9e/0xf0 [ 48.479116] ? kfree+0x7f/0x150 [ 48.479234] i915_vma_unbind+0x1e2/0x240 [i915] [ 48.479352] i915_vma_destroy+0x3a/0x280 [i915] [ 48.479465] __i915_gem_free_objects+0xf0/0x2d0 [i915] [ 48.479579] __i915_gem_free_work+0x41/0xa0 [i915] [ 48.479607] process_one_work+0x495/0x710 [ 48.479642] worker_thread+0x4c7/0x6f0 [ 48.479687] ? process_one_work+0x710/0x710 [ 48.479724] kthread+0x1b2/0x1d0 [ 48.479774] ? kthread_create_worker_on_cpu+0xa0/0xa0 [ 48.479820] ret_from_fork+0x1f/0x30 [ 48.479864] [ 48.479907] Allocated by task 631: [ 48.479944] save_stack+0x19/0x80 [ 48.479994] __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 48.480038] kmem_cache_alloc+0x91/0xf0 [ 48.480082] alloc_iova+0x2b/0x1e0 [ 48.480125] alloc_iova_fast+0x58/0x376 [ 48.480166] intel_alloc_iova+0x90/0xc0 [ 48.480214] intel_map_sg+0xde/0x1f0 [ 48.480343] i915_gem_gtt_prepare_pages+0xb8/0x170 [i915] [ 48.480465] huge_get_pages+0x232/0x2b0 [i915] [ 48.480590] ____i915_gem_object_get_pages+0x40/0xb0 [i915] [ 48.480712] __i915_gem_object_get_pages+0x90/0xa0 [i915] [ 48.480834] i915_gem_object_prepare_write+0x2d6/0x330 [i915] [ 48.480955] create_test_object.isra.54+0x1a9/0x3e0 [i915] [ 48.481075] igt_shared_ctx_exec+0x365/0x3c0 [i915] [ 48.481210] __i915_subtests.cold.4+0x30/0x92 [i915] [ 48.481341] __run_selftests.cold.3+0xa9/0x119 [i915] [ 48.481466] i915_live_selftests+0x3c/0x70 [i915] [ 48.481583] i915_pci_probe+0xe7/0x220 [i915] [ 48.481620] pci_device_probe+0xe0/0x180 [ 48.481665] really_probe+0x163/0x4e0 [ 48.481710] device_driver_attach+0x85/0x90 [ 48.481750] __driver_attach+0xa5/0x180 [ 48.481796] bus_for_each_dev+0xda/0x130 [ 48.481831] bus_add_driver+0x205/0x2e0 [ 48.481882] driver_register+0xca/0x140 [ 48.481927] do_one_initcall+0x6c/0x1af [ 48.481970] do_init_module+0x106/0x350 [ 48.482010] load_module+0x3d2c/0x3ea0 [ 48.482058] __do_sys_finit_module+0x110/0x180 [ 48.482102] do_syscall_64+0x62/0x1f0 [ 48.482147] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 48.482190] [ 48.482224] Freed by task 37: [ 48.482273] save_stack+0x19/0x80 [ 48.482318] __kasan_slab_free+0x12e/0x180 [ 48.482363] kmem_cache_free+0x70/0x140 [ 48.482406] __free_iova+0x1d/0x30 [ 48.482445] fq_ring_free+0x15a/0x1a0 [ 48.482490] queue_iova+0x19c/0x1f0 [ 48.482624] cleanup_page_dma.isra.64+0x62/0xb0 [i915] [ 48.482749] __gen8_ppgtt_cleanup+0x63/0x80 [i915] [ 48.482873] __gen8_ppgtt_cleanup+0x42/0x80 [i915] [ 48.482999] __gen8_ppgtt_clear+0x433/0x4b0 [i915] [ 48.483123] __gen8_ppgtt_clear+0x462/0x4b0 [i915] [ 48.483250] i915_vma_unbind+0x1e2/0x240 [i915] [ 48.483378] i915_vma_destroy+0x3a/0x280 [i915] [ 48.483500] __i915_gem_free_objects+0xf0/0x2d0 [i915] [ 48.483622] __i915_gem_free_work+0x41/0xa0 [i915] [ 48.483659] process_one_work+0x495/0x710 [ 48.483704] worker_thread+0x4c7/0x6f0 [ 48.483748] kthread+0x1b2/0x1d0 [ 48.483787] ret_from_fork+0x1f/0x30 [ 48.483831] [ 48.483868] The buggy address belongs to the object at ffff88870fc19000 [ 48.483868] which belongs to the cache iommu_iova of size 40 [ 48.483920] The buggy address is located 32 bytes inside of [ 48.483920] 40-byte region [ffff88870fc19000, ffff88870fc19028) [ 48.483964] The buggy address belongs to the page: [ 48.484006] page:ffffea001c3f0600 refcount:1 mapcount:0 mapping:ffff8888181a91c0 index:0x0 compound_mapcount: 0 [ 48.484045] flags: 0x8000000000010200(slab|head) [ 48.484096] raw: 8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0 [ 48.484141] raw: 0000000000000000 0000000000120012 00000001ffffffff 0000000000000000 [ 48.484188] page dumped because: kasan: bad access detected [ 48.484230] [ 48.484265] Memory state around the buggy address: [ 48.484314] ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 48.484361] ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 48.484406] >ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc [ 48.484451] ^ [ 48.484494] ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 48.484530] ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602 Fixes: e60aa7b53845 ("iommu/iova: Extend rbtree node caching") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: <stable@vger.kernel.org> # v4.15+ Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-06iommu/dma: Handle SG length overflow betterRobin Murphy
[ Upstream commit ab2cbeb0ed301a9f0460078e91b09f39958212ef ] Since scatterlist dimensions are all unsigned ints, in the relatively rare cases where a device's max_segment_size is set to UINT_MAX, then the "cur_len + s_length <= max_len" check in __finalise_sg() will always return true. As a result, the corner case of such a device mapping an excessively large scatterlist which is mergeable to or beyond a total length of 4GB can lead to overflow and a bogus truncated dma_length in the resulting segment. As we already assume that any single segment must be no longer than max_len to begin with, this can easily be addressed by reshuffling the comparison. Fixes: 809eac54cdd6 ("iommu/dma: Implement scatterlist segment merging") Reported-by: Nicolin Chen <nicoleotsuka@gmail.com> Tested-by: Nicolin Chen <nicoleotsuka@gmail.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-25iommu/amd: Move iommu_init_pci() to .init sectionJoerg Roedel
commit 24d2c521749d8547765b555b7a85cca179bb2275 upstream. The function is only called from another __init function, so it should be moved to .init too. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04iommu/vt-d: Don't queue_iova() if there is no flush queueDmitry Safonov
commit effa467870c7612012885df4e246bdb8ffd8e44c upstream. Intel VT-d driver was reworked to use common deferred flushing implementation. Previously there was one global per-cpu flush queue, afterwards - one per domain. Before deferring a flush, the queue should be allocated and initialized. Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush queue. It's probably worth to init it for static or unmanaged domains too, but it may be arguable - I'm leaving it to iommu folks. Prevent queuing an iova flush if the domain doesn't have a queue. The defensive check seems to be worth to keep even if queue would be initialized for all kinds of domains. And is easy backportable. On 4.19.43 stable kernel it has a user-visible effect: previously for devices in si domain there were crashes, on sata devices: BUG: spinlock bad magic on CPU#6, swapper/0/1 lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1 Call Trace: <IRQ> dump_stack+0x61/0x7e spin_bug+0x9d/0xa3 do_raw_spin_lock+0x22/0x8e _raw_spin_lock_irqsave+0x32/0x3a queue_iova+0x45/0x115 intel_unmap+0x107/0x113 intel_unmap_sg+0x6b/0x76 __ata_qc_complete+0x7f/0x103 ata_qc_complete+0x9b/0x26a ata_qc_complete_multiple+0xd0/0xe3 ahci_handle_port_interrupt+0x3ee/0x48a ahci_handle_port_intr+0x73/0xa9 ahci_single_level_irq_intr+0x40/0x60 __handle_irq_event_percpu+0x7f/0x19a handle_irq_event_percpu+0x32/0x72 handle_irq_event+0x38/0x56 handle_edge_irq+0x102/0x121 handle_irq+0x147/0x15c do_IRQ+0x66/0xf2 common_interrupt+0xf/0xf RIP: 0010:__do_softirq+0x8c/0x2df The same for usb devices that use ehci-pci: BUG: spinlock bad magic on CPU#0, swapper/0/1 lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4 Call Trace: <IRQ> dump_stack+0x61/0x7e spin_bug+0x9d/0xa3 do_raw_spin_lock+0x22/0x8e _raw_spin_lock_irqsave+0x32/0x3a queue_iova+0x77/0x145 intel_unmap+0x107/0x113 intel_unmap_page+0xe/0x10 usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d usb_hcd_unmap_urb_for_dma+0x17/0x100 unmap_urb_for_dma+0x22/0x24 __usb_hcd_giveback_urb+0x51/0xc3 usb_giveback_urb_bh+0x97/0xde tasklet_action_common.isra.4+0x5f/0xa1 tasklet_action+0x2d/0x30 __do_softirq+0x138/0x2df irq_exit+0x7d/0x8b smp_apic_timer_interrupt+0x10f/0x151 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39 Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: iommu@lists.linux-foundation.org Cc: <stable@vger.kernel.org> # 4.14+ Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> [v4.14-port notes: o minor conflict with untrusted IOMMU devices check under if-condition] Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26iommu: Fix a leak in iommu_insert_resv_regionEric Auger
[ Upstream commit ad0834dedaa15c3a176f783c0373f836e44b4700 ] In case we expand an existing region, we unlink this latter and insert the larger one. In that case we should free the original region after the insertion. Also we can immediately return. Fixes: 6c65fb318e8b ("iommu: iommu_get_group_resv_regions") Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-19iommu/arm-smmu: Avoid constant zero in TLBI writesRobin Murphy
commit 4e4abae311e4b44aaf61f18a826fd7136037f199 upstream. Apparently, some Qualcomm arm64 platforms which appear to expose their SMMU global register space are still, in fact, using a hypervisor to mediate it by trapping and emulating register accesses. Sadly, some deployed versions of said trapping code have bugs wherein they go horribly wrong for stores using r31 (i.e. XZR/WZR) as the source register. While this can be mitigated for GCC today by tweaking the constraints for the implementation of writel_relaxed(), to avoid any potential arms race with future compilers more aggressively optimising register allocation, the simple way is to just remove all the problematic constant zeros. For the write-only TLB operations, the actual value is irrelevant anyway and any old nearby variable will provide a suitable GPR to encode. The one point at which we really do need a zero to clear a context bank happens before any of the TLB maintenance where crashes have been reported, so is apparently not a problem... :/ Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Acked-by: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-15iommu/arm-smmu-v3: Don't disable SMMU in kdump kernelWill Deacon
[ Upstream commit 3f54c447df34ff9efac7809a4a80fd3208efc619 ] Disabling the SMMU when probing from within a kdump kernel so that all incoming transactions are terminated can prevent the core of the crashed kernel from being transferred off the machine if all I/O devices are behind the SMMU. Instead, continue to probe the SMMU after it is disabled so that we can reinitialise it entirely and re-attach the DMA masters as they are reset. Since the kdump kernel may not have drivers for all of the active DMA masters, we suppress fault reporting to avoid spamming the console and swamping the IRQ threads. Reported-by: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com> Tested-by: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com> Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15iommu/vt-d: Set intel_iommu_gfx_mapped correctlyLu Baolu
[ Upstream commit cf1ec4539a50bdfe688caad4615ca47646884316 ] The intel_iommu_gfx_mapped flag is exported by the Intel IOMMU driver to indicate whether an IOMMU is used for the graphic device. In a virtualized IOMMU environment (e.g. QEMU), an include-all IOMMU is used for graphic device. This flag is found to be clear even the IOMMU is used. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Reported-by: Zhenyu Wang <zhenyuw@linux.intel.com> Fixes: c0771df8d5297 ("intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.") Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-25iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114Dmitry Osipenko
commit 43a0541e312f7136e081e6bf58f6c8a2e9672688 upstream. Both Tegra30 and Tegra114 have 4 ASID's and the corresponding bitfield of the TLB_FLUSH register differs from later Tegra generations that have 128 ASID's. In a result the PTE's are now flushed correctly from TLB and this fixes problems with graphics (randomly failing tests) on Tegra30. Cc: stable <stable@vger.kernel.org> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-10iommu/amd: Set exclusion range correctlyJoerg Roedel
[ Upstream commit 3c677d206210f53a4be972211066c0f1cd47fe12 ] The exlcusion range limit register needs to contain the base-address of the last page that is part of the range, as bits 0-11 of this register are treated as 0xfff by the hardware for comparisons. So correctly set the exclusion range in the hardware to the last page which is _in_ the range. Fixes: b2026aa2dce44 ('x86, AMD IOMMU: add functions for programming IOMMU MMIO space') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-04iommu/amd: Reserve exclusion range in iova-domainJoerg Roedel
[ Upstream commit 8aafaaf2212192012f5bae305bb31cdf7681d777 ] If a device has an exclusion range specified in the IVRS table, this region needs to be reserved in the iova-domain of that device. This hasn't happened until now and can cause data corruption on data transfered with these devices. Treat exclusion ranges as reserved regions in the iommu-core to fix the problem. Fixes: be2a022c0dd0 ('x86, AMD IOMMU: add functions to parse IOMMU memory mapping requirements for devices') Signed-off-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
2019-04-20iommu/dmar: Fix buffer overflow during PCI bus notificationJulia Cartwright
[ Upstream commit cffaaf0c816238c45cd2d06913476c83eb50f682 ] Commit 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device path") changed the type of the path data, however, the change in path type was not reflected in size calculations. Update to use the correct type and prevent a buffer overflow. This bug manifests in systems with deep PCI hierarchies, and can lead to an overflow of the static allocated buffer (dmar_pci_notify_info_buf), or can lead to overflow of slab-allocated data. BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0 Write of size 1 at addr ffffffff90445d80 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.14.87-rt49-02406-gd0a0e96 #1 Call Trace: ? dump_stack+0x46/0x59 ? print_address_description+0x1df/0x290 ? dmar_alloc_pci_notify_info+0x1d5/0x2e0 ? kasan_report+0x256/0x340 ? dmar_alloc_pci_notify_info+0x1d5/0x2e0 ? e820__memblock_setup+0xb0/0xb0 ? dmar_dev_scope_init+0x424/0x48f ? __down_write_common+0x1ec/0x230 ? dmar_dev_scope_init+0x48f/0x48f ? dmar_free_unused_resources+0x109/0x109 ? cpumask_next+0x16/0x20 ? __kmem_cache_create+0x392/0x430 ? kmem_cache_create+0x135/0x2f0 ? e820__memblock_setup+0xb0/0xb0 ? intel_iommu_init+0x170/0x1848 ? _raw_spin_unlock_irqrestore+0x32/0x60 ? migrate_enable+0x27a/0x5b0 ? sched_setattr+0x20/0x20 ? migrate_disable+0x1fc/0x380 ? task_rq_lock+0x170/0x170 ? try_to_run_init_process+0x40/0x40 ? locks_remove_file+0x85/0x2f0 ? dev_prepare_static_identity_mapping+0x78/0x78 ? rt_spin_unlock+0x39/0x50 ? lockref_put_or_lock+0x2a/0x40 ? dput+0x128/0x2f0 ? __rcu_read_unlock+0x66/0x80 ? __fput+0x250/0x300 ? __rcu_read_lock+0x1b/0x30 ? mntput_no_expire+0x38/0x290 ? e820__memblock_setup+0xb0/0xb0 ? pci_iommu_init+0x25/0x63 ? pci_iommu_init+0x25/0x63 ? do_one_initcall+0x7e/0x1c0 ? initcall_blacklisted+0x120/0x120 ? kernel_init_freeable+0x27b/0x307 ? rest_init+0xd0/0xd0 ? kernel_init+0xf/0x120 ? rest_init+0xd0/0xd0 ? ret_from_fork+0x1f/0x40 The buggy address belongs to the variable: dmar_pci_notify_info_buf+0x40/0x60 Fixes: 57384592c433 ("iommu/vt-d: Store bus information in RMRR PCI device path") Signed-off-by: Julia Cartwright <julia@ni.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20iommu/vt-d: Check capability before disabling protected memoryLu Baolu
[ Upstream commit 5bb71fc790a88d063507dc5d445ab8b14e845591 ] The spec states in 10.4.16 that the Protected Memory Enable Register should be treated as read-only for implementations not supporting protected memory regions (PLMR and PHMR fields reported as Clear in the Capability register). Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: mark gross <mgross@intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Fixes: f8bab73515ca5 ("intel-iommu: PMEN support") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tablesNicolas Boichat
[ Upstream commit 032ebd8548c9d05e8d2bdc7a7ec2fe29454b0ad0 ] L1 tables are allocated with __get_dma_pages, and therefore already ignored by kmemleak. Without this, the kernel would print this error message on boot, when the first L1 table is allocated: [ 2.810533] kmemleak: Trying to color unknown object at 0xffffffd652388000 as Black [ 2.818190] CPU: 5 PID: 39 Comm: kworker/5:0 Tainted: G S 4.19.16 #8 [ 2.831227] Workqueue: events deferred_probe_work_func [ 2.836353] Call trace: ... [ 2.852532] paint_ptr+0xa0/0xa8 [ 2.855750] kmemleak_ignore+0x38/0x6c [ 2.859490] __arm_v7s_alloc_table+0x168/0x1f4 [ 2.863922] arm_v7s_alloc_pgtable+0x114/0x17c [ 2.868354] alloc_io_pgtable_ops+0x3c/0x78 ... Fixes: e5fc9753b1a8314 ("iommu/io-pgtable: Add ARMv7 short descriptor support") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-03iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debuggingNicolas Boichat
commit 0a352554da69b02f75ca3389c885c741f1f63235 upstream. IOMMUs using ARMv7 short-descriptor format require page tables (level 1 and 2) to be allocated within the first 4GB of RAM, even on 64-bit systems. For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32 is defined (e.g. on arm64 platforms). For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note that we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc, as this is not strictly necessary, and would cause a warning in mm/sl*b.c, as we did not update GFP_SLAB_BUG_MASK. Also, print an error when the physical address does not fit in 32-bit, to make debugging easier in the future. Link: http://lkml.kernel.org/r/20181210011504.122604-3-drinkcat@chromium.org Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Huaisheng Ye <yehs1@lenovo.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sasha Levin <Alexander.Levin@microsoft.com> Cc: Tomasz Figa <tfiga@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Cc: Yong Wu <yong.wu@mediatek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-27iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZEStanislaw Gruszka
commit 4e50ce03976fbc8ae995a000c4b10c737467beaa upstream. Take into account that sg->offset can be bigger than PAGE_SIZE when setting segment sg->dma_address. Otherwise sg->dma_address will point at diffrent page, what makes DMA not possible with erros like this: xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020] xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020] Additinally with wrong sg->dma_address unmap_sg will free wrong pages, what what can cause crashes like this: Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon pfn:39e8b1 Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000() Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000 Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) xhci_hcd(E) Feb 28 19:27:45 kernel: scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E) Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G B W E 4.20.12-arch1-1-custom #1 Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018 Feb 28 19:27:45 kernel: Call Trace: Feb 28 19:27:45 kernel: dump_stack+0x5c/0x80 Feb 28 19:27:45 kernel: bad_page.cold.29+0x7f/0xb2 Feb 28 19:27:45 kernel: __free_pages_ok+0x2c0/0x2d0 Feb 28 19:27:45 kernel: skb_release_data+0x96/0x180 Feb 28 19:27:45 kernel: __kfree_skb+0xe/0x20 Feb 28 19:27:45 kernel: tcp_recvmsg+0x894/0xc60 Feb 28 19:27:45 kernel: ? reuse_swap_page+0x120/0x340 Feb 28 19:27:45 kernel: ? ptep_set_access_flags+0x23/0x30 Feb 28 19:27:45 kernel: inet_recvmsg+0x5b/0x100 Feb 28 19:27:45 kernel: __sys_recvfrom+0xc3/0x180 Feb 28 19:27:45 kernel: ? handle_mm_fault+0x10a/0x250 Feb 28 19:27:45 kernel: ? syscall_trace_enter+0x1d3/0x2d0 Feb 28 19:27:45 kernel: ? __audit_syscall_exit+0x22a/0x290 Feb 28 19:27:45 kernel: __x64_sys_recvfrom+0x24/0x30 Feb 28 19:27:45 kernel: do_syscall_64+0x5b/0x170 Feb 28 19:27:45 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: stable@vger.kernel.org Reported-and-tested-by: Jan Viktorin <jan.viktorin@gmail.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Fixes: 80187fd39dcb ('iommu/amd: Optimize map_sg and unmap_sg') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-13iommu/amd: Fix IOMMU page flush when detach device from a domainSuravee Suthikulpanit
[ Upstream commit 9825bd94e3a2baae1f4874767ae3a7d4c049720e ] When a VM is terminated, the VFIO driver detaches all pass-through devices from VFIO domain by clearing domain id and page table root pointer from each device table entry (DTE), and then invalidates the DTE. Then, the VFIO driver unmap pages and invalidate IOMMU pages. Currently, the IOMMU driver keeps track of which IOMMU and how many devices are attached to the domain. When invalidate IOMMU pages, the driver checks if the IOMMU is still attached to the domain before issuing the invalidate page command. However, since VFIO has already detached all devices from the domain, the subsequent INVALIDATE_IOMMU_PAGES commands are being skipped as there is no IOMMU attached to the domain. This results in data corruption and could cause the PCI device to end up in indeterministic state. Fix this by invalidate IOMMU pages when detach a device, and before decrementing the per-domain device reference counts. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Suggested-by: Joerg Roedel <joro@8bytes.org> Co-developed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Fixes: 6de8ad9b9ee0 ('x86/amd-iommu: Make iommu_flush_pages aware of multiple IOMMUs') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-13iommu/amd: Unmap all mapped pages in error path of map_sgJerry Snitselaar
[ Upstream commit f1724c0883bb0ce93b8dcb94b53dcca3b75ac9a7 ] In the error path of map_sg there is an incorrect if condition for breaking out of the loop that searches the scatterlist for mapped pages to unmap. Instead of breaking out of the loop once all the pages that were mapped have been unmapped, it will break out of the loop after it has unmapped 1 page. Fix the condition, so it breaks out of the loop only after all the mapped pages have been unmapped. Fixes: 80187fd39dcb ("iommu/amd: Optimize map_sg and unmap_sg") Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-13iommu/amd: Call free_iova_fast with pfn in map_sgJerry Snitselaar
[ Upstream commit 51d8838d66d3249508940d8f59b07701f2129723 ] In the error path of map_sg, free_iova_fast is being called with address instead of the pfn. This results in a bad value getting into the rcache, and can result in hitting a BUG_ON when iova_magazine_free_pfns is called. Cc: Joerg Roedel <joro@8bytes.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Fixes: 80187fd39dcb ("iommu/amd: Optimize map_sg and unmap_sg") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12iommu/arm-smmu-v3: Use explicit mb() when moving cons pointerWill Deacon
[ Upstream commit a868e8530441286342f90c1fd9c5f24de3aa2880 ] After removing an entry from a queue (e.g. reading an event in arm_smmu_evtq_thread()) it is necessary to advance the MMIO consumer pointer to free the queue slot back to the SMMU. A memory barrier is required here so that all reads targetting the queue entry have completed before the consumer pointer is updated. The implementation of queue_inc_cons() relies on a writel() to complete the previous reads, but this is incorrect because writel() is only guaranteed to complete prior writes. This patch replaces the call to writel() with an mb(); writel_relaxed() sequence, which gives us the read->write ordering which we require. Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12iommu/arm-smmu: Add support for qcom,smmu-v2 variantVivek Gautam
[ Upstream commit 89cddc563743cb1e0068867ac97013b2a5bf86aa ] qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. On msm8996, multiple cores, viz. mdss, video, etc. use this smmu. On sdm845, this smmu is used with gpu. Add bindings for the same. Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Tomasz Figa <tfiga@chromium.org> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12iommu/arm-smmu-v3: Avoid memory corruption from Hisilicon MSI payloadsZhen Lei
[ Upstream commit 84a9a75774961612d0c7dd34a1777e8f98a65abd ] The GITS_TRANSLATER MMIO doorbell register in the ITS hardware is architected to be 4 bytes in size, yet on hi1620 and earlier, Hisilicon have allocated the adjacent 4 bytes to carry some IMPDEF sideband information which results in an 8-byte MSI payload being delivered when signalling an interrupt: MSIAddr: |----4bytes----|----4bytes----| | MSIData | IMPDEF | This poses no problem for the ITS hardware because the adjacent 4 bytes are reserved in the memory map. However, when delivering MSIs to memory, as we do in the SMMUv3 driver for signalling the completion of a SYNC command, the extended payload will corrupt the 4 bytes adjacent to the "sync_count" member in struct arm_smmu_device. Fortunately, the current layout allocates these bytes to padding, but this is fragile and we should make this explicit. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> [will: Rewrote commit message and comment] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12iommu/amd: Fix amd_iommu=force_isolationYu Zhao
[ Upstream commit c12b08ebbe16f0d3a96a116d86709b04c1ee8e74 ] The parameter is still there but it's ignored. We need to check its value before deciding to go into passthrough mode for AMD IOMMU v2 capable device. We occasionally use this parameter to force v2 capable device into translation mode to debug memory corruption that we suspect is caused by DMA writes. To address the following comment from Joerg Roedel on the first version, v2 capability of device is completely ignored. > This breaks the iommu_v2 use-case, as it needs a direct mapping for the > devices that support it. And from Documentation/admin-guide/kernel-parameters.txt: This option does not override iommu=pt Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-06iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()Gerald Schaefer
commit 198bc3252ea3a45b0c5d500e6a5b91cfdd08f001 upstream. Commit 9d3a4de4cb8d ("iommu: Disambiguate MSI region types") changed the reserved region type in intel_iommu_get_resv_regions() from IOMMU_RESV_RESERVED to IOMMU_RESV_MSI, but it forgot to also change the type in intel_iommu_put_resv_regions(). This leads to a memory leak, because now the check in intel_iommu_put_resv_regions() for IOMMU_RESV_RESERVED will never be true, and no allocated regions will be freed. Fix this by changing the region type in intel_iommu_put_resv_regions() to IOMMU_RESV_MSI, matching the type of the allocated regions. Fixes: 9d3a4de4cb8d ("iommu: Disambiguate MSI region types") Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13iommu/vt-d: Handle domain agaw being less than iommu agawSohil Mehta
commit 3569dd07aaad71920c5ea4da2d5cc9a167c1ffd4 upstream. The Intel IOMMU driver opportunistically skips a few top level page tables from the domain paging directory while programming the IOMMU context entry. However there is an implicit assumption in the code that domain's adjusted guest address width (agaw) would always be greater than IOMMU's agaw. The IOMMU capabilities in an upcoming platform cause the domain's agaw to be lower than IOMMU's agaw. The issue is seen when the IOMMU supports both 4-level and 5-level paging. The domain builds a 4-level page table based on agaw of 2. However the IOMMU's agaw is set as 3 (5-level). In this case the code incorrectly tries to skip page page table levels. This causes the IOMMU driver to avoid programming the context entry. The fix handles this case and programs the context entry accordingly. Fixes: de24e55395698 ("iommu/vt-d: Simplify domain_context_mapping_one") Cc: <stable@vger.kernel.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reported-by: Ramos Falcon, Ernesto R <ernesto.r.ramos.falcon@intel.com> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09iommu/arm-smmu-v3: Fix big-endian CMD_SYNC writesRobin Murphy
commit 3cd508a8c1379427afb5e16c2e0a7c986d907853 upstream. When we insert the sync sequence number into the CMD_SYNC.MSIData field, we do so in CPU-native byte order, before writing out the whole command as explicitly little-endian dwords. Thus on big-endian systems, the SMMU will receive and write back a byteswapped version of sync_nr, which would be perfect if it were targeting a similarly-little-endian ITS, but since it's actually writing back to memory being polled by the CPUs, they're going to end up seeing the wrong thing. Since the SMMU doesn't care what the MSIData actually contains, the minimal-overhead solution is to simply add an extra byteswap initially, such that it then writes back the big-endian format directly. Cc: <stable@vger.kernel.org> Fixes: 37de98f8f1cf ("iommu/arm-smmu-v3: Use CMD_SYNC completion MSI") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-13iommu/vt-d: Use memunmap to free memremapPan Bian
[ Upstream commit 829383e183728dec7ed9150b949cd6de64127809 ] memunmap() should be used to free the return of memremap(), not iounmap(). Fixes: dfddb969edf0 ('iommu/vt-d: Switch from ioremap_cache to memremap') Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-13amd/iommu: Fix Guest Virtual APIC Log Tail Address RegisterFilippo Sironi
[ Upstream commit ab99be4683d9db33b100497d463274ebd23bd67e ] This register should have been programmed with the physical address of the memory location containing the shadow tail pointer for the guest virtual APIC log instead of the base address. Fixes: 8bda0cfbdc1a ('iommu/amd: Detect and initialize guest vAPIC log') Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Wei Wang <wawei@amazon.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-13iommu/ipmmu-vmsa: Fix crash on early domain freeGeert Uytterhoeven
[ Upstream commit e5b78f2e349eef5d4fca5dc1cf5a3b4b2cc27abd ] If iommu_ops.add_device() fails, iommu_ops.domain_free() is still called, leading to a crash, as the domain was only partially initialized: ipmmu-vmsa e67b0000.mmu: Cannot accommodate DMA translation for IOMMU page tables sata_rcar ee300000.sata: Unable to initialize IPMMU context iommu: Failed to add device ee300000.sata to group 0: -22 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038 ... Call trace: ipmmu_domain_free+0x1c/0xa0 iommu_group_release+0x48/0x68 kobject_put+0x74/0xe8 kobject_del.part.0+0x3c/0x50 kobject_put+0x60/0xe8 iommu_group_get_for_dev+0xa8/0x1f0 ipmmu_add_device+0x1c/0x40 of_iommu_configure+0x118/0x190 Fix this by checking if the domain's context already exists, before trying to destroy it. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Fixes: d25a2a16f0889 ('iommu: Add driver for Renesas VMSA-compatible IPMMU') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-13iommu/vt-d: Fix NULL pointer dereference in prq_event_thread()Lu Baolu
[ Upstream commit 19ed3e2dd8549c1a34914e8dad01b64e7837645a ] When handling page request without pasid event, go to "no_pasid" branch instead of "bad_req". Otherwise, a NULL pointer deference will happen there. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Fixes: a222a7f0bb6c9 'iommu/vt-d: Implement page request handling' Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-13iommu/arm-smmu: Ensure that page-table updates are visible before TLBIWill Deacon
commit 7d321bd3542500caf125249f44dc37cb4e738013 upstream. The IO-pgtable code relies on the driver TLB invalidation callbacks to ensure that all page-table updates are visible to the IOMMU page-table walker. In the case that the page-table walker is cache-coherent, we cannot rely on an implicit DSB from the DMA-mapping code, so we must ensure that we execute a DSB in our tlb_add_flush() callback prior to triggering the invalidation. Cc: <stable@vger.kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Fixes: 2df7a25ce4a7 ("iommu/arm-smmu: Clean up DMA API usage") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-05iommu/amd: Clear memory encryption mask from physical addressSingh, Brijesh
Boris Ostrovsky reported a memory leak with device passthrough when SME is active. The VFIO driver uses iommu_iova_to_phys() to get the physical address for an iova. This physical address is later passed into vfio_unmap_unpin() to unpin the memory. The vfio_unmap_unpin() uses pfn_valid() before unpinning the memory. The pfn_valid() check was failing because encryption mask was part of the physical address returned. This resulted in the memory not being unpinned and therefore leaked after the guest terminates. The memory encryption mask must be cleared from the physical address in iommu_iova_to_phys(). Fixes: 2543a786aa25 ("iommu/amd: Allow the AMD IOMMU to work with memory encryption") Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: <iommu@lists.linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-26iommu/amd: Return devid as alias for ACPI HID devicesArindam Nath
ACPI HID devices do not actually have an alias for them in the IVRS. But dev_data->alias is still used for indexing into the IOMMU device table for devices being handled by the IOMMU. So for ACPI HID devices, we simply return the corresponding devid as an alias, as parsed from IVRS table. Signed-off-by: Arindam Nath <arindam.nath@amd.com> Fixes: 2bf9a0a12749 ('iommu/amd: Add iommu support for ACPI HID devices') Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-25iommu/vt-d: Handle memory shortage on pasid table allocationLu Baolu
Pasid table memory allocation could return failure due to memory shortage. Limit the pasid table size to 1MiB because current 8MiB contiguous physical memory allocation can be hard to come by. W/o a PASID table, the device could continue to work with only shared virtual memory impacted. So, let's go ahead with context mapping even the memory allocation for pasid table failed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107783 Fixes: cc580e41260d ("iommu/vt-d: Per PCI device pasid table interfaces") Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Reported-and-tested-by: Pelton Kyle D <kyle.d.pelton@intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-09-25iommu/rockchip: Free irqs in shutdown handlerHeiko Stuebner
In the iommu's shutdown handler we disable runtime-pm which could result in the irq-handler running unclocked and since commit 3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework") we warn about that fact. This can cause warnings on shutdown on some Rockchip machines, so free the irqs in the shutdown handler before we disable runtime-pm. Reported-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Fixes: 3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework") Signed-off-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-08-25Merge tag 'armsoc-late' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC late updates from Olof Johansson: "A couple of late-merged changes that would be useful to get in this merge window: - Driver support for reset of audio complex on Meson platforms. The audio driver went in this merge window, and these changes have been in -next for a while (just not in our tree). - Power management fixes for IOMMU on Rockchip platforms, getting closer to kexec working on them, including Chromebooks. - Another pass updating "arm,psci" -> "psci" for some properties that have snuck in since last time it was done" * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: iommu/rockchip: Move irq request past pm_runtime_enable iommu/rockchip: Handle errors returned from PM framework arm64: rockchip: Force CONFIG_PM on Rockchip systems ARM: rockchip: Force CONFIG_PM on Rockchip systems arm64: dts: Fix various entry-method properties to reflect documentation reset: imx7: Fix always writing bits as 0 reset: meson: add meson audio arb driver reset: meson: add dt-bindings for meson-axg audio arb
2018-08-24Merge tag 'iommu-updates-v4.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - PASID table handling updates for the Intel VT-d driver. It implements a global PASID space now so that applications usings multiple devices will just have one PASID. - A new config option to make iommu passthroug mode the default. - New sysfs attribute for iommu groups to export the type of the default domain. - A debugfs interface (for debug only) usable by IOMMU drivers to export internals to user-space. - R-Car Gen3 SoCs support for the ipmmu-vmsa driver - The ARM-SMMU now aborts transactions from unknown devices and devices not attached to any domain. - Various cleanups and smaller fixes all over the place. * tag 'iommu-updates-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (42 commits) iommu/omap: Fix cache flushes on L2 table entries iommu: Remove the ->map_sg indirection iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel iommu/arm-smmu-v3: Prevent any devices access to memory without registration iommu/ipmmu-vmsa: Don't register as BUS IOMMU if machine doesn't have IPMMU-VMSA iommu/ipmmu-vmsa: Clarify supported platforms iommu/ipmmu-vmsa: Fix allocation in atomic context iommu: Add config option to set passthrough as default iommu: Add sysfs attribyte for domain type iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register iommu/arm-smmu: Error out only if not enough context interrupts iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows the PTE iommu/io-pgtable-arm: Fix pgtable allocation in selftest iommu/vt-d: Remove the obsolete per iommu pasid tables iommu/vt-d: Apply per pci device pasid table in SVA iommu/vt-d: Allocate and free pasid table iommu/vt-d: Per PCI device pasid table interfaces iommu/vt-d: Add for_each_device_domain() helper iommu/vt-d: Move device_domain_info to header iommu/vt-d: Apply global PASID in SVA ...
2018-08-24iommu/rockchip: Move irq request past pm_runtime_enableMarc Zyngier
Enabling the interrupt early, before power has been applied to the device, can result in an interrupt being delivered too early if: - the IOMMU shares an interrupt with a VOP - the VOP has a pending interrupt (after a kexec, for example) In these conditions, we end-up taking the interrupt without the IOMMU being ready to handle the interrupt (not powered on). Moving the interrupt request past the pm_runtime_enable() call makes sure we can at least access the IOMMU registers. Note that this is only a partial fix, and that the VOP interrupt will still be screaming until the VOP driver kicks in, which advocates for a more synchronized interrupt enabling/disabling approach. Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support") Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2018-08-24iommu/rockchip: Handle errors returned from PM frameworkMarc Zyngier
pm_runtime_get_if_in_use can fail: either PM has been disabled altogether (-EINVAL), or the device hasn't been enabled yet (0). Sadly, the Rockchip IOMMU driver tends to conflate the two things by considering a non-zero return value as successful. This has the consequence of hiding other bugs, so let's handle this case throughout the driver, with a WARN_ON_ONCE so that we can try and work out what happened. Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support") Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2018-08-18Merge tag 'driver-core-4.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are all of the driver core and related patches for 4.19-rc1. Nothing huge here, just a number of small cleanups and the ability to now stop the deferred probing after init happens. All of these have been in linux-next for a while with only a merge issue reported" * tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits) base: core: Remove WARN_ON from link dependencies check drivers/base: stop new probing during shutdown drivers: core: Remove glue dirs from sysfs earlier driver core: remove unnecessary function extern declare sysfs.h: fix non-kernel-doc comment PM / Domains: Stop deferring probe at the end of initcall iommu: Remove IOMMU_OF_DECLARE iommu: Stop deferring probe at end of initcalls pinctrl: Support stopping deferred probe after initcalls dt-bindings: pinctrl: add a 'pinctrl-use-default' property driver core: allow stopping deferred probe after init driver core: add a debugfs entry to show deferred devices sysfs: Fix internal_create_group() for named group updates base: fix order of OF initialization linux/device.h: fix kernel-doc notation warning Documentation: update firmware loader fallback reference kobject: Replace strncpy with memcpy drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number kernfs: Replace strncpy with memcpy device: Add #define dev_fmt similar to #define pr_fmt ...
2018-08-17kernel/dma: remove unsupported gfp_mask parameter from ↵Marek Szyprowski
dma_alloc_from_contiguous() The CMA memory allocator doesn't support standard gfp flags for memory allocation, so there is no point having it as a parameter for dma_alloc_from_contiguous() function. Replace it by a boolean no_warn argument, which covers all the underlaying cma_alloc() function supports. This will help to avoid giving false feeling that this function supports standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer, what has already been an issue: see commit dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag"). Link: http://lkml.kernel.org/r/20180709122020eucas1p21a71b092975cb4a3b9954ffc63f699d1~-sqUFoa-h2939329393eucas1p2Y@eucas1p2.samsung.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michał Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17mm: convert return type of handle_mm_fault() caller to vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t") In this patch all the caller of handle_mm_fault() are changed to return vm_fault_t type. Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Tony Luck <tony.luck@intel.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: James Hogan <jhogan@kernel.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: David S. Miller <davem@davemloft.net> Cc: Richard Weinberger <richard@nod.at> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-08Merge branches 'arm/shmobile', 'arm/renesas', 'arm/msm', 'arm/smmu', ↵Joerg Roedel
'arm/omap', 'x86/amd', 'x86/vt-d' and 'core' into next