aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/vfio/vfio_iommu_type1.c
AgeCommit message (Collapse)Author
2014-10-15Merge tag 'iommu-updates-v3.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "This pull-request includes: - change in the IOMMU-API to convert the former iommu_domain_capable function to just iommu_capable - various fixes in handling RMRR ranges for the VT-d driver (one fix requires a device driver core change which was acked by Greg KH) - the AMD IOMMU driver now assigns and deassigns complete alias groups to fix issues with devices using the wrong PCI request-id - MMU-401 support for the ARM SMMU driver - multi-master IOMMU group support for the ARM SMMU driver - various other small fixes all over the place" * tag 'iommu-updates-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits) iommu/vt-d: Work around broken RMRR firmware entries iommu/vt-d: Store bus information in RMRR PCI device path iommu/vt-d: Only remove domain when device is removed driver core: Add BUS_NOTIFY_REMOVED_DEVICE event iommu/amd: Fix devid mapping for ivrs_ioapic override iommu/irq_remapping: Fix the regression of hpet irq remapping iommu: Fix bus notifier breakage iommu/amd: Split init_iommu_group() from iommu_init_device() iommu: Rework iommu_group_get_for_pci_dev() iommu: Make of_device_id array const amd_iommu: do not dereference a NULL pointer address. iommu/omap: Remove omap_iommu unused owner field iommu: Remove iommu_domain_has_cap() API function IB/usnic: Convert to use new iommu_capable() API function vfio: Convert to use new iommu_capable() API function kvm: iommu: Convert to use new iommu_capable() API function iommu/tegra: Convert to iommu_capable() API function iommu/msm: Convert to iommu_capable() API function iommu/vt-d: Convert to iommu_capable() API function iommu/fsl: Convert to iommu_capable() API function ...
2014-09-29vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU typeWill Deacon
VFIO allows devices to be safely handed off to userspace by putting them behind an IOMMU configured to ensure DMA and interrupt isolation. This enables userspace KVM clients, such as kvmtool and qemu, to further map the device into a virtual machine. With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU translation services to the guest operating system, which are nested with the existing translation installed by VFIO. However, enabling this feature means that the IOMMU driver must be informed that the VFIO domain is being created for the purposes of nested translation. This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO type-1 driver. The new IOMMU type acts identically to the VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING attribute on its IOMMU domains. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-09-25vfio: Convert to use new iommu_capable() API functionJoerg Roedel
Cc: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-05-30vfio/iommu_type1: Avoid overflowAlex Williamson
Coverity reports use of a tained scalar used as a loop boundary. For the most part, any values passed from userspace for a DMA mapping size, IOVA, or virtual address are valid, with some alignment constraints. The size is ultimately bound by how many pages the user is able to lock, IOVA is tested by the IOMMU driver when doing a map, and the virtual address needs to pass get_user_pages. The only problem I can find is that we do expect the __u64 user values to fit within our variables, which might not happen on 32bit platforms. Add a test for this and return error on overflow. Also propagate use of the type-correct local variables throughout the function. The above also points to the 'end' variable, which can be zero if we're operating at the very top of the address space. We try to account for this, but our loop botches it. Rework the loop to use the remaining size as our loop condition rather than the IOVA vs end. Detected by Coverity: CID 714659 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-04-03Merge tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: "VFIO updates for v3.15 include: - Allow the vfio-type1 IOMMU to support multiple domains within a container - Plumb path to query whether all domains are cache-coherent - Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation - Always select CONFIG_ANON_INODES, vfio depends on it (Arnd) The first patch also makes the vfio-type1 IOMMU driver completely independent of the bus_type of the devices it's handling, which enables it to be used for both vfio-pci and a future vfio-platform (and hopefully combinations involving both simultaneously)" * tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio: vfio: always select ANON_INODES kvm/vfio: Support for DMA coherent IOMMUs vfio: Add external user check extension interface vfio/type1: Add extension to test DMA cache coherence of IOMMU vfio/iommu_type1: Multi-IOMMU domain support
2014-03-04mm: close PageTail raceDavid Rientjes
Commit bf6bddf1924e ("mm: introduce compaction and migration for ballooned pages") introduces page_count(page) into memory compaction which dereferences page->first_page if PageTail(page). This results in a very rare NULL pointer dereference on the aforementioned page_count(page). Indeed, anything that does compound_head(), including page_count() is susceptible to racing with prep_compound_page() and seeing a NULL or dangling page->first_page pointer. This patch uses Andrea's implementation of compound_trans_head() that deals with such a race and makes it the default compound_head() implementation. This includes a read memory barrier that ensures that if PageTail(head) is true that we return a head page that is neither NULL nor dangling. The patch then adds a store memory barrier to prep_compound_page() to ensure page->first_page is set. This is the safest way to ensure we see the head page that we are expecting, PageTail(page) is already in the unlikely() path and the memory barriers are unfortunately required. Hugetlbfs is the exception, we don't enforce a store memory barrier during init since no race is possible. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-26vfio/type1: Add extension to test DMA cache coherence of IOMMUAlex Williamson
Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to be able to test whether coherency is currently enforced. Add an extension for this. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-02-26vfio/iommu_type1: Multi-IOMMU domain supportAlex Williamson
We currently have a problem that we cannot support advanced features of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee that those features will be supported by all of the hardware units involved with the domain over its lifetime. For instance, the Intel VT-d architecture does not require that all DRHDs support snoop control. If we create a domain based on a device behind a DRHD that does support snoop control and enable SNP support via the IOMMU_CACHE mapping option, we cannot then add a device behind a DRHD which does not support snoop control or we'll get reserved bit faults from the SNP bit in the pagetables. To add to the complexity, we can't know the properties of a domain until a device is attached. We could pass this problem off to userspace and require that a separate vfio container be used, but we don't know how to handle page accounting in that case. How do we know that a page pinned in one container is the same page as a different container and avoid double billing the user for the page. The solution is therefore to support multiple IOMMU domains per container. In the majority of cases, only one domain will be required since hardware is typically consistent within a system. However, this provides us the ability to validate compatibility of domains and support mixed environments where page table flags can be different between domains. To do this, our DMA tracking needs to change. We currently try to coalesce user mappings into as few tracking entries as possible. The problem then becomes that we lose granularity of user mappings. We've never guaranteed that a user is able to unmap at a finer granularity than the original mapping, but we must honor the granularity of the original mapping. This coalescing code is therefore removed, allowing only unmaps covering complete maps. The change in accounting is fairly small here, a typical QEMU VM will start out with roughly a dozen entries, so it's arguable if this coalescing was ever needed. We also move IOMMU domain creation to the point where a group is attached to the container. An interesting side-effect of this is that we now have access to the device at the time of domain creation and can probe the devices within the group to determine the bus_type. This finally makes vfio_iommu_type1 completely device/bus agnostic. In fact, each IOMMU domain can host devices on different buses managed by different physical IOMMUs, and present a single DMA mapping interface to the user. When a new domain is created, mappings are replayed to bring the IOMMU pagetables up to the state of the current container. And of course, DMA mapping and unmapping automatically traverse all of the configured IOMMU domains. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Varun Sethi <Varun.Sethi@freescale.com>
2013-10-11VFIO: vfio_iommu_type1: fix bug caused by break in nested loopAntonios Motakis
In vfio_iommu_type1.c there is a bug in vfio_dma_do_map, when checking that pages are not already mapped. Since the check is being done in a for loop nested within the main loop, breaking out of it does not create the intended behavior. If the underlying IOMMU driver returns a non-NULL value, this will be ignored and mapping the DMA range will be attempted anyway, leading to unpredictable behavior. This interracts badly with the ARM SMMU driver issue fixed in the patch that was submitted with the title: "[PATCH 2/2] ARM: SMMU: return NULL on error in arm_smmu_iova_to_phys" Both fixes are required in order to use the vfio_iommu_type1 driver with an ARM SMMU. This patch refactors the function slightly, in order to also make this kind of bug less likely. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-07-01vfio/type1: Fix leak on error pathAlex Williamson
We also don't handle unpinning zero pages as an error on other exits so we can fix that inconsistency by rolling in the next conditional return. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-25vfio/type1: Fix missed frees and zero sized removesAlex Williamson
With hugepage support we can only properly aligned and sized ranges. We only guarantee that we can unmap the same ranges mapped and not arbitrary sub-ranges. This means we might not free anything or might free more than requested. The vfio unmap interface started storing the unmapped size to return to userspace to handle this. This patch fixes a few places where we don't properly handle those cases, moves a memory allocation to a place where failure is an option and checks our loops to make sure we don't get into an infinite loop trying to remove an overlap. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-21vfio: Provide module option to disable vfio_iommu_type1 hugepage supportAlex Williamson
Add a module option to vfio_iommu_type1 to disable IOMMU hugepage support. This causes iommu_map to only be called with single page mappings, disabling the IOMMU driver's ability to use hugepages. This option can be enabled by loading vfio_iommu_type1 with disable_hugepages=1 or dynamically through sysfs. If enabled dynamically, only new mappings are restricted. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-21vfio: hugepage support for vfio_iommu_type1Alex Williamson
We currently send all mappings to the iommu in PAGE_SIZE chunks, which prevents the iommu from enabling support for larger page sizes. We still need to pin pages, which means we step through them in PAGE_SIZE chunks, but we can batch up contiguous physical memory chunks to allow the iommu the opportunity to use larger pages. The approach here is a bit different that the one currently used for legacy KVM device assignment. Rather than looking at the vma page size and using that as the maximum size to pass to the iommu, we instead simply look at whether the next page is physically contiguous. This means we might ask the iommu to map a 4MB region, while legacy KVM might limit itself to a maximum of 2MB. Splitting our mapping path also allows us to be smarter about locked memory because we can more easily unwind if the user attempts to exceed the limit. Therefore, rather than assuming that a mapping will result in locked memory, we test each page as it is pinned to determine whether it locks RAM vs an mmap'd MMIO region. This should result in better locking granularity and less locked page fudge factors in userspace. The unmap path uses the same algorithm as legacy KVM. We don't want to track the pfn for each mapping ourselves, but we need the pfn in order to unpin pages. We therefore ask the iommu for the iova to physical address translation, ask it to unpin a page, and see how many pages were actually unpinned. iommus supporting large pages will often return something bigger than a page here, which we know will be physically contiguous and we can unpin a batch of pfns. iommus not supporting large mappings won't see an improvement in batching here as they only unmap a page at a time. With this change, we also make a clarification to the API for mapping and unmapping DMA. We can only guarantee unmaps at the same granularity as used for the original mapping. In other words, unmapping a subregion of a previous mapping is not guaranteed and may result in a larger or smaller unmapping than requested. The size field in the unmapping structure is updated to reflect this. Previously this was unmodified on mapping, always returning the the requested unmap size. This is now updated to return the actual unmap size on success, allowing userspace to appropriately track mappings. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-06-21vfio: Convert type1 iommu to use rbtreeAlex Williamson
We need to keep track of all the DMA mappings of an iommu container so that it can be automatically unmapped when the user releases the file descriptor. We currently do this using a simple list, where we merge entries with contiguous iovas and virtual addresses. Using a tree for this is a bit more efficient and allows us to use common code instead of inventing our own. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-07-31vfio: Type1 IOMMU implementationAlex Williamson
This VFIO IOMMU backend is designed primarily for AMD-Vi and Intel VT-d hardware, but is potentially usable by anything supporting similar mapping functionality. We arbitrarily call this a Type1 backend for lack of a better name. This backend has no IOVA or host memory mapping restrictions for the user and is optimized for relatively static mappings. Mapped areas are pinned into system memory. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>