summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/i915_gem_request.h
AgeCommit message (Collapse)Author
2016-12-22drm/i915: Fix oopses in the overlay code due to i915_gem_active stuffVille Syrjälä
The i915_gem_active stuff doesn't like a NULL ->retire hook, but the overlay code can set it to NULL. That obviously ends up oopsing. Fix it by introducing a new helper to assign the retirement callback that will switch out the NULL function pointer with i915_gem_retire_noop. Cc: stable@vger.kernel.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Fixes: 0d9bdd886f29 ("drm/i915: Convert intel_overlay to request tracking") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161207175647.10018-1-chris@chris-wilson.co.uk
2016-12-18drm/i915: Unify active context tracking between legacy/execlists/gucChris Wilson
The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
2016-11-14drm/i915/scheduler: Execute requests in order of prioritiesChris Wilson
Track the priority of each request and use it to determine the order in which we submit requests to the hardware via execlists. The priority of the request is determined by the user (eventually via the context) but may be overridden at any time by the driver. When we set the priority of the request, we bump the priority of all of its dependencies to match - so that a high priority drawing operation is not stuck behind a background task. When the request is ready to execute (i.e. we have signaled the submit fence following completion of all its dependencies, including third party fences), we put the request into a priority sorted rbtree to be submitted to the hardware. If the request is higher priority than all pending requests, it will be submitted on the next context-switch interrupt as soon as the hardware has completed the current request. We do not currently preempt any current execution to immediately run a very high priority request, at least not yet. One more limitation, is that this is first implementation is for execlists only so currently limited to gen8/gen9. v2: Replace recursive priority inheritance bumping with an iterative depth-first search list. v3: list_next_entry() for walking lists v4: Explain how the dfs solves the recursion problem with PI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
2016-11-14drm/i915/scheduler: Record all dependencies upon request constructionChris Wilson
The scheduler needs to know the dependencies of each request for the lifetime of the request, as it may choose to reschedule the requests at any time and must ensure the dependency tree is not broken. This is in additional to using the fence to only allow execution after all dependencies have been completed. One option was to extend the fence to support the bidirectional dependency tracking required by the scheduler. However the mismatch in lifetimes between the submit fence and the request essentially meant that we had to build a completely separate struct (and we could not simply reuse the existing waitqueue in the fence for one half of the dependency tracking). The extra dependency tracking simply did not mesh well with the fence, and keeping it separate both keeps the fence implementation simpler and allows us to extend the dependency tracking into a priority tree (whilst maintaining support for reordering the tree). To avoid the additional allocations and list manipulations, the use of the priotree is disabled when there are no schedulers to use it. v2: Create a dedicated slab for i915_dependency. Rename the lists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
2016-11-14drm/i915: Defer transfer onto execution timeline to actual hw submissionChris Wilson
Defer the transfer from the client's timeline onto the execution timeline from the point of readiness to the point of actual submission. For example, in execlists, a request is finally submitted to hardware when the hardware is ready, and only put onto the hardware queue when the request is ready. By deferring the transfer, we ensure that the timeline is maintained in retirement order if we decide to queue the requests onto the hardware in a different order than fifo. v2: Rebased onto distinct global/user timeline lock classes. v3: Play with the position of the spin_lock(). v4: Nesting finally resolved with distinct sw_fence lock classes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk
2016-11-14drm/i915: Split request submit/execute phase into twoChris Wilson
In order to support deferred scheduling, we need to differentiate between when the request is ready to run (i.e. the submit fence is signaled) and when the request is actually run (a new execute fence). This is typically split between the request itself wanting to wait upon others (for which we use the submit fence) and the CPU wanting to wait upon the request, for which we use the execute fence to be sure the hardware is ready to signal completion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-3-chris@chris-wilson.co.uk
2016-11-11drm/i915: Split out i915_vma.cJoonas Lahtinen
As a side product, had to split two other files; - i915_gem_fence_reg.h - i915_gem_object.h (only parts that needed immediate untanglement) I tried to move code in as big chunks as possible, to make review easier. i915_vma_compare was moved to a header temporarily. v2: - Use i915_gem_fence_reg.{c,h} v3: - Rebased v4: - Fix building when DEBUG_GEM is enabled by reordering a bit. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-11-08drm/i915: Remove two sloppy inline functions from .hJoonas Lahtinen
Get rid of sloppy inline functions now that we don't have more users: i915_gem_request_get_seqno i915_gem_request_get_engine v2: - request->engine is always non-NULL (Chris) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478589108-3702-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-10-28drm/i915: Introduce a global_seqno for each requestChris Wilson
Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
2016-10-28drm/i915: Combine seqno + tracking into a global timeline structChris Wilson
Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space but we use a fence context per engine within. Our first step to introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
2016-10-28drm/i915: Move GEM activity tracking into a common struct reservation_objectChris Wilson
In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips. Caveats: * busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which *i915* engines are engaged with this object. * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting" * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts). * loss of object-level retirement callbacks, emulated by VMA retirement tracking. * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked()Chris Wilson
Since we only use the more generic unlocked variant, just rename it as the normal i915_gem_active_wait(). The temporary cost is that we need to always acquire the reference in a RCU safe manner, but the benefit is that we will combine the common paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-5-chris@chris-wilson.co.uk
2016-10-28drm/i915: Rearrange i915_wait_request() accounting with callersChris Wilson
Our low-level wait routine has evolved from our generic wait interface that handled unlocked, RPS boosting, waits with time tracking. If we push our GEM fence tracking to use reservation_objects (required for handling multiple timelines), we lose the ability to pass the required information down to i915_wait_request(). However, if we push the extra functionality from i915_wait_request() to the individual callsites (i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use of those extras, we can both simplify our low level wait and prepare for extending the GEM interface for use of reservation_objects. v2: Rewrite i915_wait_request() kerneldocs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
2016-10-28drm/i915: Support asynchronous waits on struct fence from i915_gem_requestChris Wilson
We will need to wait on DMA completion (as signaled via struct fence) before executing our i915_gem_request. Therefore we want to expose a method for adding the await on the fence itself to the request. v2: Add a comment detailing a failure to handle a signal-on-any fence-array. v3: Pretend that magic numbers don't exist. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
2016-10-25dma-buf: Rename struct fence to dma_fenceChris Wilson
I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room. A consensus was reached in https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing! (v2...: rebase, rerun spatch) v3: Compile on msm, spotted a manual fixup that I broke. v4: Try again for msm, sorry Daniel coccinelle script: @@ @@ - struct fence + struct dma_fence @@ @@ - struct fence_ops + struct dma_fence_ops @@ @@ - struct fence_cb + struct dma_fence_cb @@ @@ - struct fence_array + struct dma_fence_array @@ @@ - enum fence_flag_bits + enum dma_fence_flag_bits @@ @@ ( - fence_init + dma_fence_init | - fence_release + dma_fence_release | - fence_free + dma_fence_free | - fence_get + dma_fence_get | - fence_get_rcu + dma_fence_get_rcu | - fence_put + dma_fence_put | - fence_signal + dma_fence_signal | - fence_signal_locked + dma_fence_signal_locked | - fence_default_wait + dma_fence_default_wait | - fence_add_callback + dma_fence_add_callback | - fence_remove_callback + dma_fence_remove_callback | - fence_enable_sw_signaling + dma_fence_enable_sw_signaling | - fence_is_signaled_locked + dma_fence_is_signaled_locked | - fence_is_signaled + dma_fence_is_signaled | - fence_is_later + dma_fence_is_later | - fence_later + dma_fence_later | - fence_wait_timeout + dma_fence_wait_timeout | - fence_wait_any_timeout + dma_fence_wait_any_timeout | - fence_wait + dma_fence_wait | - fence_context_alloc + dma_fence_context_alloc | - fence_array_create + dma_fence_array_create | - to_fence_array + to_dma_fence_array | - fence_is_array + dma_fence_is_array | - trace_fence_emit + trace_dma_fence_emit | - FENCE_TRACE + DMA_FENCE_TRACE | - FENCE_WARN + DMA_FENCE_WARN | - FENCE_ERR + DMA_FENCE_ERR ) ( ... ) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
2016-09-09drm/i915: Nonblocking request submissionChris Wilson
Now that we have fences in place to drive request submission, we can employ those to queue requests after their dependencies as opposed to stalling in the middle of an execbuf ioctl. (However, we still choose to spin before enabling the IRQ as that is faster - though contentious.) v2: Do the fence ordering first, where we can still fail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-20-chris@chris-wilson.co.uk
2016-09-09drm/i915: Prepare object synchronisation for asynchronicityChris Wilson
We are about to specialize object synchronisation to enable nonblocking execbuf submission. First we make a copy of the current object synchronisation for execbuffer. The general i915_gem_object_sync() will be removed following the removal of CS flips in the near future. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
2016-09-09drm/i915: Drive request submission through fence callbacksChris Wilson
Drive final request submission from a callback from the fence. This way the request is queued until all dependencies are resolved, at which point it is handed to the backend for queueing to hardware. At this point, no dependencies are set on the request, so the callback is immediate. A side-effect of imposing a heavier-irqsafe spinlock for execlist submission is that we lose the softirq enabling after scheduling the execlists tasklet. To compensate, we manually kickstart the softirq by disabling and enabling the bh around the fence signaling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk
2016-09-09drm/i915: Mark up all locked waitersChris Wilson
In the next patch we want to handle reset directly by a locked waiter in order to avoid issues with returning before the reset is handled. To handle the reset, we must first know whether we hold the struct_mutex. If we do not hold the struct_mtuex we can not perform the reset, but we do not block the reset worker either (and so we can just continue to wait for request completion) - otherwise we must relinquish the mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk
2016-09-09drm/i915: Expand bool interruptible to pass flags to i915_wait_request()Chris Wilson
We need finer control over wakeup behaviour during i915_wait_request(), so expand the current bool interruptible to a bitmask. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk
2016-09-09drm/i915: Simplify ELSP queue request trackingChris Wilson
Emulate HW to track and manage ELSP queue. A set of SW ports are defined and requests are assigned to these ports before submitting them to HW. This helps in cleaning up incomplete requests during reset recovery easier especially after engine reset by decoupling elsp queue management. This will become more clear in the next patch. In the engine reset case we want to resume where we left-off after skipping the incomplete batch which requires checking the elsp queue, removing element and fixing elsp_submitted counts in some cases. Instead of directly manipulating the elsp queue from reset path we can examine these ports, fix up ringbuffer pointers using the incomplete request and restart submissions again after reset. Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-3-git-send-email-arun.siluvery@linux.intel.com Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-6-chris@chris-wilson.co.uk
2016-09-09drm/i915: Record the position of the workarounds in the tail of the requestChris Wilson
Rather than blindly assuming we need to advance the tail for resubmitting the request via the ELSP, record the position. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-3-chris@chris-wilson.co.uk
2016-08-22drm/i915: Ensure consistent control flow __i915_gem_active_get_rcuDaniel Vetter
This issue here is (I think) purely theoretical, since a compiler would need to be especially foolish to recompute the value of i915_gem_request_completed right after it was already used. Hence the additional barrier() is also not really a restriction. But I believe this to be at least permissible, and since our rcu trickery is a beast it's worth to annotate all the corner cases. Chris proposed to instead just wrap a READ_ONCE around request->fence.seqno in i915_gem_request_completed. But that has a measurable impact on code size, and everywhere we hold a full reference to the underlying request it's also not needed. And personally I'd like to have just enough barriers and locking needed for correctness, but not more - it makes it much easier in the future to understand what's going on. Since the busy ioctl has now fully embraced it's races there's no point annotating it there too. We really only need it in active_get_rcu, since that function _must_ deliver a correct snapshot of the active fences (and not chase something else). v2: Polish the comment a bit more (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1471856122-466-1-git-send-email-daniel.vetter@ffwll.ch
2016-08-15drm/i915: Move debug only per-request pid tracking from request to ctxChris Wilson
Since contexts are not currently shared between userspace processes, we have an exact correspondence between context creator and guilty batch submitter. Therefore we can save some per-batch work by inspecting the context->pid upon error instead. Note that we take the context's creator's pid rather than the file's pid in order to better track fd passed over sockets. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-29-git-send-email-chris@chris-wilson.co.uk
2016-08-15drm/i915: Track pinned VMAChris Wilson
Treat the VMA as the primary struct responsible for tracking bindings into the GPU's VM. That is we want to treat the VMA returned after we pin an object into the VM as the cookie we hold and eventually release when unpinning. Doing so eliminates the ambiguity in pinning the object and then searching for the relevant pin later. v2: Joonas' stylistic nitpicks, a fun rebase. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
2016-08-10drm/i915: Move setting of request->batch into its single callsiteChris Wilson
request->batch_obj is only set by execbuffer for the convenience of debugging hangs. By moving that operation to the callsite, we can simplify all other callers and future patches. We also move the complications of reference handling of the request->batch_obj next to where the active tracking is set up for the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470832906-13972-2-git-send-email-chris@chris-wilson.co.uk
2016-08-09drm/i915: Do not overwrite the request with zero on reallocationChris Wilson
When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915: Enable lockless lookup of request tracking via RCU"), we acknowledge that we may race with another thread that could have reallocated the request. In order for the first thread not to blow up, the second thread must not clear the request completed before overwriting it. In the RCU lookup, we allow for the engine/seqno to be replaced but we do not allow for it to be zeroed. The choice we make is to either add extra checking to the RCU lookup, or embrace the inherent races (as intended). It is more complicated as we need to manually clear everything we depend upon being zero initialised, but we benefit from not emiting the memset() to clear the entire frequently allocated structure (that memset turns up in throughput profiles). And at the same time, the lookup remains flexible for future adjustments. v2: Old style LRC requires another variable to be initialize. (The danger inherent in not zeroing everything.) v3: request->batch also needs to be cleared v4: signaling.tsk is no long used unset, but pid still exists Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Goel, Akash" <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-2-git-send-email-chris@chris-wilson.co.uk
2016-08-09drm/i915: Add smp_rmb() to busy ioctl's RCU danceChris Wilson
In the debate as to whether the second read of active->request is ordered after the dependent reads of the first read of active->request, just give in and throw a smp_rmb() in there so that ordering of loads is assured. v2: Explain the manual smp_rmb() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-1-git-send-email-chris@chris-wilson.co.uk
2016-08-09drm/i915: Wrap the protected active RCU dereference in a helperChris Wilson
As we do the lockdep protected RCU lookup in a couple of places, refactor that code to a common helper i915_gem_active_raw(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470728222-10243-2-git-send-email-chris@chris-wilson.co.uk
2016-08-09drm/i915: Remove unused i915_gem_active_peek_rcu()Chris Wilson
This was originally introduced to be used by the busy-ioctl, but in the end busy ioctl performed a different dance. Since there are no users, and no likely users, remove an unwanted chunk of the API. Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470728222-10243-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-05drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutexChris Wilson
The principal motivation for this was to try and eliminate the struct_mutex from i915_gem_suspend - but we still need to hold the mutex current for the i915_gem_context_lost(). (The issue there is that there may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and struct_mutex via the stop_machine().) For the moment, enabling last request tracking for the engine, allows us to do busyness checking and waiting without requiring the struct_mutex - which is useful in its own right. As a side-effect of having a robust means for tracking engine busyness, we can replace our other busyness heuristic, that of comparing against the last submitted seqno. For paranoid reasons, we have a semi-ordered check of that seqno inside the hangchecker, which we can now improve to an ordered check of the engine's busyness (removing a locked xchg in the process). v2: Pass along "bool interruptible" as being unlocked we cannot rely on i915->mm.interruptible being stable or even under our control. v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Introduce i915_gem_active_wait_unlocked()Chris Wilson
It is useful to be able to wait on pending rendering without grabbing the struct_mutex. We can do this by using the i915_gem_active_get_rcu() primitive to acquire a reference to the pending request without requiring struct_mutex, just the RCU read lock, and then call i915_wait_request(). v2: Rebase onto new i915_gem_active_get_unlocked() semantics that take the RCU read lock on behalf of the caller. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-1-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Enable lockless lookup of request tracking via RCUChris Wilson
If we enable RCU for the requests (providing a grace period where we can inspect a "dead" request before it is freed), we can allow callers to carefully perform lockless lookup of an active request. However, by enabling deferred freeing of requests, we can potentially hog a lot of memory when dealing with tens of thousands of requests per second - with a quick insertion of a synchronize_rcu() inside our shrinker callback, that issue disappears. v2: Currently, it is our responsibility to handle reclaim i.e. to avoid hogging memory with the delayed slab frees. At the moment, we wait for a grace period in the shrinker, and block for all RCU callbacks on oom. Suggested alternatives focus on flushing our RCU callback when we have a certain number of outstanding request frees, and blocking on that flush after a second high watermark. (So rather than wait for the system to run out of memory, we stop issuing requests - both are nondeterministic.) Paul E. McKenney wrote: Another approach is synchronize_rcu() after some largish number of requests. The advantage of this approach is that it throttles the production of callbacks at the source. The corresponding disadvantage is that it slows things up. Another approach is to use call_rcu(), but if the previous call_rcu() is still in flight, block waiting for it. Yet another approach is the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The idea is to do something like this: cond_synchronize_rcu(cookie); cookie = get_state_synchronize_rcu(); You would of course do an initial get_state_synchronize_rcu() to get things going. This would not block unless there was less than one grace period's worth of time between invocations. But this assumes a busy system, where there is almost always a grace period in flight. But you can make that happen as follows: cond_synchronize_rcu(cookie); cookie = get_state_synchronize_rcu(); call_rcu(&my_rcu_head, noop_function); Note that you need additional code to make sure that the old callback has completed before doing a new one. Setting and clearing a flag with appropriate memory ordering control suffices (e.g,. smp_load_acquire() and smp_store_release()). v3: More comments on compiler and processor order of operations within the RCU lookup and discover we can use rcu_access_pointer() here instead. v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: s/__i915_wait_request/i915_wait_request/Chris Wilson
There is only one wait on request function now, so drop the "expert" indication of leading __. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Move the special case wait-request handling to its one callerChris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-19-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Track requests inside each intel_ringChris Wilson
By tracking each request occupying space inside an individual intel_ring, we can greatly simplify the logic of tracking available space and not worry about other timelines. (Each ring is an ordered timeline of committed requests.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-17-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Refactor activity tracking for requestsChris Wilson
With the introduction of requests, we amplified the number of atomic refcounted objects we use and update every execbuffer; from none to several references, and a set of references that need to be changed. We also introduced interesting side-effects in the order of retiring requests and objects. Instead of independently tracking the last request for an object, track the active objects for each request. The object will reside in the buffer list of its most recent active request and so we reduce the kref interchange to a list_move. Now retirements are entirely driven by the request, dramatically simplifying activity tracking on the object themselves, and removing the ambiguity between retiring objects and retiring requests. Furthermore with the consolidation of managing the activity tracking centrally, we can look forward to using RCU to enable lockless lookup of the current active requests for an object. In the future, we will be able to query the status or wait upon rendering to an object without even touching the struct_mutex BKL. All told, less code, simpler and faster, and more extensible. v2: Add a typedef for the function pointer for convenience later. v3: Make the noop retirement callback explicit. Allow passing NULL to the init_request_active() which is expanded to a common noop function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Rename request->list to link for consistencyChris Wilson
We use "list" to denote the list and "link" to denote an element on that list. Rename request->list to match this idiom. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Mark up i915_gem_active for locking annotationChris Wilson
The future annotations will track the locking used for access to ensure that it is always sufficient. We make the preparations now to present the API ahead and to make sure that GCC can eliminate the unused parameter. Before: 6298417 3619610 696320 10614347 a1f64b vmlinux After: 6298417 3619610 696320 10614347 a1f64b vmlinux (with i915 builtin) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Prepare i915_gem_active for annotationsChris Wilson
In the future, we will want to add annotations to the i915_gem_active struct. The API is thus expanded to hide direct access to the contents of i915_gem_active and mediated instead through a number of helpers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Introduce i915_gem_active for request trackingChris Wilson
In the next patch, request tracking is made more generic and for that we need a new expanded struct and to separate out the logic changes from the mechanical churn, we split out the structure renaming into this patch. v2: Writer's block. Add some spiel about why we track requests. v3: Now i915_gem_active. v4: Now with i915_gem_active_set() for attaching to the active request. v5: Use i915_gem_active_set() from inside the retirement handlers Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
2016-08-02drm/i915: Rename struct intel_ringbuffer to struct intel_ringChris Wilson
The state stored in this struct is not only the information about the buffer object, but the ring used to communicate with the hardware. Using buffer here is overly specific and, for me at least, conflates with the notion of buffer objects themselves. s/struct intel_ringbuffer/struct intel_ring/ s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/ s/describe_ctx_ringbuf()/describe_ctx_ring()/ s/intel_ring_get_active_head()/intel_engine_get_active_head()/ s/intel_ring_sync_index()/intel_engine_sync_index()/ s/intel_ring_init_seqno()/intel_engine_init_seqno()/ s/ring_stuck()/engine_stuck()/ s/intel_cleanup_engine()/intel_engine_cleanup()/ s/intel_stop_engine()/intel_engine_stop()/ s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/ s/intel_unpin_ringbuffer()/intel_unpin_ring()/ s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/ s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/ s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/ s/intel_ringbuffer_free()/intel_ring_free()/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
2016-08-02drm/i915: Rename request->ringbuf to request->ringChris Wilson
Now that we have disambuigated ring and engine, we can use the clearer and more consistent name for the intel_ringbuffer pointer in the request. @@ struct drm_i915_gem_request *r; @@ - r->ringbuf + r->ring Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-12-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-2-git-send-email-chris@chris-wilson.co.uk
2016-07-20drm/i915: Rename request reference/unreference to get/putChris Wilson
Now that we derive requests from struct fence, swap over to its nomenclature for references. It's shorter and more idiomatic across the kernel. s/i915_gem_request_reference/i915_gem_request_get/ s/i915_gem_request_unreference/i915_gem_request_put/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-2-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-1-git-send-email-chris@chris-wilson.co.uk
2016-07-20drm/i915: Disable waitboosting for fence_wait()Chris Wilson
We want to restrict waitboosting to known process contexts, where we can track which clients are receiving waitboosts and prevent excessive power wasting. For fence_wait() we do not have any client tracking and so that leaves it open to abuse. v2: Hide the IS_ERR_OR_NULL testing for special clients Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-5-git-send-email-chris@chris-wilson.co.uk
2016-07-20drm/i915: Derive GEM requests from dma-fenceChris Wilson
dma-buf provides a generic fence class for interoperation between drivers. Internally we use the request structure as a fence, and so with only a little bit of interfacing we can rebase those requests on top of dma-buf fences. This will allow us, in the future, to pass those fences back to userspace or between drivers. v2: The fence_context needs to be globally unique, not just unique to this device. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-4-git-send-email-chris@chris-wilson.co.uk
2016-07-20drm/i915: Move GEM request routines to i915_gem_request.cChris Wilson
Migrate the request operations out of the main body of i915_gem.c and into their own C file for easier expansion. v2: Move __i915_add_request() across as well Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-1-git-send-email-chris@chris-wilson.co.uk