summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/gt/intel_lrc.c
AgeCommit message (Collapse)AuthorFilesLines
2019-12-10drm/i915/gt: Detect if we miss WaIdleLiteRestoreChris Wilson1-25/+21
In order to avoid confusing the HW, we must never submit an empty ring during lite-restore, that is we should always advance the RING_TAIL before submitting to stay ahead of the RING_HEAD. Normally this is prevented by keeping a couple of spare NOPs in the request->wa_tail so that on resubmission we can advance the tail. This relies on the request only being resubmitted once, which is the normal condition as it is seen once for ELSP[1] and then later in ELSP[0]. On preemption, the requests are unwound and the tail reset back to the normal end point (as we know the request is incomplete and therefore its RING_HEAD is even earlier). However, if this w/a should fail we would try and resubmit the request with the RING_TAIL already set to the location of this request's wa_tail potentially causing a GPU hang. We can spot when we do try and incorrectly resubmit without advancing the RING_TAIL and spare any embarrassment by forcing the context restore. In the case of preempt-to-busy, we leave the requests running on the HW while we unwind. As the ring is still live, we cannot rewind our rq->tail without forcing a reload so leave it set to rq->wa_tail and only force a reload if we resubmit after a lite-restore. (Normally, the forced reload will be a part of the preemption event.) Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Closes: https://gitlab.freedesktop.org/drm/intel/issues/673 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: stable@kernel.vger.org Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk (cherry picked from commit 82c69bf58650e644c61aa2bf5100b63a1070fd2f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-12-09drm/i915/gt: Save irqstate around virtual_context_destroyChris Wilson1-2/+3
As virtual_context_destroy() may be called from a request signal, it may be called from inside an irq-off section, and so we need to do a full save/restore of the irq state rather than blindly re-enable irqs upon unlocking. <4> [110.024262] WARNING: inconsistent lock state <4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U <4> [110.024292] -------------------------------- <4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. <4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes: <4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.024592] {IN-HARDIRQ-W} state was registered at: <4> [110.024612] lock_acquire+0xa7/0x1c0 <4> [110.024627] _raw_spin_lock_irqsave+0x33/0x50 <4> [110.024788] intel_engine_breadcrumbs_irq+0x38c/0x600 [i915] <4> [110.024808] irq_work_run_list+0x49/0x70 <4> [110.024824] irq_work_run+0x26/0x50 <4> [110.024839] smp_irq_work_interrupt+0x44/0x1e0 <4> [110.024855] irq_work_interrupt+0xf/0x20 <4> [110.024871] __do_softirq+0xb7/0x47f <4> [110.024885] irq_exit+0xba/0xc0 <4> [110.024898] do_IRQ+0x83/0x160 <4> [110.024910] ret_from_intr+0x0/0x1d <4> [110.024922] irq event stamp: 172864 <4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50 <4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40 <4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f <4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0 <4> [110.025031] other info that might help us debug this: <4> [110.025049] Possible unsafe locking scenario: <4> [110.025065] CPU0 <4> [110.025075] ---- <4> [110.025084] lock(&(&rq->lock)->rlock); <4> [110.025099] <Interrupt> <4> [110.025109] lock(&(&rq->lock)->rlock); <4> [110.025124] *** DEADLOCK *** <4> [110.025144] 4 locks held by kworker/0:0/5: <4> [110.025156] #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025187] #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025219] #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915] <4> [110.025405] #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.025634] stack backtrace: <4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1 <4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017 <4> [110.025856] Workqueue: events engine_retire [i915] <4> [110.025872] Call Trace: <4> [110.025891] dump_stack+0x71/0x9b <4> [110.025907] mark_lock+0x49a/0x500 <4> [110.025926] ? print_shortest_lock_dependencies+0x200/0x200 <4> [110.025946] mark_held_locks+0x49/0x70 <4> [110.025962] ? _raw_spin_unlock_irq+0x24/0x50 <4> [110.025978] lockdep_hardirqs_on+0xa2/0x1c0 <4> [110.025995] _raw_spin_unlock_irq+0x24/0x50 <4> [110.026171] virtual_context_destroy+0xc5/0x2e0 [i915] <4> [110.026376] __active_retire+0xb4/0x290 [i915] <4> [110.026396] dma_fence_signal_locked+0x9e/0x1b0 <4> [110.026613] i915_request_retire+0x451/0x930 [i915] <4> [110.026766] retire_requests+0x4d/0x60 [i915] <4> [110.026919] engine_retire+0x63/0xe0 [i915] Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex") Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205145934.663183-1-chris@chris-wilson.co.uk (cherry picked from commit 6f7ac8285371fb0df58aba861eaab387f79ed04d) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-11-25drm/i915/gt: Schedule request retirement when timeline idlesChris Wilson1-0/+9
The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA") is that it disables RC6 while Skylake (and friends) is active, and we do not consider the GPU idle until all outstanding requests have been retired and the engine switched over to the kernel context. If userspace is idle, this task falls onto our background idle worker, which only runs roughly once a second, meaning that userspace has to have been idle for a couple of seconds before we enable RC6 again. Naturally, this causes us to consume considerably more energy than before as powersaving is effectively disabled while a display server (here's looking at you Xorg) is running. As execlists will get a completion event as each context is completed, we can use this interrupt to queue a retire worker bound to this engine to cleanup idle timelines. We will then immediately notice the idle engine (without userspace intervention or the aid of the background retire worker) and start parking the GPU. Thus during light workloads, we will do much more work to idle the GPU faster... Hopefully with commensurate power saving! v2: Watch context completions and only look at those local to the engine when retiring to reduce the amount of excess work we perform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315 References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA") References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk (cherry picked from commit 4f88f8747fa43c97c3b3712d8d87295ea757cc51) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-11-25drm/i915/execlists: Fixup cancel_port_requests()Chris Wilson1-9/+7
I rushed a last minute correction to cancel_port_requests() to prevent the snooping of *execlists->active as the inflight array was being updated, without noticing we iterated the inflight array starting from active! Oops. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387 Fixes: 97f9af78f38d ("drm/i915/gt: Mark the execlists->active as the primary volatile access") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk (cherry picked from commit da0ef77e1e0ccff703efee82406c629d5c4f4bbb) [Joonas: Fixed Fixes: tag to match drm-intel-next-fixes] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-11-25drm/i915/gt: Mark the execlists->active as the primary volatile accessChris Wilson1-10/+17
Since we want to do a lockless read of the current active request, and that request is written to by process_csb also without serialisation, we need to instruct gcc to take care in reading the pointer itself. Otherwise, we have observed execlists_active() to report 0x40. [ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4 [ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000 [ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 } [ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940 which is impossible! The answer is that as we keep the existing execlists->active pointing into the array as we copy over that array, the unserialised read may see a partial pointer value. Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk (cherry picked from commit 331bf90591573dfe6c8e892239713ef9702f1396) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-11-25drm/i915: Mark up the calling context for intel_wakeref_put()Chris Wilson1-1/+1
Previously, we assumed we could use mutex_trylock() within an atomic context, falling back to a worker if contended. However, such trickery is illegal inside interrupt context, and so we need to always use a worker under such circumstances. As we normally are in process context, we can typically use a plain mutex, and only defer to a work when we know we are being called from an interrupt path. Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk (cherry picked from commit 07779a76ee1f93f930cf697b22be73d16e14f50c) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-11-13drm/i915/execlists: Move reset_active() from schedule-out to schedule-inChris Wilson1-56/+62
The gem_ctx_persistence/smoketest was detecting an odd coherency issue inside the LRC context image; that the address of the ring buffer did not match our associated struct intel_ring. As we set the address into the context image when we pin the ring buffer into place before the context is active, that leaves the question of where did it get overwritten. Either the HW context save occurred after our pin which would imply that our idle barriers are broken, or we overwrote the context image ourselves. It is only in reset_active() where we dabble inside the context image outside of a serialised path from schedule-out; but we could equally perform the operation inside schedule-in which is then fully serialised with the context pin -- and remains serialised by the engine pulse with kill_context(). (The only downside, aside from doing more work inside the engine->active.lock, was the plan to merge all the reset paths into doing their context scrubbing on schedule-out needs more thought.) Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out") Testcase: igt/gem_ctx_persistence/smoketest Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-3-chris@chris-wilson.co.uk (cherry picked from commit 31b61f0ef9af62b6404d8df5dcd2cf58f80c9f53) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-10-31drm/i915: drop lrc header pageDaniele Ceraolo Spurio1-19/+3
Recent GuC binaries (including all the ones we're currently using) don't require this shared area anymore, having moved the relevant entries into the stage pool instead. i915 itself doesn't write anything into it either, so we can safely drop it. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191031013040.25803-1-daniele.ceraolospurio@intel.com
2019-10-29drm/i915/gt: Make timeslice duration configurableChris Wilson1-12/+27
Execlists uses a scheduling quantum (a timeslice) to alternate execution between ready-to-run contexts of equal priority. This ensures that all users (though only if they of equal importance) have the opportunity to run and prevents livelocks where contexts may have implicit ordering due to userspace semaphores. However, not all workloads necessarily benefit from timeslicing and in the extreme some sysadmin may want to disable or reduce the timeslicing granularity. The timeslicing mechanism can be compiled out^W^W disabled (but should DCE!) with ./scripts/config --set-val DRM_I915_TIMESLICE_DURATION 0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191029091632.26281-1-chris@chris-wilson.co.uk
2019-10-28drm/i915/execlists: Use vfunc to check engine submission modeMichal Wajdeczko1-1/+8
While processing CSB there is no need to look at GuC submission settings, just check if engine is configured for execlists mode. While today GuC submission is disabled it's settings are still based on modparam values that might not correctly reflect actual submission status in case of any fallback. Until that is fully fixed, use alternate method to confirm that engine really runs in execlists mode by comparing set_default_submission vfunc. v2: add other immediate use of new helper Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com
2019-10-28drm/i915/execlists: Simply walk back along request timeline on resetChris Wilson1-20/+14
The request's timeline will only contain requests from this context, in order of execution. Therefore, we can simply look back along this timeline to find the currently executing request. If we do find that the current context has completed its last request, that does not imply that all requests are completed in the context, so only advance the ring->head up to the end of the known completions! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191028124125.25176-1-chris@chris-wilson.co.uk
2019-10-26drm/i915/tgl: Adjust the location of RING_MI_MODE in the context imageChris Wilson1-3/+17
The location of RING_MI_MODE (used to stop the ring across resets) moved for Tigerlake. Fixup the new location and include a selftest to verify the location in the default context image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191026082220.32632-1-chris@chris-wilson.co.uk
2019-10-25drm/i915: Move intel_engine_context_in/out into intel_lrc.cTvrtko Ursulin1-0/+55
Intel_lrc.c is the only caller and so to avoid some header file ordering issues in future patches move these two over there. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191025090952.10135-1-tvrtko.ursulin@linux.intel.com
2019-10-24drm/i915/gt: Split intel_ring_submissionChris Wilson1-0/+1
Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
2019-10-23drm/i915/execlists: Cancel banned contexts on schedule-outChris Wilson1-41/+90
On schedule-out (CS completion) of a banned context, scrub the context image so that we do not replay the active payload. The intent is that we skip banned payloads on request submission so that the timeline advancement continues on in the background. However, if we are returning to a preempted request, i915_request_skip() is ineffective and instead we need to patch up the context image so that it continues from the start of the next request. v2: Fixup cancellation so that we only scrub the payload of the active request and do not short-circuit the breadcrumbs (which might cause other contexts to execute out of order). v3: Grammar pass Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-3-chris@chris-wilson.co.uk
2019-10-23drm/i915/execlists: Force preemptionChris Wilson1-8/+85
If the preempted context takes too long to relinquish control, e.g. it is stuck inside a shader with arbitration disabled, evict that context with an engine reset. This ensures that preemptions are reasonably responsive, providing a tighter QoS for the more important context at the cost of flagging unresponsive contexts more frequently (i.e. instead of using an ~10s hangcheck, we now evict at ~100ms). The challenge of lies in picking a timeout that can be reasonably serviced by HW for typical workloads, balancing the existing clients against the needs for responsiveness. Note that coupled with timeslicing, this will lead to rapid GPU "hang" detection with multiple active contexts vying for GPU time. The forced preemption mechanism can be compiled out with ./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-2-chris@chris-wilson.co.uk
2019-10-22drm/i915: Drop assertion that ce->pin_mutex guards state updatesChris Wilson1-16/+0
The actual conditions are that we know the GPU is not accessing the context, and we hold a pin on the context image to allow CPU access. We used a fake lock on ce->pin_mutex so that we could try and use lockdep to assert that access is serialised, but the various different hardirq/softirq contexts where we need to *fake* holding the pin_mutex are causing more trouble. Still it would be nice if we did have a way to reassure ourselves that the direct update to the context image is serialised with GPU execution. In the meantime, stop lockdep complaining about false irq inversions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk
2019-10-18drm/i915/execlists: Don't merely skip submission if maybe timeslicingChris Wilson1-4/+18
Normally, we try and skip submission if ELSP[1] is filled. However, we may desire to enable timeslicing due to the queue priority, even if ELSP[1] itself does not require timeslicing. That is the queue is equal priority to ELSP[0] and higher priority then ELSP[1]. Previously, we would wait until the context switch to preempt the current ELSP[1], but with timeslicing, we want to preempt ELSP[0] and replace it with the queue. In writing the test case, it become quickly apparent that we were also suppressing the tasklet during promotion and so failing to notice when the queue started requiring timeslicing. Fixes: 2229adc81380 ("drm/i915/execlist: Trim immediate timeslice expiry") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191018072027.31948-1-chris@chris-wilson.co.uk
2019-10-16drm/i915/execlist: Trim immediate timeslice expiryChris Wilson1-2/+4
We perform timeslicing immediately upon receipt of a request that may be put into the second ELSP slot. The idea behind this was that since we didn't install the timer if the second ELSP slot was empty, we would not have any idea of how long ELSP[0] had been running and so giving the newcomer a chance on the GPU was fair. However, this causes us extra busy work that we may be able to avoid if we wait a jiffie for the first timeslice as normal. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191016100851.4979-1-chris@chris-wilson.co.uk
2019-10-15drm/i915/tgl: Wa_1607138340Mika Kuoppala1-0/+4
Avoid possible cs hang with semaphores by disabling lite restore. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-11-mika.kuoppala@linux.intel.com
2019-10-15drm/i915/tgl: Wa_1409600907Mika Kuoppala1-0/+4
To avoid possible hang, we need to add depth stall if we flush the depth cache. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-8-mika.kuoppala@linux.intel.com
2019-10-15drm/i915/tgl: Add extra hdc flush workaroundMika Kuoppala1-0/+20
In order to ensure constant caches are invalidated properly with a0, we need extra hdc flush after invalidation. v2: use IS_TGL_REVID (Chris) References: HSDES#1604544889 Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-4-mika.kuoppala@linux.intel.com
2019-10-15drm/i915/tgl: Add HDC Pipeline FlushMika Kuoppala1-1/+3
Add hdc pipeline flush to ensure memory state is coherent in L3 when we are done. v2: Flush also in breadcrumbs (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-3-mika.kuoppala@linux.intel.com
2019-10-15drm/i915/tgl: Include ro parts of l3 to invalidateMika Kuoppala1-0/+1
Aim for completeness and invalidate also the ro parts in l3 cache. This might allow to get rid of the preparser disable/enable workaround on invalidation path. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-2-mika.kuoppala@linux.intel.com
2019-10-15drm/i915/execlists: Clear semaphore immediately upon ELSP promotionChris Wilson1-3/+3
There is no significance to our delay before clearing the semaphore the engine is waiting on, so release it as soon as we acknowledge the CS update following our preemption request. This should allow the GPU to resume work earlier, if it was stuck on the semaphore at the end of a request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191015093204.25693-1-chris@chris-wilson.co.uk
2019-10-14drm/i915/execlists: Assert tasklet is locked for process_csb()Chris Wilson1-0/+7
We rely on only the tasklet being allowed to call into process_csb(), so assert that is locked when we do. As the tasklet uses a simple bitlock, there is no strong lockdep checking so we must make do with a plain assertion that the tasklet is running and assume that we are the tasklet! v2: Fixup intel_gt_sanitize() to prepare each engine for the reset so that the locks are marked as held during the reset v3: Check for existent function pointers for very early sanitisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014121336.30137-1-chris@chris-wilson.co.uk
2019-10-14drm/i915/execlists: Tweak virtual unsubmissionChris Wilson1-3/+3
Since commit e2144503bf3b ("drm/i915: Prevent bonded requests from overtaking each other on preemption") we have restricted requests to run on their chosen engine across preemption events. We can take this restriction into account to know that we will want to resubmit those requests onto the same physical engine, and so can shortcircuit the virtual engine selection process and keep the request on the same engine during unwind. References: e2144503bf3b ("drm/i915: Prevent bonded requests from overtaking each other on preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Ramlingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk
2019-10-12drm/i915: Mark up "sentinel" requestsChris Wilson1-1/+5
Sometimes we want to emit a terminator request, a request that flushes the pipeline and allows no request to come after it. This can be used for a "preempt-to-idle" to ensure that upon processing the context-switch to that request, all other active contexts have been flushed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191012070136.32058-1-chris@chris-wilson.co.uk
2019-10-12drm/i915/execlists: Prevent merging requests with conflicting flagsChris Wilson1-0/+3
We set out-of-bound parameters inside the i915_requests.flags field, such as disabling preemption or marking the end-of-context. We should not coalesce consecutive requests if they have differing instructions as we only inspect the last active request in a context. Thus if we allow a later request to be merged into the same execution context, it will mask any of the earlier flags. References: 2a98f4e65bba ("drm/i915: add infrastructure to hold off preemption on a request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-9-chris@chris-wilson.co.uk
2019-10-11drm/i915/execlists: Only mark incomplete requests as -EIO on cancellingChris Wilson1-2/+6
Only the requests that have not completed do we want to change the status of to signal the -EIO when cancelling the inflight set of requests upon wedging. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011103345.26013-1-chris@chris-wilson.co.uk
2019-10-11drm/i915/execlists: Leave tell-tales as to why pending[] is badChris Wilson1-5/+25
Before we BUG out with bad pending state, leave a telltale as to which test failed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010071434.31195-2-chris@chris-wilson.co.uk
2019-10-10drm/i915/execlists: Mark up expected state during resetChris Wilson1-1/+6
Move the BUG_ON around slightly and add some explanations for each to try and capture the expected state more carefully. We want to compare the expected active state of our bookkeeping as compared to the tracked HW state. References: https://bugs.freedesktop.org/show_bug.cgi?id=111937 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010083242.1387-1-chris@chris-wilson.co.uk
2019-10-10drm/i915/tgl: simplify the lrc register list for !RCSDaniele Ceraolo Spurio1-58/+9
There are small differences between the blitter and the video engines in the xcs context image (e.g. registers 0x200 and 0x204 only exist on the blitter). Since we never explicitly set a value for those register and given that we don't need to update the offsets in the lrc image when we change engine within the class for virtual engine because the HW can handle that, instead of having a separate define for the BCS we can just restrict the programming to the part we're interested in, which is common across the engines. Bspec: 45584 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-2-daniele.ceraolospurio@intel.com
2019-10-10drm/i915/tgl: the BCS engine supports relative MMIODaniele Ceraolo Spurio1-1/+1
The specs don't mention any specific HW limitation on the blitter and manual inspection shows that the HW does set the relative MMIO bit in the LRI of the blitter context image, so we can remove our limitations. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-1-daniele.ceraolospurio@intel.com
2019-10-09drm/i915/execlists: Protect peeking at execlists->activeChris Wilson1-2/+5
Now that we dropped the engine->active.lock serialisation from around process_csb(), direct submission can run concurrently to the interrupt handler. As such execlists->active may be advanced as we dequeue, dropping the reference to the request. We need to employ our RCU request protection to ensure that the request is not freed too early. Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk
2019-10-08drm/i915/execlists: Assign virtual_engine->uncore from first siblingChris Wilson1-0/+1
Copy across the engine->uncore shortcut to the virtual_engine from its first physical engine, similar to the handling of the engine->gt backpointer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191008070342.4045-1-chris@chris-wilson.co.uk
2019-10-07drm/i915/execlists: Fix annotation for decoupling virtual requestChris Wilson1-1/+2
As we may signal a request and take the engine->active.lock within the signaler, the engine submission paths have to use a nested annotation on their requests -- but we guarantee that we can never submit on the same engine as the signaling fence. <4>[ 723.763281] WARNING: possible circular locking dependency detected <4>[ 723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G U <4>[ 723.763288] ------------------------------------------------------ <4>[ 723.763291] gem_exec_await/1388 is trying to acquire lock: <4>[ 723.763294] ffff93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915] <4>[ 723.763378] but task is already holding lock: <4>[ 723.763381] ffff93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915] <4>[ 723.763420] which lock already depends on the new lock. <4>[ 723.763423] the existing dependency chain (in reverse order) is: <4>[ 723.763427] -> #2 (&i915_request_get(rq)->submit/1){-.-.}: <4>[ 723.763434] _raw_spin_lock_irqsave_nested+0x39/0x50 <4>[ 723.763478] __i915_sw_fence_complete+0x1b2/0x250 [i915] <4>[ 723.763513] intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915] <4>[ 723.763600] cs_irq_handler+0x49/0x50 [i915] <4>[ 723.763659] gen11_gt_irq_handler+0x17b/0x280 [i915] <4>[ 723.763690] gen11_irq_handler+0x54/0xf0 [i915] <4>[ 723.763695] __handle_irq_event_percpu+0x41/0x2d0 <4>[ 723.763699] handle_irq_event_percpu+0x2b/0x70 <4>[ 723.763702] handle_irq_event+0x2f/0x50 <4>[ 723.763706] handle_edge_irq+0xee/0x1a0 <4>[ 723.763709] do_IRQ+0x7e/0x160 <4>[ 723.763712] ret_from_intr+0x0/0x1d <4>[ 723.763717] __slab_alloc.isra.28.constprop.33+0x4f/0x70 <4>[ 723.763720] kmem_cache_alloc+0x28d/0x2f0 <4>[ 723.763724] vm_area_dup+0x15/0x40 <4>[ 723.763727] dup_mm+0x2dd/0x550 <4>[ 723.763730] copy_process+0xf21/0x1ef0 <4>[ 723.763734] _do_fork+0x71/0x670 <4>[ 723.763737] __se_sys_clone+0x6e/0xa0 <4>[ 723.763741] do_syscall_64+0x4f/0x210 <4>[ 723.763744] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 723.763747] -> #1 (&(&rq->lock)->rlock#2){-.-.}: <4>[ 723.763752] _raw_spin_lock+0x2a/0x40 <4>[ 723.763789] __unwind_incomplete_requests+0x3eb/0x450 [i915] <4>[ 723.763825] __execlists_submission_tasklet+0x9ec/0x1d60 [i915] <4>[ 723.763864] execlists_submission_tasklet+0x34/0x50 [i915] <4>[ 723.763874] tasklet_action_common.isra.5+0x47/0xb0 <4>[ 723.763878] __do_softirq+0xd8/0x4ae <4>[ 723.763881] irq_exit+0xa9/0xc0 <4>[ 723.763883] smp_apic_timer_interrupt+0xb7/0x280 <4>[ 723.763887] apic_timer_interrupt+0xf/0x20 <4>[ 723.763892] cpuidle_enter_state+0xae/0x450 <4>[ 723.763895] cpuidle_enter+0x24/0x40 <4>[ 723.763899] do_idle+0x1e7/0x250 <4>[ 723.763902] cpu_startup_entry+0x14/0x20 <4>[ 723.763905] start_secondary+0x15f/0x1b0 <4>[ 723.763908] secondary_startup_64+0xa4/0xb0 <4>[ 723.763911] -> #0 (&engine->active.lock){..-.}: <4>[ 723.763916] __lock_acquire+0x15d8/0x1ea0 <4>[ 723.763919] lock_acquire+0xa6/0x1c0 <4>[ 723.763922] _raw_spin_lock_irqsave+0x33/0x50 <4>[ 723.763956] execlists_submit_request+0x2b/0x1e0 [i915] <4>[ 723.764002] submit_notify+0xa8/0x13c [i915] <4>[ 723.764035] __i915_sw_fence_complete+0x81/0x250 [i915] <4>[ 723.764054] i915_sw_fence_wake+0x51/0x64 [i915] <4>[ 723.764054] __i915_sw_fence_complete+0x1ee/0x250 [i915] <4>[ 723.764054] dma_i915_sw_fence_wake_timer+0x14/0x20 [i915] <4>[ 723.764054] dma_fence_signal_locked+0x9e/0x1c0 <4>[ 723.764054] dma_fence_signal+0x1f/0x40 <4>[ 723.764054] vgem_fence_signal_ioctl+0x67/0xc0 [vgem] <4>[ 723.764054] drm_ioctl_kernel+0x83/0xf0 <4>[ 723.764054] drm_ioctl+0x2f3/0x3b0 <4>[ 723.764054] do_vfs_ioctl+0xa0/0x6f0 <4>[ 723.764054] ksys_ioctl+0x35/0x60 <4>[ 723.764054] __x64_sys_ioctl+0x11/0x20 <4>[ 723.764054] do_syscall_64+0x4f/0x210 <4>[ 723.764054] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 723.764054] other info that might help us debug this: <4>[ 723.764054] Chain exists of: &engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1 <4>[ 723.764054] Possible unsafe locking scenario: <4>[ 723.764054] CPU0 CPU1 <4>[ 723.764054] ---- ---- <4>[ 723.764054] lock(&i915_request_get(rq)->submit/1); <4>[ 723.764054] lock(&(&rq->lock)->rlock#2); <4>[ 723.764054] lock(&i915_request_get(rq)->submit/1); <4>[ 723.764054] lock(&engine->active.lock); <4>[ 723.764054] *** DEADLOCK *** Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111862 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004194758.19679-1-chris@chris-wilson.co.uk
2019-10-04drm/i915: Remove logical HW IDChris Wilson1-20/+12
With the introduction of ctx->engines[] we allow multiple logical contexts to be used on the same engine (e.g. with virtual engines). According to bspec, aach logical context requires a unique tag in order for context-switching to occur correctly between them. [Simple experiments show that it is not so easy to trick the HW into performing a lite-restore with matching logical IDs, though my memory from early Broadwell experiments do suggest that it should be generating lite-restores.] We only need to keep a unique tag for the active lifetime of the context, and for as long as we need to identify that context. The HW uses the tag to determine if it should use a lite-restore (why not the LRCA?) and passes the tag back for various status identifies. The only status we need to track is for OA, so when using perf, we assign the specific context a unique tag. v2: Calculate required number of tags to fill ELSP. Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk
2019-10-04drm/i915/execlists: Skip redundant resubmissionChris Wilson1-1/+16
If we unwind the active requests, and on resubmission discover that we intend to preempt the active contexts with themselves, simply skip the ELSP submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-1-chris@chris-wilson.co.uk
2019-10-03drm/i915/selftests: Exercise potential false lite-restoreChris Wilson1-4/+14
If execlists's lite-restore is based on the common GEM context tag rather than the per-intel_context LRCA, then a context switch between two intel_contexts on the same engine derived from the same GEM context will perform a lite-restore instead of a full context switch. We can exploit this by poisoning the ringbuffer of the first context and trying to trick a simple RING_TAIL update (i.e. lite-restore) v2: Also check what happens if preempt ce[0] with ce[1] (both instances on the same engine from the same parent context) [Tvrtko] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191002183459.26614-1-chris@chris-wilson.co.uk
2019-10-01drm/i915: Initialise breadcrumb lists on the virtual engineChris Wilson1-0/+1
With deferring the breadcrumb signalling to the virtual engine (thanks preempt-to-busy) we need to make sure the lists and irq-worker are ready to send a signal. [41958.710544] BUG: kernel NULL pointer dereference, address: 0000000000000000 [41958.710553] #PF: supervisor write access in kernel mode [41958.710556] #PF: error_code(0x0002) - not-present page [41958.710558] PGD 0 P4D 0 [41958.710562] Oops: 0002 [#1] SMP [41958.710565] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U 5.3.0+ #207 [41958.710568] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 [41958.710602] RIP: 0010:i915_request_enable_breadcrumb+0xe1/0x130 [i915] [41958.710605] Code: 8b 44 24 30 48 89 41 08 48 89 08 48 8b 85 98 01 00 00 48 8d 8d 90 01 00 00 48 89 95 98 01 00 00 49 89 4c 24 28 49 89 44 24 30 <48> 89 10 f0 80 4b 30 10 c6 85 88 01 00 00 00 e9 1a ff ff ff 48 83 [41958.710609] RSP: 0018:ffffc90000003de0 EFLAGS: 00010046 [41958.710612] RAX: 0000000000000000 RBX: ffff888735424480 RCX: ffff8887cddb2190 [41958.710614] RDX: ffff8887cddb3570 RSI: ffff888850362190 RDI: ffff8887cddb2188 [41958.710617] RBP: ffff8887cddb2000 R08: ffff8888503624a8 R09: 0000000000000100 [41958.710619] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8887cddb3548 [41958.710622] R13: 0000000000000000 R14: 0000000000000046 R15: ffff888850362070 [41958.710625] FS: 0000000000000000(0000) GS:ffff88885ea00000(0000) knlGS:0000000000000000 [41958.710628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [41958.710630] CR2: 0000000000000000 CR3: 0000000002c09002 CR4: 00000000001606f0 [41958.710633] Call Trace: [41958.710636] <IRQ> [41958.710668] __i915_request_submit+0x12b/0x160 [i915] [41958.710693] virtual_submit_request+0x67/0x120 [i915] [41958.710720] __unwind_incomplete_requests+0x131/0x170 [i915] [41958.710744] execlists_dequeue+0xb40/0xe00 [i915] [41958.710771] execlists_submission_tasklet+0x10f/0x150 [i915] [41958.710776] tasklet_action_common.isra.17+0x41/0xa0 [41958.710781] __do_softirq+0xc8/0x221 [41958.710785] irq_exit+0xa6/0xb0 [41958.710788] smp_apic_timer_interrupt+0x4d/0x80 [41958.710791] apic_timer_interrupt+0xf/0x20 [41958.710794] </IRQ> Fixes: cb2377a919bb ("drm/i915: Fixup preempt-to-busy vs reset of a virtual request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191001103518.9113-1-chris@chris-wilson.co.uk
2019-09-26drm/i915/execlists: Use per-process HWSP as scratchMichał Winiarski1-32/+13
Some of our commands (MI_FLUSH_DW / PIPE_CONTROL) require a post-sync write operation to be performed. Currently we're using dedicated VMA for PIPE_CONTROL and global HWSP for MI_FLUSH_DW. On execlists platforms, each of our contexts has an area that can be used as scratch space. Let's use that instead. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-2-chris@chris-wilson.co.uk
2019-09-25drm/i915/execlists: Simplify gen12_csb_parseChris Wilson1-5/+3
Having decided that we only care about the promotion predicate, we can simplify gen12_csb_parse to simply check whether we need to jump to a new queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190925130845.17952-1-chris@chris-wilson.co.uk
2019-09-24drm/i915/selftests: Verify the LRC register layout between init and HWChris Wilson1-201/+466
Before we submit the first context to HW, we need to construct a valid image of the register state. This layout is defined by the HW and should match the layout generated by HW when it saves the context image. Asserting that this should be equivalent should help avoid any undefined behaviour and verify that we haven't missed anything important! Of course, having insisted that the initial register state within the LRC should match that returned by HW, we need to ensure that it does. v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for constructing the lrc image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk
2019-09-23drm/i915: Prevent bonded requests from overtaking each other on preemptionChris Wilson1-8/+12
Force bonded requests to run on distinct engines so that they cannot be shuffled onto the same engine where timeslicing will reverse the order. A bonded request will often wait on a semaphore signaled by its master, creating an implicit dependency -- if we ignore that implicit dependency and allow the bonded request to run on the same engine and before its master, we will cause a GPU hang. [Whether it will hang the GPU is debatable, we should keep on timeslicing and each timeslice should be "accidentally" counted as forward progress, in which case it should run but at one-half to one-third speed.] We can prevent this inversion by restricting which engines we allow ourselves to jump to upon preemption, i.e. baking in the arrangement established at first execution. (We should also consider capturing the implicit dependency using i915_sched_add_dependency(), but first we need to think about the constraints that requires on the execution/retirement ordering.) Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") Testcase: igt/gem_exec_balancer/bonded-slice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk
2019-09-23drm/i915: Fixup preempt-to-busy vs reset of a virtual requestChris Wilson1-2/+5
Due to the nature of preempt-to-busy the execlists active tracking and the schedule queue may become temporarily desync'ed (between resubmission to HW and its ack from HW). This means that we may have unwound a request and passed it back to the virtual engine, but it is still inflight on the HW and may even result in a GPU hang. If we detect that GPU hang and try to reset, the hanging request->engine will no longer match the current engine, which means that the request is not on the execlists active list and we should not try to find an older incomplete request. Given that we have deduced this must be a request on a virtual engine, it is the single active request in the context and so must be guilty (as the context is still inflight, it is prevented from being executed on another engine as we process the reset). Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk
2019-09-23drm/i915: Fixup preempt-to-busy vs resubmission of a virtual requestChris Wilson1-6/+26
As preempt-to-busy leaves the request on the HW as the resubmission is processed, that request may complete in the background and even cause a second virtual request to enter queue. This second virtual request breaks our "single request in the virtual pipeline" assumptions. Furthermore, as the virtual request may be completed and retired, we lose the reference the virtual engine assumes is held. Normally, just removing the request from the scheduler queue removes it from the engine, but the virtual engine keeps track of its singleton request via its ve->request. This pointer needs protecting with a reference. v2: Drop unnecessary motion of rq->engine = owner Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk
2019-09-23drm/i915/execlists: Refactor -EIO markup of hung requestsChris Wilson1-14/+17
Pull setting -EIO on the hung requests into its own utility function. Having allowed ourselves to short-circuit submission of completed requests, we can now do the mark_eio() prior to submission and avoid some redundant operations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-4-chris@chris-wilson.co.uk
2019-09-23drm/i915: Only enqueue already completed requestsChris Wilson1-25/+41
If we are asked to submit a completed request, just move it onto the active-list without modifying it's payload. If we try to emit the modified payload of a completed request, we risk racing with the ring->head update during retirement which may advance the head past our breadcrumb and so we generate a warning for the emission being behind the RING_HEAD. v2: Commentary for the sneaky, shared responsibility between functions. v3: Spelling mistakes and bonus assertion Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
2019-09-23drm/i915/execlists: Drop redundant list_del_init(&rq->sched.link)Chris Wilson1-1/+0
Since amalgamating the queued and active lists in commit 422d7df4f090 ("drm/i915: Replace engine->timeline with a plain list"), performing a i915_request_submit() will remove the request from the execlists priority queue. References: 422d7df4f090 ("drm/i915: Replace engine->timeline with a plain list") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-2-chris@chris-wilson.co.uk