Age | Commit message (Collapse) | Author | Files | Lines |
|
In order to avoid confusing the HW, we must never submit an empty ring
during lite-restore, that is we should always advance the RING_TAIL
before submitting to stay ahead of the RING_HEAD.
Normally this is prevented by keeping a couple of spare NOPs in the
request->wa_tail so that on resubmission we can advance the tail. This
relies on the request only being resubmitted once, which is the normal
condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
preemption, the requests are unwound and the tail reset back to the
normal end point (as we know the request is incomplete and therefore its
RING_HEAD is even earlier).
However, if this w/a should fail we would try and resubmit the request
with the RING_TAIL already set to the location of this request's wa_tail
potentially causing a GPU hang. We can spot when we do try and
incorrectly resubmit without advancing the RING_TAIL and spare any
embarrassment by forcing the context restore.
In the case of preempt-to-busy, we leave the requests running on the HW
while we unwind. As the ring is still live, we cannot rewind our
rq->tail without forcing a reload so leave it set to rq->wa_tail and
only force a reload if we resubmit after a lite-restore. (Normally, the
forced reload will be a part of the preemption event.)
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/673
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@kernel.vger.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk
(cherry picked from commit 82c69bf58650e644c61aa2bf5100b63a1070fd2f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
As virtual_context_destroy() may be called from a request signal, it may
be called from inside an irq-off section, and so we need to do a full
save/restore of the irq state rather than blindly re-enable irqs upon
unlocking.
<4> [110.024262] WARNING: inconsistent lock state
<4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U
<4> [110.024292] --------------------------------
<4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.024592] {IN-HARDIRQ-W} state was registered at:
<4> [110.024612] lock_acquire+0xa7/0x1c0
<4> [110.024627] _raw_spin_lock_irqsave+0x33/0x50
<4> [110.024788] intel_engine_breadcrumbs_irq+0x38c/0x600 [i915]
<4> [110.024808] irq_work_run_list+0x49/0x70
<4> [110.024824] irq_work_run+0x26/0x50
<4> [110.024839] smp_irq_work_interrupt+0x44/0x1e0
<4> [110.024855] irq_work_interrupt+0xf/0x20
<4> [110.024871] __do_softirq+0xb7/0x47f
<4> [110.024885] irq_exit+0xba/0xc0
<4> [110.024898] do_IRQ+0x83/0x160
<4> [110.024910] ret_from_intr+0x0/0x1d
<4> [110.024922] irq event stamp: 172864
<4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50
<4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40
<4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f
<4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0
<4> [110.025031]
other info that might help us debug this:
<4> [110.025049] Possible unsafe locking scenario:
<4> [110.025065] CPU0
<4> [110.025075] ----
<4> [110.025084] lock(&(&rq->lock)->rlock);
<4> [110.025099] <Interrupt>
<4> [110.025109] lock(&(&rq->lock)->rlock);
<4> [110.025124]
*** DEADLOCK ***
<4> [110.025144] 4 locks held by kworker/0:0/5:
<4> [110.025156] #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025187] #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620
<4> [110.025219] #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915]
<4> [110.025405] #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915]
<4> [110.025634]
stack backtrace:
<4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1
<4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [110.025856] Workqueue: events engine_retire [i915]
<4> [110.025872] Call Trace:
<4> [110.025891] dump_stack+0x71/0x9b
<4> [110.025907] mark_lock+0x49a/0x500
<4> [110.025926] ? print_shortest_lock_dependencies+0x200/0x200
<4> [110.025946] mark_held_locks+0x49/0x70
<4> [110.025962] ? _raw_spin_unlock_irq+0x24/0x50
<4> [110.025978] lockdep_hardirqs_on+0xa2/0x1c0
<4> [110.025995] _raw_spin_unlock_irq+0x24/0x50
<4> [110.026171] virtual_context_destroy+0xc5/0x2e0 [i915]
<4> [110.026376] __active_retire+0xb4/0x290 [i915]
<4> [110.026396] dma_fence_signal_locked+0x9e/0x1b0
<4> [110.026613] i915_request_retire+0x451/0x930 [i915]
<4> [110.026766] retire_requests+0x4d/0x60 [i915]
<4> [110.026919] engine_retire+0x63/0xe0 [i915]
Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205145934.663183-1-chris@chris-wilson.co.uk
(cherry picked from commit 6f7ac8285371fb0df58aba861eaab387f79ed04d)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX
corruption WA") is that it disables RC6 while Skylake (and friends) is
active, and we do not consider the GPU idle until all outstanding
requests have been retired and the engine switched over to the kernel
context. If userspace is idle, this task falls onto our background idle
worker, which only runs roughly once a second, meaning that userspace has
to have been idle for a couple of seconds before we enable RC6 again.
Naturally, this causes us to consume considerably more energy than
before as powersaving is effectively disabled while a display server
(here's looking at you Xorg) is running.
As execlists will get a completion event as each context is completed,
we can use this interrupt to queue a retire worker bound to this engine
to cleanup idle timelines. We will then immediately notice the idle
engine (without userspace intervention or the aid of the background
retire worker) and start parking the GPU. Thus during light workloads,
we will do much more work to idle the GPU faster... Hopefully with
commensurate power saving!
v2: Watch context completions and only look at those local to the engine
when retiring to reduce the amount of excess work we perform.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk
(cherry picked from commit 4f88f8747fa43c97c3b3712d8d87295ea757cc51)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
I rushed a last minute correction to cancel_port_requests() to prevent
the snooping of *execlists->active as the inflight array was being
updated, without noticing we iterated the inflight array starting from
active! Oops.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387
Fixes: 97f9af78f38d ("drm/i915/gt: Mark the execlists->active as the primary volatile access")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk
(cherry picked from commit da0ef77e1e0ccff703efee82406c629d5c4f4bbb)
[Joonas: Fixed Fixes: tag to match drm-intel-next-fixes]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
Since we want to do a lockless read of the current active request, and
that request is written to by process_csb also without serialisation, we
need to instruct gcc to take care in reading the pointer itself.
Otherwise, we have observed execlists_active() to report 0x40.
[ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4
[ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
[ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 }
[ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940
which is impossible!
The answer is that as we keep the existing execlists->active pointing
into the array as we copy over that array, the unserialised read may see
a partial pointer value.
Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk
(cherry picked from commit 331bf90591573dfe6c8e892239713ef9702f1396)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
Previously, we assumed we could use mutex_trylock() within an atomic
context, falling back to a worker if contended. However, such trickery
is illegal inside interrupt context, and so we need to always use a
worker under such circumstances. As we normally are in process context,
we can typically use a plain mutex, and only defer to a work when we
know we are being called from an interrupt path.
Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
References: a0855d24fc22d ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk
(cherry picked from commit 07779a76ee1f93f930cf697b22be73d16e14f50c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
The gem_ctx_persistence/smoketest was detecting an odd coherency issue
inside the LRC context image; that the address of the ring buffer did
not match our associated struct intel_ring. As we set the address into
the context image when we pin the ring buffer into place before the
context is active, that leaves the question of where did it get
overwritten. Either the HW context save occurred after our pin which
would imply that our idle barriers are broken, or we overwrote the
context image ourselves. It is only in reset_active() where we dabble
inside the context image outside of a serialised path from schedule-out;
but we could equally perform the operation inside schedule-in which is
then fully serialised with the context pin -- and remains serialised by
the engine pulse with kill_context(). (The only downside, aside from
doing more work inside the engine->active.lock, was the plan to merge
all the reset paths into doing their context scrubbing on schedule-out
needs more thought.)
Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out")
Testcase: igt/gem_ctx_persistence/smoketest
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-3-chris@chris-wilson.co.uk
(cherry picked from commit 31b61f0ef9af62b6404d8df5dcd2cf58f80c9f53)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
Recent GuC binaries (including all the ones we're currently using)
don't require this shared area anymore, having moved the relevant
entries into the stage pool instead. i915 itself doesn't write
anything into it either, so we can safely drop it.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191031013040.25803-1-daniele.ceraolospurio@intel.com
|
|
Execlists uses a scheduling quantum (a timeslice) to alternate execution
between ready-to-run contexts of equal priority. This ensures that all
users (though only if they of equal importance) have the opportunity to
run and prevents livelocks where contexts may have implicit ordering due
to userspace semaphores. However, not all workloads necessarily benefit
from timeslicing and in the extreme some sysadmin may want to disable or
reduce the timeslicing granularity.
The timeslicing mechanism can be compiled out^W^W disabled (but should
DCE!) with
./scripts/config --set-val DRM_I915_TIMESLICE_DURATION 0
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191029091632.26281-1-chris@chris-wilson.co.uk
|
|
While processing CSB there is no need to look at GuC submission
settings, just check if engine is configured for execlists mode.
While today GuC submission is disabled it's settings are still
based on modparam values that might not correctly reflect actual
submission status in case of any fallback. Until that is fully
fixed, use alternate method to confirm that engine really runs in
execlists mode by comparing set_default_submission vfunc.
v2: add other immediate use of new helper
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com
|
|
The request's timeline will only contain requests from this context, in
order of execution. Therefore, we can simply look back along this
timeline to find the currently executing request.
If we do find that the current context has completed its last request,
that does not imply that all requests are completed in the context, so
only advance the ring->head up to the end of the known completions!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191028124125.25176-1-chris@chris-wilson.co.uk
|
|
The location of RING_MI_MODE (used to stop the ring across resets) moved
for Tigerlake. Fixup the new location and include a selftest to verify
the location in the default context image.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191026082220.32632-1-chris@chris-wilson.co.uk
|
|
Intel_lrc.c is the only caller and so to avoid some header file ordering
issues in future patches move these two over there.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191025090952.10135-1-tvrtko.ursulin@linux.intel.com
|
|
Split the legacy submission backend from the common CS ring buffer
handling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
|
|
On schedule-out (CS completion) of a banned context, scrub the context
image so that we do not replay the active payload. The intent is that we
skip banned payloads on request submission so that the timeline
advancement continues on in the background. However, if we are returning
to a preempted request, i915_request_skip() is ineffective and instead we
need to patch up the context image so that it continues from the start
of the next request.
v2: Fixup cancellation so that we only scrub the payload of the active
request and do not short-circuit the breadcrumbs (which might cause
other contexts to execute out of order).
v3: Grammar pass
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-3-chris@chris-wilson.co.uk
|
|
If the preempted context takes too long to relinquish control, e.g. it
is stuck inside a shader with arbitration disabled, evict that context
with an engine reset. This ensures that preemptions are reasonably
responsive, providing a tighter QoS for the more important context at
the cost of flagging unresponsive contexts more frequently (i.e. instead
of using an ~10s hangcheck, we now evict at ~100ms). The challenge of
lies in picking a timeout that can be reasonably serviced by HW for
typical workloads, balancing the existing clients against the needs for
responsiveness.
Note that coupled with timeslicing, this will lead to rapid GPU "hang"
detection with multiple active contexts vying for GPU time.
The forced preemption mechanism can be compiled out with
./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-2-chris@chris-wilson.co.uk
|
|
The actual conditions are that we know the GPU is not accessing the
context, and we hold a pin on the context image to allow CPU access. We
used a fake lock on ce->pin_mutex so that we could try and use lockdep
to assert that access is serialised, but the various different
hardirq/softirq contexts where we need to *fake* holding the pin_mutex
are causing more trouble.
Still it would be nice if we did have a way to reassure ourselves that
the direct update to the context image is serialised with GPU execution.
In the meantime, stop lockdep complaining about false irq inversions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk
|
|
Normally, we try and skip submission if ELSP[1] is filled. However, we
may desire to enable timeslicing due to the queue priority, even if
ELSP[1] itself does not require timeslicing. That is the queue is equal
priority to ELSP[0] and higher priority then ELSP[1]. Previously, we
would wait until the context switch to preempt the current ELSP[1], but
with timeslicing, we want to preempt ELSP[0] and replace it with the
queue.
In writing the test case, it become quickly apparent that we were also
suppressing the tasklet during promotion and so failing to notice when
the queue started requiring timeslicing.
Fixes: 2229adc81380 ("drm/i915/execlist: Trim immediate timeslice expiry")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191018072027.31948-1-chris@chris-wilson.co.uk
|
|
We perform timeslicing immediately upon receipt of a request that may be
put into the second ELSP slot. The idea behind this was that since we
didn't install the timer if the second ELSP slot was empty, we would not
have any idea of how long ELSP[0] had been running and so giving the
newcomer a chance on the GPU was fair. However, this causes us extra
busy work that we may be able to avoid if we wait a jiffie for the first
timeslice as normal.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191016100851.4979-1-chris@chris-wilson.co.uk
|
|
Avoid possible cs hang with semaphores by disabling
lite restore.
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-11-mika.kuoppala@linux.intel.com
|
|
To avoid possible hang, we need to add depth stall if we flush the
depth cache.
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-8-mika.kuoppala@linux.intel.com
|
|
In order to ensure constant caches are invalidated
properly with a0, we need extra hdc flush after invalidation.
v2: use IS_TGL_REVID (Chris)
References: HSDES#1604544889
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-4-mika.kuoppala@linux.intel.com
|
|
Add hdc pipeline flush to ensure memory state is coherent
in L3 when we are done.
v2: Flush also in breadcrumbs (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-3-mika.kuoppala@linux.intel.com
|
|
Aim for completeness and invalidate also the ro parts
in l3 cache. This might allow to get rid of the preparser
disable/enable workaround on invalidation path.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-2-mika.kuoppala@linux.intel.com
|
|
There is no significance to our delay before clearing the semaphore the
engine is waiting on, so release it as soon as we acknowledge the CS
update following our preemption request. This should allow the GPU to
resume work earlier, if it was stuck on the semaphore at the end of a
request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015093204.25693-1-chris@chris-wilson.co.uk
|
|
We rely on only the tasklet being allowed to call into process_csb(), so
assert that is locked when we do. As the tasklet uses a simple bitlock,
there is no strong lockdep checking so we must make do with a plain
assertion that the tasklet is running and assume that we are the
tasklet!
v2: Fixup intel_gt_sanitize() to prepare each engine for the reset so
that the locks are marked as held during the reset
v3: Check for existent function pointers for very early sanitisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191014121336.30137-1-chris@chris-wilson.co.uk
|
|
Since commit e2144503bf3b ("drm/i915: Prevent bonded requests from
overtaking each other on preemption") we have restricted requests to run
on their chosen engine across preemption events. We can take this
restriction into account to know that we will want to resubmit those
requests onto the same physical engine, and so can shortcircuit the
virtual engine selection process and keep the request on the same
engine during unwind.
References: e2144503bf3b ("drm/i915: Prevent bonded requests from overtaking each other on preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ramlingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk
|
|
Sometimes we want to emit a terminator request, a request that flushes
the pipeline and allows no request to come after it. This can be used
for a "preempt-to-idle" to ensure that upon processing the
context-switch to that request, all other active contexts have been
flushed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012070136.32058-1-chris@chris-wilson.co.uk
|
|
We set out-of-bound parameters inside the i915_requests.flags field,
such as disabling preemption or marking the end-of-context. We should
not coalesce consecutive requests if they have differing instructions
as we only inspect the last active request in a context. Thus if we
allow a later request to be merged into the same execution context, it
will mask any of the earlier flags.
References: 2a98f4e65bba ("drm/i915: add infrastructure to hold off preemption on a request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-9-chris@chris-wilson.co.uk
|
|
Only the requests that have not completed do we want to change the
status of to signal the -EIO when cancelling the inflight set of requests
upon wedging.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011103345.26013-1-chris@chris-wilson.co.uk
|
|
Before we BUG out with bad pending state, leave a telltale as to which
test failed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010071434.31195-2-chris@chris-wilson.co.uk
|
|
Move the BUG_ON around slightly and add some explanations for each to
try and capture the expected state more carefully. We want to compare
the expected active state of our bookkeeping as compared to the tracked
HW state.
References: https://bugs.freedesktop.org/show_bug.cgi?id=111937
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010083242.1387-1-chris@chris-wilson.co.uk
|
|
There are small differences between the blitter and the video engines in
the xcs context image (e.g. registers 0x200 and 0x204 only exist on the
blitter). Since we never explicitly set a value for those register and
given that we don't need to update the offsets in the lrc image when we
change engine within the class for virtual engine because the HW can
handle that, instead of having a separate define for the BCS we can
just restrict the programming to the part we're interested in, which is
common across the engines.
Bspec: 45584
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-2-daniele.ceraolospurio@intel.com
|
|
The specs don't mention any specific HW limitation on the blitter and
manual inspection shows that the HW does set the relative MMIO bit in
the LRI of the blitter context image, so we can remove our limitations.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-1-daniele.ceraolospurio@intel.com
|
|
Now that we dropped the engine->active.lock serialisation from around
process_csb(), direct submission can run concurrently to the interrupt
handler. As such execlists->active may be advanced as we dequeue,
dropping the reference to the request. We need to employ our RCU request
protection to ensure that the request is not freed too early.
Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk
|
|
Copy across the engine->uncore shortcut to the virtual_engine from its
first physical engine, similar to the handling of the engine->gt
backpointer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191008070342.4045-1-chris@chris-wilson.co.uk
|
|
As we may signal a request and take the engine->active.lock within the
signaler, the engine submission paths have to use a nested annotation on
their requests -- but we guarantee that we can never submit on the same
engine as the signaling fence.
<4>[ 723.763281] WARNING: possible circular locking dependency detected
<4>[ 723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G U
<4>[ 723.763288] ------------------------------------------------------
<4>[ 723.763291] gem_exec_await/1388 is trying to acquire lock:
<4>[ 723.763294] ffff93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915]
<4>[ 723.763378]
but task is already holding lock:
<4>[ 723.763381] ffff93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 723.763420]
which lock already depends on the new lock.
<4>[ 723.763423]
the existing dependency chain (in reverse order) is:
<4>[ 723.763427]
-> #2 (&i915_request_get(rq)->submit/1){-.-.}:
<4>[ 723.763434] _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[ 723.763478] __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[ 723.763513] intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915]
<4>[ 723.763600] cs_irq_handler+0x49/0x50 [i915]
<4>[ 723.763659] gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[ 723.763690] gen11_irq_handler+0x54/0xf0 [i915]
<4>[ 723.763695] __handle_irq_event_percpu+0x41/0x2d0
<4>[ 723.763699] handle_irq_event_percpu+0x2b/0x70
<4>[ 723.763702] handle_irq_event+0x2f/0x50
<4>[ 723.763706] handle_edge_irq+0xee/0x1a0
<4>[ 723.763709] do_IRQ+0x7e/0x160
<4>[ 723.763712] ret_from_intr+0x0/0x1d
<4>[ 723.763717] __slab_alloc.isra.28.constprop.33+0x4f/0x70
<4>[ 723.763720] kmem_cache_alloc+0x28d/0x2f0
<4>[ 723.763724] vm_area_dup+0x15/0x40
<4>[ 723.763727] dup_mm+0x2dd/0x550
<4>[ 723.763730] copy_process+0xf21/0x1ef0
<4>[ 723.763734] _do_fork+0x71/0x670
<4>[ 723.763737] __se_sys_clone+0x6e/0xa0
<4>[ 723.763741] do_syscall_64+0x4f/0x210
<4>[ 723.763744] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 723.763747]
-> #1 (&(&rq->lock)->rlock#2){-.-.}:
<4>[ 723.763752] _raw_spin_lock+0x2a/0x40
<4>[ 723.763789] __unwind_incomplete_requests+0x3eb/0x450 [i915]
<4>[ 723.763825] __execlists_submission_tasklet+0x9ec/0x1d60 [i915]
<4>[ 723.763864] execlists_submission_tasklet+0x34/0x50 [i915]
<4>[ 723.763874] tasklet_action_common.isra.5+0x47/0xb0
<4>[ 723.763878] __do_softirq+0xd8/0x4ae
<4>[ 723.763881] irq_exit+0xa9/0xc0
<4>[ 723.763883] smp_apic_timer_interrupt+0xb7/0x280
<4>[ 723.763887] apic_timer_interrupt+0xf/0x20
<4>[ 723.763892] cpuidle_enter_state+0xae/0x450
<4>[ 723.763895] cpuidle_enter+0x24/0x40
<4>[ 723.763899] do_idle+0x1e7/0x250
<4>[ 723.763902] cpu_startup_entry+0x14/0x20
<4>[ 723.763905] start_secondary+0x15f/0x1b0
<4>[ 723.763908] secondary_startup_64+0xa4/0xb0
<4>[ 723.763911]
-> #0 (&engine->active.lock){..-.}:
<4>[ 723.763916] __lock_acquire+0x15d8/0x1ea0
<4>[ 723.763919] lock_acquire+0xa6/0x1c0
<4>[ 723.763922] _raw_spin_lock_irqsave+0x33/0x50
<4>[ 723.763956] execlists_submit_request+0x2b/0x1e0 [i915]
<4>[ 723.764002] submit_notify+0xa8/0x13c [i915]
<4>[ 723.764035] __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[ 723.764054] i915_sw_fence_wake+0x51/0x64 [i915]
<4>[ 723.764054] __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[ 723.764054] dma_i915_sw_fence_wake_timer+0x14/0x20 [i915]
<4>[ 723.764054] dma_fence_signal_locked+0x9e/0x1c0
<4>[ 723.764054] dma_fence_signal+0x1f/0x40
<4>[ 723.764054] vgem_fence_signal_ioctl+0x67/0xc0 [vgem]
<4>[ 723.764054] drm_ioctl_kernel+0x83/0xf0
<4>[ 723.764054] drm_ioctl+0x2f3/0x3b0
<4>[ 723.764054] do_vfs_ioctl+0xa0/0x6f0
<4>[ 723.764054] ksys_ioctl+0x35/0x60
<4>[ 723.764054] __x64_sys_ioctl+0x11/0x20
<4>[ 723.764054] do_syscall_64+0x4f/0x210
<4>[ 723.764054] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 723.764054]
other info that might help us debug this:
<4>[ 723.764054] Chain exists of:
&engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1
<4>[ 723.764054] Possible unsafe locking scenario:
<4>[ 723.764054] CPU0 CPU1
<4>[ 723.764054] ---- ----
<4>[ 723.764054] lock(&i915_request_get(rq)->submit/1);
<4>[ 723.764054] lock(&(&rq->lock)->rlock#2);
<4>[ 723.764054] lock(&i915_request_get(rq)->submit/1);
<4>[ 723.764054] lock(&engine->active.lock);
<4>[ 723.764054]
*** DEADLOCK ***
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111862
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004194758.19679-1-chris@chris-wilson.co.uk
|
|
With the introduction of ctx->engines[] we allow multiple logical
contexts to be used on the same engine (e.g. with virtual engines).
According to bspec, aach logical context requires a unique tag in order
for context-switching to occur correctly between them. [Simple
experiments show that it is not so easy to trick the HW into performing
a lite-restore with matching logical IDs, though my memory from early
Broadwell experiments do suggest that it should be generating
lite-restores.]
We only need to keep a unique tag for the active lifetime of the
context, and for as long as we need to identify that context. The HW
uses the tag to determine if it should use a lite-restore (why not the
LRCA?) and passes the tag back for various status identifies. The only
status we need to track is for OA, so when using perf, we assign the
specific context a unique tag.
v2: Calculate required number of tags to fill ELSP.
Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk
|
|
If we unwind the active requests, and on resubmission discover that we
intend to preempt the active contexts with themselves, simply skip the
ELSP submission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-1-chris@chris-wilson.co.uk
|
|
If execlists's lite-restore is based on the common GEM context tag
rather than the per-intel_context LRCA, then a context switch between
two intel_contexts on the same engine derived from the same GEM context
will perform a lite-restore instead of a full context switch. We can
exploit this by poisoning the ringbuffer of the first context and trying
to trick a simple RING_TAIL update (i.e. lite-restore)
v2: Also check what happens if preempt ce[0] with ce[1] (both instances
on the same engine from the same parent context) [Tvrtko]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191002183459.26614-1-chris@chris-wilson.co.uk
|
|
With deferring the breadcrumb signalling to the virtual engine (thanks
preempt-to-busy) we need to make sure the lists and irq-worker are ready
to send a signal.
[41958.710544] BUG: kernel NULL pointer dereference, address: 0000000000000000
[41958.710553] #PF: supervisor write access in kernel mode
[41958.710556] #PF: error_code(0x0002) - not-present page
[41958.710558] PGD 0 P4D 0
[41958.710562] Oops: 0002 [#1] SMP
[41958.710565] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U 5.3.0+ #207
[41958.710568] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[41958.710602] RIP: 0010:i915_request_enable_breadcrumb+0xe1/0x130 [i915]
[41958.710605] Code: 8b 44 24 30 48 89 41 08 48 89 08 48 8b 85 98 01 00 00 48 8d 8d 90 01 00 00 48 89 95 98 01 00 00 49 89 4c 24 28 49 89 44 24 30 <48> 89 10 f0 80 4b 30 10 c6 85 88 01 00 00 00 e9 1a ff ff ff 48 83
[41958.710609] RSP: 0018:ffffc90000003de0 EFLAGS: 00010046
[41958.710612] RAX: 0000000000000000 RBX: ffff888735424480 RCX: ffff8887cddb2190
[41958.710614] RDX: ffff8887cddb3570 RSI: ffff888850362190 RDI: ffff8887cddb2188
[41958.710617] RBP: ffff8887cddb2000 R08: ffff8888503624a8 R09: 0000000000000100
[41958.710619] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8887cddb3548
[41958.710622] R13: 0000000000000000 R14: 0000000000000046 R15: ffff888850362070
[41958.710625] FS: 0000000000000000(0000) GS:ffff88885ea00000(0000) knlGS:0000000000000000
[41958.710628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41958.710630] CR2: 0000000000000000 CR3: 0000000002c09002 CR4: 00000000001606f0
[41958.710633] Call Trace:
[41958.710636] <IRQ>
[41958.710668] __i915_request_submit+0x12b/0x160 [i915]
[41958.710693] virtual_submit_request+0x67/0x120 [i915]
[41958.710720] __unwind_incomplete_requests+0x131/0x170 [i915]
[41958.710744] execlists_dequeue+0xb40/0xe00 [i915]
[41958.710771] execlists_submission_tasklet+0x10f/0x150 [i915]
[41958.710776] tasklet_action_common.isra.17+0x41/0xa0
[41958.710781] __do_softirq+0xc8/0x221
[41958.710785] irq_exit+0xa6/0xb0
[41958.710788] smp_apic_timer_interrupt+0x4d/0x80
[41958.710791] apic_timer_interrupt+0xf/0x20
[41958.710794] </IRQ>
Fixes: cb2377a919bb ("drm/i915: Fixup preempt-to-busy vs reset of a virtual request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191001103518.9113-1-chris@chris-wilson.co.uk
|
|
Some of our commands (MI_FLUSH_DW / PIPE_CONTROL) require a post-sync write
operation to be performed. Currently we're using dedicated VMA for
PIPE_CONTROL and global HWSP for MI_FLUSH_DW.
On execlists platforms, each of our contexts has an area that can be
used as scratch space. Let's use that instead.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-2-chris@chris-wilson.co.uk
|
|
Having decided that we only care about the promotion predicate, we can
simplify gen12_csb_parse to simply check whether we need to jump to a
new queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190925130845.17952-1-chris@chris-wilson.co.uk
|
|
Before we submit the first context to HW, we need to construct a valid
image of the register state. This layout is defined by the HW and should
match the layout generated by HW when it saves the context image.
Asserting that this should be equivalent should help avoid any undefined
behaviour and verify that we haven't missed anything important!
Of course, having insisted that the initial register state within the
LRC should match that returned by HW, we need to ensure that it does.
v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for
constructing the lrc image.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk
|
|
Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang. [Whether it will hang the GPU is
debatable, we should keep on timeslicing and each timeslice should be
"accidentally" counted as forward progress, in which case it should run
but at one-half to one-third speed.]
We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)
Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Testcase: igt/gem_exec_balancer/bonded-slice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk
|
|
Due to the nature of preempt-to-busy the execlists active tracking and
the schedule queue may become temporarily desync'ed (between resubmission
to HW and its ack from HW). This means that we may have unwound a
request and passed it back to the virtual engine, but it is still
inflight on the HW and may even result in a GPU hang. If we detect that
GPU hang and try to reset, the hanging request->engine will no longer
match the current engine, which means that the request is not on the
execlists active list and we should not try to find an older incomplete
request. Given that we have deduced this must be a request on a virtual
engine, it is the single active request in the context and so must be
guilty (as the context is still inflight, it is prevented from being
executed on another engine as we process the reset).
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk
|
|
As preempt-to-busy leaves the request on the HW as the resubmission is
processed, that request may complete in the background and even cause a
second virtual request to enter queue. This second virtual request
breaks our "single request in the virtual pipeline" assumptions.
Furthermore, as the virtual request may be completed and retired, we
lose the reference the virtual engine assumes is held. Normally, just
removing the request from the scheduler queue removes it from the
engine, but the virtual engine keeps track of its singleton request via
its ve->request. This pointer needs protecting with a reference.
v2: Drop unnecessary motion of rq->engine = owner
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk
|
|
Pull setting -EIO on the hung requests into its own utility function.
Having allowed ourselves to short-circuit submission of completed
requests, we can now do the mark_eio() prior to submission and avoid
some redundant operations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-4-chris@chris-wilson.co.uk
|
|
If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.
v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
|
|
Since amalgamating the queued and active lists in commit 422d7df4f090
("drm/i915: Replace engine->timeline with a plain list"), performing a
i915_request_submit() will remove the request from the execlists
priority queue.
References: 422d7df4f090 ("drm/i915: Replace engine->timeline with a plain list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-2-chris@chris-wilson.co.uk
|