summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/intel_engine_cs.c
AgeCommit message (Collapse)AuthorFilesLines
2018-12-07dma-buf: make fence sequence numbers 64 bit v2Christian König1-1/+1
For a lot of use cases we need 64bit sequence numbers. Currently drivers overload the dma_fence structure to store the additional bits. Stop doing that and make the sequence number in the dma_fence always 64bit. For compatibility with hardware which can do only 32bit sequences the comparisons in __dma_fence_is_later only takes the lower 32bits as significant when the upper 32bits are all zero. v2: change the logic in __dma_fence_is_later Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Link: https://patchwork.freedesktop.org/patch/266927/
2018-11-21drm/i915: Show waiter's status on engine dumpChris Wilson1-2/+4
When showing the list of waiters, include the task's status so that we can tell if they have been woken up and are waiting for the CPU, or if they are still waiting to be woken. v2: task_state_to_char() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181121151653.24595-1-chris@chris-wilson.co.uk
2018-11-16drm/i915/selftests: Workaround an issue with unused lockdep subclassChris Wilson1-1/+1
lockdep insists that if we give a lock a subclass, it must be used. Failure to do so triggers a self-consistency check when reading lockdep_stats: [ 49.902002] DEBUG_LOCKS_WARN_ON(debug_atomic_read(nr_unused_locks) != nr_unused) [ 49.902009] WARNING: CPU: 3 PID: 383 at kernel/locking/lockdep_proc.c:249 lockdep_stats_show+0x984/0xa10 [ 49.902026] Modules linked in: nls_ascii nls_cp437 vfat fat crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_uncore intel_rapl_perf intel_gtt efivars prime_numbers ahci libahci i2c_i801 video button efivarfs [last unloaded: drm_kms_helper] [ 49.902059] CPU: 3 PID: 383 Comm: cat Tainted: G U 4.20.0-rc2+ #304 [ 49.902068] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 [ 49.902079] RIP: 0010:lockdep_stats_show+0x984/0xa10 [ 49.902086] Code: 00 85 c0 0f 84 aa f8 ff ff 8b 05 77 37 e2 00 85 c0 0f 85 9c f8 ff ff 48 c7 c6 e0 57 bc 81 48 c7 c7 28 30 bb 81 e8 6b 77 fa ff <0f> 0b e9 82 f8 ff ff 48 c7 44 24 50 00 00 00 00 45 31 e4 31 db 31 [ 49.902103] RSP: 0018:ffffc90000247d58 EFLAGS: 00010292 [ 49.902110] RAX: 0000000000000044 RBX: 00000000000002f0 RCX: 0000000000000000 [ 49.902118] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffff810b3464 [ 49.902126] RBP: 0000000000000039 R08: 0000000000000002 R09: 0000000000000000 [ 49.902133] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000007ead [ 49.902141] R13: 0000000000000001 R14: ffff88884c021000 R15: 0000000000000097 [ 49.902150] FS: 00007fb347e66540(0000) GS:ffff88885e600000(0000) knlGS:0000000000000000 [ 49.902159] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.902165] CR2: 00007fb347aeb000 CR3: 00000008544bd005 CR4: 00000000001606e0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181115203851.25739-1-chris@chris-wilson.co.uk
2018-10-29drm/i915: Prefer IS_GEN<n> check with bitmask.Rodrigo Vivi1-1/+1
Whenever possible we should stick with IS_GEN<n> checks. Bitmaks has been introduced on commit ae7617f0ef18 ("drm/i915: Allow optimized platform checks") for efficiency. Let's stick with it whenever possible. This patch was generated with coccinelle: spatch -sp_file is_gen.cocci *{c,h} --in-place is_gen.cocci: @gen2@ expression e; @@ -INTEL_GEN(e) == 2 +IS_GEN2(e) @gen3@ expression e; @@ -INTEL_GEN(e) == 3 +IS_GEN3(e) @gen4@ expression e; @@ -INTEL_GEN(e) == 4 +IS_GEN4(e) @gen5@ expression e; @@ -INTEL_GEN(e) == 5 +IS_GEN5(e) @gen6@ expression e; @@ -INTEL_GEN(e) == 6 +IS_GEN6(e) @gen7@ expression e; @@ -INTEL_GEN(e) == 7 +IS_GEN7(e) @gen8@ expression e; @@ -INTEL_GEN(e) == 8 +IS_GEN8(e) @gen9@ expression e; @@ -INTEL_GEN(e) == 9 +IS_GEN9(e) @gen10@ expression e; @@ -INTEL_GEN(e) == 10 +IS_GEN10(e) @gen11@ expression e; @@ -INTEL_GEN(e) == 11 +IS_GEN11(e) Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181026195143.20353-1-rodrigo.vivi@intel.com
2018-10-18drm/i915: GEM_WARN_ON considered harmfulTvrtko Ursulin1-4/+4
GEM_WARN_ON currently has dangerous semantics where it is completely compiled out on !GEM_DEBUG builds. This can leave users who expect it to be more like a WARN_ON, just without a warning in non-debug builds, in complete ignorance. Another gotcha with it is that it cannot be used as a statement. Which is again different from a standard kernel WARN_ON. This patch fixes both problems by making it behave as one would expect. It can now be used both as an expression and as statement, and also the condition evaluates properly in all builds - code under the conditional will therefore not unexpectedly disappear. To satisfy call sites which really want the code under the conditional to completely disappear, we add GEM_DEBUG_WARN_ON and convert some of the callers to it. This one can also be used as both expression and statement. >From the above it follows GEM_DEBUG_WARN_ON should be used in situations where we are certain the condition will be hit during development, but at a place in code where error can be handled to the benefit of not crashing the machine. GEM_WARN_ON on the other hand should be used where condition may happen in production and we just want to distinguish the level of debugging output emitted between the production and debug build. v2: * Dropped BUG_ON hunk. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181012063142.16080-1-tvrtko.ursulin@linux.intel.com
2018-10-17drm/i915: Ensure intel_engine_init_execlist() builds with ClangJani Nikula1-1/+1
Clang build with UBSAN enabled leads to the following build error: drivers/gpu/drm/i915/intel_engine_cs.o: In function `intel_engine_init_execlist': drivers/gpu/drm/i915/intel_engine_cs.c:411: undefined reference to `__compiletime_assert_411' Again, for this to work the code would first need to be inlined and then constant folded, which doesn't work for Clang because semantic analysis happens before optimization/inlining. Use GEM_BUG_ON() instead of BUILD_BUG_ON(). v2: Use is_power_of_2() from log2.h (Chris) References: http://mid.mail-archive.com/20181015203410.155997-1-swboyd@chromium.org Reported-by: Stephen Boyd <swboyd@chromium.org> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181016122938.18757-2-jani.nikula@intel.com
2018-10-11drm/i915: Inject load failure inside intel_engines_init_mmioMichal Wajdeczko1-0/+3
We need extra load failure point to better test error path in i915_driver_init_mmio. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181011130008.24640-2-michal.wajdeczko@intel.com
2018-10-01drm/i915: Combine multiple internal plists into the same i915_priolist bucketChris Wilson1-3/+3
As we are about to allow ourselves to slightly bump the user priority into a few different sublevels, packthose internal priority lists into the same i915_priolist to keep the rbtree compact and avoid having to allocate the default user priority even after the internal bumping. The downside to having an requests[] rather than a node per active list, is that we then have to walk over the empty higher priority lists. To compensate, we track the active buckets and use a small bitmap to skip over any inactive ones. v2: Use MASK of internal levels to simplify our usage. v3: Prevent overflow when SHIFT is zero. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001123204.23982-4-chris@chris-wilson.co.uk
2018-09-26drm/i915: Convert to BITS_PER_TYPEChris Wilson1-1/+1
In commit 9144d75e22ca ("include/linux/bitops.h: introduce BITS_PER_TYPE"), we made BITS_PER_TYPE available to all and now we can use the macro to replace some open-coded computation of sizeof(T) * BITS_PER_BYTE. Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180926104707.17410-1-chris@chris-wilson.co.uk
2018-09-14drm/i915: Flush the tasklet when checking for idleChris Wilson1-0/+3
In order to reduce latency when checking for idle we kick the tasklet directly. Sometimes this is not enough as it is queued on another cpu and so to improve the accuracy of this idle-check (and so to reduce latency overall by avoiding another pass, or worse declaring a timeout!) wait for the tasklet to complete. References: https://bugs.freedesktop.org/show_bug.cgi?id=107916 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-2-chris@chris-wilson.co.uk
2018-09-03drm/i915: Use a cached mapping for the physical HWSChris Wilson1-12/+13
Older gen use a physical address for the hardware status page, for which we use cache-coherent writes. As the writes are into the cpu cache, we use a normal WB mapped page to read the HWS, used for our seqno tracking. Anecdotally, I observed lost breadcrumbs writes into the HWS on i965gm, which so far have not reoccurred with this patch. How reliable that evidence is remains to be seen. v2: Explicitly pass the expected physical address to the hw v3: Also remember the wild writes we once had for HWS above 4G. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180903152304.31589-2-chris@chris-wilson.co.uk
2018-09-03drm/i915: Combine cleanup_status_page()Chris Wilson1-14/+6
Pull the physical status page cleanup into a common cleanup_status_page() for caller simplicity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180903152304.31589-1-chris@chris-wilson.co.uk
2018-08-21drm/i915: Correct CSB probing for engine state dumperChris Wilson1-12/+11
Since we no longer maintain our read position in the CSB pointers register, it always returns 0 and not where we last read up to. As a result the CSB probing in the state dumper starts from 0, either missing entries or showing stale one. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180821101138.15822-1-chris@chris-wilson.co.uk
2018-08-15drm/i915: Clear stop-engine for a pardoned resetChris Wilson1-0/+10
If we pardon a per-engine reset, we may leave the STOP_RING bit asserted in RING_MI_MODE resulting in the engine hanging. Unconditionally clear it on the per-engine exit path as we know that either we skipped the reset and so need the cancellation, or the reset was successful and the cancellation is a no-op, or there was an error and we will follow up with a full-reset or wedging (both of which will stop the engines again as required). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106560 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180814171857.24673-1-chris@chris-wilson.co.uk
2018-08-07drm/i915: Pull seqno started checks togetherChris Wilson1-2/+1
We have a few instances of checking seqno-1 to see if the HW has started the request. Pull those together under a helper. v2: Pull the !seqno assertion higher, as given seqno==1 we may indeed check to see if we have started using seqno==0. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180806112605.20725-1-chris@chris-wilson.co.uk
2018-07-27drm/i915: Eliminate use of PAGE_SIZE as a virtual alignmentChris Wilson1-2/+2
Using PAGE_SIZE for virtual offset alignment is superfluous as it is equal to the minimum gtt alignment and so equivalent to 0. It is also the wrong value to use as we stopped using physical page constructs for the virtual GTT, i.e. it would be preferrable to use I915_GTT_PAGE_SIZE and in these cases merely imply I915_GTT_MIN_ALIGNMENT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180727091855.1879-1-chris@chris-wilson.co.uk
2018-07-24drm/i915: Pull unpin map into vma releaseChris Wilson1-15/+3
A reasonably common operation is to pin the map of the vma alongside the vma itself for the lifetime of the vma, and so release both pins at the same time as destroying the vma. It is common enough to pull into the release function, making that central function more attractive to a couple of other callsites. The continual ulterior motive is to sweep over errors on module load aborting... Testcase: igt/drv_module_reload/basic-reload-inject Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180721125037.20127-1-chris@chris-wilson.co.uk
2018-07-13drm/i915: Do not short-circuit tasklets during resetChris Wilson1-5/+7
Inside intel_engine_is_idle(), we flush the tasklet to ensure that is being run in a timely fashion (ksoftirqd has taught us to expect the worst). However, if we are in the middle of reset, the HW may not yet be ready to execute the submission tasklet and so we must respect the disable flag. Fixes: dd0cf235d81f ("drm/i915: Speed up idle detection by kicking the tasklets") Testcase: igt/drv_selftest/live_hangcheck Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180713203529.1973-2-chris@chris-wilson.co.uk
2018-07-11drm/i915/execlists: Switch to rb_root_cachedChris Wilson1-4/+3
The kernel recently gained an augmented rbtree with the purpose of cacheing the leftmost element of the rbtree, a frequent optimisation to avoid calls to rb_first() which is also employed by the execlists->queue. Switch from our open-coded cache to the library. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180629075348.27358-9-chris@chris-wilson.co.uk
2018-07-07drm/i915: Replace nested subclassing with explicit subclassesChris Wilson1-0/+1
In the next patch, we will want a third distinct class of timeline that may overlap with the current pair of client and engine timeline classes. Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise the different timeline classes with an explicit subclass. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk
2018-07-06drm/i915: Record logical context support in driver capsChris Wilson1-0/+2
Avoid looking at the magical engines[RCS] to decide if the HW and driver supports logical contexts, and instead record that knowledge during initialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706101442.21279-1-chris@chris-wilson.co.uk
2018-07-05drm/i915: Mark expected switch fall-throughsGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 141432 Addresses-Coverity-ID: 141433 Addresses-Coverity-ID: 141434 Addresses-Coverity-ID: 141435 Addresses-Coverity-ID: 141436 Addresses-Coverity-ID: 1357360 Addresses-Coverity-ID: 1357403 Addresses-Coverity-ID: 1357433 Addresses-Coverity-ID: 1392622 Addresses-Coverity-ID: 1415273 Addresses-Coverity-ID: 1435752 Addresses-Coverity-ID: 1441500 Addresses-Coverity-ID: 1454596 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com
2018-06-28drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)Chris Wilson1-4/+4
Back in commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half"), we came to the conclusion that running our CSB processing and ELSP submission from inside the irq handler was a bad idea. A really bad idea as we could impose nearly 1s latency on other users of the system, on average! Deferring our work to a tasklet allowed us to do the processing with irqs enabled, reducing the impact to an average of about 50us. We have since eradicated the use of forcewaked mmio from inside the CSB processing and ELSP submission, bringing the impact down to around 5us (on Kabylake); an order of magnitude better than our measurements 2 years ago on Broadwell and only about 2x worse on average than the gem_syslatency on an unladen system. In this iteration of the tasklet-vs-direct submission debate, we seek a compromise where by we submit new requests immediately to the HW but defer processing the CS interrupt onto a tasklet. We gain the advantage of low-latency and ksoftirqd avoidance when waking up the HW, while avoiding the system-wide starvation of our CS irq-storms. Comparing the impact on the maximum latency observed (that is the time stolen from an RT process) over a 120s interval, repeated several times (using gem_syslatency, similar to RT's cyclictest) while the system is fully laden with i915 nops, we see that direct submission an actually improve the worse case. Maximum latency in microseconds of a third party RT thread (gem_syslatency -t 120 -f 2) x Always using tasklets (a couple of >1000us outliers removed) + Only using tasklets from CS irq, direct submission of requests +------------------------------------------------------------------------+ | + | | + | | + | | + + | | + + + | | + + + + x x x | | +++ + + + x x x x x x | | +++ + ++ + + *x x x x x x | | +++ + ++ + * *x x * x x x | | + +++ + ++ * * +*xxx * x x xx | | * +++ + ++++* *x+**xx+ * x x xxxx x | | **x++++*++**+*x*x****x+ * +x xx xxxx x x | |x* ******+***************++*+***xxxxxx* xx*x xxx + x+| | |__________MA___________| | | |______M__A________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 118 91 186 124 125.28814 16.279137 + 120 92 187 109 112.00833 13.458617 Difference at 95.0% confidence -13.2798 +/- 3.79219 -10.5994% +/- 3.02677% (Student's t, pooled s = 14.9237) However the mean latency is adversely affected: Mean latency in microseconds of a third party RT thread (gem_syslatency -t 120 -f 1) x Always using tasklets + Only using tasklets from CS irq, direct submission of requests +------------------------------------------------------------------------+ | xxxxxx + ++ | | xxxxxx + ++ | | xxxxxx + +++ ++ | | xxxxxxx +++++ ++ | | xxxxxxx +++++ ++ | | xxxxxxx +++++ +++ | | xxxxxxx + ++++++++++ | | xxxxxxxx ++ ++++++++++ | | xxxxxxxx ++ ++++++++++ | | xxxxxxxxxx +++++++++++++++ | | xxxxxxxxxxx x +++++++++++++++ | |x xxxxxxxxxxxxx x + + ++++++++++++++++++ +| | |__A__| | | |____A___| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 3.506 3.727 3.631 3.6321417 0.02773109 + 120 3.834 4.149 4.039 4.0375167 0.041221676 Difference at 95.0% confidence 0.405375 +/- 0.00888913 11.1608% +/- 0.244735% (Student's t, pooled s = 0.03513) However, since the mean latency corresponds to the amount of irqsoff processing we have to do for a CS interrupt, we only need to speed that up to benefit not just system latency but our own throughput. v2: Remember to defer submissions when under reset. v4: Only use direct submission for new requests v5: Be aware that with mixing direct tasklet evaluation and deferred tasklets, we may end up idling before running the deferred tasklet. v6: Remove the redudant likely() from tasklet_is_enabled(), restrict the annotation to reset_in_progress(). v7: Take the full timeline.lock when enabling perf_pmu stats as the tasklet is no longer a valid guard. A consequence is that the stats are now only valid for engines also using the timeline.lock to process state. Testcase: igt/gem_exec_latency/*rthog* References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-9-chris@chris-wilson.co.uk
2018-06-28drm/i915/execlists: Trust the CSBChris Wilson1-6/+2
Now that we use the CSB stored in the CPU friendly HWSP, we do not need to track interrupts for when the mmio CSB registers are valid and can just check where we read up to last from the cached HWSP. This means we can forgo the atomic bit tracking from interrupt, and in the next patch it means we can check the CSB at any time. v2: Change the splitting inside reset_prepare, we only want to lose testing the interrupt in this patch, the next patch requires the change in locking Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-8-chris@chris-wilson.co.uk
2018-06-28drm/i915/execlists: Unify CSB access pointersChris Wilson1-12/+0
Following the removal of the last workarounds, the only CSB mmio access is for the old vGPU interface. The mmio registers presented by vGPU do not require forcewake and can be treated as ordinary volatile memory, i.e. they behave just like the HWSP access just at a different location. We can reduce the CSB access to a set of read/write/buffer pointers and treat the various paths identically and not worry about forcewake. (Forcewake is nightmare for worstcase latency, and we want to process this all with irqsoff -- no latency allowed!) v2: Comments, comments, comments. Well, 2 bonus comments. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-5-chris@chris-wilson.co.uk
2018-06-20drm/i915: Disable bh around call to taskletChris Wilson1-0/+2
The guc submission backends expects to only be run from (at least) softirq context, but during our intel_engine_is_idle() check we would call into the tasklet to make sure it was flushed. As this could occur from process context, occasionally we would be caught out using a wait_for_atomic() not from an atomic context: [ 59.939091] WARN_ON_ONCE((1) && !(preempt_count() != 0)) [ 59.939142] WARNING: CPU: 1 PID: 2901 at drivers/gpu/drm/i915/intel_guc_submission.c:615 guc_submission_tasklet+0x784/0xa90 [i915] [ 59.939143] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core e1000e snd_pcm mei_me mei prime_numbers [ 59.939164] CPU: 1 PID: 2901 Comm: gem_exec_schedu Tainted: G U W 4.18.0-rc1-g93475d62c730-drmtip_67+ #1 [ 59.939165] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3610 03/29/2018 [ 59.939188] RIP: 0010:guc_submission_tasklet+0x784/0xa90 [i915] [ 59.939189] Code: fc ff ff 80 3d 2f 87 11 00 00 0f 85 80 fb ff ff 48 c7 c6 f8 49 40 c0 48 c7 c7 80 41 3e c0 c6 05 14 87 11 00 01 e8 2c ea d6 d3 <0f> 0b e9 5f fb ff ff 8b 46 38 89 cf 31 c7 83 e7 c0 75 08 39 c1 0f [ 59.939253] RSP: 0018:ffffaafe08a03c10 EFLAGS: 00010286 [ 59.939255] RAX: 0000000000000000 RBX: ffff8f9112c246f0 RCX: 0000000000000001 [ 59.939256] RDX: 0000000080000001 RSI: ffffffff95086d8e RDI: 00000000ffffffff [ 59.939257] RBP: ffff8f9112c24680 R08: 000000009517be77 R09: 0000000000000000 [ 59.939258] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8f9112c24700 [ 59.939259] R13: ffff8f9112c24700 R14: 0000000000000000 R15: ffff8f9112c242a8 [ 59.939260] FS: 00007fc2cc7e5980(0000) GS:ffff8f9136c40000(0000) knlGS:0000000000000000 [ 59.939261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.939262] CR2: 00007fc2cc815040 CR3: 000000021f10e003 CR4: 00000000003606e0 [ 59.939263] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 59.939264] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 59.939265] Call Trace: [ 59.939288] ? intel_engine_is_idle+0x64/0x160 [i915] [ 59.939323] ? intel_engine_dump+0x638/0x890 [i915] [ 59.939327] ? seq_printf+0x49/0x70 [ 59.939353] ? i915_engine_info+0xc8/0x100 [i915] [ 59.939356] ? drm_get_color_range_name+0x20/0x20 [ 59.939361] ? seq_read+0xf1/0x470 [ 59.939365] ? trace_hardirqs_on_caller+0xe0/0x1b0 [ 59.939370] ? full_proxy_read+0x51/0x80 [ 59.939389] ? __vfs_read+0x31/0x170 [ 59.939395] ? do_sys_open+0x13b/0x240 [ 59.939398] ? rcu_read_lock_sched_held+0x6f/0x80 [ 59.939401] ? vfs_read+0x9e/0x140 [ 59.939404] ? ksys_read+0x50/0xc0 [ 59.939409] ? do_syscall_64+0x55/0x190 [ 59.939412] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 59.939420] irq event stamp: 552834 [ 59.939422] hardirqs last enabled at (552833): [<ffffffff940fc74c>] console_unlock+0x3fc/0x600 [ 59.939425] hardirqs last disabled at (552834): [<ffffffff94a0111c>] error_entry+0x7c/0x100 [ 59.939451] softirqs last enabled at (552614): [<ffffffffc02e0f53>] i915_request_add+0x2e3/0x7b0 [i915] [ 59.939470] softirqs last disabled at (552604): [<ffffffffc02e0ecb>] i915_request_add+0x25b/0x7b0 [i915] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106977 Fixes: dd0cf235d81f ("drm/i915: Speed up idle detection by kicking the tasklets") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180620135929.23956-1-chris@chris-wilson.co.uk
2018-06-18drm/i915: Fix fallout of fake reset along resumeChris Wilson1-0/+22
commit b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization") and commit 1288786b18f7 ("drm/i915: Move GEM sanitize from resume_early to resume") show the conflicting requirements on the code. We must reset the GPU before trashing live state on a fast resume (hibernation debug, or error paths), but we must only reset our state tracking iff the GPU is reset (or power cycled). This is tricky if we are disabling GPU reset to simulate broken hardware; we reset our state tracking but the GPU is left intact and recovers from its stale state. v2: Again without the assertion for forcewake, no longer required since commit b3ee09a4de33 ("drm/i915/ringbuffer: Fix context restore upon reset") as the contexts are reset from the CS ensuring everything is powered up. Fixes: b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization") Fixes: 1288786b18f7 ("drm/i915: Move GEM sanitize from resume_early to resume") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
2018-06-14drm/i915: Show CCID in engine dumpsChris Wilson1-0/+2
For debugging context issues, knowing what context the GPU is loading/using is helpful. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180614094103.18025-4-chris@chris-wilson.co.uk
2018-06-14drm/i915: Make the hexdump row offset visually distinctChris Wilson1-1/+1
Currently we use %08x for the row offset, and %08x for the binary contents of the buffer. This makes it very easily to confuse the two, so switch to using [%04x] for the start-of-row offset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180614094103.18025-3-chris@chris-wilson.co.uk
2018-06-14drm/i915: Dump the ringbuffer of the active request for debuggingChris Wilson1-5/+36
Sometimes we need to see what instructions we emitted for a request to try and gather a glimmer of insight into what the GPU is doing when it stops responding. v2: Move ring dumping into its own routine Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180614122150.17552-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-06-12drm/i915/ringbuffer: Serialize load of PD_DIRChris Wilson1-2/+3
After triggering the mm switch with a load of PD_DIR, which may be deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands with that load by posting a read of PD_DIR (or else those subsequent commands may access the stale page tables). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180611171825.13678-2-chris@chris-wilson.co.uk
2018-06-11drm/i915/ringbuffer: Fix context restore upon resetChris Wilson1-3/+0
The discovery with trying to enable full-ppgtt was that we were completely failing to the load both the mm and context following the reset. Although we were performing mmio to set the PP_DIR (per-process GTT) and CCID (context), these were taking no effect (the assumption was that this would trigger reload of the context and restore the page tables). It was not until we performed the LRI + MI_SET_CONTEXT in a following context switch would anything occur. Since we are then required to reset the context image and PP_DIR using CS commands, we place those commands into every batch. The hardware should recognise the no-ops and eliminate the expensive context loads, but we still have to pay the cost of using cross-powerwell register writes. In practice, this has no effect on actual context switch times, and only adds a few hundred nanoseconds to no-op switches. We can improve the latter by eliminating the w/a around known no-op switches, but there is an ulterior motive to keeping them. Always emitting the context switch at the beginning of the request (and relying on HW to skip unneeded switches) does have one key advantage. Should we implement request reordering on Haswell, we will not know in advance what the previous executing context was on the GPU and so we would not be able to elide the MI_SET_CONTEXT commands ourselves and always have to emit them. Having our hand forced now actually prepares us for later. Now since that context and mm follow the request, we no longer (and not for a long time since requests took over!) require a trace point to tell when we write the switch into the ring, since it is always. (This is even more important when you remember that simply writing into the ring bears no relation to the current mm.) v2: Sandybridge has to agree to use LRI as well. Testcase: igt/drv_selftests/live_hangcheck Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
2018-06-05drm/i915/gtt: Rename i915_hw_ppgtt base memberChris Wilson1-2/+2
In the near future, I want to subclass gen6_hw_ppgtt as it contains a few specialised members and I wish to add more. To avoid the ugliness of using ppgtt->base.base, rename the i915_hw_ppgtt base member (i915_address_space) as vm, which is our common shorthand for an i915_address_space local. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk
2018-05-24drm/i915/icl: Enable WaProgramMgsrForCorrectSliceSpecificMmioReadsYunwei Zhang1-0/+3
WaProgramMgsrForCorrectSliceSpecificMmioReads applies for Icelake as well. References: HSD#1405586840, BSID#0575 v2: - GEN11 mask is different from its predecessors. (Oscar) - Better separate GEN10 and GEN11. (Oscar) Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Yunwei Zhang <yunwei.zhang@intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1526683232-24753-1-git-send-email-yunwei.zhang@intel.com
2018-05-24drm/i915/cnl: Implement WaProgramMgsrForCorrectSliceSpecificMmioReadsYunwei Zhang1-5/+25
WaProgramMgsrForCorrectSliceSpecificMmioReads dictate that before any MMIO read into Slice/Subslice specific registers, MCR packet control register(0xFDC) needs to be programmed to point to any enabled slice/subslice pair. Otherwise, incorrect value will be returned. However, that means each subsequent MMIO read will be forwarded to a specific slice/subslice combination as read is unicast. This is OK since slice/subslice specific register values are consistent in almost all cases across slice/subslice. There are rare occasions such as INSTDONE that this value will be dependent on slice/subslice combo, in such cases, we need to program 0xFDC and recover this after. This is already covered by read_subslice_reg. Also, 0xFDC will lose its information after TDR/engine reset/power state change. References: HSD#1405586840, BSID#0575 v2: - use fls() instead of find_last_bit() (Chris) - added INTEL_SSEU to extract sseu from device info. (Chris) v3: - rebase on latest tip v5: - Added references (Mika) - Change the ordered of passing arguments and etc. (Ursulin) v7: - Moved WA explanation Comments(Oscar) - Rebased. v8: - Renamed sanitize_mcr to calculate_s_ss_select. (Oscar) - calculate s/ss selector instead of whole mcr. (Oscar) v9: - Updated function name (Oscar) - Remove redundant variables (Oscar) v10: - Separate pre-GEN10 and GEN11 mask. (Oscar) Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Yunwei Zhang <yunwei.zhang@intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1526683197-24656-1-git-send-email-yunwei.zhang@intel.com
2018-05-19drm/i915/execlists: Handle copying default context state for atomic resetChris Wilson1-0/+15
We want to be able to reset the GPU from inside a timer callback (hardirq context). One step requires us to copy the default context state over to the guilty context, which means we need to plan in advance to have that object accessible from within an atomic context. The atomic context prevents us from pinning the object or from peeking into the shmemfs backing store (all may sleep), so we choose to pin the default_state into memory when the engine becomes active. This compromise allows us to swap out the default state when idle, when required. References: 5692251c254a ("drm/i915/lrc: Scrub the GPU state of the guilty hanging request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180518090212.5349-2-chris@chris-wilson.co.uk
2018-05-19drm/i915: Make intel_engine_dump irqsafeChris Wilson1-4/+7
To be useful later, enable intel_engine_dump() to be called from irq context (i.e. using saving and restoring irq start rather than assuming we enter with irqs enabled). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180518090212.5349-1-chris@chris-wilson.co.uk
2018-05-19drm/i915: Speed up idle detection by kicking the taskletsChris Wilson1-3/+12
We rely on ksoftirqd to run in a timely fashion in order to drain the execlists queue. Quite frequently, it does not. In some cases we may see latencies of over 200ms triggering our idle timeouts and forcing us to declare the driver wedged! Thus we can speed up idle detection by bypassing ksoftirqd in these cases and flush our tasklet to confirm if we are indeed still waiting for the ELSP to drain. v2: Put the execlists.first check back; it is required for handling reset! References: https://bugs.freedesktop.org/show_bug.cgi?id=106373 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180506171328.30034-1-chris@chris-wilson.co.uk
2018-05-18drm/i915: Store a pointer to intel_context in i915_requestChris Wilson1-22/+32
To ease the frequent and ugly pointer dance of &request->gem_context->engine[request->engine->id] during request submission, store that pointer as request->hw_context. One major advantage that we will exploit later is that this decouples the logical context state from the engine itself. v2: Set mock_context->ops so we don't crash and burn in selftests. Cleanups from Tvrtko. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
2018-05-18drm/i915: Move fiddling with engine->last_retired_contextChris Wilson1-0/+23
Move the knowledge about resetting the current context tracking on the engine from inside i915_gem_context.c into intel_engine_cs.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-2-chris@chris-wilson.co.uk
2018-05-18drm/i915: Move request->ctx asideChris Wilson1-1/+1
In the next patch, we want to store the intel_context pointer inside i915_request, as it is frequently access via a convoluted dance when submitting the request to hw. Having two context pointers inside i915_request leads to confusion so first rename the existing i915_gem_context pointer to i915_request.gem_context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk
2018-05-17drm/i915: Nul-terminate legacy debug stringChris Wilson1-1/+1
Make sure that when we don't have any scheduler attributes for the request, the string is terminated. Fixes: 247870ac8ea7 ("drm/i915: Build request info on stack before printk") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517152824.11619-1-chris@chris-wilson.co.uk
2018-05-16drm/i915: Stop parking the signaler around resetChris Wilson1-0/+29
We cannot call kthread_park() from softirq context, so let's avoid it entirely during the reset. We wanted to suspend the signaler so that it would not mark a request as complete at the same time as we marked it as being in error. Instead of parking the signaling, stop the engine from advancing so that the GPU doesn't emit the breadcrumb for our chosen "guilty" request. v2: Refactor setting STOP_RING so that we don't have the same code thrice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Michałt Winiarski <michal.winiarski@intel.com> CC: Michel Thierry <michel.thierry@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-8-chris@chris-wilson.co.uk
2018-05-14drm/i915/execlists: Relax CSB force-mmio for VT-dChris Wilson1-8/+0
The original switch to use CSB from the HWSP was plagued by the effect of read ordering on VT-d; we would read the WRITE pointer from the HWSP before it had completed writing the CSB contents. The mystery comes down to the lack of rmb() for correct ordering with respect to the writes from HW, and with that resolved we can remove the VT-d special casing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-3-chris@chris-wilson.co.uk Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2018-05-11Revert "drm/i915/cnl: Use mmio access to context status buffer"Chris Wilson1-3/+0
In the previous patch (to include a rmb() after readig the CSB WRITE pointer from the HWSP) we believe we have fixed the underlying bug, and so can re-enable using the HWSP on Cannolake. This reverts commit 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer"). References: https://bugs.freedesktop.org/show_bug.cgi?id=105888 References: https://bugs.freedesktop.org/show_bug.cgi?id=106185 References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Timo Aaltonen <tjaalton@ubuntu.com> Tested-by: Timo Aaltonen <tjaalton@ubuntu.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-2-chris@chris-wilson.co.uk
2018-05-03drm/i915: Mark the hangcheck as idle when unparking the enginesChris Wilson1-0/+2
As we unpark the engines and are about to begin a new cycle of activity, mark the current status of the hangceck as idle so that we avoid carrying over a stale timestamp/action into the next cycle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-2-chris@chris-wilson.co.uk
2018-05-02drm/i915: Split i915_gem_timeline into individual timelinesChris Wilson1-15/+12
We need to move to a more flexible timeline that doesn't assume one fence context per engine, and so allow for a single timeline to be used across a combination of engines. This means that preallocating a fence context per engine is now a hindrance, and so we want to introduce the singular timeline. From the code perspective, this has the notable advantage of clearing up a lot of mirky semantics and some clumsy pointer chasing. By splitting the timeline up into a single entity rather than an array of per-engine timelines, we can realise the goal of the previous patch of tracking the timeline alongside the ring. v2: Tweak wait_for_idle to stop the compiling thinking that ret may be uninitialised. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
2018-05-02drm/i915: Move timeline from GTT to ringChris Wilson1-1/+2
In the future, we want to move a request between engines. To achieve this, we first realise that we have two timelines in effect here. The first runs through the GTT is required for ordering vma access, which is tracked currently by engine. The second is implied by sequential execution of commands inside the ringbuffer. This timeline is one that maps to userspace's expectations when submitting requests (i.e. given the same context, batch A is executed before batch B). As the rings's timelines map to userspace and the GTT timeline an implementation detail, move the timeline from the GTT into the ring itself (per-context in logical-ring-contexts/execlists, or a global per-engine timeline for the shared ringbuffers in legacy submission. The two timelines are still assumed to be equivalent at the moment (no migrating requests between engines yet) and so we can simply move from one to the other without adding extra ordering. v2: Reinforce that one isn't allowed to mix the engine execution timeline with the client timeline from userspace (on the ring). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
2018-05-02drm/i915: Show ring->start for the ELSP context/request queueChris Wilson1-2/+3
Since the advent of execlists, the HW no longer executes from a single statically assigned ring, but instead switches to a different ring for each context (logical ringbuffer contexts as it is called). So a good way to tally the executing context against what we have queued is by comparing the RING_START register against our requests. Make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502104150.29874-1-chris@chris-wilson.co.uk
2018-04-30drm/i915: Wrap engine->context_pin() and engine->context_unpin()Chris Wilson1-7/+6
Make life easier in upcoming patches by moving the context_pin and context_unpin vfuncs into inline helpers. v2: Fixup mock_engine to mark the context as pinned on use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk