summaryrefslogtreecommitdiffstats
path: root/Documentation/driver-api
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2021-04-28 10:01:40 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2021-04-28 10:01:40 -0700
commit68a32ba14177d4a21c4a9a941cf1d7aea86d436f (patch)
tree945c20860766c22b19d1806d5b5db5b37bc65b65 /Documentation/driver-api
parent3aa139aa9fdc138a84243dc49dc18d9b40e1c6e4 (diff)
parenta1a1ca70deb3ec600eeabb21de7f3f48aaae5695 (diff)
downloadlinux-68a32ba14177d4a21c4a9a941cf1d7aea86d436f.tar.bz2
Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie: "The usual lots of work all over the place. i915 has gotten some Alderlake work and prelim DG1 code, along with a major locking rework over the GEM code, and brings back the property of timing out long running jobs using a watchdog. amdgpu has some Alderbran support (new GPU), freesync HDMI support along with a lot other fixes. Outside of the drm, there is a new printf specifier added which should have all the correct acks/sobs: - printk fourcc modifier support added %p4cc Summary: core: - drm_crtc_commit_wait - atomic plane state helpers reworked for full state - dma-buf heaps API rework - edid: rework and improvements for displayid dp-mst: - better topology logging bridge: - Chipone ICN6211 - Lontium LT8912B - anx7625 regulator support panel: - fix lt9611 4k panels handling simple-kms: - add plane state helpers ttm: - debugfs support - removal of unused sysfs - ignore signaled moved fences - ioremap buffer according to mem caching i915: - Alderlake S enablement - Conversion to dma_resv_locking - Bring back watchdog timeout support - legacy ioctl cleanups - add GEM TDDO and RFC process - DG1 LMEM preparation work - intel_display.c refactoring - Gen9/TGL PCH combination support - eDP MSO Support - multiple PSR instance support - Link training debug updates - Disable PSR2 support on JSL/EHL - DDR5/LPDDR5 support for bw calcs - LSPCON limited to gen9/10 platforms - HSW/BDW async flip/VTd corruption workaround - SAGV watermark fixes - SNB hard hang on ring resume fix - Limit imported dma-buf size - move to use new tasklet API - refactor KBL/TGL/ADL-S display/gt steppings - refactoring legacy DP/HDMI, FB plane code out amdgpu: - uapi: add ioctl to query video capabilities - Iniital AMD Freesync HDMI support - Initial Adebaran support - 10bpc dithering improvements - DCN secure display support - Drop legacy IO BAR requirements - PCIE/S0ix/RAS/Prime/Reset fixes - Display ASSR support - SMU gfx busy queues for RV/PCO - Initial LTTPR display work amdkfd: - MMU notifier fixes - APU fixes radeon: - debugfs cleanps - fw error handling ifix - Flexible array cleanups msm: - big DSI phy/pll cleanup - sc7280 initial support - commong bandwidth scaling path - shrinker locking contention fixes - unpin/swap support for GEM objcets ast: - cursor plane handling reworked tegra: - don't register DP AUX channels before connectors zynqmp: - fix OOB struct padding memset gma500: - drop ttm and medfield support exynos: - request_irq cleanup function mediatek: - fine tune line time for EOTp - MT8192 dpi support - atomic crtc config updates - don't support HDMI connector creation mxsdb: - imx8mm support panfrost: - MMU IRQ handling rework qxl: - locking fixes - resource deallocation changes sun4i: - add alpha properties to UI/VI layers vc4: - RPi4 CEC support vmwgfx: - doc cleanups arc: - moved to drm/tiny" * tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm: (1390 commits) drm/ttm: Don't count pages in SG BOs against pages_limit drm/ttm: fix return value check drm/bridge: lt8912b: fix incorrect handling of of_* return values drm: bridge: fix LONTIUM use of mipi_dsi_() functions drm: bridge: fix ANX7625 use of mipi_dsi_() functions drm/amdgpu: page retire over debugfs mechanism drm/radeon: Fix a missing check bug in radeon_dp_mst_detect() drm/amd/display: Fix the Wunused-function warning drm/radeon/r600: Fix variables that are not used after assignment drm/amdgpu/smu7: fix CAC setting on TOPAZ drm/amd/display: Update DCN302 SR Exit Latency drm/amdgpu: enable ras eeprom on aldebaran drm/amdgpu: RAS harvest on driver load drm/amdgpu: add ras aldebaran ras eeprom driver drm/amd/pm: increase time out value when sending msg to SMU drm/amdgpu: add DMUB outbox event IRQ source define/complete/debug flag drm/amd/pm: add the callback to get vbios bootup values for vangogh drm/radeon: Fix size overflow drm/amdgpu: Fix size overflow drm/amdgpu: move mmhub ras_func init to ip specific file ...
Diffstat (limited to 'Documentation/driver-api')
-rw-r--r--Documentation/driver-api/dma-buf.rst76
1 files changed, 76 insertions, 0 deletions
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index a2133d69872c..7f37ec30d9fd 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -257,3 +257,79 @@ fences in the kernel. This means:
userspace is allowed to use userspace fencing or long running compute
workloads. This also means no implicit fencing for shared buffers in these
cases.
+
+Recoverable Hardware Page Faults Implications
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Modern hardware supports recoverable page faults, which has a lot of
+implications for DMA fences.
+
+First, a pending page fault obviously holds up the work that's running on the
+accelerator and a memory allocation is usually required to resolve the fault.
+But memory allocations are not allowed to gate completion of DMA fences, which
+means any workload using recoverable page faults cannot use DMA fences for
+synchronization. Synchronization fences controlled by userspace must be used
+instead.
+
+On GPUs this poses a problem, because current desktop compositor protocols on
+Linux rely on DMA fences, which means without an entirely new userspace stack
+built on top of userspace fences, they cannot benefit from recoverable page
+faults. Specifically this means implicit synchronization will not be possible.
+The exception is when page faults are only used as migration hints and never to
+on-demand fill a memory request. For now this means recoverable page
+faults on GPUs are limited to pure compute workloads.
+
+Furthermore GPUs usually have shared resources between the 3D rendering and
+compute side, like compute units or command submission engines. If both a 3D
+job with a DMA fence and a compute workload using recoverable page faults are
+pending they could deadlock:
+
+- The 3D workload might need to wait for the compute job to finish and release
+ hardware resources first.
+
+- The compute workload might be stuck in a page fault, because the memory
+ allocation is waiting for the DMA fence of the 3D workload to complete.
+
+There are a few options to prevent this problem, one of which drivers need to
+ensure:
+
+- Compute workloads can always be preempted, even when a page fault is pending
+ and not yet repaired. Not all hardware supports this.
+
+- DMA fence workloads and workloads which need page fault handling have
+ independent hardware resources to guarantee forward progress. This could be
+ achieved through e.g. through dedicated engines and minimal compute unit
+ reservations for DMA fence workloads.
+
+- The reservation approach could be further refined by only reserving the
+ hardware resources for DMA fence workloads when they are in-flight. This must
+ cover the time from when the DMA fence is visible to other threads up to
+ moment when fence is completed through dma_fence_signal().
+
+- As a last resort, if the hardware provides no useful reservation mechanics,
+ all workloads must be flushed from the GPU when switching between jobs
+ requiring DMA fences or jobs requiring page fault handling: This means all DMA
+ fences must complete before a compute job with page fault handling can be
+ inserted into the scheduler queue. And vice versa, before a DMA fence can be
+ made visible anywhere in the system, all compute workloads must be preempted
+ to guarantee all pending GPU page faults are flushed.
+
+- Only a fairly theoretical option would be to untangle these dependencies when
+ allocating memory to repair hardware page faults, either through separate
+ memory blocks or runtime tracking of the full dependency graph of all DMA
+ fences. This results very wide impact on the kernel, since resolving the page
+ on the CPU side can itself involve a page fault. It is much more feasible and
+ robust to limit the impact of handling hardware page faults to the specific
+ driver.
+
+Note that workloads that run on independent hardware like copy engines or other
+GPUs do not have any impact. This allows us to keep using DMA fences internally
+in the kernel even for resolving hardware page faults, e.g. by using copy
+engines to clear or copy memory needed to resolve the page fault.
+
+In some ways this page fault problem is a special case of the `Infinite DMA
+Fences` discussions: Infinite fences from compute workloads are allowed to
+depend on DMA fences, but not the other way around. And not even the page fault
+problem is new, because some other CPU thread in userspace might
+hit a page fault which holds up a userspace fence - supporting page faults on
+GPUs doesn't anything fundamentally new.