summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
AgeCommit message (Collapse)AuthorFilesLines
2019-01-02drm/amdgpu/sriov:Correct pfvf exchange logicEmily Deng1-4/+4
The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu reset. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-By: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-02drm/amdgpu/virtual_dce: No need to pin the cursor boEmily Deng1-2/+2
For virtual display feature, no need to pin cursor bo. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-14drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hangwentalou1-4/+6
XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev, but outside req_full_gpu of sriov. It would make sriov hang during reset. Signed-off-by: Wentao Lou <Wentao.Lou@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-12drm/amdgpu: Enable GPU recovery by default for CIAndrey Grodzovsky1-0/+2
I retested Bonaire (gfx7 dGPU) and it works fine. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-05drm/amdgpu/si: fix SI after doorbell reworkAlex Deucher1-1/+2
SI does not use doorbells, move asic doorbell init later asic check. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108920 Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-03drm/amdgpu: Implement concurrent asic reset for XGMI.Andrey Grodzovsky1-5/+39
Use per hive wq to concurrently send reset commands to all nodes in the hive. v2: Switch to system_highpri_wq after dropping dedicated queue. Fix non XGMI code path KASAN error. Stop the hive reset for each node loop if there is a reset failure on any of the nodes. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-03drm/amdgpu: Handle xgmi device removal.Andrey Grodzovsky1-0/+3
XGMI hive has some resources allocted on device init which needs to be deallocated when the device is unregistered. v2: Remove creation of dedicated wq for XGMI hive reset. v3: Use the gmc.xgmi.supported flag Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-30drm/amdgpu: Fix num_doorbell calculation issueOak Zeng1-2/+5
When paging queue is enabled, it use the second page of doorbell. The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the kernel doorbells are in the first page. So with paging queue enabled, the total kernel doorbell range should be original num_doorbell plus one page (0x400 in dword), not *2. Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28drm/amdgpu: Refactor GPU reset for XGMI hive caseAndrey Grodzovsky1-117/+255
For XGMI hive case do reset in steps where each step iterates over all devs in hive. This especially important for asic reset since all PSP FW in hive must come up within a limited time (around 1 sec) to properply negotiate the link. Do this by refactoring amdgpu_device_gpu_recover and amdgpu_device_reset into pre_asic_reset, asic_reset and post_asic_reset functions where is part is exectued for all the GPUs in the hive before going to the next step. v2: Update names for amdgpu_device_lock/unlock functions. v3: Introduce per hive locking to avoid multiple resets for GPUs in same hive. v4: Remove delayed_workqueue()/ttm_bo_unlock_delayed_workqueue() - they are copy & pasted over from radeon and on amdgpu there isn't any reason for that any more. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28drm/amdgpu: Refactor amdgpu_xgmi_add_deviceAndrey Grodzovsky1-0/+2
This is prep work for updating each PSP FW in hive after GPU reset. Split into build topology SW state and update each PSP FW in the hive. Save topology and count of XGMI devices for reuse. v2: Create seperate header for XGMI. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28drm/amdgpu: Use asic specific doorbell index instead of macro definitionOak Zeng1-1/+1
ASIC specific doorbell layout is used instead of enum definition Signed-off-by: Oak Zeng <ozeng@amd.com> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-28drm/amdgpu: Call doorbell index init on device initializationOak Zeng1-3/+5
Also call functioin amdgpu_device_doorbell_init after amdgpu_device_ip_early_init because the former depends on the later to set up asic-specific init_doorbell_index function Signed-off-by: Oak Zeng <ozeng@amd.com> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-20drm/amdgpu: enable paging queue doorbell support v4Philip Yang1-0/+6
Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging queue will break SRIOV, instead we can reserve and map two doorbell pages for amdgpu, paging queues doorbell index use same index as SDMA gfx queues index but on second page. For Vega20, after we change doorbell layout to increase SDMA doorbell for 8 SDMA RLC queues later, we could use new doorbell index for paging queue. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-06drm/amdgpu/psp: initialize xgmi session (v2)Hawking Zhang1-1/+2
Setup and tear down xgmi as part of psp. v2: - make psp_xgmi_terminate static - squash in: drm/amdgpu: only issue xgmi cmd when it is enabled drm/amdgpu/psp: terminate xgmi ta in suspend and hw_fini phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05drm/amdgpu: Refine CSA related functionsRex Zhu1-2/+4
There is no functional changes, Use function arguments for SRIOV special variables which is hardcode in those functions. so we can share those functions in baremetal. Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05drm/amdgpu: Enable default GPU reset for dGPU on gfx8/9 v3Andrey Grodzovsky1-4/+26
After testing looks like these subset of ASICs has GPU reset working for the most part. Enable reset due to job timeout. v2: Switch from GFX version to ASIC type. v3: Fix identation Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-01drm/amdgpu: revert "enable gfxoff in non-sriov and stutter mode by default"Christian König1-2/+0
This is still completely breaking my Raven system. This reverts commit cdf2f910fa969adca1b0e3ad2b487821233dc038. Revert until we sort out the sbios and firmware combinations that work correctly. bug: https://bugs.freedesktop.org/show_bug.cgi?id=108606 Cc: stable@vger.kernel.org # v4.19 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-01drm/amdgpu: Fix skipping hangged job reset during gpu recover.Andrey Grodzovsky1-1/+1
Problem: During GPU recover DAL would hang in amdgpu_pm_compute_clocks->amdgpu_fence_wait_empty Fix: Turns out there was a typo introduced by 3320b8d drm/amdgpu: remove job->ring which caused skipping amdgpu_fence_driver_force_completion and so the hangged job was never force signaled and this would cause the hang later in DAL. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-22drm/amdgpu: Fix null pointer amdgpu_device_fw_loadingEmily Deng1-1/+1
Need to check adev->powerplay.pp_funcs. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10drm/amdgpu: Load fw between hw_init/resume_phase1 and phase2Rex Zhu1-1/+60
Extract the function of fw loading out of powerplay. Do fw loading between hw_init/resuem_phase1 and phase2 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10drm/amdgpu: split ip hw_init into 2 phasesRex Zhu1-13/+53
We need to do some IPs earlier to deal with ordering issues similar to how resume is split into two phases. Will do fw loading via smu/psp between the two phases. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10drm/amdgpu: Split amdgpu_ucode_init/fini_bo into two functionsRex Zhu1-0/+4
1. one is for create/free bo when init/fini 2. one is for fill the bo before fw loading the ucode bo only need to be created when load driver and free when driver unload. when resume/reset, driver only need to re-fill the bo if the bo is allocated in vram. Suggested by Christian. v2: Return error when bo create failed. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10drm/amdgpu: Check late_init status before set cg/pg stateRex Zhu1-2/+2
Fix cg/pg unexpected set in hw init failed case. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-10drm/amdgpu: Refine function amdgpu_device_ip_late_initRex Zhu1-2/+2
1. only call late_init when hw_init successful, so check status.hw instand of status.valid in late_init. 2. set status.late_initialized true if late_init was not implemented. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-09drm/amdgpu: Move gfx flag in_suspend to adevRex Zhu1-0/+3
Move in_suspend flag to adev from gfx, so can be used in other ip blocks, also keep consistent with gpu_in_reset flag. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-10-09drm/amd/powerplay: helper interfaces for MGPU fan boost featureEvan Quan1-0/+41
MGPU fan boost feature is enabled only when two or more dGPUs in the system. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-19drm/amdgpu: fix shadow BO restoringChristian König1-81/+28
Don't grab the reservation lock any more and simplify the handling quite a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-19drm/amdgpu: always recover VRAM during GPU recoveryChristian König1-5/+5
It shouldn't add much overhead and we should make sure that critical VRAM content is always restored. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14drm/amdgpu: simplify Raven, Raven2, and Picasso handlingAlex Deucher1-7/+3
Treat them all as Raven rather than adding a new picasso asic type. This simplifies a lot of code and also handles the case of rv2 chips with the 0x15d8 pci id. It also fixes dmcu fw handling for picasso. Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14drm/amdgpu: add raven2 to gpu_info firmwareFeifei Xu1-1/+5
Add gpu_info firmware for raven2. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14drm/amdgpu: enable gfxoff in non-sriov and stutter mode by defaultKenneth Feng1-0/+2
enable gfxoff in non-sriov and stutter mode by default Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14drm/amd/display/dm: add picasso supportLikun Gao1-0/+1
Add support for picasso to the display manager. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14drm/amdgpu: add soc15 support for picassoLikun Gao1-1/+6
Add the IP blocks, clock and powergating flags, and common clockgating support. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-14drm/amdgpu: add picasso to asic_type enumLikun Gao1-0/+1
Add picasso to amd_asic_type enum and amdgpu_asic_name[]. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10drm/amdgpu : Generate XGMI topology info from driver levelShaoyun Liu1-0/+1
Driver will save an array of XGMI hive info, each hive will have a list of devices that have the same hive ID. Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-10drm/amdgpu: move PSP init prior to IH in gpu resetEmily Deng1-1/+1
since we use PSP to program IH regs now Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-29drm/amdgpu: move amdgpu_device_(vram|gtt)_locationChristian König1-65/+0
Move that into amdgpu_gmc.c since we are really deadling with GMC address space here. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-28drm/amdgpu: move full access into amdgpu_device_ip_suspendYintian Tao1-6/+6
It will be more safe to make full-acess include both phase1 and phase2. Then accessing special registeris wherever at phase1 or phase2 will not block any shutdown and suspend process under virtualization. Signed-off-by: Yintian Tao <yttao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: cleanup GPU recovery check a bit (v2)Christian König1-13/+25
Check if we should call the function instead of providing the forced flag. v2: rebase on KFD changes (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: remove fulll access for suspend phase1Yintian Tao1-6/+0
There is no need for gpu full access for suspend phase1 because under virtualization there is no hw register access for dce block. Signed-off-by: Yintian Tao <yttao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Set clock ungate state when suspend/finiRex Zhu1-51/+5
After set power ungate state, set clock ungate state before when suspend or fini. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Set power ungate state when suspend/finiRex Zhu1-6/+5
Unify to set power ungate state at the begin of suspend/fini. Remove the workaround code for gfx off feature in amdgpu_device.c. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Refine function name and function argsRex Zhu1-20/+19
There are no any logical changes here. 1. change function names: amdgpu_device_ip_late_set_pg/cg_state to amdgpu_device_set_pg/cg_state. 2. add a function argument cg/pg_state, so we can enable/disable cg/pg through those functions Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: use new scheduler load balancing for VMsChristian König1-1/+1
Instead of the fixed round robin use let the scheduler balance the load of page table updates. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Cancel the delay work when suspendRex Zhu1-0/+2
Cancel the delay work to avoid the corner case that ib test was not running when suspend Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Cancel gfx off delay work when driver fini/suspendRex Zhu1-0/+2
there may be gfx off delay work pending when suspend/driver unload, need to cancel them first. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Ctrl gfx off via amdgpu_gfx_off_ctrlRex Zhu1-4/+2
use amdgpu_gfx_off_ctrl function so driver can arbitrate whether the gfx ip can be power off or power on. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Put enable gfx off feature to a delay threadRex Zhu1-0/+15
delay to enable gfx off feature to avoid gfx on/off frequently suggested by Alex and Evan. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: Add amdgpu_gfx_off_ctrl functionRex Zhu1-0/+2
v2: 1. drop the special handling for the hw IP suggested by hawking and Christian. 2. refine the variable name suggested by Flora. This funciton as the entry of gfx off feature. we arbitrat gfx off feature enable/disable in this function. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-21Revert "drm/amdgpu/display: Replace CONFIG_DRM_AMD_DC_DCN1_0 with CONFIG_X86"Leo (Sunpeng) Li1-1/+1
This reverts commit 8624c3c4dbfe24fc6740687236a2e196f5f4bfb0. We need CONFIG_DRM_AMD_DC_DCN1_0 to guard code that is using fp math. Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>