summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
AgeCommit message (Collapse)AuthorFilesLines
2022-09-29drm/amdgpu: Add amdgpu suspend-resume code path under SRIOVBokun Zhang1-1/+26
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor during the suspend time. Furthermore, we cannot request a mode 1 reset under SRIOV as VF. Therefore, we will skip it as it is called in suspend_noirq() function. - In the resume code path, we need to send REQ_GPU_INIT to the hypervisor and also resume PSP IP block under SRIOV. Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29drm/amdgpu: Use simplified API for p2p dist calcLijo Lazar1-3/+3
Use the simpified API that calculates distance between two devices. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29drm/amdgpu: Disable verbose for p2p dist calcLijo Lazar1-1/+1
Disable verbose while getting p2p distance. With verbose, it shows warning if ACS redirect is set between the devices. Adds noise to dmesg logs when a few GPU devices are on the same platform. Example log: amdgpu 0000:34:00.0: ACS redirect is set between the client and provider (0000:31:00.0) amdgpu 0000:34:00.0: to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=0000:30:00.0;0000:2e:00.0;0000:33:00.0;0000:2e:10.0 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: add gang submit backend v2Christian König1-0/+35
Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdgpu: Fixed psp fence and memory issues when removing amdgpu deviceYiPeng Chai1-1/+4
V3: Fixed psp fence and memory issues for the asic using smu v13_0_2 when removing amdgpu device. [Why]: 1. psp_suspend->psp_free_shared_bufs-> psp_ta_free_shared_buf-> amdgpu_bo_free_kernel-> ...->amdgpu_bo_release_notify-> amdgpu_fill_buffer psp will free vram memory used by psp when psp_suspend is called. But for the asic using smu v13_0_2, because psp_suspend is called before adev->shutdown is set to true when removing the first hive device, amdgpu fill_buffer will be called, which will cause fence issues when evicting all vram resources in amdgpu vram mgr_fini. 2. Since psp_hw_fini is not called after calling psp_suspend and psp_suspend only calls psp_ring_stop, the psp ring memory will not be released when amdgpu device is removed. [How]: 1. Set shutdown to true before calling amdgpu_device_gpu_recover, then amdgpu_fill_buffer will not be called when psp_suspend is called. 2. Free psp ring memory in psp_sw_fini. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdgpu: Adjust removal control flow for smu v13_0_2YiPeng Chai1-0/+30
Adjust removal control flow for smu v13_0_2: During amdgpu uninstallation, when removing the first device, the kernel needs to first send a mode1reset message to all gpu devices. Otherwise, smu initialization will fail the next time amdgpu is installed. V2: 1. Update commit comments. 2. Remove the global variable amdgpu_device_remove_cnt and add a variable to the structure amdgpu_hive_info. 3. Use hive to detect the first removed device instead of a global variable. V3: 1. Update commit comments. 2. Split a patch into multiple patches. 3. The current patch does: a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into the existing gpu recover path, which make all devices in hive list only have HW reset but no resume (except the base IP). b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover in amdgpu_pci_remove when removing the first device in hive list. c. When removing the first device, the IP blocks keyword function call sequence is as follows: .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini. ^ | |-<----------<---------<----| The first three sequences are because of a call to amdgpu_device_gpu_recover. The three sequences will be executed in a loop until all devices in the hive list are iterated. The sequences starting from .hw_fini only apply to the first device. Since .suspend has been called before, except the resumed phase1 basic ip blocks, all other ip blocks .hw_fini of current device will do nothing. d. When removing other devices, the calling sequences is the same as legacy: .hw_fini -> .early_fini -> .sw_fini. Since .suspend has been called when removing the first device, except the resumed phase1 basic ip blocks, all of other ip blocks .hw_fini of current device will do nothing. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-14drm/amdgpu: make sure to init common IP before gmcAlex Deucher1-3/+11
Move common IP init before GMC init so that HDP gets remapped before GMC init which uses it. This fixes the Unsupported Request error reported through AER during driver load. The error happens as a write happens to the remap offset before real remapping is done. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdgpu: Fix hive reference count leakVignesh Chander1-2/+2
both get_xgmi_hive and put_xgmi_hive can be skipped since the reset domain is not necessary for VF Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdgpu: Use per device reset_domain for XGMI on sriov configurationshaoyunl1-9/+11
For SRIOV configuration, host driver control the reset method(either FLR or heavier chain reset). The host will notify the guest individually with FLR message if individual GPU within the hive need to be reset. So for guest side, no need to use hive->reset_domain to replace the original per device reset_domain Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-07drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is ↵YiPeng Chai1-2/+1
uninstalled V1: The psp_cmd_submit_buf function is called by psp_hw_fini to send TA unload messages to psp to terminate ras, asd and tmr. But when amdgpu is uninstalled, drm_dev_unplug is called earlier than psp_hw_fini in amdgpu_pci_remove, the calling order as follows: static void amdgpu_pci_remove(struct pci_dev *pdev) { drm_dev_unplug ...... amdgpu_driver_unload_kms->amdgpu_device_fini_hw->... ->.hw_fini->psp_hw_fini->... ->psp_ta_unload->psp_cmd_submit_buf ...... } The program will return when calling drm_dev_enter in psp_cmd_submit_buf. So the call to drm_dev_enter in psp_cmd_submit_buf should be removed, so that the TA unload messages can be sent to the psp when amdgpu is uninstalled. V2: 1. Restore psp_cmd_submit_buf to its original code. 2. Move drm_dev_unplug call after amdgpu_driver_unload_kms in amdgpu_pci_remove. 3. Since amdgpu_device_fini_hw is called by amdgpu_driver_unload_kms, remove the unplug check to release device mmio resource in amdgpu_device_fini_hw before calling drm_dev_unplug. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: ensure no PCIe peer access for CPU XGMI iolinksAlex Sierra1-1/+2
[Why] Devices with CPU XGMI iolink do not support PCIe peer access. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amd/amdgpu: avoid soft reset check when gpu recovery disabledChengming Gui1-5/+6
Avoid soft reset, even ip hang check (ring/ib test) when gpu recovery disabled. v2: add missing "}" Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: Remove the additional kfd pre reset call for sriovshaoyunl1-2/+0
The additional call is caused by merge conflict Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: fix hive reference leak when adding xgmi deviceYiPeng Chai1-0/+2
Only amdgpu_get_xgmi_hive but no amdgpu_put_xgmi_hive which will leak the hive reference. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amd: Add detailed GFXOFF stats to debugfsAndré Almeida1-0/+2
Add debugfs interface to log GFXOFF statistics: - Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the time of query since system power-up - Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop. Read it to get average GFXOFF residency % multiplied by 100 during the last logging interval. Both features are designed to be keep the values persistent between suspends. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: revert context to stop engine before mode2 resetVictor Zhao1-0/+1
For some hang caused by slow tests, engine cannot be stopped which may cause resume failure after reset. In this case, force halt engine by reverting context addresses Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: let mode2 reset fallback to default when failureVictor Zhao1-1/+6
- introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed v2: move this part out from the asic specific part Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-10drm/amdgpu: Avoid another list of reset devicesLijo Lazar1-0/+2
A list of devices to be reset is already created in amdgpu_device_gpu_recover function. Creating another list with the same nodes is incorrect and not supported in list_head. Instead, pass the device list as part of reset context. Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdgpu: move mes self test after drm sched re-startedJack Xiao1-0/+3
mes self test rely on vm mapping, move it after drm sched re-started so that vm mapping can work during gpu reset. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-and-tested-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdgpu: drop non-necessary call trace dumpEvan Quan1-1/+0
This extra call trace dump comes out in every gpu reset. And it gives people a wrong impression that something went wrong. Although actually there was not. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18drm/amdgpu: Get rid of amdgpu_job->external_hw_fenceAndrey Grodzovsky1-2/+1
This is a follow-up cleanup to [1]. See bellow refcount balancing for calling amdgpu_job_submit_direct after this cleanup as far as I calculated. amdgpu_fence_emit dma_fence_init 1 dma_fence_get(fence) 2 rcu_assign_pointer(*ptr, dma_fence_get(fence) 3 ---> amdgpu_job_submit_direct completes before fence signaled amdgpu_sa_bo_free (*sa_bo)->fence = dma_fence_get(fence) 4 amdgpu_job_free dma_fence_put 3 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 4 dma_fence_put(f); 3 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 2 amdgpu_fence_process dma_fence_put 1 amdgpu_sa_bo_remove_locked dma_fence_put 0 ---> amdgpu_job_submit_direct completes after fence signaled amdgpu_fence_process dma_fence_put 2 amdgpu_job_free dma_fence_put 1 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 2 dma_fence_put(f); 1 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 0 [1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzovsky@amd.com/ Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13drm/amdgpu: support reset flag set for gpu resetLikun Gao1-13/+7
Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, force to use full reset method. Otherwise, try soft reset by default if the related ASIC supportted, if soft reset failed, will use full reset. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29drm/amdgpu: fix documentation warningAlex Deucher1-1/+1
Fixes this issue: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:5094: warning: expecting prototype for amdgpu_device_gpu_recover_imp(). Prototype was for amdgpu_device_gpu_recover() instead Fixes: cf727044144d ("drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover") Reviewed-by: Kent Russell <kent.russell@amd.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29drm/amdgpu: Fix typos in amdgpu_stop_pending_resetsKent Russell1-2/+2
Change amdggpu to amdgpu and pedning to pending Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28drm/amdgpu: Follow up change to previous drm scheduler change.Andrey Grodzovsky1-6/+21
Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced. Also since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop switch to directly checking if job->hw_fence is signaled to short circuit reset if already signed. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Tested-by: Yiqing Yao <yiqing.yao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28drm/amdgpu: Prevent race between late signaled fences and GPU reset.Andrey Grodzovsky1-0/+4
Problem: After we start handling timed out jobs we assume there fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process from a late EOP interrupt. Fix: Before accessing fence array in GPU disable EOP interrupt and flush all pending interrupt handlers for amdgpu device's interrupt line. v2: Switch from irq_get/put to full enable/disable_irq for amdgpu Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-21drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover()Alex Deucher1-1/+1
Use the correct adev variable for the drm_fb_helper in amdgpu_device_gpu_recover(). Noticed by inspection. Fixes: 087451f372bf ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-14drm/amdgpu: remove redundant enable_mes and enable_mes_kiqYifan Zhang1-8/+0
enable_mes and enable_mes_kiq are set in both device init and MES IP init. Leave the ones in MES IP init, since it is a more accurate way to judge from GC IP version. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdgpu: Stop any pending reset if another in progress.Andrey Grodzovsky1-0/+27
We skip rest requests if another one is already in progress. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to ↵Andrey Grodzovsky1-1/+1
amdgpu_device_gpu_recover We removed the wrapper that was queueing the recover function into reset domain queue who was using this name. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdgpu: Add work_struct for GPU reset from kfd.Andrey Grodzovsky1-31/+0
We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdgpu: Cache result of last reset at reset domain level.Andrey Grodzovsky1-2/+4
Will be read by executors of async reset like debugfs. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUsRamesh Errabolu1-0/+31
Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-06drm/amdgpu: adding device coredump supportSomalapuram Amaranath1-0/+64
Added device coredump information: - Kernel version - Module - Time - VRAM status - Guilty process name and PID - GPU register dumps v1 -> v2: Variable name change v1 -> v2: NULL check v1 -> v2: Code alignment v1 -> v2: Adding dummy amdgpu_devcoredump_free v1 -> v2: memset reset_task_info to zero v2 -> v3: add CONFIG_DEV_COREDUMP for variables v2 -> v3: remove NULL check on amdgpu_devcoredump_read Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-06drm/amdgpu: save the reset dump register value for devcoredumpSomalapuram Amaranath1-3/+3
Allocate memory for register value and use the same values for devcoredump. v1 -> v2: Change krealloc_array() to kmalloc_array() v2 -> v3: Fix alignment Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03drm/amdgpu: fix up comment in amdgpu_device_asic_has_dc_support()Alex Deucher1-1/+1
LVDS support was implemented in DC a while ago. Just DAC support is left to do. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03drm/amdgpu: simplify the logic in amdgpu_device_parse_gpu_info_fw()Alex Deucher1-29/+0
Drop all of the extra cases in the default case. Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03amdgpu: amdgpu_device.c: Removed trailing whitespaceMitchell Augustin1-1/+1
Removed trailing whitespace from end of line in amdgpu_device.c Signed-off-by: Mitchell Augustin <kernel@mitchellaugustin.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03drm/amdgpu: simplify amdgpu_device_asic_has_dc_support()Alex Deucher1-26/+0
Drop extra cases in the default case. Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26drm/amdgpu: move amdgpu_gmc_tmz_set after ip_version populatedSunil Khatri1-3/+3
To enable TMZ feature based on IP version needs adev->ip_version populated but its empty. Move amdgpu_gmc_tmz_set to a place where ip_version is populated. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26drm/amdgpu: support ras on SRIOVStanley.Yang1-0/+4
support umc/gfx/sdma ras on guest side Changed from V1: move sriov judgment in amdgpu_ras_interrupt_fatal_error_handler Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-10drm/amdgpu/psp: Add vbflash sysfs interface supportLikun Gao1-0/+10
Add sysfs interface to copy VBIOS. v2: squash in fix for proper vmalloc API (Alex) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-06drm/amdgpu: flush delete wq after wait fenceYiqing Yao1-4/+5
[why] lru_list not empty warning in sw fini during repeated device bind unbind. There should be a amdgpu_fence_wait_empty() before the flush_delayed_work() call as Christian suggested. [how] Move to do flush_delayed_work for ttm bo delayed delete wq after fence_driver_hw_fini. Tested by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yiqing Yao <yiqing.yao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amdgpu: Enable KFD with MES enabledMukul Joshi1-15/+9
Enable KFD initialization with MES enabled. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amdgpu: skip kfd routines when mes enabledJack Xiao1-9/+16
For kfd hasn't supported mes, skip kfd routines. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amdgpu: add mes_kiq module parameter v2Jack Xiao1-2/+7
mes_kiq parameter is used to enable mes kiq pipe. This module parameter is unneccessary or enabled by default in final version. v2: reword commit message. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amdgpu: use the whole doorbell space for mesJack Xiao1-13/+19
Use the whole doorbell space for mes. Each queue in one process occupies one doorbell slot to ring the queue submitting. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-28drm/amdgpu: switch to atomfirmware_asic_initHawking Zhang1-1/+4
Some initial settings now are not available from the atom data table. The assumption that !ps[0] || !ps[1] in amdgpu_atom_asic_init is not valid. In addition, driver needs to strictly follow atomfirmware structure (asic_init_parameters) to initialize parameters used to execute asic_init function, otherwise, the execution of asic_init would fail. This shall be applicable to all soc15 adapters,but let make the transition on soc21 first. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-28drm/amdgpu/discovery: move all table parsing into amdgpu_discovery.cAlex Deucher1-3/+1
This data has no dependencies, so encapsulate it all within amdgpu_discovery.c. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amd/amdgpu: Update PF2VF headerBokun Zhang1-1/+1
- In the latest version of the header, there is a variable name change. This should not cause any backward compatibility since the variable is at the same offset in the struct. Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>