drm/amdgpu: skip reset other device in the same hive if it's SRIOV VF

On SRIOV, the host driver can support FLR (function level reset) on an individual VF within the hive, which might bring that device back to normal without the need to execute a hive reset. If the FLR fails, the host driver will trigger the hive reset, and each guest VF will get a reset notification before the real hive reset is executed. The VF device can handle the reset request individually in its reset work handler.

This change updates the GPU recovery sequence to skip resetting other devices in the same hive for an SRIOV VF.

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
author: Zhigang Luo <zhigang.luo@amd.com> 2021-11-26 12:16:45 -0500
committer: Alex Deucher <alexander.deucher@amd.com> 2021-12-13 16:32:34 -0500
commit: 175ac6ec6bd8db6b7e08fed8fd189bd492015b28 (patch)
tree: 1e49cf8f15ea642a0d108e963e9457a4386f1680 /drivers
parent: 123202744955e62470174fc3ba666a4d98062ea6 (diff)
download: linux-175ac6ec6bd8db6b7e08fed8fd189bd492015b28.tar.bz2
1 files changed, 4 insertions, 3 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a1c14466f23d..25a9e529d62e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4747,7 +4747,7 @@ static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgp
 {
 	struct amdgpu_device *tmp_adev = NULL;
 
-	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+	if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) {
 		if (!hive) {
 			dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes");
 			return -ENODEV;
@@ -4959,7 +4959,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * We always reset all schedulers for device and all devices for XGMI
 	 * hive so that should take care of them too.
 	 */
-	hive = amdgpu_get_xgmi_hive(adev);
+	if (!amdgpu_sriov_vf(adev))
+		hive = amdgpu_get_xgmi_hive(adev);
 	if (hive) {
 		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
 			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
@@ -5000,7 +5001,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * to put adev in the 1st position.
 	 */
 	INIT_LIST_HEAD(&device_list);
-	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+	if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) {
 		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
 			list_add_tail(&tmp_adev->reset_list, &device_list);
 		if (!list_is_first(&adev->reset_list, &device_list))
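
The effect of the three hunks above on the list of devices to reset can be sketched in a standalone model. This is not the driver code: `mock_device`, `mock_hive`, and `mock_build_reset_list` are hypothetical names invented for illustration, the hive is flattened to a plain array instead of a kernel linked list, and the single-device fallback is an assumption, since the `else` branch lies outside the hunks shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device; only the two inputs
 * that the patched conditions test are modeled. */
struct mock_device {
	bool is_sriov_vf;                /* models amdgpu_sriov_vf(adev) */
	unsigned int num_physical_nodes; /* models gmc.xgmi.num_physical_nodes */
};

/* Hypothetical stand-in for an XGMI hive: a flat array of members. */
struct mock_hive {
	struct mock_device **devices;
	size_t count;
};

/*
 * Fill 'out' (capacity 'cap') with the devices a recovery started on
 * 'adev' would reset, with 'adev' in the first position. Returns the
 * number of entries written, or 0 if the hive is required but missing.
 * The list is silently truncated at 'cap' entries.
 */
static size_t mock_build_reset_list(struct mock_device *adev,
				    const struct mock_hive *hive,
				    struct mock_device **out, size_t cap)
{
	size_t n = 0;
	size_t i;

	if (cap == 0)
		return 0;

	/* Same shape as the patched condition: a VF never walks the hive. */
	if (!adev->is_sriov_vf && adev->num_physical_nodes > 1) {
		if (!hive)
			return 0; /* the driver returns -ENODEV in this case */

		out[n++] = adev;
		for (i = 0; i < hive->count; i++) {
			if (hive->devices[i] != adev && n < cap)
				out[n++] = hive->devices[i];
		}
		return n;
	}

	/* Assumed fallback: only the device that hit the fault is reset. */
	out[n++] = adev;
	return n;
}
```

In this model, a bare-metal device in a two-node hive yields a two-entry list with itself first, while a VF reporting the same node count yields a one-entry list. That matches the commit message: the VF leaves the other hive members to their own reset work handlers.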