author | Monk Liu <Monk.Liu@amd.com> | 2021-09-01 08:46:46 +0800 |
---|---|---|
committer | Andrey Grodzovsky <andrey.grodzovsky@amd.com> | 2021-09-15 10:21:30 -0400 |
commit | bcf26654a38f8e55ecac4635dac2e72c161d0063 (patch) | |
tree | 20eb90439123abf55821c744dd05e7974206a743 /drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c | |
parent | 282abb5a1f381d0ec10b20893961563be174a1c3 (diff) | |
download | linux-bcf26654a38f8e55ecac4635dac2e72c161d0063.tar.bz2 |
drm/sched: fix the bug of time out calculation(v4)
issue:
in cleanup_job, cancel_delayed_work cancels a timeout (TO) timer
even if its corresponding job is still running.
fix:
do not cancel the timer in cleanup_job; instead, cancel it
only when the head job has signaled, and if there is a "next" job,
call start_timeout again.
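
The fix described above can be sketched as a small userspace model. This is an illustration only, not the kernel code: every name here (model_job, model_sched, model_push_job, model_get_cleanup_job, and so on) is hypothetical, and a boolean flag stands in for the delayed timeout work. It assumes that submitting a job arms the timer only when it is not already armed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct model_job {
	bool signaled;            /* has the hardware finished this job? */
	struct model_job *next;   /* next job in submission order */
};

struct model_sched {
	struct model_job *head;      /* oldest job still pending */
	bool timer_armed;            /* stands in for the delayed timeout work */
	unsigned int timer_starts;   /* how many times the timer was (re)armed */
};

static void model_start_timeout(struct model_sched *s)
{
	s->timer_armed = true;
	s->timer_starts++;
}

static void model_cancel_timeout(struct model_sched *s)
{
	s->timer_armed = false;
}

/* Append a job; arm the timer only if it is not already armed (assumption). */
static void model_push_job(struct model_sched *s, struct model_job *job)
{
	struct model_job **link = &s->head;

	while (*link)
		link = &(*link)->next;
	job->next = NULL;
	*link = job;

	if (!s->timer_armed)
		model_start_timeout(s);
}

/*
 * Return the head job if it has signaled, else NULL.
 * The timer is cancelled only when the head job has signaled, and it is
 * re-armed if another job is still pending behind it.
 */
static struct model_job *model_get_cleanup_job(struct model_sched *s)
{
	struct model_job *job = s->head;

	if (!job || !job->signaled)
		return NULL;   /* head job still running: leave its timer alone */

	s->head = job->next;
	job->next = NULL;

	model_cancel_timeout(s);
	if (s->head)
		model_start_timeout(s);   /* fresh timeout for the "next" job */

	return job;
}
```

In this model, calling model_get_cleanup_job while the head job is still running leaves the timer armed, which is the behaviour the issue description says was missing. When the last pending job signals, the timer is cancelled and not re-armed.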
v2:
further clean up the logic, and cancel the TDR timer if the signaled job
is the last one in its scheduler.
v3:
change the issue description
remove the cancel_delayed_work at the beginning of cleanup_job
restore the implementation of drm_sched_job_begin.
v4:
remove the kthread_should_park() check from the cleanup_job routine;
the signaled job should be cleaned up as soon as possible
TODO:
1) introduce pause/resume of the scheduler in job_timeout to serialize the handling
of the scheduler and job_timeout.
2) drop the deletion and re-insertion of the bad job in the scheduler, which the above
serialization makes unnecessary (no race issue anymore once handling is serialized)
Tested-by: jingwen <jingwen.chen@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1630457207-13107-1-git-send-email-Monk.Liu@amd.com