diff options
author | Yuri Nudelman <ynudelman@habana.ai> | 2021-06-06 10:28:51 +0300 |
---|---|---|
committer | Oded Gabbay <ogabbay@kernel.org> | 2021-08-29 09:47:46 +0300 |
commit | 938b793fdede518e10c67dc1dcfba1804faf98a6 (patch) | |
tree | 640575a63c1b0c204cdbacab5f2d0bb1c4cce3d7 /Documentation/ABI | |
parent | e79e745b208b3c709cc5e689e587d04a4c0fe1bb (diff) | |
download | linux-938b793fdede518e10c67dc1dcfba1804faf98a6.tar.bz2 |
habanalabs: expose state dump
To improve the user's ability to debug the case where a workload that
is part of executing training/inference of a topology is getting stuck,
we need to add a 'core dump' each time a CS times-out. The 'core dump'
shall contain all relevant Sync Manager information and corresponding
fence values.
The most recent dumps shall be accessible via debugfs, under
'state_dump' node. Reading from the node will provide the oldest dump
available. Writing an integer value X will discard X dumps, starting
with the oldest one, i.e. subsequent read will now return newer
dumps.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Diffstat (limited to 'Documentation/ABI')
-rw-r--r-- | Documentation/ABI/testing/debugfs-driver-habanalabs | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs index a5c28f606865..e29156511388 100644 --- a/Documentation/ABI/testing/debugfs-driver-habanalabs +++ b/Documentation/ABI/testing/debugfs-driver-habanalabs @@ -215,6 +215,17 @@ Description: Sets the skip reset on timeout option for the device. Value of "0" means device will be reset in case some CS has timed out, otherwise it will not be reset. +What: /sys/kernel/debug/habanalabs/hl<n>/state_dump +Date: Oct 2021 +KernelVersion: 5.15 +Contact: ynudelman@habana.ai +Description: Gets the state dump occurring on a CS timeout or failure. + State dump is used for debug and is created each time in case of + a problem in a CS execution, before reset. + Reading from the node returns the newest state dump available. + Writing an integer X discards X state dumps, so that the + next read would return X+1-st newest state dump. + What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err Date: Mar 2020 KernelVersion: 5.6 |