author | Ofir Bitton <obitton@habana.ai> | 2021-05-24 22:58:44 +0300 |
---|---|---|
committer | Oded Gabbay <ogabbay@kernel.org> | 2021-06-18 15:23:41 +0300 |
commit | a39725819c816c87c6b4eeca4c10197a41e2a928 (patch) | |
tree | d601015c086bd19151bfe3db92fdc48d2e9e6385 /Kbuild | |
parent | 5a967fb3a74113724cf3f5fd9021d43fe2bda32e (diff) | |
download | linux-a39725819c816c87c6b4eeca4c10197a41e2a928.tar.bz2 |
habanalabs/gaudi: don't use disabled ports in collective wait
In the collective wait, we put jobs on the QMANs of all the NICs. The
code takes a disabled port into account only in the case of a PCI card.
When this information arrives from the f/w instead, the code ignores
it and tries to schedule jobs on NICs that aren't enabled, which is
a bug.
To fix this, after the f/w sends us the list of disabled ports, we
update the state of the QMANs according to that list. In addition,
we need to update the HW_CAP bits so the collective wait operation
will not try to use those QMANs. We also need to update the collective
master monitor mask.
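
The update described above can be illustrated with a minimal, self-contained sketch. It is not the driver's code: `struct nic_state`, `apply_disabled_ports()`, the one-bit-per-port layout of `hw_cap` and `monitor_mask`, and the port count of 10 are all assumptions made for illustration. It only shows the idea of clearing the QMAN state, the capability bit, and the monitor mask bit for every port the f/w reports as disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed number of NIC ports; hypothetical value for this sketch. */
#define NUM_NIC_PORTS 10u

/* Hypothetical per-device state, not the real driver structure. */
struct nic_state {
    uint64_t hw_cap;                      /* bit N set: NIC QMAN N is usable */
    uint32_t monitor_mask;                /* bit N set: port N is part of the
                                             collective master monitor */
    uint8_t  qman_enabled[NUM_NIC_PORTS]; /* 1 if the QMAN may receive jobs */
};

/*
 * Apply the list of disabled ports (bit N set: port N is disabled).
 * Bits at or above NUM_NIC_PORTS are ignored.
 */
static void apply_disabled_ports(struct nic_state *s, uint32_t disabled_mask)
{
    assert(s != NULL);

    for (unsigned int port = 0; port < NUM_NIC_PORTS; port++) {
        if (!(disabled_mask & (1u << port)))
            continue;

        s->qman_enabled[port] = 0;          /* QMAN state */
        s->hw_cap &= ~(1ull << port);       /* capability bit */
        s->monitor_mask &= ~(1u << port);   /* collective monitor mask */
    }
}
```

With all ten ports initially enabled, calling `apply_disabled_ports(&s, 0x5)` in this sketch clears the state, capability bit, and monitor bit of ports 0 and 2 and leaves the others untouched.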
Moreover, we need to add protection against similar cases in the
future, and against the user trying to submit work to those QMANs.
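
The submission-time protection could look roughly like the following sketch. Again this is only an illustration under stated assumptions: `check_nic_queue()`, the one-bit-per-port `hw_cap` layout, and the port count are hypothetical, and the driver's real validation path is not shown here. The point is that a job aimed at a QMAN whose capability bit is cleared is rejected before it is scheduled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed number of NIC ports; hypothetical value for this sketch. */
#define NUM_NIC_PORTS 10u

/*
 * Hypothetical check run before queuing a job on a NIC QMAN.
 * Returns 0 if the port's QMAN may be used, -EINVAL otherwise.
 */
static int check_nic_queue(uint64_t hw_cap, unsigned int port)
{
    if (port >= NUM_NIC_PORTS)
        return -EINVAL;      /* no such port */

    if (!(hw_cap & (1ull << port)))
        return -EINVAL;      /* port disabled: refuse the submission */

    return 0;
}
```

Failing early with an error keeps a disabled QMAN from ever receiving work, regardless of whether the request came from the collective wait path or directly from the user.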
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>