Diffstat (limited to 'Documentation/admin-guide/ras.rst')
-rw-r--r-- | Documentation/admin-guide/ras.rst | 22 |
1 files changed, 21 insertions, 1 deletions
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index d71340e86c27..1b90c6f00a92 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -81,7 +81,7 @@ That defines some categories of errors:
   still run, eventually replacing the affected hardware by a hot spare,
   if available.
 
-  Also, when an error happens on an userspace process, it is also possible to
+  Also, when an error happens on a userspace process, it is also possible to
   kill such process and let userspace restart it.
 
 The mechanism for handling non-fatal errors is usually complex and may
@@ -438,11 +438,13 @@ A typical EDAC system has the following structure under
 	│   │   ├── ce_count
 	│   │   ├── ce_noinfo_count
 	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
 	│   │   │   ├── dimm_dev_type
 	│   │   │   ├── dimm_edac_mode
 	│   │   │   ├── dimm_label
 	│   │   │   ├── dimm_location
 	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
 	│   │   │   ├── size
 	│   │   │   └── uevent
 	│   │   ├── max_location
@@ -457,11 +459,13 @@ A typical EDAC system has the following structure under
 	│   │   ├── ce_count
 	│   │   ├── ce_noinfo_count
 	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
 	│   │   │   ├── dimm_dev_type
 	│   │   │   ├── dimm_edac_mode
 	│   │   │   ├── dimm_label
 	│   │   │   ├── dimm_location
 	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
 	│   │   │   ├── size
 	│   │   │   └── uevent
 	│   │   ├── max_location
@@ -483,6 +487,22 @@ this ``X`` memory module:
 	This attribute file displays, in count of megabytes, the memory
 	that this csrow contains.
 
+- ``dimm_ue_count`` - Uncorrectable Errors count attribute file
+
+	This attribute file displays the total count of uncorrectable
+	errors that have occurred on this DIMM. If panic_on_ue is set
+	this counter will not have a chance to increment, since EDAC
+	will panic the system.
+
+- ``dimm_ce_count`` - Correctable Errors count attribute file
+
+	This attribute file displays the total count of correctable
+	errors that have occurred on this DIMM. This count is very
+	important to examine. CEs provide early indications that a
+	DIMM is beginning to fail. This count field should be
+	monitored for non-zero values and report such information
+	to the system administrator.
+
 - ``dimm_dev_type`` - Device type attribute file
 
 	This attribute file will display what type of DRAM device is
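
The patch recommends monitoring ``dimm_ce_count`` for non-zero values. A minimal sketch of such a check follows. It is not part of the patch. It assumes the EDAC tree is rooted at /sys/devices/system/edac/mc (the hunk header above truncates the path, so the root is a parameter), and the helper names read_dimm_counts and dimms_with_errors are hypothetical.

```python
from pathlib import Path

# Assumption: root of the EDAC memory-controller tree. The excerpt above
# truncates this path, so pass another root if your system differs.
DEFAULT_EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_dimm_counts(edac_root=DEFAULT_EDAC_ROOT):
    """Return {(mc_name, dimm_name): {"ce": int|None, "ue": int|None}}."""
    counts = {}
    for dimm_dir in sorted(Path(edac_root).glob("mc*/dimm*")):
        if not dimm_dir.is_dir():
            continue
        entry = {}
        for key, name in (("ce", "dimm_ce_count"), ("ue", "dimm_ue_count")):
            try:
                entry[key] = int((dimm_dir / name).read_text().strip())
            except (OSError, ValueError):
                # Attribute missing (e.g. older kernel) or unreadable.
                entry[key] = None
        counts[(dimm_dir.parent.name, dimm_dir.name)] = entry
    return counts


def dimms_with_errors(counts):
    """Return the sorted keys of DIMMs whose CE or UE count is non-zero."""
    return sorted(
        key
        for key, entry in counts.items()
        if (entry["ce"] or 0) > 0 or (entry["ue"] or 0) > 0
    )


if __name__ == "__main__":
    counts = read_dimm_counts()
    for mc, dimm in dimms_with_errors(counts):
        entry = counts[(mc, dimm)]
        print(f"{mc}/{dimm}: ce={entry['ce']} ue={entry['ue']}")
```

Run periodically (for example from cron), the script prints one line per DIMM that has logged errors, which can then be forwarded to the system administrator. A count that could not be read is stored as None and treated as zero when filtering.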