diff options
author | Borislav Petkov <bp@suse.de> | 2015-01-13 15:08:51 +0100 |
---|---|---|
committer | Borislav Petkov <bp@suse.de> | 2015-02-19 13:24:25 +0100 |
commit | 3f2f0680d1161df96a0e8fea16930f1bd487a9cf (patch) | |
tree | 29009c1b6dcc24a7dc93ba485983c6ea5f31e0f0 /arch/x86/include/asm/mce.h | |
parent | 0eac092d8307db61d320f77f9fce40e60b4ffa89 (diff) | |
download | linux-3f2f0680d1161df96a0e8fea16930f1bd487a9cf.tar.bz2 |
x86/MCE/intel: Cleanup CMCI storm logic
Initially, this started with the yet another report about a race
condition in the CMCI storm adaptive period length thing. Yes, we have
to admit, it is fragile and error prone. So let's simplify it.
The simpler logic is: now, after we enter storm mode, we go straight to
polling with CMCI_STORM_INTERVAL, i.e. once a second. We remain in storm
mode as long as we see errors being logged while polling.
Theoretically, if we see an uninterrupted error stream, we will remain
in storm mode indefinitely and keep polling the MSRs.
However, when the storm is actually a burst of errors, once we have
logged them all, we back out of it after ~5 mins of polling and no more
errors logged.
If we encounter an error during those 5 minutes, we reset the polling
interval to 5 mins.
Making machine_check_poll() return a bool and denoting whether it has
seen an error or not lets us simplify a bunch of code and move the storm
handling private to mce_intel.c.
Some minor cleanups while at it.
Reported-by: Calvin Owens <calvinowens@fb.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1417746575-23299-1-git-send-email-calvinowens@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Diffstat (limited to 'arch/x86/include/asm/mce.h')
-rw-r--r-- | arch/x86/include/asm/mce.h | 8 |
1 files changed, 4 insertions, 4 deletions
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index 51b26e895933..13eeea518233 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -183,11 +183,11 @@ typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS); DECLARE_PER_CPU(mce_banks_t, mce_poll_banks); enum mcp_flags { - MCP_TIMESTAMP = (1 << 0), /* log time stamp */ - MCP_UC = (1 << 1), /* log uncorrected errors */ - MCP_DONTLOG = (1 << 2), /* only clear, don't log */ + MCP_TIMESTAMP = BIT(0), /* log time stamp */ + MCP_UC = BIT(1), /* log uncorrected errors */ + MCP_DONTLOG = BIT(2), /* only clear, don't log */ }; -void machine_check_poll(enum mcp_flags flags, mce_banks_t *b); +bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b); int mce_notify_irq(void); void mce_notify_process(void); |