Tags: andikleen/mcelog
mcelog: Improve cache-error-trigger script Two issues: 1) The script would attempt to take all CPUs offline for an L3 cache error on a single-socket system. 2) Many users don't want any CPUs taken offline because of the reduced system performance. Make the default just logging the affected CPUs, but make it simple to enable offlining for users who still want that. If offlining is enabled, sanity-check that AFFECTED_CPUS does not refer to all online CPUs. Signed-off-by: Tony Luck <tony.luck@intel.com>
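The sanity check described in this commit can be sketched in C as follows. This is an illustration only, not the trigger script itself (which is a shell script). It assumes both the affected and the online CPUs are given as whitespace-separated CPU numbers; `MAX_CPUS`, `parse_cpu_list`, and `safe_to_offline` are hypothetical names introduced here.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 1024   /* assumed upper bound, for this sketch only */

/* Mark each CPU number found in a whitespace-separated list.
 * Returns the number of distinct, in-range CPUs marked. */
int parse_cpu_list(const char *list, unsigned char *set)
{
    int count = 0;
    const char *p = list;
    char *end;

    memset(set, 0, MAX_CPUS);
    for (;;) {
        long cpu = strtol(p, &end, 10);
        if (end == p)
            break;                      /* no further number could be parsed */
        if (cpu >= 0 && cpu < MAX_CPUS && !set[cpu]) {
            set[cpu] = 1;
            count++;
        }
        p = end;
    }
    return count;
}

/* Return 1 only if the affected list is non-empty and at least one
 * online CPU is not in it, i.e. offlining would leave a CPU running. */
int safe_to_offline(const char *affected, const char *online)
{
    unsigned char a[MAX_CPUS], o[MAX_CPUS];
    int i;

    if (parse_cpu_list(affected, a) == 0)
        return 0;                       /* nothing to offline */
    parse_cpu_list(online, o);
    for (i = 0; i < MAX_CPUS; i++)
        if (o[i] && !a[i])
            return 1;                   /* an unaffected CPU stays online */
    return 0;
}
```

With these assumptions, an L3 error on a single-socket system (every online CPU affected) is rejected, while an error confined to one socket of a multi-socket system is allowed through.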
mcelog: Add model-specific decoding for Diamond Rapids The model-specific decoding for Diamond Rapids differs a lot from that of earlier generations. Add the new model-specific decoding for Diamond Rapids. Details of error codes published in chapter 17 of the September 2025 edition of the Intel(R) Architecture Instruction Set Extensions Programming Reference. Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
mcelog: Add a --binary option for reading records saved to pstore The Linux kernel can be configured to save fatal error records to persistent storage with the pstore file system. These are a raw copy of "struct mce". Add an option to skip the ioctl() calls that determine the record size so that mcelog will decode a binary file given as argument. Signed-off-by: Tony Luck <tony.luck@intel.com>
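Reading a saved binary file without asking the kernel for the record size amounts to consuming fixed-size records until end of file. The following is a minimal sketch of that loop, not mcelog's code: `read_fixed_records` is a hypothetical helper, and the record size is passed in by the caller rather than taken from any real structure definition.

```c
#include <stdio.h>

/* Read consecutive fixed-size records from f into buf.
 * Returns the number of complete records read and stores the size of
 * a truncated final record (0 if none) in *trailing. */
long read_fixed_records(FILE *f, void *buf, size_t rec_size, size_t *trailing)
{
    long n = 0;
    size_t got = 0;

    *trailing = 0;
    if (rec_size == 0)
        return 0;
    while ((got = fread(buf, 1, rec_size, f)) == rec_size) {
        /* A real decoder would interpret buf as one record here. */
        n++;
    }
    *trailing = got;    /* leftover bytes from a short final read */
    return n;
}
```

Reporting the trailing byte count separately lets a caller warn about a truncated file instead of silently decoding a partial record.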
Add ability to retry failed page offlines with an exponential backoff A page that fails to get offlined may become offlinable in the future, depending on memory usage patterns. If the page continues to experience corrected errors (CEs), retrying the page offlining operation makes sense. This patch adds memory-ce-offline-retry, an mcelog.conf knob that turns on or off the ability to retry offlining a page that continues to cross the CE threshold. Each successive retry has an exponentially higher threshold so as not to overrun the system with retries.
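The backoff idea can be sketched as below. This assumes the threshold doubles after each failed attempt; the commit message says only that it grows exponentially, so the factor of two, the `page_retry_state` structure, and all function names are hypothetical and not taken from the patch.

```c
#include <limits.h>

/* Hypothetical per-page bookkeeping; not mcelog's actual data structure. */
struct page_retry_state {
    unsigned long ce_count;   /* corrected errors since the last attempt */
    unsigned int  attempts;   /* failed offline attempts so far */
};

/* Threshold for the next attempt: base * 2^attempts, saturating at ULONG_MAX. */
unsigned long retry_threshold(unsigned long base, unsigned int attempts)
{
    unsigned long t = base;

    while (attempts-- > 0) {
        if (t > ULONG_MAX / 2)
            return ULONG_MAX;           /* avoid overflow */
        t *= 2;
    }
    return t;
}

/* Record one corrected error; return 1 when an offline attempt is due. */
int account_ce(struct page_retry_state *st, unsigned long base)
{
    st->ce_count++;
    if (st->ce_count < retry_threshold(base, st->attempts))
        return 0;
    st->ce_count = 0;                   /* start counting toward the next attempt */
    return 1;
}

/* Call after an offline attempt fails so the next one needs more errors. */
void offline_failed(struct page_retry_state *st)
{
    st->attempts++;
}
```

With a base threshold of 10, for example, attempts would be triggered after 10, then 20, then 40 further corrected errors, so a page that can never be offlined generates progressively fewer retries.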
mcelog: Wire up model-specific decoding for Clearwater Forest The model-specific decoding for Clearwater Forest is the same as Granite Rapids'. Wire up the model-specific decoding of Granite Rapids for Clearwater Forest. Tested-by: Yi Lai <yi1.lai@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>