Improvements in Nehalem Page Retire
By user9148476 on Sep 25, 2009
This change hit build 125 today. Solaris has supported memory page retire since the initial launch of Nehalem EP. Today's putback improves that support in the area of fault replay.
FMA persists page retires (and all other faults) across reboots via the on-disk fault cache. When FMD starts, it consults the fault cache and replays each cached fault, provided the affected resource is still in the configuration.
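To make the replay rule concrete, here is a minimal sketch in Python. It is not FMD code: the `CachedFault` record, the resource names, and the `replay_faults` helper are all hypothetical, and the real fault cache format is not shown here. It only illustrates the filter described above, where a cached fault is replayed only if its resource is still present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedFault:
    """Hypothetical stand-in for one entry in the on-disk fault cache."""
    resource: str  # assumed identifier for the affected resource
    kind: str      # assumed fault type label, e.g. "page-retire"


def replay_faults(cache, present_resources):
    """Return the cached faults whose resource is still in the configuration."""
    return [fault for fault in cache if fault.resource in present_resources]


if __name__ == "__main__":
    cache = [
        CachedFault("dimm0", "page-retire"),
        CachedFault("dimm3", "page-retire"),  # dimm3 was pulled before reboot
    ]
    for fault in replay_faults(cache, {"dimm0", "dimm1"}):
        print("replaying", fault.kind, "on", fault.resource)
```

In this toy run only the `dimm0` fault is replayed, because `dimm3` is no longer in the configuration.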
For page retires, each fault is associated with a physical address (PA). Between OS reboots, the memory topology can change: DIMMs can be added or removed, interleaves changed, and so on. In such cases, the physical/virtual mappings change, and the PA in the on-disk fault cache could point at a healthy page. FMD would then retire a page that had experienced no errors.
This putback adds code to recalculate the PA (if necessary) after reboots to ensure the correct, faulty page is re-retired.
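The idea can be sketched roughly as follows. This is an illustration under loud assumptions, not the code from the putback: it assumes the cached fault also records which DIMM the bad page lives on (by a hypothetical serial number) and the offset within that DIMM, and it models the topology as simple non-interleaved ranges. The names `DimmRange`, `PageFault`, and `recalc_pa` are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimmRange:
    """Assumed, simplified topology entry: one DIMM mapped at a contiguous PA range."""
    serial: str
    base_pa: int
    size: int


@dataclass(frozen=True)
class PageFault:
    """Hypothetical cached page-retire fault."""
    cached_pa: int    # PA recorded when the fault was diagnosed
    dimm_serial: str  # assumed: identity of the DIMM holding the bad page
    dimm_offset: int  # assumed: offset of the bad page within that DIMM


def recalc_pa(fault: PageFault, topology) -> Optional[int]:
    """Recompute the PA of the faulty page under the current topology.

    Returns None if the DIMM is no longer present, in which case the
    fault should not be replayed at all.
    """
    for dimm in topology:
        if dimm.serial == fault.dimm_serial and 0 <= fault.dimm_offset < dimm.size:
            return dimm.base_pa + fault.dimm_offset
    return None


if __name__ == "__main__":
    fault = PageFault(cached_pa=0x1000, dimm_serial="A", dimm_offset=0x1000)
    # After reboot a new DIMM "B" sits at PA 0, and "A" has moved up.
    new_topology = [
        DimmRange("B", 0x0000_0000, 0x4000_0000),
        DimmRange("A", 0x4000_0000, 0x4000_0000),
    ]
    new_pa = recalc_pa(fault, new_topology)
    if new_pa is not None and new_pa != fault.cached_pa:
        print(f"re-retiring page at {new_pa:#x} instead of {fault.cached_pa:#x}")
```

Real interleaved configurations need a more involved translation than base plus offset, but the principle is the same: key the fault to the physical memory that failed, and derive the PA from the current topology rather than trusting the cached value.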