Improvements in Nehalem Page Retire

6734814 Intel address translation Phase II

This change hit build 125 today. Solaris has supported memory page retire since the initial launch of Nehalem EP. Today's putback improves that support in the area of fault replay.

FMA persists page retires (and all other faults) across reboots via the on-disk fault cache. When FMD starts, it consults the fault cache and (provided the affected resource is still in the configuration) replays the cached faults.
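To make the replay rule concrete, here's a minimal sketch in Python. It is not FMD code: `CachedFault`, `replay_cached_faults`, and the resource names are all made up for illustration. It only shows the filter described above, where a cached fault is replayed only if its resource is still in the configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedFault:
    """Hypothetical stand-in for one entry in the on-disk fault cache."""
    resource: str      # assumed identifier for the affected resource
    fault_class: str   # assumed label for the kind of fault


def replay_cached_faults(cache, present_resources):
    """Return the cached faults whose resource is still configured."""
    return [fault for fault in cache if fault.resource in present_resources]


# Two cached page-retire faults; only dimm-A is still present after reboot.
cache = [
    CachedFault("dimm-A", "page-retire"),
    CachedFault("dimm-B", "page-retire"),
]
replayed = replay_cached_faults(cache, {"dimm-A"})
```

In this toy model the fault against `dimm-B` is simply skipped at replay time; nothing here removes it from the cache.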

For page retires, the faults are associated with a physical address (PA). Between OS reboots, the memory topology can change: DIMMs can be added or removed, interleaves changed, etc. In such cases, the physical/virtual mappings change, and the PA in the on-disk fault cache could point at a healthy page. FMD would then retire a page that had experienced no errors, while the page that actually took the errors stayed in service.

This putback adds code to recalculate the PA (if necessary) after reboots to ensure the correct, faulty page is re-retired.
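Here's a toy illustration of why a stale PA goes wrong and what recalculating it means. This is a sketch under heavy assumptions, not the putback's algorithm: it pretends memory is not interleaved, that each DIMM backs one contiguous PA range in slot order, and that a fault can be pinned to a (DIMM serial, offset) pair. The function names and serial numbers are invented.

```python
GIB = 1 << 30


def dimm_bases(topology):
    """Map each DIMM serial to (base PA, size), assuming contiguous layout."""
    bases, next_base = {}, 0
    for serial, size in topology:
        bases[serial] = (next_base, size)
        next_base += size
    return bases


def pa_to_dimm(pa, topology):
    """Translate a PA to (serial, offset) under the given topology."""
    for serial, (base, size) in dimm_bases(topology).items():
        if base <= pa < base + size:
            return serial, pa - base
    raise ValueError("PA not backed by any DIMM")


def dimm_to_pa(serial, offset, topology):
    """Translate (serial, offset) back to a PA, or None if the DIMM is gone."""
    entry = dimm_bases(topology).get(serial)
    if entry is None or offset >= entry[1]:
        return None
    return entry[0] + offset


# Before reboot: two 4 GiB DIMMs. The faulty page lives on SN2.
old_topology = [("SN1", 4 * GIB), ("SN2", 4 * GIB)]
# After reboot: a new DIMM was added ahead of them, shifting every base.
new_topology = [("SN0", 4 * GIB), ("SN1", 4 * GIB), ("SN2", 4 * GIB)]

stale_pa = 4 * GIB + 0x1000
serial, offset = pa_to_dimm(stale_pa, old_topology)   # where the fault really is
fresh_pa = dimm_to_pa(serial, offset, new_topology)   # PA to retire after reboot
```

In this model the cached PA now lands on SN1, a DIMM that never reported an error, while the recalculated PA still points at the same spot on SN2. Real Nehalem address translation has to account for interleaving and more, so treat the contiguous layout purely as a simplification.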

:wq

Comments:

Hey Scott, this is cool. Could you persistently keep track of DIMM serial number info that is provided by SMBIOS to determine whether the DIMM swap situation that you mentioned has happened? Or is that not really reliable data?

Posted by Dale Ghent on September 25, 2009 at 08:57 AM PDT #

Hi Dale. Yes. With certain config constraints (all met on Sun platforms), DIMM serial numbers (and labels) are sourced from SMBIOS and used in FMA topology (if you're terminally curious, check the intel_nhm source). So DIMM replacements are detected and FMD does not replay DIMM/page faults associated with the "old" DIMM. 'fmadm faulty' will not display the "old" DIMM's faults either. The faults do remain in the FMD fault cache until they expire (default=30 days). We do this in case the "old" DIMM makes a reappearance. :)

Posted by Scott Davenport on September 25, 2009 at 09:10 AM PDT #
