Solaris FMA for Nehalem

Today, Intel launched their Xeon Processor 5500 Series, aka Nehalem. OpenSolaris is ready to take advantage of the features of this new processor, thanks in no small part to the ongoing OpenSolaris projects for Intel platforms. A good starting point for learning about all things Nehalem and OpenSolaris is the Solaris on Intel overview pages. For the specifics on Solaris FMA, read on.

Several subsystems were updated so that Solaris FMA continues to support the expected features on Nehalem, including CPU offlining and memory page retire. Changes include:

  • Machine Check Handler updates: Most notably, support has been added for the new corrected machine check interrupt (CMCI). Error throttling on CMCIs is in place to mitigate correctable error storms. An Intel plugin for refined error telemetry has also been added to Solaris' MCA framework.
  • Memory Topology: For DIMM diagnosis and page retire, FMA requires a memory topology. For Nehalem, the memory topology is read directly from the memory controllers on the system via the intel_nhm driver, and post-processed by FMA's topology enumerators.
  • Diagnosis Rule updates: Coverage for new Nehalem ereports, notably those generated when the QuickPath interconnect detects errors. A particularly interesting one is the notification of a memory sparing event.
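
To make the CMCI throttling idea a bit more concrete, here is a minimal sketch of one way such a throttle could work: count interrupts in a time window, fall back to polling when the count gets too high, and re-enable the interrupt after a quiet period. This is not the Solaris implementation. The class name, method names, thresholds, and the polling fallback are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CmciThrottle:
    """Hypothetical throttle for one error source; all names and values are illustrative."""
    max_events: int = 5          # interrupts tolerated per window (assumed value)
    window: float = 1.0          # window length in seconds (assumed value)
    window_start: float = 0.0    # start of the current counting / quiet window
    count: int = 0               # interrupts seen in the current window
    interrupts_enabled: bool = True

    def on_cmci(self, now: float) -> bool:
        """Record one corrected-error interrupt. Returns True if it should be logged."""
        if now - self.window_start >= self.window:
            # The old window has expired, so start counting afresh.
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > self.max_events:
            # Storm detected: stop taking interrupts and rely on polling instead.
            self.interrupts_enabled = False
            return False
        return True

    def on_poll(self, now: float, errors_seen: int) -> None:
        """Periodic poll while throttled. Re-enable interrupts after a quiet window."""
        if self.interrupts_enabled:
            return
        if errors_seen > 0:
            # Still noisy: restart the quiet timer.
            self.window_start = now
        elif now - self.window_start >= self.window:
            self.interrupts_enabled = True
            self.window_start = now
            self.count = 0
```

With max_events=3, the fourth interrupt inside one window flips the source into polling mode, and it stays there until a poll finds no new errors for a full window. The appeal of this shape is that a single flaky DIMM cannot monopolize a CPU with interrupts, while isolated corrected errors are still reported promptly.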

Oh....and all of the above is forthcoming in the next Solaris 10 update release.

UPDATE 04/01/2009: There's a great video from Dave Stewart at Intel on Nehalem's CMCI and FMA interaction.

UPDATE 04/14/2009: Sun announced Nehalem based systems. Check out my blog describing fault management on Sun's new Nehalem systems.

:wq