Why I removed Ubuntu... Really!!

 

A couple of months ago, I got a new workstation: a quad-core Xeon box from Dell. It came with Ubuntu preinstalled, and I decided to play with that for a while. After a while I noticed that the system would randomly reset. "Ah, the darn unstable Linux" is what I thought!!

I reinstalled the system with OpenSolaris 2008.11 and it ran for a day or so. Then the system reset again. Annoyed, I started looking at /var/adm/messages and found that FMA (the Fault Management Architecture) had detected a hardware fault and taken appropriate action. Now it was nice to call Dell Support and tell them definitively the cause, the diagnosis, and that I needed a new motherboard.

The rep asked me what test I was running. I said none: this functionality is built into OpenSolaris... and it's free!! One cannot get this running Linux.

Here's what fmadm faulty printed:

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 20 01:29:08 152a7687-c256-40dd-80b1-83c1f4ed74c7  INTEL-8001-43  Critical 

Fault class : fault.cpu.intel.nb.ie
FRU         : "MB" (hc://:product-id=Precision-WorkStation-T3400:chassis-id=65QLTH1:server-id=opensolaris/motherboard=0)
                  faulty

Description : Northbridge has detected an internal error  Refer to
              http://sun.com/msg/INTEL-8001-43 for more information.

Response    : System panic or reset by BIOS

Impact      : System may be unexpectedly reset

Action      : Replace motherboard

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 20 01:29:08 50ba84aa-3f12-c2c5-9c0b-8fdec9454104  INTEL-8000-LE  Major    

Fault class : fault.cpu.intel.l1dcache
Affects     : hc://:product-id=Precision-WorkStation-T3400:chassis-id=65QLTH1:server-id=opensolaris/motherboard=0/chip=0/core=3/strand=0
                  faulted and taken out of service
FRU         : hc://:product-id=Precision-WorkStation-T3400:chassis-id=65QLTH1:server-id=opensolaris/motherboard=0/chip=0
                  faulty

Description : A level 1 Data Cache on this cpu is faulty.  Refer to
              http://sun.com/msg/INTEL-8000-LE for more information.

Response    : The system will attempt to offline this cpu to remove it from
              service.

Impact      : Performance of this system may be affected.

Action      : Schedule a repair procedure to replace the affected CPU.  Use
              'fmadm faulty' to identify the module.
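
If you want to hand a report like this to a script (say, to pull out the MSG-ID and the recommended action before calling support), here is a rough sketch in Python. It is only a sketch: the function name parse_fmadm_faulty and the lower-cased field names are made up for illustration, it assumes nothing beyond the text layout shown above, and it is not an official FMA interface. The sample is abridged from the output above.

```python
import re

# First row of each record: timestamp, event UUID, message ID, severity.
RECORD_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event_id>[0-9a-f-]{36})\s+"
    r"(?P<msg_id>\S+)\s+"
    r"(?P<severity>\S+)\s*$"
)
# Labeled field such as "Fault class : fault.cpu.intel.nb.ie".
FIELD_RE = re.compile(r"^(?P<key>[A-Z][A-Za-z ]*?)\s*:\s(?P<value>.*)$")

# Abridged copy of the report above (hypothetical input for this sketch).
SAMPLE = """\
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 20 01:29:08 152a7687-c256-40dd-80b1-83c1f4ed74c7  INTEL-8001-43  Critical

Fault class : fault.cpu.intel.nb.ie
FRU         : "MB" (hc://:product-id=Precision-WorkStation-T3400:chassis-id=65QLTH1:server-id=opensolaris/motherboard=0)
                  faulty

Action      : Replace motherboard

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 20 01:29:08 50ba84aa-3f12-c2c5-9c0b-8fdec9454104  INTEL-8000-LE  Major

Fault class : fault.cpu.intel.l1dcache

Action      : Schedule a repair procedure to replace the affected CPU.  Use
              'fmadm faulty' to identify the module.
"""


def parse_fmadm_faulty(text):
    """Split a report into one dict per fault record (illustrative only)."""
    records = []
    current = None   # record being filled in
    last_key = None  # most recent field, so wrapped lines can be appended
    for line in text.splitlines():
        if not line.strip() or line.startswith("---"):
            continue  # blank lines and table rules carry no data
        header = RECORD_RE.match(line)
        if header:
            current = dict(header.groupdict())
            records.append(current)
            last_key = None
            continue
        if current is None:
            continue  # column-title row before the first record
        if line[0].isspace():
            # Indented line: continuation of the previous field's value.
            if last_key:
                current[last_key] += " " + line.strip()
            continue
        field = FIELD_RE.match(line)
        if field:
            last_key = field.group("key").lower().replace(" ", "_")
            current[last_key] = field.group("value").strip()
    return records


for record in parse_fmadm_faulty(SAMPLE):
    print(record["msg_id"], record["severity"], "->", record["action"])
```

On a live system you would feed it the text captured from fmadm faulty instead of SAMPLE. The layout may differ between releases, so treat the regular expressions as a starting point.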


Comments:

Awesome, bookmarked on my page.

Posted by Dave on April 02, 2009 at 05:04 PM MDT #

I wondered if someone had a practical use for FMA on a personal system, other than SMF itself. Very sweet.

Posted by Michael Ernest on April 03, 2009 at 02:03 AM MDT #

My Dell would randomly reboot too; it is also a quad-core machine. The solution, however, was to update the BIOS. As soon as it was on the latest version it was stable.

FYI it is an XPS420, and the BIOS came out a few weeks after I purchased it. I wonder how many went back as faulty...

Posted by Mark hayden on April 06, 2009 at 11:48 PM MDT #

About

user12610379
