Tuesday May 26, 2009

CPU or GPU? One or the other, or both?

The transistor counts of both CPUs and GPUs are escalating almost as fast as the toxic assets from the subprime mortgage meltdown.  As in every good debate, there are usually two opposed sides to a given topic.  Political parties such as the Democrats and Republicans thrive on point-versus-counterpoint arguments, and the analogy certainly applies to semiconductor technology.  Gone are the days when the CPU was the undisputed center of the computer.  With the advancement of visual applications in both the commercial and entertainment sectors, graphics processing has staked its own claim to that spot.  Today, some 50 years after the first silicon transistor, semiconductor advancements have exceeded the predictions the industry made 25 years ago.  It is truly amazing that CPU and graphics processor transistor counts have gone from hundreds of millions to past the one-billion mark!  That is one large mass of circuits that has to be designed, verified, placed, routed and timed for chip signoff.

As the industry has pretty much hit the ceiling on clock speed, chips with multiple cores have appeared.  However, having a quad-core CPU does not mean that your office productivity suite will run faster on your desktop, as this kind of application is largely single-threaded.  Applications that are multithreaded will be able to take advantage of multiple cores.  A good example is virtualization hypervisor software that runs on bare metal across multiple cores.  When you are managing multiple virtual machine instances, many CPU threads come in handy.
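
To put rough numbers on the quad-core point, here is a minimal sketch using Amdahl's law. The post does not mention Amdahl's law by name; it is simply the standard back-of-the-envelope way to estimate how much extra cores can help. The function name and the parallel fractions in the demo are illustrative assumptions, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup when only part of a program can use extra cores.

    parallel_fraction: share of the runtime that can be spread across cores
                       (0.0 = fully single-threaded, 1.0 = fully parallel).
    cores:             number of cores available.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always runs on one core; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads; the fractions are made up for illustration.
    workloads = [
        ("single-threaded office suite", 0.0),
        ("partly threaded application", 0.5),
        ("hypervisor running many VMs", 0.95),
    ]
    for label, fraction in workloads:
        speedup = amdahl_speedup(fraction, cores=4)
        print(f"{label}: up to {speedup:.2f}x on 4 cores")
```

A fully single-threaded program gets no benefit from four cores, while a workload that is almost entirely parallel gets close to the full 4x.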

It is obvious that word processing applications do not need extreme graphics processing either, and the graphics capability of the microprocessor is pretty impressive these days.  Then what does require high-end graphics?  I can think of two areas: high-end video games, and visualization software for high-end computer modeling and manipulation.  Both of these areas have a viable market, as evidenced by the sales of popular gaming consoles such as the PlayStation 3 and the new consoles under development.  In the commercial sector, 3D crash simulations are very cost-effective for automobile manufacturers when designing a safer automobile.

Ferraris and Fiat Cinquecentos can both go 50 mph (80 km/h).  However, not everyone has the need or the means to purchase a Ferrari.  The same applies to CPUs and GPUs: which one you need depends on what you are trying to do.

Blog is available also at: http://bobporras.wordpress.com/

Wednesday Mar 18, 2009

Xbox 360 "RROD": it happened...

It took approximately 15 months for the event to occur, but since it has happened to so many others, the RROD (Red Ring of Death) was not unexpected.  My son runs a lot of computations on his 1-teraflop Xbox 360.  Needless to say, the gaming community is not very amused with the Xbox 360's technical problems.  More about community in a future post.

The console in our house showed a red light in the lower-right quadrant of the ring, otherwise known as the "E74 System Error."  Microsoft has extended the 1-year warranty to 3 years for the RROD error, but the E74 System Error still only carries a 1-year warranty.  It is becoming obvious that the common hardware failures are interrelated (heat and cold solder joints), but go figure.

Next came the big decision.  Pack up the Xbox and ship it off for a costly repair that will take a month, or do it myself?  There is plenty of information out there in the cloud on how to fix these problems, which statistically should occur far less often than they do.  My soon-to-be 15-year-old son is contemplating a career in engineering, so we said let's void the warranty that has already expired and see what happens...

To attempt a fix you need to disassemble the whole unit down to the bare motherboard.  This includes opening the clever injection-molded plastic case that has no screws, then removing the metal case that requires Torx screwdrivers, the control PCB that drives the (see picture above) on/off LED button, the DVD drive, the air plenum, the heatsinks, the dual cooling fans, the drive power cable and the drive data cable.  Next you need to unscrew the motherboard from the metal case in order to expose the back of the motherboard.  Here is where the infamous x-clamps reside.  You need to remove the x-clamps that secure the heatsinks to the 2 custom ASICs.

The 2 ASICs are pretty impressive: a custom IBM PowerPC-based CPU with 3 cores and 2 hardware threads per core, and a custom ATI GPU.  Polygon performance is 500 million triangles/sec and 48 billion shader operations/sec.  No wonder the HPC community is programming GPUs for their computational might.  512 MB of 700 MHz GDDR3 RAM feeds the GPU.  Memory interface bandwidth comes in at 22.4 GB/sec.

Not bad at all, but here lies the problem.  The Xbox is a screaming number cruncher, and all that current flowing through electrical resistance produces heat.  Thermal expansion (heating up) and thermal contraction (cooling down) cause the motherboard to be bent by the x-clamps.  Repeated cycles of this cause cold solder joints.  One bad connection on a signal and your Xbox is toast.  By the way, server engineers have been dealing with this issue for years.  The thermal problem can be solved, but solving it usually adds cost to the product.  In my opinion the issue for the Xbox is that it is an extremely high-volume product, and trading off added cost against margin on a gaming console is a difficult balance.
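
As an aside, the 22.4 GB/sec memory bandwidth figure quoted above can be reproduced with a little arithmetic. This is a rough sketch: it assumes a 128-bit memory bus and double-data-rate transfers (two per clock), neither of which is stated in this post, and the function name is just illustrative.

```python
def peak_bandwidth_gb_per_s(clock_mhz: float,
                            bus_width_bits: int,
                            transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    # DDR-style memory moves data on both clock edges, hence 2 transfers per clock.
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # 700 MHz comes from the post; the 128-bit bus width is an assumption.
    bandwidth = peak_bandwidth_gb_per_s(clock_mhz=700, bus_width_bits=128)
    print(f"Peak bandwidth: {bandwidth:.1f} GB/s")
```

Under those assumptions, 700 MHz x 2 transfers per clock x 16 bytes per transfer works out to the 22.4 GB/sec quoted above.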

The fix basically involves reflowing the solder balls under the CPU, GPU and RAM with a heat gun.  Assuming this is successful, you then have to put the whole game console back together.  But before you do, you replace the x-clamps with metric screws to attach the heatsinks.  You have to be very careful to clean the old thermal paste completely from both ASICs before applying Arctic Silver 5 thermal paste.  If you do not do this correctly, your heatsinks will not work efficiently and your unit will overheat quickly.

We put the whole thing back together and powered it up around 10:30pm, after about 6 hours of work and 2 trips to the hardware store and RadioShack.  It worked!  My son was happy, as he was able to get to level 65 on Call of Duty 5.

If you want to improve your odds of avoiding the RROD on your Xbox, I suggest you mount your console in the tower position (standing on end) rather than flat like a laptop.  When the console is in the flat position, the motherboard is on the bottom of the unit and cooling is more difficult.  When the console is mounted on its side (which is a valid position, since it has skid pads on its side as well as its bottom), the motherboard is cooled more efficiently.  It's even better if you can mount the Xbox on 4 small blocks so that more cool air can flow into the unit from the top, bottom and side intakes.

It was fun showing my son aspects of engineering in practice, but even more enjoyable to actually have fixed the console for him.  On the downside, given the data out there on these RROD problems, I know the unit will ultimately fail again...



The blog of Bob Porras - Vice President, Data, Availability, Scalability & HPC for Sun Microsystems, Inc.

