Thursday Apr 27, 2006

A Tale of Two Books

The Second Edition of Solaris Internals

Sit. Relax. Breathe. Read a book. Watch a movie. It's done (pretty much). Resync. Have fun. Get normal.

Recently, I found myself thinking about a book I read some time ago, Into Thin Air, Jon Krakauer's riveting account of the 1996 Mount Everest expedition that ended in disaster. In particular, I was recalling Krakauer's description of how he felt when he finally summited Everest. Having reached a milestone of this magnitude, standing on the tallest spot on the planet earth, Krakauer found the moment more surreal than anything else; feelings of intense joy and satisfaction were to come later. There was no jumping for joy (literally or figuratively)...he was mostly thinking about getting off the mountain.

I had a similar experience when Richard McDougall and I, along with Brendan Gregg, wrapped up the new edition of Solaris Internals. Now, please understand, I am not equating writing a technical book to climbing the tallest mountain in the world - it's not even in the same universe effort-wise. I'm simply drawing a comparison to a similar feeling of having accomplished something I never thought I'd complete, and the hazy feeling I had (have) now that it's all done. To me, it's a fascinating example of what complex creatures we are: you achieve something of real significance (summiting Everest, not writing a technical book), and a natural gratification latency follows. I think it will kick in when we're shipping, and I can actually hold the new books in my hands. Which brings me to....

As per a blog I did last June, the updated edition of Solaris Internals did not have a smooth and predictable take-off. Even after we put a "stake in the ground", and narrowed our focus to Solaris 10 and OpenSolaris, we still suffered from self-induced scope-creep, and generally did all the wrong things in terms of constraining a complex project. The good news is that all the wrong things produced something that we are extremely proud of.

Our original goal was to produce an update to the first edition of Solaris Internals - no new content in terms of subject matter, but update, revise and rewrite the material such that it reflects Solaris 10 and OpenSolaris. Naturally, we could not ignore the new technology we had at our disposal, like DTrace, MDB, and all the other way-cool observability tools in Solaris 10 and OpenSolaris. Also, the availability of Solaris source code allowed us to reference specific areas of the code, and include code segments in the book. Additionally, the internal support and enthusiasm for the new edition was just overwhelming. The Sun engineering community - Solaris kernel engineering and adjunct groups - offered assistance with writing, reviewing, content advice, etc. It was extremely gratifying to have so many talented and time-constrained engineers come forward and offer their time and expertise.

Given the tools and expertise we had at our disposal, it seemed inevitable that the end result would be something significantly different from our original intention of "a simple update". It actually brought us right back to one of our key goals when we wrote the first book: in addition to describing the internals of the Solaris kernel, include methods of putting the information to practical use. To that end, we made extensive use of DTrace, MDB, etc., to illustrate the areas of the kernel discussed throughout the text. The tools examples naturally evolved into performance and behavior related text. This is a good thing, in that a great many readers will be using the text specifically to understand the performance and behavior of their Solaris systems. The not-so-good news is, once you cross the "performance" line, scope broadens pretty significantly, and more content gets created. A lot more content. Pages and pages. And man, if I may be so bold, it's all good.

So while I was busily trying to complete my internals chapters, Richard took off like a rocket with new material for performance and tools, and recruited Brendan Gregg (whose DTrace ToolKit is a must-have download) to add his considerable experience and expertise. Faster than you could say "DTrace rocks", we did a book build and found we had over 1400 pages of material. And we were not done. We had some calls with the publisher, and discovered that the publishing industry is not particularly fond of publishing very large books, and we had some concerns about our readers needing orthopedic surgery from carrying Solaris Internals around. So we decided to split the work into two books: an internals book, and a POD book. POD is an acronym that Richard and I have been using for some time, and expands to Performance, Observability and Debugging. We love being able to encapsulate so concisely what the vast majority of Solaris users wish to do - observe and understand for the purpose of improving performance, and/or root-causing pathological behavior. The tools are there, and now there's some documentation to leverage for guidance and examples.

As you can imagine, once we started on a performance book, scope-creep took a whole new path. Ideas flowed faster than content, and given sufficient time, we could easily have created 1000 pages on POD. As it was, finishing up turned out to be something of a herculean task. We were bound and determined not to miss another deadline, and to deliver the book files to the publisher. Richard, Brendan and I were communicating using AIM (Brendan is in Australia, Richard in California, and I'm located in New Jersey), and pounding away to get the material cleaned up and ready to ship. What started out as a late night turned into an all-nighter. Literally. I had to stop at 7:45AM ET to take my son to school, and I was happy to do it (stop, that is). Incredibly, Brendan and Richard seemed like they could go on for hours more (note to self: start consuming Vegemite). In the end, we met our goal, and handed the publisher two books, over 1600 pages of material, on that hazy Monday morning.

Solaris Internals: Solaris 10 and OpenSolaris kernel architecture

Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris

So there it is. We're currently working with the publisher on getting the cover art together, dealing with typesetting issues, etc. We're doing our best to accelerate getting the books to the printer, and getting them out the door. Hopefully, correct cover images, ISBNs and pricing will make their way to the retailers very soon. I want to take this blog opportunity to thank the Solaris community for the positive feedback on the first edition, and the support and interest in getting the second edition out the door. We really have a winning combination, with the best operating system on the planet (Solaris - my objective opinion), world-class observability tools, open Solaris source code, and documentation to pull it all together, thus maximizing the using-Solaris experience. We look forward to hearing from and working with the Solaris community, and doing our part to broaden awareness of the power of Solaris, and contribute to the knowledge base. Keep an eye on the Solaris Internals and OpenSolaris web sites for feedback, forums and reader contributions.

Tuesday Dec 06, 2005

Niagara IO - Architecture & Performance

Today Sun is launching a revolutionary new set of server products. The Sun Fire CoolThreads servers, internally named Ontario and Erie, are both based on the Niagara multicore SPARC processor. Niagara, or the UltraSPARC T1 processor, represents a quantum leap in implementing multiple execution pipelines (cores) on a single chip, with support for multiple hardware threads per pipeline. We refer to this throughput-oriented design as Chip Multithreading (CMT) technology. The UltraSPARC T1 processor incorporates eight execution cores, with four hardware threads per core, providing on a single chip what previously required 32 processors (where each processor was a traditional design with a single instruction pipeline). The Sun Fire T2000 (Ontario) and the Sun Fire T1000 (Erie) represent groundbreaking technology - first and foremost, the amount of processing power (CPU, memory, I/O) available in a relatively small system. Both the Sun Fire T2000 and T1000 are rack-mount chassis systems; the T2000 is two RU (rack units) high, and the T1000 is one rack unit in height. Within a relatively small package, we find an amazing amount of computing power - not only for parallel, processor-oriented tasks, but also in memory and I/O bandwidth capabilities. The icing on the cake is the low-power design of the systems. The UltraSPARC T1 processor generates a remarkably low amount of heat, and the system as a whole has an amazing performance/power metric.

But my blog here today is not about the power and heat metrics of the T2000 and T1000. I'm sure that the launch blog-burst will include specific data on that particular feature. Nor will I be detailing the UltraSPARC T1 microprocessor architecture - the beauty of 8 execution cores, 4 hardware threads per core (32 threads total), and the whiz-bang performance and throughput these systems deliver with parallel workloads. My fellow bloggers will expound on these virtues, as well as other features. This discussion is intended to provide an overview of the I/O architecture of these systems, and a small sample of some performance numbers we have measured in our benchmarking work. These are not industry-standard benchmark results - those can be found on the product pages.

The I/O architecture of the T2000 includes five PCI slots - three PCI-E and two PCI-X - as well as four on-board gigabit ethernet ports. PCI-X is a 64-bit wide, 133MHz bus, capable of 1.06GB/sec bandwidth. PCI-E (PCI-Express) is a point-to-point bus that provides a non-shared link to a PCI-E device. A link can be implemented with one or more lanes to carry data, where each lane carries a full-duplex serial data bit stream at a rate of 2.5Gbits/second. PCI-E implementations can scale up bandwidth based on the number of lanes implemented in the link, referred to as X1, X2, X4, X8, X12, X16 and X32, where the value after the X corresponds to the number of data lanes. PCI-E on the T2000 and T1000 is X8, supporting devices with up to 8 lanes of data bandwidth capability. The transport bus between the Fire I/O bridge chip and the UltraSPARC T1 processor is the Jbus, which has a theoretical maximum bandwidth of 2.5GB/sec. Please note that this is not memory bandwidth - processor to memory data transfers take place on a different physical bus in the system (and of course through a cache memory hierarchy). The Jbus is dedicated to I/O, providing true high-end I/O bandwidth capability.
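The lane arithmetic above can be sketched with a back-of-the-envelope calculation. This is my illustration, not a figure from the product documentation; it assumes the 8b/10b line encoding used by PCI-E 1.x signaling, which is why the usable data rate comes out below the raw 2.5Gbit/sec serial rate.

```python
# Back-of-the-envelope PCI-E (gen 1) link bandwidth, per the figures above.
# Assumes 8b/10b line encoding (10 line bits carry 8 data bits), a property
# of PCI-E 1.x signaling not spelled out in the text.

LANE_RATE_GBITS = 2.5          # raw serial rate per lane, Gbit/s, each direction
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b overhead

def link_bandwidth_gbytes(lanes: int) -> float:
    """Peak data bandwidth of a PCI-E link in GB/s, one direction."""
    raw_gbits = lanes * LANE_RATE_GBITS
    data_gbits = raw_gbits * ENCODING_EFFICIENCY
    return data_gbits / 8      # bits -> bytes

for lanes in (1, 4, 8):
    print(f"X{lanes}: {link_bandwidth_gbytes(lanes):.2f} GB/s per direction")
# X1: 0.25 GB/s per direction
# X4: 1.00 GB/s per direction
# X8: 2.00 GB/s per direction
```

So an X8 link lands at roughly 2GB/sec of data per direction, which is comfortably in the same ballpark as the Jbus's 2.5GB/sec theoretical maximum.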

The T1000 uses the same Fire I/O bridge chip and Jbus to interface the I/O subsystem to the UltraSPARC T1 processor. The T1000, at one RU in size, has fewer I/O slots, with one PCI-E slot.

Some quick tests on a Sun Fire T2000 system with well over 200 connected disks (multiple Sun 3510 storage arrays connected via multiple PCI-X and PCI-E dual-port Gbit Fibre Channel adapters) indicate these systems are extremely I/O capable. The T2000 is able to sustain 1.6GB/sec of sequential read bandwidth from raw disk devices. Running a database transactional workload, which has a random I/O profile (and small 4k I/Os), the T2000 sustains 58,000 IOPS (I/O operations per second). Using a smaller I/O size for the sequential tests (8k instead of 1MB), we can sustain 120,000 IOPS on reads (just under 1GB/sec bandwidth with 8k I/Os). On a combined read/write test, 60,000 reads/sec and 60,000 writes/sec are sustained.
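The IOPS and bandwidth figures above are consistent with each other, which is easy to check: multiply the operation rate by the transfer size. A quick sketch (my own sanity check, assuming "8k" and "4k" mean 8 x 1024 and 4 x 1024 bytes):

```python
# Sanity-check the quoted throughput figures: IOPS x transfer size
# should line up with the observed bandwidth.

def iops_to_mbytes(iops: int, io_size_bytes: int) -> float:
    """Aggregate bandwidth in MB/s implied by an IOPS rate and I/O size."""
    return iops * io_size_bytes / 1e6

# 120,000 reads/sec at 8KB each -> "just under 1GB/sec"
print(iops_to_mbytes(120_000, 8 * 1024))   # 983.04 MB/s

# 58,000 transactional IOPS at 4KB each
print(iops_to_mbytes(58_000, 4 * 1024))    # 237.568 MB/s
```

The 8k sequential case works out to about 983MB/sec, matching the "just under 1GB/sec" observation, and showing that at small I/O sizes the system is limited by operation rate rather than raw bus bandwidth.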

The numbers quoted above provide a solid indication that the Sun Fire T2000 system is not just a new system with another pretty processor (the UltraSPARC T1). These systems are designed to handle workloads that generate high rates of sustained I/O, making the T2000 system suitable for a broad range of applications and workloads.
