The General-Purpose Storage Revolution

It happened so slowly, most people didn't notice until it was over.

I'm speaking, of course, of the rise of general-purpose computing during the 1990s.  It was not so long ago that you could choose from a truly bewildering variety of machines.  Symbolics, for example, made hardware specifically designed to run Lisp programs.  We debated SIMD vs. MIMD, dataflow vs. control flow, VLIW, and so on.  Meanwhile, those boring little PCs just kept getting faster.  And more capable.  And cheaper.  By the end of the decade, even the largest supercomputers were just clusters of PCs. A simple, general-purpose computing device crushed all manner of clever, sophisticated, highly specialized systems.

And the thing is, it had nothing to do with technology. It was all about volume economics.  It was inevitable.

With that in mind, I bring news that is very good for you, very good for Sun, and not so good for our competitors:  the same thing that happened to compute in the 1990s is happening to storage, right now. Now, as then, the fundamental driver is volume economics, and we see it playing out at all levels of the stack: the hardware, the operating system, and the interconnect.

First, custom RAID hardware can't keep up with general-purpose CPUs. A single Opteron core can XOR data at about 6 GB/sec.  There's just no reason to dedicate special silicon to this anymore.  It's expensive, it wastes power, and it was always a compromise: array-based RAID can't provide the same end-to-end data integrity that host-based RAID can. No matter how good the array is, a flaky cable or FC port can still flip bits in transit.  A host-based RAID solution like RAID-Z in ZFS can both detect and correct silent data corruption, no matter where it arises.
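
To make "detect and correct" concrete, here's a quick sketch you can run on any recent Solaris box with ZFS. The pool is built on scratch files rather than real disks, and the paths, sizes, and offsets are arbitrary -- it's an illustration, not a recipe. It simulates exactly the failure mode above: bits silently damaged somewhere below the filesystem.

    # Build a small RAID-Z pool on scratch files (files stand in for disks here).
    mkfile 256m /var/tmp/d0 /var/tmp/d1 /var/tmp/d2
    zpool create testpool raidz /var/tmp/d0 /var/tmp/d1 /var/tmp/d2

    # Put some data in it -- a few tens of megabytes of anything will do.
    cp -r /usr/share/man /testpool

    # Now silently corrupt one "disk" behind ZFS's back, the way a flaky
    # cable, HBA, or firmware bug would.  (The offset skips the vdev labels
    # at the front of the file; conv=notrunc preserves the file size.)
    dd if=/dev/urandom of=/var/tmp/d1 bs=1024 seek=1024 count=16384 conv=notrunc

    # Scrub the pool: every block is read back and verified against its
    # checksum, and anything that fails is reconstructed from the data and
    # parity on the other devices.
    zpool scrub testpool
    zpool status -v testpool    # reports the checksum errors -- already repaired

An array-based RAID controller in the same situation would happily hand back the damaged blocks: the drives reported success, and there's no checksum above the data path to tell it otherwise.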

Second, custom kernels can't keep up with volume operating systems. I try to avoid naming specific competitors in this blog -- it seems tacky -- but think about what's inside your favorite storage box. Is it open source?  Does it have an open developer community? Does it scale?  Can the vendor make it scale?  Do they even get a vote?

The latter question is becoming much more important due to trends in CPU design.  The clock rate party of the 1990s, during which we went from 20MHz to 2GHz -- a factor of 100 -- is over.  Seven years into the new decade we're not even 2x faster in clock rate, and there's no sign of that changing soon.  What we are getting, however, is more transistors.  We're using them to put multiple cores on each chip and multiple threads on each core (so the chip can do something useful during load stalls) -- and this trend will only accelerate.

Which brings us back to the operating system inside your storage device. Does it have any prayer of making good use of a 16-core, 64-thread CPU?

Third, custom interconnects can't keep up with Ethernet.  In the time that Fibre Channel went from 1Gb to 4Gb -- a factor of 4 -- Ethernet went from 10Mb to 10Gb -- a factor of 1000.  That SAN is just slowing you down.

Today's world of array products running custom firmware on custom RAID controllers on a Fibre Channel SAN is in for massive disruption. It will be replaced by intelligent storage servers, built from commodity hardware, running an open operating system, speaking over the real network.

You've already seen the first instance of this: Thumper (the x4500) is a 4-CPU, 48-disk storage system with no hardware RAID controller. The storage is all managed by ZFS on Solaris, and exported directly to your real network over standard protocols like NFS and iSCSI.
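
To give a flavor of what that looks like in practice, here's a sketch of the administrative model (the device names, pool layout, and sizes below are illustrative, and the shareiscsi property assumes a build that includes the ZFS iSCSI target):

    # Pool the disks under ZFS -- no RAID controller anywhere in the picture.
    zpool create tank raidz2 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0

    # File service: create a filesystem and export it over NFS.
    zfs create tank/home
    zfs set sharenfs=on tank/home

    # Block service: carve out a volume and export it over iSCSI.
    zfs create -V 50G tank/vol01
    zfs set shareiscsi=on tank/vol01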

And if you think Thumper was disruptive, well... stay tuned.

Comments:

Jeff: We (as in our company) are eagerly waiting for this storage revolution from Sun. I keep asking our Sun reps why I should shell out an obscene amount of dollars for a Sun StorageTek 3510 FC Array or other HW RAID products when I can just use a JBOD and ZFS. -- pj

Posted by guest on April 09, 2007 at 11:26 PM PDT #

Now you just need to get 3 people to put a full and production-quality NFSv4 client implementation into Linux. It's always shocked me how little Sun does on the NFS client end. Do you want us to use your now-barely-not-crappy network filesystem via your company's server hardware or not? We've been on AFS for 15 years now.

Posted by Jeff on April 10, 2007 at 12:43 AM PDT #

Having worked on both the Linux and OpenSolaris NFS clients, I would disagree with your assertion that the OpenSolaris NFS client is not up to snuff.

I would definitely agree that the Linux NFS implementation has greatly improved over the past several years, but I would still take the OpenSolaris NFS client over Linux's.

That being said, please post any issues you have to nfs-discuss@opensolaris.org to get them resolved. Interoperability is NFS's best friend.

Posted by eric kustarz on April 10, 2007 at 02:07 AM PDT #

I agree that SAN is ripe for a disruption, but I don't expect Sun to be able to capitalize on new trends in storage. Sun used to make and sell excellent JBODs, the SPARCstorage Array and the A5000. And what has happened to them?

So perhaps 3par Data is set to capitalize on new trends, but I'm not so sure about Sun. There's a reason why the founders of 3par escaped Sun, why they were able to skim the cream of Sun's storage engineers. Sun's culture is synonymous with immense lumbering bureaucracy.

The concept of a filer endured at Sun for years, all the way through Netra NFS (it survived in Network Technologies, away from Storage and SMCC). If that idea is making a comeback, it's a good thing. But don't call it an "intelligent storage server"; that term means a stock Solaris server with Encore-derived software and an expensive SAN-based back-end.

I worked on an attempt to use a stock server as a storage server, but that was just silly. A general-purpose server is not balanced right: it has way too much CPU and way too little I/O. So, to get the right I/O capacity you have to budget a far more powerful (and expensive) general-purpose platform than if you were using a specially designed platform. It is never cost-effective. The NFS server only makes sense because NFS is CPU-intensive. But for block-level storage it is much better to plug storage directly into the server. And for Sun's sake, that had better be a Solaris server.

BTW, the argument you're making about the "good use" of a multicore is actually an argument for HW RAID. If CPU power growth stumbles, offload suddenly becomes far more attractive. Mainframers with their pitiful CPUs learned that decades ago. Just ask Brian Wong. This actually plays into the hands of EMC, NetApp, Avaya, etc.

Posted by Pete Zaitcev on April 10, 2007 at 07:36 AM PDT #
