128-bit storage: are you high?

One gentle reader offered this feedback on our recent ZFS article:

64 bits would have been plenty ... but then you can't talk out of your ass about boiling oceans then, can you?

Well, it's a fair question. Why did we make ZFS a 128-bit storage system? What on earth made us think it's necessary? And how do we know it's sufficient?

Let's start with the easy one: how do we know it's necessary?

Some customers already have datasets on the order of a petabyte, or 2\^50 bytes. Thus the 64-bit capacity limit of 2\^64 bytes is only 14 doublings away. Moore's Law for storage predicts that capacity will continue to double every 9-12 months, which means we'll start to hit the 64-bit limit in about a decade. Storage systems tend to live for several decades, so it would be foolish to create a new one without anticipating the needs that will surely arise within its projected lifetime.
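
The doubling arithmetic can be checked in a few lines of Python. This is a minimal sketch, assuming a starting dataset of exactly 2\^50 bytes and a constant doubling period; the helper names `doublings_to_limit` and `years_to_limit` are illustrative and do not come from any library.

```python
import math

PETABYTE = 2**50   # today's large dataset, in bytes
LIMIT_64 = 2**64   # byte capacity reachable with a 64-bit byte count


def doublings_to_limit(current_bytes: int, limit_bytes: int) -> float:
    """How many times capacity must double to grow from current to limit."""
    return math.log2(limit_bytes / current_bytes)


def years_to_limit(current_bytes: int, limit_bytes: int, months_per_doubling: float) -> float:
    """Years until the limit is reached, assuming a fixed doubling period."""
    return doublings_to_limit(current_bytes, limit_bytes) * months_per_doubling / 12


print(f"doublings: {doublings_to_limit(PETABYTE, LIMIT_64):.0f}")
for months in (9, 12):
    years = years_to_limit(PETABYTE, LIMIT_64, months)
    print(f"doubling every {months} months: {years:.1f} years")
```

At 9 to 12 months per doubling, 14 doublings take roughly 10.5 to 14 years, which is where the "about a decade" estimate comes from.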

If 64 bits isn't enough, the next logical step is 128 bits. That's enough to survive Moore's Law until I'm dead, and after that, it's not my problem. But it does raise the question: what are the theoretical limits to storage capacity?

Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 10\^51 operations per second on at most 10\^31 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully populated 128-bit storage pool would contain 2\^128 blocks of 512 bytes = 2\^137 bytes = 2\^140 bits; therefore the minimum mass required to hold the bits would be (2\^140 bits) / (10\^31 bits/kg) = 139 billion kg.

That's a lot of gear.

To operate at the 10\^31 bits/kg limit, however, the entire mass of the computer must be in the form of pure energy. By E=mc\^2, the rest energy of 139 billion kg is about 1.25x10\^28 J. The mass of the oceans is about 1.4x10\^21 kg. It takes about 4,000 J to raise the temperature of 1 kg of water by 1 degree Celsius, and thus about 400,000 J to heat 1 kg of water from freezing to boiling. The latent heat of vaporization adds another 2 million J/kg. Thus the energy required to boil the oceans is about 2.4x10\^6 J/kg \* 1.4x10\^21 kg = 3.4x10\^27 J. Fully populating a 128-bit storage pool would therefore, literally, require more energy than boiling the oceans.
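
The mass and energy estimates above can be reproduced in Python. This sketch takes the post's inputs as given (512-byte blocks, Lloyd's bound of 10\^31 bits/kg, 4,000 J per kg per degree, 2 million J/kg of latent heat, 1.4x10\^21 kg of ocean); the constant names are illustrative. With those inputs the minimum mass comes out at roughly 1.4x10\^11 kg, and its rest energy exceeds the ocean-boiling figure by a factor of between three and four.

```python
C = 299_792_458.0                 # speed of light, m/s

# Storage side: a fully populated 128-bit pool.
BLOCKS = 2**128
BLOCK_SIZE_BYTES = 512            # implied by 2^128 blocks = 2^137 bytes
BITS_PER_KG = 1e31                # Lloyd's bound on bits per kg of matter

bits = BLOCKS * BLOCK_SIZE_BYTES * 8       # 2^140 bits
min_mass_kg = bits / BITS_PER_KG           # minimum mass to hold those bits
rest_energy_j = min_mass_kg * C**2         # E = mc^2

# Ocean side: heat all seawater from 0 °C to 100 °C, then vaporize it.
OCEAN_MASS_KG = 1.4e21
SPECIFIC_HEAT_J_PER_KG_K = 4_000           # rounded, as in the post
LATENT_HEAT_J_PER_KG = 2e6                 # rounded, as in the post

boil_j_per_kg = SPECIFIC_HEAT_J_PER_KG_K * 100 + LATENT_HEAT_J_PER_KG  # 2.4e6 J/kg
boil_oceans_j = boil_j_per_kg * OCEAN_MASS_KG

print(f"bits in pool:    2^{bits.bit_length() - 1}")
print(f"minimum mass:    {min_mass_kg:.3e} kg")
print(f"rest energy:     {rest_energy_j:.2e} J")
print(f"boil the oceans: {boil_oceans_j:.2e} J")
print(f"ratio:           {rest_energy_j / boil_oceans_j:.2f}")
```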

Comments:

Nope, we do not need 128 bit.
But we DO need ZFS or whatever you call it.
Today, business customers care more about functionality. Low-level technology such as the file system is considered irrelevant.
So, I definitely don't want to change file systems when I realise one does not meet my needs after 40 TB. Moving 40 TB in production just to change file systems is a risky project. I'd even call it a stupid project, as I'd have to risk my career delivering something my customers have zero clue about. Can you imagine how hard it would be for a CIO to explain to the board 'why on earth we need to change file systems'?
cheers! e1

Posted by iwan ang on September 25, 2004 at 02:33 AM PDT #

"You don't have to worry about the details of what's going on with your disks, your storage, or your file systems,"
"You add disks to your storage pool, file systems consume space automatically as they need it, and administrators don't have to get involved."

That sounds more like Windows than Solaris ... but I guess we'll all have to wait for the release before being able to comment.

"Dynamic striping across all devices to maximize throughput"
In general, when you make a stripe it is recommended to use disks with the same hardware properties; otherwise performance is degraded. ZFS stripes across all devices, and you are encouraged to add all your available devices to a big pool. Some of those devices will certainly have different properties. How can I say that I want some data striped on the speedy devices? I still need to know where I can write at a better speed, right? What choices do you offer?

We're all waiting for ZFS to be integrated into Solaris Express... When it is, a BigAdmin content section, similar to those for the other important Solaris 10 features (DTrace, Zones, Self-Healing), would be nice.

Posted by Vlad Grama on September 25, 2004 at 07:20 AM PDT #

It seems like you distinguish between blocks and bytes only when it's convenient. AFAIK, a 64-bit filesystem like XFS uses 64-bit block numbers, which allows significantly more than 2\^64 bytes of data. But who cares; disks are cheap.

Posted by Wes Felter on September 26, 2004 at 08:07 AM PDT #

I want to meet the sales rep that sells all of that gear.

Posted by John Clingan on September 27, 2004 at 06:16 AM PDT #

Isn't that an apples-to-oranges comparison? It seems absurd to us, today, to consume the energy to boil the oceans for this purpose, but most of what we do today seemed absurd in our fathers' day.

Posted by roy walter on September 29, 2004 at 08:05 AM PDT #

Wes Felter is exactly right. By your very own math, we are 23 or 24 doublings (assuming 512- or 1024-byte blocks) from exhausting the 64-bit block limit.

Of course, the assumption that the "Moore's Law for storage" can survive that many more doublings is absurd. Conventional hard disk technology is already nearing the end of the line in terms of spot size reduction. After only 20 more doublings, you're talking about multiple bits per atom.

So, it's safe to say that 64 bits is more than adequate for your postulated several-decade lifetime of storage systems (which is itself absurd for any value of "several" greater than two).

So, I have to ask myself, what exactly was Jonathan Schwartz talking about when he wrote: "Why I Love Working at Sun: Because I'm literally surrounded by people who think like this." http://blogs.sun.com/roller/page/jonathan/20040926#why_i_love_working_at

Posted by Michael Robinson on September 29, 2004 at 03:33 PM PDT #
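
The block-addressing point raised by Wes Felter and Michael Robinson is easy to check. A minimal sketch, assuming 512- or 1024-byte blocks as the comment does; `block_addressed_capacity` is a hypothetical helper, not part of any file system API.

```python
PETABYTE_EXPONENT = 50   # 1 PB = 2^50 bytes


def block_addressed_capacity(address_bits: int, block_size_bytes: int) -> int:
    """Bytes reachable when block numbers (not byte offsets) are address_bits wide."""
    return 2**address_bits * block_size_bytes


for block_size in (512, 1024):
    capacity = block_addressed_capacity(64, block_size)
    # bit_length() - 1 gives the exponent only because capacity is a power of two here.
    exponent = capacity.bit_length() - 1
    doublings = exponent - PETABYTE_EXPONENT   # doublings from one petabyte
    print(f"{block_size}-byte blocks: 2^{exponent} bytes, {doublings} doublings from 1 PB")
```

With 64-bit block numbers, each power-of-two increase in block size adds one more doubling of headroom.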

There are two very different arguments here: the limits of physics, and the limits of current technology. The former is unavoidable absent new physics that overcomes the Planck limit. But the latter is utterly unpersuasive. I've had people tell me every year, for years, that Moore's Law was about to end. I've said the opposite, and have yet to lose the bet. Limits on spot density are fundamentally arguments about 2D storage. Once we move into 3D -- and this work is already underway -- we will get many more orders of magnitude to play with.

Posted by Jeff Bonwick on September 29, 2004 at 04:15 PM PDT #

To the contrary, these are not very different arguments. Moore's Law and relatives only apply where each generation of technology represents incremental enhancements to the physics of the previous generation (shorter wavelengths, more sensitive magnetic effects, finer laser control, etc.).

Ultimately, the limits of a technology are the limits of the physics of that technology. Once you hit the end, you have to jump to something else. Sometimes that something else is a lot better, as in the jump from copper to glass, and sometimes not, as in the jump from NiCd to Li+, but in no case is a blind assumption of a "Moore's Law" continuity warranted.

In the specific case of the transition from 2D to 3D storage, not only is this work "already underway", it has been underway for over 20 years already (e.g. US Patent 4,458,345), with no viable mass-market technology yet identified. From the standpoint of physics, 3D technology is inherently constrained by optical wavelength, which in turn is inherently incapable of achieving the atomic resolution of such promising 2D technologies as IBM's AFM-based Millipede.

The limits of physics and the limits of current technology both point to the next-gen mass storage medium being 2D. If you are designing filesystems today with the assumption that you are going to have "many more orders of magnitude to play with" at some point in the next 20 years, I again have to question Schwartz' assessment.

Posted by Michael Robinson on September 29, 2004 at 11:16 PM PDT #

I'm not sure I agree with this. Magnetic storage surely can go to the magnetic domain (dipole) level, and the implicit assumption in all this is that the data is digitally encoded. Presumably, if the magnetic crystalline structure can be controlled well enough, then values could be stored as analogue values, with the magnetic twist representing the magnitude of the value. This doesn't stop with magnetic recording: we seem to forget that 50 years ago we could encode 20 kHz of bandwidth onto a piece of plastic and read it back more accurately than current digital systems, with nothing more than a pin, a magnet and a piece of wire! Who is to say storage will be magnetic? I would expect that 3D silicon topographies will appear and likely be arranged as an analog neural net; what will be the memory capacity of a 3D chip with 100,000,000 analog neurons? The only reason we work in the digital domain today is to minimize the power budget, not because it's the best way to do it.

Posted by Robert Lunnon on September 30, 2004 at 10:08 PM PDT #

"640 KB should be enough for everybody", right?

Posted by Andreas on October 01, 2004 at 03:51 AM PDT #

I'm not sure what you guys are talking about, but it seems you are discussing better alternatives to the Roller math question used to block those darn spam bots (which, after a refresh, doubled in complexity from 0+2 to 10+8 and might therefore satisfy that law you keep referring to. I wouldn't bet my ass on it, though. What I know is that if it continues in that direction, I won't be able to comment any longer in the near future.)

Posted by fx on October 01, 2004 at 12:17 PM PDT #

I'm glad I work for the same company as you ;-)

Posted by Drew on October 04, 2004 at 10:34 PM PDT #

This entire post reeks of BS and engineers pretending they're smarter than they really are. Kudos to Michael Robinson and Wes Felter for calling Bonwick on this hogwash. You did a good job of proving your "gentle reader" right—you do talk out of your ass.

Posted by George Bezel on October 05, 2004 at 03:08 AM PDT #

Why is this even an issue? Who cares whether the file system size is expressed as a 64 or 128 bit number? It makes no difference at all, not to me anyway. So let's just accept that 128 is the new number from now on, and move on with life, knowing that this way we NEVER ever have to worry about file system sizes again. I guess my point is that the discussion is kind of pointless.

Posted by AP on October 05, 2004 at 03:56 AM PDT #

To Wes Felter, roy walter and others who believe a 64-bit file system is enough for 20+ years: by the same math, theoretically, you would only need to boil about 475 gallons of water to fill a 64-bit file system! A household water heater can do that!

The problem is that when you are talking about a 64-bit file system, the addressable space is 2\^64 blocks (a block is usually 512 bytes), so the information stored in that file system is 2\^76 bits. But don't forget to apply the same rule to today's facts: we now have petabyte (2\^50 bytes) storage available on the market, which is 2\^62 bits of information. Given Moore's law, whether you like it or not, a 64-bit file system will run short within 13-14 years! In short, when talking about file systems, 64 bits are used for addressing, which means a file system can hold 2\^64 (or 2\^63) bytes of information, while when talking about the mass needed to hold the information, you have to convert bytes to bits; but don't forget to convert both (a 2\^64-byte file system vs. today's petabyte, or 2\^50-byte, storage system).

Last, the water heater thing: for a 64-bit file system, 2\^76 bits / 10\^31 bits/kg = 0.00000000756 kg = 7.56 x 10\^-9 kg = 0.00000756 g. Pushing to the limits of physics we know today, by E=mc\^2, that is 6.7 x 10\^8 J. Heating 1 kg of water from freezing to boiling takes 400,000 J, which translates to about 1,800 kg of water, or about 1.8 cubic meters or 475 gallons.

Posted by kevin fu on October 06, 2004 at 12:22 AM PDT #
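
kevin fu's water-heater figure can be recomputed the same way. This sketch assumes 512-byte blocks, the 10\^31 bits/kg bound, and 400,000 J to take 1 kg of water from freezing to boiling (heating only, no vaporization); the constant names are illustrative. With these inputs it lands at roughly 1,700 kg of water, the same order as the comment's estimate.

```python
C = 299_792_458.0                      # speed of light, m/s
BITS_PER_KG = 1e31                     # Lloyd's bound used in the post
HEAT_0_TO_100_J_PER_KG = 4_000 * 100   # heating only; no latent heat
LITERS_PER_US_GALLON = 3.785411784

bits = 2**64 * 512 * 8                 # 2^64 blocks of 512 bytes = 2^76 bits
mass_kg = bits / BITS_PER_KG
energy_j = mass_kg * C**2              # E = mc^2
water_kg = energy_j / HEAT_0_TO_100_J_PER_KG
gallons = water_kg / LITERS_PER_US_GALLON   # treats 1 kg of water as 1 liter

print(f"mass:   {mass_kg:.2e} kg")
print(f"energy: {energy_j:.2e} J")
print(f"water:  {water_kg:,.0f} kg (about {gallons:,.0f} US gallons)")
```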

Aren't filesystems usually block-addressable, not byte-addressable? Thus 2\^64 is actually another 9 doublings away? Even with annual doubling, the entire Earth's production of disk drives seems a long way from 2\^65 blocks. How big is a ZFS block, anyway (512 bytes?)?

Posted by bill B on October 06, 2004 at 07:59 AM PDT #

[Trackback] Jeff Bonwick of Sun proves he can in fact talk out of his ass about boiling the ocean. Turns out it would take 3.4×10\^27 J to actually do it. I wonder how much asphalt is used when paving the cow path?...

Posted by John's Jottings on October 17, 2004 at 09:49 AM PDT #

What about a classical information system where a bit is written by adding an energy of 1 eV to an electron? This could be a real system, such as a single-electron transistor: either it holds a charge or it doesn't. 1 eV = 1.6E-19 joules (J), so 2\^140 bits then takes 2.2E23 J, which is a little bit smaller than 1.6E28 J. If the system consisted only of electrons, its mass would be 1.3E12 kg. Such a memory is of course not practical, but it shows that information can be written with less energy than with the big-bang memory.
It is of course possible to lower the voltage, or to use a memory whose total internal energy does not change when programming; then the writing energy would be even smaller, though the mass is still incredibly high.

Posted by Peeter Vois on October 17, 2004 at 10:08 PM PDT #
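
Peeter Vois's one-electron-per-bit estimate can be checked as follows. This is a sketch under the comment's own assumption that each bit costs one electron charged to 1 eV; the constants are the SI/CODATA values for the electronvolt and the electron rest mass.

```python
EV_IN_J = 1.602176634e-19            # 1 electronvolt in joules (exact SI value)
ELECTRON_MASS_KG = 9.1093837015e-31  # electron rest mass

bits = 2**140                        # fully populated 128-bit pool
write_energy_j = bits * EV_IN_J      # one 1 eV electron per stored bit
electron_mass_kg = bits * ELECTRON_MASS_KG

print(f"write energy:  {write_energy_j:.1e} J")
print(f"electron mass: {electron_mass_kg:.1e} kg")
```

The write energy is about five orders of magnitude below the rest-energy figure in the post, yet the electrons alone still weigh over a trillion kilograms.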

I guess I reveal my origin as a civil engineer, and not a quantum physicist, but it always seemed to me that bits per atom (you know, something you can imagine putting in one place rather than another) were the obvious reality check for how much data we'll ever be able to store. My own rough estimate, based on the mass and the abundance of different elements in the earth's crust, suggests that it has about 7e47 atoms. I don't believe for a minute that we'll store more than one bit per atom, or that we'll mine the entire crust of the planet to turn it into storage media. Retaining less stuff, not storing more, therefore has to become the core strategy by 2100 at the latest, based on the UC Berkeley studies that estimate data production rates currently doubling almost every year. - Peter Coffee, eWEEK

Posted by Peter Coffee, eWEEK on October 19, 2004 at 07:08 AM PDT #

Your ideas are intriguing to me and I wish to subscribe to your newsletter.

Posted by Jonathan Drain on October 24, 2004 at 03:11 AM PDT #

Hello Jeff, I can't find your email address on your blog, so I'm asking via the comment function in the hope that you check frequently for new comments: I'm working for a German Sun partner and would like to put a German translation of your nice ZFS calculation onto my blog. Of course, I will link back to your original entry here. Are there any constraints on this from your side? You can contact me by mail. Thank you. wolfgang

Posted by Wolfgang Stief on October 31, 2004 at 05:15 PM PST #

I find this quite an interesting topic. Being 16, I do not fully follow all your jargon, but I do follow the concepts. Reading a lot of this blog has actually inspired me somewhat to relight the smoldering candle of my love for science. Thank you.

Posted by Bryan Paradis on November 28, 2004 at 05:14 PM PST #

The calculation is interesting... but who says that information must be bound to matter ? You can "simply" store information in photons, let them cycle in a ring and only have to find a way to "refresh" the photons in a way which doesn't alter the information... :)

Posted by Roland Mainz on November 30, 2004 at 09:32 AM PST #

If the Earth crust contains 7E47 atoms and each atom can hold one bit of information, that's enough for 2\^159 bits, or half a million fully populated ZFS instances.

However, I think it is more useful to focus on the limitations of 64-bit, as the whole point of going to 128-bit is that it is unthinkable it should ever constrain us. A file system with 2\^64 blocks gives us 2\^76 bits. If we use a carbon crystal, 2\^76 atoms would have a mass of 1.5 grams (7.5 carats). I'm willing to take bets that structures storing this amount of data will appear within 20 years.

Posted by Kjetil T. Homme on December 04, 2004 at 06:31 PM PST #
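
Kjetil T. Homme's figures can be reproduced with a short sketch. It assumes one bit per atom, Peter Coffee's estimate of 7e47 atoms in the crust, a molar mass of 12.011 g/mol for carbon, 512-byte blocks, and the metric carat of 0.2 g; the constant names are illustrative.

```python
import math

AVOGADRO = 6.02214076e23           # atoms per mole
CARBON_MOLAR_MASS_G = 12.011       # grams per mole
GRAMS_PER_CARAT = 0.2              # metric carat
CRUST_ATOMS = 7e47                 # Peter Coffee's estimate
POOL_BITS = 2**140                 # fully populated 128-bit pool

crust_bits_log2 = math.log2(CRUST_ATOMS)    # one bit per atom
pools_in_crust = CRUST_ATOMS / POOL_BITS

bits_64 = 2**64 * 512 * 8                   # 2^64 blocks of 512 bytes = 2^76 bits
carbon_mass_g = bits_64 / AVOGADRO * CARBON_MOLAR_MASS_G
carats = carbon_mass_g / GRAMS_PER_CARAT

print(f"crust capacity: about 2^{crust_bits_log2:.0f} bits")
print(f"128-bit pools in the crust: {pools_in_crust:,.0f}")
print(f"carbon for 2^76 bits: {carbon_mass_g:.2f} g ({carats:.1f} carats)")
```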

While the debate about the adequacy of 64-bit file systems is seductive, I'm actually more interested in other aspects of ZFS which so far I have not yet found documented (thanks for any pointers you might provide). For example, supplementary checksums for stored data are fairly old hat in serious storage environments, but mechanisms that can detect silent disk write failures and/or the effects of 'wild' disk writes could qualify as reasonably innovative. As others have observed, a '64-bit file system' often can store far more than 2\^64 bytes of data. In fact, even today one can argue (on the basis of seek-vs.-transfer latency characteristics, though that ignores the additional bus and memory activity involved) that disk block sizes below 64 KB are of marginal utility, which would yield a total size for a 2\^64 block-addressable file system of 2\^80 bytes (though any individual file would still be limited to 2\^64 bytes) - and by the time file systems approaching this size start to exist, it's likely that the size of the smallest data block will have increased substantially as well (pushing up the system capacity even farther). Given that gluing multiple file systems together under a common name space is also readily done, I don't find your argument for expansion to 128 bits persuasive - unless you contend that 2\^64 bytes will start to become significantly constrictive for a single file, and even then I'm a bit skeptical that the frequency of such occurrences would justify the change in architecture, given the ease with which applications have managed to deal with the limits of 32-bit file systems when they have encountered them. Another poster here observed that we're already encountering the limits of our ability to handle the amount of data we have. 
Unless you foresee something comparable to the data explosion which has accompanied the adoption of digital video in the near future (Star-Trek-style transporter scanning technology is about the only thing that comes immediately to mind, and I'm not holding my breath waiting for it), long before anything like current use starts to gobble up anything like 2\^70+ bytes of data we'll have long since lost any ability to make sense of it. - bill

Posted by Bill Todd on December 20, 2004 at 06:47 AM PST #
