128-bit storage: are you high?
By bonwick on Sep 24, 2004
One gentle reader offered this feedback on our recent ZFS article:
64 bits would have been plenty ... but then you can't talk out of your ass about boiling oceans then, can you?
Well, it's a fair question. Why did we make ZFS a 128-bit storage system? What on earth made us think it's necessary? And how do we know it's sufficient?
Let's start with the easy one: how do we know it's necessary?
Some customers already have datasets on the order of a petabyte, or 2\^50 bytes. Thus the 64-bit capacity limit of 2\^64 bytes is only 14 doublings away. Moore's Law for storage predicts that capacity will continue to double every 9-12 months, which means we'll start to hit the 64-bit limit in about a decade. Storage systems tend to live for several decades, so it would be foolish to create a new one without anticipating the needs that will surely arise within its projected lifetime.
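As a quick sanity check on that timeline, here is a minimal sketch of the doubling arithmetic. It uses only the figures quoted above (a petabyte as 2\^50 bytes, a 2\^64-byte limit, one doubling every 9-12 months); the variable and function names are illustrative.

```python
# Figures taken from the paragraph above.
current_log2_bytes = 50   # a petabyte is 2**50 bytes
limit_log2_bytes = 64     # the 64-bit capacity limit is 2**64 bytes

# Each doubling adds one to the exponent.
doublings = limit_log2_bytes - current_log2_bytes


def years_to_limit(months_per_doubling):
    """Years until the limit is reached at a fixed doubling period."""
    return doublings * months_per_doubling / 12


print(doublings)                               # 14
print(years_to_limit(9), years_to_limit(12))   # 10.5 14.0
```

At the quoted doubling rates the limit arrives in roughly 10 to 14 years, which is where "about a decade" comes from.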
If 64 bits isn't enough, the next logical step is 128 bits. That's enough to survive Moore's Law until I'm dead, and after that, it's not my problem. But it does raise the question: what are the theoretical limits to storage capacity?
Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 10\^51 operations per second on at most 10\^31 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully-populated 128-bit storage pool would contain 2\^128 blocks = 2\^137 bytes = 2\^140 bits; therefore the minimum mass required to hold the bits would be (2\^140 bits) / (10\^31 bits/kg) = 136 billion kg.
That's a lot of gear.
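The mass estimate can be reproduced in a few lines. This is a rough sketch, not a physics model: the 512-byte block size is inferred from the step 2\^128 blocks = 2\^137 bytes, and the capacity limit is the 10\^31 bits/kg figure quoted from Lloyd.

```python
# Inferred from the text: 2**128 blocks = 2**137 bytes implies 2**9-byte blocks.
BLOCKS = 2**128
BYTES_PER_BLOCK = 2**9
BITS_PER_BYTE = 8

pool_bits = BLOCKS * BYTES_PER_BLOCK * BITS_PER_BYTE   # exactly 2**140

# Information capacity limit as quoted from Lloyd (2000).
BITS_PER_KG = 10**31

min_mass_kg = pool_bits / BITS_PER_KG
print(f"{min_mass_kg:.3g} kg")
```

Evaluated directly, this comes out at roughly 1.4 x 10\^11 kg, within a few percent of the 136 billion kg quoted above.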
To operate at the 10\^31 bits/kg limit, however, the entire mass of the computer must be in the form of pure energy. By E=mc\^2, the rest energy of 136 billion kg is 1.2x10\^28 J. The mass of the oceans is about 1.4x10\^21 kg. It takes about 4,000 J to raise the temperature of 1 kg of water by 1 degree Celsius, and thus about 400,000 J to heat 1 kg of water from freezing to boiling. The latent heat of vaporization adds another 2 million J/kg. Thus the energy required to boil the oceans is about 2.4x10\^6 J/kg \* 1.4x10\^21 kg = 3.4x10\^27 J. Thus, fully populating a 128-bit storage pool would, literally, require more energy than boiling the oceans.
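For readers who want to check the comparison, the sketch below redoes the energy arithmetic with the article's own rounded inputs. The only value not taken from the text is the speed of light; all names are illustrative.

```python
C = 299_792_458.0            # speed of light in m/s (not stated in the article)

# Rest energy of the storage pool, using the article's mass figure.
pool_mass_kg = 136e9
rest_energy_j = pool_mass_kg * C**2

# Energy to boil the oceans, using the article's rounded values.
OCEAN_MASS_KG = 1.4e21
SPECIFIC_HEAT = 4_000.0      # J per kg per degree Celsius
LATENT_HEAT = 2.0e6          # J per kg, heat of vaporization

heat_to_boiling = SPECIFIC_HEAT * 100           # 0 to 100 degrees Celsius
energy_per_kg = heat_to_boiling + LATENT_HEAT   # about 2.4e6 J/kg
boil_energy_j = energy_per_kg * OCEAN_MASS_KG

print(f"rest energy:  {rest_energy_j:.2g} J")
print(f"boil oceans:  {boil_energy_j:.2g} J")
print(f"ratio:        {rest_energy_j / boil_energy_j:.1f}")
```

With these inputs the rest energy exceeds the boiling energy by a factor of between 3 and 4, so the conclusion does not hinge on the rounding.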
But we DO need ZFS or whatever you call it.
Today, business customers care more about functionality. Low-level technology such as the file system is considered irrelevant.
So, I definitely don't want to change file systems when I realise mine does not meet my needs after 40 TB. Moving 40 TB in production, just to change file systems, is a risky project. I'd even call it a stupid project, as I have to risk my career delivering something that my customers have 0 clue about. Can you imagine how hard it will be for a CIO to explain to the board 'why on earth we need to change file systems'?
cheers! e1
Posted by iwan ang on September 25, 2004 at 02:33 AM PDT #
"You don't have to worry about the details of what's going on with your disks, your storage, or your file systems,"
"You add disks to your storage pool, file systems consume space automatically as they need it, and administrators don't have to get involved."
That sounds more like Windows than Solaris ... but I guess we'll all have to wait for the release before being able to comment.
"Dynamic striping across all devices to maximize throughput"
In general, when you make a stripe it is recommended to use disks with the same hardware properties; otherwise performance is degraded. ZFS stripes across all devices and you are encouraged to add all your available devices to a big pool. Some of those devices will certainly have different properties. How can I say that I want some data striped on the speedy devices? I still need to know where I can write at a better speed, right? What choices do you offer?
We're all waiting for ZFS to be integrated in Solaris Express... When it is, a BigAdmin content section, similar to those for the other important Solaris 10 features (DTrace, Zones, Self Healing), would be nice.
Posted by Vlad Grama on September 25, 2004 at 07:20 AM PDT #
Posted by Wes Felter on September 26, 2004 at 08:07 AM PDT #
I want to meet the sales rep that sells all of that gear.
Posted by John Clingan on September 27, 2004 at 06:16 AM PDT #
Posted by roy walter on September 29, 2004 at 08:05 AM PDT #
Of course, the assumption that the "Moore's Law for storage" can survive that many more doublings is absurd. Conventional hard disk technology is already nearing the end of the line in terms of spot size reduction. After only 20 more doublings, you're talking about multiple bits per atom.
So, it's safe to say that 64 bits is more than adequate for your postulated several-decade lifetime of storage systems (which is itself absurd for any value of "several" greater than two).
So, I have to ask myself, what exactly was Jonathan Schwartz talking about when he wrote: "Why I Love Working at Sun: Because I'm literally surrounded by people who think like this." http://blogs.sun.com/roller/page/jonathan/20040926#why_i_love_working_at
Posted by Michael Robinson on September 29, 2004 at 03:33 PM PDT #
Posted by Jeff Bonwick on September 29, 2004 at 04:15 PM PDT #
Ultimately, the limits of a technology are the limits of the physics of that technology. Once you hit the end, you have to jump to something else. Sometimes that something else is a lot better, as in the jump from copper to glass, and sometimes not, as in the jump from NiCd to Li+, but in no case is a blind assumption of a "Moore's Law" continuity warranted.
In the specific case of the transition from 2D to 3D storage, not only is this work "already underway", it has been underway for over 20 years already (e.g. US Patent 4,458,345), with no viable mass-market technology yet identified. From the standpoint of physics, 3D technology is inherently constrained by optical wavelength, which in turn is inherently incapable of achieving the atomic resolution of such promising 2D technologies as IBM's AFM-based Millipede.
The limits of physics and the limits of current technology both point to the next-gen mass storage medium being 2D. If you are designing filesystems today with the assumption that you are going to have "many more orders of magnitude to play with" at some point in the next 20 years, I again have to question Schwartz' assessment.
Posted by Michael Robinson on September 29, 2004 at 11:16 PM PDT #
Posted by Robert Lunnon on September 30, 2004 at 10:08 PM PDT #
Posted by Andreas on October 01, 2004 at 03:51 AM PDT #
Posted by fx on October 01, 2004 at 12:17 PM PDT #
Posted by Drew on October 04, 2004 at 10:34 PM PDT #
Posted by George Bezel on October 05, 2004 at 03:08 AM PDT #
Posted by AP on October 05, 2004 at 03:56 AM PDT #
Posted by kevin fu on October 06, 2004 at 12:22 AM PDT #
Posted by bill B on October 06, 2004 at 07:59 AM PDT #
Posted by John's Jottings on October 17, 2004 at 09:49 AM PDT #
It is of course possible to lower the voltage, or to use memory whose total internal energy does not change when it is programmed; the writing energy would then be even smaller, but the mass is still incredibly high.
Posted by Peeter Vois on October 17, 2004 at 10:08 PM PDT #
Posted by Peter Coffee, eWEEK on October 19, 2004 at 07:08 AM PDT #
Posted by Jonathan Drain on October 24, 2004 at 03:11 AM PDT #
Posted by Wolfgang Stief on October 31, 2004 at 05:15 PM PST #
Posted by Bryan Paradis on November 28, 2004 at 05:14 PM PST #
Posted by Roland Mainz on November 30, 2004 at 09:32 AM PST #
If the Earth's crust contains 7E47 atoms and each atom can hold one bit of information, that's enough for 2\^159 bits, or half a million fully populated ZFS instances.
However, I think it is more useful to focus on the limitations of 64-bit, as the whole point of going to 128-bit is that it is unthinkable it should ever constrain us. A file system with 2\^64 blocks gives us 2\^76 bits. If we use a carbon crystal, 2\^76 atoms would have a mass of 1.5 grams (7.5 carats). I'm willing to take bets that structures storing this amount of data will appear within 20 years.
Posted by Kjetil T. Homme on December 04, 2004 at 06:31 PM PST #
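The figures in the comment above can be checked with a short sketch. The crust atom count and the one-bit-per-atom premise come from the comment; the Avogadro constant, the molar mass of carbon, and the 0.2 g carat are standard values added here, and 2\^140 bits per full pool is taken from the article.

```python
import math

# From the comment: atoms in the Earth's crust, one bit per atom.
CRUST_ATOMS = 7e47
crust_bits_log2 = math.log2(CRUST_ATOMS)   # close to 159

# From the article: a fully populated pool holds 2**140 bits.
full_pools = 2 ** (159 - 140)
print(full_pools)   # 524288, about half a million

# Mass of 2**76 carbon atoms (standard constants, not from the comment).
AVOGADRO = 6.02214076e23     # atoms per mole
CARBON_MOLAR_MASS = 12.011   # grams per mole
GRAMS_PER_CARAT = 0.2

mass_g = 2**76 / AVOGADRO * CARBON_MOLAR_MASS
carats = mass_g / GRAMS_PER_CARAT
print(f"{mass_g:.2f} g, {carats:.1f} carats")
```

Both results agree with the comment: about 2\^159 bits in the crust, and roughly 1.5 g (7.5 carats) of carbon for a full 64-bit file system.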
Posted by Bill Todd on December 20, 2004 at 06:47 AM PST #