What is a Terabyte?

While this may be surprising to some people, one discussion I seem to have more frequently than any other is one surrounding usable capacity. Not surprisingly, capacity is a key metric by which customers buy storage, but what may be surprising is how poorly it is understood. I'll attempt to demystify things a little.

Before we get too far, I think it's worth explaining what a Terabyte is. I often get the question "Isn't 1TB just one trillion bytes?" That is a perfectly reasonable conclusion to draw - tera is the standard prefix for trillion in base 10. The thing we need to remember is that computers don't do decimal (base 10) math like we do; they do binary (base 2) math. As a computer understands it, 1TB is 2^40, or 1,099,511,627,776 bytes. Similarly, 1GB is actually 2^30, or 1,073,741,824 bytes.
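If you'd like to check those figures yourself, here is a minimal Python sketch. The constant names are my own, chosen for illustration; the only thing it does is compare the decimal and binary readings of "giga" and "tera".

```python
# Decimal (SI) versus binary readings of the "giga" and "tera" prefixes.
DECIMAL_GB = 10**9
DECIMAL_TB = 10**12
BINARY_GB = 2**30
BINARY_TB = 2**40

for name, decimal, binary in [
    ("GB", DECIMAL_GB, BINARY_GB),
    ("TB", DECIMAL_TB, BINARY_TB),
]:
    # The gap is how many bytes "go missing" when a decimal unit
    # is read as if it were a binary one.
    gap = binary - decimal
    print(f"1 {name}: decimal = {decimal:,} bytes, "
          f"binary = {binary:,} bytes, gap = {gap:,} bytes")
```

The gap for a gigabyte is 73,741,824 bytes; for a terabyte it is already close to 100 billion bytes.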

Surely then, when we buy a 1TB drive in our storage array, or from the local electronics store, it must contain 1,099,511,627,776 bytes, right? Wrong. If you refer to the specifications of nearly any disk on the market today you'll see the capacity footnoted with text something like this (found on a Seagate specification sheet): "One gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes when referring to drive capacity". Technically this is correct, since the prefixes 'giga' and 'tera' do describe billions and trillions, but it leaves a little to be desired when we're talking in terms the computer understands. Update: It was pointed out to me that I should highlight the fact that RAM always comes in capacities based on a power of 2 as a result of the way it is addressed, meaning that this capacity difference will only ever apply to disks.

I can't honestly pinpoint when this footnoting of capacity started happening without going over a lot of old spec sheets, but I would guess it dates back to the time when drives in the gigabyte range started to become available. Before that, I can recall looking at spec sheets for drives in which the number of megabytes of capacity was well documented. In fact, the geometry of the disk was described in painstaking detail, including the number of spare sectors per cylinder and the number of spare cylinders. But then, those were the days when you needed that information to use the drive.

In any case, it seems as though at some point marketing decided that bigger, rounder units were convenient and sexy, and as consumers we accepted this as 'close enough'. It may be due to the fact that in those days the difference was smaller (1 billion bytes vs a binary 1GB of 2^30 bytes works out to a 70.33MB difference).

So now let's see how the gap magnifies as the scale gets bigger. The table below, borrowed from a Wikipedia article, compares SI units (the standard base 10 prefixes mega, giga, tera, peta) with binary units (base 2). What we can see from the table is that the ratio between the two shrinks, and so the gap grows, as the units get bigger. A 1TB disk that you buy today actually contains a little over 931GB of space as the computer counts it. If we could manufacture a 1PB disk, it would contain something like 909TB of space.
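The 931 and 909 figures are just a division of the marketed (decimal) byte count by the size of a binary unit. A minimal sketch in Python, where the function name `marketed_to_binary` is my own invention for illustration:

```python
def marketed_to_binary(decimal_bytes: int, binary_unit_bytes: int) -> float:
    """Express a marketed (decimal) byte count in a binary unit."""
    return decimal_bytes / binary_unit_bytes


# A marketed 1TB disk, expressed in binary gigabytes (2^30 bytes each).
one_tb_in_binary_gb = marketed_to_binary(10**12, 2**30)

# A hypothetical 1PB disk, expressed in binary terabytes (2^40 bytes each).
one_pb_in_binary_tb = marketed_to_binary(10**15, 2**40)

print(f"1TB marketed = {one_tb_in_binary_gb:.2f} binary GB")
print(f"1PB marketed = {one_pb_in_binary_tb:.2f} binary TB")
```

This prints 931.32 for the 1TB disk and 909.49 for the 1PB disk.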

Multiples of bytes

SI decimal prefix   Value   Binary usage   SI/binary ratio   IEC binary prefix   Value
kilobyte (kB)       10^3    2^10           0.9766            kibibyte (KiB)      2^10
megabyte (MB)       10^6    2^20           0.9537            mebibyte (MiB)      2^20
gigabyte (GB)       10^9    2^30           0.9313            gibibyte (GiB)      2^30
terabyte (TB)       10^12   2^40           0.9095            tebibyte (TiB)      2^40
petabyte (PB)       10^15   2^50           0.8882            pebibyte (PiB)      2^50
exabyte (EB)        10^18   2^60           0.8674            exbibyte (EiB)      2^60
zettabyte (ZB)      10^21   2^70           0.8470            zebibyte (ZiB)      2^70
yottabyte (YB)      10^24   2^80           0.8272            yobibyte (YiB)      2^80
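The ratio column in the table can be reproduced with a short loop, since each step up multiplies the decimal unit by 1000 and the binary unit by 1024. This is a sketch of my own, not something taken from the Wikipedia article; `PREFIXES` and `si_to_binary_ratio` are illustrative names.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def si_to_binary_ratio(step: int) -> float:
    """Ratio of the SI unit (1000^step) to the binary unit (1024^step)."""
    return 1000**step / 1024**step


for step, prefix in enumerate(PREFIXES, start=1):
    # step 1 is kilo (10^3 vs 2^10), step 4 is tera (10^12 vs 2^40), etc.
    print(f"{prefix}byte: {si_to_binary_ratio(step):.4f}")
```

Each additional prefix shaves roughly another 2.3% off the ratio, which is why the shortfall that was easy to ignore at megabyte scale is hard to ignore at terabyte scale.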

As you can see above, a new naming convention has been created to describe storage capacity in binary terms. So what the computer calls a terabyte is, strictly speaking, not a terabyte but a tebibyte, or TiB. The problem is that I feel like the only guy using the term; that may be why I feel like I'm having this conversation all the time.

To make matters worse, in enterprise storage systems there is additional overhead consumed by things like storage system metadata. In some cases this additional overhead can consume more than 10% of the purchased capacity in TiB. I doubt there's any changing the disk industry at this point, but you can challenge your enterprise storage system suppliers to tell you about the usable capacity of their system in tebibytes - and when they ask what that is (because they likely will), send them here. I've helped my customers decode the real capacity of my competitors' systems while those competitors struggled to accurately describe a terabyte.
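To get a feel for how the two effects stack up, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the function name `usable_tib`, the 24-drive configuration, and a flat 10% overhead. It ignores RAID, spares, and anything else a real sizing tool would account for.

```python
TIB = 2**40  # bytes in one tebibyte


def usable_tib(drive_count: int, drive_size_tb: float,
               overhead_fraction: float) -> float:
    """Rough usable capacity in TiB for drives marketed in decimal TB.

    overhead_fraction is a flat, hypothetical share of the capacity (in TiB)
    lost to things like system metadata; real systems will differ.
    """
    raw_bytes = drive_count * drive_size_tb * 10**12
    raw_tib = raw_bytes / TIB
    return raw_tib * (1 - overhead_fraction)


# Hypothetical example: 24 drives sold as 1TB each, 10% overhead.
print(f"Marketed: 24 TB, usable: {usable_tib(24, 1, 0.10):.2f} TiB")
```

With those made-up inputs, 24 marketed terabytes come out at a little under 20 TiB before any data protection is configured.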

Update: One of my coworkers added this colourful (yes, I'm Canadian and need to spell colour with a 'u') anecdote, which illustrates a number of the things I describe (overzealous marketing, base 10 vs base 2, and system metadata overhead) and shows that, as always, what's old will be new again.

"The practice goes back to at least the 1980s, when marketing folks tried as hard as they could to make the disks sound as big as possible. In addition to the 1000 vs 1024 malarky, we used to have "unformatted capacity" about half of the time. I remember losing bids to the competition in the 80s because we (Sun) listed our drive as 669MB while they listed what turned out to be the identical drive as 760MB. If you actually formatted the drive for use, you'd see 669 million bytes, or 638 megabytes. And then when you did newfs on top of it to put a UFS file system on it, you would run out of space at around 555 MB, because statically allocated inodes consumed ~25 MB and then we also walled off a 10% reserve.

People were really annoyed that the 760MB disk actually allowed them to store 555MB"
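The arithmetic in that anecdote can be roughly retraced in Python. This is my own reconstruction, not my coworker's: I'm assuming the ~25 MB of inodes comes off first and the 10% reserve is then taken from what remains, and the variable names are purely illustrative.

```python
MIB = 2**20  # bytes in one binary megabyte

formatted_bytes = 669_000_000            # formatted capacity from the anecdote
formatted_mib = formatted_bytes / MIB    # the same capacity in binary megabytes

inode_mib = 25            # approximate static inode allocation (from the quote)
reserve_fraction = 0.10   # the 10% reserve (from the quote)

# Assumed order: subtract inodes first, then wall off the reserve.
usable_mib = (formatted_mib - inode_mib) * (1 - reserve_fraction)

print(f"Formatted: {formatted_mib:.0f} MB (binary)")
print(f"Usable after inodes and reserve: about {usable_mib:.0f} MB")
```

The first figure matches the 638 megabytes in the quote. The second lands at roughly 550, in the same ballpark as the "around 555 MB" quoted; since the inode figure is approximate, I wouldn't expect an exact match.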

If you'd prefer to buy storage from a company that's transparent, look no further than Oracle. The size calculator that I maintain (the latest release as of this writing is here) for our 7000 series tells you *exactly* how much capacity you will be able to use when you power up the system including all overhead, and allows you to see how much capacity you will have later when you expand your system.



Simple way to show the numbers. Back to binary numbers...

Posted by Kasteler on August 27, 2010 at 05:54 AM PDT #

Similar post at storagebuddhist.wordpress.com/2010/05/23/how-many-fingers-am-i-holding-up/

Posted by Scott on August 27, 2010 at 09:43 AM PDT #


This is the weblog for Ryan Matthews, a sales consultant at Oracle specializing in the ZFS Storage Appliance. It is the home to information on sizing and much more.

