The Year of Open Storage

If you click on the "Resources" tab on the National Science Foundation's TeraGrid User Portal, you will quickly notice that TACC's Ranger supercomputer has 2x the CPU power, 2x the memory, and 2x the storage of the previous 17 NSF TeraGrid funded systems combined. This rapid advance in supercomputer power has been enabled, in part, by open computing. The Ranger system uses the same AMD Opteron quad-core CPUs that you will very shortly be able to buy from every major vendor, the same type of memory that goes into over 7 million x64 servers a year, and disk drives that are sold by the millions, all used in conjunction with open source software like Sun Grid Engine, which allows a single compute job to use 16 or 16,000 or more of Ranger's approximately 60,000 processor cores. While the more powerful blade servers in Ranger's Sun Blade 6048 chassis cost slightly more, you can buy similar technology from Sun, such as our Sun Fire x2100, starting from around $1000. Even computer companies much larger than Sun that make supercomputers based on non-industry-standard processors and closed-source software can't possibly compete on price, because they will never have the volumes that AMD or Intel have with their x64 processors. That is why 406 of the top 500 supercomputers in the world are based on the same cluster architecture as Ranger, and a similar number of the top 500 are based on Intel or AMD x64 CPU architectures.

So what does that have to do with Open Storage? Storage products, including many of Sun's own StorageTek products as well as those from virtually all major storage vendors, have been based on commodity disk drives like the ones used in Ranger, but built into proprietary storage arrays that ran non-open-source software and used other proprietary components. Sun's Thumper x4500 storage server (the one that delivers over 35 GB/second to Ranger using the Sun Lustre file system and 72 low-cost arrays with 1.7 PB of capacity) was the first in a new breed of open storage devices, combining the same commodity CPU, memory, and disk drives used in clusters and, in fact, in your everyday server like the x2100. Just how low cost is the x4500? Well, if you are an eligible academic institution, you can buy a 48 TB x4500 for under $25,000 in the US via the Sun Education Essentials Matching Grant Program. Sorry, commercial customers pay a bit more, but we all remember what it was like to be a starving student. So what's next with Open Storage?

The next evolution in Open Storage will be driven by software. Rather than running limited-functionality proprietary software, the x4500 runs the open source Solaris operating system. We've created an entire Open Storage Community around OpenSolaris, so anyone can write new storage features and quickly and easily deliver them to a wide user base. Or vice versa: you could develop a new storage device and use our Open Storage software to drive it, without writing a single line of your own code. This week, we made another major contribution to the Open Storage community by delivering on our pledge to open source Sun's SamFS and QFS software. This is the software that lets TACC back up Ranger's 1.7 PB of storage to several multi-PB Sun StorageTek SL8500 modular tape library systems. If you are not familiar with SamFS or QFS, read Margaret's Blog to understand the importance of this latest open source project.

So with an ever-growing collection of Open Storage software available for Sun StorageTek engineers to work with (and, since it is open source, engineers from other storage companies too), I predict we will see a dramatic change in the range of storage products entering the market this year. It took nearly a decade from the time the first open source cluster system made its way onto the Top500 list to reach the 80%+ share that clusters have today. How long will it take for 80% of the world's storage to reside on new classes of open storage devices? My guess: much less than 10 years. That is because storage capacity has been growing even faster than compute capacity. And just as there is no way that the NSF could have afforded to pay for Ranger 2x or even 1x what it paid for the last 17 TeraGrid systems combined, there is no way that TACC or Facebook or Bank of America can afford to keep all the data they will generate in the coming years on traditional, proprietary, high-cost storage devices.

Congrats to the SamFS and QFS team on their Open Source release, and welcome to the new world of Open Storage.

