Tuesday Mar 17, 2009

Open Flash

Our new Open Flash Module is the world's first enterprise-quality, open-standard Flash design. Built to an industry-standard JEDEC form factor, the module is being made available to developers and the OpenSolaris Storage community to foster Flash innovation. The Open Flash Module delivers unprecedented I/O performance, saves on power, space, and cooling, and will enable new levels of server optimization and datacenter efficiencies.

Imagine what you can do with such a Flash Module!
  • Instead of building servers that rely on RAM alone to increase performance, you can now build a server that combines RAM and SSD. The big advantage is the non-volatile nature of the Flash Module: you can use it for caching write transactions or anything else that needs persistent storage.
  • Or use the module in traditional FC array controllers as a backup for the RAM cache. If the controller loses power, the RAM-based cache can be written to Flash in seconds, giving you a kind of hibernate mode in an array ;-)
  • You can also use the module in traditional array controllers as an extension of the RAM cache. Very similar to the ZFS Hybrid Storage Pool model, RAID controllers could implement a tiered cache model. In this case any application or file system (that is not yet as innovative as ZFS) could benefit from the combination of RAM and Flash!
  • In addition to RAM, notebooks could have a Flash DIMM slot. On this Flash DIMM the complete operating system, as well as the active data you are working with, is stored in a tiered storage approach. That allows the system to spin down the high-capacity hard drive and save a lot of power!
  • Compute nodes in clusters could dramatically expand their working storage with Flash DIMMs: eight Flash DIMMs of the initial specification would provide 192GB of quite fast compute memory.
In many situations the performance of a Flash DIMM is sufficient. Considering its higher density (24GB per DIMM) and lower power consumption compared to a RAM module, this product is really powerful and interesting!

Initial Specifications

  • 24 GB initial capacity
  • 64 MB DRAM buffer cache

Form factor
  • JEDEC MO-258A

Interface
  • 3 Gb/s SATA-II / SAS-I

Endurance
  • 7 x 24 x 3 years (100% write duty cycle)
  • Designed for enterprise-class applications

Want to know more?

See what Chief Architect Andy Bechtolsheim says about SSDs and Open Storage.


Tuesday Nov 11, 2008

ZFS and the Hybrid Storage Concept

In an older blog entry, "Open Storage - The (R)Evolution", I initially wrote about ZFS and Hybrid Storage Pools. Now I would like to dive a bit deeper into this great feature.

Hybrid Storage

ZFS is not just a filesystem. It is actually a hybrid of filesystem and volume manager, and these two functions are the main source of the flexibility of ZFS. Being hybrid means that ZFS manages storage differently than traditional solutions. Traditionally, you have a 1:1 mapping of filesystems to disk partitions, or alternatively a 1:1 mapping of filesystems to logical volumes, each of which is made out of one or more disks. In ZFS, all disks participate in one storage pool. Each filesystem can use all disk drives in a pool, and since the filesystem is not mapped to a volume, all space is shared! Space can be reserved, so that a single filesystem cannot fill up the whole pool, and space reservations can be changed at will. Growing or shrinking a filesystem isn't just painless, it is irrelevant!
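As a sketch, this pooled model takes only a handful of standard ZFS commands. The device names below are hypothetical, and the commands need a ZFS-capable system (such as OpenSolaris) and root privileges:

```shell
# Create one pool from two mirrored disks; all filesystems share its space.
zpool create tank mirror c1t0d0 c1t1d0

# Filesystems need no size at creation time; they draw from the shared pool.
zfs create tank/home
zfs create tank/db

# Reservations and quotas can be set, and changed, at will.
zfs set reservation=10G tank/db   # guarantee tank/db at least 10GB
zfs set quota=50G tank/home       # cap tank/home at 50GB
```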

The definition of hybrid storage within ZFS goes even further! A storage pool can consist of more than just logical volumes or partitions. You can split the pool into three different areas:
  1. ZIL - ZFS Intent Log
  2. Read / Write Cache Pool
  3. High Capacity Pool
By using different devices for each position above, you can tremendously increase the performance of your filesystem.

ZFS Intent Log (ZIL)

All file system related system calls are logged as transaction records by the ZIL. The transaction records contain sufficient information to replay them back in the event of a system crash.

ZIL performance is critical for synchronous writes. A common application that issues synchronous writes is a database, which means all of these writes run at the speed of the ZIL.

Synchronous writes can be quickly written to and acknowledged by the "slog" (in ZFS jargon) before the data is written to the main storage pool. The slog is used only for small transactions, while large transactions use the main storage pool directly; it's tough to beat the raw throughput of large numbers of disks. A flash-based log device is ideally suited for a ZFS slog. Using such a device with ZFS can reduce latencies of small transactions to the range of 100-200µs.
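On a running pool, a flash slog can be attached with a single command (the device names here are hypothetical and the pool name "tank" is just an example):

```shell
# Add a dedicated intent-log (slog) device to the pool "tank".
# Synchronous writes are then acknowledged at flash latency.
zpool add tank log c2t0d0
```

Mirroring the log device (`zpool add tank log mirror c2t0d0 c2t1d0`) additionally protects in-flight transactions against a device failure.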

Read Cache Pool

How much of the data on your traditional storage systems is active data? 5%? 10%? Wouldn't it be nice to have low-latency persistent storage that delivers that information in time, without additional I/O on your traditional (disk) storage? Is your RAM not large enough to hold all hot read data, or is 256GB of RAM simply too expensive?

That is exactly where the read cache pool plays its role.

ZFS, like most other filesystems, keeps its first-level read cache in RAM: the L1ARC (Adaptive Replacement Cache). The drawback is that RAM is volatile and very expensive. Volatile means that after each reboot you rely on your traditional storage for a while, until the cache has been rebuilt for optimal performance.

The ZFS team has now also implemented an L2ARC that can use almost any device to improve your read performance!

The level 2 ARC (L2ARC) is a cache layer in-between main memory and the disk. It uses dedicated storage devices to hold cached data, which are populated using large infrequent writes. The main role of this cache is to boost the performance of random read workloads. The intended L2ARC devices include short-stroked disks, solid state disks, and other media with substantially faster read latency than disk.
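Attaching such a device as an L2ARC is again a one-liner (device and pool names are hypothetical):

```shell
# Add an SSD as a read-cache (L2ARC) device to the pool "tank".
# Cache devices need no redundancy: if one fails, reads fall back to the pool.
zpool add tank cache c2t2d0
```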

Imagine a 10TB file system with a 1TB SSD L2ARC! Screaming fast!

High Capacity Pool

The high capacity pool simply takes care of mass storage. You can basically go with low-performing, high-capacity disks, as most of your IO/s are handled by the L2ARC and ZIL.
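Putting the three tiers together, a complete hybrid pool can be created in one command (device names hypothetical):

```shell
# Capacity tier: four disks in a raidz group.
# Write tier:    a flash slog for the ZIL.
# Read tier:     an SSD as L2ARC read cache.
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
    log c2t0d0 \
    cache c2t1d0
```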

Old fashioned Storage vs the new Fashion

The following pictures illustrate the historic view of filesystems and storage versus the actual view and implementation:
Old Model

New Model
ZFS Model


By combining the use of flash as an intent-log to reduce write latency with flash as a cache to reduce read latency, we create a system that performs far better and consumes less power than a traditional system at similar cost. It's now possible to construct systems with a precise mix of write-optimized flash, flash for caching, DRAM, and cheap disks designed specifically to achieve the right balance of cost and performance for any given workload with data automatically handled by the appropriate level of the hierarchy. Most generally, this new flash tier can be thought of as a radical form of hierarchical storage management (HSM) without the need for explicit management.

ZFS allows Flash to join DRAM and commodity disks to form a hybrid pool automatically used by ZFS to achieve the best price, performance and energy efficiency conceivable. Adding Flash will be like adding DRAM - once it's in, there's no new administration, just new capability.

And do you know what? All of those features are part of our recently announced Sun Storage 7000 Unified Storage Systems!

Monday Aug 25, 2008

Why you should avoid placing SSDs in traditional Arrays!

Some vendors are announcing SSDs for their traditional arrays in the midrange and high end sector.

This is quite surprising to me, as it is comparable to placing an 8-cylinder bi-turbo engine with 450HP into an entry-level car (I'll try to avoid naming any brands ;-)).

You might ask for an explanation. Here it is:

Traditional midrange arrays were developed to handle hundreds of traditional (15k RPM) hard disk drives. A traditional hard disk can run about 250 IO/s. If we compare this with the current enterprise-class solid state disks on the market, a single solid state disk can do about 50k IO/s read or 12k IO/s write. In read performance that is roughly 200x faster than a 15k RPM hard disk.

The controller of a midrange array system can probably do about 500k IO/s against its internal cache. So placing about ten solid state disks into such a storage system would simply consume the complete capacity of the controller. And I haven't even started talking about RAID functionality!

There is another major reason that makes such solutions ridiculous: the latency you add by using FC networks. While a traditional hard disk works at a latency of about 3,125µs (3.1ms), an enterprise-class solid state disk works at a latency of less than 100µs (0.1ms). With a traditional disk drive, the overhead FC adds through switches, cable length and array controllers might cost you 1 IO/s; with an SSD at less than 100µs latency, the same overhead can end up costing 10,000 IO/s of read performance.
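These back-of-envelope numbers follow from simple arithmetic: one second is 1,000,000µs, so the achievable IO/s at a given per-IO latency is 1,000,000 divided by that latency.

```shell
hdd_latency_us=3125   # ~3.1ms for a 15k RPM disk
ssd_latency_us=100    # ~0.1ms for an enterprise SSD

echo "HDD: $((1000000 / hdd_latency_us)) IO/s"   # 320 IO/s, same ballpark as the ~250 quoted above
echo "SSD: $((1000000 / ssd_latency_us)) IO/s"   # 10000 IO/s

# Ten 50k-IOPS SSDs already saturate a 500k-IOPS array controller:
echo "10 SSDs: $((10 * 50000)) IO/s"
```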

So, where do I place the SSD technology?

The answer is simple!


The best protocol and technology today is SAS (Serial Attached SCSI). The only limitation of SAS is cable length, which is limited to about 8m, but there is no additional protocol overhead as with FC!

There are two ways to implement SAS-attached SSDs:
  1. Directly in a server, as most servers use SAS-attached internal hard disks anyway.
  2. Attached via a SAS JBOD (Just a Bunch Of Disks) if you need more disks than a server can hold.
You might also ask how to implement SSD technology in the most cost-effective way?

That's where most vendors have to stop, as they have no solution or good answer.

Sun's ZFS is exactly the product that is capable of combining all the benefits of SSDs with the benefits of traditional storage (density). Combining the two technologies within one file system provides performance AND density under one umbrella. The magic word is Hybrid Storage Pool.


While the slow part of ZFS (density) remains on traditional Fibre Channel storage arrays, the performance-critical parts, the ZIL (ZFS Intent Log) and the L2ARC (Level 2 Adaptive Replacement Cache), reside on SSD technology.

Monday Jul 28, 2008

NAND Flash based SSDs


What is Flash?

Flash memory is non-volatile computer memory that can be electrically erased and reprogrammed. Flash technology is primarily used in memory cards and USB flash drives for general storage and transfer of data between computers and other digital products. It is a specific type of EEPROM (Electrically Erasable Programmable Read-Only Memory) that is erased and (re)programmed in large blocks. In early flash products, the entire chip had to be erased at once. Flash memory costs far less than byte-programmable EEPROM and has therefore become the dominant technology wherever a significant amount of non-volatile storage is needed.

Flash memory needs no power to maintain the information stored on the chip. In addition, flash memory offers fast read access times and better shock resistance than hard disks. Flash is able to withstand intense pressure, extremes of temperature and even immersion in water.

What is NAND?

The NAND flash architecture was introduced in 1989. These memories are accessed much like block devices such as hard disks or memory cards. Each block consists of a number of pages. The pages are typically 512, 2048 or 4096 bytes in size. Associated with each page are a few bytes (typically 12-16) used for storage of an error detection and correction checksum.

While programming is performed on a page basis, erasure can only be performed on a block basis. Another limitation of NAND flash is that data in a block can only be written sequentially. The number of operations (NOP) is the number of times a sector can be programmed between erasures. So far this number is always one for MLC (Multi Level Cell) flash, whereas for SLC (Single Level Cell) flash it is four.

NAND devices also require bad block management by a separate controller chip. SD cards, for example, include controller circuitry to perform bad block management and wear leveling.


MLC (Multi Level Cell) NAND flash allows each memory cell to store 2 bits of information, compared to 1 bit per cell for SLC NAND flash, resulting in larger capacity and lower cost per bit. As a rule of thumb, MLC devices are available at twice the density of SLC devices of the same flash technology. Mature and proven, MLC technology is generally used in cost-sensitive consumer products such as cell phones and memory cards.

A significant portion of the NAND flash-based memory cards on the market today are made from MLC NAND, and the continuing rapid growth of this market can be considered an indication that the performance is meeting consumers' needs. Although the use of MLC technology offers the highest density (and the lowest cost), the tradeoff compared to single-bit-per cell is lower performance in the form of slower write (and potentially erase) speeds, as well as reduced write/erase cycling endurance.

Also, because of the storage of 2 bits per cell, the probability of bit error is higher than for SLC technology. However, this is partially compensated for by using error detection and correction codes (EDC). System designers have long been aware of the benefits of using EDC to detect and correct errors in systems using Hamming codes (common in memory subsystems) and Reed Solomon codes (common in hard drives and CD-ROMs).

SLC NAND is generally specified at 100,000 write/erase cycles per block with 1-bit ECC. MLC is generally specified at 10,000 cycles with ECC. While the datasheet for the MLC device does not specify the level of ECC required, the MLC manufacturers recommend 4-bit ECC when using this technology. Therefore, when using the same controller, a storage device using SLC will have an endurance value roughly 10 times that of a similar MLC-based product.

To summarize the advantages and disadvantages of SLC Flash and MLC Flash:
  • SLC: 1 bit per cell, ~100,000 write/erase cycles per block (with 1-bit ECC), faster writes, higher cost per bit
  • MLC: 2 bits per cell, ~10,000 write/erase cycles (4-bit ECC recommended), twice the density, lower cost per bit, slower writes
When we talk about enterprise-class Flash storage, we clearly talk about SLC-based NAND solid state disks!

Why Solid State Disks?

One of the biggest advantages of flash-based SSDs is their latency. The performance of flash is a bit unusual, as it is highly asymmetric. A block of flash must be erased before it can be written, which takes on the order of 1-2ms per block. Writing to an erased block requires around 200-300µs. Most flash-based disks try to maintain a pool of previously erased blocks, so that the latency of a write is just that of the program operation. Read operations are much faster: 25-30µs for 4k. Flash-based SSDs also use internal DRAM to ensure good write performance. The RAM is protected with a capacitor to avoid data loss on power outages, and a capacitor doesn't require any maintenance.

Conventional storage solutions mix dynamic memory (DRAM) and hard drives; flash is interesting because it falls in a sweet spot between those two components for both cost and performance in that flash is significantly cheaper and denser than DRAM and also significantly faster than disk. Flash accordingly can augment the system to form a new tier in the storage hierarchy – perhaps the most significant new tier since the introduction of the disk drive with RAMAC in 1956.

ZFS by Sun Microsystems has been optimized to manage Flash SSD systems, both as cache as well as main storage facilities, available for OpenSolaris, Mac OS X, and the Linux operating system.

Business Cases for Flash Based SSDs

Below you can find a few business cases where flash clearly is useful and probably the technology of choice for the future:
  • As a LOG device for databases, like Oracle redo logs
  • As an extended, huge and persistent cache for ZFS
  • As a ZIL (ZFS Intent Log) device for ZFS, which is similar to redo logs in Oracle
  • As a metadata device for SAM-QFS. QFS can separate metadata from real data, which improves common file system operations like ls and find by factors. It also improves the "Directory Name Lookup Cache" by factors on huge file systems (above 10 million files), where traditional RAM won't be sufficient.
  • As a storage device for immense data transactions or databases with a huge number of transactions per second; mostly WRITE, as for reads you can also use the RAM of the server
