Thursday Nov 13, 2008

New Class Of Storage Systems - Sun Storage 7000 Unified Storage Systems

I have been blogging for a while about Open Storage, ZFS Hybrid Storage Pools and Solid State Disks. Now the products that combine all of those technologies are available and will disrupt the entire storage market!

You may ask why? Here are a few reasons:
  1. There is simply no price-competitive system in the high-end market
  2. Our system has no license fees - everything is included, even future features
  3. There is no other system that has in-depth built-in analytics
  4. There is no other system that combines new SSD technology with traditional storage
  5. No other system has a rock-solid OS like Solaris with all its features, such as DTrace and the Fault Management Architecture (FMA)
I could mention dozens more reasons, but these should already be enough to seriously consider those systems for your business!

So, enough marketing. I would like to go a bit deeper and give you a short introduction to the products and their extraordinary features!

Sun Storage 7000 Unified Storage Systems

We have announced three different Unified Storage Systems to begin with. The two smaller versions are single-node systems, while the 7410 can be used in an active/active cluster (2 nodes). All systems are fully licensed and run the same OS. There are no feature restrictions on the smaller systems, except those imposed by the hardware configuration.

Sun Storage 7110 Unified Storage System

The 7110 is the entry-level Unified Storage System. It has the following hardware specifications:
  • 14x Usable Disk Drives
  • Quad-Core Opteron
  • 8GB RAM
  • 4x 1 Gigabit Ethernet Ports
  • 6x PCI-E Slots per Node
  • 1Gb-E and 10 Gb-E Network Interface Cards
  • FC/SCSI HBA Options for Backup/Restore
Today the system is equipped with 14x 146GB 10k RPM Disks. In the future you will be able to have it also equipped with 14x 500GB SATA Disks.

The 7110 is the only system that cannot be equipped with SSDs for now. The system is perfectly suited as workgroup storage and uses just 2U of rack space.

Sun Storage 7210 Unified Storage System

The 7210 is the ultimate dense Unified Storage System. It has not only a lot of disks but also plenty of cache and CPU power. Here are some hardware specifications:
  • 44-46x Usable Disk Drives
  • 0-2x LogZilla 18GB SSDs
  • Dual Quad-Core Opteron
  • 32GB/64GB RAM
  • 4x 1 Gigabit Ethernet Ports
  • 3x PCI-E Slots per Node
  • 1Gb-E and 10 Gb-E Network Interface Cards
  • FC/SCSI HBA Options for Backup/Restore
The 7210 is the ultimate dense storage pod! Combining 44TB of storage with LogZilla write acceleration, the system can provide up to 780MB/sec throughput! All of this in only 4U of rack space!

Sun Storage 7410 Unified Storage System

The 7410 is our highly available and performant Unified Storage System. The 7410 supports two configurations: a single node and a 2-node cluster for high availability. Each configuration comes in three levels (Entry, Mid and High), where the main differences are in compute power. Here is an overview of the hardware specifications:
  • Up to 576x Usable Disk Drives
  • Up to four Quad-Core Opteron per Node
  • Up to 4x LogZilla 18GB SSDs per Node
  • Up to 6x ReadZilla 100GB SSDs per Node
  • Up to 128GB RAM per Node
  • 4x 1 Gigabit Ethernet Ports per Node
  • 6x PCI-E Slots per Node
  • 1Gb-E and 10 Gb-E Network Interface Cards
  • FC/SCSI HBA Options for Backup/Restore
Have you ever seen a storage system that had 128GB Cache per Controller? We go even further by adding 600GB L2ARC Cache! So in fact, if you go for the big cluster, you will have 256GB L1ARC Cache and 600GB L2ARC Cache. Again, this is where we start today; imagine how much cache we will have in the future.

The 7410 is based on compute nodes (Heads) and storage nodes (JBODs). In terms of compute power, you have 16 cores per head to do all storage and file system related work. In a clustered configuration you will have 32 cores that can work in parallel (active/active)! Together, the heads therefore have a theoretical capability of more than 1.6 million IO/s!

A storage node is a 4U rack-mountable chassis that can hold up to 24 disks. You can attach up to 24x storage nodes to this system, which will give you a total of 576 disk drives.
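
The maximum figures quoted for the 7410 follow from simple multiplication. Here is a minimal sketch that uses only the numbers given in this section; the variable names are my own and purely illustrative:

```python
# Maximum 7410 configuration, using only the figures quoted in this post.
DISKS_PER_JBOD = 24      # one 4U storage node holds up to 24 disks
MAX_JBODS = 24           # up to 24 storage nodes per system
SOCKETS_PER_HEAD = 4     # up to four Opteron CPUs per node
CORES_PER_SOCKET = 4     # quad-core
RAM_GB_PER_HEAD = 128    # up to 128GB RAM per node
HEADS_IN_CLUSTER = 2     # active/active cluster

max_disks = DISKS_PER_JBOD * MAX_JBODS
cores_per_head = SOCKETS_PER_HEAD * CORES_PER_SOCKET
cluster_cores = cores_per_head * HEADS_IN_CLUSTER
cluster_ram_gb = RAM_GB_PER_HEAD * HEADS_IN_CLUSTER

print(max_disks, cores_per_head, cluster_cores, cluster_ram_gb)
# prints: 576 16 32 256
```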

The Sun Storage 7410 implements a true ZFS Hybrid Storage Pool with support for Flash-memory devices for acceleration of reads (100GB Read Flash Accelerator, aka ReadZilla) and writes (18GB Write Flash Accelerator, aka LogZilla). Multiple configurations are provided on both the node configuration and the expansion array to accommodate the most demanding customer application performance requirements. You can find more details about the SSD integration below in the feature section.

Extraordinary Features

Now that we have seen what hardware features the three Unified Storage Systems have, I would like to go a bit deeper into the software features. These are in fact the features that make these products so unique and interesting!

SSD Integration / Hybrid Storage Pools

The Sun Storage 7000 system uses a Flash Hybrid Storage Pool design, which is composed of optional Flash-memory devices for acceleration of reads and writes, low-power and high-capacity enterprise-class SATA disks, and DRAM memory. All these components are managed transparently as a single data hierarchy, with automated data placement by the file system. In the Storage 7410 model, both Write Flash Accelerator (write-optimized SSD, aka LogZilla) and Read Flash Accelerator (read-optimized SSD, aka ReadZilla) are used to deliver superior performance and capacity at lower cost and energy consumption than competitive solutions. The Storage 7210 currently implements only write-optimized SSD, and the Storage 7110 does not currently implement this design.

ZFS provides two dimensions for adding flash memory to the file system stack and improving overall system performance: the L2ARC (Level 2 ARC) for random reads, and the ZIL (ZFS Intent Log) for writes. The L2ARC (the ARC is the ZFS main memory cache in DRAM) sits between the memory cache and the disk drives and extends the main memory cache to improve read performance. The ZFS Intent Log uses write-optimized flash SSDs as log devices to improve write performance.

The main reason why we have chosen different SSDs (ReadZilla, LogZilla) lies in the fact that flash-based SSDs are still quite expensive and have some limitations in how they write data. The LogZilla SSDs have a more complex controller chip that can handle thousands of write IO/s, a bigger DRAM cache and a capacitor that ensures that no IO gets lost between DRAM and the flash chips in case of a power outage. LogZilla SSDs are therefore optimized for writes, while the ReadZilla SSDs are optimized for read operations.

Realtime Analytics

Realtime Analytics is one of the coolest features in this product and was only possible because Solaris has DTrace built in. The Sun Storage 7000 Systems are equipped with DTrace Analytics, an advanced DTrace-based facility for server analytics. DTrace Analytics provides real-time analysis of the Storage 7000 System and of the enterprise network, from the storage system to the clients accessing the data. It is an advanced facility to graph a variety of statistics in real time and record this data for later viewing. It has been designed for both long-term monitoring and short-term analysis. When needed, it makes use of DTrace to dynamically create custom statistics, which allows different layers of the operating system stack to be analyzed in detail.

Analytics has been designed around an effective performance analysis technique called drill-down analysis. This involves checking high-level statistics first, then focusing on finer details based on the findings so far. This quickly narrows the focus to the most likely areas.

So how does this work?

You may discover a throughput problem on your network. By selecting the interface that causes you some headache, you can drill down by protocol and even deeper to the NFS client that causes the high load. We don't stop there: you can drill down further to figure out what kind of files the NFS client is accessing, at what latency, and so on. DTrace Analytics creates datasets as you are drilling down. These datasets can be stored and reused at a later time. The analytic data is not discarded - if an appliance has been running for two years, you can zoom down to by-second views for any time in the previous two years for your archived datasets. The data is stored on a compressed file system and can be easily monitored. You can destroy datasets on demand or export them as CSV.
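
To make the drill-down idea concrete, here is a minimal sketch in Python. It does not use DTrace or the appliance's interface at all; the sample records, field names, interface names and addresses are all hypothetical. It only shows the technique: break traffic down by one dimension, pick the biggest contributor, filter on it, and repeat one level deeper.

```python
from collections import defaultdict

# Hypothetical I/O samples; a real appliance would gather these via DTrace.
samples = [
    {"iface": "nge0", "proto": "NFS", "client": "10.0.0.5", "file": "/export/a.db", "bytes": 8000},
    {"iface": "nge0", "proto": "NFS", "client": "10.0.0.5", "file": "/export/b.log", "bytes": 1000},
    {"iface": "nge0", "proto": "NFS", "client": "10.0.0.6", "file": "/export/a.db", "bytes": 2000},
    {"iface": "nge0", "proto": "CIFS", "client": "10.0.0.7", "file": "/export/c.doc", "bytes": 500},
    {"iface": "nge1", "proto": "NFS", "client": "10.0.0.8", "file": "/export/d.txt", "bytes": 300},
]

def breakdown(records, key):
    """Sum bytes per distinct value of `key`, largest contributor first."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["bytes"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def drill_down(records, **filters):
    """Keep only the records that match every filter chosen so far."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]

# Walk down the hierarchy, always following the heaviest contributor.
selection = {}
for dimension in ("iface", "proto", "client", "file"):
    ranking = breakdown(drill_down(samples, **selection), dimension)
    top_value, top_bytes = ranking[0]
    print(dimension, top_value, top_bytes)
    selection[dimension] = top_value
```

With the sample data above, the walk ends at the file `/export/a.db` accessed by client `10.0.0.5` over NFS on `nge0`.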

Other Features

The Unified Storage Systems have a lot of other features which I will cover in short.

Data Compression

Data compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. The Sun Storage 7000 System software supports 4 levels of data compression, LZJB and 3 levels of GZIP. Shares can optionally compress data before writing to the storage pool. This allows for much greater storage utilization at the expense of increased CPU utilization. In the Sun Storage 7000 family, by default, no compression is done. If the compression does not yield a minimum space savings, it is not committed to disk to avoid unnecessary decompression when reading back the data.
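
The "minimum space savings" rule can be sketched as follows. This is an illustration only, not the appliance's code: Python's `zlib` module (DEFLATE, the algorithm GZIP is built on) stands in for the GZIP levels, and the 12.5% threshold is an assumption, since the text above does not state the actual value.

```python
import os
import zlib

MIN_SAVINGS = 0.125  # assumed threshold; the post does not give the real value

def store_block(data, level=6):
    """Return ("deflate", compressed) if compression saves enough space,
    otherwise ("raw", data) so reads never pay for pointless decompression."""
    compressed = zlib.compress(data, level)
    if len(compressed) <= len(data) * (1 - MIN_SAVINGS):
        return "deflate", compressed
    return "raw", data

text_block = b"storage " * 512      # highly repetitive, compresses well
random_block = os.urandom(4096)     # random bytes, effectively incompressible

print(store_block(text_block)[0])    # prints: deflate
print(store_block(random_block)[0])  # prints: raw
```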


Snapshots

A snapshot is a read-only copy of a file system or volume. Snapshots can be created almost instantly, and initially consume no additional disk space within the pool. When data within the active dataset changes, the snapshot consumes disk space by continuing to reference the old data and so prevents the space from being freed. Snapshots are the basis for replication and just-in-time backup.
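
Why a snapshot is free at first and only later starts to cost space can be shown with a toy copy-on-write model. This is a deliberately simplified sketch, not how ZFS is implemented; the `Dataset` class and its methods are invented for illustration.

```python
class Dataset:
    """Toy copy-on-write dataset: data blocks live in a shared store, and the
    live file system and its snapshots are just maps pointing at block ids."""

    def __init__(self):
        self.store = {}       # block id -> data (the pool)
        self.live = {}        # file name -> block id
        self.snapshots = {}   # snapshot name -> frozen {file name: block id}
        self._next_id = 0

    def write(self, name, data):
        # Copy-on-write: never overwrite in place, always allocate a new block.
        old = self.live.get(name)
        self.store[self._next_id] = data
        self.live[name] = self._next_id
        self._next_id += 1
        self._free_if_unreferenced(old)

    def snapshot(self, snap_name):
        # Only the block map is copied; no data block is duplicated.
        self.snapshots[snap_name] = dict(self.live)

    def _free_if_unreferenced(self, block_id):
        if block_id is None:
            return
        held = any(block_id in snap.values() for snap in self.snapshots.values())
        if not held and block_id not in self.live.values():
            del self.store[block_id]

    def used_blocks(self):
        return len(self.store)


ds = Dataset()
ds.write("report", b"version 1")
ds.snapshot("monday")
print(ds.used_blocks())   # prints: 1  (the snapshot itself cost nothing)
ds.write("report", b"version 2")
print(ds.used_blocks())   # prints: 2  (the old block is kept for the snapshot)
```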

Remote Replication

The Sun Storage 7000 Remote Replication can be used to create a copy of a filesystem, group of filesystems or LUNs from any Storage 7000 System to another 7000 system at a remote location through an interconnecting TCP/IP network that is responsible for propagating the data between them. Replication transfers the data and metadata in a project and its component shares either at discrete, point in time snapshots or continuously. Discrete replication can be initiated manually or occur on a schedule of your own creation. With continuous replication, data is streamed asynchronously to the remote appliance as it's modified locally at the granularity of storage transactions to ensure data consistency. In both cases, data transmitted between appliances is encrypted using SSL.

iSCSI Block Level Access

The Sun Storage 7000 family of products acts as an iSCSI target for several iSCSI hardware and software initiators. When you configure a LUN on the appliance you can specify that it is an Internet Small Computer System Interface (iSCSI) target. The service supports discovery, management, and configuration using the iSNS protocol. The iSCSI service supports both unidirectional (target authenticates initiator) and bidirectional (target and initiator authenticate each other) authentication using CHAP. Additionally, the service supports CHAP authentication data management in a RADIUS database. You can even do thin provisioning with iSCSI LUNs, which means they grow on demand.
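
For readers unfamiliar with CHAP, here is a minimal sketch of the challenge/response calculation as defined in RFC 1994 (MD5 over the identifier octet, the shared secret and the challenge). It illustrates the protocol idea only and is not the appliance's implementation; the secret and function names are hypothetical.

```python
import hashlib
import hmac
import os

def chap_response(identifier, secret, challenge):
    """RFC 1994 CHAP with MD5: hash of identifier octet + secret + challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

def verify(identifier, secret, challenge, response):
    """The authenticator recomputes the hash and compares in constant time."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)

# Unidirectional: the target challenges the initiator.
initiator_secret = b"hypothetical-initiator-secret"
challenge = os.urandom(16)
response = chap_response(1, initiator_secret, challenge)

print(verify(1, initiator_secret, challenge, response))   # prints: True
print(verify(1, b"wrong-secret", challenge, response))    # prints: False
```

In the bidirectional case the same exchange is simply run a second time in the opposite direction, with the initiator challenging the target using a separate secret.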

Virus Scan

This feature allows the Storage 7000 family to be configured as a client of an antivirus scan engine. The Virus Scan service will scan for viruses at the filesystem level. When a file is accessed from any protocol, the Virus Scan service will first scan the file, and both deny access and quarantine the file if a virus is found. Once a file has been scanned with the latest virus definitions, it is not rescanned until it is next modified.
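
The "scan once, rescan only after modification or new definitions" policy can be sketched like this. The class and function names are invented for illustration, and the scan engine is represented by a plain callback; how the appliance actually records scan state is not described here.

```python
class ScanCache:
    """Remembers which file versions were already scanned clean,
    and with which virus definitions."""

    def __init__(self, definitions_version):
        self.definitions_version = definitions_version
        self.stamps = {}   # path -> (mtime, definitions_version) of last clean scan

    def needs_scan(self, path, mtime):
        return self.stamps.get(path) != (mtime, self.definitions_version)

    def record_clean(self, path, mtime):
        self.stamps[path] = (mtime, self.definitions_version)


def open_file(cache, path, mtime, is_infected):
    """Return True if access is allowed. `is_infected` stands in for the
    external scan engine and is only called when a scan is really needed."""
    if cache.needs_scan(path, mtime):
        if is_infected(path):
            return False   # deny access (the appliance also quarantines the file)
        cache.record_clean(path, mtime)
    return True
```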

NDMP Backup and Restore

Backup and restore is one of the primary goals of enterprise storage management. Backups and restores should be performed in a timely, secure, and cost-effective manner across enterprise-wide operating systems. Companies need high-performance backup and the ability to back up data to local media devices. While the data itself may be distributed throughout the enterprise, its cataloging and control must be centralized. The emergence of network-attached storage and dedicated file servers makes storage management more challenging. Network Data Management Protocol (NDMP) recognizes that these issues must be addressed. NDMP is an opportunity to provide truly enterprise-wide heterogeneous storage management solutions - permitting platforms to be driven at a departmental level and backup at the enterprise level.

The Sun Storage 7000 Systems support NDMP v3 and v4.

Phone-Home of Telemetry for all Software and Hardware Issues

Phone-home provides automated case opening when failures are detected in the system. This ensures faster time to resolution and reduces the time needed to figure out what the problem might be.

End-to-End Data Integrity and self-healing mechanisms

The Sun Storage 7000 systems include FMA (Fault Management Architecture), which provides the capability to detect faulty hardware components and take them offline in order to prevent system disruption. In addition, to avoid accidental data corruption, the ZFS file system provides memory-based end-to-end data and metadata checksumming with self-healing capabilities to fix potential issues. FMA, combined with the ZFS data integrity facilities, makes the Sun Storage 7000 the most comprehensive self-healing unified storage system.


What makes this system so screaming cool? It is simply the combination of all features: starting at the hardware with the SAS protocol and the incredibly large caches, through the integration of SSD technology, to the soft features like real-time analysis, end-to-end data integrity and FMA (Fault Management Architecture), and finally its foundation on open source technology (OpenSolaris, ZFS, and many other open source projects) that assures future innovation. Features like encryption, de-duplication and FC target mode are on their way. And you know what, you will get them all at no additional license cost! That is what I call investment protection.

If you don't consider these Unified Storage Systems for your next IT investment, you are simply ignoring facts and may spend far too much money on a product with limited features.

Tuesday Nov 11, 2008

ZFS and the Hybrid Storage Concept

I first wrote about ZFS and Hybrid Storage Pools in an older blog entry, "Open Storage - The (R)Evolution". Now I would like to dive a bit deeper into this great feature.

Hybrid Storage

ZFS is not just a filesystem. It is actually a hybrid filesystem and volume manager. These two functions are the main source of the flexibility of ZFS. Being hybrid means that ZFS manages storage differently than traditional solutions. Traditionally, you have a 1:1 mapping of filesystems to disk partitions, or alternatively a 1:1 mapping of filesystems to logical volumes, each of which is made out of one or more disks. In ZFS, all disks participate in one storage pool. Each filesystem can use all disk drives in a pool, and since the filesystem is not mapped to a volume, all space is shared! Space can be reserved, so that a single filesystem cannot fill up the whole pool, and space reservations can be changed at will. Growing or shrinking a filesystem isn't just painless, it is irrelevant!

The definition of hybrid storage within ZFS goes even further! A storage pool can have more than just logical volumes or partitions. You can split the pool into three different areas:
  1. ZIL - ZFS Intent Log
  2. Read / Write Cache Pool
  3. High Capacity Pool
By using different devices for each position above, you can tremendously increase the performance of your filesystem.

ZFS Intent Log (ZIL)

All file system related system calls are logged as transaction records by the ZIL. The transaction records contain sufficient information to replay them back in the event of a system crash.

The ZIL performance is critical for performance of synchronous writes. A common application that issues synchronous writes is a database. This means that all of these writes run at the speed of the ZIL.

Synchronous writes can be quickly written to the "slog" (ZFS jargon for a separate log device) and acknowledged to the client before the data is written to the storage pool. The slog is used only for small transactions, while large transactions use the main storage pool; it's tough to beat the raw throughput of large numbers of disks. A flash-based log device would be ideally suited for a ZFS slog. Using such a device with ZFS can reduce latencies of small transactions to a range of 100-200µs.
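
The small-versus-large split can be pictured as a simple routing decision. The sketch below is only an illustration: the function name is made up, and the 32KB cut-off is an assumption, since the text only distinguishes "small" from "large" transactions.

```python
SMALL_WRITE_LIMIT = 32 * 1024   # assumed cut-off in bytes; the post gives no number

def sync_write_target(size_bytes, has_slog=True):
    """Pick where a synchronous write is logged before it is acknowledged."""
    if has_slog and size_bytes <= SMALL_WRITE_LIMIT:
        return "slog"   # low-latency flash log device
    return "pool"       # large writes use the throughput of the many pool disks

for size in (4 * 1024, 1024 * 1024):
    print(size, sync_write_target(size))
# prints: 4096 slog
# prints: 1048576 pool
```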

Read Cache Pool

How much of the data on your traditional storage systems is active data? 5%? 10%? Wouldn't it be nice to have low-latency solid-state storage that delivers the information in time and without additional IO on your traditional storage (disks)? Is your RAM not sufficient to store all hot read data, or is it too expensive to have 256GB RAM?

That is exactly where the read cache pool plays its role.

ZFS uses an L1ARC (Adaptive Replacement Cache) that resides in RAM, much as most other filesystems keep a cache in main memory. The drawback is that RAM is volatile and very expensive. Volatile means that after each reboot you rely on your traditional storage for a certain time, until the cache has been rebuilt for optimal performance.

The people from the ZFS team have now also implemented an L2ARC that can use any device to improve your read performance!

The level 2 ARC (L2ARC) is a cache layer in-between main memory and the disk. It uses dedicated storage devices to hold cached data, which are populated using large infrequent writes. The main role of this cache is to boost the performance of random read workloads. The intended L2ARC devices include short-stroked disks, solid state disks, and other media with substantially faster read latency than disk.
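
Here is a minimal sketch of such a layered read path. It is a simplification under stated assumptions: plain LRU stands in for the real ARC algorithm, blocks are copied to the flash tier at the moment they fall out of DRAM (the real L2ARC is filled differently, with large infrequent writes as noted above), and all class and attribute names are invented for illustration.

```python
from collections import OrderedDict

class TwoTierCache:
    """Small DRAM tier backed by a larger flash tier, in front of slow disks."""

    def __init__(self, ram_slots, flash_slots, disk):
        self.ram = OrderedDict()     # level 1: DRAM
        self.flash = OrderedDict()   # level 2: flash
        self.ram_slots = ram_slots
        self.flash_slots = flash_slots
        self.disk = disk             # dict: block id -> data
        self.hits = {"ram": 0, "flash": 0, "disk": 0}

    def read(self, block):
        if block in self.ram:
            self.ram.move_to_end(block)   # mark as most recently used
            self.hits["ram"] += 1
            return self.ram[block]
        if block in self.flash:
            self.hits["flash"] += 1       # no disk IO needed
            data = self.flash[block]
        else:
            self.hits["disk"] += 1
            data = self.disk[block]
        self._insert_ram(block, data)
        return data

    def _insert_ram(self, block, data):
        self.ram[block] = data
        if len(self.ram) > self.ram_slots:
            victim, victim_data = self.ram.popitem(last=False)
            # Evicted from DRAM: keep a copy on flash instead of dropping it.
            self.flash[victim] = victim_data
            self.flash.move_to_end(victim)
            if len(self.flash) > self.flash_slots:
                self.flash.popitem(last=False)


disk = {i: "block%d" % i for i in range(10)}
cache = TwoTierCache(ram_slots=2, flash_slots=4, disk=disk)
for block in (0, 1, 2, 3, 0, 1, 1):
    cache.read(block)
print(cache.hits)   # prints: {'ram': 1, 'flash': 2, 'disk': 4}
```

In this run the second reads of blocks 0 and 1 are served from flash rather than from disk, which is exactly the effect the L2ARC is meant to have on random read workloads.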

Imagine a 10TB file system with a 1TB SSD L2ARC! Screaming fast!

High Capacity Pool

The high capacity pool now just takes care of the mass storage. You can basically go with low-performing, high-capacity disks, as most of your IO/s are being handled in the L2ARC and ZIL.

Old fashioned Storage vs the new Fashion

The following pictures illustrate the historic view of filesystems and storage versus the current view and implementation:

Figure: Old Model

Figure: New Model (ZFS Model)


By combining the use of flash as an intent-log to reduce write latency with flash as a cache to reduce read latency, we create a system that performs far better and consumes less power than a traditional system at similar cost. It's now possible to construct systems with a precise mix of write-optimized flash, flash for caching, DRAM, and cheap disks designed specifically to achieve the right balance of cost and performance for any given workload with data automatically handled by the appropriate level of the hierarchy. Most generally, this new flash tier can be thought of as a radical form of hierarchical storage management (HSM) without the need for explicit management.

ZFS allows Flash to join DRAM and commodity disks to form a hybrid pool automatically used by ZFS to achieve the best price, performance and energy efficiency conceivable. Adding Flash will be like adding DRAM - once it's in, there's no new administration, just new capability.

And do you know what? All of those features are part of our recently announced Sun Storage 7000 Unified Storage Systems!

Sunday Jul 20, 2008

Open Storage - The (R)Evolution

Why Pay more for Less?

Do you pay incredibly high license and maintenance fees for your Network Attached Storage? Are you locked into a vendor with proprietary Operating Systems and Protocols? Do you ask yourself why you should pay just for using NFS, CIFS or NDMP, which have been standards for years?

You might answer all of the above questions with a big and bold YES. If this is the case, then keep reading this blog entry and you will see that there is another WAY or PERSPECTIVE to go into the next decade of Open, Reliable and Fairly Priced Storage Solutions!

You will recognize that there is only one vendor that fulfills the following criteria:
  • Open Source Software and Operating System Stack
  • No proprietary hardware and drivers
  • 128Bit Transaction Oriented File System
  • Usage of fair priced SAS (Serial Attached SCSI) Connectivity
  • Hybrid Storage Concept
  • Usage of Solid State Technology to increase performance
And this vendor is Sun Microsystems!

I am now finished with the marketing part. Let's see how Sun Microsystems can help you optimize your Storage and Data Services!

The Open Storage Concept

As a general term, open storage refers to storage systems built with an open architecture, in which customers can select the best hardware and software components to meet their requirements. For example, a customer who needs network file services can use an open storage filer built from a standard x86 server, disk drives, and OpenSolaris technology at a fraction of the cost of a proprietary NAS appliance.

Almost all modern disk arrays and NAS are closed systems. All the components of a closed system must come from that specific vendor. Therefore you are locked into buying drives, controllers and proprietary software features at premium prices and typically you cannot add your own drivers or software to improve the functionality of this product.

The Open Storage Software

OpenSolaris is the cornerstone of Sun Open Storage offerings and provides a solid foundation as an open storage platform. The origin of OpenSolaris technology, the Solaris OS, has been in continuous production since September 1991. OpenSolaris offers the most complete open source storage software stack in the industry. Below is a list of current and planned offerings:

At the storage protocol layer, OpenSolaris technology provides:
  • SCSI
  • iSCSI
  • iSNS
  • FC
  • FCoE
  • InfiniBand software
  • RDMA
  • OSD
  • SES
  • SAS
At the storage presentation layer, OpenSolaris technology offers:
  • Solaris ZFS
  • UFS
  • SVM
  • NFS
  • Parallel NFS
  • CIFS
  • MPxIO
  • Shared QFS
  • FUSE
At the storage application layer, OpenSolaris technology offers:
  • MySQL
  • Postgres
  • BerkeleyDB
  • AVS
  • SAM-FS
  • Amanda
  • Filebench

Solaris ZFS

One of the key cornerstones of Sun's open storage platform is the Solaris ZFS file system. Solaris ZFS can address 256 quadrillion zettabytes of storage and handle a maximum file size of 16 exabytes. Several storage services are included in ZFS:
  • Snapshots
  • Point-in-time copy
  • Volume management (no need for additional volume managers!)
  • Command line and GUI oriented file system management
  • Data integrity features based on copy-on-write and RAID
  • Hybrid Storage Model
Vendors of closed storage appliances typically charge customers extra software licensing fees for data management services such as administration, replication, and volume management. The Solaris OS with Solaris ZFS moves this functionality to the operating system, simplifying storage management and eliminating layers in the storage stack. In doing this, Solaris ZFS changes the economics of storage. A closed and expensive storage system can now be replaced by a storage server running Solaris ZFS, or a server running Solaris ZFS attached to JBOD.
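
The capacity figures quoted for Solaris ZFS above can be checked with a few lines of arithmetic. This sketch assumes binary units and reads "quadrillion" loosely as 2^50, which is my interpretation of how the marketing number was derived, not something stated in the text.

```python
# Binary units, as storage limits are usually quoted.
EXA = 2**60
ZETTA = 2**70
QUADRILLION_BINARY = 2**50   # assumption: "quadrillion" read loosely as 2^50

max_file_size = 2**64        # bytes addressable with 64-bit file offsets
max_pool_size = 2**128       # bytes addressable by a 128-bit file system

print(max_file_size // EXA)                                # prints: 16
print(max_pool_size == 256 * QUADRILLION_BINARY * ZETTA)   # prints: True
```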

Solaris ZFS recently won InfoWorld’s 2008 Technology of the Year award for best file system. In the InfoWorld evaluation, the reviewer stated, “Soon after I started working with ZFS (Zettabyte File System), one thing became clear: The file system of the next 10 years will either be ZFS or something extremely similar.”

ZFS Hybrid Storage Model

The ZFS Storage Pools are extremely flexible in terms of placing data on the optimal storage devices. You can basically split a storage pool into three different sections:
  1. The High Performance Read & Write Cache Pool

    The high performance read & write cache pool combines the system's main memory and SSDs for read caching. As you can imagine, we are using SSDs (Solid State Disks), which have a big advantage compared to RAM and traditional disks. Unlike RAM, SSDs are NOT volatile, and they are much faster than traditional disks. Therefore you don't need to load the data into memory first for it to be fast! Traditionally, less than 10-20% of a file system is really used often or needs high performance. Imagine that exactly this part is stored on SSD technology. The result is a crazy fast file system ;-) You can read more about how ZFS technically does this in another blog entry soon.

  2. ZFS Intent Log Pool

    All file system related system calls are logged as transaction records by the ZIL. The transaction records contain sufficient information to replay them back in the event of a system crash.

    ZFS operations are always a part of a DMU (Data Management Unit) transaction. When a DMU transaction is opened, there is also a ZIL transaction that is opened. This ZIL transaction is associated with the DMU transaction, and in most cases discarded when the DMU transaction commits. These transactions accumulate in memory until an fsync or O_DSYNC write happens, in which case they are committed to stable storage. For committed DMU transactions, the ZIL transactions are discarded (from memory or stable storage).

    As you must have figured out by now, ZIL performance is critical for performance of synchronous writes. A common application that issues synchronous writes is a database. This means that all of these writes run at the speed of the ZIL. The ZIL is already quite optimized, and ongoing efforts will optimize this code path even further. Using solid state disks for the log makes this screaming fast!

  3. High Capacity Pool

    The biggest advantage of traditional HDDs is the price per capacity and density value, which is until today unbeaten for online storage. By combining different technologies within a file system, you can now choose SATA technology for the high capacity pool without losing performance from the overall perspective. The ZFS pool manager automatically stripes across any number of high capacity HDDs. The ZFS IO scheduler bundles disk IO to optimize arm movement and sector allocation.
Again, I will post more details about ZFS and the Hybrid Storage Concept soon in another blog entry.
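
The life cycle of intent log records described in the list above (accumulate in memory, go to stable storage on fsync, get discarded once the owning transaction commits) can be modelled with a toy class. This is a sketch for illustration only; the class, its methods and the numeric transaction ids are invented and do not mirror the real ZFS code.

```python
class IntentLog:
    """Toy model of the intent log record life cycle."""

    def __init__(self):
        self.in_memory = []   # records not yet on stable storage
        self.on_stable = []   # records written to the log device

    def log(self, txn, record):
        self.in_memory.append((txn, record))

    def fsync(self):
        # Synchronous semantics: push pending records to stable storage now.
        self.on_stable.extend(self.in_memory)
        self.in_memory.clear()

    def transaction_committed(self, txn):
        # The pool now holds the data itself, so these records are obsolete.
        self.in_memory = [r for r in self.in_memory if r[0] > txn]
        self.on_stable = [r for r in self.on_stable if r[0] > txn]

    def replay_after_crash(self):
        # Only records that reached stable storage survive a crash.
        return [record for _, record in self.on_stable]


zil = IntentLog()
zil.log(1, "write A")
zil.log(1, "write B")
zil.fsync()
zil.log(2, "write C")             # never fsynced
print(zil.replay_after_crash())   # prints: ['write A', 'write B']
zil.transaction_committed(1)
print(zil.replay_after_crash())   # prints: []
```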

Solaris DTrace

Solaris DTrace provides an advanced tracing framework and language that enables users to ask arbitrary diagnostic questions of the storage subsystem, such as “Which user is generating which I/O load?” and “Is the storage subsystem data block size optimized for the application that is using it?” These queries place minimal load on the system and can be used to resolve support issues and increase system efficiency with very little analytical effort.

Solaris FMA - Fault Management Architecture

Solaris Fault Management Architecture provides automatic monitoring and diagnosis of I/O subsystems and hardware faults and facilitates a simpler and more effective end-to-end experience for system administrators, reducing cost of ownership. This is achieved by isolating and disabling faulty components and then continuing the provision of service through reconfiguration of redundant paths to data, even before an administrator knows there is a problem. The Solaris OS’ reconfiguration agents are integrated with other Solaris OS features such as Solaris Zones and Solaris Resource Manager, which provide a consistent administrative experience and are transparent to applications.

Sun StorageTek Availability Suite

Sun StorageTek Availability Suite software delivers open-source remote-mirror-copy and point-in-time-copy applications as well as a collection of supporting software and utilities. The remote-mirror-copy and point-in-time-copy software enables volumes and/or their snapshots to be replicated between physically separated servers. Replicated volumes can be used for tape and disk backup, off-host data processing, disaster recovery solutions, content distribution, and other volume-based processing tasks.

Lustre File System

Lustre is Sun’s open-source shared disk file system that is generally used for large-scale cluster computing. The Lustre file system is currently used in 15 percent of the top 500 supercomputers in the world, and six of the top 10 supercomputers. Lustre currently supports tens of thousands of nodes, petabytes of data, and billions of files. Development is underway to support one million nodes and trillions of files.


Today’s digital data, Internet applications, and emerging IT markets require new storage architectures that are more open and flexible, and that offer better IT economics. Open storage leverages industry-standard components and open software to build highly scalable, reliable, and affordable enterprise storage systems.

Open storage architectures are already competing with traditional storage architectures in the IT market, especially in Web 2.0 deployments and increasingly in other, more traditional storage markets. Open storage architectures won’t completely replace closed architectures in the near term, but the storage architecture mix in IT datacenters will definitely change over time.

We estimate that open storage architectures will make up just under 12 percent of the market by 2011, fueled by the industry’s need for more scalable and economic storage.


