Wednesday Aug 31, 2005

More on Blocks

A few weeks ago I was blogging about how block protocols like SCSI were designed around the on-disk sector format and limited intelligence of 1980s disk drives. Clearly, if we were starting from a clean sheet of paper today to design a storage interconnect for modern intelligent storage devices, this is NOT the protocol we would create.

The real problem, though, isn't just that the physical sector size doesn't apply to today's disk arrays. The problem today has more to do with the separation of the storage and the application/compute server. Storage in today's data centers sits in storage servers, typically disk arrays or tape libraries, which are available as services on a network (the SAN) and are used by some number of clients: the application/compute servers. As with any server, you would like some guarantee of the level of service it provides. This includes things like availability of the data, response time, security, failure and disaster tolerance, and a variety of other service levels needed to ensure compliance with data retention laws and to avoid over-provisioning.

The block protocol was not designed with the notion of service levels. When a data client writes a collection of data, there is no way to tell the storage server what storage service level is required for that particular data. Furthermore, all data gets broken into 512-byte blocks, so there isn't even a way to identify which blocks belong together and require a common service level. The workaround today is to use a management interface to apply service levels at the LUN level, which is too coarse and leads to over-provisioning. This gets really complicated when you factor in Information Lifecycle Management (ILM), where data migrates and gets replicated to different classes of storage. The result is highly complex management software and administrative processes that must tie together management APIs from a variety of storage servers, operating systems, and database and backup applications.

If we were starting from a clean sheet of paper today to design a storage interconnect, we would do a couple of things. First, we would use the concept of a variable-sized data object that allows the data client to group related data at a much finer granularity than the LUN. This could be an individual file, a database record, or any unit of data that requires a consistent storage service level and that migrates through the information lifecycle as a unit. Second, each data object would include metadata: the information about the object that identifies what service levels, access rights, etc. are required for this piece of data. This metadata stays with the data object as it migrates through its lifecycle and gets accessed by multiple data clients.
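To make the idea concrete, here is a minimal Python sketch of a data object that carries its own metadata as it moves between storage servers. Everything in it is an assumption made for illustration: the class names, the metadata keys (`retention_years`, `replicas`, `readers`), and the `migrate` helper are hypothetical and are not taken from any real protocol.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A variable-sized unit of data that carries its own metadata."""
    object_id: int
    payload: bytes
    # Hypothetical attribute names; a real protocol would define its own.
    metadata: dict = field(default_factory=dict)


class StorageServer:
    """Toy storage server that stores whole objects, metadata included."""

    def __init__(self, name, storage_class):
        self.name = name
        self.storage_class = storage_class
        self.objects = {}

    def put(self, obj):
        self.objects[obj.object_id] = obj

    def get(self, object_id):
        return self.objects[object_id]


def migrate(object_id, source, target):
    """Move one object between servers; its metadata moves with it."""
    obj = source.objects.pop(object_id)
    target.put(obj)
    return obj


fast = StorageServer("array-a", "tier1")
archive = StorageServer("library-b", "archive")

record = DataObject(
    object_id=7,
    payload=b"customer record",
    metadata={"retention_years": 7, "replicas": 2, "readers": ["billing"]},
)
fast.put(record)
migrate(7, fast, archive)

# The archive server can still see what the data requires.
print(archive.get(7).metadata["retention_years"])
```

The point of the sketch is only that the service-level information lives in the object rather than in a separate management database keyed by LUN, so no outside bookkeeping is needed when the object changes storage class.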

Of course, there are some things about today's block protocols we would retain, such as the separation of command and data. This allows block storage devices and HBAs to quickly communicate the necessary command information to set up DMA engines and memory buffers, and then to move the data very efficiently.
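As a rough illustration of that separation, the sketch below packs a SCSI READ(10) command descriptor block in Python. The 10-byte layout (operation code 0x28, a 4-byte logical block address, a 2-byte transfer length) follows the standard READ(10) format; the helper names and the 512-byte default block size are assumptions made for the example.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code
CDB_FORMAT = ">BBIBHB"  # opcode, flags, LBA, group number, transfer length, control


def build_read10_cdb(lba, num_blocks):
    """Pack a 10-byte READ(10) command descriptor block (big-endian fields)."""
    return struct.pack(CDB_FORMAT, READ_10, 0, lba, 0, num_blocks, 0)


def buffer_size_for(cdb, block_size=512):
    """Size the receive buffer from the command alone, before any data moves."""
    _, _, _, _, num_blocks, _ = struct.unpack(CDB_FORMAT, cdb)
    return num_blocks * block_size


cdb = build_read10_cdb(lba=2048, num_blocks=8)
print(len(cdb), buffer_size_for(cdb))
```

Because the small, fixed-size command arrives on its own, the receiver knows exactly how much data to expect and can set up its buffers before the data phase starts.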

Key players in the storage industry have created just such a protocol in the ANSI standards group that governs the SCSI protocol. The new protocol is called Object-based Storage Device (OSD). OSD is based on variable-sized data objects that include metadata, and it can run on all the same physical interconnects as SCSI, including parallel SCSI, Fibre Channel, and Ethernet. With the OSD protocol, we now have huge potential to enable data clients to specify service levels in the metadata of each data object and to design storage servers to support those service level agreements.

I could go on for many pages about potential service levels that can be specified for data objects. They cover performance, availability, security (including access rights and access logs), compliance with data retention laws, and any other storage SLAs a storage administrator may have. I'll talk more about these in future blogs.

Wednesday Aug 10, 2005

Sun Makes Strong Showing at iSCSI Plugfest

SAN JOSE - Sun made a strong showing at last week's iSCSI Plugfest in San Jose. Sun brought along SPARC and x64 servers running advance copies of Solaris 10 Update 1, which includes its new iSCSI initiator driver. Although Update 1 is not released yet, it's clear that its iSCSI stack is a mature driver ready for the most demanding workloads. Sun also brought its automated Java Interoperable Storage Test Suite (JIST). JIST runs an extensive suite of hundreds of protocol compliance tests that fully exercise Fibre Channel and iSCSI storage devices.

Sun successfully ran its iSCSI driver and test suite against arrays from various vendors. Solaris ran well, and the test suite found a variety of incompatibilities in some of the arrays during boundary and error-case testing. There were several requests for copies of JIST that array engineers could use to continue verifying iSCSI protocol compliance back in their own labs.

Although Sun has been criticized for being late to the iSCSI market, it's clear they have been working hard, and when Solaris 10 Update 1 releases, they will have an iSCSI stack with all the availability, reliability, and open-standards compliance you would expect from a Solaris server. Now it looks like it's the array vendors that need to catch up.

Monday Aug 01, 2005

Why blocks?

We've been doing a lot of thinking lately about the blocks in block storage. At some level blocks make sense. It makes sense to break the disk media into fixed-size sectors. Disks have done this for years, and up until the early 1990s, disk drives had very little intelligence and could only store and retrieve data that was pre-formatted into their native sector size. The industry standardized on 512-byte sectors, and file systems and I/O stacks were all designed to operate on these fixed blocks.

Now fast-forward to today. Disk drives have powerful embedded processors on integrated circuits that still have unused real estate where more could be added. Servers use RAID arrays with very powerful embedded computers that internally operate on RAID volumes with data partitioning much larger than 512-byte blocks. These arrays use their embedded processors to emulate the 512-byte block interface of a late-1980s disk drive. Then, over on the server side, we still have file systems mapping files down to these small blocks as if they were talking to an old drive.
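For a sense of what that mapping looks like, here is a small Python sketch of the arithmetic that turns a byte range into 512-byte block numbers. It is a simplification under stated assumptions: the function name is hypothetical, the block numbers are relative to the start of the file, and a real file system would go on to translate them into on-disk addresses through its own allocation structures.

```python
BLOCK_SIZE = 512  # the fixed sector size discussed above


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return the block numbers touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


# A 1,300-byte write at byte offset 700 touches three blocks,
# two of them only partially.
print(list(blocks_for_range(700, 1300)))  # [1, 2, 3]
```

Once the request has been cut up this way, nothing in the block numbers says which file or record the data came from.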

This is what I'm wondering about. Is it time to stop designing storage subsystems that pretend to look like an antique disk drive? And is it time to stop writing file systems and I/O stacks designed to spoon-feed data to these outdated disks?



