Wednesday Apr 25, 2007

SAM/QFS Directory Performance

With the release of the 4.6 version of SAM and QFS on April 3rd, I thought it would be appropriate to start discussing some of the improvements included in this update. As part of our effort to continue scaling SAM/QFS, this release improves directory lookup performance, and that is the subject of this article.

This phase of the directory lookup performance improvements is primarily targeted at lookups for names that do not exist. The system calls that receive the greatest benefit are create(), link(), and rename(). Other system calls will not benefit greatly from the 4.6 changes. We would like to expand the set of system calls that benefit in future releases.

The SAM/QFS performance group ran a comparison test for these system calls on small (2 CPUs), medium (4 CPUs), and large (20 CPUs) configurations, comparing 4.5 to 4.6. The tests were run using a C program on a QFS 'ma' file system with 'mm' and 'mr' devices. They reported a 66-fold improvement for the case with 128K zero-length files in a directory on the large configuration. On the small configuration, a 26-fold improvement was realized for creation of 100K files in a single directory. Additionally, on the medium configuration, rename, symlink, and mkdir performance comparisons showed 970-, 169-, and 4-fold improvements, respectively. In general, the larger the directory, the greater the gain expected with 4.6 over 4.5.

It is important to realize that an application must issue these system calls quickly enough to see the benefit. For example, when the create logic was moved from a C program to a shell script on the small configuration, the performance improvement was only 2.4 times, compared to 26 times with the C program. Clearly, the shell script could not create files quickly enough to see the same benefit as the C program. Keep this in mind when developing performance tests or interpreting performance results.
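To make the kind of test described above concrete, here is a minimal sketch of a create benchmark. It is not the performance group's C program; the function name time_creates, the file naming scheme, and the file count are hypothetical choices for illustration. It is written in Python for brevity, so, as with the shell script, interpreter overhead may hide part of the gain that a C program would show.

```python
import os
import tempfile
import time


def time_creates(directory, count):
    """Create `count` zero-length files in `directory`; return elapsed seconds."""
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    start = time.perf_counter()
    for i in range(count):
        # O_EXCL makes the call fail if the name already exists, so every
        # create has to confirm that the name is absent from the directory.
        fd = os.open(os.path.join(directory, f"f{i:08d}"), flags, 0o644)
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical run in a scratch directory under the default temp location.
    # Pass dir="<a QFS mount point>" to TemporaryDirectory to measure QFS.
    with tempfile.TemporaryDirectory() as scratch:
        n = 10_000
        elapsed = time_creates(scratch, n)
        print(f"{n} creates in {elapsed:.3f}s ({n / elapsed:.0f} creates/s)")
```

Running the same script against the same directory size before and after an upgrade gives a rough ratio to compare, keeping in mind that the driver itself sets a ceiling on the create rate.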

We also took a look at some of the SAM commands that are run on QFS. The command that received the greatest performance gain in 4.6 was samfsrestore, because it primarily does lookups of directory entries that do not exist. We observed a 2 times performance increase for a samfsdump file containing a few million files. You can't expect an improvement of hundreds of times at this level because the directory lookups are only a subset of the operations performed by samfsrestore.

The really nice point for existing SAM/QFS users is that, other than a 4.6 upgrade, you do not need to do anything to obtain the improvements. The directory lookup performance improvements are compatible with all 4.x QFS file systems. All directories greater than 4K in size will automatically use the new directory caching lookup scheme.
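If you are curious which of your directories cross the 4K threshold, a sketch like the following could list them. It assumes that the st_size reported by stat() for a directory corresponds to the directory size meant above, which I have not confirmed for QFS; the function name dirs_larger_than and its default threshold are illustrative only.

```python
import os
import sys


def dirs_larger_than(root, threshold_bytes=4096):
    """Return (path, size) for each directory under `root` whose stat size exceeds the threshold."""
    found = []
    for path, _subdirs, _files in os.walk(root):
        # Assumption: st_size of a directory reflects its on-disk size.
        size = os.stat(path).st_size
        if size > threshold_bytes:
            found.append((path, size))
    return found


if __name__ == "__main__":
    # Walk the directory given on the command line, or the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in dirs_larger_than(root):
        print(f"{size:>12}  {path}")
```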

All in all, I think this is a big benefit to SAM/QFS users and shows the commitment to continue scaling SAM/QFS.

'Til next time, Ted

Wednesday Feb 21, 2007

Welcome to SAM-QFS Weblog

Welcome to the Sun SAM and QFS Blog! SAM is the Storage Archive Manager, which manages data (both on disk and tape) with amazing ease and transparency to the user. QFS is a shared file system which scales from 1 to 128 nodes and has virtually no limit to the amount of information managed. I know there is a community of people out there who are already familiar with this software, but I'd like to give a brief introduction to SAM and QFS for those who are not.

So, what are SAM and QFS? In a nutshell, together they provide a complete data management system consisting of two integrated software products: SAM (Storage Archive Manager) and QFS (High Performance Shared SAN file system).

The Shared QFS file system is a high-performance, 64-bit, SAN-based file system for Solaris. This file system ensures that data is available at device-rated speeds when requested by one or more users. The Shared QFS file system's inherent scalability allows an organization's storage requirements to grow over time from terabytes to petabytes with virtually no limit to the amount of information that can be managed. In addition, the Shared QFS file system allows you to scale from 1 to 128 compute nodes so that your data sharing grows with your computational needs. QFS also lets you tune your I/O access to your specific data profile to maximize your read and write performance to disk. Shared QFS is also part of Sun's heterogeneous SAN story and supports Linux client nodes in addition to the native Solaris clients. We also support IBM SANergy clients in a QFS shared configuration.

As I said, SAM is tightly integrated with QFS. SAM adds the features of a storage archive manager to QFS. A SAM-QFS file system configuration allows data to be archived to and retrieved from local or remote automated tape libraries or disk at device-rated speeds. SAM manages QFS data online, nearline, and offline automatically and in a manner that is transparent to the user or application. Users read and write files to a SAM-QFS file system as though all files were on primary storage. In addition, SAM protects QFS file system data continually, automatically, and unobtrusively. Multiple file copies can be made to many media types and storage tiers in a standard format. This minimizes the need for traditional backups and provides fast disaster recovery in an effective long-term data storage solution. A SAM-QFS file system configuration is especially suited to data-intensive applications that require a scalable and flexible storage solution, superior data protection, and fast disaster recovery. This solution also includes an integrated volume manager, automated and flexible policy management, and GUI-based management tools.

On the whole, the SAM-QFS solution solves data management needs with ease and reliability, and we're just getting better and bigger. Look here for technical discussion around SAM-QFS and its future development.



Ted Pogue

