Quality by Design
By hhnguyen on Jun 15, 2007
A product is only as good as its test suites.
This is a favorite mantra of Bill Moore, one of the inventors of ZFS, and I couldn't agree with him more. It is especially true for the Sun Cluster product. The Sun Cluster team assumes that anything not tested doesn't work, and we apply a great deal of rigor and discipline, with heavy emphasis on quality, throughout the software development life cycle.
What is Testing?
There are several definitions of testing. The one I like best is from Systematic Software Testing by Rick D. Craig and Stefan P. Jaskiel:
"Testing is a concurrent lifecycle process of engineering, using and maintaining testware in order to measure and improve the quality of software being tested."
This definition captures all the essential elements of what we do in the Sun Cluster group.
Concurrent lifecycle process of engineering
The Quality team is involved from the very beginning, championing product testability and making sure that testability features are designed in up front. Getting an early start is vital for building successful test automation. During feature development, Quality engineers work in parallel on designing and implementing test suites, from the perspective of both the implementation and the documented behavior. We have a very strong requirement that every new feature in the release be accompanied by corresponding test suite(s) that verify the feature is working correctly. Without a test suite, the feature cannot be added to the product. The tests also have to be automated; they are run nightly and on every product build to ensure that errors don't creep in afterwards. So the mantra in the Sun Cluster organization is "A feature is only as good as its test suite".
Test automation has been one of our key strategic investments. We have highly engineered test automation software with sophisticated functional, fault injection, load, and performance test suites. We call this software SCATE, the Sun Cluster Automated Test Environment. It is a collection of test suites and tools, and it also includes a distributed test development and test execution framework that is used by several groups in the company. SCATE is state-of-the-art technology covered by several patents, and it has won the Chairman's Award for Innovation, a prestigious award within the company.
For the high availability features, SCATE software is as sophisticated as the system under test, and it required a lot of thought and effort to design and develop. A key technology in SCATE is fault injection. It provides full programmatic control over granularity, with the ability to inject faults deterministically at precise locations in the product source code, including both kernel-level Sun Cluster code and user-space code. This has helped us simulate a wide variety of fault conditions that are otherwise hard to reproduce. The precision and repeatability of fault injection have vastly improved the reliability and availability of the Sun Cluster product.
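To make "deterministic injection at precise locations" concrete, here is a minimal Python sketch under stated assumptions. It is not SCATE's implementation or API, and it only models the control logic (the real system also covers kernel code). `FaultRegistry`, `fault_point`, `arm`, `replicate_write` and `run_scenario` are hypothetical names. The idea: named fault points sit at exact spots in product code, and a test arms one point to fire on a specific hit count, so the same fault happens at the same place on every run.

```python
from dataclasses import dataclass, field


class InjectedFault(Exception):
    """Raised when an armed fault point fires (hypothetical)."""


@dataclass
class FaultRegistry:
    # Hypothetical registry: point name -> hit number (1-based) on which to fire.
    armed: dict = field(default_factory=dict)
    # Point name -> how many times execution has reached that point.
    hits: dict = field(default_factory=dict)

    def arm(self, point, on_hit=1):
        """Arm a one-shot fault that fires the on_hit-th time `point` is reached."""
        self.armed[point] = on_hit
        self.hits[point] = 0

    def disarm(self, point):
        self.armed.pop(point, None)

    def fault_point(self, point):
        """Placed at a precise location in product code; a no-op unless armed."""
        self.hits[point] = self.hits.get(point, 0) + 1
        if self.armed.get(point) == self.hits[point]:
            self.disarm(point)  # fire once, then disarm
            raise InjectedFault(point)


faults = FaultRegistry()


def replicate_write(replicas, data, log):
    """Toy 'product' code: copy data to each replica in order."""
    for node in replicas:
        # Fault point before each send: simulates a failure mid-replication.
        faults.fault_point("replicate.before_send")
        log.append(f"{node}:{data}")


def run_scenario(replicas, on_hit):
    """Arm the fault, run the operation, and report the resulting state."""
    log = []
    faults.arm("replicate.before_send", on_hit=on_hit)
    try:
        replicate_write(replicas, "x", log)
        fired = None
    except InjectedFault as exc:
        fired = str(exc)
    finally:
        faults.disarm("replicate.before_send")  # never leave a fault armed
    return log, fired
```

Because the fault is keyed to a named location and a hit count rather than a timer or random chance, calling `run_scenario(["node1", "node2", "node3"], on_hit=2)` reproduces the same partial state every time: only `node1` has the write when the fault fires.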
SCATE has 1.5 million lines of test code with over 50 automated test suites. We execute over 350,000 automated tests per release across a complex test matrix and inject over 50,000 faults. Automated tests run 24x7x365, from Dublin to Bangalore and points in between. SCATE tests are executed by several groups in Sun to qualify new versions of servers, storage, Solaris, file systems, and volume managers with Sun Cluster.
SCATE is also used in the Open Storage Program (OSP). We partnered with some of the best storage vendors in the industry to offer our customers more choice. Storage certified through the Sun Cluster OSP has been rigorously tested with SCATE software, and the test results have been jointly reviewed and approved. The joint interoperability matrix is instrumental in architecting a high-availability solution with confidence. SCATE also includes generic functional and fault injection tests for validating failover and scalable agents developed at Sun and by third parties. This speeds up the qualification effort, providing extensive coverage at low cost.
Measure and Improve
We constantly measure the quality of our software and processes and make steady improvements. One of the areas we monitor closely is test escapes: defects encountered by customers that were not found by our tests. We take every test escape very seriously and add new test cases to plug the holes. Each bug fix is accompanied by a test case. This closes the loop between Development, Quality, and Sustaining, and protects against future regressions.
Speaking of improvements, Testing on the Toilet looks like an interesting way to teach engineers about testing during their "downtime".
- Sekhar and the SCATE Team