ZFS: The Last Word in Filesystems
By bonwick on Oct 31, 2005
Halloween has been a special day for ZFS since its inception.
On 10/31/2001, we got the user-level prototype working.
On 10/31/2002, we got the first in-kernel mount.
And today, 10/31/2005, we integrated into Solaris. ZFS will hit the street in a couple of weeks via Solaris Express.
We (the ZFS team) will have much more to say about ZFS in the coming weeks. But tonight, I just want to tell you what it was like to drive this thing home.
The ZFS team is distributed: we have people working in Menlo Park, California; Broomfield, Colorado; and several other places. This week, we flew everyone in to Menlo Park and took over a giant conference room.
The first thing you notice is the heat. These rooms are made for 100 watt humans, not multi-gigahertz CPUs and associated paraphernalia. And, like any big company, Sun is all about saving money in the dumbest ways -- like turning off the A/C at night, and outsourcing the people who could possibly turn it back on.
At first, things went pretty well. We commandeered the Nob Hill conference room, which has a long table and lots of power and network taps. We brought in a bunch of machines and created 'Camp ZFS'. Each new box amped up the heat level, to the point that it became difficult to think. So we grabbed a 36-inch fan from one of the labs to get the air flowing. That was a huge help, although it sounded like you were wearing a pair of lawn mowers as headphones.
On Sunday, we plugged in one more laptop. That was it -- we blew the circuit breaker! So there we were, less than 24 hours from our scheduled integration date, and all of our computers were without power -- and the room containing the circuit breakers was locked. (Thanks, OSHA!)
So we took a three-pronged approach: (1) went through the Approved Process to get power restored (ETA: April); (2) hunted down someone from campus security to get us key access to the electrical room (ETA: Sunday night); and (3) sent our manager to Home Depot to buy a bunch of 100-foot extension cords so we could, if necessary, run Nob Hill off of the adjacent lab's power grid (ETA: 30 minutes).
All three came through. We ran half of the load over extension cords to the lab, the other half on the Nob Hill circuit. It took a bit of experimentation to find a load balance that would stay up without tripping the breaker again. Apparently, we had angered it. (Even now, I'm typing this blog entry courtesy of a shiny new yellow extension cord.)
Meanwhile, the clock was ticking.
At the end of a large project like this, it's never the technology that kills you -- it's the process, the cleanup of home-grown customizations, the packaging, the late-breaking code review comments, the collapsing of SCCS deltas, stuff like that. With power back up, we slogged on until about 4AM. Everything looked good, so we went home to sleep. Actually, some people just crashed on the couches in my office, and Bill's office next door.
By 10AM Monday we were back, making sure that all the final tests had run successfully, and working through the last bits of paperwork with the Solaris release team. After five years of effort, it was time to type 'putback'.
Denied! The permissions on a directory were wrong. Fix them up, try again.
Denied! One more TeamWare directory with wrong permissions. Fine, fix that too.
Last try... and there it goes! 584 files, 92,000 lines of change, 56 patents, 5 years... and there it is. Just like that.
Fortunately we were prepared for either success or failure. We had brought in a massive array of vodka, tequila, wine... you name it, we had it. And not in small quantities.
As I said at the beginning, we'll have lots more to say in the coming days. But right now, it's time to sleep!