More thoughts on ZFS compression and crash dumps
By Chris W Beal-Oracle on Dec 22, 2011
Thanks to Darren Moffat for poking holes in my previous post, or more precisely for pointing out that I could have added more useful and interesting data. Darren commented that it was a shame I hadn't included the time taken to take a crash dump alongside the size and space usage. This matters because one reason for using vmdump format compression from savecore is to minimize the time required to get the crash dump off the dump device and onto the file system.
The motivation for this reaches back many years, to when the default for Solaris was to use swap as the dump device. When you brought the system back up, you wanted to wait until savecore had completed before letting the system finish coming up to multiuser (you can tell how old this is by the fact we're not talking about SMF services).
With Oracle Solaris 11 the root file system is ZFS, and the default configuration is to dump to a dedicated dump ZVOL. As the ZVOL isn't used by anything else, savecore can and does run in the background, so it isn't quite as important to make it as fast as possible. It's still interesting though; as with everything in life, it's a compromise.
One problem with the tests I wrote about yesterday is that the dumps were too small to make measuring time easy (size is one thing, but we have fast disks now, so getting 8GB off a zvol and onto a file system takes very little time).
So this is not a completely scientific test, but an illustration which helps me understand what the best solution for me is. My colleague Clive King wrote a driver to leak memory to create larger kernel memory segments, which artificially increases the amount of data a crash dump contains. I told this to leak 126GB of kernel memory, set the savecore target directory to be one of "uncompressed", "gzip9 compressed" or "LZJB compressed", and in the first case set it to use vmdump format compressed dumps. I then took a crash dump, repeating over the 3 configurations. The idea was to time the difference in getting the dump onto the file system.
This is a table of what I found
| Size Leaked (GB) || Size of Crash Dump (GB) || ZFS pool space used (GB) || Compression || Time to take dump (mm:ss) || Time from panic to crash dump available (mm:ss)
| 126 || 140 || 2.4 || GZIP level 9
Notice one thing: the compression ratio for gzip 9 is massive (70x). This is probably a side effect of the fact that it's not real data, and likely contains a lot of easily compressible content. The next step should be to populate the leaked memory with random data.
So what does this tell us? Assuming the lack of random content isn't an issue, for a modest hit in the time taken to get the dump off the dump device (7:05 vs 6:15) we get an uncompressed dump on an LZJB compressed ZFS file system while using a comparable amount of physical storage. This allows me to directly analyse the dump as soon as it's available, which is great for development purposes. Is it of benefit to our customers? That's something I'd like feedback on. Please leave a comment if you see value in this being the default.
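To put a number on that "modest hit", here is a minimal Python sketch that turns the two mm:ss timings quoted above into seconds and a percentage. The helper names (`mmss_to_seconds`, `relative_overhead`) are made up for illustration, and it assumes 7:05 is the slower configuration and 6:15 the faster one.

```python
def mmss_to_seconds(text):
    """Convert an 'mm:ss' string such as '7:05' into a number of seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_overhead(slower, faster):
    """Return (extra seconds, extra time as a percentage of the faster run)."""
    slow_s = mmss_to_seconds(slower)
    fast_s = mmss_to_seconds(faster)
    extra = slow_s - fast_s
    return extra, extra / fast_s * 100


# Timings quoted in the post; which run is the slower one is an assumption.
extra_seconds, extra_percent = relative_overhead("7:05", "6:15")
print(f"{extra_seconds} s extra, {extra_percent:.1f}% longer")
```

On these figures the difference works out to 50 seconds, or roughly 13% longer.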