For some time now we've been aware that system crash dumps in Solaris are getting larger and taking longer to preserve. A number of us have been working on projects to speed up the process and reduce the size of the dumps.
The first of these was "Crash Dump Restructuring", implemented by Sriman and Vlad. This went into Solaris 11.2 and allows the administrator to remove certain portions from the crash dump via the dumpadm(1M) command. Even if you do need all of the data, the less useful data (like ZFS metadata) is stored in a separate file, so you can analyse the main core file without the supplementary ones. This should allow you to send only the required data back to Oracle for analysis. For more information have a look at the documentation here and here.
Another popular one is the integration of the Autonomous Crashdump Analysis Tool (ACT) into Solaris. This was also added in 11.2, by my colleague Chris. It allows you to run an mdb dcmd on a crash dump and get a summary of the commonly needed data in text form. Again, you could send this small(er) text file to Oracle for analysis before sending the entire crash dump.
Though ACT is part of the mdb package, you need to load the module to make it work:
</var/crash/1> # mdb -k 1
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix zvpsm
scsi_vhci iommu zfs rpcmod sata sd ip hook neti arp usba i915 stmf stmf_sbd
sockfs md random idm cpc crypto ipc fcip fctl fcp zvmm lofs ufs logindmux nsmb
ptm sppp nfs ]
> ::load act
[=== ACT Version: 8.17 <<< SUMMARY >>> ===]
[=== ACT report for core dump ===]
release: SunOS 5.11.2
isa_list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86
system booted at:0x543e6f2e ##:2014 Oct 15 12:57:18 GMT
system crashed at:0x543e6ff3 ##:2014 Oct 15 13:00:35 GMT
dump started at:0x543e6ff8 ##:2014 Oct 15 13:00:40 GMT
panic: forced crash dump initiated at user request
<and copious stuff deleted>
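The hexadecimal values in the summary above are seconds since the Unix epoch, which is why ACT prints a readable date next to each one. As a quick sanity check, here is a minimal Python sketch that decodes the three values from this particular report and works out how long the system was up before the panic. The helper name decode_stamp is my own for illustration and is not part of ACT or mdb.

```python
from datetime import datetime, timezone

def decode_stamp(hex_stamp):
    """Convert a hex epoch value, as printed by ACT, to a UTC datetime.

    decode_stamp is a hypothetical helper name, not an ACT or mdb function.
    """
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)

# Values copied from the ACT summary shown above.
booted = decode_stamp("0x543e6f2e")
crashed = decode_stamp("0x543e6ff3")
dump_started = decode_stamp("0x543e6ff8")

# How long the system ran before the panic, and how long
# after the panic the dump began.
uptime = crashed - booted
dump_delay = dump_started - crashed

print("booted:      ", booted.isoformat())
print("crashed:     ", crashed.isoformat())
print("dump started:", dump_started.isoformat())
print("uptime (s):  ", int(uptime.total_seconds()))
print("dump delay (s):", int(dump_delay.total_seconds()))
```

For this dump the uptime works out to 197 seconds, which fits a test system that was booted and then panicked on request a few minutes later.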
And finally, the big project that has really eaten all of my time and my colleagues' time for a while now, and the motivation for this blog post: Deferred Dump. Working with Brian, Nick, and again Sriman and Vlad, we've worked to preserve the crash dump information in memory across a reboot, thus eliminating the painfully slow process of writing to a dump device and extracting it again. Vlad has written an excellent blog about how this was possible, and why it needed doing. Please do go and take a look here.
Deferred dump is now available in Solaris 11.2.8 (SRU 8 of Solaris 11 update 2), so we felt it was a good time to start talking about it. Vlad has highlighted some of the requirements as well.