By Chris W Beal-Oracle on Apr 24, 2015
Better ways to handle crash dumps in Solaris
For some time now we've realised that system crash dumps in Solaris are getting larger and taking longer to preserve. A number of us have been working on projects to speed up dump preservation and reduce dump size.
The first of these was "Crash Dump Restructuring", implemented by Sriman and Vlad. This went into Solaris 11.2 and allows the administrator to remove certain portions from the crash dump via the dumpadm(1m) command. Even if you do need all of the data, the less useful data (like ZFS metadata) is stored in a separate file, so you can analyse the main core file without the supplementary ones. This should allow you to send only the required data back to Oracle for analysis. For more information, have a look at the documentation here and here.
Another popular one is the integration of the Autonomous Crashdump Analysis Tool (ACT) into Solaris. This was also added in 11.2, by my colleague Chris. It lets you run an mdb dcmd on a crash dump and get a text summary of the commonly needed data. Again, you could send this small(er) text file to Oracle for analysis before sending the entire crash dump.
Though ACT is part of the mdb package, you still need to load the act module before you can use it:
</var/crash/1> # mdb -k 1
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix zvpsm scsi_vhci iommu zfs rpcmod sata sd ip hook neti arp usba i915 stmf stmf_sbd sockfs md random idm cpc crypto ipc fcip fctl fcp zvmm lofs ufs logindmux nsmb ptm sppp nfs ]
> ::load act
> ::act
collector: 8.17
osrelease: 5.11.2
osversion: s11.2
arch: i86pc
target: core
[=== ACT Version: 8.17 <<< SUMMARY >>> ===]
[=== ACT report for core dump ===]
hostname: myhost
domainname: mydomain.oracle.com
release: SunOS 5.11.2
architecture: i86pc
isa_list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86
pagesize: 0t4096
hostid: 0x296e9
system booted at:0x543e6f2e ##:2014 Oct 15 12:57:18 GMT
system crashed at:0x543e6ff3 ##:2014 Oct 15 13:00:35 GMT
dump started at:0x543e6ff8 ##:2014 Oct 15 13:00:40 GMT
panic: forced crash dump initiated at user request
<and copious stuff deleted>
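The hex values in the ACT summary appear to be seconds since the Unix epoch: 0x543e6ff3 decodes to 13:00:35 GMT on 15 Oct 2014, matching the human-readable text after `##:`. As a minimal sketch, the Python below pulls those timestamps out of a saved ACT report and works out how long the system was up before the panic and how soon the dump started. The function name `parse_act_timestamps` and the line pattern it matches are hypothetical, inferred only from the excerpt above rather than from any documented ACT output format.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern, inferred only from the ACT excerpt above, e.g.
#   "system crashed at:0x543e6ff3 ##:2014 Oct 15 13:00:35 GMT"
# The label may not contain ':' so lines like "panic: ..." never match.
TIMESTAMP_LINE = re.compile(r"^(?P<label>[^:]+?)\s*at:0x(?P<hex>[0-9a-fA-F]+)")


def parse_act_timestamps(report: str) -> dict:
    """Return {label: UTC datetime} for every 'X at:0x...' line in the report."""
    stamps = {}
    for line in report.splitlines():
        m = TIMESTAMP_LINE.match(line.strip())
        if m:
            seconds = int(m.group("hex"), 16)  # hex seconds since the Unix epoch
            stamps[m.group("label")] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamps


# Lines copied from the ACT output shown above.
SAMPLE = """\
system booted at:0x543e6f2e ##:2014 Oct 15 12:57:18 GMT
system crashed at:0x543e6ff3 ##:2014 Oct 15 13:00:35 GMT
dump started at:0x543e6ff8 ##:2014 Oct 15 13:00:40 GMT
panic: forced crash dump initiated at user request
"""

if __name__ == "__main__":
    stamps = parse_act_timestamps(SAMPLE)
    for label, when in stamps.items():
        print(f"{label}: {when.isoformat()}")
    # Time between boot and panic, then between panic and the start of the dump.
    print("uptime before panic:", stamps["system crashed"] - stamps["system booted"])  # 0:03:17
    print("panic to dump start:", stamps["dump started"] - stamps["system crashed"])   # 0:00:05
```

Decoding the hex fields rather than the trailing text avoids depending on the date format after `##:`, which may vary between ACT versions.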
And finally, the big project that has really eaten all of my and my colleagues' time for a while now, and the motivation for this blog post: Deferred Dump. Working with Brian, Nick, and again Sriman and Vlad, we've made it possible to preserve the crash dump information in memory across a reboot, eliminating the painfully slow process of writing the dump to a dump device and extracting it again. Vlad has written an excellent blog post about how this was possible and why it needed doing; please do go and take a look here.
Deferred dump is now available in Solaris 11.2.8 (SRU 8 of Solaris 11 update 2), so we felt it was a good time to start talking about it. Vlad has highlighted some of the requirements as well.