Solaris Xen update
By levon on Jul 18, 2007
As you might expect, there's been a massive amount of change since the last OpenSolaris release. This time round, we are based on Xen 3.0.4 and build 66 of Nevada. As always, we'd love to hear about your experiences if you try it out, either on the mailing list or the IRC channel.
In many ways, the most significant change is the huge effort we've put in to stabilize our codebase; a significant number of potential hangs, crashes, and core dumps have been resolved, and we hope we're converging on a good-quality release. We've started looking seriously at performance issues, and filling in the implementation gaps. Since the last drop, notable improvements include:
- PAE support
- By default, we now use PAE mode on 32-bit, aiding compatibility with other domain 0 implementations; we can also boot under either PAE or non-PAE, if the Xen version has 'bi-modal' support. This has probably been the most-requested change missing from our last release.
- HVM support
- If you have the right CPU, you can now run fully-virtualized domains such as Windows using a Solaris dom0! Whilst more work is needed here, this does seem to work pretty well already. Mark Johnson has some useful tips on using HVM domains.
- New management tools
- We have integrated the virt- suite of management tools. virt-manager provides a simple GUI for controlling guest domains on a single host. virt-install and virsh are simple CLIs for installing and managing guest domains respectively. Note that parts of these tools are pre-alpha, and we still have a significant amount of work to do on them. Nonetheless, we appreciate any comments...
- PV framebuffer
- Solaris dom0 now supports the SDL-based paravirt framebuffer backend, which can be used with domUs that have PV framebuffer support.
- Virtual NIC support
- The Ethernet bridge used in the previous release has been replaced with virtual NICs from the Crossbow project. This enables future work around smart NICs, resource controls, and more.
- Simplified Solaris guest domain install
- It's now easy to install a new Solaris guest domain using the DVD ISO. The temporary tool in the last release, vbdcfg, has disappeared now as a result. William Kucharski has a walk-through.
- Better SMF usage
- Several of the xend configuration properties are now controlled using the SMF framework.
- Managed domain support
- We now support xend-managed domain configurations instead of using .py configuration files. Certain parts of this don't work too well yet (unfortunately all versions of Xen have similar problems), but we are plugging the gaps here one by one.
- Memory ballooning support
- Otherwise known as support for dynamic xm mem-set, this allows much greater flexibility in partitioning the physical memory on a host amongst the guest domains. Ryan Scott has more details.
- Vastly improved debugging support
- Crash dump analysis and debugging tools have always been a critical feature for Solaris developers. With this release, we can use Solaris tools to debug both hypervisor crashes and problems with guest domains. I talk a little bit about the latter feature below.
- xvbdb has been renamed
- To simply be xdb. This was a very exciting change for certain members of our team.
We're still working hard on finishing things up for our phase 2 putback into Nevada (where "phase 1" was the separate dboot putback). As well as finishing this work, we're starting to look at further enhancements, in particular some features that are available in other vendors' implementations, such as a hypervisor-copy based networking device, blktap support, para-virtualized drivers for HVM domains (a huge performance fix), and more.
Debugging guest domains
Here I'll talk a little about one of the more minor new features that has nonetheless proven very useful. The xm dump-core command generates an image file of a running domain. This file is a dump of all memory owned by the running domain, so it's somewhat similar to the standard Solaris crash dump files. However, dump-core does not require any interaction with the domain itself, so we can grab such dumps even if the domain is unable to create a crash dump via the normal method (typically, it hangs and can't be interacted with), or something else prevents use of the standard Solaris kernel debugging facilities such as kmdb (an in-kernel debugger isn't very useful if the console is broken).
However, this also means that we have no control over the format used by the image file. With Xen 3.0.4, it's rather basic and difficult to work with. This is much improved in Xen 3.1, but I haven't yet written the support for the new format.
To add support for debugging such image files of a Solaris domain, I modified mdb(1) to understand the format of the image file (the alternative, providing a conversion step, seemed unnecessarily awkward, and would have had to throw away information!). As you can see if you look around usr/src/cmd/mdb in the source drop, mdb(1) loads a module called mdb_kb when debugging such image files. This provides simple methods for reading data from the image file. For example, to read a particular virtual address, we need to use the contents of the domain's page tables in the image file to resolve it to a physical page, then look up the location of that page in the file. This differs considerably from how libkvm works with Solaris crash dumps: there, we have a big array of address translations, which is used directly, instead of the page table contents.
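To make the page-table approach a little more concrete, here is a rough Python sketch of the idea. It is not the mdb_kb code, and it does not parse the real Xen image format: `ToyImage`, `translate`, `read_virt` and `build_toy_image` are all hypothetical names, and the "image" is just a handful of 4K frames stored in arbitrary order with a frame-to-offset index. It assumes x86-64 style four-level tables with 4K pages only (no large pages), and reads that do not cross a page boundary.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT
INDEX_BITS = 9                   # 512 eight-byte entries per table
LEVELS = 4                       # x86-64 style four-level walk
PTE_PRESENT = 0x1
ADDR_MASK = 0x000FFFFFFFFFF000   # frame address bits of an entry


class ToyImage:
    """Hypothetical stand-in for a domain image: raw bytes plus a
    map from frame number to that frame's byte offset in the file."""

    def __init__(self, data, frame_offsets):
        self.data = data
        self.frame_offsets = frame_offsets

    def read_phys(self, paddr, size):
        # Find where the frame lives in the file, then index into it.
        base = self.frame_offsets[paddr >> PAGE_SHIFT]
        off = paddr & (PAGE_SIZE - 1)
        return self.data[base + off:base + off + size]


def translate(image, top_table_paddr, vaddr):
    """Walk the page tables stored in the image to resolve vaddr."""
    table = top_table_paddr
    for level in range(LEVELS - 1, -1, -1):
        index = (vaddr >> (PAGE_SHIFT + INDEX_BITS * level)) & 0x1FF
        entry = int.from_bytes(image.read_phys(table + index * 8, 8), "little")
        if not entry & PTE_PRESENT:
            raise KeyError("address %#x is not mapped" % vaddr)
        table = entry & ADDR_MASK
    return table | (vaddr & (PAGE_SIZE - 1))


def read_virt(image, top_table_paddr, vaddr, size):
    """Read bytes at a virtual address (must not cross a page)."""
    return image.read_phys(translate(image, top_table_paddr, vaddr), size)


def build_toy_image(vaddr, payload):
    """Build a five-frame image that maps exactly one page at vaddr."""
    tables, data_frame = [10, 11, 12, 13], 14
    frames = {f: bytearray(PAGE_SIZE) for f in tables + [data_frame]}
    chain = tables[1:] + [data_frame]
    for level, tbl, nxt in zip(range(LEVELS - 1, -1, -1), tables, chain):
        index = (vaddr >> (PAGE_SHIFT + INDEX_BITS * level)) & 0x1FF
        entry = (nxt << PAGE_SHIFT) | PTE_PRESENT
        frames[tbl][index * 8:index * 8 + 8] = entry.to_bytes(8, "little")
    off = vaddr & (PAGE_SIZE - 1)
    frames[data_frame][off:off + len(payload)] = payload
    # Store frames out of order so the offset lookup actually matters.
    order = [14, 12, 10, 13, 11]
    data = b"".join(bytes(frames[f]) for f in order)
    offsets = {f: i * PAGE_SIZE for i, f in enumerate(order)}
    return ToyImage(data, offsets), tables[0] << PAGE_SHIFT


if __name__ == "__main__":
    image, top = build_toy_image(0x7F1234567ABC, b"solaris")
    print(read_virt(image, top, 0x7F1234567ABC, 7))
```

Every step of the walk goes back through `read_phys`, so each level costs another lookup in the file; that is the price of working from the page tables themselves rather than from a prebuilt translation array.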
In most other respects, debugging a kernel domain image is much the same as a crash dump:
# xm dump-core solaris-domu core.domu
# mdb core.domu
mdb: warning: dump is from SunOS 5.11 onnv-johnlev; dcmds and macros may not match kernel implementation
Loading modules: [ unix genunix specfs dtrace xpv_psm scsi_vhci ufs ... sppp ptm crypto md fcip logindmux nfs ]
> ::status
debugging domain crash dump core.domu (64-bit) from sxc16
operating system: 5.11 onnv-johnlev (i86pc)
> ::cpuinfo
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD           PROC
  0 fffffffffbc4b7f0  1b   40    9 169  yes   yes t-1408926 ffffff00010bfc80 sched
> ::evtchns
Type          Evtchn IRQ IPL CPU ISR(s)
evtchn        1      257 1   0   xenbus_intr
evtchn        2      260 9   0   xenconsintr
virq:debug    3      256 15  0   xen_debug_handler
virq:timer    4      258 14  0   cbe_fire
evtchn        5      259 5   0   xdf_intr
evtchn        6      261 6   0   xnf_intr
evtchn        7      262 6   0   xnf_intr
> ::cpustack -c 0
cbe_fire+0x5c()
av_dispatch_autovect+0x8c(102)
dispatch_hilevel+0x1f(102, 0)
switch_sp_and_call+0x13()
do_interrupt+0x11d(ffffff00010bfaf0, fffffffffbc86f98)
xen_callback_handler+0x42b(ffffff00010bfaf0, fffffffffbc86f98)
xen_callback+0x194()
av_dispatch_softvect+0x79(a)
dispatch_softint+0x38(9, 0)
switch_sp_and_call+0x13()
dosoftint+0x59(ffffff0001593520)
do_interrupt+0x140(ffffff0001593520, fffffffffbc86048)
xen_callback_handler+0x42b(ffffff0001593520, fffffffffbc86048)
xen_callback+0x194()
sti+0x86()
_sys_rtt_ints_disabled+8()
intr_restore+0xf1()
disp_lock_exit+0x78(fffffffffbd1b358)
turnstile_wakeup+0x16e(fffffffec33a64d8, 0, 1, 0)
mutex_vector_exit+0x6a(fffffffec13b7ad0)
xenconswput+0x64(fffffffec42cb658, fffffffecd6935a0)
putnext+0x2f1(fffffffec42cb3b0, fffffffecd6935a0)
ldtermrmsg+0x235(fffffffec42cb2b8, fffffffec3480300)
ldtermrput+0x43c(fffffffec42cb2b8, fffffffec3480300)
putnext+0x2f1(fffffffec42cb560, fffffffec3480300)
xenconsrsrv+0x32(fffffffec42cb560)
runservice+0x59(fffffffec42cb560)
queue_service+0x57(fffffffec42cb560)
stream_service+0xdc(fffffffec42d87b0)
taskq_d_thread+0xc6(fffffffec46ac8d0)
thread_start+8()
Note that both ::cpustack and ::cpuregs are capable of using the actual register set at the time of the dump (since the hypervisor needs to store this for scheduling purposes). You can also see the ::evtchns dcmd in action here; this is invaluable for debugging interrupt problems (and we've fixed a lot of those over the past year or so!).
Currently, mdb_kb only has support for image files of para-virtualized Solaris domains. However, that's not the only interesting target: in particular, we could support mdb in live crash dump mode against a running Solaris domain, which opens up all sorts of interesting debugging possibilities. With a small tweak to Solaris, we can support debugging of fully-virtualized Solaris instances. It's not even impossible to imagine adding Linux kernel support to mdb(1), though it's hard to imagine there would be a large audience for such a feature...