What's up with LDoms: Part 1 - Introduction & Basic Concepts
By Stefan Hinker-Oracle on Jun 29, 2012
LDoms - the correct name is Oracle VM Server for SPARC - have been around for quite a while now. But to my surprise, I get more and more requests to explain how they work or to give advice on how to make good use of them. This made me think that writing up a few articles discussing the different features would be a good idea. Now - I don't intend to rewrite the LDoms Admin Guide or to copy and reformat the (hopefully) well-known "Beginners Guide to LDoms" by Tony Shoumack from 2007. I highly recommend both documents - the Beginners Guide in particular, although based on LDoms 1.0, is still a good place to start. However, LDoms have come a long way since then, and I hope to contribute to their adoption by discussing how they work and what features there are today.
In this and the following posts, I will use the term "LDoms" as a common abbreviation for Oracle VM Server for SPARC, just because it's a lot shorter and easier to type (and presumably, read).
So, just to get everyone on the same baseline, let's briefly discuss the basic concepts of virtualization with LDoms. LDoms make use of a hypervisor as a layer of abstraction between real, physical hardware and virtual hardware. This virtual hardware is then used to create a number of guest systems, each of which behaves very much like a system running on bare metal: Each has its own OBP, each will install its own copy of the Solaris OS and each will see a certain amount of CPU, memory, disk and network resources available to it. Unlike some other type 1 hypervisors running on x86 hardware, the SPARC hypervisor is embedded in the system firmware and makes use of both supporting functions in the sun4v SPARC instruction set and the overall CPU architecture to fulfill its function.
The CMT architecture of the supporting CPUs (T1 through T4) provides a large number of cores and threads to the OS. For example, the current T4 CPU has eight cores, each running eight threads, for a total of 64 threads per socket. To the OS, this looks like 64 CPUs.
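To make the arithmetic explicit, here is a minimal shell sketch. The core and thread counts are the T4 figures quoted above; the variable names are just for illustration.

```shell
#!/bin/sh
# T4 figures from the text: 8 cores per socket, 8 hardware threads per core.
cores_per_socket=8
threads_per_core=8

# Every hardware thread shows up as one CPU (vCPU) to the OS.
threads_per_socket=$((cores_per_socket * threads_per_core))

echo "CPUs visible to the OS per T4 socket: ${threads_per_socket}"
```

On a running Solaris system, psrinfo lists the virtual processors the OS actually sees, so you can compare its output with this calculation.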
The SPARC hypervisor, when creating guest systems, simply assigns a certain number of these threads exclusively to one guest. This avoids the overhead of having to schedule OS threads onto CPUs, which is what typical x86 hypervisors have to do. The hypervisor only assigns CPUs and then steps aside. It is not involved in the actual work being dispatched from the OS to the CPU; all it does is maintain isolation between different guests.
Likewise, memory is assigned exclusively to individual guests. Here, the hypervisor provides generic mappings between the physical hardware addresses and the guest's view of memory. Again, the hypervisor is not involved in the actual memory access; it only maintains isolation between guests.
During the initial setup of a system with LDoms, you start with one special domain, called the Control Domain. Initially, this domain owns all the hardware available in the system, including all CPUs, all RAM and all IO resources. If you were running the system un-virtualized, this would be what you'd be working with. To allow for guests, you first resize this initial domain (also called a primary domain in LDoms speak), assigning it a small amount of CPU and memory. This frees up most of the available CPU and memory resources for guest domains.
IO is a little more complex, but still straightforward. When LDoms 1.0 first came out, the only way to provide IO to guest systems was to create virtual disk and network services and attach guests to these services. In the meantime, several different ways to connect guest domains to IO have been developed, the most recent one being SR-IOV support for network devices released in version 2.2 of Oracle VM Server for SPARC. I will cover these more advanced features in detail later. For now, let's have a short look at the initial way IO was virtualized in LDoms:
For virtualized IO, you create two services, one "Virtual Disk Service" or vds, and one "Virtual Switch" or vswitch. You can, of course, also create more of these, but that's more advanced than I want to cover in this introduction. These IO services now connect real, physical IO resources like a disk LUN or a network port to the virtual devices that are assigned to guest domains. For disk IO, the normal case would be to connect a physical LUN (or some other storage option that I'll discuss later) to one specific guest. That guest would be assigned a virtual disk, which would appear to be just like a real LUN to the guest, while the IO is actually routed through the virtual disk service down to the physical device. For network, the vswitch acts very much like a real, physical Ethernet switch - you connect one physical port to it for outside connectivity and define one or more connections per guest, just like you would plug cables between a real switch and a real system. For completeness, there is another service that provides console access to guest domains which mimics the behavior of serial terminal servers.
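As a rough preview of how this wiring looks on the command line, here is a dry-run sketch. It only prints ldm commands and does not execute anything. The subcommands add-vdsdev, add-vdisk and add-vnet are part of ldm, but everything else is an assumption for illustration: the guest name guest1, the volume name vol1, the device names vdisk1 and vnet1, and the LUN path are hypothetical, and the service names are assumed to match the ones created later in this post. The guest domain itself would have to exist before these commands could be run for real.

```shell
#!/bin/sh
# Dry-run sketch: prints the ldm commands instead of running them.
# All names below are hypothetical examples, not taken from a real system.
GUEST=guest1                 # hypothetical guest domain
BACKEND=/dev/dsk/c0t1d0s2    # hypothetical physical LUN
VDS=primary-vds              # virtual disk service in the primary domain
VSW=switch-primary           # virtual switch in the primary domain

plan() {
    # Export the physical LUN as a volume of the virtual disk service.
    echo "ldm add-vdsdev ${BACKEND} vol1@${VDS}"
    # Give the guest a virtual disk that is backed by that volume.
    echo "ldm add-vdisk vdisk1 vol1@${VDS} ${GUEST}"
    # Plug a virtual network port of the guest into the virtual switch.
    echo "ldm add-vnet vnet1 ${VSW} ${GUEST}"
}

plan
```

Printing the commands first makes it easy to review the plan before touching a live control domain.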
The connections between the virtual devices on the guest's side and the virtual IO services in the primary domain are created by the hypervisor. It uses so-called "Logical Domain Channels" or LDCs to create point-to-point connections between all of these devices and services. These LDCs work much like high speed serial connections and are configured automatically whenever the Control Domain adds or removes virtual IO.
To see all this in action, let's now look at a first example. I will start with a newly installed machine and configure the control domain so that it's ready to create guest systems.
As a first step, after we've installed the software, let's start the virtual console service and downsize the primary domain.
root@sun # ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-c--  UART    512   261632M  0.3%  2d 13h 58m
root@sun # ldm add-vconscon port-range=5000-5100 \
               primary-console primary
root@sun # svcadm enable vntsd
root@sun # svcs vntsd
STATE          STIME    FMRI
online          9:53:21 svc:/ldoms/vntsd:default
root@sun # ldm set-vcpu 16 primary
root@sun # ldm set-mau 1 primary
root@sun # ldm start-reconf primary
root@sun # ldm set-memory 7680m primary
root@sun # ldm add-config initial
root@sun # shutdown -y -g0 -i6
So what have I done?
- I've defined a range of ports (5000-5100) for the virtual network terminal service and then started that service. The vnts will later provide console connections to guest systems, very much like serial NTS's do in the physical world.
- Next, I assigned 16 vCPUs (on this platform, a T3-4, that's two cores) to the primary domain, freeing the rest up for future guest systems. I also assigned one MAU to this domain. A MAU is a crypto unit in the T3 CPU. These need to be explicitly assigned to domains, just like CPU or memory. (This is no longer the case with T4 systems, where crypto is always available everywhere.)
- Before I reassigned the memory, I started what's called a "delayed reconfiguration" session. That avoids actually doing the change right away, which would take a considerable amount of time in this case. Instead, I'll need to reboot once I'm all done. I've assigned 7680MB of RAM to the primary. That's 8GB less the 512MB which the hypervisor uses for its own private purposes. You can, depending on your needs, work with less. I'll dedicate a separate article to sizing, discussing the pros and cons in detail.
- Finally, just before the reboot, I saved my work on the ILOM, to make this configuration available after a power cycle of the box. (It'll always be available after a simple reboot, but the ILOM needs to know the configuration of the hypervisor after a power cycle, before the primary domain is booted.)
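The numbers in this example can be double-checked with a little shell arithmetic. This is only a sketch using the figures from the ldm list output and the commands above (a T3-4 with 8 threads per core); the variable names are mine, and the values will differ on other systems.

```shell
#!/bin/sh
# Figures from the example above (T3-4); adjust for other systems.
threads_per_core=8      # T3: 8 hardware threads per core
total_vcpus=512         # VCPU column of the initial "ldm list"
total_mem_mb=261632     # MEMORY column of the initial "ldm list"
primary_vcpus=16        # value passed to "ldm set-vcpu"

# 16 vCPUs correspond to two whole cores.
primary_cores=$((primary_vcpus / threads_per_core))

# 8GB less the 512MB used by the hypervisor.
primary_mem_mb=$((8 * 1024 - 512))

# What is left over for future guest domains.
guest_vcpus=$((total_vcpus - primary_vcpus))
guest_mem_mb=$((total_mem_mb - primary_mem_mb))

echo "primary: ${primary_vcpus} vCPUs (${primary_cores} cores), ${primary_mem_mb}MB"
echo "guests:  ${guest_vcpus} vCPUs, ${guest_mem_mb}MB"
```

With these inputs, the primary domain ends up with 2 cores and 7680MB, leaving 496 vCPUs and 253952MB for guests.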
Now, let's create a first disk service and a first virtual switch which is connected to the physical network device igb2. We will later use these to connect virtual disks and virtual network ports of our guest systems to real world storage and network.
root@sun # ldm add-vds primary-vds primary
root@sun # ldm add-vswitch net-dev=igb2 switch-primary primary
You are free to choose whatever names you like for the virtual disk service and the virtual switch. I strongly recommend that you choose names that make sense to you and describe the function of each service in the context of your implementation. For the vswitch, for example, you could choose names like "admin-vswitch" or "production-network" etc.
This already concludes the configuration of the control domain. We've freed up considerable amounts of CPU and RAM for guest systems and created the necessary infrastructure - console, vds and vswitch - so that guest systems can actually interact with the outside world. The system is now ready to create guests, which I'll describe in the next section.