Monday Oct 13, 2008

Server Virtualization - Creating IO Domains on T5440

If we refer to the System Topology diagram in my previous blog post, we find that the internal disks of the T5440 are connected to PCIe-0. Hence it is not possible to remove PCIe-0 from the Primary (or Control) Domain. However, it is possible to remove PCIe-1, PCIe-2 and PCIe-3 from the Primary Domain and allocate them to IO Domains.

In order to create an IO domain using PCIe-1, PCIe-1 has to be removed from the Primary Domain. This would cause the Primary Domain to lose its primary network interface if it has been using the on-board NICs. However, if there is a network card available on PCIe-0, then the primary network for the Primary Domain can be switched to the ports on that card before removing PCIe-1 from the Primary Domain. If an additional network card is not available, it should still be possible to remove PCIe-1 from the Primary Domain and create an IO domain (let us call it the Secondary Domain) managing the devices off PCIe-1. In such a case, the Primary Domain would provide the boot-disk service to the Secondary Domain, and the Secondary Domain would provide the primary network service for the Primary Domain. The pseudo-steps below outline how this can be done.

  • In the Primary Domain
    • set the number of VCPUs to 8 (this is just an example number of VCPUs)
    • set the memory to 8GB (just an example size of memory)
    • create a vdisk-server
    • remove PCIe-1 from its control
      • This would cause the Primary Domain to lose its network after reboot
    • Reboot the Primary Domain and log back into the Primary Domain from Console
    • To cause the VCPUs for the Secondary Domain to be allocated from T1 (refer to the Topology Diagram), create a dummy domain with the remaining 56 VCPUs from T0. Bind the dummy domain.
    • Associate a vdiskserverdevice as the boot-device for the Secondary Domain
    • Create the Secondary Domain
      • set the number of VCPUs to 8
      • set the memory to 8GB
      • add PCIe-1 to it
      • add the vdiskserverdevice as the vdisk for this domain
      • Bind, install-OS and boot the domain
    • Create a vswitch-device on the Secondary Domain
    • Reboot the Secondary Domain
    • Create a vnet-device for the Primary Domain associated with the above vswitch-device
    • Plumb and configure the vnet device on the Primary Domain (assuming the on-board network ports are connected to the primary network of the Data Center). Now the Primary Domain should have the primary network available.
    • Remove the dummy domain and proceed with creating other domains.
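
The pseudo-steps above can be sketched as a dry-run command plan. This is only an illustration under stated assumptions: the bus name `pci@500`, the NIC `e1000g0`, the disk backend path, the domain and service names, and the 1G of memory given to the dummy domain are placeholders I picked, not values from a real T5440. The `ldm` subcommands are written as I recall them from the LDoms CLI, so check them against the documentation for your release before running anything.

```python
# Dry-run plan only: the commands are built as strings, never executed.
# All device, domain and service names below are placeholders (assumptions).

def split_bus_plan(bus="pci@500", nic="e1000g0",
                   backend="/dev/dsk/c0t1d0s2", vcpus=8, mem="8G",
                   threads_per_chip=64):
    # The dummy domain takes whatever the Primary Domain leaves on the first chip.
    dummy_vcpus = threads_per_chip - vcpus
    return [
        # Primary Domain: shrink it, add a disk service, release the bus.
        f"ldm set-vcpu {vcpus} primary",
        f"ldm set-memory {mem} primary",
        "ldm add-vds primary-vds0 primary",
        f"ldm remove-io {bus} primary",
        "ldm add-config split-io",
        # ... reboot the Primary Domain, log back in from the console ...
        # Dummy domain holds the rest of the first chip's VCPUs.
        "ldm add-domain dummy",
        f"ldm add-vcpu {dummy_vcpus} dummy",
        "ldm add-memory 1G dummy",  # assumed: some memory is needed to bind
        "ldm bind-domain dummy",
        # Boot disk for the Secondary Domain, served by the Primary Domain.
        f"ldm add-vdsdev {backend} vol1@primary-vds0",
        "ldm add-domain secondary",
        f"ldm add-vcpu {vcpus} secondary",
        f"ldm add-memory {mem} secondary",
        f"ldm add-io {bus} secondary",
        "ldm add-vdisk vdisk1 vol1@primary-vds0 secondary",
        "ldm bind-domain secondary",
        "ldm start-domain secondary",
        # ... install the OS, then give the Primary Domain its network back ...
        f"ldm add-vsw net-dev={nic} secondary-vsw0 secondary",
        # ... reboot the Secondary Domain ...
        "ldm add-vnet vnet0 secondary-vsw0 primary",
        # Finally release the VCPUs held by the dummy domain.
        "ldm unbind-domain dummy",
        "ldm remove-domain dummy",
    ]

if __name__ == "__main__":
    for command in split_bus_plan():
        print(command)
```

Printing the plan first makes it easy to review the ordering, in particular that the bus is removed from the Primary Domain before it is added to the Secondary Domain.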

With the above technique, when the Primary Domain is rebooted, the Secondary Domain may seem to pause until the Primary Domain boots back up. Similarly, when the Secondary Domain is rebooted, the Primary Domain's primary network may appear to freeze until the Secondary Domain comes back online. But that is far better than losing all the domains and the applications running in those domains.

Server Virtualization - Using LDOMs on T5440

The Sun Fire T5440 can have at most 4 UltraSPARC T2 Plus processors. Each UltraSPARC T2 Plus processor is directly connected to one quarter of the entire system memory, with 1 Gigabyte memory interleaving, and owns a PCIe Root-Complex. When fully populated with processors and memory, Solaris can see 256 CPUs and 512GB of memory. That is a lot for most applications, except for some large databases. With this class of system, it is not usually possible to consume the entire system with a single instance of most applications. But that is in fact a very good opportunity to consolidate a number of such applications on this system using LDOMs, thereby reducing power consumption and rack space. An example is the SugarCRM application. It is a web based application written in PHP with a MySQL database backend. Yun Chew has written a nice blog demonstrating how to consolidate the SugarCRM application on this system using LDOMs. I can think of many such applications that can be consolidated on this system and on T5140 and T5240 based systems.

In the work done by Yun referred to above, there was no need to create any IO domains. But because the T5440 has 4 PCIe Root-Complexes, it is possible to create up to 4 IO domains for applications sensitive to IO performance. Such applications, like databases, can be run in an IO domain so that the application has direct access to the physical disks. The other domains, like application server domains, can access the database over a virtual NIC. Each of the application server domains can have another virtual NIC to communicate with the external world.

The good thing about LDOMs based virtualization is that, even if the Primary Domain goes down, the other domains continue to be functional. Many other virtualization technologies do not have this advantage, which is why Live Migration is so critical for them.
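
As a quick check of the numbers above, here is a small sketch of the arithmetic. The 8 cores per chip and 8 hardware threads per core are the maximum figures for this processor generation, quoted in the LDOMs Concept section further down; the variable names are mine.

```python
# Arithmetic behind "256 CPUs and 512GB" on a fully populated T5440.
PROCESSORS = 4            # sockets in the system
CORES_PER_PROCESSOR = 8   # maximum cores per chip
THREADS_PER_CORE = 8      # hardware threads (strands) per core
TOTAL_MEMORY_GB = 512

# Solaris sees every hardware thread as one CPU.
cpus_seen_by_solaris = PROCESSORS * CORES_PER_PROCESSOR * THREADS_PER_CORE

# Each processor is directly attached to one quarter of the memory.
local_memory_gb = TOTAL_MEMORY_GB // PROCESSORS

print(cpus_seen_by_solaris)  # 256
print(local_memory_gb)       # 128
```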

To get the best performance out of an LDOMs based application deployment, it is important to understand the system topology a bit so that it becomes easier to determine what to place where. I have tried to create a sketch of the system topology below for reference.


When creating domains, the IO and CPU requirements of the applications that would run in the virtualized environment should be estimated. The IO performance of a virtualized 1Gig network and a virtualized disk is the same as native. But compared to native IO, virtualized IO consumes more CPU cycles, often in the range of 5%-25%, depending on the size and frequency of the IO. Hence, when doing resource planning for an LDOMs environment, a couple of points should be considered to get the best performance from the T5440.

  • Is the application CPU intensive?
    • Does it scale up with additional CPUs?
  • Is the application Disk or Network IO intensive?
    • Moderately IO intensive applications would consume less than 50% of maximum IO capacity of the device
  • Is the application both CPU and IO intensive?
  • How many interrupt sources would the domain need to manage?
    • PCIe based Fibre Channel HBAs normally have 2 interrupt sources.
    • PCIe based 1G network devices have either 1 or 2 interrupt sources, while 10G network devices have 8 interrupt sources.
    • Each virtualized IO device created out of vsw or vds has 1 interrupt source.

The number of VCPUs that need to be allocated to a Domain depends largely on the ability of the application to make good use of the VCPUs. In addition to the VCPUs needed by the application, extra VCPUs should be allocated to handle interrupts. For optimal performance, VCPUs should be allocated to a domain in multiples of at least 4, and preferably in multiples of 8 where possible.

In the next section I will describe how to create an IO domain with Inter-IO Domain Dependency.
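
To make the sizing rule concrete, here is a rough sketch. It uses the interrupt-source counts listed above and rounds the total up to a multiple of 8. Budgeting one extra VCPU per interrupt source is my own simplifying assumption, as are the function and key names; treat the result as a starting point, not a guarantee.

```python
# Interrupt sources per device type, from the list above.
# A 1G NIC has 1 or 2 sources; the larger value is used here.
INTERRUPT_SOURCES = {"fc_hba": 2, "nic_1g": 2, "nic_10g": 8, "virtual_io": 1}

def vcpus_for_domain(app_vcpus, devices, multiple=8):
    """Estimate VCPUs for a domain.

    app_vcpus: VCPUs the application itself can use well.
    devices:   mapping of device type -> number of such devices.
    multiple:  allocation granularity (8 preferred, 4 at least).
    Assumption: one extra VCPU is budgeted per interrupt source.
    """
    interrupts = sum(INTERRUPT_SOURCES[kind] * count
                     for kind, count in devices.items())
    needed = app_vcpus + interrupts
    # Round up to the next multiple.
    return -(-needed // multiple) * multiple

# 10 application VCPUs, one FC HBA, one 10G NIC, two virtual IO devices:
# 10 + (2 + 8 + 2) = 22, rounded up to 24.
print(vcpus_for_domain(10, {"fc_hba": 1, "nic_10g": 1, "virtual_io": 2}))
```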

Server Virtualization - LDOMs

With the introduction of Chip Multi-Threading (CMT) in the SPARC Processor Family, a new sun4v based architecture was also introduced. 


This sun4v interface allows the Operating System to communicate with the hardware via a layer called the Hypervisor. The Hypervisor provides a Hardware Abstraction to the Operating System. The Hypervisor itself is not an Operating System and is delivered with the platform, bundled with the Firmware. With this layer in place, it becomes possible to carve out different groups of actual Hardware components and present them to the Operating System.


This combination of the Hypervisor and a sun4v based Operating System is the key enabler for LDOMs. LDOMs is supported on all UltraSPARC T1 and UltraSPARC T2 based systems. There are some nice documents about LDOMs, as well as discussion forums that you can join to post your questions.

LDOMs Concept

An UltraSPARC T1 processor is equipped with up to 8 cores, with 4 Hardware Threads (Strands) per core. Each Hardware Thread is seen as a CPU by the Operating System. An UltraSPARC T2 processor is also equipped with up to 8 cores per chip, with 8 Hardware Threads per core.

When creating domains, CPUs are allocated to a domain. A CPU allocated to one domain cannot be shared with another domain. Similarly, when memory is allocated to a domain, the same memory cannot be allocated to another domain. Hence CPU and memory are partitioned across domains. However, IO devices like network cards or disks can be shared. When sharing disks, a single slice of a disk cannot be shared with multiple domains, but different slices of a disk can be allocated to different domains. It is also possible to create large files on a mounted filesystem and make a file available to a domain as a disk.

The UltraSPARC T1 based T2000 has 2 PCI-e Root-Complexes, the UltraSPARC T2 based T5120 and T5220 have 1 PCI-e Root-Complex along with 2 on-chip 10 Gigabit Ethernet ports, the UltraSPARC T2 Plus based T5140 and T5240 also have 2 PCI-e Root-Complexes, and the T5440 has 4 PCI-e Root-Complexes. It is possible to allocate a Root-Complex to a Guest Domain so that the Guest has direct access to the devices connected to that Root-Complex.
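
The partitioning rule for CPUs can be pictured with a toy model. This is not how the Hypervisor is implemented; the class and method names are made up purely to show that a CPU handed to one domain is no longer available to any other.

```python
class CpuPool:
    """Toy model of exclusive CPU allocation across domains."""

    def __init__(self, cpu_count):
        # Map each CPU id to the domain that owns it (None = unallocated).
        self.owner = {cpu: None for cpu in range(cpu_count)}

    def allocate(self, domain, count):
        free = [cpu for cpu, owner in self.owner.items() if owner is None]
        if len(free) < count:
            raise ValueError(f"only {len(free)} CPUs free, {count} requested")
        granted = free[:count]
        for cpu in granted:
            self.owner[cpu] = domain
        return granted

# One chip with 8 cores x 8 threads = 64 CPUs.
pool = CpuPool(64)
primary = pool.allocate("primary", 8)
guest = pool.allocate("guest1", 16)

# No CPU appears in both domains.
print(set(primary) & set(guest))  # set()
```

Asking this pool for more CPUs than remain raises an error instead of silently sharing them, which mirrors the "cannot be shared" rule.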

LDOMs Components

  • Primary Domain - This is the default or first domain that is available with a new system. Initially all system resources remain allocated to this domain. This is the only domain that can be used to configure other domains. This Domain is sometimes referred to as the Control Domain.
  • Service Domain - A domain that provides disk and network services to other domains. For example, if a Guest Domain makes a Disk Image stored in its filesystem available for booting another domain, then it can be called a Service Domain.
  • IO Domain - A domain that owns physical IO devices. When such a domain shares its devices with another domain, it can also be termed a Service Domain.
  • Guest Domain - A domain that depends on any of the above three domains for its IO services.
  • Virtual Disk Client (vdc) - A device driver component active in the Guest Domain that provides a disk view to the domain.
  • Virtual Disk Server (vds) - A device driver component active in the Service Domain that is responsible for the physical IO after receiving requests from the vdc.
  • Virtual Network Client (vnet) - Similar to vdc above, but provides a Virtual NIC service to the Guest.
  • Virtual Network Switch (vsw) - A switch implementation that communicates with vnet on one side and with the NIC device driver on the other side.
  • Virtual Console Concentrator (vcc) - Provides Console access to a Guest Domain.
  • MAU - These are the On-Chip Cryptographic Co-Processors. There is 1 MAU per core.

Steps for Creating a Domain

  1. Some CPU and Memory resources must be removed from the Primary Domain so that they can be allocated to other domains
  2. A vcc instance needs to be created in the Primary Domain
  3. A vsw and a vds (Virtual Disk Server Device) instance need to be created
  4. At this time a Guest Domain can be created
    1. It should be assigned a Console Port (vcc)
    2. Its vdc should be associated with a Virtual Disk Service
    3. Its vnet should be associated with a vsw
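
The steps above map roughly onto the following command plan. It is a sketch under assumptions: the guest name, service names, port range, NIC, sizes and disk backend are all placeholders of mine, and the `ldm` subcommands are given as I remember them, so verify them against the LDoms documentation for your release. The commands are only built and printed, not executed.

```python
# Dry-run plan only; every name and size below is a placeholder (assumption).

def guest_domain_plan(guest="guest1", nic="e1000g0",
                      backend="/ldoms/guest1/disk0.img",
                      primary_vcpus=8, primary_mem="8G",
                      guest_vcpus=8, guest_mem="4G", console_port=5000):
    return [
        # 1. Free CPU and memory from the Primary Domain.
        f"ldm set-vcpu {primary_vcpus} primary",
        f"ldm set-memory {primary_mem} primary",
        # 2. Console concentrator in the Primary Domain.
        "ldm add-vcc port-range=5000-5100 primary-vcc0 primary",
        # 3. Virtual switch and virtual disk server.
        f"ldm add-vsw net-dev={nic} primary-vsw0 primary",
        "ldm add-vds primary-vds0 primary",
        # 4. Create the Guest Domain and give it resources.
        f"ldm add-domain {guest}",
        f"ldm add-vcpu {guest_vcpus} {guest}",
        f"ldm add-memory {guest_mem} {guest}",
        # 4.1 Console port served by the vcc.
        f"ldm set-vcons port={console_port} {guest}",
        # 4.2 Disk: export a backend, then attach it to the guest.
        f"ldm add-vdsdev {backend} vol1@primary-vds0",
        f"ldm add-vdisk vdisk1 vol1@primary-vds0 {guest}",
        # 4.3 Network: vnet attached to the vsw.
        f"ldm add-vnet vnet1 primary-vsw0 {guest}",
        f"ldm bind-domain {guest}",
        f"ldm start-domain {guest}",
    ]

if __name__ == "__main__":
    for command in guest_domain_plan():
        print(command)
```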

Tony Shoumack wrote a nice blueprint to provide detailed help with domain creation using LDOMs.

The per-core FPUs of UltraSPARC T2 and UltraSPARC T2 Plus are just functional units of the core. When a Domain needs to execute a Floating Point instruction, the core associated with the Domain takes care of it.
If the Domain needs to accelerate Cryptographic Operations by offloading them to the On-Chip Cryptographic Co-Processor, then MAUs need to be assigned to the domain.

In the next section, I will cover how to allocate devices and CPU to get the best performance.



