And now for something virtually different...

Okay, so the title is a homage to Monty Python... But what if we had a completely different approach to providing virtual environments on a computer system? The previous entries on this blog describe the tricky bits - complexity and overhead - involved in creating virtual machines. These problems have been addressed with pretty darned good success through heroic programming, but what if we could avoid some of the issues entirely with a new approach?

Traditional virtual machines timeslice physical CPUs among multiple virtual machines, intercepting instructions that change system state or do I/O, and emulating them as needed. This reflects the historical design of computer systems, in which physical CPUs are relatively rare and expensive (hence must be time-multiplexed), and state-changing events for one virtual machine must not affect the others (hence each virtual machine must run without full machine access privileges, with such functions trapped and emulated). As I've been outlining, this is complicated and expensive. Even simple timeslicing between virtual machines can cost hundreds of clock cycles, because cache and TLB contents have to be discarded.

The T1 chip in Sun's "Niagara"-based systems (T1000, T2000, and others to come) turns the assumption of expensive/rare CPUs upside down. This processor's Chip Multi Threading (CMT) design provides up to 32 logical CPUs ("strands") in a 1 or 2 rack unit, low-cost server. Now, CPU strands are plentiful and cheap. Instead of timeslicing a few CPUs between VMs, just give each virtual machine one or more dedicated logical CPUs for its own use. That is the basis of logical domains (LDoms): every domain has its own assigned CPUs (roughly 3% granularity of the entire box CPU count), which can be dynamically added to or removed from a Solaris instance. Each domain also has its complement of disk, network, and cryptographic assets. Everything is assigned by a control domain, and virtual network and disk I/O is provided by bridged access service domains.
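To make the "roughly 3%" figure concrete, here is a small sketch of the arithmetic: one strand out of 32 is 1/32 of the box. The domain names and strand counts in the layout are made up for illustration, and the code has nothing to do with the real LDoms management tools.

```python
TOTAL_STRANDS = 32  # logical CPUs on a fully populated T1 system, per the text above


def cpu_share(strands, total=TOTAL_STRANDS):
    """Return the fraction of the box's logical CPUs held by one domain."""
    if not 1 <= strands <= total:
        raise ValueError("a domain needs between 1 and `total` strands")
    return strands / total


# Hypothetical layout: these domain names and strand counts are assumptions.
layout = {"control": 4, "web": 8, "db": 16, "test": 4}

# Strands are dedicated to a domain, so the layout must fit in the box.
assert sum(layout.values()) <= TOTAL_STRANDS

for name, strands in layout.items():
    print(f"{name:8s} {strands:2d} strands  {cpu_share(strands):.1%} of the box")

# The smallest step by which a domain can grow or shrink is one strand.
print(f"one strand = {cpu_share(1):.3%} of the box")
```

Moving a single strand between domains therefore shifts about 3% of the machine's logical CPU count from one Solaris instance to another.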

This gives us several important benefits right away. First, since each domain has its own logical CPUs, it can change its state (such as enabling or disabling interrupts) without causing a trap and emulation. After all, it owns the CPU and its interrupt mask all by itself. That can save thousands of context switches per second. Second, since each CPU strand has its own private context in hardware, the T1000/T2000 can switch between domains in a single clock cycle, not the several hundred needed for most virtual machines.

Typically that switch happens when a domain references memory that is not currently in cache. Fetching contents from RAM to the processor (all vendors' processors, not just this one!) can take many clock cycles, during which a logical CPU stalls execution of the single instruction causing the cache miss. On most existing CPUs, that stall is "dead time". By switching to another CPU strand on the same physical CPU core, the T1000/T2000 lets another logical CPU continue instruction processing and uses that time for other work. This is the essence of CMT's "Throughput Computing" that makes the T1 chip so powerful.
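Here is a toy model of that idea, offered as a rough sketch and not as a description of how the T1 actually schedules strands. It assumes every strand runs a fixed number of cycles, then stalls on a cache miss for a fixed number of cycles, and that the core switches to another ready strand at no cost whenever the current one stalls. The cycle counts (10 and 30) are invented for illustration.

```python
def core_utilization(strands, run_cycles, miss_cycles, total_cycles=100_000):
    """Fraction of cycles in which one core issues an instruction.

    Toy switch-on-miss model: stay on the current strand until it stalls,
    then move to the next strand that is ready. All parameters are
    illustrative assumptions, not measured T1 figures.
    """
    ready_at = [0] * strands       # first cycle at which each strand can issue
    left = [run_cycles] * strands  # instructions left before the next miss
    current = 0
    busy = 0
    for cycle in range(total_cycles):
        if ready_at[current] > cycle:
            # Current strand is waiting on memory: look for a ready one.
            for k in range(1, strands):
                candidate = (current + k) % strands
                if ready_at[candidate] <= cycle:
                    current = candidate
                    break
            else:
                continue  # every strand is stalled: a dead cycle
        busy += 1
        left[current] -= 1
        if left[current] == 0:
            # Cache miss: this strand waits on memory for miss_cycles cycles.
            left[current] = run_cycles
            ready_at[current] = cycle + 1 + miss_cycles
    return busy / total_cycles


for n in (1, 2, 4):
    print(n, "strand(s):", core_utilization(n, run_cycles=10, miss_cycles=30))
```

With these made-up numbers, a lone strand keeps the core busy only a quarter of the time, while four strands between them fill every cycle. Real workloads are far less regular, so treat this as a picture of the principle and not as a performance prediction.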

Next time, some more information on how LDoms works and is used.


Interesting blog entry. Do you have any information when the LDOM firmware for the T2000 will be made available?

Posted by Derek Morr on February 01, 2007 at 08:08 AM MST #

The Solaris 10 11/06 What's New documentation says:
Up to 32 logical domains per system, managed by a CLI, the Logical Domains (LDoms) Manager 1.0 software, which is a separate download

When will this be available outside of Sun? Is it proper to read it as "a separate free download"?

Posted by Mike Gerdts on February 01, 2007 at 08:08 AM MST #
