Sunday May 03, 2009

Improving integration of LDoms and FMA

Scott Davenport recently posted a blog entry announcing the availability of Solaris 10 Update 7 (aka Solaris 10 5/09 or just s10u7), and touted some of the FMA improvements in the release. Of particular interest for LDoms and its users are improvements in diagnosis of IO faults when one or more IO root complexes are allocated to domains other than the control domain (i.e. so-called root domains). These improvements were designed & developed through a collaborative effort between the LDoms & FMA teams. The collaboration between these teams is nothing new; even before the initial release of Logical Domains technology in April 2007, as well as ever since, there has been tight coordination between LDoms and FMA, both in terms of the technology, and the teams.

The changes needed to resolve the IO diagnosis problems required new interfaces between the FMA and LDoms code, as well as new software on both sides of the interface. As Scott mentioned, the FMA software is now available in the Solaris 10 5/09 release. However, the necessary changes on the LDoms side (in the LDoms Manager) will be available in the upcoming 1.2 release of Logical Domains, currently scheduled for this summer. That's no reason not to install s10u7 in a Logical Domains environment now; all currently supported versions of LDoms will function correctly with s10u7 installed in a control domain, guest domains, or both. And once LDoms 1.2 is released, simply installing the new firmware & LDoms Manager that make up the 1.2 update on a system running s10u7 will automatically enable the improved IO diagnosis features (along with several other new LDoms features, but that's the subject of another post).

Monday Feb 23, 2009

Oracle 10g certification on Logical Domains

Excellent News! Oracle 10g is now certified & supported on LDoms for both single instances and RAC implementations. The details are available here.

Wednesday Dec 24, 2008

LDoms 1.1

Just in time for the holidays, LDoms version 1.1 is now available! This is a major new release of Logical Domains technology, with an extensive list of new features & bugfixes. Here are the highlights:

Major Features Introduced in LDoms version 1.1:

  • Warm and Cold Migration
  • Network NIU Hybrid IO
  • VLAN Support for Virtual Network Interface and Virtual Switch
  • Public XML Interface and XMPP Connection with the Domain Manager
  • Virtual IO DR
  • Virtual Disk Multipathing and Failover
  • Virtual Switch Support for Link Aggregated Interface
  • iostat(1M) Support in Guest Domains

Alex has more details on these features here.

Other Improvements Include:

  • Improved Interrupt Distribution (CR 6671853)
  • Performance improvements for virtual IO (CR 6689871, 6640564)
  • Solaris Installation to Single Slice Disk (CR 6685162)
  • Numerous Improvements and Extension to our Domain Services Infrastructure (CR 6560890)
  • Improved Console Behavior when not using Virtual Console (CR 6581309)
  • VDisk EFI Label Support for Disk Image (CR 6558966)
  • VDisk Support for Disk Managed by Multipathing Software (Veritas DMP, EMC Powerpath) (CR 6694540, 6637560)
  • LDoms Manager Improvements to IO FMA (CR 6463270)
  • ldm list -l now displays MAC assigned to guest (CR 6586046)
  • Improved Error Messages (CR 6741733, 6590124, 6715063)
  • ldm list -o provides fine-grained control of configuration display options (CR 6562222)
  • More accurate utilization percentage reporting (CR 6637955, 6709020)
  • Can now explicitly set a domain's hostid (CR 6670605)
  • Improved persistence of VIO service and VDS volume names (CR 6729544, 6771264)
  • More predictable behavior when deciding which cpus to DR out of a domain (CR 6567372)
  • More accurate annotations in ldm ls-spconfig output (CR 6744046)
  • Better support for large, fragmented memory configs in a domain (CR 6749507)
  • Supports setting persistent WANboot keys from OBP (CR 6510365)
  • Lots of Bug Fixes (over 100)

Get it here!

Monday Oct 13, 2008

New Sun SPARC Enterprise T5440 Server runs LDoms 1.0.3

Today Sun is announcing the latest in our line of sun4v SPARC CMT systems: the Sun SPARC Enterprise T5440 Server. This is a four socket server based on our UltraSPARC T2 Plus processor.

With up to 256 available threads, this is the best platform yet for running our Logical Domains (LDoms) virtualization technology. LDoms 1.0.3, released last May, fully supports all shipping configurations of the T5440. All the necessary firmware & software comes pre-installed. If you need to download any of the LDoms 1.0.3 software, just go here.

In addition, there's a new resource available for helping administrators get the most out of their LDoms installation: the LDoms Community Cookbook. It just went live today; in fact, at the time of this writing, not all sections are live yet. Please check back often, and remember, this is a Wiki & a community resource, so feel free to add content, make corrections, etc.

To read what other Sun engineers have to say about the T5440, see Allan Packer's blog for an updated list of T5440-related blog entries.

Note: the Sun SPARC Enterprise T5440 Server is not to be confused with our recently announced Sun Netra T5440 Server, which is a two socket carrier-grade server.

Monday May 19, 2008

LDoms 1.0.3

Logical Domains (LDoms) 1.0.3 is now available. This release is mainly intended to enable many new features included in Solaris 10 5/08 (aka update 5). Jason did an excellent job with details & logistics in this blog entry, so there's no need for me to repeat that info here.

One thing I do want to mention in terms of LDom Manager functionality is that with this release, the XML format produced by the ldm ls-constraints -x command has changed. While the LDom Manager will continue to accept our previous, so-called v2 XML format, as of 1.0.3, it also accepts & produces the new v3 format. This format is designed to closely align with the schema defined as part of the draft Open Virtual Machine Format (OVF) specification.

This is just the tip of the iceberg. Coming in LDoms 1.1 (currently targeted for release in Q4CY08) will be a complete XML based control interface for monitoring & managing Logical Domains, based on this same v3 schema. In addition, it will utilize the XMPP transport, providing secure, standards-based XML messaging between client application & the LDom Manager.

This combination of a standards-based schema over a standard XML transport provides a rich control interface for creating management applications. More details about this new management interface, including detailed specifications, will be forthcoming.

Wednesday Apr 09, 2008

UltraSPARC T2 Plus & LDoms 1.0.2

Today we announced our T5140/T5240 platforms, based around the UltraSPARC T2 Plus processor. This is the first CMT platform which supports multiple processor chips (2), providing up to 128 hardware threads of CMT goodness! And to go along with these platforms, we're releasing Logical Domains version 1.0.2, which fully supports the new T5140/T5240 systems, including the ability to create up to 128 logical domains to match the number of available hardware threads.

Of course, most users probably won't want to run 128 single threaded logical domains, but the fact that you can is a testament to the scalability of the architecture.

Other features of LDoms 1.0.2 include:

  • Support for up to 64 domains on UltraSPARC T2 based systems
  • Libvirt management interface software included
  • A slew of bug fixes

Of course, LDoms 1.0.2 supports all the platforms supported by LDoms 1.0.1 & LDoms 1.0, and like its predecessors, is available at no extra cost. In fact, LDoms 1.0.2 comes factory installed on all T5140/T5240 systems. The associated firmware & LDoms Manager packages are also available from the Sun Download Center.

Sunday Dec 02, 2007

LDoms versions & compatibility

With the release of LDoms 1.0.1 last October, along with the release of the first members of our UltraSPARC T2 (US T2) based platform family, issues of version compatibility naturally arise. When I first posted about the availability of LDoms 1.0.1, I included the following caveats:

WARNING: There are two important caveats when upgrading from LDoms 1.0 to LDoms 1.0.1:

  • Configurations saved to the service processor under 1.0 are not usable under 1.0.1. The LDoms 1.0.1 Administration Guide describes the upgrade procedure that needs to be applied to work around this. Part of this procedure needs to be carried out BEFORE performing the actual upgrade!
  • You must upgrade both the firmware and LDom Manager components at the same time.

One of the pillars of Sun's products is our ability to maintain compatibility; we have explicit practices & processes in place to guarantee compatibility between product releases. Unfortunately, we did not apply those to the 1.0.1 LDoms upgrade. This was an omission we do not intend to repeat. So, when upgrading from LDoms 1.0 to any subsequent version, the caveats mentioned above do apply (the current LDoms release is still 1.0.1, but we are planning for two follow-on releases in CY 2008).

As we wrap up the development phase of our upcoming 1.0.2 release, we're re-verifying that existing configurations created under 1.0.1 do continue to work under 1.0.2, and that firmware & LDom Manager upgrades do not need to be synchronized. Of course, certain 1.0.2 capabilities might not be fully enabled until both the firmware & the LDom Manager are upgraded, but at least the system will continue to function with only one or the other upgraded.

UPDATE 4/29/2008: With the release of LDoms 1.0.2, I'm happy to report that neither of the caveats described in this entry apply when upgrading from LDoms 1.0.1 to LDoms 1.0.2. However, they do still apply if you're upgrading from LDoms 1.0 to LDoms 1.0.2 (or any subsequent release). In other words, these compatibility issues only come into play when upgrading from LDoms 1.0 to a subsequent release.
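
The rule in this entry boils down to a one-line check. Here is a minimal sketch of it; the function names are made up for illustration and are not part of any LDoms tool. It only encodes what is stated above: the two caveats apply when the starting point is LDoms 1.0, and not otherwise.

```python
def parse_version(text):
    """Turn '1.0' or '1.0.2' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def upgrade_caveats_apply(current, target):
    """Return True if the two 1.0 upgrade caveats apply to this upgrade.

    Hypothetical helper: encodes only the rule described in this entry
    (caveats apply when upgrading from LDoms 1.0 to any later release).
    """
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        raise ValueError("target must be newer than the current version")
    return cur == (1, 0, 0)


if __name__ == "__main__":
    for cur, tgt in [("1.0", "1.0.1"), ("1.0", "1.0.2"), ("1.0.1", "1.0.2")]:
        print(cur, "->", tgt, "caveats apply:", upgrade_caveats_apply(cur, tgt))
```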

Thursday Nov 01, 2007

Control domain reconfiguration in LDoms 1.0.1

This note explains how control domain reconfiguration works in LDoms 1.0.1, in contrast to how it functioned in LDoms 1.0. There were severe limitations placed on control domain reconfiguration in LDoms 1.0, which have been addressed in the 1.0.1 release:
  • The control domain could only be reconfigured when running in the "factory-default" configuration. Once reconfigured, if subsequent changes were desired, one had to revert back to factory-default and re-apply the initial changes as well as any new ones.
  • The only way to instantiate a newly reconfigured control domain was by downloading the new configuration to the SP, and then power-cycling the box.

When reconfiguring the control domain under LDoms 1.0.1, the LDom Manager enters "delayed reconfiguration" mode the first time it's asked to do something that can't be immediately instantiated (i.e. just about anything other than cpu DR and adding or removing disk volumes). Once in this mode, all subsequent operations are pended in the hypervisor until the control domain reboots. This delayed reconfiguration capability actually existed in LDoms 1.0, but could not be applied to the control domain because rebooting the control domain required a full powercycle of the system to make sure the I/O subsystem was properly reset, causing the loss of any pending operations queued up in the hypervisor.
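
To make the behavior concrete, here is a toy model of delayed reconfiguration mode. It is only a sketch of the rules described above, not LDom Manager code: the class, the method names and the operation labels (such as "cpu-dr" and "set-memory") are invented for illustration and are not ldm subcommands.

```python
class ControlDomainModel:
    """Toy model of delayed reconfiguration (illustrative only)."""

    # Hypothetical labels for the operations the text says can be
    # instantiated immediately: cpu DR and adding/removing disk volumes.
    IMMEDIATE = {"cpu-dr", "add-volume", "remove-volume"}

    def __init__(self):
        self.applied = []     # operations already in effect
        self.pending = []     # operations held until the next reboot
        self.delayed = False  # True once delayed reconfiguration mode is entered

    def request(self, op):
        """Apply op now if possible, otherwise pend it until reboot."""
        if not self.delayed and op in self.IMMEDIATE:
            self.applied.append(op)
            return "applied"
        # The first non-immediate operation enters delayed mode; from then
        # on every operation is pended, even one that is normally immediate.
        self.delayed = True
        self.pending.append(op)
        return "pended"

    def reboot(self):
        """Rebooting the control domain instantiates all pended operations."""
        self.applied.extend(self.pending)
        self.pending.clear()
        self.delayed = False


if __name__ == "__main__":
    cd = ControlDomainModel()
    print(cd.request("cpu-dr"))      # immediate operation, not yet in delayed mode
    print(cd.request("set-memory"))  # enters delayed reconfiguration mode
    print(cd.request("cpu-dr"))      # pended, because delayed mode is active
    cd.reboot()
    print(cd.applied)
```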

LDoms 1.0.1 introduces the ability to soft reset the I/O subsystem, allowing the control domain (or any I/O domain) to reboot while the rest of the system stays up. This in turn allows delayed reconfiguration to work on the control domain.

Utilizing delayed reconfiguration mode for control domain reconfiguration also means the reconfiguration can take place at any time, not just when running in the factory-default configuration. This allows the control domain to be reconfigured as many times as needed without having to revert to the factory-default configuration and start over each time.

To facilitate control domain reconfiguration under LDoms 1.0, the LDom Manager ran in a special "config mode" when in the factory-default configuration. In this mode, all reconfiguration requests were simply queued up within the LDom Manager, so that the new config could be downloaded to the SP when ready, and instantiated by power-cycling the box. This mode is still utilized in LDoms 1.0.1 on UltraSPARC T1 based platforms (when booted into its factory-default configuration). This is to support non-LDoms legacy compatibility mode, since these platforms initially shipped before the advent of LDoms technology. On these systems, the first control domain reconfiguration has to be done the same way it was for 1.0.

All subsequent control domain reconfigurations (and all control domain reconfigurations on T2 based and all future LDoms-supported platforms, as they do not utilize config mode) can be accomplished via delayed reconfig operations followed by a simple reboot of the control domain.

In summary, under LDoms 1.0.1, you can now reconfigure your control domain whenever you want, as many times as you need, without having to revert to factory-default and re-apply all your previous changes, without having to save the configuration to the SP, and without having to power-cycle the box (except for the first-ever reconfiguration on a T1 based platform).

The ability to reboot the control domain without the box power-cycling (aided with some magic I'll leave to Narayan to describe) deserves a little elaboration: it means you can truly reconfigure your control domain even with active guest domains running! The guest domains stay up during the ensuing control domain reboot; VIO services are pended & automatically re-established as the control domain comes back up.

One very important note about saving your domain configuration to the SP: just because you no longer _need_ to save the new configuration to the SP before rebooting the control domain to effect a reconfiguration, doesn't mean you _shouldn't_ save it; you absolutely should! It's _strongly_ recommended to always save any new configuration you create to the SP, otherwise if the system were to lose power, it would revert to a previously saved state (or to factory-default), which is almost certainly _not_ what you want. BTW, you can safely save your configuration even if there are delayed reconfig operations pending; in this case, the configuration that gets saved is the pending one.

Wednesday Oct 24, 2007

Project Virginia? xVM? Whither LDoms?

There's been some understandable confusion regarding the re-branding of Sun's various virtualization technologies that's currently underway, specifically as it relates to our two Hypervisor based technologies: the one on our x64 systems (originally based on Xen), and the other on our SPARC CMT systems (i.e. LDoms).

Part of the confusion stems from the fact that we recently started using the "xVM" moniker for our x86 based hypervisor technology, since we did not meet the requirements set out by Citrix to continue using the Xen brand. But then, as part of an effort to re-cast all our virtualization technologies under one systems management umbrella (I believe this is what Project Virginia is about), it was decided to expand the scope of technologies included under the xVM brand. The Sun xVM Product Family now encompasses both LDoms & our x86 based hypervisor. Read this blog entry from Marc Hamilton for more details.

Unfortunately, there is still plenty of collateral out there that either implies or explicitly correlates xVM solely with our x86 hypervisor technology. This requires our readers to carefully analyze any information describing xVM, and properly determine from the context whether it's a reference to the overall Sun xVM infrastructure product family, or to one of its members, be it one of the underlying virtualization technologies, or the systems management piece layered on top.

Speaking of systems management, one of the key pieces of the Sun xVM infrastructure is Sun xVM Ops Center, which is under active development by our SysNet organization. Its aim is to provide a systems management framework that truly encompasses all our virtualization technologies. Read more about it in Marc's blog.

So what does this mean for LDoms? Other than the planned support for it in Sun xVM Ops Center, absolutely nothing. The LDoms technology, product, roadmap and name are all unchanged. This includes our plans to support 3rd party management tools via SNMP, via our planned extensions to libvirt, or through our own XML-based control interface currently under development. That's a topic for another posting...

Thursday Oct 11, 2007

UltraSPARC T2 & LDoms 1.0.1

Sun is officially announcing the first products in its UltraSPARC T2 (US T2) based platform lineup today, the T5x20 series. You can read all about the details here and here. There are two big stories here related to our Logical Domains technology. For those of you who are new to my blog, Logical Domains (LDoms for short) is the name of Sun's virtualization technology for our SPARC CMT platforms that allows multiple operating systems to run concurrently on a single system.

These products represent the first of our CMT based platforms that are shipping with LDoms technology pre-installed from the factory. All future CMT servers from Sun will ship with the ability to run Logical Domains out of the box. This includes the LDoms-enabled hypervisor (our LDoms hypervisor runs on bare metal, and is embedded in the firmware of the platform), all the necessary Solaris components, and the LDoms Manager package (which is what my team works on).

This further represents the introduction of version 1.0.1 of LDoms technology. Besides support for the new US T2 based platforms (and a slew of bug fixes), this release supports the ability to reset any domain, even one which owns physical I/O devices, while all other domains continue to run. Even the control domain, i.e. the one on which the LDoms Manager runs, can reboot while all other domains stay up. This represents a major step forward in terms of RAS capability for LDoms.

As of today, LDoms version 1.0.1 is only available pre-installed on our newly announced US T2 based servers. Stay tuned here for information on the impending availability of this upgrade on our existing US T1 based platforms!

UPDATE: LDoms version 1.0.1 is now available for download here. This includes the firmware updates for US T1 platforms.

WARNING: There are two important caveats when upgrading from LDoms 1.0 to LDoms 1.0.1:

  • Configurations saved to the service processor under 1.0 are not usable under 1.0.1. The LDoms 1.0.1 Administration Guide describes the upgrade procedure that needs to be applied to work around this. Part of this procedure needs to be carried out BEFORE performing the actual upgrade!
  • You must upgrade both the firmware and LDom Manager components at the same time.

Tuesday Aug 21, 2007

VIO device renaming by LDom Manager

[This is the first in a series of entries I'll write about tips, tricks & other issues with the LDom Manager.]

The LDom Manager allows you to specify a name for each VIO client & server instance you configure. Currently (i.e. in LDoms 1.0 and the upcoming 1.0.1 releases), this information is not stored as part of the machine description (MD) for the associated guest domain. Instead, the device name to instance mapping is stored in the LDom Manager's private constraints database, which is itself persisted as a simple XML file in the control domain's filesystem.

There are cases where the information in the constraints database doesn't match that of the running system, and in those cases, the LDom Manager, on startup, will apply a canonical name to any VIO device(s) for which no name mapping is available. The two main reasons this can happen are:

  • Loss of the constraints database file (as a result of an OS upgrade, for example)
  • Reverting the system to a configuration stored on the SP containing a different set of VIO devices than the currently running configuration.

When the LDom Manager first starts, if it can't find a mapping for a given VIO device in its constraints database, it applies a canonical name using the following heuristics:

For VIO clients: <type><instance #>, where <type> is either "vnet" or "vdisk", and the instance # is incremented for each additional device of type <type> encountered

For VIO servers: <domain-name>-<type><instance #>, where type is "vds", "vsw", or "vcc"
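
The naming heuristics above can be sketched in a few lines. This is an illustration only, not the LDom Manager's actual code. Two details are assumptions on my part, since the description above doesn't pin them down: instance numbers start at 0, and a separate counter is kept per domain and device type. The function name and the sample domain names are hypothetical.

```python
from collections import defaultdict

CLIENT_TYPES = ("vnet", "vdisk")
SERVER_TYPES = ("vds", "vsw", "vcc")


def canonical_names(devices, first_instance=0):
    """Assign canonical names to VIO devices that have no name mapping.

    devices: iterable of (domain_name, device_type) in the order encountered.
    first_instance: assumed starting instance number (an assumption).
    """
    counters = defaultdict(int)  # assumption: one counter per (domain, type)
    names = []
    for domain, dev_type in devices:
        if dev_type in CLIENT_TYPES:
            prefix = dev_type                  # <type><instance #>
        elif dev_type in SERVER_TYPES:
            prefix = f"{domain}-{dev_type}"    # <domain-name>-<type><instance #>
        else:
            raise ValueError(f"unknown VIO device type: {dev_type}")
        instance = first_instance + counters[(domain, dev_type)]
        counters[(domain, dev_type)] += 1
        names.append(f"{prefix}{instance}")
    return names


if __name__ == "__main__":
    sample = [("ldg1", "vnet"), ("ldg1", "vnet"), ("ldg1", "vdisk"),
              ("primary", "vds"), ("primary", "vsw"), ("primary", "vds")]
    print(canonical_names(sample))
```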

The LDom Manager's renaming of VIO devices never affects the actual binding of VIO devices to instances in the OS, nor the binding of VIO clients to servers; everything continues to operate normally. The impact is in how the LDom Manager references VIO devices for display and reconfiguration by the user.

There is, however, one more serious problem to note: if a VIO device is configured using a name that matches a potential canonical name, and the LDom Manager subsequently attempts to use that same canonical name on another VIO device, it'll cause the LDom Manager to abort on startup, and eventually enter maintenance mode. This failure can be identified by this message appearing in the LDom Manager's log file:

Assertion failed: 0L != clientp->published_name, file vio_classes.c, line 2471

To work around this, so the LDom Manager can start, its constraints database (stored in /var/opt/SUNWldm/ldom-db.xml) must be hand-edited to rename the offending VIO device to one that doesn't collide with the canonical name namespace. There is a bug open in our bug tracking system for this problem; it's CR #6571091.
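
One way to stay clear of this problem is to avoid device names that look like canonical names in the first place. Here is a rough sketch of such a check. The regular expressions are my own reading of the canonical name formats described earlier in this entry (they are not taken from the LDom Manager source), and the function names are hypothetical. The sketch deliberately works on a plain list of names rather than parsing ldom-db.xml, since that file's layout isn't described here.

```python
import re

# Assumed patterns, derived from the canonical name formats described above:
#   clients: <type><instance #>                with type vnet or vdisk
#   servers: <domain-name>-<type><instance #>  with type vds, vsw or vcc
CLIENT_RE = re.compile(r"^(vnet|vdisk)\d+$")
SERVER_RE = re.compile(r"^.+-(vds|vsw|vcc)\d+$")


def collides_with_canonical(name):
    """Return True if name falls inside the canonical name namespace."""
    return bool(CLIENT_RE.match(name) or SERVER_RE.match(name))


def risky_names(names):
    """Return the subset of names that could collide with a canonical name."""
    return [n for n in names if collides_with_canonical(n)]


if __name__ == "__main__":
    chosen = ["vnet0", "mynet", "primary-vds0", "bootdisk", "vdisk12", "vnetx"]
    print(risky_names(chosen))
```

Any name this flags is worth changing to something outside the canonical namespace.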

We plan to address these issues in an upcoming release (after the 1.0.1 release), by eliminating the need for the LDom Manager to rename VIO devices altogether.

Tuesday Jul 24, 2007

Updated Beginners Guide to LDoms

Tony Shoumack has just posted an updated version of the excellent Sun Blueprints™ document Beginners Guide to LDoms. This is the perfect resource and reference document for those folks new to Logical Domains. It provides both a conceptual background of the technology, and specific guidance on configuring LDoms. This represents a significant update to the previous version, incorporating feedback from LDoms experts within Sun as well as our customers. It is also now current with the 1.0 release. If you had downloaded the previous version (which was targeted at our old, pre 1.0 release candidate builds of LDoms), please update to this latest version.

I work on the Oracle VM Server for SPARC (nee LDoms) team.

View Eric Sharakan's profile on LinkedIn

