Monday Jun 25, 2012

Oracle VM Server for SPARC 2.2 on S11

Oracle VM Server for SPARC 2.2 has been available for a little while now. The https://blogs.oracle.com/virtualization blog has an overview of all the 2.2 features. Initially, what was released was the SVR4 package for Solaris 10 (which is unbundled and wasn't constrained by any external schedule). On Solaris 11, the 'ldomsmanager' package is built into Solaris (and therefore doesn't need to be downloaded separately), so it is delivered as part of an S11 Support Repository Update (SRU). Some of the features in 2.2 are specific to S11 (SR-IOV and the ability to live migrate between machines with different CPU types), so there have been many requests asking when the S11 bits are coming.

Solaris 11 SRU 8.5 was released on Friday, and it includes Oracle VM Server for SPARC 2.2, so if you're already running an S11 SRU all you need to do is a 'pkg update' to get the 2.2 bits.
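
If you're in that position, the update is about as simple as it gets. A rough sketch (output trimmed; an SRU update will typically create a new boot environment for you):

    # pkg update
    # pkg info ldomsmanager

The second command is just a sanity check that the 'ldomsmanager' package now reports 2.2.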

If you're still running the original S11 and your 'pkg publisher' output shows the /release repository, then you'll need to sign up for the /support repo by getting the appropriate keys and certificates to access it (this requires a support contract). The 2.2 Admin Guide documents how to do this upgrade on S11.
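
The Admin Guide has the exact steps, but the shape of it is roughly this (the key and certificate file names below are placeholders for whatever you download with your support contract):

    # pkg set-publisher \
        -k /var/pkg/ssl/Oracle_Solaris_11_Support.key.pem \
        -c /var/pkg/ssl/Oracle_Solaris_11_Support.certificate.pem \
        -G http://pkg.oracle.com/solaris/release/ \
        -g https://pkg.oracle.com/solaris/support/ solaris

i.e. drop the /release origin and add the /support origin for the 'solaris' publisher, pointing it at your keys.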

Two S11 articles which have some useful details on upgrading (not just 'ldomsmanager') via the support repositories are:
How to Update Oracle Solaris 11 Systems From Oracle Support Repositories by Glynn Foster

Tips for Updating Your Oracle Solaris 11 System from the Oracle Support Repository by Peter Dennis

In particular, if you'd like to stick with the 2.1 release when upgrading to SRU 8.5 or later, see the 'pkg freeze' section of Peter's article.
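
A minimal sketch of that approach ('pkg freeze' with no version pins whatever 'ldomsmanager' revision is currently installed):

    # pkg freeze ldomsmanager
    # pkg update

and later, when you're ready to move to 2.2:

    # pkg unfreeze ldomsmanager
    # pkg update

Peter's article covers the details and the caveats.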

Wednesday Jun 08, 2011

Live Migration in Oracle VM Server for SPARC 2.1

You may have seen the press release that Oracle VM Server for SPARC 2.1 (a.k.a. LDoms) has just been released (you can find the download links here). There is a considerable list of new features (including Dynamic Resource Management, Virtual Device Service Validation and many more), but the key feature for me is Live Migration, which allows an active domain to be migrated without any impact on applications; users shouldn't even notice that the guest domain is running on a new machine (OK, so I would say that, since I'm one of the migration developers...).


It has been possible to migrate an active domain since LDoms 1.1 (released in 2008); however, up until now the domain was suspended while the runtime state was copied from the source machine to the target, which could result in an outage on the order of minutes if the domain had a large amount of memory (the suspend time was pretty much linearly proportional to the guest domain's memory size). With Live Migration we transfer the memory contents while the domain keeps running, at the same time keeping track of the memory that is being modified. We iterate through the memory, transferring modified pages to the target system, until the amount of memory being modified is minimal. Then, at the end, we momentarily suspend the domain, copy the remaining memory, and resume the domain on the target. This suspension can take less than a second, although depending on the workload it can take longer if the domain is rapidly modifying a lot of its memory pages.
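
For the curious, a migration is driven from the source control domain; a hedged example (the domain and host names here are made up):

    # ldm migrate ldg1 root@target-host
    Target Password:

There is also a '-n' option which performs a dry run, checking whether the migration would succeed without actually moving the domain.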


One other migration performance enhancement in this release is that multiple network connections between the source and target machines are utilised (based on the number of virtual CPUs in the control domains), which improves the throughput of the memory transfer and reduces the overall migration time.


I've found from running experiments that having 16 vcpus in the control domains makes a significant improvement over 8 vcpus (and up to 32 vcpus helps more, beyond which there's no noticeable difference). The other best practice is to ensure that cryptographic units (a.k.a. MAUs) are also assigned to the control domains, as the memory contents are protected by SSL when being transferred over the network, and offloading these operations to the T-series hardware makes a big difference.
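
As a rough illustration of that setup (the control domain is normally called 'primary', and the counts are just the ones discussed above), on both the source and target machines:

    # ldm set-vcpu 16 primary
    # ldm set-mau 1 primary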


The migration chapter in the Admin Guide has been revamped to discuss the enhancements.


 [Update: Jeff Savit wrote a great post about his experiences using Live Migration]

Wednesday Jan 20, 2010

LDoms 1.3 has been released

It's now possible to download the LDoms 1.3 software. The LDoms 1.3 Release Notes list all the new features but some of the ones I particularly like include:


  • Greatly improved domain migration speeds
    • Following on from the speed-up in 1.2 due to using the hardware crypto devices, the migration code in 1.3 has been enhanced to compress the memory and use multiple threads to push the data over the network. The speed-up depends on the memory usage of the domain being migrated but Haik has mentioned seeing improvements of over 80%...

  • Support for link-based IPMP
    • The virtual network and virtual switch devices now support link status updates to the network stack. The LDoms 1.3 Admin Guide includes an example of how you'd configure it.

  • Crypto Dynamic Reconfiguration (DR)
    • You can now add and remove the hardware crypto units from domains without rebooting (just like CPUs and VIO devices), which is great since ssh/scp can now use the crypto units as well as domain migration. In addition, because of this, you can now migrate domains that have crypto units assigned.

  • Support for non-interactive migrations
    • Adding a '-p {password file}' option to 'ldm migrate' removes the need to type in a password when doing migrations (a quick sketch follows this list).
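
A quick sketch of the non-interactive form (the file name, domain name and host name are invented for the example; the file holds the target machine's password on its first line, so protect it accordingly):

    # chmod 400 /ldoms/target.pwd
    # ldm migrate -p /ldoms/target.pwd ldg1 root@target-host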

[ Update: Some other posts by Alex, Duncan, Eric and Jeff ]

Wednesday Jan 09, 2008

LDoms Solaris patches available since Solaris 10 8/07 (S10U4) was released

There have been a number of LDoms fixes backported to S10 since Solaris 10 8/07 (S10U4) was released[*]. They are available in the S10 Sustaining KU, 127111, which can be applied to Solaris 10 8/07 (S10U4) or Solaris 10 11/06 (S10U3). In fact, instead of applying 124921-02, which was released at the same time as LDoms 1.0, it is recommended to use the latest revision of the 127111 patch with S10U3, as many more bug fixes are available via 127111 and 120011-14 (the S10U4 KU patch that 127111 depends on).
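
On an S10U3 machine that works out as something like the following (assuming the patches have been downloaded and unpacked under /var/tmp, and using whatever the latest 127111 revision is at the time), followed by the usual reboot:

    # patchadd /var/tmp/120011-14
    # patchadd /var/tmp/127111-08

i.e. apply the KU that 127111 depends on first, then the Sustaining KU itself.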

 

[ Updated 2008-02-07: Added  127111-08 ]

The fixes include improvements to vDisk support for rebooting control/service domains and guest networking performance enhancements, as well as a fix to the way LDoms/OBP variables are stored while rebooting a domain. In addition, a fix for the vsw issue that Narayan describes is included.


127111-08

6578761 System hangs in ds_cap_fini() and ds_cap_init()
6593231 Domain Services logging facility must manage memory better
6616313 cnex incorrectly generates interrupt cookies
6630945 vntsd runs out of file descriptor with very large domain counts


127111-05

6501039 rebooting multiple guests continuously causes a reboot thread to hang
6527622 Attempt to store boot command variable during a reboot can time out
6589682 IO-DOMAIN-RESET (Ontario-AA): kern_postprom panic on tavor-pcix configuration (reboot)
6605716 halting the system should not override auto-boot? on the next poweron

127111-04

6519849 vnet hot lock in vnet_m_tx affecting performance.
6530331 vsw when plumbed and in prog mode should write its mac address into HW
6531557 format(1m) does not work with virtual disks
6536262 vds occasionally sends out-of-order responses
6544946 Adding non existant disk device to single cpu domain causes hang
6566086 vdc needs an I/O timeout
6573657 vds type-conversion bug prevents raw disk accesses from working
6575216 Guests may lose access to disk services (VDS) if IO domain is rebooted
6578918 disk image should have a device id

[*] LDoms improvements in Solaris 10 8/07 (S10U4) lists the LDoms features/fixes in S10U4; those fixes can also be applied to S10U3 by applying the 120011-14 KU patch, which obsoletes 124921-02 et al.

 

 

Friday Sep 14, 2007

Presentation on LDoms at the Irish OpenSolaris User Group

 

I'm giving a presentation at the next meeting of the Irish OpenSolaris User Group on Sept 25th; some more logistical details on the meeting time/location/etc. are available here. I plan to give an introduction to LDoms plus a quick overview of some of the upcoming features.

Wednesday Sep 12, 2007

LDoms improvements in Solaris 10 8/07 (S10U4)

Now that Solaris 10 8/07 (known to most of us as S10U4) has been released, it's worth doing a recap of what LDoms features and bug fixes have been integrated into this release. It is also possible to patch Solaris 10 11/06 by applying the SPARC KU patch, 120011-14, to get the new LDoms functionality on UltraSPARC T1-based machines.
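
A quick way to check whether a given system already has that KU (or a later revision that obsoletes it) is along these lines:

    # showrev -p | grep 120011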

The features integrated mainly involve adding support in the LDoms networking drivers for the Clearview project. This allows the vsw and vnet drivers to use the multiple unicast address support in the network adapters instead of putting the adapter into promiscuous mode (6447559 is the main bugid covering this). The bug fixes focus on improving stability in the control/service and guest domains, along with some usability improvements.

[Update: if you do plan to plumb the vsw in S10U4, see Narayan's post - guest networking could be broken if the vsw is not configured correctly]
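
For reference, the usual recipe for moving the control domain's network configuration from the physical adapter onto the vsw looks roughly like this (the interface name and address are made up for the example; read Narayan's post and the Admin Guide before doing this on a production control domain):

    # ifconfig e1000g0 down unplumb
    # ifconfig vsw0 plumb
    # ifconfig vsw0 192.0.2.10 netmask 255.255.255.0 broadcast + up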

The fixes listed below are in addition to the 30+ fixes available in 124921-02[1] which were all integrated into S10U4 when that patch was created back in March.
 

  • Networking
    • 6405380 LDoms vSwitch needs to be modified to support network interfaces
    • 6418780 vswitch needs to be able to process updates to its MD node
    • 6447559 vswitch should take advantage of multiple unicast address support
    • 6474949 vSwitch panics if mac_open of the underlying network device fails
    • 6492423 vSwitch multi-ring code hangs when queue thread not started
    • 6492705 vsw warning messages should identify device instance number
    • 6512604 handshake untimeout() race condition in vnet
    • 6517019 vgen_multicst does not handle kmem_zalloc failure
    • 6496374 vsw: "turnstile_block: unowned mutex" panic on a diskless-clients test bed
    • 6514591 vsw: fix for 6496374 causes softhang
    • 6523926 handshake restart can fail following reboot under certain conditions
    • 6523891 vsw needs to update lane state correctly for RDX pkts
    • 6556036 vswitch panics when trying to boot over vnet interface
  • Disk
    • 6520626 Assertion panic in vdc following primary domain reboot
    • 6527265 Hard hang in guest ldom on issuing the format command
    • 6534269 vdc incorrectly allocs mem handle for synchronous DKIOCFLUSHWRITECACHE calls
    • 6547651 fix for 6524333 badly impact performance when writing to a vdisk
    • 6524333 Service domain panics if it fails to map pages for a disk on file
    • 6530040 vds does not close underlying physical device or file properly
  • General
    • 6488115 reboot from guest via break hangs
    • 6495154 mdeg should not print a warning when the MD generation number does not change
    • 6520018 vntsd gets confused and immediately closes newly established console connections
    • 6505472 RC1 build: guest ldg(s) softhang during repeat boot
    • 6521890 recursive mutex_enter in ldc_set_cb_mode
    • 6528180 link state change is not handled under certain conditions in ldc
    • 6526280 Guest with 64 vdisk devices hangs during boot
    • 6528758 'ds_cap_send: invalid handle' message during LDom boot

LDoms bugs are not yet visible on the OpenSolaris bug query interface, http://bugs.opensolaris.org/ (but this is being worked on). [Update: LDoms bugs are now visible via bugs.opensolaris.org]

[1] 124921-02 was an LDoms patch made available for Solaris 10 11/06 [U3] prior to the release of S10U4

Friday Aug 24, 2007

Hello


I've been considering it for long enough, so I think it's time to creep onto a quiet corner of the internet and start blogging. I've had the bones of this draft written for months, but I might as well hit Publish and get on with it...

I'm a kernel engineer in the SPARC Platform Software group and am based in Ireland. I have been working at Sun since 1998 and I've spent much of that time adding Solaris support for various SPARC processors and servers. For the last couple of years I've been working on Logical Domains (LDoms) – virtualisation support for servers based on UltraSPARC CMT processors (LDoms basically allows you to run multiple virtual machines on a single SPARC sun4v machine).

I've a few posts I'd like to share on LDoms, so maybe, just maybe, I might get around to finding the time to write them up. I enjoy reading technical blogs, so hopefully I can give something back on a topic I know something about.

About

I have been working at Sun/Oracle since 1998 and I've spent much of that time adding Solaris support for various SPARC processors and servers. For the last 6+ years I've been working on what is now known as Oracle VM Server for SPARC (previously called LDoms) – virtualisation support for servers based on UltraSPARC CMT processors.
