Monday Jun 25, 2012

Oracle VM Server for SPARC 2.2 on S11

Oracle VM Server for SPARC 2.2 has been out for a little while now. The https://blogs.oracle.com/virtualization blog has an overview of all the 2.2 features. What was released initially was the SVR4 package for Solaris 10 (which is unbundled and so wasn't constrained by any external schedule). On Solaris 11, the 'ldomsmanager' package is built into Solaris (and therefore doesn't need to be downloaded separately), so it is delivered as part of an S11 Support Repository Update (SRU). Some of the features in 2.2 are specific to S11 (SR-IOV and the ability to live migrate between machines with different CPU types), so there have been many requests asking when the S11 bits are coming.

Solaris 11 SRU8.5 was released on Friday and it includes Oracle VM Server for SPARC 2.2, so if you're already running an S11 SRU all you need to do is a 'pkg update' to get the 2.2 bits.

If you're still running the original S11 and your 'pkg publisher' output shows the /release repository, then you'll need to sign up for the /support repo by getting the appropriate keys and certificates to access the repository (this requires a support contract). The 2.2 Admin Guide documents how to do this upgrade on S11.

Two S11 articles which have some useful details on upgrading (not just 'ldomsmanager') via the support repositories are:
How to Update Oracle Solaris 11 Systems From Oracle Support Repositories by Glynn Foster

Tips for Updating Your Oracle Solaris 11 System from the Oracle Support Repository by Peter Dennis

In particular, if you'd like to stick with the v2.1 release when upgrading to SRU8.5 or greater, see the 'pkg freeze' section of Peter's article.

Wednesday Jun 08, 2011

Live Migration in Oracle VM Server for SPARC 2.1

You may have seen the press release announcing that Oracle VM Server for SPARC 2.1 (a.k.a. LDoms) has just been released (you can find the download links here). There is a considerable list of new features (including Dynamic Resource Management, Virtual Device Service Validation and many more), but the key feature for me is Live Migration, which allows an active domain to be migrated without any impact on applications; users shouldn't even notice that the guest domain is running on a new machine. (OK, so I would say that since I'm one of the migration developers...)


It has been possible to migrate an active domain since LDoms 1.1 (released in 2008). However, up until now the domain was suspended while the runtime state was copied from the source machine to the target, which could result in an outage in the order of minutes if the domain had a large amount of memory (the suspend time was pretty much linearly proportional to the guest domain memory size). With Live Migration we transfer the memory contents while the domain keeps running, and at the same time keep track of the memory that is being modified. We iterate through the memory, transferring modified pages to the target system, until the amount of memory being modified is minimal. Then, at the end, we momentarily suspend the domain, copy the remaining memory and resume the domain on the target. This suspension can take less than a second, but depending on the workload it can take longer if the domain is rapidly modifying a lot of memory pages.
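
To make the iterative copy a bit more concrete, here is a toy simulation in Python. It is only a sketch of the general pre-copy idea described above, not the LDoms Manager code: the function name, the fixed "dirty percentage" workload model, the threshold and the cap on the number of passes are all assumptions made up for the illustration.

```python
def simulate_precopy(total_pages, dirty_percent, threshold, max_rounds=30):
    """Toy model of iterative pre-copy migration.

    Returns (pages sent in each live pass, pages left for the suspended copy).
    All parameters are hypothetical knobs for this sketch.
    """
    live_passes = []
    to_send = total_pages  # the first pass copies every page
    for _ in range(max_rounds):
        if to_send <= threshold:
            break  # little enough is changing: time to suspend the domain
        live_passes.append(to_send)
        # Assumption: the pages dirtied during a pass are a fixed percentage
        # of the pages sent in that pass, and they all have to be sent again.
        to_send = to_send * dirty_percent // 100
    return live_passes, to_send


if __name__ == "__main__":
    passes, final_copy = simulate_precopy(1_000_000, 10, 1_000)
    print("live passes:", passes)
    print("pages copied while suspended:", final_copy)
```

With a quiet guest (10% of the sent pages dirtied per pass) the passes shrink quickly and only a small remainder is copied while the domain is suspended. Setting dirty_percent to 100 models a guest that modifies memory as fast as it can be sent: the passes never shrink, and the final suspended copy is as large as the whole domain, which is the "can take longer" case mentioned above.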


One other migration performance enhancement in this release is that multiple network connections between the source and target machines are utilised (based on the number of virtual cpus in the control domains) which improves the throughput of the memory transfer and reduces the overall migration time.


I've found from running experiments that having 16 vcpus in the control domains makes a significant improvement over 8 vcpus (and up to 32 vcpus will help more, beyond which there's no noticeable difference). The other best practice is to ensure that cryptographic units (a.k.a. MAUs) are also assigned to the control domains, as the memory contents are protected by SSL when being transferred over the network and offloading these operations to the T-series hardware makes a big difference.


The migration chapter in the Admin Guide has been revamped to discuss the enhancements.


 [Update: Jeff Savit wrote a great post about his experiences using Live Migration]

Wednesday Feb 03, 2010

LDoms 1.3 bits now on pkg.opensolaris.org

The LDoms 1.3 packages are now available in the OpenSolaris repositories.


To install or upgrade, just type:

pfexec pkg install ldomsmanager

If you previously had LDoms 1.2 installed from an IPS repository, you will then need to restart the LDoms Manager (any existing configs will work just fine).

svcadm restart ldmd

If you were previously running the SVR4 version of the SUNWldm package, you will need to uninstall it before running pkg install.

Wednesday Jan 20, 2010

LDoms 1.3 has been released

It's now possible to download the LDoms 1.3 software. The LDoms 1.3 Release Notes list all the new features but some of the ones I particularly like include:


  • Greatly improved domain migration speeds
    • Following on from the speed-up in 1.2 due to using the hardware crypto devices, the migration code in 1.3 has been enhanced to compress the memory and use multiple threads to push the data over the network. The speed-up depends on the memory usage of the domain being migrated but Haik has mentioned seeing improvements of over 80%...

  • Support for link-based IPMP
    • The virtual network and virtual switch devices now support link status updates to the network stack. The LDoms 1.3 Admin Guide includes an example of how you'd configure it.

  • Crypto Dynamic Reconfiguration (DR)
    • You can now add and remove the hardware crypto units from domains without rebooting (just like CPUs and VIO devices), which is great since ssh/scp, as well as domain migration, can now use the crypto units. In addition, because of this, you can now migrate domains with crypto units.

  • Support for non-interactive migrations
    • Adding a '-p {password file}' option to 'ldm migrate' removes the need to type in a password when doing migrations.
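
The "compress the memory and use multiple threads" idea from the migration bullet above can be sketched in a few lines of Python. This is purely illustrative and is not how the LDoms migration code is written: the chunking scheme, the function names and the choice of zlib are all assumptions for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(memory, chunk_size, workers):
    """Split a memory image into chunks and compress them on a thread pool.

    pool.map() returns results in input order, so the receiver can
    rebuild the image by decompressing the chunks in sequence.
    """
    chunks = [memory[i:i + chunk_size] for i in range(0, len(memory), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def reassemble(compressed_chunks):
    """Receiver side: decompress each chunk and join them back together."""
    return b"".join(zlib.decompress(c) for c in compressed_chunks)


if __name__ == "__main__":
    # A mostly-zero buffer stands in for guest memory (hypothetical data).
    image = bytes(1 << 20) + b"abc" * 1000
    sent = compress_chunks(image, chunk_size=64 * 1024, workers=4)
    print("original bytes:", len(image))
    print("bytes on the wire:", sum(len(c) for c in sent))
    assert reassemble(sent) == image
```

How much this helps depends on how compressible the memory is, which matches the point above that the speed-up depends on the memory usage of the domain being migrated.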

[ Update: Some other posts by Alex, Duncan, Eric and Jeff ]

Friday Oct 02, 2009

Switching between ALOM and ILOM shells

Here are the commands needed to switch between the ALOM and ILOM shells on the T5xx0 SPARC CMT servers. I'm saving this here because I can never easily find it when I go searching Google or docs.sun.com for it.

ALOM -> ILOM
sc> userclimode admin default
sc> logout

ILOM -> ALOM
-> set /SP/users/admin cli_mode=alom
-> exit

And usually when I want to do this, it's because I'm switching on Power Management.

To set the policy property to elastic mode (enable PM):

-> set /SP/powermgmt policy=elastic

To set the policy property to performance mode (disable PM):
-> set /SP/powermgmt policy=performance

Wednesday Sep 09, 2009

LDoms 1.2 in OpenSolaris IPS repos

For anyone running OpenSolaris (either 2009.06 or the latest 2010.02 development builds), you'll be glad to know that I've pushed the LDoms 1.2 software to the /release and /dev repositories. You can get it by running:

pfexec pkg install ldomsmanager

If you were previously running LDoms 1.1 from the IPS repo, the new version will start running once you do a 'svcadm restart ldmd'.

If you were previously running LDoms 1.2 or earlier which was installed via pkgadd, then do a 'pkgrm SUNWldm' before the pkg install (your existing LDoms configs will be kept).

Friday Mar 27, 2009

LDoms Migration demo

I notice that Markus Weber has posted a slick demo he produced which shows a warm migration of an LDom from one system to another while Solaris keeps on running in the domain. He gives a good overview of how domain migration works on LDoms as well as showing the command-line instructions.

Wednesday Jan 09, 2008

LDoms Solaris patches available since Solaris 10 8/07 (S10U4) was released

There have been a number of LDoms fixes backported to S10 since Solaris 10 8/07 (S10U4) was released[*]. They are available in the S10 Sustaining KU, 127111, and can be applied to Solaris 10 8/07 (S10U4) or Solaris 10 11/06 (S10U3). In fact, instead of applying 124921-02, which was released at the same time as LDoms 1.0, it is recommended to use the latest revision of the 127111 patch with S10U3, as many more bug fixes are available via 127111 and 120011-14 (the S10U4 KU patch that 127111 depends on).

 

[ Updated 2008-02-07: Added 127111-08 ]

The fixes include improvements to vDisk support for rebooting control/service domains, guest networking performance enhancements as well as fixing the way LDoms/OBP variables are stored while rebooting a domain. In addition, a fix for the vsw issue that Narayan describes is included.


127111-08

6578761 System hangs in ds_cap_fini() and ds_cap_init()
6593231 Domain Services logging facility must manage memory better
6616313 cnex incorrectly generates interrupt cookies
6630945 vntsd runs out of file descriptor with very large domain counts


127111-05

6501039 rebooting multiple guests continuously causes a reboot thread to hang
6527622 Attempt to store boot command variable during a reboot can time out
6589682 IO-DOMAIN-RESET (Ontario-AA): kern_postprom panic on tavor-pcix configuration (reboot)
6605716 halting the system should not override auto-boot? on the next poweron

127111-04

6519849 vnet hot lock in vnet_m_tx affecting performance.
6530331 vsw when plumbed and in prog mode should write its mac address into HW
6531557 format(1m) does not work with virtual disks
6536262 vds occasionally sends out-of-order responses
6544946 Adding non existant disk device to single cpu domain causes hang
6566086 vdc needs an I/O timeout
6573657 vds type-conversion bug prevents raw disk accesses from working
6575216 Guests may lose access to disk services (VDS) if IO domain is rebooted
6578918 disk image should have a device id

[*] LDoms improvements in Solaris 10 8/07 (S10U4) lists the LDoms features/fixes in S10U4. Those fixes can also be applied to S10U3 by applying the 120011-14 KU patch, which obsoletes 124921-02 et al.

 

 

Monday Oct 15, 2007

Presentation on LDoms at the London OpenSolaris User Group

I'm giving a presentation at the next meeting of the London OpenSolaris User Group on Wed Oct 17th at Sun's Customer Briefing Centre, 45 King William Street (map here) - refreshments are from 18:00 with the presentation starting at 18:45. I plan to give an introduction to LDoms, a recap of the just-released LDoms 1.0.1 plus a quick overview of some of the upcoming features.

Friday Sep 14, 2007

Presentation on LDoms at the Irish OpenSolaris User Group

 

I'm giving a presentation at the next meeting of the Irish OpenSolaris User Group on Sept 25th - some more logistical details on the meeting time/location/etc are available here. I plan to give an introduction to LDoms plus a quick overview of some of the upcoming features. 

Wednesday Sep 12, 2007

LDoms improvements in Solaris 10 8/07 (S10U4)

Now that Solaris 10 8/07 (known to most of us as S10U4) has been released, it's worth doing a recap of what LDoms features and bug fixes have been integrated into this release. It is also possible to patch Solaris 10 11/06 by applying the SPARC KU patch, 120011-14 and get the new LDoms functionality for UltraSPARC-T1 based machines.

The features integrated mainly involve adding support in the LDoms networking drivers for the Clearview project. This allows the vsw and vnet drivers to use the multiple unicast address support in the network adapters instead of putting the adapter into promiscuous mode (6447559 is the main bugid covering this). The bug fixes focus on improving stability in the control/service and guest domains as well as some usability fixes.

[Update: if you do plan to plumb the vsw in S10U4, see Narayan's post - guest networking could be broken if the vsw is not configured correctly]

The fixes listed below are in addition to the 30+ fixes available in 124921-02[1] which were all integrated into S10U4 when that patch was created back in March.
 

  • Networking
    • 6405380 LDoms vSwitch needs to be modified to support network interfaces
    • 6418780 vswitch needs to be able to process updates to its MD node
    • 6447559 vswitch should take advantage of multiple unicast address support
    • 6474949 vSwitch panics if mac_open of the underlying network device fails
    • 6492423 vSwitch multi-ring code hangs when queue thread not started
    • 6492705 vsw warning messages should identify device instance number
    • 6512604 handshake untimeout() race condition in vnet
    • 6517019 vgen_multicst does not handle kmem_zalloc failure
    • 6496374 vsw: "turnstile_block: unowned mutex" panic on a diskless-clients test bed
    • 6514591 vsw: fix for 6496374 causes softhang
    • 6523926 handshake restart can fail following reboot under certain conditions
    • 6523891 vsw needs to update lane state correctly for RDX pkts
    • 6556036 vswitch panics when trying to boot over vnet interface
  • Disk
    • 6520626 Assertion panic in vdc following primary domain reboot
    • 6527265 Hard hang in guest ldom on issuing the format command
    • 6534269 vdc incorrectly allocs mem handle for synchronous DKIOCFLUSHWRITECACHE calls
    • 6547651 fix for 6524333 badly impact performance when writing to a vdisk
    • 6524333 Service domain panics if it fails to map pages for a disk on file
    • 6530040 vds does not close underlying physical device or file properly
  • General
    • 6488115 reboot from guest via break hangs
    • 6495154 mdeg should not print a warning when the MD generation number does not change
    • 6520018 vntsd gets confused and immediately closes newly established console connections
    • 6505472 RC1 build: guest ldg(s) softhang during repeat boot
    • 6521890 recursive mutex_enter in ldc_set_cb_mode
    • 6528180 link state change is not handled under certain conditions in ldc
    • 6526280 Guest with 64 vdisk devices hangs during boot
    • 6528758 'ds_cap_send: invalid handle' message during LDom boot

LDoms bugs are not yet visible on the OpenSolaris bug query interface, http://bugs.opensolaris.org/ (but this is being worked on). [Update: LDoms bugs are now visible via bugs.opensolaris.org]

[1] 124921-02 was an LDoms patch made available for Solaris 10 11/06 [U3] prior to the release of S10U4

Friday Aug 24, 2007

Hello


I've been considering it for long enough so I think it's time to creep onto a quiet corner of the internet and start blogging. I've had the bones of this draft written for months but I might as well hit Publish and get on with it...

I'm a kernel engineer in the SPARC Platform Software group and am based in Ireland. I have been working in Sun since 1998 and I've spent much of that time working on adding Solaris support for various SPARC processors and servers. For the last couple of years I've been working on Logical Domains (LDoms) – virtualisation support for servers based on UltraSPARC CMT processors (LDoms basically allows you to run multiple virtual machines on a single SPARC sun4v machine).

I've a few posts I'd like to share on LDoms so maybe, just maybe, I might get around to finding the time to write them up. I enjoy reading technical blogs so hopefully I can give something back on a topic I know something about.

About

I have been working in Sun/Oracle since 1998 and I've spent much of that time working on adding Solaris support for various SPARC processors and servers. For the last 6+ years I've been working on what is now known as Oracle VM Server for SPARC (previously called LDoms) – virtualisation support for servers based on UltraSPARC CMT processors.
