Jeff Victor's Blog

Recent Posts


Encryption Everywhere

Introduction

Until recently you had to choose between security and performance. Now you can have both, all the time: encryption everywhere.

Generally speaking, data resides in three types of locations: long-term storage (e.g. disks), short-term memory (RAM), and in flight on the network. Encrypting your data in all three types of locations minimizes the chances that it can be misused, even if it is stolen.

Many data protection mechanisms focus on preventing thieves from getting access to your data. Preventing access is a good goal, and worthy of the effort. However, recent history has shown that eventually the "bad guys" will get through these defenses. One straightforward way to protect your data from misuse is encryption: even if villains make a copy of the encrypted data, they can't use it.

A primary goal of data thieves is copying data. They can use this data directly, or they can sell it, but only if the data is readable. They cannot use encrypted data.

Many people use encryption to protect data in transit over the Web. This was the first widespread use of data encryption, which is not surprising. As your data travels the Internet, it passes through many pieces of network equipment, and a determined attacker may be able to gain control of such equipment and copy your data as it passes by.

Unencrypted data could be used for nefarious purposes. To prevent this, developers of web browsers and web server software added the ability to communicate with encrypted data, protecting your information while it crosses the Internet. This is now routinely used to protect financial transactions, including online shopping.

Of course, every computer operation has a price. With encryption, the price has been reduced performance: software algorithms that encrypt data slow down data processing.
So for years, encryption was only performed when it was judged "necessary."

Recent advances in Oracle SPARC processors have addressed this, reducing the performance penalty to an almost negligible amount. This opens up other possibilities: routine encryption of storage, and even of memory. With SPARC CPUs, it is no longer necessary to choose between data protection and industry-leading performance; you can have both with current-generation SPARC servers.

Today I'll focus specifically on protecting your data from misuse after theft, through the use of encryption. This blog entry discusses the features you can use to protect your data on SPARC servers based on the SPARC M7 and SPARC S7 processors, with Oracle Solaris and Oracle Database software.

Background

I'll use the phrase "encryption domain" to mean any location or path of data that has been encrypted. For example, if you encrypt a letter you have written and mail the encrypted version to someone who can decrypt it, its "encryption domain" is the piece of paper with the encrypted text. During its whole voyage, its content is protected from anyone who might see it. If you copy the encrypted text without decrypting it, the copy is also in the message's encryption domain. A decrypted copy would, of course, not be in that encryption domain.

This is similar in some ways to airport security. After you pass through security screening, you are in a secure area, often called "airside." This includes parts of the original airport, the airplane, runways, etc. You can extend this to include the airplane while in flight, and the secure areas of the destination airport. You are in this secure area from the moment you pass through the screening area of your originating airport until you walk past the screening area of your destination. All of the other people in these spaces have also been "secured." This is your "security domain." (See Figure 1.)

Encryption Everywhere

The rest of this blog entry describes the multiple locations where encryption is possible. To distinguish them, I use a diagram that becomes progressively more secure as encryption is used in more computer components. In the description that accompanies each diagram, I also name the features you can use for that purpose.

First let's look at the whole path of data: from a user, through the Web and associated network equipment, to a web server, an application server, a database server, and finally to persistent storage. Figure 2 depicts this path as a logical diagram. As the data flows from user to database, it passes through the user's PC, a network adapter (labeled "Net" in the diagram), and so on. Note the "Swap Disk" labels off to the side. They represent the potential for data to be swapped out to disk if a server has insufficient RAM.

All of these shapes are highlighted in red to indicate that their data is not encrypted.

Protecting Data in Motion

The first use of encryption in this diagram is SSL, the Secure Sockets Layer (today usually its successor, TLS). This encryption, shown in Figure 3, protects data from people who have control of Internet infrastructure equipment such as network routers. After the user enters some text, such as a password, the user's PC encrypts the data before sending it over the Web. This is indicated by green rectangles: a green firewall shows that the data is encrypted while it flows through the firewall. The data is not decrypted until it arrives at the web server. This type of encryption has been common for a few years: you are using it whenever the URL in your web browser begins with "https:".

Most common operating systems, including Oracle Solaris, and web server software packages, such as Apache, include SSL features.

Less common, but also shown in Figure 3, is the use of encryption between the web and application tiers, and between the application and database tiers.
These are less common because their traffic does not typically flow over the public Internet, and for the reason mentioned earlier in this article: traditionally, encryption has slowed performance too much. But with current-generation SPARC servers, there is no longer a reason not to encrypt this traffic.

Protecting Data at Rest

Figure 4 adds encrypted storage to protect "data at rest." Two effective methods that protect storage are Oracle TDE (Transparent Data Encryption) and ZFS encryption. So whether you store your databases in flat files on ZFS or in ASM, you can protect your data from prying eyes while it sits on storage.

Solaris 11 added encryption to ZFS. You can easily create an encrypted file system with the following command:

    # zfs create -o encryption=on tank/home/jeffv

Oracle SPARC processors offer excellent encryption performance. A comparison performed a year ago demonstrated negligible performance overhead for encryption on SPARC CPUs, but a significant decrease in performance on x86 CPUs.

Encryption has become common for network traffic because the value of data protection is large, and the additional processing needed to implement encryption takes much less time than the transmission and receipt of a network packet. Encryption processing has little effect on network performance.

Similarly, encryption is slowly becoming common for persistent storage. Its effect on overall performance can be larger than for network traffic, depending on the type of storage and on how the CPU implements encryption algorithms.

Memory accesses, on the other hand, have traditionally been far faster than encryption operations, so encryption of data while it is in memory has lagged behind. However, this is changing.

Oracle TDE can encrypt data either per column or for a whole tablespace, and can keep some of that data encrypted even while it is in RAM.
Encryption of a tablespace occurs at the block layer: a block is encrypted when it is written to disk, and decrypted when it is read back in. Column encryption and decryption occur in PGA user session memory, which means that the data remains encrypted while it is in the SGA, as shown in Figure 5.

Benchmark engineers have found that TDE encryption is much faster on current SPARC CPUs than on current x86 CPUs. The performance overhead on a SPARC computer was about 2%, while the x86 system suffered 25% overhead.

Finally, Figure 6 shows an under-appreciated feature: the ability of Oracle Solaris to encrypt data before it is saved to the swap disk. Without this protection, sensitive data that has been "temporarily" paged out could be copied by a successful attacker. Further, data that has been paged out is not erased when it is no longer needed; it may remain on deallocated disk blocks until further paging activity overwrites it.

Conclusion

You can now encrypt all of your data, whether it's in motion or at rest. If you use Oracle Solaris on SPARC computers, you can do so without sacrificing performance.
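Once a dataset has been created with a command like the zfs create example above, it is worth confirming that encryption really is on before trusting the dataset with sensitive data. The following is a minimal sketch of such a check, assuming the Solaris 11 ZFS "encryption" property, which reports off for an unencrypted dataset. The helper name is_encrypted is hypothetical, not a Solaris command.

```sh
#!/bin/sh
# Minimal sketch: report whether a ZFS dataset has encryption enabled.
# Assumes the Solaris 11 "encryption" property: "off" means unencrypted;
# any other value ("on", or an algorithm such as aes-256-ccm) means encrypted.
# is_encrypted is a hypothetical helper name, not a Solaris command.
#
# Return codes: 0 = encrypted, 1 = not encrypted, 2 = query failed.
is_encrypted() {
    ds="$1"
    # -H suppresses the header line; -o value prints only the property value.
    val=$(zfs get -H -o value encryption "$ds") || return 2
    [ "$val" != "off" ]
}
```

For example, the command "is_encrypted tank/home/jeffv && echo encrypted" prints encrypted only if that dataset was created with encryption turned on. Returning a distinct status when zfs itself fails keeps a typo in the dataset name from being mistaken for an unencrypted dataset.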



DBaaS Introduction

Oracle and Cloud

Oracle is the leading supplier of hybrid cloud solutions, enabling you to use Oracle's public cloud, to use our private cloud technologies in your own data center, or to use them together. The last option is possible because the same technology is used in our public and private cloud solutions.

In some situations, the right hosting environment for cloud-provisioned databases, known as Database as a Service (DBaaS), is a public cloud. In other situations, it's important, or more cost-effective, to host your own private database cloud. This blog entry is intended for those situations.

Cloud Capabilities

Going beyond mere IaaS followed by manual database provisioning, Oracle DBaaS solutions deliver a service catalog with database-specific technologies, features, and optimizations as capabilities of catalog elements. For example, in addition to basic compute resources, a "gold" service level might include automated auditing and compliance reporting, plus data protection with Data Guard and database cloning. A "platinum" service might add a RAC configuration for a zero-downtime solution. Further, usage tracking and quotas can be applied to any catalog element.

This greatly reduces the time needed to deploy an environment, whether it's intended for development, test, or production. It also reduces the potential for human error, because the service catalog elements have already been tested. Further, the service catalog may be tailored to meet the specific needs of your corporation or industry.

Together, these capabilities deliver fast, easy database provisioning while maintaining the highest levels of security. Access to specific catalog elements may be limited to certain groups of self-service users.
Beneath the user interface, Oracle Solaris and Oracle Database deliver a vast array of security features, from end-to-end, zero-overhead encryption to role-based access control.

Integrated Components

Oracle offers the ability to implement DBaaS using Solaris and SPARC technologies. The solutions described here use Oracle Enterprise Manager Cloud Control and Oracle Database 11g and 12c. When superior uptime is needed, Oracle RAC can be used. Oracle Solaris 11 plays a central role, connecting the user interface, databases, and virtual machines to hardware: CPUs, networking, and storage.

Oracle has illuminated a clear path to DBaaS using either the Oracle SuperCluster product or one of the Oracle Optimized Solutions: Secure Enterprise Cloud Infrastructure (usually referenced as SECI). Both of these approaches lead to a similar user experience, but differ in flexibility, storage, performance characteristics, and a few other factors.

At the top of the stack is the user interface, part of OEM Cloud Control. This software integrates self-service provisioning, database cloning, the service catalog, quotas and policies, and metering and chargeback.

Architectural Optimizations

In these solutions, multi-layer defenses protect against even a concerted attack. Network traffic between the application and database tiers can easily use the Oracle Solaris 11 Cryptographic Framework. The framework automatically uses the encryption features of SPARC CPUs, which offer the most complete set of encryption features integrated into CPU cores, for the best encryption performance in the industry. Oracle Database 11g and 12c Transparent Data Encryption (TDE) also automatically uses the Solaris 11 Cryptographic Framework, protecting data stored in RAM, flash memory, or disk drives. All of these remove the need to choose between security and high performance.
Superior per-core database performance, compared to x86 CPUs, means that you can now have the best database performance and encrypt everything, everywhere.

One pitfall of many high-scale cloud solutions is the use of inefficient virtualization, which limits the number of databases that run efficiently on a group of computers. The Oracle DBaaS solutions described here offer extreme scalability by using one or both of these virtualization technologies: Oracle VM Server for SPARC and Oracle Solaris Zones. Solaris Zones are a zero-overhead software isolation technology that has been used in data centers for over a decade. "OVM SPARC" is a partitioning and virtualization technology supporting flexible configurations that can also achieve zero performance overhead. The low overhead yields a highly scalable infrastructure, enabling hundreds to thousands of virtual environments per compute rack.

Deployment Process

Implementing a private cloud from off-the-shelf components can be a frustrating experience: finding compatible versions of software, firmware, and hardware; testing never-before-tested configurations; and researching obscure tuning permutations in search of the best results.

Oracle offers pre-tested and, optionally, pre-integrated private DBaaS clouds using either SECI or SuperCluster. Oracle Advanced Customer Services staff have experience building clouds in customer data centers, including designing business and technical catalogs, implementing the catalogs, and migrating data from old environments. If you prefer, implementation guides are available to help your staff through the process of creating a DBaaS cloud.

Database Cloning

Development and test environments for databases have evolved over time, and can be highly complex.
That complexity can cause delays, mask problems, and limit the ability to satisfy security and compliance policies.

Implementing a DBaaS private cloud is an opportunity to simplify the processes that development and test staff use. OEM Cloud Control service catalog elements can use Oracle Database cloning features such as Snap Clone to automate instance provisioning for the self-service user.

Data Protection

Besides the data protection offered by encryption, service catalog elements can be configured with standard tools that improve availability. Backup and restore technologies such as RMAN (Oracle Recovery Manager), Oracle Data Guard and Active Data Guard, and backup/archive products such as the Zero Data Loss Recovery Appliance (ZDLRA) can be used in service catalog elements to ensure the availability of private cloud databases, implementing failover and disaster recovery.

Summary

Only Oracle can offer a complete, secure DBaaS cloud experience from components that were designed to work together, from the CPU up through the database.



SPARC M7 Arrives, Breaks Records

In case you were on vacation earlier this week :-) Oracle announced a new family of SPARC servers, all based on our new SPARC M7 CPU chip.

This blog entry is simply a list of the 20 world record performance benchmarks attained so far, compiled by my teammate Pavel Anni. Thanks, Pavel! There are so many that I decided to group them into categories:

I/O microbenchmarks
- Virtualized Network Performance: SPARC T7-1
  https://blogs.oracle.com/BestPerf/entry/201510125_netperf_t7_1
- Virtualized Storage: SPARC T7-1 Performance
  https://blogs.oracle.com/BestPerf/entry/20151025_storvirt_t7_1

Encryption
- ZFS Encryption: SPARC T7-1 Performance
  https://blogs.oracle.com/BestPerf/entry/20151025_zfscrypto_t7_1
- SHA Digest Encryption: SPARC T7-2 Beats x86 E5 v3
  https://blogs.oracle.com/BestPerf/entry/20151025_shadigest_t7_2
- AES Encryption: SPARC T7-2 Beats x86 E5 v3
  https://blogs.oracle.com/BestPerf/entry/20151025_aes_t7_2
- Hadoop TeraSort: SPARC T7-4 Top Per-Chip Performance
  https://blogs.oracle.com/BestPerf/entry/20151025_terasort_t7_4

Floating Point
- SPARC T7-4 Delivers 4-Chip World Record for SPEC OMP2012
  https://blogs.oracle.com/BestPerf/entry/201510_specomp2012_t7_4

CPU Clockrate and Memory Bandwidth measurement
- SPARC T7-1 Delivers 1-Chip World Records for SPEC CPU2006 Rate Benchmarks
  https://blogs.oracle.com/BestPerf/entry/201510_specpu2006_t7_1
- Memory and Bisection Bandwidth: SPARC T7 and M7 Servers Faster Than x86 and POWER8
  https://blogs.oracle.com/BestPerf/entry/20151025_stream_sparcm7

Virtualization Benchmarks
- Live Migration: SPARC T7-2 Oracle VM Server for SPARC Performance
  https://blogs.oracle.com/BestPerf/entry/20151025_livemig_t7_2
- SPECvirt_sc2013: SPARC T7-2 World Record for 2 and 4 Chip Systems
  https://blogs.oracle.com/BestPerf/entry/20151025_specvirt_t7_2

Database
- In-Memory Aggregation: SPARC T7-2 Beats 4-Chip x86 E7 v2
  https://blogs.oracle.com/BestPerf/entry/20151025_imdbagg_t7_2
- In-Memory Database: SPARC T7-1 Faster Than x86 E5 v3
  https://blogs.oracle.com/BestPerf/entry/20151025_imdb_t7_1
- Real-Time Enterprise: SPARC T7-1 Faster Than x86 E5 v3
  https://blogs.oracle.com/BestPerf/entry/20151025_rte_t7_1

Single and Multi-Tier Business Workloads
- SPECjEnterprise2010: SPARC T7-1 World Record with Single Application Server Using 1 to 4 Chips
  https://blogs.oracle.com/BestPerf/entry/20151025_jent_t7_1
- Oracle E-Business Suite Applications R12.1.3 (OLTP X-Large): SPARC M7-8 World Record
  https://blogs.oracle.com/BestPerf/entry/20151025_peoplesoft_hrpay_m7_8
- PeopleSoft Enterprise Financials 9.2: SPARC T7-2 World Record
  https://blogs.oracle.com/BestPerf/entry/20151025_peoplesoft_fms_t7_2
- Oracle E-Business Order-To-Cash Batch Large: SPARC T7-1 World Record
  https://blogs.oracle.com/BestPerf/entry/20151015_ebiz_otc_sparc_t7
- Oracle E-Business Payroll Batch Extra-Large: SPARC T7-1 World Record
  https://blogs.oracle.com/BestPerf/entry/2015_ebiz_batch_t7_1
- Oracle Communications ASAP - Telco Subscriber Activation: SPARC T7-2 World Record
  https://blogs.oracle.com/BestPerf/entry/20151025_asap_t7_2
- PeopleSoft Human Capital Management 9.1 FP2: SPARC M7-8 World Record
  https://blogs.oracle.com/BestPerf/entry/20151025_peop_m7_8
- Oracle FLEXCUBE Universal Banking: SPARC T7-1 World Record
  https://blogs.oracle.com/BestPerf/entry/20151025_flexcube_t7_1
- SAP Two-Tier Standard Sales and Distribution SD Benchmark: SPARC T7-2 World Record 2 Processors
  https://blogs.oracle.com/BestPerf/entry/20151025_sap_sd2tier_t7_2

Application-specific benchmarks
- Oracle Stream Explorer DDOS Attack Detector: SPARC T7-4 World Record
  https://blogs.oracle.com/BestPerf/entry/20151025_ose_t7_4
- Oracle Internet Directory: SPARC T7-2 World Record
  https://blogs.oracle.com/BestPerf/entry/20151025_oid_t7_2
- Neural Network Models Using Oracle R Enterprise: SPARC T7-4 Beats 4-Chip x86 E7 v3
  https://blogs.oracle.com/BestPerf/entry/20151025_neuralnet_t7_4
- Graph Breadth First Search Algorithm: SPARC T7-4 Beats 4-Chip x86 E7 v2
  https://blogs.oracle.com/BestPerf/entry/20151025_graphbfs_t7_4
- Graph PageRank: SPARC M7-8 Beats x86 E5 v3 Per Chip
  https://blogs.oracle.com/BestPerf/entry/20151025_graphpagerank_m7_8
- Yahoo Cloud Serving Benchmark: SPARC T7-4 With Oracle NoSQL Beats x86 E5 v3 Per Chip
  https://blogs.oracle.com/BestPerf/entry/20151025_nosql_ycsb_t7_4


Solaris 11

Maintaining Configuration Files in Solaris 11.2

Introduction

Have you used Solaris 11 and wondered how to maintain customized system configuration files? In the past, and on other Unix/Linux systems, maintaining these configuration files was fraught with peril: extra bolt-on tools are needed to track changes, verify that inappropriate changes were not made, and fix the files when something broke them.

A combination of features added to Solaris 10 and 11 addresses those problems. This blog entry describes the current state of related features, and demonstrates the method that was designed and implemented to automatically deploy and track changes to configuration files, verify consistency, and fix configuration files that "broke." Further, these features are tightly integrated with the Solaris Service Management Facility (SMF) introduced in Solaris 10 and the packaging system introduced in Solaris 11.

Background

Solaris 10 added the Service Management Facility, which significantly improved on the old, unreliable pile of scripts in /etc/rc#.d directories. This also allowed us to move from the old model of system configuration information stored in ASCII files to a database of configuration information. The latter change reduces the risk associated with manual or automated modifications of text files. Each modification is the result of a command that verifies the correctness of the change before applying it. That verification process greatly reduces the opportunities for a mistake that can be very difficult to troubleshoot.

During updates to Solaris 10 and 11 we continued to move configuration files into SMF service properties. However, there are still configuration files, and we wanted to provide better integration between the Solaris 11 packaging facility (IPS) and those remaining configuration files. This blog entry demonstrates some of that integration, using features added up through Solaris 11.1.

Many Solaris systems need customized email delivery rules. In the past, providing those rules required replacing /etc/mail/sendmail.cf with a custom file.
However, this created the need to maintain that file: restoring it after a system update, verifying its integrity periodically, and potentially fixing it if someone or something broke it.

Method

IPS provides the tools to accomplish those goals, specifically:

- maintain one or more versions of a configuration file in an IPS repository
- use IPS and AI (Automated Installer) to install, update, verify, and potentially fix that configuration file
- automatically perform the steps necessary to re-configure the system with a configuration file that has just been installed or updated.

The rest of this entry assumes that you understand Solaris 11 and IPS.

In this example, we want to deliver a custom sendmail.cf file to multiple systems. We will do that by creating a new IPS package that contains just one configuration file, plus the service profile that activates it. We need to create the "precursor" to a sendmail.cf file (sendmail.mc), which will be expanded by sendmail when it starts. We also need to create a custom manifest for the package. Finally, we must create an SMF service profile, which will cause Solaris to understand that a new sendmail configuration is available and should be integrated into its database of configuration information.

Here are the steps in more detail.

Create a directory ("mypkgdir") that will hold the package manifest, and a directory ("contents") for package contents.

    $ mkdir -p mypkgdir/contents
    $ cd mypkgdir

Then create the configuration file that you want to deploy with this package.
For this example, we simply copy an existing configuration file.

    $ cp /etc/mail/cf/cf/sendmail.mc contents/custom_sm.mc

Create a manifest file in mypkgdir/sendmail-config.p5m. (The entity that owns the computers is the fictional corporation Consolidated Widgets, Inc.)

    set name=pkg.fmri value=pkg://cwi/site/sendmail-config@8.14.9,1.0
    set name=com.cwi.info.name value=Solaris11sendmail
    set name=pkg.description value="ConWid sendmail.mc file for Solaris 11, accepts only local connections."
    set name=com.cwi.info.description value="Sendmail configuration"
    set name=pkg.summary value="Sendmail configuration"
    set name=variant.opensolaris.zone value=global value=nonglobal
    set name=com.cwi.info.version value=8.14.9
    set name=info.classification value=org.opensolaris.category.2008:System/Core
    set name=org.opensolaris.smf.fmri value=svc:/network/smtp:sendmail
    depend fmri=pkg://solaris/service/network/smtp/sendmail type=require
    file custom_sm.mc group=mail mode=0444 owner=root \
        path=etc/mail/cf/cf/custom_sm.mc
    file custom_sm_mc.xml group=mail mode=0444 owner=root \
        path=lib/svc/manifest/site/custom_sm_mc.xml \
        restart_fmri=svc:/system/manifest-import:default \
        refresh_fmri=svc:/network/smtp:sendmail \
        restart_fmri=svc:/network/smtp:sendmail

The "depend" line tells IPS that the package smtp/sendmail must be installed on this system. If it isn't, Solaris will install that package before proceeding to install this package.

The line beginning "file custom_sm.mc" gives IPS detailed metadata about the configuration file, and indicates the full pathname, within an image, at which the macro file should be stored.

The last file action specifies the local file name of the service profile (more on that later), and the location to store it during package installation. It also lists three actuators: SMF services to refresh (re-configure) or restart at the end of package installation. The first of those imports new manifests and service profiles.
Importing the service profile changes the property path_to_sendmail_mc. The other two actuators re-configure and restart sendmail. Those two actions expand and then use the new configuration file, which is the goal of this entire exercise!

Create a service profile:

    $ svcbundle -o contents/custom_sm_mc.xml -s bundle-type=profile \
        -s service-name=network/smtp -s instance-name=sendmail -s enabled=true \
        -s instance-property=config:path_to_sendmail_mc:astring:/etc/mail/cf/cf/custom_sm.mc

That command creates the file custom_sm_mc.xml, which describes the profile. The sole purpose of that profile is to set the sendmail service property "config/path_to_sendmail_mc" to the name of the new sendmail macro file.

Verify the correctness of the manifest. In this example, the Solaris repository is mounted at /mnt/repo1. For most systems, "-r" will be followed by the repository's URI, e.g. http://pkg.oracle.com/solaris/release/ or a data center's repository.

    $ pkglint -c /tmp/pkgcache -r /mnt/repo1 sendmail-config.p5m
    Lint engine setup...
    Starting lint run...
    $

As usual, the lack of warnings or errors indicates success.

Create the package, and make it available in a repo to a test IPS client.

Note: The documentation explains these steps in more detail.

Note: this example stores a repo in /var/tmp/cwirepo. This will work, but I am not suggesting that you place repositories in /var/tmp.
You should place a repo in a directory that is publicly available.

    $ pkgrepo create /var/tmp/cwirepo
    $ pkgrepo -s /var/tmp/cwirepo set publisher/prefix=cwi
    $ pkgsend -s /var/tmp/cwirepo publish -d contents sendmail-config.p5m
    pkg://cwi/site/sendmail-config@8.14.9,1.0:20150305T163445Z
    PUBLISHED
    $ pkgrepo verify -s /var/tmp/cwirepo
    Initiating repository verification.
    $ pkgrepo info -s /var/tmp/cwirepo
    PUBLISHER PACKAGES STATUS UPDATED
    cwi       1        online 2015-03-05T16:39:13.906678Z
    $ pkgrepo list -s /var/tmp/cwirepo
    PUBLISHER NAME                 O VERSION
    cwi       site/sendmail-config   8.14.9,1.0:20150305T163913Z
    $ pkg list -afv -g /var/tmp/cwirepo
    FMRI                                                         IFO
    pkg://cwi/site/sendmail-config@8.14.9,1.0:20150305T163913Z   ---

With all of that, you can use the usual IPS packaging commands. I tested this by adding the "cwi" publisher to a running native Solaris Zone and making the repo available as a loopback mount:

    # zlogin testzone mkdir /var/tmp/cwirepo
    # zonecfg -rz testzone
    zonecfg:testzone> add fs
    zonecfg:testzone:fs> set dir=/var/tmp/cwirepo
    zonecfg:testzone:fs> set special=/var/tmp/cwirepo
    zonecfg:testzone:fs> set type=lofs
    zonecfg:testzone:fs> end
    zonecfg:testzone> commit
    zone 'testzone': Checking: Mounting fs dir=/var/tmp/cwirepo
    zone 'testzone': Applying the changes
    zonecfg:testzone> exit
    # zlogin testzone
    root@testzone:~# pkg set-publisher -g /var/tmp/cwirepo cwi
    root@testzone:~# pkg info -r sendmail-config
              Name: site/sendmail-config
           Summary: Sendmail configuration
       Description: ConWid sendmail.mc file for Solaris 11, accepts only local connections.
          Category: System/Core
             State: Not installed
         Publisher: cwi
           Version: 8.14.9
     Build Release: 1.0
            Branch: None
    Packaging Date: March 5, 2015 08:14:22 PM
              Size: 1.59 kB
              FMRI: pkg://cwi/site/sendmail-config@8.14.9,1.0:20150305T201422Z
    root@testzone:~# pkg install site/sendmail-config
               Packages to install:  1
                Services to change:  2
           Create boot environment: No
    Create backup boot environment: No
    DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
    Completed                                1/1           2/2      0.0/0.0    0B/s

    PHASE                                          ITEMS
    Installing new actions                         12/12
    Updating package state database                 Done
    Updating package cache                           0/0
    Updating image state                            Done
    Creating fast lookup database                   Done
    Updating package cache                           2/2
    root@testzone:~# pkg verify site/sendmail-config
    root@testzone:~#

Installation of that package has several effects. Obviously, the custom sendmail configuration file custom_sm.mc is placed into the directory /etc/mail/cf/cf. The sendmail daemon is restarted, automatically expanding that file into a sendmail.cf file and using it. I have noticed that, on occasion, it is necessary to refresh and restart the sendmail service manually.

Conclusion

The result of all of that is an easily maintained configuration file. These concepts can be used with other configuration files, and can be extended to more complex sets of configuration files.

For more information, see these documents:

- SMF Best Practices and Troubleshooting
- Tools for Package Self-Assembly
- Specifying System Changes on Package Actions

Acknowledgements

I appreciate the assistance of Dave Miner, John Beck, and Scott Dickson, who helped me understand the details of these features. However, I am responsible for any errors.
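The installation notes above mention that the sendmail service sometimes has to be refreshed and restarted by hand. The following is a minimal sketch of that check, assuming the service FMRI and macro path used in this example. It reads config/path_to_sendmail_mc with svcprop and runs svcadm refresh and svcadm restart only when the value is not the custom macro file. The function name check_sendmail_mc is hypothetical.

```sh
#!/bin/sh
# Minimal sketch: make sure sendmail is using the macro file delivered by
# the site/sendmail-config package. check_sendmail_mc is a hypothetical name.
# Return codes: 0 = already correct, or refresh/restart succeeded;
#               2 = property could not be read; other = svcadm failure.

FMRI=svc:/network/smtp:sendmail
WANT=/etc/mail/cf/cf/custom_sm.mc

check_sendmail_mc() {
    # svcprop -p prints the value of a single property of the instance.
    have=$(svcprop -p config/path_to_sendmail_mc "$FMRI") || return 2
    if [ "$have" = "$WANT" ]; then
        echo "ok: $FMRI uses $have"
        return 0
    fi
    echo "mismatch: $FMRI uses '$have'; refreshing and restarting"
    # Refresh updates the running snapshot; restart lets the start method
    # expand and use the macro file.
    svcadm refresh "$FMRI" && svcadm restart "$FMRI"
}
```

Because the function only acts on a mismatch, it is safe to run repeatedly, for example after every "pkg update", without restarting sendmail unnecessarily.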


Solaris 11

Provisioning Solaris 11

Oracle Solaris 11 introduced the Image Packaging System (IPS), a new packaging system through which Solaris software components are delivered. It replaces the System V Release 4 ("SVR4") packaging system used by Solaris 2.0 through Solaris 10. If you are learning Solaris 11, learning about IPS is a must! The links below will take you to the documents, videos, blog entries, and other artifacts that I think are the most important ones to begin with.

Blog Entries, Screencasts
- Screencast: Introducing the Solaris Image Packaging System
- Video: Using the Solaris Image Packaging System (IPS) to Install and Update Software
- Instant Automated Installer Zone for Oracle Solaris 11.1
- Interactive manifest editing with the Automated Installer Manifest Wizard
- Mirroring IPS repositories

Downloadable
- Oracle Solaris Zone with Automated Installer and Repository

How-to Guides, Other Papers
- Understanding Oracle Solaris 11 Package Versioning
- Getting Started with the Solaris 11 Automated Installer
- Basic Operations with the Automated Installer
- Much more!

Official Documentation
- Creating Solaris 11 Boot Environments
- Choices for Installing Solaris 11 in the Data Center
- Creating IPS Repositories
- Creating a Custom Solaris 11.2 Installation Image with Distro Constructor
- Converting from JumpStart to Solaris 11 Automated Installer
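Package versioning is easier to follow with a concrete FMRI in hand. The following minimal sketch splits an IPS FMRI into its parts using only POSIX parameter expansion. It assumes the fully qualified form pkg://publisher/name@component,build-branch:timestamp, and that within the version string only the branch is introduced by a hyphen; the function name parse_fmri is hypothetical, and the Package Versioning paper listed above is the authority on the format.

```sh
#!/bin/sh
# Minimal sketch: split an IPS package FMRI into its parts using only POSIX
# parameter expansion. Assumed layout (fully qualified form):
#   pkg://publisher/name@component[,build][-branch][:timestamp]
# Within the version string, "," introduces the build release, "-" the
# branch, and ":" the timestamp. parse_fmri is a hypothetical name.
parse_fmri() {
    f="${1#pkg://}"              # strip the scheme
    publisher="${f%%/*}"         # text before the first "/"
    rest="${f#*/}"               # name@version...
    name="${rest%%@*}"           # package name (may itself contain "-")
    case "$rest" in
        *@*) ver="${rest#*@}" ;;
        *)   ver="" ;;
    esac
    timestamp=""; branch=""; build=""
    case "$ver" in *:*) timestamp="${ver#*:}"; ver="${ver%%:*}" ;; esac
    case "$ver" in *-*) branch="${ver#*-}";    ver="${ver%%-*}" ;; esac
    case "$ver" in *,*) build="${ver#*,}";     ver="${ver%%,*}" ;; esac
    printf 'publisher=%s\nname=%s\ncomponent=%s\nbuild=%s\nbranch=%s\ntimestamp=%s\n' \
        "$publisher" "$name" "$ver" "$build" "$branch" "$timestamp"
}
```

For example, "parse_fmri pkg://cwi/site/sendmail-config@8.14.9,1.0:20150305T163445Z", an FMRI from the sendmail post above, prints publisher=cwi, name=site/sendmail-config, component=8.14.9, build=1.0, an empty branch, and timestamp=20150305T163445Z, matching the Version and Build Release fields that pkg info reports for that package. The timestamp is split off first so that the later hyphen and comma tests only ever see the version string.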


Solaris 10 Containers

Details of Solaris Zones as Hard Partitions

Oracle offers the ability to license much of its software, including Oracle Database, based on the quantity of CPUs that will run the software. When performance goals can be met with only a subset of a computer's CPUs, it may be appropriate to limit licensing costs by using a processor-based licensing metric together with one of the hardware partitioning technologies included with the computer.

Computers running Oracle Solaris can use a variety of resource management features, including controls for CPU, memory, and I/O. Because these Solaris features are tightly integrated with one another, using resource management does not mean giving up other features. In particular, the resources used by workloads running in Solaris Zones can be constrained.

The document Hard Partitioning With Oracle Solaris Zones explains the different Solaris features that can be used to limit software licensing costs when a processor-based metric is used, and demonstrates their use. The approved methods include limiting a Solaris Zone to a specific quantity of CPUs, or limiting a set of Solaris Zones to a specific quantity of shared CPUs.
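The first of those methods, limiting a zone to a specific number of CPUs, can be expressed with the zonecfg dedicated-cpu resource. The following is a minimal sketch, assuming an existing zone that does not yet have a dedicated-cpu resource and a fixed CPU count. The function name set_dedicated_cpus and the example zone name are hypothetical, and the Hard Partitioning document, not this script, defines which configurations are accepted for licensing. As written, it changes only the zone's persistent configuration, which takes effect the next time the zone boots.

```sh
#!/bin/sh
# Minimal sketch: give an existing zone a fixed number of dedicated CPUs
# using the zonecfg "dedicated-cpu" resource and a zonecfg command file.
# set_dedicated_cpus is a hypothetical name; check the Hard Partitioning
# document for the configurations Oracle accepts for licensing.
set_dedicated_cpus() {
    zone="$1"
    ncpus="$2"
    # This sketch deliberately accepts only a single positive integer
    # (no range such as 2-4), so the CPU count is fixed.
    case "$ncpus" in
        ''|*[!0-9]*|0)
            echo "ncpus must be a positive integer" >&2
            return 1 ;;
    esac
    cmdfile=$(mktemp) || return 1
    printf 'add dedicated-cpu\nset ncpus=%s\nend\ncommit\n' "$ncpus" > "$cmdfile"
    # -f reads zonecfg subcommands from the file instead of a terminal.
    zonecfg -z "$zone" -f "$cmdfile"
    rc=$?
    rm -f "$cmdfile"
    return $rc
}
```

For example, "set_dedicated_cpus dbzone 4" would give a hypothetical zone named dbzone four dedicated CPUs. Writing the subcommands to a file keeps the change reviewable and repeatable, in the same spirit as the interactive zonecfg session shown in the sendmail post.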


Solaris 11

Migratory Solaris Kernel Zones

The Introduction
Oracle Solaris 11.2 introduced Oracle Solaris Kernel Zones. Kernel Zones (KZs) offer a midpoint between traditional operating system virtualization and virtual machines. They exhibit the low overhead and low management effort of Solaris Zones, and add the best parts of the independence of virtual machines.

A Kernel Zone is a type of Solaris Zone that runs its own Solaris kernel. This gives each Kernel Zone complete independence of software packages, as well as other benefits.

One of the more interesting new abilities that Kernel Zones bring to Solaris Zones is the ability to "pause" a running KZ and "resume" it on a different computer, or on the same computer, if you prefer.

Of what value is the ability to "pause" a zone? One potential use is moving a workload from a smaller computer (with too few CPUs, or insufficient RAM) to a larger one. Some workloads do not maintain much state and can restart quickly, so they wouldn't benefit from suspend/resume. Others, such as static (read-only) databases, may take 30 minutes to start and reach good performance. The ability to suspend, but not stop, the workload and its operating system can be very valuable.

Another possible use of this ability is staging multiple KZs that have already booted and, perhaps, have started to run a workload. Instead of booting in a few minutes, the workload can continue from a known state in just a few seconds. Further, the suspended zone can be "unpaused" on the computer of your choice. Suspended kernel zones are like a nest of dozing ants, waiting to take action at a moment's notice.

This blog entry shows the steps to create and move a KZ, highlighting both the Solaris iSCSI implementation and kernel zones with their suspend/resume feature.

Briefly, the steps are:
1. Create shared storage.
2. Make the shared storage available to both computers: the one that will run the zone at first, and the computer on which the zone will be resumed.
3. Configure the zone on each system.
4. Install the zone on one system.
5. "Warm migrate" the zone by pausing it, and then, on the other computer, resuming it.

Links to relevant documentation and blogs are provided at the bottom.

The Method
The Kernel Zones suspend/resume feature requires the use of storage accessible by multiple computers. However, neither Kernel Zones nor suspend/resume requires a specific type of shared storage. In Solaris 11.2, the only types of shared storage that support zones are iSCSI and Fibre Channel. This blog entry uses iSCSI.

The example below uses three computers. One is the iSCSI target, i.e. the storage server. The other two run the KZ, one at a time. All three systems run Solaris 11.2, although the storage server could instead run an earlier update of Solaris 11, or be a ZFS Storage Appliance (the current family shares the brand name ZS3) or another type of iSCSI target.

In the commands shown below, the prompt "storage1#" indicates commands that you would enter on the iSCSI target. Similarly, "node1#" indicates commands that you would enter on the first computer that will run the kernel zone. The few commands preceded by the prompt "bothnodes#" must be run on both node1 and node2. The name of the kernel zone is "ant1".

For simplicity, the example below ignores security concerns. (More about security below.)

Finally, note that these commands should be run by a non-root user who prefaces each command with the pseudo-command "sudo". ;-)

Step 1. Provide shared storage for the kernel zone.
The zone only needs one device for its zpool. Redundancy is provided by the zpool in the iSCSI target. (For a more detailed explanation, see the link to the COMSTAR documentation in the section "The Links" below.)

storage1# pkg install group/feature/storage-server      # Install necessary software.
storage1# svcadm enable stmf:default                    # Enable that software.
storage1# zfs create rpool/zvols                        # A dataset for the zvol.
storage1# zfs create -V 20g rpool/zvols/ant1            # Create a zvol as backing store.
storage1# stmfadm create-lu /dev/zvol/rdsk/rpool/zvols/ant1   # Create a back-end LUN.
Logical unit created: 600144F068D1CD00000053ECD3D20001
storage1# stmfadm list-lu
LU Name: 600144F068D1CD00000053ECD3D20001
storage1# stmfadm add-view 600144F068D1CD00000053ECD3D20001
storage1# stmfadm list-view -l 600144F068D1CD00000053ECD3D20001
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
storage1# svcadm enable -r svc:/network/iscsi/target:default   # Enable the target service.
storage1# svcs -l iscsi/target
fmri         svc:/network/iscsi/target:default
name         iscsi target
enabled      true
state        online
next_state   none
state_time   August 10, 2014 03:58:50 PM EST
logfile      /var/svc/log/network-iscsi-target:default.log
restarter    svc:/system/svc/restarter:default
manifest     /lib/svc/manifest/network/iscsi/iscsi-target.xml
dependency   require_any/error svc:/milestone/network (online)
dependency   require_all/none svc:/system/stmf:default (online)
storage1# itadm create-target
Target iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5 successfully created
storage1# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5  online   0
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               default

Step 2A. Configure initiators.
Configuring iSCSI on the two iSCSI initiators uses exactly the same commands on each, so I'll just list them once.

bothnodes# svcadm enable network/iscsi/initiator
bothnodes# iscsiadm modify discovery --sendtargets enable   # The simplest discovery method.
bothnodes# iscsiadm add discovery-address                   # IP address of the storage server.

At this point, the initiator will automatically discover all of the iSCSI LUNs offered by that target. One way to view the list of them is with the format(1M) command.

bothnodes# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
...
       1. c0t600144F068D1CD00000053ECD3D20001d0
          /scsi_vhci/disk@g600144f068d1cd00000053ecd3d20001
...

Step 2B. On each of the two computers that will host the zone, identify the Storage Uniform Resource Identifier ("SURI"); see suri(5) for more information. This command tells you the SURI of that LUN, in each of multiple formats. We'll need this SURI to specify the storage for the kernel zone.

bothnodes# suriadm lookup-uri c0t600144F068D1CD00000053ECD3D20001d0
iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
iscsi://house/target.iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5,lun.1
dev:dsk/c0t600144F068D1CD00000053ECD3D20001d0

Step 2C. When you suspend a kernel zone, its RAM pages must be stored temporarily in a file. In order to resume the zone on a different computer, the "suspend file" must be on storage that both computers can access. For this example, we'll use an NFS share. (Another iSCSI LUN could be used instead.) The method shown below is not particularly secure, although the suspended image is first encrypted.
Secure methods would require the use of other Solaris features, but they are not the topic of this blog entry.

storage1# zfs create -p rpool/export/suspend
storage1# zfs set share.nfs=on rpool/export/suspend

That share must be made available on both nodes, with appropriate permissions.

node1# mkdir /mnt/suspend
node1# mount -F nfs storage1:/export/suspend /mnt/suspend
node2# mkdir /mnt/suspend
node2# mount -F nfs storage1:/export/suspend /mnt/suspend

Step 3. Configure a kernel zone, using the iSCSI LUN and a system profile.
You can configure a kernel zone very easily. The only required settings are the name and the use of the kernel zone template. The name of the latter is SYSsolaris-kz. That template specifies a VNIC, 2GB of dedicated RAM, 1 virtual CPU, and local storage that will be configured automatically when the zone is installed.

We need shared storage instead of local storage, so one of the first steps will be deleting the local storage resource. That device will have an ID number of zero. After deleting that resource, we add the LUN, using the SURI determined earlier.

node1# zonecfg -z ant1
Use 'create' to begin configuring a new zone.
zonecfg:ant1> create -t SYSsolaris-kz
zonecfg:ant1> remove device id=0
zonecfg:ant1> add device
zonecfg:ant1:device> set storage=iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
zonecfg:ant1:device> set bootpri=0
zonecfg:ant1:device> info
device:
        match not specified
        storage: iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
        id: 1
        bootpri: 0
zonecfg:ant1:device> end
zonecfg:ant1> add suspend
zonecfg:ant1:suspend> set path=/mnt/suspend/ant1.sus
zonecfg:ant1:suspend> end
zonecfg:ant1> exit

We can create a reusable configuration profile.

node1# sysconfig create-profile -o ant1
[The usual sysconfig conversation ensues...]

Step 4. Install and boot the kernel zone.

node1# zoneadm -z ant1 install -c ant1/sc_profile.xml
Progress being logged to /var/log/zones/zoneadm.20140815T155143Z.ant1.install
pkg cache: Using /var/pkg/publisher.
 Install Log: /system/volatile/install.15996/install_log
 AI Manifest: /tmp/zoneadm15390.W_a4NE/devel-ai-manifest.xml
  SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
Installation: Starting ...
        Creating IPS image
        Installing packages from:
            solaris
                origin:  http://pkg.oracle.com/solaris/release/
        The following licenses have been accepted and not displayed.
        Please review the licenses for the following packages post-install:
          consolidation/osnet/osnet-incorporation
        Package licenses may be viewed using the command:
          pkg info --license
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            483/483   64276/64276  543.7/543.7    0B/s

PHASE                                          ITEMS
Installing new actions                   87530/87530
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Installation: Succeeded
        Done: Installation completed in 207.564 seconds.
node1# zoneadm -z ant1 boot

Step 5. With all of the hard work behind us, we can "warm migrate" the zone. The first step is preparing the destination system ("node2" in our example) by applying the zone's configuration to it.

node1# zonecfg -z ant1 export -f /mnt/suspend/ant1.cfg
node2# zonecfg -z ant1 -f /mnt/suspend/ant1.cfg

The "detach" operation does not delete anything. It merely tells node1 to stop considering the zone to be usable.

node1# zoneadm -z ant1 suspend
node1# zoneadm -z ant1 detach

A separate "resume" sub-command for zoneadm was not necessary. The "boot" sub-command fulfills that purpose.

node2# zoneadm -z ant1 attach
node2# zoneadm -z ant1 boot

Of course, "warm migration" is different from "live migration" in one important respect: the duration of the service outage. Live migration achieves a service outage that lasts a small fraction of a second.
In one experiment, warm migration of a kernel zone created a service outage that lasted 30 seconds. It's not live migration, but it is an important step forward compared to other types of Solaris Zones.

The Notes
This example used a zpool as back-end storage. That zpool provided data redundancy, so additional redundancy was not needed within the kernel zone. If unmirrored devices (e.g. physical disks) were specified in zonecfg, then data redundancy should be achieved within the zone. Fortunately, you can specify two devices in zonecfg, and "zoneadm ... install" will automatically mirror them.

In a simple network configuration, the steps above create a kernel zone that has normal network access. More complicated networks may require additional steps, such as VLAN configuration.

Some steps regarding file permissions on the NFS mount were omitted for clarity. This is one of the security weaknesses of the steps shown above. All of the weaknesses can be addressed by using additional Solaris features. These include, but are not limited to, iSCSI features (iSNS, CHAP authentication, RADIUS, etc.), NFS security features (e.g. NFS ACLs, Kerberos, etc.), and RBAC.

The Links
- The Kernel Zones documentation and the zonecfg(1M) man page describe kernel zones.
- The Configuring Storage Devices with COMSTAR section in the "Managing Devices in Oracle Solaris 11.2" documentation.
- The iSCSI Initiator documentation.
- The suriadm(1M) man page.
- The Install a kernel zone in 3 steps blog entry.
- The Tour of a kernel zone blog entry.
- The Kernel Zones for SAP blog entry.
- The Deploying SAS 9.4 with Oracle Solaris 11.2 Kernel Zones and Unified Archives white paper.
- And finally, the Solaris 11.2: All the Posts meta-post.
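The zonecfg session in Step 3 and the warm migration in Step 5 can also be run non-interactively, which helps if you migrate zones often. Below is a minimal sketch that makes several assumptions. It runs on node1 with the privileges needed for zonecfg and zoneadm. ssh to node2 works for a similarly privileged user. /mnt/suspend is the shared NFS mount from Step 2C. The zone name, destination host, and SURI default to the values used in this entry, and the function names are hypothetical. The final "commit" line in the command file is my addition, not part of the original session. With DRYRUN=1 the script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: configure kernel zone ant1 on shared storage and warm-migrate it.
# Assumes: runs on node1 with zonecfg/zoneadm privileges; ssh to node2 works
# for a similarly privileged user; /mnt/suspend is the shared NFS mount.

ZONE=${ZONE:-ant1}
DST=${DST:-node2}
SURI=${SURI:-iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001}
SHARED=${SHARED:-/mnt/suspend}

# Print the command instead of running it when DRYRUN=1.
run() {
    if [ "${DRYRUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# The subcommands used interactively in Step 3, plus a final commit.
write_zonecfg_cmds() {
    cat <<EOF
create -t SYSsolaris-kz
remove device id=0
add device
set storage=$SURI
set bootpri=0
end
add suspend
set path=$SHARED/$ZONE.sus
end
commit
EOF
}

# Step 3 without the interactive session: write a command file, then load it.
# Usage: configure_zone CMDFILE
configure_zone() {
    write_zonecfg_cmds > "$1" &&
    run zonecfg -z "$ZONE" -f "$1"
}

# Step 5: copy the configuration to the destination, suspend and detach here,
# then attach and boot (that is, resume) there. Stops at the first failure.
warm_migrate() {
    run zonecfg -z "$ZONE" export -f "$SHARED/$ZONE.cfg" &&
    run ssh "$DST" zonecfg -z "$ZONE" -f "$SHARED/$ZONE.cfg" &&
    run zoneadm -z "$ZONE" suspend &&
    run zoneadm -z "$ZONE" detach &&
    run ssh "$DST" zoneadm -z "$ZONE" attach &&
    run ssh "$DST" zoneadm -z "$ZONE" boot
}
```

After sourcing the file on node1, `warm_migrate` performs the Step 5 sequence. With DRYRUN=1 it only prints the six commands, ending with `ssh node2 zoneadm -z ant1 boot`. The installation and the sysconfig profile from Step 4 are still done by hand in this sketch.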


Solaris 10 Containers

Improving Manageability of Virtual Environments

Boot Environments for Solaris 10 Branded Zones
Until recently, Solaris 10 Branded Zones on Solaris 11 suffered one notable regression: Live Upgrade did not work. The individual packaging and patching tools work correctly, but the ability to upgrade Solaris while the production workload continued running did not exist. A recent Solaris 11 SRU (Solaris 11.1 SRU 6.4) restored most of that functionality, although with a slightly different concept, different commands, and without all of the feature details. This new method gives you the ability to create and manage multiple boot environments (BEs) for a Solaris 10 Branded Zone, and to modify the active BE or any inactive BE, while the production workload continues to run.

Background
In case you are new to Solaris: Solaris includes a set of features that enables you to create a bootable Solaris image, called a Boot Environment (BE). This newly created image can be modified while the original BE is still running your workload(s). There are many benefits, including improved uptime and the ability to reboot into (or downgrade to) an older BE if a newer one has a problem.

In Solaris 10 this set of features was named Live Upgrade. Solaris 11 applies the same basic concepts to the new packaging system (IPS), but there isn't a specific name for the feature set. The features are simply part of IPS. Solaris 11 Boot Environments are not discussed in this blog entry.

Although a Solaris 10 system can have multiple BEs, until recently a Solaris 10 Branded Zone (BZ) in a Solaris 11 system did not have this ability. This limitation was addressed recently, and that enhancement is the subject of this blog entry.

This new implementation uses two concepts. The first is the use of a ZFS clone for each BE. This makes it very easy to create a BE, or many BEs. This is a distinct advantage over the Live Upgrade feature set in Solaris 10, which had a practical limitation of two BEs on a system when using UFS. The second new concept is a very simple mechanism to indicate the BE that should be booted: a ZFS property. The new ZFS property is named com.oracle.zones.solaris10:activebe (isn't that creative? ;-) ). It's important to note that the property is set on the rpool/ROOT dataset and inherited by every BE beneath it. In other words, all BEs in one zone have the same value for that property. When the (Solaris 11) global zone boots the Solaris 10 BZ, it boots the BE whose name is stored in the activebe property.

Here is a quick summary of the actions you can use to manage these BEs:
- To create a BE: create a ZFS clone of the zone's root dataset.
- To activate a BE: set the ZFS property of the root dataset to indicate the BE.
- To add a package or patch to an inactive BE: mount the inactive BE, add packages or patches to it, and then unmount it.
- To list the available BEs: use the "zfs list" command.
- To destroy a BE: use the "zfs destroy" command.

Preparation
Before you can use the new features, you will need a Solaris 10 BZ on a Solaris 11 system. You can use these three steps, on a real Solaris 11.1 server or in a VirtualBox guest running Solaris 11.1, to create a Solaris 10 BZ.
The Solaris 11.1 environment must be at SRU 6.4 or newer.

1. Create a flash archive on the Solaris 10 system:
s10# flarcreate -n s10-system /net/zones/archives/s10-system.flar

2. Configure the Solaris 10 BZ on the Solaris 11 system:
s11# zonecfg -z s10z
Use 'create' to begin configuring a new zone.
zonecfg:s10z> create -t SYSsolaris10
zonecfg:s10z> set zonepath=/zones/s10z
zonecfg:s10z> exit
s11# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND     IP
   0 global           running    /                              solaris   shared
   - s10z             configured /zones/s10z                    solaris10 excl

3. Install the zone from the flash archive:
s11# zoneadm -z s10z install -a /net/zones/archives/s10-system.flar -p

You can find more information about the migration of Solaris 10 environments to Solaris 10 Branded Zones in the documentation.

The rest of this blog entry demonstrates the commands you can use to accomplish the BE actions summarized above.

New features in action
Note that the demonstration of the commands occurs in the Solaris 10 BZ, as indicated by the shell prompt "s10z# ". Many of these commands can be performed in the global zone instead, if you prefer. If you perform them in the global zone, you must change the ZFS file system names.

Create
The only complicated action is the creation of a BE.

In the Solaris 10 BZ, create a new "boot environment": a ZFS clone. You can assign any name to the final portion of the clone's name, as long as it meets the requirements for a ZFS file system name.

s10z# zfs snapshot rpool/ROOT/zbe-0@snap
s10z# zfs clone -o mountpoint=/ -o canmount=noauto rpool/ROOT/zbe-0@snap rpool/ROOT/newBE
cannot mount 'rpool/ROOT/newBE' on '/': directory is not empty
filesystem successfully created, but not mounted

You can safely ignore that message: we already know that / is not empty! We have merely told ZFS that the default mountpoint for the clone is the root directory.

(Note that a Solaris 10 BZ that has a separate /var file system requires additional steps. See the MOS document mentioned at the bottom of this blog entry.)

List the available BEs and active BE
Because each BE is a ZFS file system under the rpool/ROOT dataset, listing the BEs is as simple as listing the file systems under rpool/ROOT.

s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.55G  42.9G    31K  legacy
rpool/ROOT/zbe-0    1K   42.9G  3.55G  /
rpool/ROOT/newBE  3.55G  42.9G  3.55G  /

The output shows that two BEs exist. Their names are "zbe-0" and "newBE".

You can tell Solaris that one particular BE should be used when the zone next boots by using a ZFS property. Its name is com.oracle.zones.solaris10:activebe. The value of that property is the name of the clone that contains the BE that should be booted.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local

Change the active BE
When you want to change the BE that will be booted next time, you can just change the activebe property on the rpool/ROOT dataset.

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  zbe-0  local
s10z# zfs set com.oracle.zones.solaris10:activebe=newBE rpool/ROOT
s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
NAME        PROPERTY                             VALUE  SOURCE
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# shutdown -y -g0 -i6

After the zone has rebooted:

s10z# zfs get com.oracle.zones.solaris10:activebe rpool/ROOT
rpool/ROOT  com.oracle.zones.solaris10:activebe  newBE  local
s10z# zfs mount
rpool/ROOT/newBE                /
rpool/export                    /export
rpool/export/home               /export/home
rpool                           /rpool

Mount the original BE to see that it's still there.

s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# ls /mnt
Desktop     export                          platform
Documents   export.backup.20130607T214951Z  proc
S10Flar     home                            rpool
TT_DB       kernel                          sbin
bin         lib                             system
boot        lost+found                      tmp
cdrom       mnt                             usr
dev         net                             var
etc         opt

Patch an inactive BE
At this point, you can modify the original BE.
If you would prefer to modify the new BE, you can restore the original value to the activebe property and reboot, and then mount the new BE to /mnt (or another empty directory) and modify it.

Let's mount the original BE so we can modify it. (The first command is only needed if you haven't already mounted that BE.)

s10z# zfs mount -o mountpoint=/mnt rpool/ROOT/zbe-0
s10z# patchadd -R /mnt -M /var/sadm/spool 104945-02

Note that the typical usage will be:
1. Create a BE.
2. Mount the new (inactive) BE.
3. Use the package and patch tools to update the new BE.
4. Unmount the new BE.
5. Set the activebe property to the new BE.
6. Reboot.

Delete an inactive BE
A ZFS clone depends on the snapshot it was created from, so the file system that holds that snapshot cannot be destroyed while the clone exists. To destroy it, you must first "promote" the clone. This reverses the parent-child relationship. (For more information on this, see the documentation.)

The original BE, rpool/ROOT/zbe-0, is the parent of the clones that you create as BEs. In order to destroy an earlier BE that is the parent of other BEs, you must first promote one of the child BEs to be the ZFS parent. Only then can you destroy the original BE.

Fortunately, this is easier to do than to explain:

s10z# zfs promote rpool/ROOT/newBE
s10z# zfs destroy rpool/ROOT/zbe-0
s10z# zfs list -r rpool/ROOT
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT        3.56G   269G    31K  legacy
rpool/ROOT/newBE  3.56G   269G  3.55G  /

Documentation
This feature is so new that it is not yet described in the Solaris 11 documentation. However, MOS note 1558773.1 offers some details.

Conclusion
With this new feature, you can add packages and patches to boot environments of a Solaris 10 Branded Zone. This ability improves the manageability of these zones, and makes their use more practical. It also means that you can use the existing P2V tools with earlier Solaris 10 updates, and modify the environments after they become Solaris 10 Branded Zones.
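Because each BE action summarized earlier in this entry comes down to one or two ZFS or patch commands, the actions can be wrapped in small shell functions. This is a minimal sketch under several assumptions. It runs inside the Solaris 10 Branded Zone, so dataset names are the ones used in the examples above. The BE datasets live under rpool/ROOT. The function names are hypothetical. Unlike the single "@snap" snapshot in the walkthrough, each snapshot is named after the new BE so that the function can be used repeatedly. With DRYRUN=1 the functions print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: boot environment (BE) helpers for a Solaris 10 Branded Zone.
# Run inside the zone. Function names and the per-BE snapshot name are
# hypothetical; the zfs and patchadd commands follow the examples above.

ROOT=${ROOT:-rpool/ROOT}
PROP=com.oracle.zones.solaris10:activebe

# Print the command instead of running it when DRYRUN=1.
run() {
    if [ "${DRYRUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create NEW as a clone of a snapshot of FROM. Usage: be_create FROM NEW
# (zfs clone may complain that / is not empty; that message is expected.)
be_create() {
    run zfs snapshot "$ROOT/$1@$2" &&
    run zfs clone -o mountpoint=/ -o canmount=noauto "$ROOT/$1@$2" "$ROOT/$2"
}

# Boot NAME the next time the zone boots. Usage: be_activate NAME
be_activate() {
    run zfs set "$PROP=$1" "$ROOT"
}

# Show the BEs and the BE selected for the next boot.
be_list() {
    run zfs list -r "$ROOT"
    run zfs get "$PROP" "$ROOT"
}

# Patch an inactive BE through a temporary /mnt mount.
# Usage: be_patch BE SPOOLDIR PATCHID
be_patch() {
    run zfs mount -o mountpoint=/mnt "$ROOT/$1" || return 1
    run patchadd -R /mnt -M "$2" "$3"
    status=$?
    run zfs unmount "$ROOT/$1"    # unmount even if patchadd failed
    return $status
}

# Destroy OLD after promoting KEEP, a clone whose origin snapshot is in OLD.
# Usage: be_destroy OLD KEEP
be_destroy() {
    run zfs promote "$ROOT/$2" &&
    run zfs destroy "$ROOT/$1"
}
```

With these functions, the typical update cycle becomes `be_create zbe-0 newBE`, then `be_patch newBE /var/sadm/spool 104945-02`, then `be_activate newBE`, followed by a reboot of the zone.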


Solaris 11

Comparing Solaris 11 Zones to Solaris 10 Zones

Many people have asked whether Oracle Solaris 11 uses sparse-root zones or whole-root zones. I think the best answer is "both and neither, and more", but that's a wee bit confusing. :-) This blog entry attempts to explain that answer.

First a recap: Solaris 10 introduced the Solaris Zones feature set, way back in 2005. Zones are a form of server virtualization called "OS (Operating System) Virtualization." They improve consolidation ratios by isolating processes from each other so that they cannot interact. Each zone has its own set of users, naming services, and other software components. One of the many advantages is that there is no need for a hypervisor, so there is no hypervisor performance overhead. Many data centers run tens to hundreds of zones per server!

In Solaris 10, there are two models of package deployment for Solaris Zones. One model is called "sparse-root" and the other "whole-root." Each form has specific characteristics, abilities, and limitations.

A whole-root zone has its own copy of the Solaris packages. This allows the inclusion of other software in system directories, even though that practice has been discouraged for many years. Although it is also possible to modify the Solaris content in such a zone, e.g. patching a zone separately from the rest, this was highly frowned on. :-( (More importantly, modifying the Solaris content in a whole-root zone may lead to an unsupported configuration.)

The other model is called "sparse-root." In that form, instead of copying all of the Solaris packages into the zone, the directories containing Solaris binaries are re-mounted into the zone. This allows the zone's users to access them at their normal places in the directory tree. Those are read-only mounts, so a zone's root user cannot modify them. This improves security, and also reduces the amount of disk space used by the zone: 200MB instead of the usual 3-5 GB per zone. These loopback mounts also reduce the amount of RAM used by zones, because Solaris only stores in RAM one copy of a program that is in use by several zones. This model also has disadvantages. One disadvantage is the inability to add software into system directories such as /usr. Also, although a sparse-root zone can be migrated to another Solaris 10 system, it cannot be moved to a Solaris 11 system as a "Solaris 10 Zone."

In addition to those contrasting characteristics, here are some characteristics of zones in Solaris 10 that are shared by both packaging models:
- A zone can modify its own configuration files in /etc.
- A zone can be configured so that it manages its own networking, or so that it cannot modify its network configuration.
- It is difficult to give a non-root user in the global zone the ability to boot and stop a zone without giving that user other abilities.
- In a zone that can manage its own networking, the root user can do harmful things like spoof other IP addresses and MAC addresses.
- It is difficult to assign network packet processing to the same CPUs that a zone uses. This can lead to unpredictable performance and performance troubleshooting challenges.
- You cannot run a large number of zones in one system (e.g. 50) that each manage their own networking, because that would require assigning more physical NICs than are available (e.g. 50).
- Except when managed by Ops Center, zones cannot be safely stored on NAS.
- Solaris 10 Zones cannot be NFS servers.
- The fsstat command does not report statistics per zone.

Solaris 11 Zones use the new packaging system of Solaris 11. Their configuration does not offer a choice of packaging models, as Solaris 10 does. Instead, two (well, four) different models of "immutability" (changeability) are offered. The default model allows a privileged zone user to modify the zone's content. The other three limit the content that can be changed: to nothing at all, or to one of two overlapping sets of configuration files. (See "Configuring and Administering Immutable Zones".)

Solaris 11 addresses many of those limitations. With the characteristics listed above in mind, the following table shows the similarities and differences between zones in Solaris 10 and in Solaris 11.

Characteristic | Solaris 10 Whole-Root | Solaris 10 Sparse-Root | Solaris 11 | Solaris 11 Immutable Zones
--- | --- | --- | --- | ---
Each zone has a copy of most Solaris packages | Yes | No | Yes | Yes
Disk space used by a zone (typical) | 3.5 GB | 100 MB | 500 MB | 500 MB
A privileged zone user can add software to /usr | Yes | No | Yes | No
A zone can modify its Solaris programs | Yes | No | Yes | No
Each zone can modify its configuration files | Yes | Yes | Yes | No
Delegated administration | No | No | Yes | Yes
A zone can be configured to manage its own networking | Yes | Yes | Yes | Yes
A zone can be configured so that it cannot manage its own networking | Yes | Yes | Yes | Yes
A zone can be configured with resource controls | Yes | Yes | Yes | Yes
Integrated tool to measure a zone's resource consumption (zonestat) | No | No | Yes | Yes
Network processing automatically happens on that zone's CPUs | No | No | Yes | Yes
Zones can be NFS servers | No | No | Yes | Yes
Per-zone fsstat data | No | No | Yes | Yes

As you can see, the statement "Solaris 11 Zones are whole-root zones" is only true using the narrowest definition of whole-root zones: those zones that have their own copy of Solaris packaging content. But there are other valuable characteristics of sparse-root zones that are still available in Solaris 11 Zones. Also, some Solaris 11 Zones do not have some characteristics of whole-root zones.

For example, the table above shows that you can configure a Solaris 11 zone that has read-only Solaris content. And Solaris 11 takes that concept further, offering the ability to tailor that immutability. It also shows that Solaris 10 sparse-root and whole-root zones are more similar to each other than to Solaris 11 Zones.

Conclusion
Solaris 11 Zones are slightly different from Solaris 10 Zones. The former can achieve the goals of the latter, and they also offer features not found in Solaris 10 Zones.

Solaris 11 Zones offer the best of Solaris 10 whole-root zones and sparse-root zones, and offer an array of new features that make Zones even more flexible and powerful.
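Two of the Solaris 11 rows in the comparison table, immutability and delegated administration, are set with zonecfg. This is a minimal sketch that assumes an existing Solaris 11 zone. The zone and user names are hypothetical. The four accepted file-mac-profile values ("none", "strict", "fixed-configuration", "flexible-configuration") correspond to the default model and the three restricted models described above. With DRYRUN=1 the functions print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: choose an immutability model for a Solaris 11 zone and delegate
# its administration. Zone and user names passed in are hypothetical.

# Print the command instead of running it when DRYRUN=1.
run() {
    if [ "${DRYRUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# "none" keeps the zone writable; the other three restrict changes.
# Usage: set_immutability ZONE PROFILE
set_immutability() {
    case $2 in
        none|strict|fixed-configuration|flexible-configuration) ;;
        *) echo "unknown file-mac-profile: $2" >&2; return 1 ;;
    esac
    run zonecfg -z "$1" "set file-mac-profile=$2"
}

# Give a non-root global-zone user login and management rights for one zone
# (the manage authorization covers zoneadm operations such as boot and halt).
# Usage: delegate_admin ZONE USER
delegate_admin() {
    run zonecfg -z "$1" "add admin; set user=$2; set auths=login,manage; end"
}
```

For example, `set_immutability webzone fixed-configuration` followed by `delegate_admin webzone jdoe` addresses two of the Solaris 10 limitations listed above. The new settings typically take effect when the zone next boots.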



More SPARC T5 Performance Results

Performance results for the new SPARC T5 systems keep coming in... Last week, SPEC published the most recent result for the SPECjbb2013-MultiJVM benchmark. This benchmark "is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community" according to SPEC.

All of the published results are at: http://www.spec.org/jbb2013/results/jbb2013multijvm.html.

For the first table below, I selected all of the max-JOPS results greater than 50,000 JOPS using the most recent Java version, for the SPARC T5-2 and for competing systems. From the SPECjbb2013 data, I derived two new values, max-JOPS/chip and max-JOPS/core. The latter value compensates for the different quantity of cores used in one of the tests. Finally, the "Advantage of T5" column shows the portion by which the T5-2 cores perform better than the other systems' cores. For example, on this benchmark a 32-core T5-2 computer demonstrated 15% better per-core performance than an HP DL560p (running RHEL) with the same number of cores.

As you can see, on this benchmark a SPARC T5 core outperformed the Intel Xeon cores of each competing system with 32 or more cores.

Model | CPU | Chips | Cores | OS | max-JOPS | Date Published | max-JOPS per chip | max-JOPS per core | Advantage of T5
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
SPARC T5-2 | SPARC T5 | 2 | 32 | Solaris 11.1 | 75658 | April 2013 | 37829 | 2364 |
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | Windows Server 2008 R2 Enterprise | 67850 | April 2013 | 16963 | 2120 | 12%
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | RHEL 6.3 | 66007 | April 2013 | 16502 | 2063 | 15%
HP ProLiant DL980 G7 | Intel E7-4870 | 8 | 80 | RHEL 6.3 | 106141 | April 2013 | 13268 | 1327 | 78%

The SPECjbb2013 benchmark also includes a performance measure called "critical-JOPS." This measurement represents the ability of a system to achieve high levels of throughput while still maintaining a short response time. The performance advantage of the T5 cores is even more pronounced.

Model | CPU | Chips | Cores | OS | critical-JOPS | Date Published | critical-JOPS per chip | critical-JOPS per core | Advantage of T5
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
SPARC T5-2 | SPARC T5 | 2 | 32 | Solaris 11.1 | 23334 | April 2013 | 11667 | 729 |
HP ProLiant DL560p Gen8 | Intel E5-4650 | 4 | 32 | Windows Server 2008 R2 Enterprise | 16199 | April 2013 | 4050 | 506 | 44%
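The derived columns are simple ratios. JOPS per chip is the published score divided by the number of chips, and JOPS per core divides by the number of cores. The T5 advantage is the T5-2's per-core value divided by the other system's per-core value, minus one. The awk sketch below recomputes those columns from the published scores. The function name and the pipe-separated input format are my own, not SPEC's. It rounds halves upward, which matches the tables (for example, 67850 / 4 = 16962.5 appears as 16963).

```shell
#!/bin/sh
# Sketch: recompute the derived columns of the SPECjbb2013 tables above.
# Input: one system per line as NAME|CHIPS|CORES|JOPS; the first line is the
# SPARC T5-2 baseline. Output: NAME|JOPS per chip|JOPS per core|advantage.
derive() {
    awk -F'|' '
        function round(x) { return int(x + 0.5) }   # round half up, like the tables
        {
            per_chip = $4 / $2
            per_core = $4 / $3
            if (NR == 1) {
                base = per_core                      # T5-2 per-core baseline
                adv = ""
            } else {
                adv = sprintf("%d%%", round((base / per_core - 1) * 100))
            }
            printf "%s|%d|%d|%s\n", $1, round(per_chip), round(per_core), adv
        }'
}
```

For example, piping the lines `SPARC T5-2|2|32|75658` and `HP ProLiant DL980 G7|8|80|106141` into `derive` prints `SPARC T5-2|37829|2364|` and `HP ProLiant DL980 G7|13268|1327|78%`, matching the first table. The advantage is computed from unrounded per-core values, so it does not depend on the rounding of the displayed columns.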



IDC Reviews New SPARC T5 and M5 Servers

On April 5, IDC published an article that provides their view of the recent announcement of Oracle T5 and M5 servers.

IDC's conclusion: "Oracle has invested deeply in improving the performance of the T-series processors it developed following its acquisition of Sun Microsystems in 2010. It has pushed its engineering efforts to release new SPARC processor technology — providing a much more competitive general-purpose server platform. This will provide an immediate improvement for its large installed base, even as it lends momentum to a new round of competition in the Unix server marketplace."

IDC also noted the "dramatic performance gains for SPARC, with 16-core microprocessor technology based on three years of IP (intellectual property) development at Oracle, following Oracle's acquisition of Sun Microsystems Inc. in January, 2010."

The new T5 servers use SPARC T5 processor chips that offer more than double the performance of SPARC T4 chips, which were released just over a year ago. And the T4 chips, in turn, were a significant departure from all previous SPARC CMT CPUs, in that the T4 chips offered excellent performance for single-threaded workloads.

The new M5 servers use up to 32 SPARC M5 processors, each using the same "S3" SPARC cores as the T5 chips.



New SPARC Servers Outrun the Competition - by Leaps and Bounds!

In case you missed yesterday's launch of Oracle's new SPARC server line, based on the SPARC T5 and M5 processors, here is a brief summary - and the obligatory links to details...

The new SPARC T5 chip uses the "S3" core, which has been in the SPARC T4 generation for over a year. That core offers, among other things, 8 hardware threads, two simultaneous integer pipelines, and some other compute units as well (FP, crypto, etc.). The S3 core also includes the instructions necessary to work with the SPARC hypervisor that implements SPARC virtual machines (Oracle VM Server for SPARC, previously called Logical Domains or LDoms), including Live Migration of VMs.

However, four significant improvements have been made in the new systems:
- 16 cores on each T5 chip, instead of T4's 8 cores per chip, made possible by a die shrink (to 28 nm).
- An increase in clock rate to 3.6 GHz yields an immediate 20% improvement in processing over T4 systems.
- Increased chip scalability allows up to 8 CPU chips per system, doubling the maximum number of cores in the mid-range systems.
- In addition to the mid-range servers, the high-end M5-32 now also supports OVM Server for SPARC (LDoms), while maintaining the ability to use hard partitions (Dynamic Domains) in that system. (The T5-based servers also have LDoms, just like the T4-based systems.)

The result of those is the "world's fastest microprocessor."
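The core and clock figures in that list can be checked with a little arithmetic. This Python sketch uses only numbers from the post; note that the T4 clock rate is not quoted here but derived from the "20% improvement" figure, and real throughput does not scale linearly with clock rate or thread count.

```python
# Arithmetic behind the T5 improvements listed above (values from the post).
T4_CORES_PER_CHIP = 8
T5_CORES_PER_CHIP = 16
T5_CLOCK_GHZ = 3.6
CLOCK_GAIN = 0.20          # "an immediate 20% improvement" over T4
THREADS_PER_CORE = 8       # each "S3" core runs 8 hardware threads
MAX_T5_CHIPS = 8           # "up to 8 CPU chips per system"

# The T4 clock is not stated in the post; it is implied by the 20% figure.
implied_t4_clock_ghz = T5_CLOCK_GHZ / (1 + CLOCK_GAIN)

core_ratio = T5_CORES_PER_CHIP / T4_CORES_PER_CHIP
max_t5_cores = MAX_T5_CHIPS * T5_CORES_PER_CHIP
max_t5_threads = max_t5_cores * THREADS_PER_CORE

print(f"implied T4 clock: {implied_t4_clock_ghz:.1f} GHz")
print(f"cores per chip: {core_ratio:.0f}x T4")
print(f"largest T5 system: {max_t5_cores} cores, {max_t5_threads} threads")
```

The result is an implied 3.0 GHz T4 baseline and, for an 8-chip T5 system, 128 cores and 1,024 hardware threads.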
Between the four T5 (mid-range) systems and the M5-32, this new generation of systems has already achieved 17 performance world records. Some of the simpler comparisons that were made yesterday include (see the press release for substantiation):
- An Oracle T5-8 (8 CPU sockets) running Solaris has a higher SAP performance rating than an IBM Power 780 (8 sockets) running AIX.
- A single, 2-socket T5-2 has three times the performance, at 13% of the cost, of two Power 770s - on a JD Edwards performance test.
- Two T5-2 servers have almost double the Siebel performance of two Power 750 servers - at one-fourth the price.
- One 8-processor T5-8 outperforms an 8-processor Power 780 - at one-seventh the cost - on the common SPECint_rate2006 benchmark.

The new high-end SPARC system - the M5-32 - sports 192 cores (1,536 hardware threads) of compute power. It can also be packed with 32 TB (yes, terabytes!) of RAM. Put your largest DB entirely in RAM, and see how fast it goes!

Oracle has refreshed its entire SPARC server line all at once, greatly improving performance - not only compared to the previous SPARC generation, but also compared to the current generation of servers from other manufacturers.
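The M5-32 thread count and the price/performance implied by two of those comparisons follow from simple ratios. A minimal Python sketch, using the post's figures; treating "almost double" as exactly 2.0 is my simplification, so that result is an upper bound, and `price_performance_gain` is a hypothetical helper name.

```python
# Arithmetic behind the M5-32 and price/performance claims above.
M5_32_CHIPS = 32
M5_32_CORES = 192
THREADS_PER_CORE = 8   # same "S3" core as the T5

cores_per_m5_chip = M5_32_CORES // M5_32_CHIPS
m5_32_threads = M5_32_CORES * THREADS_PER_CORE


def price_performance_gain(perf_ratio: float, cost_ratio: float) -> float:
    """Performance per unit of cost, relative to the comparison system."""
    return perf_ratio / cost_ratio


# JD Edwards claim: three times the performance at 13% of the cost.
jde_gain = price_performance_gain(3.0, 0.13)
# Siebel claim: "almost double" the performance (2.0 used as an upper bound)
# at one-fourth the price.
siebel_gain = price_performance_gain(2.0, 0.25)

print(cores_per_m5_chip, m5_32_threads)  # 6 1536
print(f"JD Edwards: about {jde_gain:.0f}x the performance per unit cost")
print(f"Siebel: up to {siebel_gain:.0f}x the performance per unit cost")
```

So each M5 chip carries 6 cores, and the JD Edwards and Siebel comparisons work out to roughly 23x and at most 8x better performance per unit of cost.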


Solaris 11

Oracle Solaris: Zones on Shared Storage

Oracle Solaris 11.1 has several new features. At oracle.com you can find a detailed list.

One of the significant new features, and the most significant new feature related to Oracle Solaris Zones, is casually called "Zones on Shared Storage" or simply ZOSS (rhymes with "moss"). ZOSS offers much more flexibility because you can store Solaris Zones on shared storage (surprise!) so that you can perform quick and easy migration of a zone from one system to another. This blog entry describes and demonstrates the use of ZOSS.

ZOSS provides complete support for a Solaris Zone that is stored on "shared storage." In this case, "shared storage" refers to Fibre Channel (FC) or iSCSI devices, although there is one lone exception that I will demonstrate soon. The primary intent is to enable you to store a zone on FC or iSCSI storage so that it can be migrated from one host computer to another much more easily and safely than in the past.

With this blog entry, I wanted to make it easy for you to try this yourself. I couldn't assume that you have a SAN available - which is a good thing, because neither do I! :-) What could I use, instead? [There he goes, foreshadowing again... -Ed.]

Developing this entry reinforced the lesson that the solution to every lab problem is VirtualBox. ;-) Oracle VM VirtualBox (its formal name) helps here in a couple of important ways. It offers the ability to easily install multiple copies of Solaris as guests on top of any popular system (Microsoft Windows, MacOS, Solaris, Oracle Linux (and other Linuxes), etc.). It also offers the ability to create a separate virtual disk drive (VDI) that appears as a local hard disk to a guest. This virtual disk can be moved very easily from one guest to another. In other words, you can follow the steps below on a laptop or larger x86 system.

Please note that the ability to use ZOSS to store a zone on a local disk is very useful for a lab environment, but not so useful for production.
I do not suggest regularly moving disk drives among computers. [Update, 2013.01.28: Apparently the previous sentence caused some confusion. I do recommend the use of Zones on Shared Storage in production environments, when appropriate storage is used. "Appropriate storage" would include SAN or iSCSI at this point. I do not recommend using ZOSS with local disks in production because doing so would require moving the disks between computers.]

In the method I describe below, that virtual hard disk will contain the zone that will be migrated among the (virtual) hosts. In production, you would use FC or iSCSI LUNs instead. The zonecfg(1M) man page details the syntax for each of the three types of devices.

Why Migrate?

Why is the migration of virtual servers important? Some of the most common reasons are:
- Moving a workload to a different computer so that the original computer can be turned off for extensive maintenance.
- Moving a workload to a larger system because the workload has outgrown its original system.
- If the workload runs in an environment (such as a Solaris Zone) that is stored on shared storage, you can restore the service of the workload on an alternate computer if the original computer has failed and will not reboot.
- You can simplify lifecycle management of a workload by developing it on a laptop, migrating it to a test platform when it's ready, and finally moving it to a production system.

Concepts

For ZOSS, the important new concept is named "rootzpool". You can read about it in the zonecfg(1M) man page, but here's the short version: it's the backing store (hard disk(s), or LUN(s)) that will be used to make a ZFS zpool - the zpool that will hold the zone. This zpool:
- contains the zone's Solaris content, i.e. the root file system
- does not contain any content not related to the zone
- can only be mounted by one Solaris instance at a time

Method Overview

Here is a brief list of the steps to create a zone on shared storage and migrate it. The next section shows the commands and output.
- You will need a host system with an x86 CPU (hopefully at least a couple of CPU cores), at least 2GB of RAM, and at least 25GB of free disk space. (The steps below will not actually use 25GB of disk space, but I don't want to lead you down a path that ends in a big sign that says "Your HDD is full. Good luck!")
- Configure the zone on both systems, specifying the rootzpool that both will use. The best way is to configure it on one system and then copy the output of "zonecfg export" to the other system to be used as input to zonecfg. This method reduces the chances of pilot error. (It is not necessary to configure the zone on both systems before creating it. You can configure this zone in multiple places, whenever you want, and migrate it to one of those places at any time - as long as those systems all have access to the shared storage.)
- Install the zone on one system, onto shared storage.
- Boot the zone.
- Provide system configuration information to the zone. (In the Real World(tm) you will usually automate this step.)
- Shut down the zone.
- Detach the zone from the original system.
- Attach the zone to its new "home" system.
- Boot the zone.

The zone can be used normally, and even migrated back, or to a different system.

Details

The rest of this entry shows the commands and output. The two hostnames are "sysA" and "sysB".

Note that each Solaris guest might use a different device name for the VDI that they share. I used the device names shown below, but you must discover the device name(s) after booting each guest. In a production environment you would also discover the device name first and then configure the zone with that name.
Fortunately, you can use the command "zpool import" or "format" to discover the device on the "new" host for the zone.

The first steps create the VirtualBox guests and the shared disk drive. I describe the steps here without demonstrating them.
- Download VirtualBox and install it using a method normal for your host OS. You can read the complete instructions.
- Create two VirtualBox guests, each to run Solaris 11.1. Each will use its own VDI as its root disk.
- Install Solaris 11.1 in each guest. To install a Solaris 11.1 guest, you can either download a pre-built VirtualBox guest and import it, or install Solaris 11.1 from the "text install" media. If you use the latter method, after booting you will not see a windowing system. To install the GUI and other important things, log in and run "pkg install solaris-desktop" and take a break while it installs those important things.
- Life is usually easier if you install the VirtualBox Guest Additions, because then you can copy and paste between the host and guests, etc. You can find the guest additions in the folder matching the version of VirtualBox you are using. You can also read the instructions for installing the guest additions.
- To create the zone's shared VDI in VirtualBox, you can open the storage configuration for one of the two guests, select the SATA controller, and click on the "Add Hard Disk" icon nearby. Choose "Create New Disk" and specify an appropriate path name for the file that will contain the VDI. The shared VDI must be at least 1.5 GB. Note that the guest must be stopped to do this.
- Add that VDI to the other guest - using its Storage configuration - so that each can access it while running. The steps start out the same, except that you choose "Choose Existing Disk" instead of "Create New Disk." Because the disk is configured on both of them, VirtualBox prevents you from running both guests at the same time.
- Identify the device name of that VDI in each of the guests.
  Solaris chooses the name based on existing devices. The names may be the same, or may be different from each other. This step is shown below as "Step 1."

Assumptions

In the example shown below, I make these assumptions.
- The guest that will own the zone at the beginning is named sysA.
- The guest that will own the zone after the first migration is named sysB.
- On sysA, the shared disk is named /dev/dsk/c7t2d0
- On sysB, the shared disk is named /dev/dsk/c7t3d0

(Finally!) The Steps

Step 1) Determine the name of the disk that will move back and forth between the systems.

    root@sysA:~# format
    Searching for disks...done
    AVAILABLE DISK SELECTIONS:
           0. c7t0d0 /pci@0,0/pci8086,2829@d/disk@0,0
           1. c7t2d0 /pci@0,0/pci8086,2829@d/disk@2,0
    Specify disk (enter its number): ^D

Step 2) The first thing to do is partition and label the disk. The magic needed to write an EFI label is not overly complicated.

    root@sysA:~# format -e c7t2d0
    selecting c7t2d0
    [disk formatted]
    FORMAT MENU:
    ...
    format> fdisk
    No fdisk table exists. The default partition for the disk is:
      a 100% "SOLARIS System" partition
    Type "y" to accept the default partition, otherwise type "n" to edit the partition table.
    n
    SELECT ONE OF THE FOLLOWING:
    ...
    Enter Selection: 1
    ...
         G=EFI_SYS    0=Exit? f
    SELECT ONE...
    ...
    6
    format> label
    ...
    Specify Label type[1]: 1
    Ready to label disk, continue? y
    format> quit
    root@sysA:~# ls /dev/dsk/c7t2d0
    /dev/dsk/c7t2d0

Step 3) Configure zone1 on sysA.

    root@sysA:~# zonecfg -z zone1
    Use 'create' to begin configuring a new zone.
    zonecfg:zone1> create
    create: Using system default template 'SYSdefault'
    zonecfg:zone1> set zonename=zone1
    zonecfg:zone1> set zonepath=/zones/zone1
    zonecfg:zone1> add rootzpool
    zonecfg:zone1:rootzpool> add storage dev:dsk/c7t2d0
    zonecfg:zone1:rootzpool> end
    zonecfg:zone1> exit
    root@sysA:~#
    root@sysA:~# zonecfg -z zone1 info
    zonename: zone1
    zonepath: /zones/zone1
    brand: solaris
    autoboot: false
    bootargs:
    file-mac-profile:
    pool:
    limitpriv:
    scheduling-class:
    ip-type: exclusive
    hostid:
    fs-allowed:
    anet:
    ...
    rootzpool:
        storage: dev:dsk/c7t2d0

Step 4) Install the zone. This step takes the most time, but you can wander off for a snack or a few laps around the gym - or both! (Just not at the same time...)

    root@sysA:~# zoneadm -z zone1 install
    Created zone zpool: zone1_rpool
    Progress being logged to /var/log/zones/zoneadm.20121022T163634Z.zone1.install
           Image: Preparing at /zones/zone1/root.
     AI Manifest: /tmp/manifest.xml.RXaycg
      SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
        Zonename: zone1
    Installation: Starting ...
                  Creating IPS image
    Startup linked: 1/1 done
                  Installing packages from:
                      solaris
                          origin:  http://pkg.us.oracle.com/support/
    DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
    Completed                            183/183   33556/33556  222.2/222.2  2.8M/s
    PHASE                                          ITEMS
    Installing new actions                   46825/46825
    Updating package state database                 Done
    Updating image state                            Done
    Creating fast lookup database                   Done
    Installation: Succeeded
            Note: Man pages can be obtained by installing pkg:/system/manual
     done.
            Done: Installation completed in 1696.847 seconds.
      Next Steps: Boot the zone, then log into the zone console (zlogin -C)
                  to complete the configuration process.
    Log saved in non-global zone as /zones/zone1/root/var/log/zones/zoneadm.20121022T163634Z.zone1.install

Step 5) Boot the zone.

    root@sysA:~# zoneadm -z zone1 boot

Step 6) Log in to the zone's console to complete the specification of system information.

    root@sysA:~# zlogin -C zone1

Answer the usual questions and wait for a login prompt. Then you can end the console session with the usual "~." incantation.

Step 7) Shut down the zone so it can be "moved."

    root@sysA:~# zoneadm -z zone1 shutdown

Step 8) Detach the zone so that the original global zone can't use it.

    root@sysA:~# zoneadm list -cv
      ID NAME     STATUS      PATH           BRAND    IP
       0 global   running     /              solaris  shared
       - zone1    installed   /zones/zone1   solaris  excl
    root@sysA:~# zpool list
    NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    rpool        17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -
    zone1_rpool  1.98G   484M  1.51G  23%  1.00x  ONLINE  -
    root@sysA:~# zoneadm -z zone1 detach
    Exported zone zpool: zone1_rpool

Step 9) Review the result and shut down sysA so that sysB can use the shared disk.

    root@sysA:~# zpool list
    NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    rpool  17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -
    root@sysA:~# zoneadm list -cv
      ID NAME     STATUS       PATH           BRAND    IP
       0 global   running      /              solaris  shared
       - zone1    configured   /zones/zone1   solaris  excl
    root@sysA:~# init 0

Step 10) Now boot sysB and configure a zone with the parameters shown above in Step 3. (Again, the safest method is to use "zonecfg ... export" on sysA as described in section "Method Overview" above.) The one difference is the name of the rootzpool storage device, which was shown in the list of assumptions, and which you must determine by booting sysB and using the "format" or "zpool import" command.

When that is done, you should see the output shown next.
(I used the same zonename - "zone1" - in this example, but you can choose any valid zonename you want.)

    root@sysB:~# zoneadm list -cv
      ID NAME     STATUS       PATH           BRAND    IP
       0 global   running      /              solaris  shared
       - zone1    configured   /zones/zone1   solaris  excl
    root@sysB:~# zonecfg -z zone1 info
    zonename: zone1
    zonepath: /zones/zone1
    brand: solaris
    autoboot: false
    bootargs:
    file-mac-profile:
    pool:
    limitpriv:
    scheduling-class:
    ip-type: exclusive
    hostid:
    fs-allowed:
    anet:
        linkname: net0
    ...
    rootzpool:
        storage: dev:dsk/c7t3d0

Step 11) Attaching the zone automatically imports the zpool.

    root@sysB:~# zoneadm -z zone1 attach
    Imported zone zpool: zone1_rpool
    Progress being logged to /var/log/zones/zoneadm.20121022T184034Z.zone1.attach
        Installing: Using existing zone boot environment
          Zone BE root dataset: zone1_rpool/rpool/ROOT/solaris
                         Cache: Using /var/pkg/publisher.
      Updating non-global zone: Linking to image /.
    Processing linked: 1/1 done
      Updating non-global zone: Auditing packages.
    No updates necessary for this image.
      Updating non-global zone: Zone updated.
                        Result: Attach Succeeded.
    Log saved in non-global zone as /zones/zone1/root/var/log/zones/zoneadm.20121022T184034Z.zone1.attach
    root@sysB:~# zoneadm -z zone1 boot
    root@sysB:~# zlogin zone1
    [Connected to zone 'zone1' pts/2]
    Oracle Corporation      SunOS 5.11      11.1    September 2012

Step 12) Now let's migrate the zone back to sysA.
Create a file in zone1 so we can verify that it exists after we migrate the zone back, then begin migrating it back.

    root@zone1:~# ls /opt
    root@zone1:~# touch /opt/fileA
    root@zone1:~# ls -l /opt/fileA
    -rw-r--r--   1 root     root           0 Oct 22 14:47 /opt/fileA
    root@zone1:~# exit
    logout
    [Connection to zone 'zone1' pts/2 closed]
    root@sysB:~# zoneadm -z zone1 shutdown
    root@sysB:~# zoneadm -z zone1 detach
    Exported zone zpool: zone1_rpool
    root@sysB:~# init 0

Step 13) Back on sysA, check the status.

    Oracle Corporation      SunOS 5.11      11.1    September 2012
    root@sysA:~# zoneadm list -cv
      ID NAME     STATUS       PATH           BRAND    IP
       0 global   running      /              solaris  shared
       - zone1    configured   /zones/zone1   solaris  excl
    root@sysA:~# zpool list
    NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    rpool  17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -

Step 14) Re-attach the zone to sysA.

    root@sysA:~# zoneadm -z zone1 attach
    Imported zone zpool: zone1_rpool
    Progress being logged to /var/log/zones/zoneadm.20121022T190441Z.zone1.attach
        Installing: Using existing zone boot environment
          Zone BE root dataset: zone1_rpool/rpool/ROOT/solaris
                         Cache: Using /var/pkg/publisher.
      Updating non-global zone: Linking to image /.
    Processing linked: 1/1 done
      Updating non-global zone: Auditing packages.
    No updates necessary for this image.
      Updating non-global zone: Zone updated.
                        Result: Attach Succeeded.
    Log saved in non-global zone as /zones/zone1/root/var/log/zones/zoneadm.20121022T190441Z.zone1.attach
    root@sysA:~# zpool list
    NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    rpool        17.6G  11.2G  6.47G  63%  1.00x  ONLINE  -
    zone1_rpool  1.98G   491M  1.51G  24%  1.00x  ONLINE  -
    root@sysA:~# zoneadm -z zone1 boot
    root@sysA:~# zlogin zone1
    [Connected to zone 'zone1' pts/2]
    Oracle Corporation      SunOS 5.11      11.1    September 2012
    root@zone1:~# zpool list
    NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    rpool  1.98G   538M  1.46G  26%  1.00x  ONLINE  -

Step 15) Check for the file created earlier on sysB.

    root@zone1:~# ls -l /opt
    total 1
    -rw-r--r--   1 root     root           0 Oct 22 14:47 fileA

Next Steps

Here is a brief list of some of the fun things you can try next.
- Add space to the zone by adding a second storage device to the rootzpool. Make sure that you add it to the configurations of both zones!
- Create a new zone, specifying two disks in the rootzpool when you first configure the zone. When you install that zone, or clone it from another zone, zoneadm uses those two disks to create a mirrored pool. (Three disks will result in a three-way mirror, etc.)

Conclusion

Hopefully you have seen the ease with which you can now move Solaris Zones from one system to another.
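The ordering of the migration steps matters because the rootzpool can be imported by only one Solaris instance at a time. The Python sketch below is a toy model of that protocol, not Solaris code: the class and method names are hypothetical, and the states ("configured", "installed", "running") and import/export behavior simply mirror the zoneadm and zpool output shown in the steps.

```python
# Toy model of the ZOSS migration cycle walked through above. It is NOT
# Solaris code: the class and method names are hypothetical. States and
# pool behavior mirror the zoneadm/zpool output shown in the steps.


class SharedPool:
    """A rootzpool on shared storage; only one host may import it at a time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.imported_by = None  # hostname, or None when exported


class ZoneOnHost:
    """One host's view of the zone: configured, installed, or running."""

    def __init__(self, host: str, pool: SharedPool) -> None:
        self.host = host
        self.pool = pool
        self.state = "configured"  # after zonecfg, as in Steps 3 and 10

    def _require(self, expected: str) -> None:
        if self.state != expected:
            raise RuntimeError(f"{self.host}: zone is {self.state}, not {expected}")

    def _import_pool(self) -> None:
        owner = self.pool.imported_by
        if owner not in (None, self.host):
            raise RuntimeError(f"{self.pool.name} is in use by {owner}")
        self.pool.imported_by = self.host

    def install(self) -> None:   # Step 4: creates and imports zone1_rpool
        self._require("configured")
        self._import_pool()
        self.state = "installed"

    def attach(self) -> None:    # Steps 11 and 14: imports the existing pool
        self._require("configured")
        self._import_pool()
        self.state = "installed"

    def boot(self) -> None:
        self._require("installed")
        self.state = "running"

    def shutdown(self) -> None:
        self._require("running")
        self.state = "installed"

    def detach(self) -> None:    # Steps 8 and 12: exports the pool
        self._require("installed")
        self.pool.imported_by = None
        self.state = "configured"


pool = SharedPool("zone1_rpool")
on_a, on_b = ZoneOnHost("sysA", pool), ZoneOnHost("sysB", pool)
on_a.install()
on_a.boot()
try:
    on_b.attach()                # refused: sysA still has the pool imported
except RuntimeError as err:
    print(err)                   # zone1_rpool is in use by sysA
on_a.shutdown()
on_a.detach()
on_b.attach()
on_b.boot()
print(on_a.state, on_b.state, pool.imported_by)  # configured running sysB
```

The model makes the key rule explicit: a zone must be shut down and detached (which exports its pool) before another host can attach it.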


SPARC SuperCluster

SPARC SuperCluster Papers

Oracle has been publishing white papers that describe uses and characteristics of the SPARC SuperCluster product. Here are just a few:
- A Technical Overview of the Oracle SPARC SuperCluster T4-4: SPARC SuperCluster T4-4 is a high performance, multi-purpose engineered system that has been designed, tested and integrated to run a wide array of enterprise applications. It is well suited for multi-tier enterprise applications with Web, database and application components. This 20-page paper discusses the components and technical characteristics of this product.
- SPARC SuperCluster T4-4 Platform Security Principles and Capabilities: The security capabilities designed into the SPARC SuperCluster, and architectural, deployment, and operational best practices for taking advantage of them.
- Consolidating Oracle E-Business Suite on Oracle’s SPARC SuperCluster: This Oracle Optimized Solution describes, in 30 pages, the implementation and use of SPARC SuperCluster as a consolidation platform for E-Business Suite.
- Oracle Optimized Solution for Oracle PeopleSoft Human Capital Management on SPARC SuperCluster: The Oracle Optimized Solution for PeopleSoft Human Capital Management on SPARC SuperCluster is the industry's only proven, tested, applications-to-disk solution that maintains excellence managing absences, optimizing collaborative activities, streamlining knowledge and honing processes; 31 pages.

I hope you find some of those papers useful.



What's New in Oracle Solaris 11

Oracle Solaris 11 adds new features to the #1 Enterprise OS, Solaris 10. Some of these features were in "preview form" in Solaris 11 Express. The feature sets introduced there have been greatly expanded in order to make Solaris 11 ready for your data center. Also, new features have been added that were not in Solaris 11 Express in any form.

The list of features below is not exhaustive. Complete documentation about changes to Solaris will be made available. To learn more, register for the Solaris 11 launch. You can attend in person, in New York City, or via webcast.

Software management features designed for cloud computing

The new package management system is far easier to use than those of previous versions of Solaris.
- A completely new Solaris packaging system uses network-based repositories (located at our data centers or at yours) to modernize Solaris packaging.
- A new version of Live Upgrade minimizes service downtime during package updates. It also provides the ability to simply reboot to a previous version of the software if necessary - without resorting to backup tapes.
- The new Automated Installer (AI) replaces Solaris JumpStart and simplifies hands-off installation. AI also supports automatic creation of Solaris Zones.
- Distro Constructor creates Solaris binary images that can be installed over the network, or copied to physical media.
- The previous SVR4 (System V Release 4) packaging tools are included in Solaris 11 for installation of non-Solaris software packages.

All of this is integrated with ZFS. For example, the alternate boot environments (ABEs) created by the Live Upgrade tools are ZFS clones, minimizing the time to create them and the space they occupy.

Network virtualization and resource control features enable networks-in-a-box

Previewed in Solaris 11 Express, the network virtualization and resource control features in Oracle Solaris 11 enable you to create an entire network in a Solaris instance.
This can include virtual switches, virtual routers, integrated firewall and load-balancing software, IP tunnels, and more. I described the relevant concepts in an earlier blog entry.

In addition to the significant improvements in flexibility compared to a physical network, network performance typically improves. Instead of traversing multiple physical network components (NICs, cables, switches and routers), packet transfers are accomplished by in-memory loads and stores. Packet latency shrinks dramatically, and aggregate bandwidth is no longer limited by NICs, but by memory link bandwidth.

But mimicking a network wasn't enough. The Solaris 11 network resource controls provide the ability to dynamically control the amount of network bandwidth that a particular workload can use. Another blog entry described these controls. (Note that some of the details may have changed between the Solaris 11 Express version described in that entry and Solaris 11.)

Easy, efficient data management

Solaris 11 expands on the award-winning ZFS file system, adding encryption and deduplication. Multiple encryption algorithms are available and can make use of encryption features included in the CPU, such as the SPARC T3 and T4 CPUs. An in-kernel CIFS server was also added, and the data is stored in a ZFS dataset. Ease-of-use is still a high-priority goal. Enabling CIFS service is as simple as enabling a dataset property.

Improved built-in computer virtualization

Along with ZFS, Oracle Solaris Zones continues to be a core feature set in use at many data centers. (The use of the word "Zones" will be preferred over the use of "Containers" to reduce confusion.) These features are enhanced in Solaris 11. I will detail these enhancements in a future blog entry, but here is a quick summary:
- Greater flexibility for immutable zones - called "sparse-root zones" in Solaris 10. Multiple options are available in Solaris 11.
- A zone can be an NFS server!
- Administration of existing zones can be delegated to users in the global zone.
- zonestat(1) reports on resource consumption of zones. I blogged about the Solaris 11 Express version of this tool.
- A P2V "pre-flight" checker verifies that a Solaris 10 or Solaris 11 system is configured correctly for migration (P2V) into a zone on Solaris 11.
- To simplify the process of creating a zone, by default a zone gets a VNIC that is automatically configured on the most obvious physical NIC. Of course, you can manually configure a plethora of non-default network options.

Advanced protection

Long known as one of the most secure operating systems on the planet, Oracle Solaris 11 continues making advances, including:
- CPU-speed network encryption means no compromises
- Secure startup: by default, only the ssh service is enabled - a minimal attack surface reduces risk
- Restricted root: by default, 'root' is a role, not a user - all actions are logged or audited by username
- Anti-spoofing properties for data links
- ...and more.

As you can guess, we're looking forward to releasing Oracle Solaris 11! Its new features provide you with the tools to simplify enterprise computing. To learn more about Solaris 11, register for the Solaris 11 launch.


Solaris 10 Containers

Solaris Zones help achieve World Record Benchmark Result

Maximizing performance of multi-node workloads can be challenging. Should I maximize CPU clock rate, or RAM size per node, or network bandwidth? And how do I analyze performance of each component while also measuring aggregate throughput?

Solaris Zones provide characteristics that are useful for multi-node architectures:
- Architectural flexibility: easily remove a network bottleneck between two components by running both in zones on one Solaris server - and move any of them to different servers later as processing needs change
- Convenient, dynamic resource management: assign a workload to a set of CPUs for predictable, consistent performance, ensure that each workload component has access to sufficient hardware resources, etc.

These characteristics are displayed in the world record benchmark result for Oracle JD Edwards EnterpriseOne. It was achieved using Solaris Containers (Zones) to isolate individual software components, including the WebLogic-based applications and Web Tier Utilities. Solaris Zones features enabled software isolation and resource management, making the process of fine-tuning resource assignment very easy.

For more details, see:
- a descriptive blog entry
- the official announcement



Virtual Network - Part 4

Resource Controls

This is the fourth part of a series of blog entries about Solaris network virtualization. Part 1 introduced network virtualization, Part 2 discussed network resource management capabilities available in Solaris 11 Express, and Part 3 demonstrated the use of virtual NICs and virtual switches.

This entry shows the use of a bandwidth cap on Virtual Network Elements (VNEs). This form of network resource control can effectively limit the amount of bandwidth consumed by a particular stream of packets. In our context, we will restrict the amount of bandwidth that a zone can use.

As a reminder, we have the following network topology, with three zones and three VNICs, one VNIC per zone. All three VNICs were created on one Ethernet interface in Part 3 of this series.

Capping VNIC Bandwidth

Using a T2000 server in a lab environment, we can measure network throughput with the new dlstat(1) command. This command reports various statistics about data links, including the quantity of packets, bytes, interrupts, polls, drops, blocks, and other data.

Because I am trying to illustrate the use of commands, not optimize performance, the network workload will be a simple file transfer using ftp(1). This method of measuring network bandwidth is reasonable for this purpose, but says nothing about the performance of this platform. For example, this method reads data from a disk. Some of that data may be cached, but disk performance may impact the network bandwidth measured here.
However, we can still achieve the basic goal: demonstrating the effectiveness of a bandwidth cap.

With that background out of the way, first let's check the current status of our links.

    GZ# dladm show-link
    LINK        CLASS   MTU    STATE     BRIDGE   OVER
    e1000g0     phys    1500   up        --       --
    e1000g2     phys    1500   unknown   --       --
    e1000g1     phys    1500   down      --       --
    e1000g3     phys    1500   unknown   --       --
    emp_web1    vnic    1500   up        --       e1000g0
    emp_app1    vnic    1500   up        --       e1000g0
    emp_db1     vnic    1500   up        --       e1000g0
    GZ# dladm show-linkprop emp_app1
    LINK        PROPERTY         PERM  VALUE             DEFAULT    POSSIBLE
    emp_app1    autopush         rw    --                --         --
    emp_app1    zone             rw    emp-app           --         --
    emp_app1    state            r-    unknown           up         up,down
    emp_app1    mtu              rw    1500              1500       1500
    emp_app1    maxbw            rw    --                --         --
    emp_app1    cpus             rw    --                --         --
    emp_app1    cpus-effective   r-    1-9               --         --
    emp_app1    pool             rw    SUNWtmp_emp-app   --         --
    emp_app1    pool-effective   r-    SUNWtmp_emp-app   --         --
    emp_app1    priority         rw    high              high       low,medium,high
    emp_app1    tagmode          rw    vlanonly          vlanonly   normal,vlanonly
    emp_app1    protection       rw    --                --         mac-nospoof, restricted, ip-nospoof, dhcp-nospoof
    <some lines deleted>

Before setting any bandwidth caps, let's determine the transfer rates between a zone on this system and a remote system. It's easy to use dlstat to determine the data rate to my home system while transferring a file from a zone:

    GZ# dlstat -i 10 emp_app1
          LINK    IPKTS   RBYTES    OPKTS   OBYTES
      emp_app1   27.99M    2.11G   54.18M   77.34G
      emp_app1       83    6.72K        0        0
      emp_app1      339   23.73K    1.36K    1.68M
      emp_app1    1.79K  120.09K    6.78K    8.38M
      emp_app1    2.27K  153.60K    8.49K   10.50M
      emp_app1    2.35K  156.27K    8.88K   10.98M
      emp_app1    2.65K  182.81K    5.09K    6.30M
      emp_app1      600   44.10K      935    1.15M
      emp_app1      112    8.43K        0        0

The first row shows cumulative totals; after that, each row covers one 10-second sample, so the OBYTES column is simply the number of bytes transferred during that data sample. I'll ignore the 1.68MB and 1.15MB data points because the file transfer began and ended during those samples. The average of the other values leads to a bandwidth of 7.6 Mbps (megabits per second), which is typical for my broadband connection.

Let's pretend that we want to constrain the bandwidth consumed by that workload to 2 Mbps.
Perhaps we want to leave all of the rest for a higher-priority workload. Perhaps we're an ISP and charge for different levels of available bandwidth. Regardless of the situation, capping bandwidth is easy:

    GZ# dladm set-linkprop -p maxbw=2000k emp_app1
    GZ# dladm show-linkprop -p maxbw emp_app1
    LINK        PROPERTY   PERM  VALUE  DEFAULT  POSSIBLE
    emp_app1    maxbw      rw    2      --       --
    GZ# dlstat -i 20 emp_app1
          LINK    IPKTS   RBYTES    OPKTS   OBYTES
      emp_app1   18.21M    1.43G   10.22M   14.56G
      emp_app1      186   13.98K        0        0
      emp_app1      613   51.98K    1.09K    1.34M
      emp_app1    1.51K  107.85K    3.94K    4.87M
      emp_app1    1.88K  131.19K    3.12K    3.86M
      emp_app1    2.07K  143.17K    3.65K    4.51M
      emp_app1    1.84K  136.03K    3.03K    3.75M
      emp_app1    2.10K  145.69K    3.70K    4.57M
      emp_app1    2.24K  154.95K    3.89K    4.81M
      emp_app1    2.43K  166.01K    4.33K    5.35M
      emp_app1    2.48K  168.63K    4.29K    5.30M
      emp_app1    2.36K  164.55K    4.32K    5.34M
      emp_app1      519   42.91K      643  793.01K
      emp_app1      200   18.59K        0        0

Note that for dladm, the default unit for maxbw is Mbps. This run uses a 20-second interval; the average of the full samples is 1.97 Mbps.

Between zones, the uncapped data rate is higher:

    GZ# dladm reset-linkprop -p maxbw emp_app1
    GZ# dladm show-linkprop -p maxbw emp_app1
    LINK        PROPERTY   PERM  VALUE  DEFAULT  POSSIBLE
    emp_app1    maxbw      rw    --     --       --
    GZ# dlstat -i 20 emp_app1
          LINK    IPKTS   RBYTES    OPKTS   OBYTES
      emp_app1   20.80M    1.62G   23.36M   33.25G
      emp_app1      208   16.59K        0        0
      emp_app1   24.48K    1.63M  193.94K  277.50M
      emp_app1  265.68K   17.54M    2.05M    2.93G
      emp_app1  266.87K   17.62M    2.06M    2.94G
      emp_app1  255.78K   16.88M    1.98M    2.83G
      emp_app1  206.20K   13.62M    1.34M    1.92G
      emp_app1   18.87K    1.25M   79.81K  114.23M
      emp_app1      246   17.08K        0        0

This five-year-old T2000 can move at least 1.2 Gbps of data internally, but that took five simultaneous ftp sessions.
(A better measurement method, one that doesn't include the limits of disk drives, would yield better results, and newer systems, either x86 or SPARC, have higher internal bandwidth characteristics.) In any case, the maximum data rate is not interesting for our purpose, which is demonstration of the ability to cap that rate.

You can often resolve a network bottleneck, while maintaining workload isolation, by moving two separate workloads onto the same system, within separate zones. However, you might choose to limit their bandwidth consumption. Fortunately, the NV tools in Solaris 11 Express enable you to accomplish that:

GZ# dladm set-linkprop -t -p maxbw=25m emp_app1
GZ# dladm show-linkprop -p maxbw emp_app1
LINK         PROPERTY   PERM VALUE   DEFAULT   POSSIBLE
emp_app1     maxbw      rw   25      --        --

Note that the change to the bandwidth cap was made while the zone was running, potentially while network traffic was flowing. Also, changes made by dladm are persistent across reboots of Solaris unless you specify "-t" on the command line, as in this example.

Data moves much more slowly now:

GZ# dlstat -i 20 emp_app1
          LINK    IPKTS   RBYTES    OPKTS   OBYTES
      emp_app1   23.84M    1.82G   46.44M   66.28G
      emp_app1      192   16.10K        0        0
      emp_app1    1.15K   79.21K    5.77K    8.24M
      emp_app1   18.16K    1.20M   40.24K   57.60M
      emp_app1   17.99K    1.20M   39.46K   56.48M
      emp_app1   17.85K    1.19M   39.11K   55.97M
      emp_app1   17.39K    1.15M   38.16K   54.62M
      emp_app1   18.02K    1.19M   39.53K   56.58M
      emp_app1   18.66K    1.24M   39.60K   56.68M
      emp_app1   18.56K    1.23M   39.24K   56.17M
<many lines deleted>

The data show an aggregate bandwidth of about 24 Mbps, just under the 25 Mbps cap.

Conclusion

The network virtualization tools in Solaris 11 Express include various resource controls.
The simplest of these is the bandwidth cap, which you can use to effectively limit the amount of bandwidth that a workload can consume. Both physical NICs and virtual NICs may be capped by using this simple method. This also applies to workloads that are in Solaris Zones: both default zones and Solaris 10 Zones, which mimic Solaris 10 systems.

Next time we'll explore some other virtual network architectures.
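The averages quoted in this entry (7.6 Mbps, 1.97 Mbps and about 24 Mbps) come from the OBYTES column of the dlstat samples, after dropping the cumulative first row and the partial samples in which a transfer began or ended. The Python sketch below reproduces that arithmetic. It assumes that dlstat's K, M and G suffixes are 1024-based multiples, an assumption that happens to match the figures above; it is an illustration, not a tool that ships with Solaris.

```python
# Sketch: average output bandwidth from dlstat OBYTES samples.
# Assumption: dlstat's K/M/G suffixes are 1024-based multiples; this
# reproduces the 7.6, 1.97 and ~24 Mbps figures quoted in the entry.

MULTIPLIER = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value: str) -> float:
    """Convert a dlstat byte count such as '10.50M' or '0' to bytes."""
    suffix = value[-1]
    if suffix in MULTIPLIER:
        return float(value[:-1]) * MULTIPLIER[suffix]
    return float(value)


def average_mbps(obytes_samples, interval_s: int) -> float:
    """Average output rate in megabits per second (10**6 bits/s).

    Pass only full samples: omit the cumulative first row and the
    samples in which the transfer began or ended.
    """
    total_bits = 8 * sum(to_bytes(v) for v in obytes_samples)
    return total_bits / (len(obytes_samples) * interval_s) / 1e6


uncapped = ["8.38M", "10.50M", "10.98M", "6.30M"]            # dlstat -i 10
capped_2m = ["4.87M", "3.86M", "4.51M", "3.75M", "4.57M",
             "4.81M", "5.35M", "5.30M", "5.34M"]             # dlstat -i 20
capped_25m = ["57.60M", "56.48M", "55.97M", "54.62M",
              "56.58M", "56.68M", "56.17M"]                  # dlstat -i 20

print(f"uncapped:  {average_mbps(uncapped, 10):.2f} Mbps")   # uncapped:  7.58 Mbps
print(f"maxbw=2m:  {average_mbps(capped_2m, 20):.2f} Mbps")  # maxbw=2m:  1.97 Mbps
print(f"maxbw=25m: {average_mbps(capped_25m, 20):.2f} Mbps") # maxbw=25m: 23.61 Mbps
```

The 25 Mbps case comes out at roughly 23.6 Mbps, just under the cap, which the text rounds to 24 Mbps.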

Resource Controls This is the fourth part of a series of blog entries about Solaris network virtualization.Part 1 introduced network virtualization,Part 2discussed network resource management...

Solaris 11

Virtual Network - Part 3

This is the third in a series of blog entries that discuss the network virtualization features in Oracle Solaris 11 Express. Part 1 introduced the concept of network virtualization and listed the basic virtual network elements that Solaris 11 Express (S11E) provides. Part 2 expanded on the concepts and discussed the resource management features which can be applied to those virtual network elements (VNEs).

This blog entry assumes that you have some experience with Solaris Zones. If you don't, you can read my earlier blog entries, buy the book "Oracle Solaris 10 System Virtualization Essentials", or read the documentation.

This entry will demonstrate the creation of some of these VNEs. For today's examples, I will use an old Sun Fire T2000 that has one SPARC CMT (T1) chip and 32GB RAM. I will pretend that I am implementing a 3-tier architecture in this one system, where each tier is represented by one Solaris zone. The mythical example provides access to an employee database. The 3-tier service is named 'emp' and VNEs will use 'emp' in their names to reduce confusion regarding the dozens of VNEs we expect to create for the services this system will deliver.

The commands shown below use the prompt "GZ#" to indicate that the command is entered in the global zone by someone with sufficient privileges. Similarly, the prompt "emp-web1#" indicates a command which is entered in the zone "emp-web1" as a sufficiently privileged user.

Fortunately, Solaris network engineers gathered all of the actions regarding the management of network elements (virtual or physical) into one command: dladm(1M). You use dladm to create, destroy, and configure datalinks such as VNICs. You can also use it to list physical NICs:

GZ# dladm show-link
LINK         CLASS    MTU    STATE     BRIDGE   OVER
e1000g0      phys     1500   up        --       --
e1000g2      phys     1500   unknown   --       --
e1000g1      phys     1500   down      --       --
e1000g3      phys     1500   unknown   --       --

We need three VNICs for our three zones, one VNIC per zone.
They will also have useful names, one for each of the tiers, and will share e1000g0:

GZ# dladm create-vnic -l e1000g0 emp_web1
GZ# dladm create-vnic -l e1000g0 emp_app1
GZ# dladm create-vnic -l e1000g0 emp_db1
GZ# dladm show-link
LINK         CLASS    MTU    STATE     BRIDGE   OVER
e1000g0      phys     1500   up        --       --
e1000g2      phys     1500   unknown   --       --
e1000g1      phys     1500   down      --       --
e1000g3      phys     1500   unknown   --       --
emp_web1     vnic     1500   up        --       e1000g0
emp_app1     vnic     1500   up        --       e1000g0
emp_db1      vnic     1500   up        --       e1000g0
GZ# dladm show-vnic
LINK         OVER     SPEED  MACADDRESS        MACADDRTYPE   VID
emp_web1     e1000g0  0      2:8:20:3a:43:c8   random        0
emp_app1     e1000g0  0      2:8:20:36:a1:17   random        0
emp_db1      e1000g0  0      2:8:20:b4:5b:d3   random        0

The system has four NICs and three VNICs. Note that the name of a VNIC may not include a hyphen (-) but may include an underscore (_).

VNICs that share a NIC appear to be attached together via a virtual switch. That vSwitch is created automatically by Solaris. This diagram represents the NIC and VNEs we have created.

Now that these datalinks (the VNICs) exist, we can assign them to our zones. I'll assume that the zones already exist, and just need network assignment.

GZ# zonecfg -z emp-web1 info
zonename: emp-web1
zonepath: /zones/emp-web1
brand: ipkg
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
hostid:
fs-allowed:
GZ# zonecfg -z emp-web1
zonecfg:emp-web1> add net
zonecfg:emp-web1:net> set physical=emp_web1
zonecfg:emp-web1:net> end
zonecfg:emp-web1> exit

Those steps can be followed for the other two zones and matching VNICs. After those steps are completed, our earlier diagram would look like this:

Packets passing from one zone to another within a Solaris instance do not leave the computer if they are in the same subnet and their VNICs are built on the same datalink (here, e1000g0). This greatly improves network bandwidth and latency.
Otherwise, the packets will head for the zone's default router. Therefore, in the above diagram, packets sent from emp-web1 destined for emp-app1 would traverse the virtual switch, but not pass through e1000g0.

This zone is an "exclusive-IP" zone, meaning that it "owns" its own networking. What is its view of networking? That's easy to determine. The zlogin(1M) command inserts a complete command line into the zone. By default, the command is run as the root user.

GZ# zoneadm -z emp-web1 boot
GZ# zlogin emp-web1 dladm show-link
LINK         CLASS    MTU    STATE     BRIDGE   OVER
emp_web1     vnic     1500   up        --       ?
GZ# zlogin emp-web1 dladm show-vnic
LINK         OVER     SPEED  MACADDRESS        MACADDRTYPE   VID
emp_web1     ?        0      2:8:20:3a:43:c8   random        0

Notice that the zone sees its own VNEs, but cannot see NEs or VNEs in the global zone, or in any other zone.

The other important new networking command in Solaris 11 Express is ipadm(1M). That command creates IP address assignments, enables and disables them, displays IP address configuration information, and performs other actions. The following example shows the global zone's view before configuring IP in the zone:

GZ# ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
e1000g0    ok       bm--------4- ---
GZ# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4 static ok static ok static ok static ok static ok static ok ::1/128
lo0/? static ok ::1/128
lo0/? static ok ::1/128
lo0/? static ok ::1/128

At this point, the zone knows it has a datalink (which we saw above), but no IP interface has been created on that datalink yet. The next example shows this:

GZ# zlogin emp-web1 ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
GZ# zlogin emp-web1 ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4 static ok static ok ::1/128

An Ethernet datalink without an IP address isn't very useful, so let's configure an IP interface and apply an IP address to it:

GZ# zlogin emp-web1 ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
GZ# zlogin emp-web1 ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4 static ok static ok ::1/128
GZ# zlogin emp-web1 ipadm create-if emp_web1
GZ# zlogin emp-web1 ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
emp_web1   down     bm--------46 -46
GZ# zlogin emp-web1 ipadm create-addr -T static -a local= emp_web1/v4static
GZ# zlogin emp-web1 ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4 static ok static ok static ok ::1/128
GZ# zlogin emp-web1 ifconfig emp_web1
emp_web1: flags=1000843 mtu 1500 index 2
        inet netmask ffffff00 broadcast
        ether 2:8:20:3a:43:c8

The last command above shows the "old" way of displaying IP address configuration. The command ifconfig(1M) is still there, but the new tools dladm and ipadm provide a more consistent interface, with well-defined separation between datalink management and IP management.

Of course, if you want the zone's outbound packets to be routed to other networks, you must use the route(1M) command, the /etc/defaultrouter file, or both.

Next time, I'll show a new network measurement tool and the ability to control the amount of network bandwidth consumed.



Virtual Networks - Part 2

This is the second in a series of blog entries that discuss the network virtualization features in Solaris 11 Express. The first entry discussed the basic concepts and the virtual network elements, including virtual NICs, VLANs, virtual switches, and InfiniBand datalinks.

This entry adds to that list the resource controls and security features that are necessary for a well-managed virtual network.

Virtual Networks, Real Resource Controls

In Oracle Solaris 11 Express, there are four main datalink resource controls: a bandwidth cap, which limits the amount of traffic passing through a datalink in a small amount of elapsed time; assignment of packet processing tasks to a subset of the system's CPUs; flows, which were introduced in the previous blog post; and rings, which are hardware or software resources that can be dedicated to a single purpose.

Let's take them one at a time. By default, datalinks such as VNICs can consume as much of the physical NIC's bandwidth as they want. That might be the desired behavior, but if it isn't, you can apply the property "maxbw" to a datalink. The maximum permitted bandwidth can be specified in Kbps, Mbps or Gbps. This value can be changed dynamically, so if you set it too low, you can change it without disrupting the traffic flowing over that link. Solaris will not allow traffic to flow over that datalink at a rate faster than you specify.

You can "over-subscribe" this bandwidth cap: the sum of the bandwidth caps on the VNICs assigned to a NIC can exceed the rated bandwidth of the NIC. If that happens, the bandwidth caps become less effective, because the VNICs together can saturate the NIC before any one of them reaches its cap.

In addition to the bandwidth cap, packet processing computation can be constrained to the CPUs associated with a workload. First some background.
When Solaris boots, it assigns interrupt handler threads to the CPUs in the system. (See Solaris CPUs for an explanation of the meaning of "CPU".) Solaris attempts to spread the interrupt handlers out evenly so that one CPU does not become a bottleneck for interrupt handling.

If you create non-default CPU pools, the interrupt handlers will retain their CPU assignments. One unintended side effect of this is a situation where the CPUs intended for one workload will be handling interrupts caused by another workload. This can occur even with simple configurations of Solaris Zones. In extreme cases, network packet processing for one zone can severely impact the performance of another zone.

To prevent this behavior, Solaris 11 Express offers the ability to assign a datalink's interrupt handler to a set of CPUs or a pool of CPUs. To simplify this further, the obvious choice is made for you, by default, for a zone which is assigned its own resource pool. When such a zone boots, a resource pool is created for the zone, a sufficient quantity of CPUs is moved from the default pool to the zone's pool, and interrupt handlers for that zone's datalink(s) are automatically reassigned to that resource pool.

Network flows enable you to create multiple lanes of traffic. This allows the parallelization of network traffic. You can assign a bandwidth cap to a flow. Flows were introduced in the previous post and will be discussed further in future posts.

Finally, the newest high-speed NICs support hardware rings: memory resources that can be dedicated to a particular set of network traffic. For inbound packets, this is the first resource control that separates network traffic based on packet information such as destination MAC address. By assigning one or more rings to a stream of traffic, you can commit sufficient hardware resources to it and ensure a greater relative priority for those packets, even if another stream of traffic on the same NIC would otherwise cause congestion and impact packet latency of all streams.

If you are using a NIC that does not support hardware rings, Solaris 11 Express supports software rings, which have a similar effect.

Virtual Networks, Real Security

In addition to resource controls, Solaris 11 Express offers datalink protection controls. These controls are intended to prevent a user from creating improper packets that would cause mischief on the network. The mac-nospoof property requires that outgoing packets have a MAC address which matches the link's MAC address. The ip-nospoof property implements a similar restriction, but for IP addresses. The dhcp-nospoof property prevents improper DHCP assignment.

Summary (so far)

The network virtualization features in Solaris 11 Express enable the creation of virtual network devices, leading to the implementation of an entire network inside one Solaris system. Associated resource control features give you the ability to manage network bandwidth as a resource and reduce the potential for one workload to cause network performance problems for another workload. Finally, security features help you minimize the impact of an intruder.

With all of the introduction out of the way, next time I'll show some actual uses of these concepts.


Solaris 11

Virtual Networks

Network virtualization is one of the industry's hot topics. The potential to reduce cost while increasing network flexibility easily justifies the investment in time to understand the possibilities. This blog entry describes network virtualization and some concepts. Future entries will show the steps to create a virtual network.

Introduction to Network Virtualization

Network virtualization can be described as the process of creating a computer network whose topology does not match the topology of the underlying physical network. Usually this is achieved by using software tools of general-purpose computers or by using features of network hardware.

A defining characteristic of a virtual network is the ability to re-configure the topology without manipulating any physical objects: devices or cables. Such a virtual network mimics a physical network. Some types of virtual networks, for example virtual LANs (VLANs), can be implemented using features of network switches and computers. However, some other implementations do not require traditional network hardware such as routers and switches. In those implementations, all of the functionality of network hardware has been re-implemented in software, perhaps in the operating system.

Benefits of network virtualization (NV) include increased architectural flexibility, better bandwidth and latency characteristics, the ability to prioritize network traffic to meet desired performance goals, and lower cost from fewer devices, reduced total power consumption, etc.

The remainder of this blog entry will focus on a software-only implementation of NV.

A few years ago, networking engineers at Sun began working on a software project named "Crossbow." The goal was to create a comprehensive set of NV features within Solaris. Just like Solaris Zones, Crossbow would provide integrated features for creation and monitoring of general purpose virtual network elements that could be deployed in limitless configurations.
Because these features are integrated into the operating system, they automatically take advantage of (and smoothly interoperate with) existing features. This is most noticeable in the integration of Solaris NV features and Solaris Zones. Also, because these NV features are a part of Solaris, future Solaris enhancements will be integrated with Solaris NV where appropriate.

The core NV features were first released in OpenSolaris 2009.06. Since then, those core features have matured and more details have been added. The result is the ability to re-implement entire networks as virtual networks using Solaris 11 Express. Here is an example of a virtual network architecture:

As you can guess from that example, you can create virtually :-) any network topology as a virtual network...

Oracle Solaris NV does more than is described here. This content focuses on the key features which might be used to consolidate workloads or entire networks into a Solaris system, using zones and NV features.

Virtual Network Elements

Solaris 11 Express implements the following virtual network elements.

NIC: OK, this isn't a virtual element, it's just on the list as a starting point. For a very long time, Solaris has managed Network Interface Cards (NICs). Solaris offers tools to manage NICs, including bringing them up and down, and assigning various characteristics to them, such as IP addresses, assignment to IP Multipathing (IPMP) groups, etc. Note that up through Solaris 10, most of those configuration tasks were accomplished with the ifconfig(1M) command, but in Solaris 11 Express the dladm(1M) and ipadm(1M) commands perform those tasks, and a few more. You can monitor the use of NICs with dlstat(1M). The term "datalink" is now used consistently to refer to NICs and things like NICs, such as...

A VNIC is a pseudo interface created on a datalink (a NIC or an etherstub, described next). Each VNIC has its own MAC address, which can be generated automatically but can also be specified manually.
For almost all purposes, a VNIC can be managed like a NIC. The dladm command creates, lists, deletes, and modifies VNICs. The dlstat command displays statistics about VNICs. The ipadm(1M) command configures IP interfaces on VNICs.

Like NICs, VNICs have a number of properties that can be modified with dladm. These include the ability to force network processing of a VNIC to a certain set of CPUs, setting a cap (maximum) on permitted bandwidth for a VNIC, the relative priority of this VNIC versus other VNICs on the same NIC, and other properties.

Etherstubs are pseudo NICs, making internal networks possible. For a general understanding, think of them as virtual switches. The command dladm manages etherstubs.

A flow is a stream of packets that share particular attributes such as source IP address or TCP port number. Once defined, a flow can be managed as an entity, including capping bandwidth usage, setting relative priorities, etc. The new flowadm(1M) command enables you to create and manage flows. Even if you don't set resource controls, flows will benefit from dedicated kernel resources and more predictable, consistent performance. Further, you can directly observe detailed statistics on each flow, improving your ability to understand these streams of packets and set proper resource controls. Flows are monitored with flowstat(1M).

VLANs (Virtual LANs) have been around for a long time. For consistency, the commands dladm, dlstat and ipadm now manage VLANs.

InfiniBand partitions are virtual networks that use an InfiniBand fabric. They are managed with the same commands as VNICs and VLANs: dladm, dlstat, ipadm and others.

Summary

Solaris 11 Express provides a complete set of virtual network components which can be used to deploy virtual networks within a Solaris instance. The next blog entry will describe network resource management and security. Future entries will provide some examples.
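To make the idea of a flow concrete, here is a conceptual Python sketch that groups packets by shared attributes (source IP address, protocol, destination port) and totals the bytes in each flow, the kind of per-flow accounting that flowstat reports. It models the concept only: it is not how flowadm(1M) or the Solaris kernel classifies packets, and the flow definitions, field names and packets are hypothetical.

```python
# Conceptual sketch of a flow: packets that share attributes such as a
# source IP address or TCP port are grouped and accounted for together.
# This is not how flowadm(1M) or the Solaris kernel classifies packets;
# the flow names, fields and packets below are hypothetical.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    protocol: str  # e.g. "tcp" or "udp"
    dst_port: int
    size: int      # bytes


@dataclass(frozen=True)
class Flow:
    name: str
    src_ip: Optional[str] = None    # None means "any value"
    protocol: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        return ((self.src_ip is None or pkt.src_ip == self.src_ip)
                and (self.protocol is None or pkt.protocol == self.protocol)
                and (self.dst_port is None or pkt.dst_port == self.dst_port))


def bytes_per_flow(packets, flows):
    """Assign each packet to the first matching flow and total its bytes."""
    totals = defaultdict(int)
    for pkt in packets:
        name = next((f.name for f in flows if f.matches(pkt)), "(default)")
        totals[name] += pkt.size
    return dict(totals)


flows = [Flow("http", protocol="tcp", dst_port=80),
         Flow("from-admin", src_ip="192.0.2.50")]
packets = [Packet("192.0.2.10", "tcp", 80, 1500),
           Packet("192.0.2.11", "tcp", 80, 900),
           Packet("192.0.2.50", "tcp", 22, 200),
           Packet("192.0.2.12", "udp", 53, 120)]
print(bytes_per_flow(packets, flows))
# {'http': 2400, 'from-admin': 200, '(default)': 120}
```

Once packets are grouped this way, a per-flow bandwidth cap or priority is simply a policy applied to each group rather than to the whole datalink.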


Solaris 11

All New Zonestat - Part 2

Part 2

Recently I introduced zonestat(1), a new command offered in Solaris 11 Express that replaces the zonestat Perl script that I had open-sourced a couple of years ago. Today I will complete the description of the new zonestat.

Fair Share Scheduler

One of the many useful resource controls offered by Solaris is the Fair Share Scheduler (FSS(7)). You can use FSS to tell Solaris "make sure that zoneXYZ can use a specific portion of the compute capacity of a set of 'Solaris CPUs' at a minimum." (For more information on FSS, see its man page and the Solaris 10 or Solaris 11 Express documentation.) FSS only enforces those minima if there is CPU contention.

That terse explanation used the phrase "of a set of CPUs" because the minimum portions of compute capacity enforced by Solaris can be calculated across all of the CPUs in the system, or across a set of CPUs that you have configured into a processor set. For my purpose here, the point is that you can create a resource pool, including a processor set, assign multiple zones to that pool, and tell Solaris to use the FSS scheduling algorithm for the processes to be scheduled on those CPUs. (See the libpool(3LIB) man page for more information.)

Zonestat will show information relating to FSS, including data that answers the question "is there CPU contention in any of my psets, and if there is, which zone(s) are using more than I expected?"

The next example uses two new zones, and assumes that most of the zones used earlier have been turned off for now. A dynamic resource pool, sharedDB, has been created. It will be the set of CPUs used by two zones which run database software. (This method is called "capped Containers" in Oracle licensing documents and is considered hard partitioning. It can be used to limit software license costs.) One of those zones, zoneDB-2, is more important than the other, zoneDB-1. To meet its SLA, zoneDB-2 must always be able to use at least 4 of the 6 CPUs in that pset.

# zonestat -r psets 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PROCESSOR_SET                   TYPE  ONLINE/CPUS    MIN/MAX
pset_default            default-pset        22/22        1/-
                  ZONE  USED    PCT   CAP  %CAP  SHRS  %SHR %SHRU
               [total]  0.12  0.56%     -     -     -     -     -
              [system]  0.02  0.12%     -     -     -     -     -
                global  0.09  0.44%     -     -     -     -     -
PROCESSOR_SET                   TYPE  ONLINE/CPUS    MIN/MAX
sharedDB                   pool-pset          6/6        6/6
                  ZONE  USED    PCT   CAP  %CAP  SHRS  %SHR %SHRU
               [total]  4.01  66.8%     -     -   300     -     -
              [system]  0.00  0.00%     -     -     -     -     -
              zoneDB-2  3.00  50.1%     -     -   200 66.6% 75.1%
              zoneDB-1  1.00  16.7%     -     -   100 33.3% 50.2%
PROCESSOR_SET                   TYPE  ONLINE/CPUS    MIN/MAX
zoneA                  dedicated-cpu          4/4        4/4
                  ZONE  USED    PCT   CAP  %CAP  SHRS  %SHR %SHRU
               [total]  0.00  0.09%     -     -     -     -     -
              [system]  0.00  0.00%     -     -     -     -     -
                 zoneA  0.00  0.09%     -     -     -     -     -

In the output above, zoneDB-1 is using the equivalent processing capacity of 1 Solaris CPU, and zoneDB-2 is using the equivalent of 3 Solaris CPUs. The PCT column indicates that zoneDB-2 is using 50% (3 out of 6) of the CPU capacity of the entire pset.

In addition, the SHRS column shows the number of FSS shares assigned to each of those zones, and %SHR is that zone's proportion of the total number of shares. In other words, zoneDB-2 is using 50% of the pset, but hasn't even used its enforced minimum of 66.6%.

This FSS configuration ensures that 200/300ths of 6 CPUs (i.e. 4 CPUs) are available to zoneDB-2. The %SHRU value of 75% tells us that the zone is using 75% of those 4 CPUs.

Again, each of those two zones is allowed to use more than its share, as long as each zone can use its specified minimum.

Sorting the Output

You may have noticed in earlier examples that the zones were not listed in alphabetical order. By default, they are sorted by the amount of the resource that zonestat was reporting. If a resource is not specified, the output is sorted on CPU%.
The two rows showing total and system data are always listed first.

You can change the sort order with the -S option, for example:

GZ$ zonestat -r processes -S used 2 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:02
PROCESSES                    SYSTEM LIMIT
system-limit                         292K
                  ZONE  USED    PCT   CAP  %CAP
               [total]   110  0.36%     -     -
              [system]     0  0.00%     -     -
                global    61  0.20%     -     -
                 zoneD    26  0.08%     -     -
                 zoneA    23  0.07%     -     -

As the man page for zonestat(1) shows, many other resources can be monitored. I will not review each of them here.

Aggregated Data

But wait! There's more! ;-)

In addition to understanding resource usage over a short interval (e.g. 10 or 60 seconds), it is often necessary to understand the peak usage or average usage over a longer period of time. For that purpose, zonestat provides Summary Reports.

A simple summary report is one that is appended to the per-sample data shown earlier. Here is the output for two samples and one summary report:

GZ$ zonestat -r processes -R high -S used 10 2
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PROCESSES                    SYSTEM LIMIT
system-limit                         292K
                  ZONE  USED    PCT   CAP  %CAP
               [total]   109  0.36%     -     -
              [system]     0  0.00%     -     -
                global    61  0.20%     -     -
                 zoneD    25  0.08%     -     -
                 zoneA    23  0.07%     -     -
Interval: 2, Duration: 0:00:20
PROCESSES                    SYSTEM LIMIT
system-limit                         292K
                  ZONE  USED    PCT   CAP  %CAP
               [total]   109  0.36%     -     -
              [system]     0  0.00%     -     -
                global    61  0.20%     -     -
                 zoneD    25  0.08%     -     -
                 zoneA    23  0.07%     -     -
Report: High Usage
    Start: Thu Dec 2 21:58:54 EST 2010
      End: Thu Dec 2 21:59:14 EST 2010
    Intervals: 2, Duration: 0:00:20
PROCESSES                    SYSTEM LIMIT
system-limit                         292K
                  ZONE  USED    PCT   CAP  %CAP
               [total]   109  0.36%     -     -
              [system]     0  0.00%     -     -
                global    61  0.20%     -     -
                 zoneD    25  0.08%     -     -
                 zoneA    23  0.07%     -     -

If you only need the summary report, -q will be useful: it suppresses the individual samples of data.

GZ$ zonestat -q -r physical-memory -R high -S used 10 2
Report: High Usage
    Start: Thu Dec 2 22:03:54 EST 2010
      End: Thu Dec 2 22:04:14 EST 2010
    Intervals: 2, Duration: 0:00:20
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                  ZONE  USED    PCT   CAP  %CAP
               [total] 5205M  15.9%     -     -
              [system] 2790M  8.54%     -     -
                 zoneD 2229M  6.83%     -     -
                global  141M  0.43%     -     -
                 zoneA 44.5M  0.13%     -     -

Let's assume that, from the data above, we determine that zoneD will never need more than 3 GB of RAM when it is operating correctly. We can add a RAM cap so that Solaris enforces that limit in case something goes awry:

GZ$ pfexec rcapadm -z zoneD -m 3g
GZ$ zonestat -q -r physical-memory -R high -S used 10 2
Report: High Usage
    Start: Thu Dec 2 22:37:15 EST 2010
      End: Thu Dec 2 22:37:35 EST 2010
    Intervals: 2, Duration: 0:00:20
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                  ZONE  USED    PCT   CAP  %CAP
               [total] 5204M  15.9%     -     -
              [system] 2789M  8.54%     -     -
                 zoneD 2229M  6.83% 3072M 72.5%
                global  141M  0.43%     -     -
                 zoneA 44.5M  0.13%     -     -

In addition to a summary report at the end of the output, zonestat is able to generate periodic summary reports based on the collection of individual samples. For example, you might want to know what the peak memory usage is for each zone, per hour.

(A quick side note: zonestat does not perform continuous data collection. It collects data at an interval you specify. Therefore, the peak values reported by zonestat are the peak values of the values which were collected. In other words, zonestat reports the peak observed values.)

The example below collects data every 10 seconds for 24 hours.
It reports the peak observed values every hour.

GZ$ zonestat -q -r physical-memory -R high 10 24h 60m
Report: High Usage
    Start: Mon Dec 5 16:42:01 EST 2010
      End: Mon Dec 5 17:42:01 EST 2010
    Intervals: 360, Duration: 1:00:00
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                  ZONE  USED    PCT   CAP  %CAP
               [total] 3015M  9.23%     -     -
              [system] 2791M  8.55%     -     -
                global  136M  0.41%     -     -
                 zoneA 44.5M  0.13%     -     -
                 zoneD 45.7M  0.13% 3072M 1.48%
Report: High Usage
    Start: Mon Dec 5 17:42:01 EST 2010
      End: Mon Dec 5 18:42:01 EST 2010
    Intervals: 10, Duration: 1:00:00
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                  ZONE  USED    PCT   CAP  %CAP
               [total] 3015M  9.23%     -     -
              [system] 2791M  8.55%     -     -
                global  136M  0.41%     -     -
                 zoneA 44.5M  0.13%     -     -
                 zoneD 64.3M  0.19% 3072M 2.09%
Report: High Usage
    Start: Mon Dec 5 18:42:01 EST 2010
      End: Mon Dec 5 19:42:01 EST 2010
    Intervals: 15, Duration: 1:00:00
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                  ZONE  USED    PCT   CAP  %CAP
               [total] 3015M  9.23%     -     -
              [system] 2791M  8.55%     -     -
                global  136M  0.41%     -     -
                 zoneA 44.5M  0.13%     -     -
                 zoneD 65.1M  0.19% 3072M 2.11%
[further output deleted]

Parseable Data

At this point (especially after reading the sub-title of this section...) you might think that zonestat should have an option to generate output which is easy to parse.
And you won't be disappointed: CSV (colon-separated-value)output is the result of the lower-case -p option:GZ$ zonestat -p -q -r physical-memory -R high -S used 10 2report-high:header:20101203T034655Z:20101203T034715Z:10:20report-high:physical-memory:mem_default:[resource]:33423360Kreport-high:physical-memory:mem_default:[total]:5329836K:15.94%:-:-report-high:physical-memory:mem_default:[system]:2856556K:8.54%:-:-report-high:physical-memory:mem_default:zoneD:2282968K:6.83%:3145728K:72.57%report-high:physical-memory:mem_default:global:144608K:0.43%:-:-report-high:physical-memory:mem_default:zoneA:45576K:0.13%:-:-report-high:footer:20101203T034715Z10:20Even a small number of options can generate a great deal of output:GZ$ zonestat -p -r psets -R high 10 2interval:header:since-last-interval:20101203T035201Z:1:10interval:processor-set:default-pset:pset_default:[resource]:24:24:1:-:0-04-03.01interval:processor-set:default-pset:pset_default:[total]:0.09:0.38%:-:-:-:-:-:0-00-00.92interval:processor-set:default-pset:pset_default:[system]:0.01:0.06%:-:-:-:-:-:0-00-00.15interval:processor-set:default-pset:pset_default:global:0.07:0.31%:-:-:-:-:-:0-00-00.77interval:processor-set:dedicated-cpu:zoneD:[resource]:4:4:4:4:0-00-40.50interval:processor-set:dedicated-cpu:zoneD:[total]:1.00:25.01%:-:-:-:-:-:0-00-10.13interval:processor-set:dedicated-cpu:zoneD:[system]:0.00:0.05%:-:-:-:-:-:0-00-00.02interval:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.95%:-:-:-:-:-:0-00-10.10interval:processor-set:dedicated-cpu:zoneA:[resource]:4:4:4:4:0-00-40.50interval:processor-set:dedicated-cpu:zoneA:[total]:0.00:0.01%:-:-:-:-:-:0-00-00.00interval:processor-set:dedicated-cpu:zoneA:[system]:0.00:0.00%:-:-:-:-:-:0-00-00.00interval:processor-set:dedicated-cpu:zoneA:zoneA:0.00:0.01%:-:-:-:-:-:0-00-00.00interval:footer:20101203T035201Z10:10interval:header:since-last-interval:20101203T035211Z:2:20interval:processor-set:default-pset:pset_default:[resource]:24:24:1:-:0-04-03.08interval:processor-set:def
ault-pset:pset_default:[total]:0.07:0.32%:-:-:-:-:-:0-00-00.79
interval:processor-set:default-pset:pset_default:[system]:0.01:0.06%:-:-:-:-:-:0-00-00.15
interval:processor-set:default-pset:pset_default:global:0.06:0.26%:-:-:-:-:-:0-00-00.63
interval:processor-set:dedicated-cpu:zoneD:[resource]:4:4:4:4:0-00-40.51
interval:processor-set:dedicated-cpu:zoneD:[total]:1.00:25.01%:-:-:-:-:-:0-00-10.13
interval:processor-set:dedicated-cpu:zoneD:[system]:0.00:0.12%:-:-:-:-:-:0-00-00.04
interval:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.89%:-:-:-:-:-:0-00-10.08
interval:processor-set:dedicated-cpu:zoneA:[resource]:4:4:4:4:0-00-40.51
interval:processor-set:dedicated-cpu:zoneA:[total]:0.00:0.01%:-:-:-:-:-:0-00-00.00
interval:processor-set:dedicated-cpu:zoneA:[system]:0.00:0.00%:-:-:-:-:-:0-00-00.00
interval:processor-set:dedicated-cpu:zoneA:zoneA:0.00:0.01%:-:-:-:-:-:0-00-00.00
interval:footer:20101203T035211Z10:20
report-high:header:20101203T035151Z:20101203T035211Z:10:20
report-high:processor-set:default-pset:pset_default:[resource]:24:24:1:-:0-04-03.08
report-high:processor-set:default-pset:pset_default:[total]:0.09:0.38%:-:-:-:-:-:0-00-00.92
report-high:processor-set:default-pset:pset_default:[system]:0.01:0.06%:-:-:-:-:-:0-00-00.15
report-high:processor-set:default-pset:pset_default:global:0.07:0.31%:-:-:-:-:-:0-00-00.77
report-high:processor-set:dedicated-cpu:zoneD:[resource]:4:4:4:4:0-00-40.51
report-high:processor-set:dedicated-cpu:zoneD:[total]:1.00:25.07%:-:-:-:-:-:0-00-10.15
report-high:processor-set:dedicated-cpu:zoneD:[system]:0.00:0.12%:-:-:-:-:-:0-00-00.04
report-high:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.95%:-:-:-:-:-:0-00-10.10
report-high:processor-set:dedicated-cpu:zoneA:[resource]:4:4:4:4:0-00-40.51
report-high:processor-set:dedicated-cpu:zoneA:[total]:0.00:0.01%:-:-:-:-:-:0-00-00.00
report-high:processor-set:dedicated-cpu:zoneA:[system]:0.00:0.00%:-:-:-:-:-:0-00-00.00
report-high:processor-set:dedicated-cpu:zoneA:zoneA:0.00:0.01%:-:-:-:-:-:0-00-00.00
report-high:footer:20101203T035211Z10:20

With parseable output, you can easily write scripts that consume the output. Those scripts can further analyze the data, draw colorful graphs, and perform other data manipulation.

Miscellaneous Comments

A few parting notes:

You can run zonestat in any zone. It will only receive data that should be visible to that zone. For example, when run in a non-global zone, zonestat will only display data about the processor set on which that zone's processes run.

Zonestat gets all of the data from the zonestatd daemon, which is part of the service svc:/system/zones-monitoring:default. That service is enabled by default. Because that service is managed by SMF, if for any reason zonestatd stops, SMF will restart it.

You can specify the interval at which zonestatd gathers data by setting the zones-monitoring:default service property config/sample_interval.

Summary

The new zonestat will provide hours of entertainment. It also provides answers to countless questions regarding your zones' use of system resources. Armed with that information, you can improve your understanding of the resource consumption of those zones, and improve the use of resource controls to ensure predictable performance of your workloads.

I hope that you have enjoyed learning about zonestat as much as I did. Check back during the first week of January for information on other new observability tools in Solaris 11 Express!
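As an illustration of the kind of script mentioned above, here is a minimal Python sketch that pulls per-zone CPU usage out of parseable processor-set records like the ones at the top of this excerpt. The field layout it assumes (report, resource type, pset type, pset name, zone, USED, PCT, ...) is my reading of the sample output, not a documented format, and the function name cpu_used_by_zone is hypothetical.

```python
from collections import defaultdict

# A few records copied from the parseable output above.
SAMPLE = """\
interval:processor-set:default-pset:pset_default:[system]:0.01:0.06%:-:-:-:-:-:0-00-00.15
interval:processor-set:default-pset:pset_default:global:0.06:0.26%:-:-:-:-:-:0-00-00.63
interval:processor-set:dedicated-cpu:zoneD:[resource]:4:4:4:4:0-00-40.51
interval:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.89%:-:-:-:-:-:0-00-10.08
interval:footer:20101203T035211Z10:20
report-high:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.95%:-:-:-:-:-:0-00-10.10
"""


def cpu_used_by_zone(text, report="interval"):
    """Collect (pset, CPUs used, PCT) per zone from one report type.

    Assumed layout: report:processor-set:pset-type:pset-name:zone:used:pct:...
    Summary rows such as [resource], [total] and [system] are skipped.
    """
    usage = defaultdict(list)
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) < 7 or fields[0] != report or fields[1] != "processor-set":
            continue  # headers, footers, other resource types or other reports
        zone = fields[4]
        if zone.startswith("["):
            continue  # not a zone
        usage[zone].append((fields[3], float(fields[5]), fields[6]))
    return dict(usage)


if __name__ == "__main__":
    for zone, rows in cpu_used_by_zone(SAMPLE).items():
        for pset, used, pct in rows:
            print(f"{zone:8} {pset:14} {used:5.2f} CPUs ({pct} of pset)")
```

Passing report="report-high" selects the summary-report records instead of the per-interval ones.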


Solaris 11

All New Zonestat!

Part 1

Recently I gave a brief overview of the enhancements available in Solaris 11 Express. I also hinted at more blog entries, mostly featuring Solaris Zones and network virtualization.

Before experimenting with new functions it's useful to have some tools to measure the results. With that in mind, this blog entry and its successor(s) will discuss new measurement tools that are in Solaris 11 Express: zonestat(1), flowstat(1M) and dlstat(1M). I will start with zonestat.

Zonestat Introduction

But first some history. For Solaris 10 I created an open-source tool I named "zonestat". That tool filled a need: one integrated view of the resource consumption and optional resource control settings of all running zones. The resources listed included CPUs, physical memory, virtual memory, and locked memory. Zonestat provided a "dashboard" that greatly eased the task of monitoring the resource usage of Solaris Zones.

That tool has these main drawbacks:
It's written in Perl and uses a large set of existing Solaris commands to gather all of the data that it needs. Executing all of those commands for each data sample uses a significant amount of CPU time.
It is a separate tool, not part of Solaris.
It is not supported.
It was originally intended as a prototype, a demonstration of what could be accomplished. I made a number of enhancements along the way, but for a while it wasn't clear whether it made sense to upgrade it for Solaris 11.

However, even with those shortcomings, that zonestat script was put into production at a number of data centers.

In 2009 Solaris Engineering decided to write a fully supported version of zonestat, as a new Solaris command.
Instead of someone writing code in his spare time (me), a member of the Solaris Zones Engineering Team (Steve Lawrence) was assigned to write a comprehensive, efficient, fully featured tool that achieved many of the same goals as the original zonestat, and many more.

Using the experience gained from the original zonestat script, a completely new program (also called "zonestat") had the potential to solve all of the problems of the open-source Perl script, and add new features which had been requested by users of Solaris Zones, and other features which the Zones Engineering Team knew would be useful.

And in Solaris 11 Express that potential was realized. Because the new zonestat performs almost all of the functions of the original zonestat script, and performs far more in addition, the rest of this blog entry (and the next one) will only discuss the new zonestat which is part of Solaris 11 Express.

The new zonestat(1) command has a plethora of options. These options allow the user to list data:
for each of the system's zones, including the global zone, and data specific to kernel processing but not directly attributable to any one zone
for any subset of zones
regarding one or more types of resources, in absolute units or as a portion of available or capped resources
regarding one or more instances of resources (e.g. a particular processor set)
that has been sorted by one or more output columns
that is human-readable output or output that can be easily parsed by a script or other program
that includes timestamps, in one of several formats
that includes regular aggregations (called "summary reports"), such as "highest value during the interval" or "average value during the interval"

Zonestat has a variety of uses. The most obvious is monitoring resource usage of zones. Even if you don't use resource controls, zonestat will help you by telling you when a zone is using a significant portion (or all!) of the system's resources.
Of course, zonestat really brings value to systems that are using resource controls, making it easy to determine which zones are near their caps - a sure sign that there is a problem with that zone's workload or that the zone's cap is too low.

In addition, you can use zonestat to determine proper values for resource controls. For example, you can deploy a workload in a zone and use zonestat to determine the maximum amount of CPU capacity it uses. That information will enable you to make better decisions about how many CPUs to assign to that zone, if you have decided that the workload should have its own dedicated CPUs.

If you are not familiar with the resource management controls offered by Solaris, you may wish to view the relevant documentation before, during or after reading the rest of this. The book "Oracle Solaris 10 System Virtualization Essentials" also describes all of the resource controls available for Solaris 10 Zones, and how they can be used to achieve various goals. Finally, the document "Understanding the Security Capabilities of Solaris" approaches the same content from a security perspective.

Now let's explore some of the interesting things you can do with zonestat.

Basics

The default output provides the data you would expect - basic information about the resource usage of all zones on the system.
The command syntax can be simplified to this (omitting some features for now):

zonestat [options] interval [duration]

and the basic output looks like this:

GZ$ zonestat 5 2
Collecting data for first interval...
Interval: 1, Duration: 0:00:05
SUMMARY                   Cpus/Online: 32/32   Physical: 31.8G   Virtual: 47.8G
           -----------CPU----------- -----PHYSICAL------ ------VIRTUAL------
ZONE        USED  %PART  %CAP  %SHRU   USED    PCT  %CAP   USED    PCT  %CAP
[total]     0.10  0.31%     -      -  3109M  9.52%     -  7379M  15.0%     -
[system]    0.01  0.04%     -      -  2797M  8.57%     -  7115M  14.5%     -
global      0.08  0.51%     -      -   141M  0.43%     -   129M  0.26%     -
zoneA       0.00  0.02%     -      -  43.7M  0.13%     -  35.4M  0.07%     -
zoneB       0.00  0.02%     -      -  42.0M  0.12%     -  32.8M  0.06%     -
zoneC       0.00  0.04%     -      -  42.0M  0.12%     -  32.8M  0.06%     -
zoneD       0.00  0.02%     -      -  42.1M  0.12%     -  33.2M  0.06%     -
Interval: 2, Duration: 0:00:10
SUMMARY                   Cpus/Online: 32/32   Physical: 31.8G   Virtual: 47.8G
           -----------CPU----------- -----PHYSICAL------ ------VIRTUAL------
ZONE        USED  %PART  %CAP  %SHRU   USED    PCT  %CAP   USED    PCT  %CAP
[total]     0.09  0.30%     -      -  3109M  9.52%     -  7379M  15.0%     -
[system]    0.01  0.03%     -      -  2797M  8.57%     -  7115M  14.5%     -
global      0.08  0.51%     -      -   142M  0.43%     -   129M  0.26%     -
zoneA       0.00  0.02%     -      -  43.7M  0.13%     -  35.4M  0.07%     -
zoneB       0.00  0.02%     -      -  42.0M  0.12%     -  32.8M  0.06%     -
zoneC       0.00  0.02%     -      -  42.0M  0.12%     -  32.8M  0.06%     -
zoneD       0.00  0.02%     -      -  42.1M  0.12%     -  33.2M  0.06%     -

First, note that unlike other Solaris stat tools (e.g. vmstat), the first set of data is not a summary since the system booted. Instead, zonestat pauses for the time interval specified on the command line, at which point it displays data representing the first sample. (Zonestat doesn't actually collect the data. Its companion, zonestatd(1M), performs that service for all zonestat clients.)

Also, you probably noticed those two special lines, "[total]" and "[system]". The first of those shows the total consumption of each resource across the whole system.
The lines labeled "[system]" show resource consumption by the kernel or by processes that aren't associated with any one zone.

Zonestat can produce a great deal of information - more than will fit on one line. Its various options allow you to view summary data - as provided in the default - or to focus on a zone, or on a particular type or instance of a resource, or a combination of those. Obviously, the header will be tailored to the output requested.

The summary header looks like this:

Interval: 1, Duration: 0:00:05
SUMMARY                   Cpus/Online: 32/32   Physical: 31.8G   Virtual: 47.8G
           -----------CPU----------- -----PHYSICAL------ ------VIRTUAL------
ZONE        USED  %PART  %CAP  %SHRU   USED    PCT  %CAP   USED    PCT  %CAP

The first line of data, per sample, tells you the ordinal number of the sample - not very useful if you're just checking a few seconds of data, but pretty helpful when you're scanning through 3 days' worth of output. The Duration field is similar, but is a measurement of time since the command began.

The SUMMARY line shows the quantity of CPUs that exist in the system and how many of them are online. (I wrote an earlier blog entry about the method that Solaris uses to count CPUs.) That line also shows the system's amount of RAM ("Physical") and Virtual Memory (the size of RAM plus swap space on disk).

The ZONE column contains the name of the zone. The values in that row represent that zone's use of resources. The columns labeled USED show that zone's consumption of each resource. The unit depends on the resource. For CPUs, a value of 1 represents a "Solaris CPU." For memory, the unit is specified in the output.

Besides those generic header elements, some are specific to a resource type.

%PART shows the CPU utilization, as a percentage of the compute capacity of the processor set in which the zone's processes run.
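%PART, like the memory PCT and %CAP columns, is a plain ratio. This small sketch reproduces a few of the values shown in this post, assuming %PART is USED divided by the number of CPUs in the zone's processor set, and assuming zonestat's M and G are binary units (1G = 1024M). The inputs come from the psets and zoneB examples later in this post; the last digit can differ from zonestat's because the USED values it prints are themselves rounded.

```python
def percent_of(used, capacity):
    """Return used as a percentage of capacity (a pset's CPUs, RAM, or a cap)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


if __name__ == "__main__":
    # %PART: zoneD used 1.10 Solaris CPUs of its 4-CPU dedicated processor set.
    print(f"zoneD %PART: {percent_of(1.10, 4):.1f}%")  # zonestat showed 27.6%
    # Memory PCT and %CAP: zoneB used 42.5M of 31.8G of RAM, with a 512M cap.
    print(f"zoneB PCT:   {percent_of(42.5, 31.8 * 1024):.2f}%")
    print(f"zoneB %CAP:  {percent_of(42.5, 512):.2f}%")  # zonestat showed 8.31%
```

This prints 27.5%, 0.13% and 8.30%, within rounding of zonestat's 27.6%, 0.13% and 8.31%.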
%CAP is the percentage of the zone's CPU cap which has been used recently (if a cap has been applied to the zone).

%SHRU indicates the amount of CPU used as a percentage of the shares assigned to the zone (if the Fair Share Scheduler is in use and shares have been assigned to this zone). %SHRU may occasionally show a surprising result: a value greater than 100%. I don't have space here to explain the Fair Share Scheduler, but the short version is "FSS enforces a minimum amount of available CPU capacity if there is contention for the CPUs, but it does not enforce a maximum. If there isn't contention, any process which wants to consume CPU cycles can do so - which can lead to a value greater than 100%."

The PHYSICAL section shows the amount of RAM used, the portion of the system's memory (PCT) represented by that amount of RAM, and the portion of the zone's RAM cap, if one has been set. The VIRTUAL section has similar fields.

Comparing Usage to Caps

To show the data you might see when a zone has a RAM cap, let's set one. We could do this in zonecfg(1M) for the next time the zone boots, but I don't feel like rebooting the zone, so let's add that cap while the zone runs.

First, a quick check of the resource capping daemon.
(In these examples, I am logged in as a user which is configured to use non-default administrative privileges. To temporarily gain those privileges, I will use the pfexec(1) command.)

GZ$ svcs rcap
STATE          STIME    FMRI
online         Oct_28   svc:/system/rcap:default

The service is online, and the cap is easy to set:

GZ$ pfexec rcapadm -z zoneB -m 512m
GZ$ zonestat -z zoneB 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
SUMMARY                   Cpus/Online: 32/32   Physical: 31.8G   Virtual: 47.8G
           -----------CPU----------- -----PHYSICAL------ ------VIRTUAL------
ZONE        USED  %PART  %CAP  %SHRU   USED    PCT  %CAP   USED    PCT  %CAP
[total]     0.15  0.49%     -      -  3112M  9.53%     -  7382M  15.0%     -
[system]    0.01  0.05%     -      -  2797M  8.57%     -  7115M  14.5%     -
zoneB       0.00  0.10%     -      -  42.5M  0.13% 8.31%  33.3M  0.06%     -

ZoneB is using 42.5MB, which is 0.13% of the system's memory (31.8GB), and 8.31% of the 512MB cap that we set.

One of zonestat's most useful abilities is focusing on a small part of the data it can display. The previous example demonstrated its ability to limit the output to one zone.
We can also limit the output to just one resource type, or "zoom in" further to one instance of a resource.

Let's limit our view to the RAM used by that zone:

GZ$ zonestat -r physical-memory -z zoneB 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PHYSICAL-MEMORY  SYSTEM MEMORY
mem_default              31.8G
ZONE         USED    PCT   CAP   %CAP
[total]     3113M  9.53%     -      -
[system]    2797M  8.56%     -      -
zoneB       42.5M  0.13%  512M  8.31%

We can "zoom out" and look at all of the processor sets and their zone assignments (something that was difficult in Solaris 10):

GZ$ pfexec zoneadm -z zoneA boot
GZ$ pfexec zoneadm -z zoneC boot
GZ$ pfexec zoneadm -z zoneD boot
GZ$ pfexec zonestat -r psets 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PROCESSOR_SET          TYPE  ONLINE/CPUS  MIN/MAX
pset_default   default-pset        16/16      1/-
ZONE        USED    PCT  CAP  %CAP  SHRS  %SHR  %SHRU
[total]     0.17  1.10%    -     -     -     -      -
[system]    0.03  0.23%    -     -     -     -      -
global      0.13  0.86%    -     -     -     -      -
PROCESSOR_SET          TYPE  ONLINE/CPUS  MIN/MAX
zoneD         dedicated-cpu          4/4      4/4
ZONE        USED    PCT  CAP  %CAP  SHRS  %SHR  %SHRU
[total]     1.10  27.6%    -     -     -     -      -
[system]    0.00  0.00%    -     -     -     -      -
zoneD       1.10  27.6%    -     -     -     -      -
PROCESSOR_SET          TYPE  ONLINE/CPUS  MIN/MAX
zoneC         dedicated-cpu          4/4      4/4
ZONE        USED    PCT  CAP  %CAP  SHRS  %SHR  %SHRU
[total]     1.00  25.0%    -     -     -     -      -
[system]    0.08  2.00%    -     -     -     -      -
zoneC       0.92  23.0%    -     -     -     -      -
PROCESSOR_SET          TYPE  ONLINE/CPUS  MIN/MAX
zoneB         dedicated-cpu          4/4      4/4
ZONE        USED    PCT  CAP  %CAP  SHRS  %SHR  %SHRU
[total]     0.00  0.14%    -     -     -     -      -
[system]    0.00  0.00%    -     -     -     -      -
zoneB       0.00  0.14%    -     -     -     -      -
PROCESSOR_SET          TYPE  ONLINE/CPUS  MIN/MAX
zoneA         dedicated-cpu          4/4      4/4
ZONE        USED    PCT  CAP  %CAP  SHRS  %SHR  %SHRU
[total]     0.00  0.14%    -     -     -     -      -
[system]    0.00  0.00%    -     -     -     -      -
zoneA       0.00  0.14%    -     -     -     -      -

With the basics out of the way, next time I will discuss some other options that display other data and organize the output in different ways.
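Earlier in this post I mentioned using zonestat to decide how many CPUs to dedicate to a zone. With CPU USED samples like the ones in the psets output above, one rough rule is to take the peak, add some headroom, and round up to a whole number of Solaris CPUs. This is a minimal sketch of that idea only: the 20% headroom, the one-CPU minimum and the function name dedicated_cpus_needed are hypothetical choices, not zonestat features or Solaris recommendations.

```python
import math


def dedicated_cpus_needed(cpu_used_samples, headroom=0.20, minimum=1):
    """Suggest a whole number of CPUs for a zone's dedicated-cpu setting.

    cpu_used_samples: values from zonestat's CPU USED column, where 1.0
    means one fully busy Solaris CPU.
    headroom: hypothetical safety margin added on top of the observed peak.
    """
    if not cpu_used_samples:
        raise ValueError("need at least one sample")
    peak = max(cpu_used_samples)
    return max(minimum, math.ceil(peak * (1.0 + headroom)))


if __name__ == "__main__":
    samples = [0.92, 1.10, 1.00, 0.99]  # illustrative CPU USED values
    print(dedicated_cpus_needed(samples))  # peak 1.10 plus 20% is 1.32, so 2
```

Sampling over a representative period (including peak business hours) matters more than the exact headroom factor.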



What's a Solaris CPU?

In the next few blog entries I will use the phrase "Solaris CPUs" to refer to the view that Solaris has of CPUs. In the old days, a CPU was a CPU - one chip, one computational entity, one ALU, one FPU, etc. Now there are many factors to consider - CPU sockets, CPU cores per socket, hardware threads per core, etc. Solaris 10 and 11 Express schedule processes on "Solaris CPUs" (a phrase I made up).

Solaris considers each of these a "Solaris CPU":
x86/x64 systems: a CPU core, or in some CPUs, a hardware thread (today, can be one to eight cores per socket, and one to 16 threads per socket), up to 128 "Solaris CPUs" in a Sun Fire X4800
UltraSPARC-II, -III[+], -IV[+]: a CPU core, with a maximum of 144 in an E25K
SPARC64-VI: a hardware thread, maximum of 256 in a Sun SPARC Enterprise M9000
SPARC64-VII[+]: a hardware thread, maximum of 512 in an M9000
SPARC CMT (SPARC-T1, -T2+, SPARC T3): a hardware thread, maximum of 512 in a SPARC T3-4
SPARC T4: a hardware thread, maximum of 256 in a SPARC T4-4
SPARC T5: a hardware thread, maximum of 1,024 in a SPARC T5-8
SPARC M5: a hardware thread, maximum of 1,536 in a SPARC M5-32
SPARC M6: a hardware thread, maximum of 3,072 in a SPARC M6-32
SPARC M7: a hardware thread, maximum of 4,096 in a SPARC M7-16
SPARC S7: a hardware thread, maximum of 128 in a SPARC S7-2

Each of these "Solaris CPUs" can be controlled independently by Solaris. For example, each one can be configured into a processor set.

[Edit 2013.04.25: Fixed a detail, and added T4, T5 and M5.]
[Edit 2013.11.05: Added M6.]
[Edit 2016.09.12: Added M7, S7]
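For the entries where a "Solaris CPU" is a hardware thread, each maximum is simply sockets times cores per socket times threads per core on a fully populated system (for the UltraSPARC entries, where Solaris schedules on cores, threads per core is effectively 1). This sketch checks several of the figures above; the socket, core and thread counts are my assumptions taken from those systems' public specifications, not figures from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # 1 where Solaris schedules on cores, not threads

    def solaris_cpus(self) -> int:
        """Schedulable "Solaris CPUs" when every socket is populated."""
        return self.sockets * self.cores_per_socket * self.threads_per_core


# Assumed fully populated configurations (from public spec sheets).
SYSTEMS = [
    System("SPARC T4-4", 4, 8, 8),
    System("SPARC T5-8", 8, 16, 8),
    System("SPARC M5-32", 32, 6, 8),
    System("SPARC M6-32", 32, 12, 8),
    System("SPARC M7-16", 16, 32, 8),
    System("SPARC S7-2", 2, 8, 8),
    System("Sun Fire E25K (UltraSPARC-IV)", 72, 2, 1),
]

if __name__ == "__main__":
    for s in SYSTEMS:
        print(f"{s.name:32} {s.solaris_cpus():5}")
```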



Oracle Solaris 11 Express Released!

What's New in Oracle Solaris 11 Express 2010.11

Oracle Solaris 11 Express was announced today. It is a fully supported, production-ready member of the Oracle Solaris family. First, here are the major additions and enhancements. I will expand on a few of them in future blog entries.

IPS (Image Packaging System) - a new, network-based package management system which replaces the System V Release 4 packaging system which had been used for Solaris packages as well as non-Sun software. The SVR4 package tools are still there for non-Solaris packages. With IPS, package updates are automatically downloaded and installed, under your control, from a network-based repository. If you need custom environments or you deliver software, you can create your own repositories. A system can be configured to get packages from multiple repositories.
IPS Documentation

Solaris 10 Containers are Zones on a Solaris 11 Express system which mimic the operating environment of a Solaris 10 system. (This is the same concept as Solaris 8 Containers and Solaris 9 Containers, which run on Solaris 10 systems.) This feature set includes P2V and V2V tools to convert Solaris 10 systems, and native zones, respectively, into Solaris 10 Containers on an Oracle Solaris 11 Express system.
Solaris 10 Containers Documentation

Network Virtualization and Resource Management: A comprehensive set of virtual network components (VNICs, vSwitches) and resource management and observability tools. When combined with Solaris Zones configured as routers, you can create complete networks within one system. Existing tools, such as IP Filter, complete the picture by enhancing network isolation. The possibilities are endless - worth at least a few blog entries... ;-) Components include:
Virtual NICs (VNICs): multiple VNICs can be configured to use one (physical) NIC. You can manage a VNIC with the same tools you use to manage NICs.
Virtual switches (vSwitches): abstract, software network elements, to which VNICs can be attached. These elements fulfill the same purpose as physical switches.
Virtual Routers: you can configure a Zone with multiple VNICs and configure routing in the Zone. You can also turn off all of the other services in the Zone, and configure IP Filter in that zone, to make a Router Zone behave just like a physical router - without using rack space, or using a lot of power, or being limited by the bandwidth of a physical connection.
Bandwidth controls: you can use the dladm(1M) command to limit the amount of bandwidth that a NIC or VNIC can use. This can be used to prevent overrunning a NIC, or to simulate the limited bandwidth of a physical NIC, or as a sanity check in sophisticated network configurations.
Network virtualization documentation

ZFS now includes dataset encryption, deduplication and snapshot-diff features.
ZFS Documentation

Here is a list of the other major improvements.

System Management
Boot Environments: ZFS snapshots are used to create alternate boot environments, similar to those created by Live Upgrade in Solaris 10. Unlike LU, S11E BEs exist on the same root pool, so you can have multiple BEs on a single disk drive - although servers should really have mirrored root drives.

Installation
Automated Install: Replaces and extends JumpStart, built around IPS.
Documentation
Interactive Text Install: Intended for GUI-less servers.
Distribution Constructor: Create pre-configured bootable OS images with Distro Constructor, which uses IPS.
Distro Constructor Documentation

Virtualization
The functionality of the open-source 'zonestat' script has been re-written and expanded as a full, integrated Solaris 11 Express tool.
Delegated Administration: The global zone administrator can delegate some of the management tasks for a zone to a specific global zone user.
A simpler packaging model, using IPS, decreases the complexity of zoned environments.

Networking
Significant InfiniBand improvements, including SDP support.
Improvements to IP Multipathing (IPMP)
Network Auto-Magic removes the need to manually re-configure a laptop being moved around, or switching between physical and wireless networks.
New L3/L4 load balancer.

Storage
ZFS is the only root file system type.
Time Slider automatic ZFS snapshot management, with a GUI folder-like view of the past.
CIFS service enables highly scalable file servers for Microsoft Windows environments.
COMSTAR targets are now available for iSER, SRP and FCoE.

Security
Root account is now a role, not a user.
Labeled IPsec (and other Trusted Extensions enhancements)
Trusted Platform Module support

Hardware
New hardware such as SPARC T3 systems.
NUMA I/O features keep related processes, memory, and I/O channels "near" each other to maximize performance and minimize I/O latency.
DISM performance improvements, especially for databases.
Suspend and resume to RAM, especially for laptops.

GNOME 2.30

All of the Oracle Solaris 11 Express documentation is available.


Solaris 10 Containers

Virtual Overhead?

So you're wondering about operating system efficiency or the overhead of virtualization. How about a few data points?

SAP created benchmarks that measure transaction performance. One of them, the SAP SD 2-Tier benchmark, behaves more like real-world workloads than most other benchmarks, because it exercises all of the parts of a system: CPUs, memory access, I/O and the operating system. The other factor that makes this benchmark very useful is the large number of results submitted by vendors. This large data set enables you to make educated performance comparisons between computers, or operating systems, or application software.

A couple of interesting comparisons can be made from this year's results. Many submissions use the same hardware configuration: two Nehalem (Xeon X5570) CPUs (8 cores total) running at 2.93 GHz, and 48GB RAM (or more). Submitters used several different operating systems: Windows Server 2008 EE, Solaris 10, and SuSE Linux Enterprise Server (SLES) 10. Also, two results were submitted using some form of virtualization: Solaris 10 Containers and SLES 10 on VMware ESX Server 4.0.

Operating System Comparison

The first interesting comparison is of different operating systems and database software, on the same hardware, with no virtualization. Using the hardware configuration listed above, the following results were submitted. The Solaris 10 and Windows results are the best results on each of those operating systems, on this hardware.
The SLES 10 result is the best of any Linux distro, with any DB software, on the same hardware configuration.

Operating System         DB               Result (SAPS)
Solaris 10               Oracle 10g              21,000
Windows Server 2008 EE   SQL Server 2008         18,670
SLES 10                  MaxDB 7.8               17,380

(Note that none of the results submitted in 2009 can be compared against results from previous years, because SAP changed the workload.)

With those data points, it's very easy to conclude that for transactional workloads, the combination of Solaris 10 and Oracle 10g is roughly 21% more powerful than Linux and MaxDB (21,000 / 17,380 is about 1.21).

Virtualization Comparison

The virtualization comparison is also interesting. The same benchmark was run using Solaris 10 Containers and 8 vCPUs. It was also run using SLES 10 on VMware ESX, also using 8 vCPUs.

Operating System   Virtualization       DB          Result (SAPS)
Solaris 10         Solaris Containers   Oracle 10g         15,320
SLES 10            VMware ESX           MaxDB 7.8          11,230

Interpretation

Some of the 36% advantage of the Solaris Containers result (15,320 / 11,230 is about 1.36) is due to the operating systems and DB software, as we saw above. But the rest is due to the virtualization tools.

The virtualized and non-virtualized results for each OS had only one difference: virtualization was used. For example, the two Solaris 10 results shown above used the same hardware, the same OS, the same DB software and the same workload. The only difference was the use of Containers and the limitation of 8 vCPUs.

If we assume that Solaris 10/Oracle 10g is consistently 21% more powerful than SLES 10/MaxDB on this benchmark, then it's easy to conclude that VMware ESX has 13% more overhead than Solaris Containers when running this workload (1.36 / 1.21 is about 1.13).

However, the non-virtualized performance advantage of the Solaris 10 configuration over that of SLES 10 may be different with 8 vCPUs than with 8 cores. If Solaris' advantage is less, then the overhead of VMware is even worse.
If the advantage of Solaris 10 Containers/Oracle over VMware/SLES 10/MaxDB with 8 vCPUs is more than in the non-virtualized results, then the real overhead of VMware is not quite that bad. Without more data, it's impossible to know.

But one of those three cases (same, less, more) is true. And the claims by some people that VMware ESX has "zero" or "almost no" overhead are clearly untrue, at least for transactional workloads. For compute-intensive workloads, like HPC, the overhead of software hypervisors like VMware ESX is typically much smaller.

What Does All That Mean?

What does that overhead mean for real applications? Extra overhead means longer response times for transactions or fewer users per workload, or both. It also means that fewer workloads (guests) can be configured per system.

In other words, response time should be better (or the maximum number of users should be greater) if your transactional workload is running in a Solaris Container rather than in a VMware ESX guest. And when you want to add more workloads, Solaris Containers should support more of those workloads than VMware ESX, on the same hardware.

Qualification

Of course, the comparison shown above only applies to certain types of workloads.
You should test your workload on different configurations before committing yourself to one.

Disclosure

For more detail, see the results for yourself.

SAP wants me to include the results:

Best result for Solaris 10 on 2-way X5570, 2.93GHz, 48GB:
Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 21,000 SAPS, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033.

Best result for any Linux distro on 2-way X5570, 2.93GHz, 48GB:
HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 17,380 SAPS, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, MaxDB 7.8, SuSE Linux Enterprise Server 10, Cert# 2009006.

Result on Solaris 10 using Solaris Containers and 8 vCPUs:
Sun Fire X4270 (2 processors, 8 cores, 16 threads) run in 8 virtual cpu container, 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009034.

Result on SuSE Enterprise Linux as a VMware guest, using 8 vCPUs:
Fujitsu PRIMERGY Model RX300 S5 (2 processors, 8 cores, 16 threads) 2,056 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 96 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0, Cert# 2009029.

SAP, R/3, reg TM of SAP AG in Germany and other countries.

Addendum, added December 10, 2009:

Today an associate reminded me that previous SAP SD 2-tier results demonstrated the overhead of Solaris Containers. Sun ran four copies of the benchmark on one system, simultaneously, one copy in each of four Solaris Containers. The system was a Sun Fire T2000, with a single 1.2GHz SPARC processor, running Solaris 10 and MaxDB 7.5: 2006029, 2006030, 2006031, 2006032.

The same hardware and software configuration - but without Containers - already had a submission: 2005047.

The sum of the results for the four Containers can be compared to the single result for the configuration without Containers.
The single system outpaced the four Containers by less than 1.7%.

Second Addendum, also added December 10, 2009:

Although this blog entry focused on a comparison of performance overhead, there are other good reasons to use Solaris Containers in SAP deployments. At least 10, in fact, as shown in this slide deck. One interesting reason is that Solaris Containers is the only server virtualization technology supported by both SAP and Oracle on x86 systems.
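The percentages in the Interpretation section can be reproduced from the SAPS figures in the two tables. This is a minimal sketch of that arithmetic; the function names are mine, and the key assumption (that the native Solaris/Oracle advantage carries over unchanged to the 8-vCPU runs) is the same one the post makes and then questions.

```python
def relative_gap(a, b):
    """How much larger a is than b, as a fraction (0.21 means 21% larger)."""
    return a / b - 1.0


def extra_virtualization_gap(native_a, native_b, virt_a, virt_b):
    """Part of the virtualized gap not explained by the native gap.

    Assumes the native OS/DB advantage carries over unchanged to the
    virtualized runs, so anything beyond it is attributed to virtualization.
    """
    return (virt_a / virt_b) / (native_a / native_b) - 1.0


# SAPS figures from the two tables above.
NATIVE_SOLARIS, NATIVE_SLES = 21_000, 17_380  # no virtualization
CONTAINERS, ESX_GUEST = 15_320, 11_230        # 8 vCPUs each

if __name__ == "__main__":
    print(f"native advantage:      {relative_gap(NATIVE_SOLARIS, NATIVE_SLES):.1%}")  # about 21%
    print(f"virtualized advantage: {relative_gap(CONTAINERS, ESX_GUEST):.1%}")  # about 36%
    extra = extra_virtualization_gap(NATIVE_SOLARIS, NATIVE_SLES, CONTAINERS, ESX_GUEST)
    print(f"extra gap (ESX):       {extra:.1%}")  # about 13%
```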


Politics, Global Topics

Drive Your Car for Free!?

We need MPK - a new way to measure the energy efficiency of cars.

For many years, we've used MPG (miles per gallon; apologies to the parts of the world that use a sane measurement system...) to measure fuel efficiency of cars. But as the automotive industry converts from fossil fuels to electric cars, we will have confusion - until the government establishes a new method.

In the news last week, GM claimed that the Chevy Volt will get 230 MPG. That leads me to picture a conversation at a car dealer:

Car Salesperson: "...and the Volt gets 230 MPG!"
Customer: "Wow! It doesn't use much gas, does it?"
Sales: "That's right, it doesn't. And for the first 40 miles, it only uses the battery. Do you drive more than 40 miles most days?"
Customer: "No, just 20 miles to work and back. So usually I won't use any gas?"
Sales: "That's right."
Customer: "So... basically I drive it for free?"
(The salesperson knows that's not entirely true, and doesn't want to lie to the customer, so he just smiles.)
Customer: "And if I drive it more than 40 miles, I'll be getting 230 MPG... that's ten times better than the car I drive now! It will cost less than one-tenth what I pay now!"

Of course the problem is that electricity isn't free, and the rules allow a car company to publish a number which has little bearing on real life. As the article mentions, the Volt's battery must be recharged regularly. The GM CEO said that should cost "about 40 cents at off-peak electricity rates in Detroit." GM is not lying to the public, nor are they pretending that the only expense of operation is the cost of gasoline. Evidently the number 230 was derived by following the rules.

Unfortunately, most of us don't live in Detroit. For the USA, the average residential rate is 11.6 cents per kilowatt-hour (kWh), according to the Energy Information Administration.

What does it really cost to drive the Volt? According to the article, GM estimates that it takes 10 kWh to charge the Volt. At the average rate, the first 40 miles per day cost $1.16.
The article also says the Volt "might get as much as 50 mpg" when running on the gasoline engine. That's much better than almost every other car on the road, but not 230 MPG.

If you drive 80 miles in a day, you will use a full battery charge plus at least 0.8 gallons. Together, that will cost:
$1.16 for the first 40 miles
$2.00 for the next 40 miles (average gas price of $2.50 from www.eia.doe.gov)
Total: at least $3.16.

My car gets about 26 MPG. Driving 80 miles costs me $7.69 - more than double. (Because 50 MPG is the quoted best case, the real difference may be smaller, so we don't really know how much more I pay than I would with a Volt.)

Most of us would save money by driving the Volt.

The most important benefit of the Volt is reducing fossil fuel emissions. (I won't get into the fossil fuel emissions at the electric generation plant, 50% of which comes from burning coal...) But the misperception that the Volt - at 230 MPG - is ten times less costly than your car may become widespread, unless GM is required to report energy efficiency in some other way.

Don't get me wrong: I am happy that GM has developed the Volt, and I know that they didn't just make up the number 230. We do need to pay more attention to gas mileage and fossil fuel emissions. I hope that the Volt is successful.

And I hope that soon there will be other electric cars. But how will we compare their energy efficiency? A competing car company may state that their electric car is more efficient than the Volt. How will we know?

For electric cars, we need a new metric: miles per kilowatt-hour (MPK). That will allow us to compare the cost of driving two different electric cars.

And for mixed-energy cars (like the Volt) we will need to compare the efficiency of the gas engine (MPG) and the electric engine (MPK), and we need to know how far the car will go on its electric engine. The last one could be MPC: miles per charge. Without all of those numbers, we can't compare the costs of driving two different cars.

Sound complicated?
You bet, especially when you include the need forhighway vs. city driving. But until we have more information, as consumers,we'll be confused, perhaps misled, by claims like "230 MPG."
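To make the MPG/MPK/MPC idea concrete, here is a minimal Python sketch of the daily-cost arithmetic above, using only the figures quoted in this post. Two assumptions are mine, not GM's: the battery is used up before the gasoline engine starts, and a shorter day uses a proportional share of the 10 kWh charge. The MPK value (40 miles / 10 kWh = 4 miles per kWh) is derived from the article's numbers, not a published rating, and the function name daily_cost is hypothetical.

```python
def daily_cost(miles, price_per_kwh, price_per_gallon, mpg, mpc=0.0, mpk=None):
    """Energy cost of one day's driving, starting with a full charge.

    miles: miles driven that day
    mpg:   miles per gallon on gasoline
    mpc:   miles per charge (electric range); 0 for a gasoline-only car
    mpk:   miles per kilowatt-hour while running on the battery
    Assumes the battery is drained first, then gasoline takes over.
    """
    electric_miles = min(miles, mpc)
    gas_miles = miles - electric_miles
    kwh = electric_miles / mpk if electric_miles > 0 else 0.0
    gallons = gas_miles / mpg if gas_miles > 0 else 0.0
    return kwh * price_per_kwh + gallons * price_per_gallon


# Figures from the post: 10 kWh per 40-mile charge (4 MPK, derived),
# "as much as" 50 MPG on gasoline, 11.6 cents/kWh, $2.50/gallon.
volt_80 = daily_cost(80, 0.116, 2.50, mpg=50, mpc=40, mpk=40 / 10)
volt_20 = daily_cost(20, 0.116, 2.50, mpg=50, mpc=40, mpk=40 / 10)
car_80 = daily_cost(80, 0.116, 2.50, mpg=26)

print(f"Volt, 80 miles: ${volt_80:.2f}")        # Volt, 80 miles: $3.16
print(f"Volt, 20 miles: ${volt_20:.2f}")        # Volt, 20 miles: $0.58
print(f"26 MPG car, 80 miles: ${car_80:.2f}")   # 26 MPG car, 80 miles: $7.69
```

The 20-mile case shows why the customer's "basically free" is off: under the proportional-charge assumption the commute still costs about 58 cents a day. Because 50 MPG is an upper figure, the Volt's 80-mile cost is a lower bound.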


Solaris 10 Containers

Layered Virtualization

It's time for another guest blogger.

Solving a Corner Case

One of my former colleagues, Joe Yanushpolsky (josephy100 -AT- gmail.com), was recently involved in moving a latency-sensitive Linux application to Solaris as part of a platform consolidation. The code was old and required access to kernel routines not available under BrandZ. Using VirtualBox as a virtual x86 system, the task was easier than expected.

Background

VirtualBox enables you to run multiple x86-based operating system "guests" on an x86 computer - desktop or server. Unlike other virtualization tools, like VMware ESX, VirtualBox allows you to keep your favorite operating system as the 'base' operating system. This is called a Type 2 hypervisor. For existing systems - especially desktops and laptops - this means you can keep your current setup and applications and maintain their current performance. Only the guests will have reduced performance - more on that later.

Here is Joe's report of his tests.

The goals included allowing many people to independently run this application while sharing a server. It would be important to isolate each user from other users. But the resource controls included with VirtualBox were not sufficiently granular for the overall purpose. Solaris Containers (zones) have a richer set of resource controls. Would it be possible to combine Containers and VirtualBox?

The answer was 'yes' - I tried two slightly different methods. Each method starts by installing VirtualBox in the global zone to set up a device entry and some of the software. Details are provided later. After that is complete, the two methods differ.

Method 1: Create a Container and install VirtualBox in it. This is the Master WinXP VirtualBox (MWVB) Container. If any configuration steps specific to a WinXP environment are needed, they can be done now. When a Windows XP environment is needed, clone the MWVB Container and install WinXP in the clone. Management of the Container can be delegated to the user of the WinXP environment if you want.

Method 2: Create a Container and install VirtualBox in it. This is the Master CentOS VirtualBox (MCVB) Container. Install CentOS in the Container. When a CentOS environment is needed, clone the MCVB Container - including the copy of CentOS that's already in it - to create a new Container. Management of the Container can be delegated to the user of the CentOS environment if you want.

In each case, resource controls can be applied to the Container to ensure that everyone gets a fair share of the system's resources like CPU, RAM, virtual memory, etc.

When the process is complete, you have a guest OS, shown here via X Windows. Not only did the code run well, but it did so in a sparse-root non-global zone.

Well that was easy! How about Windows?

Now, this is interesting. As long as the client VM is supported by VirtualBox, it can be installed and run in a Solaris/OpenSolaris Container. I immediately thought of several useful applications of this combination of virtualization technologies:

- migrate existing applications that are deemed "unmovable" to the latest eco-friendly x64 (64-bit x86) platforms
- reduce network latency of distributed applications by collapsing the network onto a large-memory system with zones, regardless of which OS the application components were originally written for
- on-demand provisioning, as a service, of an entire development environment for Linux or Windows developers. When using ZFS, this could be accomplished in seconds - is this a "poor man's" cloud or what?!
- eliminate ISV support issues that are currently associated with BrandZ's lack of support for recent Linux kernels or Solaris 8 or 9 kernels
- what else can you create?

Best of all, Solaris, OpenSolaris and VirtualBox can be downloaded and used free of charge. Simple to build, easy to deploy, low overhead, free - I love it!

Performance

The advantage of having access to application code through Containers more than compensated for a 5% overhead (on a laptop) due to having a second kernel. The overall environment seems to be disk-sensitive (SSDs to the rescue!). Given that typical server load in a large IT shop is 15-20%, a number of such "foreign" zones could be added without impacting overall server performance.

Future Investigations

It would be interesting to evaluate the scalability of the overall environment by testing different resource controls in Solaris Containers and in VirtualBox. I'd need a machine bigger than the laptop for that :-).

Installation Details

Here are the highlights of "How to install." For more details, follow the instructions in the VirtualBox User Manual.

1. Install VirtualBox on a Solaris x64 machine in the global zone so that the vboxdrv driver is available in the Solaris kernel.
2. Create a target zone with access to the vboxdrv device ("add device; set match=/dev/vboxdrv; end").
3. In the zone, clean up the artifacts of the previous VirtualBox installation in the global zone. All you need to do is uninstall the SUNWvbox package and remove references to the /opt/VirtualBox directory.
4. Install the VirtualBox package in the zone.
5. Copy the OS distro into a file system in the global zone (e.g. /export/distros/centos.iso), and configure a loopback mount into the zone ("add fs; set dir=/mnt/images; set special=/export/distros; set type=lofs; end").
6. Start VirtualBox in the zone and install the client OS distro.

What advantages does this model have over other virtualization solutions?

The Solaris kernel is the software layer closest to the hardware. With Solaris, you benefit from the industry-leading scalability of Solaris and all of its innovations, like:

- ZFS for data protection - currently, neither Windows nor Linux distros have ZFS. You can greatly improve the storage robustness of your Windows or Linux system by running it as a VirtualBox guest.
- SMF/FMA, which allows the whole system to tolerate hardware problems.
- DTrace, which allows you to analyze system performance issues while the apps are running. Although you can use DTrace in the 'base' Solaris OS environment to determine which guest is causing the performance issue, and whether the problem is network I/O, disk I/O, or something else, DTrace will not be able to "see" into VirtualBox guests to help figure out which particular application is the culprit - unless the guest is running Solaris, in which case you run DTrace in the guest!
- Cost: You can download and use Solaris and OpenSolaris without cost. You can download and use VirtualBox without cost. Some Linux distros are also free. What costs less than 'free?'

What can you do with this concept? Here are some more ideas:

- Run almost any Linux app on a Solaris system by running that Linux distro in VirtualBox - or a combination of different Linux distros.
- Run multiple Windows apps - even on different versions of Windows - on Solaris.

Additional notes are available from the principal investigator, Joseph Yanushpolsky: josephy100 -AT- gmail.com .
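The zonecfg fragments in the installation steps can be collected into a single command file. The Python sketch below builds one. The zone path, the /mnt/images default, the optional resource-control values, and the function name vbox_zone_commands are illustrative assumptions, not part of Joe's setup. On Solaris 10, `create` with no template produces a sparse-root zone, like the one used in the test.

```python
def vbox_zone_commands(zonepath, distro_dir, images_dir="/mnt/images",
                       cpu_shares=None, physical_mem=None, swap=None):
    """Return a zonecfg command file for a zone that can run VirtualBox.

    Mirrors the post's steps: expose /dev/vboxdrv to the zone and
    loopback-mount the global zone's install-image directory into it.
    The resource controls are optional, illustrative extras.
    """
    cmds = [
        "create",                      # default template: sparse-root zone
        f"set zonepath={zonepath}",
        "add device",                  # give the zone the VirtualBox driver
        "set match=/dev/vboxdrv",
        "end",
        "add fs",                      # read-only access to the ISO images
        f"set dir={images_dir}",
        f"set special={distro_dir}",
        "set type=lofs",
        "end",
    ]
    if cpu_shares is not None:         # only effective under the Fair Share Scheduler
        cmds.append(f"set cpu-shares={cpu_shares}")
    if physical_mem or swap:           # memory caps, e.g. "2g"
        cmds.append("add capped-memory")
        if physical_mem:
            cmds.append(f"set physical={physical_mem}")
        if swap:
            cmds.append(f"set swap={swap}")
        cmds.append("end")
    cmds.append("commit")
    return "\n".join(cmds) + "\n"


if __name__ == "__main__":
    print(vbox_zone_commands("/zones/vbox1", "/export/distros",
                             cpu_shares=10, physical_mem="2g", swap="4g"))
```

Save the output to a file, load it with `zonecfg -z <zonename> -f <file>`, and install the zone with `zoneadm -z <zonename> install`. The cpu-shares property and the capped-memory resource need a Solaris 10 update that supports them; check zonecfg(1M) on your release.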



What's Pluto?

Recently a young person asked me why Pluto isn't a planet. This seemed like a good educational opportunity. The explanation I used is much simpler than the official, scientific explanation - with its "planetary discriminants" and "aggregate masses" - and turned out better than I anticipated, so I thought I would share it with you. It seems appropriate for people with at least a fourth or fifth grade education.

The study of science results in "an organized body of knowledge gained through ... research." Observational science gathers data about the universe and classifies objects, life forms, etc. according to characteristics of those things. Applying those concepts to our solar system, we can measure its objects and group those with similar characteristics together. Let's try that.

One of the most obvious characteristics of solar system objects is their composition. All of these objects fall neatly into one of three categories, as shown by the accompanying graph:

- major rocky objects, e.g. Earth, Mars - these all have densities greater than 3.8 g/cm³
- gaseous objects, e.g. Jupiter, Saturn, some or all of which have cores of rock and/or metallic hydrogen - these all have densities between 0.69 and 1.6 g/cm³
- rock-ice bodies - objects with significant amounts of rock and ice, e.g. comets, Pluto and its satellite Charon - their densities range from 1.0 to 3.0 g/cm³, with almost all of them in the range 1.0 to 2.0 g/cm³.

The clearest distinction is between the three densest inner objects (Mercury, Venus, Earth) and all but one of the outer, rocky/icy objects. Haumea, in case you haven't heard of it, is the newest object to be labeled a "dwarf planet" by the International Astronomical Union.

Although there is overlap in density between the gaseous objects and the rocky/icy objects, there is no overlap between their sizes (masses), as this next graph shows (Earth is arbitrarily assigned a value of 1,000,000 and the rest are scaled to that value):

Visually, three groupings are discernible: the gaseous objects, the large rocky objects, and everything else. Mathematically, the groupings are separated by more than an order of magnitude. In other words, the smallest member of one group is at least ten times the mass of the largest member of the next group. Uranus is more than 14 times the mass of Earth, and Mercury is almost 20 times the mass of Eris, which is more massive than Pluto.

Besides physical characteristics, the most useful ones are the orbital elements. All members of our solar system orbit the sun, or orbit another non-stellar object which in turn orbits the sun. These orbits are, almost entirely, described by Newtonian mechanics; the basic laws of orbital motion were first described by Johannes Kepler in 1609. Although an orbit has several characteristics, the simplest of them is the semi-major axis, which is often called the "average" distance between the orbiting body and the sun. Although not quite accurate, it's close enough for this purpose.

Here is a graph of 17 of the most important bodies in our solar system. It shows the semi-major axis of each, relative to the semi-major axis of Earth, which is called an Astronomical Unit (AU). It includes the four major rocky objects, the four gaseous objects, all five of the currently recognized dwarf planets, three asteroids, and five other relevant objects. Note that the orbital distances of Eris (68 AU) and Sedna (526 AU) are off the scale of this graph.

Again, three groups appear in the graph: the inner rocky objects, the gaseous objects, and the outer bodies. Separation between objects increases from the inner bodies to the outer ones, but suddenly, starting with Orcus, the separation between orbital distances shrinks considerably.

From those three characteristics - density, mass, and orbital distance - it seems clear that there are at least three groups of major bodies in the solar system:

- inner, rocky bodies, each having a mean density greater than 3.8 g/cm³, a mass larger than one-twentieth of Earth's mass, and an orbital distance less than 2 AU
- giant gaseous objects, each with a mean density less than 1.7 g/cm³, a mass larger than 14 times Earth's mass, and an orbital distance between 5 and 32 AU
- distant icy, rocky bodies, with a mean density less than 3 g/cm³ (and almost all less than 2), and an orbital distance greater than 35 AU.

Object type | Density (g/cm³) | Mass (Earth = 1) | Semi-major axis (AU, Earth = 1)
Inner, rocky | >3.8 | >0.05 | <2
Giant, gaseous | <1.7 | >14 | 5-32
Distant, icy, rocky | <2 (except one) | <1/200 | >35

All three of those groupings can be displayed in one graph, which shows the distinction between the three different groups at a glance. In the graph above, the green bars show the range of values for the inner, rocky bodies, scaled so that the highest value is 100. The blue bars show the ranges of values for the gaseous bodies. The purplish bars show the ranges of values for the outer icy, rocky bodies.

Note that only two ranges overlap: density for the gaseous and the icy, rocky bodies. For all of the other characteristics, there are clear gaps between the ranges of the groups.

Now that we have clear groupings, the question becomes "which of those groups should be planets?" I hope it's obvious that the first two categories should be included in the list of 'planets.' The third group can be included or not, depending on how you want to define the term 'planet.'

However, there are two other factors which help me to decide. Initially, 'planets' were the five wandering lights in the sky that weren't the Sun and Moon. Together with the Sun and Moon, these seven wandering lights were so important that early western cultures assigned a day of worship to each, leading eventually to the names of our days. If 'planets' started with the five wandering stars, it makes sense to add other bodies which have similar characteristics - Earth, Uranus and Neptune - yielding a total of eight planets. But none of the others - Pluto, Haumea, Quaoar, etc. - are like the original wandering lights in the sky.

Further, if we were to include the outer, rocky, icy bodies in the list of planets, the list would grow significantly. Today, the list would include 13 members, but another 40 known objects might be categorized with Eris, Pluto, et al., and another 150 or more are probably out there. If the category 'planet' can have 8 members or 200, I'll go with 8.

Finally, regarding the question "is it 'right' to 'demote' Pluto?" The list of planets has grown and shrunk several times throughout history. More than 25 bodies have been labeled 'planets' only to be 'demoted' later. Pluto is nothing special in this regard.
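The three groupings work like a small rule-based classifier. This Python sketch encodes them. It uses cutoffs of 0.05 Earth masses and 1.7 g/cm³ for the inner-body mass limit and the giant-planet density limit, so that Mercury (about 0.055 Earth masses) and Neptune (mean density about 1.64 g/cm³) land in their groups, and it uses the "less than 3 g/cm³" limit for the distant bodies. The body data are approximate, rounded values, and the function name classify is hypothetical.

```python
def classify(density, mass, semi_major_axis):
    """Assign a solar-system body to one of the three groups, or None.

    density:          mean density in g/cm^3
    mass:             mass relative to Earth (Earth = 1)
    semi_major_axis:  in AU (Earth = 1)
    """
    if density > 3.8 and mass > 0.05 and semi_major_axis < 2:
        return "inner, rocky"
    if density < 1.7 and mass > 14 and 5 <= semi_major_axis <= 32:
        return "giant, gaseous"
    if density < 3.0 and mass < 1 / 200 and semi_major_axis > 35:
        return "distant, icy, rocky"
    return None


# Approximate values: (density g/cm^3, mass in Earth masses, semi-major axis AU)
bodies = {
    "Mercury": (5.43, 0.055, 0.39),
    "Earth":   (5.51, 1.0, 1.0),
    "Jupiter": (1.33, 317.8, 5.20),
    "Neptune": (1.64, 17.1, 30.1),
    "Pluto":   (1.85, 0.0022, 39.5),
    "Eris":    (2.5, 0.0028, 67.7),
    "Ceres":   (2.16, 0.00016, 2.77),
}

for name, values in bodies.items():
    print(f"{name:8} {classify(*values)}")
```

Ceres, a dwarf planet in the asteroid belt at about 2.8 AU, fits none of the three groups under these rules, which is one reason the groupings are drawn around the "major" bodies.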


Solaris 10 Containers

Zonestat 1.4 Now Available

I have posted Zonestat v1.4 at the Zone Statistics project page (click on "Files" in the left navbar).

Zonestat is a 'dashboard' for Solaris Containers. It shows the resource consumption of each Container (aka Zone) and compares that consumption against limits you have set.

Changes from v1.3:

- BugFix: various failures if the pools service was not online. V1.4 checks for the existence of the pools packages, and behaves correctly whether or not they are installed and enabled.
- BugFix: various symptoms if the rcapd service was not online. V1.4 checks for the existence of the rcap packages, and behaves correctly whether or not they are installed and enabled.
- BugFix: mis-reported shared memory usage
- BugFix: -lP produced human-readable, not machine-parseable, output
- Bug/RFE: detect and fail if zone != global or user != root
- RFE: Prepare for S10 update numbers past U6
- RFE: Add option to print entire name of zones with long names
- RFE: Add timestamp to machine-consumable output
- RFE: improve performance and correctness by collecting CPU% with DTrace instead of prstat

Note that the addition of a timestamp to -P output changes the output format for "machine-readable" output.

For most people, the most important change will be the use of DTrace to collect CPU% data. This has two effects. The first effect is improved correctness. The prstat command - used in V1.3 and earlier - can horribly underestimate CPU cycles consumed because it can miss many short-lived processes. The mpstat command has its own problems with mis-counting CPU usage. So I expanded on a solution Jim Fiori offered, which uses DTrace to answer the question "which zone is using a CPU right now?" The other benefit of DTrace is improved performance of Zonestat itself.

The less popular, but still interesting, additions include:

- -N expands the width of the zonename field to the length of the longest zone name. This preserves the entire zone name, for all zones, and also keeps the columns lined up. However, the output lines will exceed 80 characters.
- The new timestamp field in -P output makes it easier for tools like the "System Data Recorder" (SDR) to consume zonestat output. However, this was a change to the output format. If you have written a script which used -P and assumed a specific format for zonestat output, you must change your script to understand the new format.

Please send questions and requests to zones-discuss@opensolaris.org .
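Zonestat's own DTrace code is not reproduced here. As a sketch of the idea behind sampling, assume a probe fires at a fixed rate on every CPU and records the zone whose thread is running there, with idle ticks discarded. Each zone's CPU% is then its share of all sample slots. A process that lives only a few milliseconds is still counted whenever a sample lands while it runs, which is why sampling is less prone than periodic process snapshots to missing short-lived work. The function name and its inputs are hypothetical.

```python
from collections import Counter


def zone_cpu_percent(samples, ncpus, hz, seconds):
    """Estimate each zone's CPU% from periodic per-CPU samples.

    samples: zone names, one per sample that found a CPU busy
             (idle samples are assumed to have been discarded already)
    ncpus:   number of CPUs being sampled
    hz:      samples per second on each CPU
    seconds: length of the sampling interval
    """
    slots = ncpus * hz * seconds          # every sample opportunity, busy or idle
    counts = Counter(samples)
    return {zone: 100.0 * n / slots for zone, n in counts.items()}


# Example: 4 CPUs sampled 10 times per second for 1 second = 40 slots.
samples = ["web"] * 10 + ["db"] * 20 + ["global"] * 2
print(zone_cpu_percent(samples, ncpus=4, hz=10, seconds=1))
# {'web': 25.0, 'db': 50.0, 'global': 5.0}
```

Dividing by all slots, rather than by busy samples only, keeps the numbers comparable to "percent of the whole machine"; in the example, the remaining 20% is idle time.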


Solaris 10 Containers

Patching Zones Goes Zoom!

Accelerated Patching of Zoned Systems

Introduction

If you have patched a system with many zones, you have learned that it takes longer than patching a system without zones. The more zones there are, the longer it takes. In some cases, this can raise application downtime to an unacceptable duration.

Fortunately, there are a few methods which can be used to reduce application downtime. This document mentions many of them, and then describes the performance enhancements of two of them. But the bulk of this rather bulky entry is the description and results of my newest computer... "experiments."

Executive Summary, for the Attention-Span Challenged

It's important to distinguish between application downtime, service downtime, zone downtime, and platform downtime. 'Service' is the service being provided by an application or set of applications. To users, that's the most important measure. As long as they can access the service, they're happy. (Doesn't take much, does it?)

If a service depends on the proper operation of each of its component applications, planned or unplanned downtime of one application will result in downtime of the service. Some software, e.g. web server software, can be deployed in multiple, load-balanced systems so that the service will not experience downtime even if one of the software instances is down.

Applying an operating system patch may require service downtime, application downtime, zone downtime or platform downtime, depending on the patch and the entity being patched.
Because in many cases patch application will require application downtime, the goal of the methods mentioned below, especially parallel patching of zones, is to minimize the elapsed downtime needed to achieve a patched, running system.

Just Enough Choices to Confuse

Methods that people use - or will soon use - to improve the patching experience of zoned systems include:

- Live Upgrade allows you to copy the existing Solaris instance into an "alternate boot environment" (ABE), patch or upgrade the ABE, and then re-boot into it. Downtime of the service, application, or zone is limited to the amount of time it takes to re-boot the system. Further, if there's a problem, you can easily re-boot back into the original boot environment. Bob Netherton describes this in detail on his weblog. Maybe the software should have a secondary name: Live Patch.
- You can detach all of the zones on the system, patch the system (which doesn't bother to patch the zones) and then re-attach the zones using the "update-on-attach" method which is also described here. This method can be used to reduce service downtime and application downtime, but not as much as the Live Upgrade / Live Patch method. Each zone (and its application(s)) will be down for the length of time to patch the system - and perhaps reboot it - plus the time to update/attach the zone.
- You can apply the patch to another Solaris 10 system with contemporary or newer patches, and migrate the zones to that system. Optionally, you can patch the original system and migrate the zones back to it. Downtime of the zones is less than with the previous solution because the new system is already patched and rebooted.
- You can put the zones' zonepaths on (i.e. install the zones onto) very fast storage, e.g. an SSD or a storage array with battery-backed DRAM or NVRAM. The use of SSDs is described below. This method can be used in conjunction with any of the other solutions. It will speed up patching because patching is I/O intensive. However, this type of storage device is more expensive per MB, so this solution may not make fiscal sense in many situations.
- Sun has developed an enhancement to the Solaris patching tools which is intended to significantly decrease the elapsed time of patching. It is currently being tested at a small number of sites. After it's released you can get the Zones Parallel Patching patch (ZPP), described below. This solution decreases the elapsed time to patch a system. It can be combined with some of the solutions above, with varying benefits. For example, with Live Upgrade, parallel patching reduces the time to patch the ABE, but doesn't reduce service downtime. Also, ZPP offers little benefit for the detach/attach-on-upgrade method. However, as a stand-alone method, ZPP offers a significant reduction in elapsed time without changing your current patching process. ZPP was mentioned by Gerry Haskins, Director of Software Patch Services.

Disclaimer 1: the "Zones Parallel Patching" patch ("ZPP") is still in testing and has not yet been released. It is expected to be released mid-CY2009. That may change. Further, the specific code changes may change, which may change the results described below.

Disclaimer 2: the experiment described below, and its results, are specific to one type of system (Sun Fire T2000) and one patch (120543-14 - "the Apache patch"). Other hardware and other patches will produce different results.

Yet More Background

I wanted to better understand two methods of accelerating the patching of zoned systems, especially when used in combination. Currently, a patch applied to the global zone will normally be applied to all non-global zones, one zone at a time. This is a conservative approach to the task of patching multiple zones, but doesn't take full advantage of the multi-tasking abilities of Solaris.
I learned that a proposed patch was created that enables the system administrator to apply a patch in the global zone which patches the global zone and then patches multiple zones at the same time. The parallelism (i.e. "the number of zones that are patched at one time") can be chosen before the patch is applied. If there are multiple "Solaris CPUs" in the system, multiple CPUs can be performing computational steps at the same time. Even if there aren't many CPUs, one zone's patching process can be using a CPU while another's is writing to a disk drive.

<tangent topic="Solaris vCPU">
I use the phrase "Solaris CPUs" to refer to the view that Solaris has of CPUs. In the old days, a CPU was a CPU - one chip, one computational entity, one ALU, one FPU, etc. Now there are many factors to consider - CPU sockets, CPU cores per socket, hardware threads per core, etc. Solaris now considers "vCPUs" - virtual processors - as entities on which to schedule processes. Solaris considers each of these a vCPU:
- x86/x64 systems: a CPU core (today, one to six per socket, with a maximum of 24 vCPUs per system, ignoring some exotic, custom high-scale x86 systems)
- UltraSPARC-II, -III[+], -IV[+]: a CPU core, max of 144 in an E25K
- SPARC64-VI, -VII: a CPU core, max of 256 in an M9000
- SPARC CMT (SPARC-T1, -T2+): a hardware thread, maximum of 256 in a T5440
</tangent>

Separately, I realized that one part of patching is disk-intensive. Many disk-intensive workloads benefit from writing to a solid-state disk (SSD) because of the performance benefit of those devices over spinning-rust disk drives (HDD).

So finally (hurrah!) the goal of this adventure: how much performance advantage would I achieve with the combination of parallel patching and an SSD, compared to sequential patching of zones on an HDD?

He Finally Begins to Get to the Point

I took advantage of an opportunity to test both of these methods to accelerate patching. The system was a Sun Fire T2000 with two HDDs and one SSD. The system had 32 vCPUs, was not using Logical Domains, and was running Solaris 10 10/08. Solaris was installed on the first HDD. Both HDDs were 72GB drives. The SSD was a 32GB device. (Thank you, Pat!)

For some of the tests I also applied the ZPP. (Thank you, Enda!) For some of the tests I used zones that had zonepaths on the SSD; the rest 'lived' on the second HDD.

As with all good journeys, this one had some surprises. And, as with all good research reports, this one has a table with real data. And graphs later on.

To get a general feel for the different performance of an HDD vs. an SSD, I created a zone on each device - the secondary HDD and the SSD - and made clones of it. Sometimes I made just one clone at a time, other times I made ten clones simultaneously. The iostat(1M) tool showed me the following performance numbers:

r/s w/s kr/s kw/s wait actv svc_t %w %b
clone x1 on HDD 56227833332306.423172
clone x1 on SSD 35379115616500016
clone x10 on HDD 35470182327431952462599
clone x10 on SSD 354295824133026241541034

At light load - just one clone at a time - the SSD performs better than the HDD, but at heavy load the SSD performs much, much better, e.g. nine times the write throughput and 13x the write IOPS of the HDD, and the device driver and SSD still have room for more (34% busy vs. 99% busy).

Cloning a zone consists almost entirely of copying files. Patching has a higher proportion of computation, but those results gave me high hopes for patching. I wasn't disappointed. (Evidently, every good research report also includes foreshadowing.)

In addition to measuring the performance boost of the ZPP, I wanted to know if that patch would help - or hurt - a system without using its parallelization feature. (I didn't have a particular reason to expect non-parallelized improvement, but occasionally I'm an optimist. Besides, if the performance with the ZPP was different without actually using parallelization, it would skew the parallelized numbers.)
So before installing the ZPP, I measured the length of time to apply a patch. For all of my measurements, I used patch 120543-14 - the Apache patch. At 15MB, it's not a small patch, nor is it a very large patch. (The "Baby Bear" patch, perhaps? --Ed.) It's big enough to tax the system and allow reasonable measurements, but small enough that I could expect to gather enough data to draw useful conclusions, without investing a year of time...

#define TEST while(1) {patchadd; patchrm;}

So, before applying the ZPP, and without any zones on the system, I applied the Apache patch. I measured the elapsed time because our goal is to minimize the elapsed time of patch application. Then I removed the Apache patch.

Then I added a zone to the system, on the secondary HDD, and re-applied the Apache patch to the system, which automatically applied it to the zone as well. I removed the patch, created two more zones, and applied the same patch yet again. Finally, I compared the elapsed time of all three measurements.

Patching the global zone alone took about 120 seconds. Patching with one non-global zone took about 175 seconds: 120 for the global zone and 55 for the zone. Patching three zones took about 285 seconds: 120 seconds for the global zone and 55 seconds for each of the three zones.

Theoretically, the length of time to patch each zone should be consistent. Testing that theory, I created a total of 16 zones and then applied the Apache patch. No surprises: 55 seconds per zone.

To test non-parallel performance of the ZPP, I applied it, kept the default setting of "no parallelization," and then re-did those tests. Application of the Apache patch changed neither in behavior nor in elapsed time per zone, from zero to 16 zones. (However, I had a faint feeling that Solaris was beginning to question my sanity. "Get in line," I told it...)

How about the SSD - would it improve patch performance with zero or more zones? I removed the HDD zones, installed a succession of zones - zero to 16 - on the SSD, and applied the Apache patch each time. The SSD did not help at all - the patch still took 55 seconds per zone. Evidently this particular patch, applied one zone at a time, is CPU bound rather than I/O bound.

But applying the ZPP does not, by default, parallelize anything. To tell the patch tools that you would like some level of parallelization, e.g. "patch four zones at the same time," you must edit a specific file in the /etc/patch directory and supply a number, e.g. '4'. After you have done that, if parallel patching is possible, it will happen automatically. Multiple zones (e.g. four) will be patched at the same time by a patchadd process running in each zone. Because that patchadd is running in a zone, it will use the CPUs that the zone is assigned to use - default or otherwise. This also means that the zone's patchadd process is subject to all of the other resource controls assigned to the zone, if any.

Changing the parallelization level to 8, I re-did all of those measurements, on the HDD zones and then the SSD zones. The performance impact was obvious right away. As the graph to the right shows, the elapsed time to patch the system with a specific number of zones was less with the ZPP. ('P' indicates the level of parallelization: 1, 8 or 16 zones patched simultaneously. The blue line shows no parallelization, the red line shows the patching of eight zones simultaneously.) Turning the numbers around, the patching "speed" improved by a factor of three.

How much more could I squeeze out of that configuration? I increased the level of parallelization to 16 and re-did everything. No additional performance, as the graph shows. Maybe it's a coincidence, but a T2000 has eight CPU cores - maybe that's a limiting factor.

At this point the SSD was feeling neglected, so I re-did all of the above with zones on the SSD.
This graph shows the results: little benefit at low zone counts, but significant improvement at higher zone counts - when the HDD was the bottleneck. Combining ZPP and SSD resulted in a patching throughput improvement of 5x with 16 zones.

That seems like magic! What's the trick? A few paragraphs back, I mentioned the 'trick': using all of the scalability of Solaris and, in this case, CMT systems. Patching a system without ZPP - especially one without a running application - leaves plenty of throughput performance "on the table." Patching multiple zones simultaneously uses CPU cycles - presumably cycles that would have been idle. And it uses I/O channel and disk bandwidth - also, hopefully, available bandwidth. Essentially, ZPP shortens the elapsed time by using more CPU cycles and I/O bandwidth now instead of using them later.

So the main caution is "make sure there is sufficient compute and I/O capacity to patch multiple zones at the same time."

But whenever multiple apps are running on the same system at the same time, the operating system must perform extra tasks to enable them to run safely. It doesn't matter if the "app" is a database server or 'patchadd.' So is ZPP using any "extra" CPU, i.e. is there any CPU overhead?

Along the way, I collected basic CPU statistics, including system and user time. The next graph shows that the amount of total CPU time (user+sys) increased slightly. The overhead was less than 10% for up to 8 zones. Another coincidence? I don't know, but at that point the overhead was roughly 1% per zone. The overhead increased faster beyond P=8, indicating that, perhaps, a good rule of thumb is P = "number of unused cores." Of course, if the system is using Dynamic Resource Pools or dedicated-cpus, the rule might need to be changed accordingly.

TANSTAAFL.

Conclusion

All good tales need a conclusion. The final graph shows the speedup - the increase in patching throughput - based on the number of zones, level of parallelization, and device type. Specific conclusions are:

- Parallel patching of zones significantly reduces elapsed time if there are sufficient compute and I/O resources available.
- Solid-state disks significantly improve patching throughput for high zone counts and similar levels of parallelization.
- The amount of work accomplished does not decrease - it's merely compressed into a shorter period of elapsed time. If patching while applications are running, plan carefully in order to avoid impacting the responsiveness of your applications:
  - Choose a level of parallelization commensurate with the amount of available compute capacity.
  - Use appropriate resource controls to maintain desired response times for your applications.

Getting the ZPP requires waiting until mid-year. Getting SSDs is easy - they're available for the Sun 7210 and 7410 Unified Storage Systems and for Sun systems.
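The sequential measurements fit a simple linear model: about 120 seconds for the global zone plus about 55 seconds for each non-global zone. This Python sketch encodes that model and adds an idealized parallel bound, under my own assumption that each zone still takes 55 seconds no matter how many zones are patched at once. The constants are the T2000/Apache-patch figures from this post and won't carry over to other systems or patches; the function and constant names are hypothetical.

```python
import math

GLOBAL_ZONE_SECONDS = 120   # measured: patching the global zone alone
PER_ZONE_SECONDS = 55       # measured: each non-global zone, one at a time


def sequential_seconds(zones):
    """Elapsed time without parallel patching: linear in the zone count."""
    return GLOBAL_ZONE_SECONDS + PER_ZONE_SECONDS * zones


def ideal_parallel_seconds(zones, p):
    """Optimistic bound with parallelism p.

    Assumes the global zone is patched first, then zones run in batches
    of p, and per-zone time stays at 55 s regardless of contention.
    """
    return GLOBAL_ZONE_SECONDS + PER_ZONE_SECONDS * math.ceil(zones / p)


for n in (1, 3, 8, 16):
    seq = sequential_seconds(n)
    p8 = ideal_parallel_seconds(n, 8)
    print(f"{n:2} zones: sequential {seq:4d} s, ideal P=8 {p8:4d} s")
```

With 16 zones the model gives 1,000 seconds sequentially and an ideal of 230 seconds at P=8. Treat the bound as a ceiling: it also predicts that P=16 would beat P=8, which the T2000 measurements did not show, because per-zone time stops being constant once CPU cores and disk bandwidth are saturated.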



Ice Storm 2008

On December 12, the northeast USA had a severe ice storm, which left 0.5" - 1" (1-2.5 cm) of ice on everything. Trees, heavy with ice, bent and broke, snapping wires and cutting electricity to over 200,000 homes and businesses. My house was one of the unfortunate ones.

However, with every challenge there are opportunities - in this case, photographic ones. So I fired up the DSLR and started snapping - pictures, not wires.

One pine tree in my backyard was so laden with ice that its tip - normally 25 feet in the air - was dangling in the pond. It looks like the pine tree was thirsty and is taking a drink. Later, the surface of the pond froze, trapping the tip in the pond. Fortunately - for the tree - the surface melted two days later, allowing it to shake itself free.

A birch tree in the front yard performed a similar feat, but it looked more like it was bowing. I doubt it was trying to lick the snow - it knows better. I like the loss of background clutter that night - and flash! - brings to shots like that one.

Another birch was bent, and its upper half reached, like fingers, through the branches of a Shadblow tree, itself coated in ice.

In another night shot, a large pine seems to loom gloomily over a 6-foot blue spruce. Normally its arms jut out parallel to the ground, but the ice pinned its arms to its sides.

But by far the worst damage nearby was a 40-to-45-foot pine tree in the backyard. For years, it had been leaning out over the pond. No more - the weight of the ice snapped it in two, about eight feet up the trunk. In the first image, only the remaining trunk is obvious... but in the next picture, it's clear that the tree decided to "take a dip" in the pond. To give you some scale, the pond is 40 feet wide. The tree reached all the way across and stripped some branches off of a tree on the far side of the pond.

As someone mentioned to me - the ice storm was "just Mother Nature doing some pruning."

P.S.
Nighttime brought another interesting view: moonlight refracting through the ice on tree branches.


Solaris 10 Containers

Zonestat v1.3

It's - already - time for a zonestat update. I was never happy with the method that zonestat used to discover the mappings of zones to resource pools, but wanted to get v1.1 "out the door" before I had a chance to improve on its use of zonecfg(1M). The obvious problem, which at least one person stumbled over, was the fact that you can re-configure a zone while it's running. After doing that, the configuration information doesn't match the current mapping of zone to pool, and zonestat became confused.

Anyway, I found the time to replace the code in zonestat which discovered zone-to-pool mappings with a more sophisticated method. The new method uses ps(1) to learn the PID of each zone's [z]sched process. Then it uses "poolbind -q <PID>" to look up the pool for that process. The result is more accurate data, but the ps command does use more CPU cycles.

While performing surgery on zonestat, I also:
- began using the kstat module for Perl, reducing CPU time consumed by zonestat
- fixed some bugs in CPU reporting
- limited zonename output to 8 characters to improve readability
- made some small performance improvements
- added a $DEBUG variable which can be used to watch the commands that zonestat is using

With all of that, zonestat v1.3 provides better data, is roughly 30% faster, and is even smaller than the previous version! :-) You can find it at http://opensolaris.org/os/project/zonestat.
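The two-step zone-to-pool discovery described above (find each zone's [z]sched PID with ps, then ask poolbind -q which pool that PID is bound to) can be sketched in a few lines. zonestat itself is a Perl script; the Python below only illustrates the same idea, and all function names are hypothetical. It assumes that `ps -e -o zone= -o pid= -o comm=` prints one "zone pid command" line per process and that `poolbind -q <PID>` prints the PID followed by the pool name; check both against your Solaris release.

```python
import os
import subprocess


def zsched_pids(ps_output: str) -> dict:
    """Map zone name -> PID of that zone's scheduler process.

    Expects 'zone pid command' lines, e.g. from the assumed invocation
    'ps -e -o zone= -o pid= -o comm='. The global zone's process is
    'sched'; each non-global zone has its own 'zsched'.
    """
    pids = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        zone, pid, comm = fields
        if os.path.basename(comm.strip()) in ("sched", "zsched"):
            pids[zone] = int(pid)
    return pids


def pool_of(poolbind_output: str) -> str:
    """Extract the pool name from 'poolbind -q <PID>' output,
    assumed to look like 'PID<whitespace>poolname'."""
    fields = poolbind_output.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected poolbind output: {poolbind_output!r}")
    return fields[1]


def zone_pools() -> dict:
    """Solaris-only: run ps and poolbind and return {zone: pool}."""
    ps_out = subprocess.run(
        ["ps", "-e", "-o", "zone=", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    pools = {}
    for zone, pid in zsched_pids(ps_out).items():
        pb_out = subprocess.run(
            ["poolbind", "-q", str(pid)],
            capture_output=True, text=True, check=True,
        ).stdout
        pools[zone] = pool_of(pb_out)
    return pools
```

On a non-Solaris machine only the two parsing functions are useful; feed them captured ps and poolbind output to test them.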


Solaris 10 Containers

Zonestat: How Big is a Zone?

Introduction

Recently an organization showed great interest in Solaris Containers, especially the resource management features, and then asked "what command do you use to compare the resource usage of Containers to the resource caps you have set?"

I began to list them: prstat(1M), poolstat(1M), ipcs(1), kstat(1M), rcapstat(1), prctl(1), ... Obviously it would be easier to monitor Containers if there was one 'dashboard' to view. Such a dashboard would enable zone administrators to easily review zones' usage of system resources and decide if further investigation is necessary. Also, if there is a system-wide shortage of a resource, this tool would be the first out of the toolbox, simplifying the task of finding the 'resource hog.'

Zonestat

In order to investigate the possibilities, I created a Perl script I call 'zonestat' which summarizes resource usage of Containers. I consider this script a prototype, not intended for production use. On the other hand, for a small number of zones, it seems to be pretty handy, and moderately robust.

Its output looks like this:

        |----Pool-----|------CPU-------|----------------Memory----------------|
        |---|--Size---|-----Pset-------|---RAM---|---Shm---|---Lkd---|---VM---|
Zonename| IT| Max| Cur| Cap|Used|Shr|S%| Cap| Use| Cap| Use| Cap| Use| Cap| Use
-------------------------------------------------------------------------------
  global  0D  66K    2       0.1 100 25      986M      139K  18E 2.2M  18E 754M
    db01  0D  66K    2       0.1 200 50   1G 122M 536M  0.0 536M    0   1G 135M
   web02  0D  66K    2  0.4  0.0 100 25 100M  11M  20M  0.0  20M    0 268M   8M
==TOTAL= --- ----    2 ----  0.2  -- -- 4.3G 1.1G 4.3G 139K 4.1G 2.2M 5.3G 897M

The 'Pool' columns provide information about the Dynamic Resource Pool in which the zone's processes are running. The two-character 'I' column displays the pool ID (number) and the 'T' column indicates the type of pool - 'D' for 'default', 'P' for 'private' (using the dedicated-cpu feature of zonecfg) or 'S' for 'shared.'
The two 'Size' columns show the quantity of CPUs assigned to the pool in which the zone is running.

The 'CPU Pset' columns show each zone's CPU usage and any caps that have been set. The first two columns show CPU quantities - CPU cores for x86, SPARC64 and all UltraSPARC systems except CMT (T1, T2, T2+). On CMT systems, Solaris considers every hardware thread ('strand') to be a CPU, and calls them 'vCPUs.'

The last two CPU columns - 'Shr' and 'S%' - show the number of FSS shares assigned to the zone, and what percentage that is of the total number of shares in that zone's pool. In the example above, all the zones share the default pset, and the zone 'db01' has 200 shares - twice as many as each of the other two zones - so it should receive 50% of the CPU power of the pool at a minimum.

The 'Memory' columns show the caps and usage for RAM, shared memory (Shm), locked memory (Lkd) and virtual memory (VM). When setting a RAM cap, it is very important to choose a reasonable value to avoid causing unnecessary paging.

One final example, taken from a different configuration of zones:

        |----Pool-----|------CPU-------|----------------Memory----------------|
        |---|--Size---|-----Pset-------|---RAM---|---Shm---|---Lkd---|---VM---|
Zonename| IT| Max| Cur| Cap|Used|Shr|S%| Cap|Used| Cap|Used| Cap|Used| Cap|Used
-------------------------------------------------------------------------------
  global  0D  66K    1  0.0  0.0 200 66      1.2G  18E 343K  18E 2.6M  18E 1.1G
      zB  0D  66K    1  0.2  0.0 100 33      124M  18E  0.0  18E  0.0  18E 138M
      zA  1P    1    1  0.0  0.1   1 HH       31M  18E  0.0  18E  0.0  18E  24M
==TOTAL= --- ----    2 ----  0.1 --- -- 4.3G 1.4G 4.3G 343K 4.1G 2.6M 5.3G 1.2G

The global zone and zone 'zB' share the default pool. Because the global zone has 200 FSS shares, compared to zB's 100 shares, global zone processes will get 2/3 of the processing power of the default pool if there is contention for that CPU. However, contention is unlikely, because zB is capped at 0.2 CPUs worth of compute time.

Zone 'zA' is in its own private resource pool. It has exclusive access to the one dedicated CPU in that pool.

Problems

CPU Hog

Zonestat's biggest problem is due to its brute-force nature.
It runs a few commands for each zone that is running. This can consume many CPU cycles, and can take a few seconds to run with many zones. Performance improvements to zonestat are underway.

Wrong / Misleading CPU Usage Data

Two commonly used methods to 'measure' CPU usage by processes and zones are prstat and mpstat. Each can produce inaccurate 'data' in certain situations. With mpstat, it is not difficult to create surprising results, e.g. on a CMT system, set a CPU cap on a zone in a pool, and run a few CPU-bound processes: the "Pset Used" column will not reach the CPU cap. This is due to the method used by mpstat to calculate its data. Prstat only computes its data occasionally, ignoring anything that happened between samples. This leads to undercounting CPU usage for zones with many short-lived processes. I wrote code to gather data from each, but prstat seemed more useful, so for now the output comes from prstat.

What's Next

I would like feedback on this tool, perhaps leading to minor modifications to improve its robustness and usability. What's missing? What's not clear?

The future of zonestat might include these:
- I hope that this can be re-written in C or D. Either way, it might find its way into Solaris... If I can find the time, I would like to tackle this.
- New features - best added to a 'real' version:
    -d: show disk usage
    -n: show network usage
    -i: also show installed zones that are not running
    -c: also show configured zones that are not installed
    -p: only show processor info, but add more fields, e.g. prstat's instantaneous CPU%, micro-state fields, and mpstat's CPU%
    -m: only show memory-related info, and add the paging columns of vmstat, output of "vmstat -p", free swap space
    -s : sort sample output by a field. Good example: sort by poolID
    add a one-character column showing the default scheduler for each pool
    report state transitions like mpstat does, e.g. changes in zone state, changes in pool configuration
- improve robustness

The Code

You can find the Perl script in the Files page of the OpenSolaris project "Zone Statistics."
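The 'S%' column is each zone's FSS shares divided by the total shares of all zones in the same pool. Here is a minimal sketch of that calculation, assuming integer truncation; that matches the 66 and 33 shown in the second example, but how zonestat actually rounds is my assumption. `share_percentages` is a hypothetical name.

```python
from collections import defaultdict


def share_percentages(zones):
    """zones: iterable of (zonename, pool_id, fss_shares) tuples.

    Returns {zonename: percent}, where percent is the zone's portion of
    all FSS shares in its own pool, truncated to an integer.
    """
    zones = list(zones)
    shares_per_pool = defaultdict(int)
    for _name, pool_id, shares in zones:
        shares_per_pool[pool_id] += shares
    return {
        name: (shares * 100 // shares_per_pool[pool_id]) if shares_per_pool[pool_id] else 0
        for name, pool_id, shares in zones
    }


# First example: all three zones in the default pool (ID 0).
print(share_percentages([("global", 0, 100), ("db01", 0, 200), ("web02", 0, 100)]))
# {'global': 25, 'db01': 50, 'web02': 25}

# Second example: zA is alone in private pool 1.
print(share_percentages([("global", 0, 200), ("zB", 0, 100), ("zA", 1, 1)]))
# {'global': 66, 'zB': 33, 'zA': 100}
```

The second call gives zA 100, the value that appears as 'HH' in zonestat's two-character S% column.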



got (enough) memory?

DBAs are in for a rude awakening.

A database runs most efficiently when all of the data is held in RAM. Insufficient RAM causes some data to be sent to a disk drive for later retrieval. This process, called 'paging,' can have a huge performance impact. This can be shown numerically by comparing the time to retrieve data from disk (about 10,000,000 nanoseconds) to the access time for RAM (about 20 ns): retrieving data from disk takes roughly 500,000 times as long.

Databases are the backbone of most Internet services. If a database does not perform well, no amount of improvement of the web servers or application servers will achieve good performance of the overall service. That explains the large amount of effort that is invested in tuning database software and database design. These tasks are complicated by the difficulty of scaling a single database to many systems in the way that web servers and app servers can be replicated. Because of those challenges, most databases are implemented on one computer. But that single system must have enough RAM for the database to perform well.

Over the years, DBAs have come to expect systems to have lots of memory, either enough to hold the entire database or at least enough for all commonly accessed data. When implementing a database, the DBA is asked "how much memory does it need?" The answer is often padded to allow room for growth. That number is then increased to allow room for the operating system, monitoring tools, and other infrastructure software.

And everyone was happy.

But then server virtualization was (re-)invented to enable workload consolidation.

Server virtualization is largely about workload isolation - preventing the actions and requirements of one workload from affecting the others. This includes constraining the amount of resources consumed by each workload. Without such constraints, one workload could consume all of the resources of the system, preventing other workloads from functioning effectively. Most virtualization technologies include features to do this - to schedule time using the CPU(s), to limit use of network bandwidth... and to cap the amount of RAM a workload can use.

That's where DBAs get nervous.

I have participated in several virtualization architecture conversations which included:
Me: "...and you'll want to cap the amount of RAM that each workload can use."
DBA: "No, we can't limit database RAM."

Taken out of context, that statement sounds like "the database needs infinite RAM." (That's where the CFO gets nervous...)

I understand what the DBA is trying to say:
DBA: "If the database doesn't have sufficient RAM, its performance will be horrible, and so will the performance of the web and app servers that depend on it."
I completely agree with that statement.

The misunderstanding is thinking that a RAM cap means the database is expected to use less memory than before. The "rude awakening" is modifying one's mind set to accept the notion that a RAM cap on a virtualized workload is the same as having a finite amount of RAM - just like a real server.

This also means that system architects must understand and respect the DBA's point of view, and that a virtual server must have available to it the same amount of RAM that it would need in a dedicated system. If a non-consolidated database needed 8GB of RAM to run well in a dedicated system, it will still need 8GB of RAM to run well in a consolidated environment.

If each workload has enough resources available to it, the system and all of its workloads will perform well.

And they all computed happily ever after.

P.S. A system running multiple consolidated workloads will need more memory than any one of the unconsolidated systems had - but less than the aggregate amount they had. Considering that need, and the fact that most single-workload systems were running at 10-15% CPU utilization, I advise people configuring virtual server platforms to focus more effort on ensuring that the computer has enough memory for all of its workloads, and less effort on achieving sufficient CPU performance. If the system is 'short' on CPU power by 10%, performance will be 10% less than expected. That rarely matters. But if the system is 'short' on memory by 10%, excessive paging can cause transaction times to increase by 10 times, 100 times, or more.
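Why a small memory shortfall hurts so much can be seen with a back-of-the-envelope calculation using the two access times quoted above: the average time per memory reference when a small fraction of references must wait for the disk. This sketch deliberately simplifies by charging each such reference one full disk access; real paging moves whole pages and overlaps some I/O, so treat the numbers as illustrative only.

```python
RAM_NS = 20           # approximate RAM access time quoted in the text
DISK_NS = 10_000_000  # approximate disk access time quoted in the text


def effective_access_ns(fault_rate: float) -> float:
    """Average nanoseconds per memory reference when `fault_rate` of the
    references must be satisfied from disk instead of RAM."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * RAM_NS + fault_rate * DISK_NS


def slowdown(fault_rate: float) -> float:
    """How many times slower than a workload that never leaves RAM."""
    return effective_access_ns(fault_rate) / RAM_NS


for one_in in (100_000, 10_000, 1_000):
    print(f"1 reference in {one_in:,} from disk: {slowdown(1 / one_in):.0f}x slower")
```

Under these assumptions, one reference in 100,000 going to disk makes the average about 6x slower than all-RAM, one in 10,000 about 51x, and one in 1,000 about 501x, which is consistent with the "10 times, 100 times, or more" in the P.S.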


Politics, Global Topics

Mad Cows, Lame Ducks

I rarely blog about politics, but a recent news article incensed me enough that I thought it worth sharing.

It seems that the current executive branch of the United States government doesn't like its citizens. Back in April 2004, the N.Y. Times reported that the U.S. Department of Agriculture "refused to allow a Kansas beef producer to test all of its cattle for mad cow disease." One might think that testing food for the ability to cause a fatal, untreatable disease would be encouraged by the US government. But in this case, apparently the Agriculture Department is choosing sides in a business struggle, putting the interests of one group of corporations - large meat packers - ahead of another group - small meat packers.

In all of this, the Bush administration is putting business before public health, yielding to the wishes of "larger meat packers opposed to such testing" who "fear they too will have to conduct the expensive tests" in order to remain competitive in the US. (recent Associated Press article)

To be sure, Creekstone, the Kansas producer, is not acting on a wholly altruistic basis. They want to be able to sell their products into the Japanese beef market, which has repeatedly prohibited beef imports from the US because of concerns over "mad cow" disease. Creekstone wants to enhance its competitiveness by proving that its beef is safe.

I'm not suggesting that a government should require this test on beef products. But if one company wants to take steps that protect public health, the federal government should not get involved, especially if the goal is merely to protect the profits of Big Business.



Virtual Eggs, One Basket

One of the hottest computer industry trends is virtualization. If you skim off the hype, there is still a great deal to be excited about. Who doesn't like reducing the number of servers to manage, and reducing the electric power consumed by servers and by the machines that move the heat they create? (Though I suppose that the power utilities and the coal and natural gas companies aren't too thrilled by virtualization...)

But there are some critical factors which limit the consolidation of workloads into virtualized environments (VEs). One often-overlooked factor is that the technology which controls VEs is a single point of failure (SPOF) for all of the VEs it is managing. If that component (a hypervisor for virtual machines, an OS kernel for operating system-level virtualization, etc.) has a bug which affects its guests, they may all be impacted. In the worst case, all of the guests will stop working.

One example of that was the recent licensing bug in VMware. If the newest version of VMware ESX was in use, the hypervisor would not permit guests to start after August 12. EMC created a patch to fix the problem, but solving it and testing the fix took enough time that some customers could not start some workloads for about one day. For some details, see http://www.networkworld.com/news/2008/081208-vmware-bug.html and http://www.deploylinux.net/matt/2008/08/all-your-vms-belong-to-us.html.

Clearly, the lesson from this is the importance of designing your consolidated environments with this factor in mind. For example, you should never configure both nodes of a high-availability (HA) cluster as guests of the same hypervisor.

In general, don't assume that the hypervisor is perfect - it's not - and that it can't fail - it can.



USENIX Technical Conference

If you would like to develop deeper Solaris skills, the USENIX Technical Conference offers some excellent opportunities. This year, the conference will be held in Boston on June 22-27. It includes vendor exhibits, training sessions and invited talks. This year the keynote address is by David Patterson, Director of the U.C. Berkeley Parallel Computing Laboratory, on "The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem?"

Many tutorials will be available, including these five full-day sessions:
- Solaris 10 Administration Workshop - by Peter Baer Galvin
- Systems and Network Performance Tuning - by Marc Staveley
- Solaris 10 Security Features Workshop - by Peter Baer Galvin
- Solaris 10 Performance, Observability & Debugging - by Jim Mauro
- Resource Management with Solaris Containers - by Jeff Victor (yes, me)

My session covers the concepts, uses and administrative interfaces of Solaris Resource Management, focusing on Solaris Containers. It will include a lab session using the new OpenSolaris LiveCD. If you don't (yet...) run Solaris or OpenSolaris on your laptop, you will still be able to boot OpenSolaris from the CD and temporarily use Containers on your laptop, without modifying your normal laptop operating system.

Early-bird registration saves $$$, but ends this Friday, June 6.


Solaris 10 Containers

Effortless Upgrade: Solaris 8 System to Solaris 8 Container

[Update 23 Aug 2008: Solaris 9 Containers was released a few months back. Reply to John's comment: yes, the steps to use Solaris 9 Containers are the same as the steps for Solaris 8 Containers]

Since 2005, Solaris 10 has offered the Solaris Containers feature set, creating isolated virtual Solaris environments for Solaris 10 applications. Although almost all Solaris 8 applications run unmodified in Solaris 10 Containers, sometimes it would be better to just move an entire Solaris 8 system - all of its directories and files, configuration information, etc. - into a Solaris 10 Container. This has become very easy - just three commands.

Sun offers a Solaris Binary Compatibility Guarantee which demonstrates the significant effort that Sun invests in maintaining compatibility from one Solaris version to the next. Because of that effort, almost all applications written for Solaris 8 run unmodified on Solaris 10, either in a Solaris 10 Container or in the Solaris 10 global zone.

However, there are still some data centers with many Solaris 8 systems. In some situations it is not practical to re-test all of those applications on Solaris 10. It would be much easier to just move the entire contents of the Solaris 8 file systems into a Solaris Container and consolidate many Solaris 8 systems into a much smaller number of Solaris 10 systems.

For those types of situations, and some others, Sun now offers Solaris 8 Containers. These use the "Branded Zones" framework available in OpenSolaris and first released in Solaris 10 in August 2007.

A Solaris 8 Container provides an isolated environment in which Solaris 8 binaries - applications and libraries - can run without modification. To a user logged in to the Container, or to an application running in the Container, there is very little evidence that this is not a Solaris 8 system.

The Solaris 8 Container technology rests on a very thin layer of software which performs system call translations - from Solaris 8 system calls to Solaris 10 system calls.
This is not binary emulation, and the number of system calls with any difference is small, so the performance penalty is extremely small - typically less than 3%.

Not only is this technology efficient, it's very easy to use. There are five steps, but two of them can be combined into one:
1. install Solaris 8 Containers packages on the Solaris 10 system
2. patch systems if necessary
3. archive the contents of the Solaris 8 system
4. move the archive to the Solaris 10 system (if step 3 placed the archive on a file system accessible to the Solaris 10 system, e.g. via NFS, this step is unnecessary)
5. configure and install the Solaris 8 Container, using the archive

The rest of this blog entry is a demonstration of Solaris 8 Containers, using a Sun Ultra 60 workstation for the Solaris 8 system and a Logical Domain on a Sun Fire T2000 for the Solaris 10 system. I chose those two only because they were available to me. Any two SPARC systems could be used as long as one can run Solaris 8 and one can run Solaris 10.

Almost any Solaris 8 revision or patch level will work, but Sun strongly recommends applying the most recent patches to that system. The Solaris 10 system must be running Solaris 10 8/07, and requires the following minimum patch levels:
- Kernel patch: 127111-08 (-11 is now available), which needs 125369-13 and 125476-02
- Solaris 8 Branded Zones patch: 128548-05

The first step is installation of the Solaris 8 Containers packages. You can download the Solaris 8 Containers software packages from http://sun.com/download. Installing those packages on the Solaris 10 system is easy and takes 5-10 seconds:

s10-system# pkgadd -d . SUNWs8brandr SUNWs8brandu SUNWs8p2v

Now we can patch the Solaris 10 system, using the patches listed above.

After patches have been applied, it's time to archive the Solaris 8 system. In order to remove the "archive transfer" step I'll turn the Solaris 10 system into an NFS server and mount it on the Solaris 8 system.
The archive can be created by the Solaris 8 system, but stored on the Solaris 10 system. There are several tools which can be used to create the archive: Solaris flash archive tools, cpio, pax, etc. In this example I used flarcreate, which first became available on Solaris 8 2/04.

s10-system# share /export/home/s8-archives
s8-system# mount s10-system:/export/home/s8-archives /mnt
s8-system# flarcreate -S -n atl-sewr-s8 /mnt/atl-sewr-s8.flar

Creation of the archive takes longer than any other step - 15 minutes to an hour, or even more, depending on the size of the Solaris 8 file systems.

With the archive in place, we can configure and install the Solaris 8 Container. In this demonstration the Container was "sys-unconfig'd" by using the -u option. The opposite of that is -p, which preserves the system configuration information of the Solaris 8 system.

s10-system# zonecfg -z test8
zonecfg:test8> create -t SUNWsolaris8
zonecfg:test8> set zonepath=/zones/roots/test8
zonecfg:test8> add net
zonecfg:test8:net> set address=
zonecfg:test8:net> set physical=vnet0
zonecfg:test8:net> end
zonecfg:test8> exit
s10-system# zoneadm -z test8 install -u -a /export/home/s8-archives/atl-sewr-s8.flar
      Log File: /var/tmp/test8.install.995.log
        Source: /export/home/s8-archives/atl-sewr-s8.flar
    Installing: This may take several minutes...
Postprocessing: This may take several minutes...

        Result: Installation completed successfully.
      Log File: /zones/roots/test8/root/var/log/test8.install.995.log

This step should take 5-10 minutes. After the Container has been installed, it can be booted.

s10-system# zoneadm -z test8 boot
s10-system# zlogin -C test8

At this point I was connected to the Container's console. It asked the usual system configuration questions, and then rebooted:

[NOTICE: Zone rebooting]

SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc. All rights reserved
Hostname: test8
The system is coming up. Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Print services started.
Apr 1 18:07:23 test8 sendmail[3344]: My unqualified host name (test8) unknown; sleeping for retry
The system is ready.

test8 console login: root
Password:
Apr 1 18:08:04 test8 login: ROOT LOGIN /dev/console
Last login: Tue Apr 1 10:47:56 from vpn-129-150-80-
Sun Microsystems Inc. SunOS 5.8 Generic Patch February 2004
# bash
bash-2.03# psrinfo
0 on-line since 04/01/2008 03:56:38
1 on-line since 04/01/2008 03:56:38
2 on-line since 04/01/2008 03:56:38
3 on-line since 04/01/2008 03:56:38
bash-2.03# ifconfig -a
lo0:1: flags=1000849 mtu 8232 index 1
        inet netmask ff000000
vnet0:1: flags=1000843 mtu 1500 index 2
        inet netmask ffffff00 broadcast

At this point the Solaris 8 Container exists. It's accessible on the local network, existing applications can be run in it, new software can be added to it, or existing software can be patched.

To extend the example, here is the output from the commands I used to limit this Solaris 8 Container to only use a subset of the 32 virtual CPUs on that Sun Fire T2000 system.

s10-system# zonecfg -z test8
zonecfg:test8> add dedicated-cpu
zonecfg:test8:dedicated-cpu> set ncpus=2
zonecfg:test8:dedicated-cpu> end
zonecfg:test8> exit
bash-3.00# zoneadm -z test8 reboot
bash-3.00# zlogin -C test8

Console:
[NOTICE: Zone rebooting]

SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc. All rights reserved
Hostname: test8
The system is coming up. Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Print services started.
Apr 1 18:14:53 test8 sendmail[3733]: My unqualified host name (test8) unknown; sleeping for retry
The system is ready.

test8 console login: root
Password:
Apr 1 18:15:24 test8 login: ROOT LOGIN /dev/console
Last login: Tue Apr 1 18:08:04 on console
Sun Microsystems Inc. SunOS 5.8 Generic Patch February 2004
# psrinfo
0 on-line since 04/01/2008 03:56:38
1 on-line since 04/01/2008 03:56:38

Finally, to learn more about Solaris 8 Containers:
- General Information, including Solaris 9 Containers
- Frequently Asked Questions
- Audio introduction
- Solaris 8 Containers and Solaris 9 Containers Datasheet
- Solaris 8 Containers document set

For those who were counting, the "three commands" were, at a minimum, flarcreate, zonecfg and zoneadm.
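The zonecfg sessions in this entry were typed interactively. zonecfg can also read its subcommands from a command file (zonecfg -z <zone> -f <file>), which is handy when converting many Solaris 8 systems. The helper below is a hypothetical sketch that emits such a file using the same subcommands shown above, plus verify and commit; the function name and parameters are my own, and because the demonstration's IP address is not shown, the example call uses a documentation address (192.0.2.10).

```python
from typing import Optional


def s8_zonecfg_commands(zonepath: str, address: str, physical: str,
                        ncpus: Optional[int] = None) -> str:
    """Hypothetical helper: build a zonecfg command file for a Solaris 8
    branded zone, mirroring the interactive sessions in this entry."""
    lines = [
        "create -t SUNWsolaris8",
        f"set zonepath={zonepath}",
        "add net",
        f"set address={address}",
        f"set physical={physical}",
        "end",
    ]
    if ncpus is not None:
        # Optional: dedicate CPUs to the zone, as in the second session.
        lines += ["add dedicated-cpu", f"set ncpus={ncpus}", "end"]
    # Check the configuration, then save it.
    lines += ["verify", "commit"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(s8_zonecfg_commands("/zones/roots/test8", "192.0.2.10", "vnet0", ncpus=2), end="")
```

Save the output to a file such as test8.cfg, run zonecfg -z test8 -f test8.cfg, and then continue with the zoneadm install command shown earlier.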


Solaris 10 Containers

ZoiT: Solaris Zones on iSCSI Targets (aka NAC: Network-Attached Containers)

Introduction

Solaris Containers have a 'zonepath' ('home') which can be a directory on the root file system or on a non-root file system. Until Solaris 10 8/07 was released, a local file system was required for this directory. Containers that are on non-root file systems have used UFS, ZFS, or VxFS. All of those are local file systems - putting Containers on NAS has not been possible. With Solaris 10 8/07, that has changed: a Container can now be placed on remote storage via iSCSI.

Background

Solaris Containers (aka Zones) are Sun's operating system level virtualization technology. They allow a Solaris system (more accurately, an 'instance' of Solaris) to have multiple, independent, isolated application environments. A program running in a Container cannot detect or interact with a process in another Container.

Each Container has its own root directory. Although viewed as the root directory from within that Container, that directory is also a non-root directory in the global zone. For example, a Container's root directory might be called /zones/roots/myzone/root in the global zone.

The configuration of a Container includes something called its "zonepath." This is the directory which contains a Container's root directory (e.g. /zones/roots/myzone/root) and other directories used by Solaris. Therefore, the zonepath of myzone in the example above would be /zones/roots/myzone.

The global zone administrator can choose any directory to be a Container's zonepath. That directory could just be a directory on the root partition of Solaris, though in that case some mechanism should be used to prevent that Container from filling up the root partition. Another alternative is to use a separate partition for that Container, or one shared among multiple Containers. In the latter case, a quota should be used for each Container.

Local file systems have been used for zonepaths. However, many people have strongly expressed a desire for the ability to put Containers on remote storage.
One significant advantage to placing Containers on NAS is the simplification of Container migration - moving a Container from one system to another. When using a local file system, the contents of the Container must be transmitted from the original host to the new host. For small, sparse zones this can take as little as a few seconds. For large, whole-root zones, this can take several minutes - a whole-root zone is an entire copy of Solaris, taking up as much as 3-5 GB. If remote storage can be used to store a zone, the zone's downtime can be as little as a second or two, during which time a file system is unmounted on one system and mounted on another.

Here are some significant advantages to iSCSI over SANs:
- the ability to use commodity Ethernet switching gear, which tends to be less expensive than SAN switching equipment
- the ability to manage storage bandwidth via standard, mature, commonly used IP QoS features
- iSCSI networks can be combined with non-iSCSI IP networks to reduce the hardware investment and consolidate network management. If that is not appropriate, the two networks can be separate but use the same type of equipment, reducing costs and types of in-house infrastructure management expertise.

Unfortunately, a Container cannot 'live' on an NFS server, and it's not clear if or when that limitation will be removed.

iSCSI Basics

iSCSI is simply "SCSI communication over IP." In this case, SCSI commands and responses are sent between two iSCSI-capable devices, which can be general-purpose computers (Solaris, Windows, Linux, etc.) or specific-purpose storage devices (e.g. Sun StorageTek 5210 NAS, EMC Celerra NS40, etc.). There are two endpoints to iSCSI communications: the initiator (client) and the target (server). A target publicizes its existence. An initiator binds to a target.

The industry's design for iSCSI includes a large number of features, including security.
Solaris implements many of those features. Details can be found:
- in the man pages for iscsiadm(1M) and iscsitadm(1M)
- at docs.sun.com: System Administration Guide: Devices and File Systems (Configuring iSCSI Targets and Initiators)
- at sun.com/blueprints: "Using iSCSI Multipathing in the Solaris 10 Operating System"
- at sunsolve.sun.com: Solution 209614 "iSCSI: What command outputs should be captured to begin troubleshooting iSCSI issues?"

In Solaris, the command iscsiadm(1M) configures an initiator, and the command iscsitadm(1M) configures a target.

Steps

This section demonstrates the installation of a Container onto a remote file system that uses iSCSI for its transport.

The target system is an LDom on a T2000, and looks like this:

System Configuration: Sun Microsystems sun4v
Memory size: 1024 Megabytes
SUNW,Sun-Fire-T200
SunOS ldg1 5.10 Generic_127111-07 sun4v sparc SUNW,Sun-Fire-T200
Solaris 10 8/07 s10s_u4wos_12b SPARC

The initiator system is another LDom on the same T2000, although there is no requirement that LDoms are used, or that they be on the same computer if they are used.

System Configuration: Sun Microsystems sun4v
Memory size: 896 Megabytes
SUNW,Sun-Fire-T200
SunOS ldg4 5.11 snv_83 sun4v sparc SUNW,Sun-Fire-T200
Solaris Nevada snv_83a SPARC

The first configuration step is the creation of the storage underlying the iSCSI target.
Although UFS could be used, let's improve the robustness of the Container's contents and put the target's storage under the control of ZFS. I don't have extra disk devices to give to ZFS, so I'll make some and use them for a zpool. In real life you would use disk devices here:

Target# mkfile 150m /export/home/disk0
Target# mkfile 150m /export/home/disk1
Target# zpool create myscsi mirror /export/home/disk0 /export/home/disk1
Target# zpool status
  pool: myscsi
 state: ONLINE
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        myscsi                ONLINE       0     0     0
          /export/home/disk0  ONLINE       0     0     0
          /export/home/disk1  ONLINE       0     0     0

Now I can create a zvol, an emulation of a disk device:

Target# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
myscsi    86K   258M  24.5K  /myscsi
Target# zfs create -V 200m myscsi/jvol0
Target# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
myscsi         200M  57.9M  24.5K  /myscsi
myscsi/jvol0  22.5K   258M  22.5K  -

Creating an iSCSI target device from a zvol is easy:

Target# iscsitadm list target
Target# zfs set shareiscsi=on myscsi/jvol0
Target# iscsitadm list target
Target: myscsi/jvol0
    iSCSI Name: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
    Connections: 0
Target# iscsitadm list target -v
Target: myscsi/jvol0
    iSCSI Name: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
    Alias: myscsi/jvol0
    Connections: 0
    ACL list:
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 0x0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 200M
            Backing store: /dev/zvol/rdsk/myscsi/jvol0
            Status: online

Configuring the iSCSI initiator takes a little more work. There are three methods to find targets; I will use a simple one. After telling Solaris to use that method, it only needs to know the IP address of the target.

Note that the example below uses "iscsiadm list ..." several times, without any output.
The purpose is to show the difference in output before and after the command(s) between them.

First let's look at the disks available before configuring iSCSI on the initiator:

Initiator# ls /dev/dsk
c0d0s0  c0d0s2  c0d0s4  c0d0s6  c0d1s0  c0d1s2  c0d1s4  c0d1s6
c0d0s1  c0d0s3  c0d0s5  c0d0s7  c0d1s1  c0d1s3  c0d1s5  c0d1s7

We can view the currently enabled discovery methods, and enable the one we want to use:

Initiator# iscsiadm list discovery
Discovery:
        Static: disabled
        Send Targets: disabled
        iSNS: disabled
Initiator# iscsiadm list target
Initiator# iscsiadm modify discovery --sendtargets enable
Initiator# iscsiadm list discovery
Discovery:
        Static: disabled
        Send Targets: enabled
        iSNS: disabled

At this point we just need to tell Solaris which IP address we want to use as a target. It takes care of all the details, finding all disk targets on the target system. In this case, there is only one disk target.

Initiator# iscsiadm list target
Initiator# iscsiadm add discovery-address <target-IP-address>
Initiator# iscsiadm list target
Target: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
        Alias: myscsi/jvol0
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
Initiator# iscsiadm list target -v
Target: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
        Alias: myscsi/jvol0
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
                CID: 0
                  IP address (Local):
                  IP address (Peer):
                  Discovery Method: SendTargets
                  Login Parameters (Negotiated):
                        Data Sequence In Order: yes
                        Data PDU In Order: yes
                        Default Time To Retain: 20
                        Default Time To Wait: 2
                        Error Recovery Level: 0
                        First Burst Length: 65536
                        Immediate Data: yes
                        Initial Ready To Transfer (R2T): yes
                        Max Burst Length: 262144
                        Max Outstanding R2T: 1
                        Max Receive Data Segment Length: 8192
                        Max Connections: 1
                        Header Digest: NONE
                        Data Digest: NONE

The initiator automatically finds the iSCSI remote storage, but we need to turn this into a disk device. (Newer builds seem not to need this step, but it won't hurt.
Looking in /devices/iscsi will help determine whether it's needed.)

Initiator# devfsadm -i iscsi
Initiator# ls /dev/dsk
c0d0s0    c0d0s3    c0d0s6    c0d1s1    c0d1s4    c0d1s7    c1t7d0s2  c1t7d0s5
c0d0s1    c0d0s4    c0d0s7    c0d1s2    c0d1s5    c1t7d0s0  c1t7d0s3  c1t7d0s6
c0d0s2    c0d0s5    c0d1s0    c0d1s3    c0d1s6    c1t7d0s1  c1t7d0s4  c1t7d0s7
Initiator# ls -l /dev/dsk/c1t7d0s0
lrwxrwxrwx   1 root     root         100 Mar 28 00:40 /dev/dsk/c1t7d0s0 -> ../../devices/iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Ac8a82272-b354-c913-80f9-db9cb378a6f60001,0:a

Now that the local device entry exists, we can do something useful with it. Installing a new file system requires the use of format(1M) to partition the "disk," but it is assumed that the reader knows how to do that. However, here is the first part of the format dialogue, to show that format lists the new disk device with its unique identifier, the same identifier listed in /devices/iscsi.

Initiator# format
Searching for disks...done

c1t7d0: configured with capacity of 199.98MB

AVAILABLE DISK SELECTIONS:
       0. c0d0
          /virtual-devices@100/channel-devices@200/disk@0
       1. c0d1
          /virtual-devices@100/channel-devices@200/disk@1
       2. c1t7d0
          /iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Ac8a82272-b354-c913-80f9-db9cb378a6f60001,0
Specify disk (enter its number): 2
selecting c1t7d0
[disk formatted]
Disk not labeled.  Label it now? no

Let's jump to the end of the partitioning steps, after assigning all of the available disk space to partition 0:

partition> print
Current partition table (unnamed):
Total disk cylinders available: 16382 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 - 16381      199.98MB    (16382/0/0) 409550
  1 unassigned    wu       0                0         (0/0/0)          0
  2     backup    wu       0 - 16381      199.98MB    (16382/0/0) 409550
  3 unassigned    wm       0                0         (0/0/0)          0
  4 unassigned    wm       0                0         (0/0/0)          0
  5 unassigned    wm       0                0         (0/0/0)          0
  6 unassigned    wm       0                0         (0/0/0)          0
  7 unassigned    wm       0                0         (0/0/0)          0

partition> label
Ready to label disk, continue? y

The new raw disk needs a file system.

Initiator# newfs /dev/rdsk/c1t7d0s0
newfs: construct a new file system /dev/rdsk/c1t7d0s0: (y/n)? y
/dev/rdsk/c1t7d0s0:     409550 sectors in 16382 cylinders of 5 tracks, 5 sectors
        200.0MB in 1024 cyl groups (16 c/g, 0.20MB/g, 128 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 448, 864, 1280, 1696, 2112, 2528, 2944, 3232, 3648,
Initializing cylinder groups:
....................
super-block backups for last 10 cylinder groups at:
 405728, 406144, 406432, 406848, 407264, 407680, 408096, 408512, 408928,
 409344

Back on the target, the zvol's USED value has grown from 22.5K to 32.7M:

Target# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
myscsi         200M  57.9M  24.5K  /myscsi
myscsi/jvol0  32.7M   225M  32.7M  -

Finally, the initiator has a new file system, on which we can install a zone.

Initiator# mkdir /zones/newroots
Initiator# mount /dev/dsk/c1t7d0s0 /zones/newroots
Initiator# zonecfg -z iscuzone
iscuzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:iscuzone> create
zonecfg:iscuzone> set zonepath=/zones/newroots/iscuzone
zonecfg:iscuzone> add inherit-pkg-dir
zonecfg:iscuzone:inherit-pkg-dir> set dir=/opt
zonecfg:iscuzone:inherit-pkg-dir> end
zonecfg:iscuzone> exit
Initiator# zoneadm -z iscuzone install
Preparing to install zone <iscuzone>.
Creating list of files to copy from the global zone.
Copying files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize packages on the zone.
...
Initialized packages on zone.
Zone is initialized.
Installation of these packages generated warnings:
The file contains a log of the zone installation.

There it is: a Container on an iSCSI target on a ZFS zvol.

Zone Lifecycle, and Tech Support

There is more to management of Containers than creating them. When a Solaris instance is upgraded, all of its native Containers are upgraded as well. Some upgrade methods work better with certain system configurations than others.
This is true for UFS, ZFS, other local file system types, and iSCSI targets that use any of them for underlying storage.

You can use Solaris Live Upgrade (LU) to patch or upgrade a system with Containers. If the Containers are on a traditional file system which uses UFS (e.g. /, /export/home), LU will automatically do the right thing. Further, if you create a UFS file system on an iSCSI target and install one or more Containers on it, the alternate boot environment (ABE) will also need file space for its copy of those Containers. To mimic the layout of the original boot environment (BE), you could use another UFS file system on another iSCSI target. The lucreate command would look something like this:

# lucreate -m /:/dev/dsk/c0t0d0s0:ufs -m /zones:/dev/dsk/c1t7d0s0:ufs -n newBE

Conclusion

If you want to put your Solaris Containers on NAS storage, Solaris 10 8/07 will help you get there, using iSCSI.
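If you repeat the procedure above on several systems, its non-interactive steps can be collected into a small script. This is a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, the target IP address is a placeholder you supply, pool creation and format(1M) partitioning stay manual, and the zone configuration is fed to zonecfg(1M) through its -f command-file option instead of the interactive session shown in the post.

```shell
#!/bin/sh
# Sketch: the iSCSI zonepath procedure, split by host.
# Example names (myscsi/jvol0, c1t7d0s0, /zones/newroots, iscuzone) come
# from the post; the target IP address is a placeholder you must supply.
# Pool creation and format(1M) partitioning remain manual steps.
# DRY_RUN=1 prints each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Target side: create a zvol of SIZE and share it as an iSCSI target.
# target_setup POOL/VOLUME SIZE
target_setup() {
    run zfs create -V "$2" "$1" || return 1
    run zfs set shareiscsi=on "$1"
}

# Initiator side: enable SendTargets discovery, point it at the target,
# then create the /dev entries for the discovered LUN.
# initiator_setup TARGET_IP
initiator_setup() {
    run iscsiadm modify discovery --sendtargets enable || return 1
    run iscsiadm add discovery-address "$1" || return 1
    run devfsadm -i iscsi
}

# After format(1M) has put the space into a slice:
# make_zone_fs SLICE MOUNTPOINT   (newfs asks for confirmation)
make_zone_fs() {
    run newfs "/dev/rdsk/$1" || return 1
    run mkdir -p "$2" || return 1
    run mount "/dev/dsk/$1" "$2"
}

# Prints a zonecfg(1M) command file equivalent to the interactive session.
# zonecfg_commands ZONEPATH
zonecfg_commands() {
    printf '%s\n' create "set zonepath=$1" "add inherit-pkg-dir" \
        "set dir=/opt" end commit
}

# Writes the command file (even in dry-run mode, so it can be reviewed),
# then configures and installs the zone.
# install_zone ZONE ZONEPATH
install_zone() {
    cmdfile="${TMPDIR:-/tmp}/$1.zonecfg"
    zonecfg_commands "$2" > "$cmdfile" || return 1
    run zonecfg -z "$1" -f "$cmdfile" || return 1
    run zoneadm -z "$1" install
}
```

Running each function with DRY_RUN=1 first, for example DRY_RUN=1 initiator_setup <target-IP-address>, prints the commands for review without changing either system.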


Solaris 10 Containers

High-availability Networking for Solaris Containers

Here's another example of Containers that can manage their own affairs.

Sometimes you want to closely manage the devices that a Solaris Container uses. This is easy to do from the global zone: by default a Container does not have direct access to devices. It does have indirect access to some devices, e.g. via a file system that is available to the Container.

By default, zones use NICs that they share with the global zone, and perhaps with other zones. In the past these were just called "zones." Starting with Solaris 10 8/07, these are now referred to as "shared-IP zones." The global zone administrator manages all networking aspects of shared-IP zones.

Sometimes it would be easier to give direct control of a Container's devices to its owner. An excellent example of this is the option of allowing a Container to manage its own network interfaces. This enables it to configure IP Multipathing (IPMP) for itself, as well as IP Filter and other network features. Using IPMP increases the availability of the Container by creating redundant network paths to the Container. When configured correctly, this can prevent the failure of a network switch, network cable or NIC from blocking network access to the Container.

As described at docs.sun.com, to use IP Multipathing you must choose two network devices of the same type, e.g. two Ethernet NICs. Those NICs are placed into an IPMP group through the use of the command ifconfig(1M). Usually this is done by placing the appropriate ifconfig parameters into files named /etc/hostname.<NIC-instance>, e.g. /etc/hostname.bge0.

An IPMP group is associated with an IP address. Packets leaving any NIC in the group have a source address of the IPMP group. Packets with a destination address of the IPMP group can enter through either NIC, depending on the state of the NICs in the group.

Delegating network configuration to a Container requires use of the new IP Instances feature. It's easy to create a zone that uses this feature, making it an "exclusive-IP zone."
One new line in zonecfg(1M) will do it:

zonecfg:twilight> set ip-type=exclusive

Of course, you'll need at least two network devices in the IPMP group. Using IP Instances will dedicate these two NICs to this Container exclusively. Also, the Container will need direct access to the two network devices. Configuring all of that looks like this:

global# zonecfg -z twilight
zonecfg:twilight> create
zonecfg:twilight> set zonepath=/zones/roots/twilight
zonecfg:twilight> set ip-type=exclusive
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge1
zonecfg:twilight:net> end
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge2
zonecfg:twilight:net> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge1
zonecfg:twilight:device> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge2
zonecfg:twilight:device> end
zonecfg:twilight> exit

As usual, the Container must be installed and booted with zoneadm(1M):

global# zoneadm -z twilight install
global# zoneadm -z twilight boot

Now you can log in to the Container's console and answer the usual configuration questions:

global# zlogin -C twilight
<answer questions>
<the zone automatically reboots>

After the Container reboots, you can configure IPMP. There are two methods: one uses link-based failure detection and the other uses probe-based failure detection.

Link-based detection requires the use of a NIC which supports this feature. Some NICs that support this are hme, eri, ce, ge, bge, qfe and vnet (part of Sun's Logical Domains). They are able to detect failure of the link immediately and report that failure to Solaris. Solaris can then take appropriate steps to ensure that network traffic continues to flow on the remaining NIC(s).

Other NICs do not support this link-based failure detection, and must use probe-based detection. This method uses ICMP packets ("pings") from the NICs in the IPMP group to detect failure of a NIC.
This requires one IP address per NIC, in addition to the IP address of the group.

Regardless of the method used, configuration can be accomplished manually or via files /etc/hostname.<NIC-instance>. First I'll describe the manual method.

Link-based Detection

Using link-based detection is easiest. The commands to configure IPMP look like these, whether they're run in an exclusive-IP zone for itself, or in the global zone for its own NICs and for NICs used by shared-IP Containers:

# ifconfig bge1 plumb
# ifconfig bge1 twilight group ipmp0 up
# ifconfig bge2 plumb
# ifconfig bge2 group ipmp0 up

Note that those commands only achieve the desired network configuration until the next time that Solaris boots. To configure Solaris to do the same thing when it next boots, you must put the same configuration information into configuration files. Inserting those parameters into configuration files is also easy:

/etc/hostname.bge1:
twilight group ipmp0 up

/etc/hostname.bge2:
group ipmp0 up

Those two files will be used to configure networking the next time that Solaris boots. Of course, an IP address entry for twilight is required in /etc/inet/hosts.

If you have entered the ifconfig commands directly, you are finished. You can test your IPMP group with the if_mpadm command. Run it in the global zone to test an IPMP group in the global zone, or in an exclusive-IP zone to test one of that zone's groups:

# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
        inet netmask ffff0000 broadcast
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
...
# if_mpadm -d bge1
# ifconfig -a
...
bge1: flags=289000842 mtu 0 index 4
        inet netmask 0
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
bge2:1: flags=201000843 mtu 1500 index 5
        inet netmask ffff0000 broadcast
# if_mpadm -r bge1
# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
        inet netmask ffff0000 broadcast
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
...

After if_mpadm -d detaches bge1, its data address is carried by the logical interface bge2:1; after if_mpadm -r reattaches bge1, the address returns to bge1 and bge2:1 disappears.

If you are using link-based detection, that's all there is to it!

Probe-based Detection

As mentioned above, using probe-based detection requires more IP addresses:

/etc/hostname.bge1:
twilight netmask + broadcast + group ipmp0 up addif twilight-test-bge1 deprecated -failover netmask + broadcast + up

/etc/hostname.bge2:
twilight-test-bge2 deprecated -failover netmask + broadcast + group ipmp0 up

Three hostname and IP address pairs (twilight, twilight-test-bge1 and twilight-test-bge2) will, of course, be needed in /etc/inet/hosts.

All that's left is a reboot of the Container. If a reboot is not practical at this time, you can accomplish the same effect by using ifconfig(1M) commands:

twilight# ifconfig bge1 plumb
twilight# ifconfig bge1 twilight netmask + broadcast + group ipmp0 up addif \
twilight-test-bge1 deprecated -failover netmask + broadcast + up
twilight# ifconfig bge2 plumb
twilight# ifconfig bge2 twilight-test-bge2 deprecated -failover netmask + \
broadcast + group ipmp0 up

Conclusion

Whether link-based failure detection or probe-based failure detection is used, we have a Container with these network properties:
- Two network interfaces
- Automatic failover between the two NICs

You can expand on this with more NICs to achieve even more resiliency or even greater bandwidth.
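Because the hostname files for the two detection methods differ only in a few fields, they can be generated rather than typed. The function below is a hypothetical helper, not part of Solaris: it reproduces the file contents shown in this post for a two-NIC group and writes them into a directory you choose, so they can be reviewed before being copied into the Container's /etc. It assumes the data hostname and the HOST-test-NIC test hostnames are already in /etc/inet/hosts, and writes each file as a single line.

```shell
#!/bin/sh
# Sketch: write /etc/hostname.<NIC-instance> files for a two-NIC IPMP
# group, in either the link-based or the probe-based form.
# Assumptions: DIR is a parameter so the files can be checked in a
# scratch directory first; HOST and HOST-test-NIC names must already
# resolve via /etc/inet/hosts. Each file holds a single line.

# write_ipmp_files DIR MODE GROUP HOST NIC1 NIC2
#   MODE is "link" (link-based) or "probe" (probe-based detection).
write_ipmp_files() {
    dir=$1 mode=$2 group=$3 host=$4 nic1=$5 nic2=$6
    case $mode in
    link)
        # The data address lives on NIC1; NIC2 only joins the group.
        echo "$host group $group up" > "$dir/hostname.$nic1"
        echo "group $group up" > "$dir/hostname.$nic2"
        ;;
    probe)
        # Each NIC also gets a test address, marked deprecated (not used
        # as a source address for normal traffic) and -failover (never
        # moved to the other NIC).
        echo "$host netmask + broadcast + group $group up addif" \
            "$host-test-$nic1 deprecated -failover netmask + broadcast + up" \
            > "$dir/hostname.$nic1"
        echo "$host-test-$nic2 deprecated -failover netmask + broadcast +" \
            "group $group up" > "$dir/hostname.$nic2"
        ;;
    *)
        echo "write_ipmp_files: unknown mode '$mode'" >&2
        return 1
        ;;
    esac
}

# Example: write_ipmp_files /tmp/ipmp probe ipmp0 twilight bge1 bge2
```

Run inside twilight as write_ipmp_files /etc link ipmp0 twilight bge1 bge2, it would create the same two files as the link-based example; the probe mode reproduces the probe-based files.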
