Monday Aug 11, 2008

Tupperware isn't the only Branded Container in town...

For this project, two Solaris 8 branded containers were installed on the test systems using "flar" images. The containers were created “unconfigured” so that IP addresses could be assigned to avoid conflict with the source systems. Instructions from the Solaris 8 branded containers administration guide were used to install and configure the test containers. The docs in the administration guide were excellent, with lots of "type this" and "you should see this" guidance.

We had two test scenarios that we wanted to play with: (1) Branded containers on SAN shared storage, and (2) branded containers hosted on local storage with SAN shared storage for application data. There are advantages and disadvantages for both, and situations where each has significant value. I'll get into the details of that in a future blahg entry.

In order to separate the "storage administration" function from the “zone administration” function, storage was configured with device mounts within /z, regardless of the type (SAN, NAS or internal) of storage being utilized:

The storage for the container was mounted to the /zones/[zonename]/ path using loopback filesystems (lofs). This mount was created for testing using the command line:
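As a sketch, assuming the storage is mounted at /z/myzone01 and the zone is named myzone01 (both hypothetical names):

```
# mount -F lofs /z/myzone01 /zones/myzone01
```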

So that the mount could persist for reboots, an entry was added to /etc/vfstab:
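A vfstab entry of this shape (paths hypothetical) does the same thing persistently:

```
/z/myzone01  -  /zones/myzone01  lofs  -  no  -
```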

Note that the “mount at boot” option is set to “no” in this example. This is to allow for zones installed on SAN shared storage volumes to migrate back and forth between the systems. Zones installed on local, internal storage will use “yes” for the “mount at boot” option. In the shared SAN storage zones, the filesystems must be mounted as part of the zone management when rebooting, or attaching a zone into a new host:
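A sketch of that attach sequence, with a hypothetical zone name:

```
# mount -F lofs /z/myzone01 /zones/myzone01
# zoneadm -z myzone01 attach
# zoneadm -z myzone01 boot
```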

The basic Solaris 8 branded container was created with the zonecfg command on the first Solaris 10 host system. The basic Solaris 8 branded zone contains a zonepath for the zone to be installed within, and a network interface for network access:
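A minimal sketch of such a configuration (zone name, zonepath, interface, and address are all hypothetical):

```
# zonecfg -z myzone01
myzone01: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:myzone01> create -t SUNWsolaris8
zonecfg:myzone01> set zonepath=/zones/myzone01
zonecfg:myzone01> add net
zonecfg:myzone01:net> set physical=bge0
zonecfg:myzone01:net> set address=192.168.100.10/24
zonecfg:myzone01:net> end
```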

Extra filesystems for administrative tools and applications are mounted into the branded container from the global zone mountpoints with the “add fs” command within zonecfg. The loopback filesystem (lofs) is used to allow the Solaris 10 global zone to manage devices, volume management, filesystems, and mountpoints, and “project” that filesystem space into the branded container. While it is possible to pass physical devices into a container, that architecture should be avoided whenever possible, especially in the case of branded containers, where device and filesystem management would be running under the brand while the kernel being leveraged is from a different OS release. This loopback filesystem is defined during the zone configuration with:
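For example, continuing the hypothetical zonecfg session (directory names invented for illustration):

```
zonecfg:myzone01> add fs
zonecfg:myzone01:fs> set dir=/opt/admintools
zonecfg:myzone01:fs> set special=/z/admintools
zonecfg:myzone01:fs> set type=lofs
zonecfg:myzone01:fs> end
```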

Because the Solaris 10 Global Zone (GZ) can run on machines that don’t support native Solaris 8, some applications and software packages could become confused about the architecture of the underlying hardware and cause problems. The Solaris 8 branded container is shielding us from the hardware platform hosting the Solaris 10 GZ, so it is recommended that we set a machine architecture of “sun4u” (Sun UltraSPARC) for the branded container so that the hardware platform is essentially hidden from the operating environment within the container. The “machine” attribute can be assigned within the zone configuration with:
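Along these lines (zone name hypothetical):

```
zonecfg:myzone01> add attr
zonecfg:myzone01:attr> set name=machine
zonecfg:myzone01:attr> set type=string
zonecfg:myzone01:attr> set value=sun4u
zonecfg:myzone01:attr> end
```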

Other filesystems can be added, additional network interfaces can be defined, and other system attributes can be assigned to the branded container as needed. Once the container has been configured within zonecfg, the configuration should be verified and committed to save the data.
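Within zonecfg, that is simply:

```
zonecfg:myzone01> verify
zonecfg:myzone01> commit
zonecfg:myzone01> exit
```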

The branded container is now configured and ready for installation of an image (flar) from a physical system. The Solaris 8 system image is created using the flarcreate command. Make sure that the source system is up to speed on patches, especially those related to patching, flash archives (flar) and the "df" patch (137749) that I mentioned in an earlier blahg entry.
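A typical invocation on the source system (archive name and path hypothetical; -n sets the archive name, -S skips the disk space check and omits size data from the archive):

```
source# flarcreate -n myzone01 -S /export/myzone01.flar
```

If -S is used, the Jumpstart install transcript will later NOTE that the archive "did not include size information", which is harmless.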

The configured zone can now be installed using the flar image of the physical Solaris 8 based system:
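With the same hypothetical zone and archive names; the -u option runs a sys-unconfig on the image so the zone comes up unconfigured:

```
# zoneadm -z myzone01 install -u -a /export/myzone01.flar
```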

Once the zone has been installed, the “P2V” utility must be run to turn the physical machine configuration into a virtual machine configuration within the zone:
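The utility ships under the brand directory; the path here is as documented for early Solaris 8 Containers releases, so verify it on your version:

```
# /usr/lib/brand/solaris8/s8_p2v myzone01
```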

The Solaris 8 branded container can now be booted. There will likely be some system startup scripts, tools, and applications that will give warning messages and errors on boot up. Remember the nature of the zone / host system relationship and decide what needs to run in a zone and what functionality should remain in the host system global zone. The zone will boot “unconfigured”, and ask for hostname, console terminal type, name service information, and network configurations on the first boot. As an alternative, a “sysidcfg” file can be copied into the /zones/[zonename]/root/etc/ directory before the first boot to allow the zone to auto-configure with sysid upon first boot.
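A minimal sysidcfg sketch (all values hypothetical; add root_password and real name service details to taste):

```
system_locale=C
terminal=vt100
timezone=US/Arizona
name_service=NONE
security_policy=NONE
network_interface=primary {hostname=myzone01
    ip_address=192.168.100.10
    netmask=255.255.255.0
    protocol_ipv6=no}
```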

That's it. Other than fixing up the system startup scripts (/etc/rc*.d and /etc/init.d), and attaching a copy of the source system's SAN attached data, the move is done and ready to be tested. The really cool part of this is that it just works. We were expecting some application issues and possibly some "speed of light" problems, but everything just worked. Obviously some things had to be adjusted in the branded container, such as disabling the Veritas Volume Manager startup and fixing some hardware inventory scripts that used "prtconf" to collect information, but these were identified early, and several reboots sorted out the symptoms from the zone's boot messages on the zlogin console.

More details later about migration of the zone between systems, various storage configurations that we tested, and some other (hopefully) interesting thoughts.


Thursday Jun 19, 2008

JET checkpoint, our milestone...

At this point, we have the packages and patches that we want, and the disk configuration that we want. We have a running system that we can fiddle with and install configuration files into. From this point forward, the system flar becomes evolutionary. Files like complex resolv.conf, NTP configurations (ntp.conf), passwd, shadow, netmasks, networks, sshd_config and the like can be configured on the JET provisioned host and then wrapped up into our "Golden Image" flar for N1SPS/N1SM to use.

For this project, we tried to maintain a line of separation between the "system" side of the configurations, and the "deployed applications" side. Within the deployed applications pile, we have two groups: IT and management applications, and end-user applications. Things like SunMC, backup tools, and administrative utilities are deployed through the IT and management steps. End-user applications (web servers, application servers, message queue) will be provisioned through other steps owned by different groups under N1SPS and N1SM (with DI Dynamic Infrastructure). Rather than use both JET and N1SPS/N1SM to maintain our system configuration, we concentrated on the JET side, and only worked within the deployment frameworks where it was necessary.

One accidental side effect that we leveraged early on was provided by the JET framework itself. All of the JET generated scripts that twiddle the bits after the Jumpstart of the system are installed on the JET installed machine. In /var/opt/sun/jet, we find the post_install scripts that were used to create the system. This includes the scripts to set up the root disk mirroring, the metadbs, and the zone partitioning. The scripts for other configuration tasks, generating ssh keys, setting ndd configurations, enabling/disabling services, installing our JASS/SST driver, adding services that will run a JASS audit every time the system boots, and our /etc/system setup script that I noted earlier, are also present.

Since all of this work was done to automate the installation and make our repetitive configuration tasks easier and safer, why not leverage that work in the deployment frameworks? So our flar not only contains the software that we want to be installed, and the configurations of the services within that system, but it also contains the necessary scripts to deploy that image and those configurations on a piece of hardware, and regenerate a lot of the settings and configuration dynamically. Very cool.

As an example of how this is a timesaver for us, we had to juggle the zone soft partition sizes in mid-project. If we were using the N1SPS and N1SM (including DI Dynamic Infrastructure goodies) plans to deploy the disk configurations, we would have had to juggle configuration information for all of our configured hosts in the test environment. In our case, we modified the JET template with the new sizes, ran the "make_client -f", did a "boot net - install" on our test machine, and then rolled up a new flar revision for the deployment folks to use with the new disk configuration embedded in the flar through the inclusion of the JET generated scripts. Grand total time to do this was a couple hours, and because this was a "system side" change, the IT/management and end-user applications folks under N1SPS and N1SM were never impacted by the change. In fact, they didn't even notice that the change had taken place. Very cool.

Late in the project, we did repeat all of our evolutionary steps on a single flar revision just to test our release notes and documentation. We went back and installed the packages we wanted from the DVD, and applied all of our documented changes to produce the "final flar" image from scratch. This was a great way to test our documentation and our procedures, and did uncover a few issues for us. Still, grand total time to produce the final system image flar with embedded scripts was one day, including testing. Nice.

One of my co-workers (Hi Jim!) became deeply involved with the packaging and cluster mechanics of Solaris distributions in this project. He wrote several tools (that I will let him write about at a future date and hopefully contribute to OpenSolaris) for juggling and debugging the package dependencies and installation order information. He wrote one tool in particular that I found incredibly useful. The tool "cooks" the package information from the installation media, takes a snapshot of the pkginfo from an installed machine, and creates the ordered list of packages to add that satisfies all of the pkgadd dependencies. This will allow us to create our own "metaclusters". So we can now add "SUNWCgolden" to the standard "SUNWCall, SUNWCprog, SUNWCuser, and SUNWCmin" Jumpstart options. No more "packages to add" and "packages to remove" as post-install tasks to generate our flar from the installation media. Jim rocks.


In the zone with SVM and JET...

We now have a decent OS image, our root disk layout finished and mirrored, along with the associated metadbs live. You DID test at this point and make sure that the template file checks out and the target machine installs, right?

The partitions that we need to use for zone allocations are created for us, slice s6 on each of our 8 disks. The root disk and the root mirror disk have about 80G in s6, and the remaining 6 disks have about 143G in s6. We need to glue all of that free disk space together into a giant mirrored metadevice. We'll call it d100. In our sds_metadevices_to_init variable, we need to add:

d101 d102 d100

This created metadevices d101, d102 and d100. Metadevice d101 would be defined as a 4-way concatenation of partitions c1t0d0s6, c1t2d0s6, c1t4d0s6, and c1t6d0s6. Metadevice d102 would be defined as a 4-way concatenation of partitions c1t1d0s6, c1t3d0s6, c1t5d0s6, and c1t7d0s6. We didn't see a real need for striping in this configuration for performance, and we want to retain the option of stacking more disk partitions on top of the concatenation at some point, if that becomes an issue. Remember that these disks are on a single controller, with the throttles and bottlenecks associated with that hardware limitation.

The two metadevices d101 and d102 were then mirrored into a new metadevice named d100. This could have been accomplished with the command lines:

# metainit d101 4 1 c1t0d0s6 1 c1t2d0s6 1 c1t4d0s6 1 c1t6d0s6
# metainit d102 4 1 c1t1d0s6 1 c1t3d0s6 1 c1t5d0s6 1 c1t7d0s6
# metainit d100 -m d101
# metattach d100 d102

Now that we have a ~500GB disk partition called d100, we need to slice out the soft partitions. Since we were unable to use ZFS, using SVM soft partitions gives us the maximum flexibility possible when we slice out the zone partitions. For now, we will be creating fairly standard sizes and mountpoints for the zones to be created on, but that might change later.

The first zone (100) is for the Message Queue, and will contain 4 filesystems (two, plus Live Upgrade space). Zones 101 through 135 will use the same 6 partitions and sizes. Naming for the zone space metadevices is d[zone number][partition number].

	d1000 d1001 d1002 d1003
	d1010 d1011 d1012 d1013 d1014 d1015
	d1350 d1351 d1352 d1353 d1354 d1355

This is the equivalent of running over 200 "metainit d#### -p d100 [size]" commands by hand. I created this section with a quick and dirty shell script that looped over each zone number, with an inner loop of echo statements that created the lines for that zone. Ugly, but quick and effective.
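A sketch of that generator, assuming JET's colon form of the metainit arguments ("d1000:-p:d100:8g" stands for "metainit d1000 -p d100 8g"); the 8g size is a placeholder, not our real build sizes:

```shell
# Quick and dirty generator for the soft-partition metadevice list.
# Zone 100 gets 4 partitions (d1000-d1003); zones 101-135 get 6 each.
# The ":-p:d100:8g" suffix is JET's colon form of "metainit dNNNN -p d100 8g";
# 8g is a placeholder size, not the real build specification.
gen_softparts() {
    for part in 0 1 2 3; do
        echo "d100${part}:-p:d100:8g"
    done
    zone=101
    while [ "$zone" -le 135 ]; do
        for part in 0 1 2 3 4 5; do
            echo "d${zone}${part}:-p:d100:8g"
        done
        zone=$((zone + 1))
    done
}

# 4 + 35*6 = 214 entries, one per "metainit -p" that JET will run
gen_softparts | wc -l
```

Paste the output into the template variable (and a similar loop, echoing /dev/md/dsk paths instead, builds the sds_newfs_devices list below).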

# Creating filesystems
# If you wish to newfs any UFS filesystems, then you can do them from here.
# As a side effect, the devices need not be DiskSuite ones....
# sds_newfs_devices:	Space separated list of devices to newfs - full paths
# If you wish to specify extra newfs options, then use a custom script to
sds_newfs_devices=" /dev/md/dsk/d1000 

This tells JET to newfs all of those metadevices that we just softpartitioned out of the d100 space. This is the most time-consuming piece of the machine build, lasting about 10-15 minutes. Again, this section was initially created with a quick and dirty shell script full of "for" loops and "echo" statements.

# sds_mount_devices:	space separated list of tuples that will be used
#			to populate the /etc/vfstab file
#	tuples of the form
#	blockdevice:mountpoint:fsckpass:mountatboot:options
#		blockdevice:	/dev/dsk/.... or /dev/md/dsk/....
#		mountpoint:	/export/big_raid_device	
#		fsckpass:	5th column from /etc/vfstab
#		mountatboot:	"yes" or "no"
#		options:	7th column from /etc/vfstab

This section creates the /etc/vfstab entries for the partitions that we just created. Yes, we are mounting the Live Upgrade space for both the global zone, and for our full root local zones, but that keeps folks from stealing space from my d100 or mounting those allocated partitions for other nefarious uses. The space is there and mounted into the global zone for when I need to use it for LU. Philosophical arguments on this item > /dev/null and as always, your mileage may vary.

Now we should be ready to cook our JET template and give it a shot:

jetserver# pwd

jetserver# ../bin/make_client -f myhost
Gathering network information..
        Client: (
        Server: (, SunOS)
Solaris: client_prevalidate
         Clean up /etc/ethers
Solaris: client_build
Creating sysidcfg
Creating profile
Adding base_config specifics to client configuration
Adding flash specifics to client configuration
FLASH: Modifying client profile for flash install
FLASH: Removing package/cluster/usedisk entries from profile
Adding eiscd specifics to client configuration
EISCD: EISCD: No profile changes needed
Adding sds specifics to client configuration
SDS: Configuring preferred metadevice numbers
Solaris: Configuring JumpStart boot for myhost
         Starting SMF services for JumpStart
Solaris: Configure bootparams build
cleaning up preexisting install client "myhost"
removing myhost from bootparams
updating /etc/bootparams
Force bootparams terminal type
-Restart bootparamd
Running '/opt/SUNWjet/bin/check_client  myhost'
        Client: (
        Server: (, SunOS)
Checking product base_config/solaris
Checking product flash
FLASH: Checking nfs://
Checking product custom
Checking product eiscd
Checking product sds
Check of client myhost

-> Passed....

YAY!!! We now have a template that (according to JET) might actually work. Let's go over to the target machine and give it a shot:

sc> break -y
sc> console

Enter #. to return to ALOM.

{0} ok 
{0} ok 
{0} ok boot net - install

Boot device: /pci@0/pci@0/pci@1/pci@0/pci@2/network@0  File and 
args: - install
1000 Mbps full duplex  Link up
Requesting Internet Address for 0:14:4f:d3:9a:0
Requesting Internet Address for 0:14:4f:d3:9a:0
Requesting Internet Address for 0:14:4f:d3:9a:0

... blah blah blah ...

Using sysid configuration file
Search complete.
Discovering additional network configuration...

Completing system identification...

Starting remote procedure call (RPC) services: done.
System identification complete.
Starting Solaris installation program...
Searching for JumpStart directory...
Using rules.ok from
Checking rules.ok file...
Using begin script: Utils/begin
Using derived profile: Utils/begin
Using finish script: Utils/finish
Executing JumpStart preinstall phase...
Executing begin script "Utils/begin"...
Installation of myhost at 09:45 on 19-Jun-2008
Loading JumpStart Server variables
Loading JumpStart Server variables
Loading Client configuration file
Loading Client configuration file
FLASH: Running flash begin script....
FLASH: Running flash begin script....
CUSTOM: Running custom begin script....
CUSTOM: Running custom begin script....
Begin script Utils/begin execution completed.
Searching for SolStart directory...
Checking rules.ok file...
Using begin script: install_begin
Using finish script: patch_finish
Executing SolStart preinstall phase...
Executing begin script "install_begin"...
Begin script install_begin execution completed.

So far so good, it found the right stuff. Now it should set up the disks and extract my flar.

Processing profile
	- Opening Flash archive
	- Validating Flash archive
	- Selecting all disks
	- Configuring boot device
	- Using disk (c1t0d0) for "rootdisk"
	- Configuring / (c1t0d0s0)
	- Configuring swap (c1t0d0s1)
	- Configuring /GZ_VAR_LU (c1t0d0s3)
	- Configuring /GZ_ROOT_LU (c1t0d0s4)
	- Configuring /var (c1t0d0s5)
	- Configuring  (c1t0d0s7)
	- Configuring  (c1t2d0s7)
	- Configuring  (c1t3d0s7)
	- Configuring  (c1t4d0s7)
	- Configuring  (c1t5d0s7)
	- Configuring  (c1t6d0s7)
	- Configuring  (c1t7d0s7)
	- Configuring  (c1t0d0s6)
	- Configuring  (c1t2d0s6)
	- Configuring  (c1t3d0s6)
	- Configuring  (c1t4d0s6)
	- Configuring  (c1t5d0s6)
	- Configuring  (c1t6d0s6)
	- Configuring  (c1t7d0s6)
	- Deselecting unmodified disk (c1t1d0)

Verifying disk configuration
	- WARNING: Changing the system's default boot device in 

Verifying space allocation
	NOTE: 1 archives did not include size information

Preparing system for Flash install

Configuring disk (c1t0d0)
	- Creating Solaris disk label (VTOC)

Configuring disk (c1t2d0)
	- Creating Solaris disk label (VTOC)

Configuring disk (c1t3d0)
	- Creating Solaris disk label (VTOC)

Configuring disk (c1t4d0)
	- Creating Solaris disk label (VTOC)

Configuring disk (c1t5d0)
	- Creating Solaris disk label (VTOC)

Configuring disk (c1t6d0)
	- Creating Solaris disk label (VTOC)

Configuring disk (c1t7d0)
	- Creating Solaris disk label (VTOC)

Creating and checking UFS file systems
	- Creating / (c1t0d0s0)
	- Creating /GZ_VAR_LU (c1t0d0s3)
	- Creating /GZ_ROOT_LU (c1t0d0s4)
	- Creating /var (c1t0d0s5)

Beginning Flash archive processing

Predeployment processing
16 blocks
16 blocks
16 blocks

No local customization defined

Extracting archive: patchedflar
	Extracted    0.00 MB (  0% of 1302.60 MB archive)
	Extracted    1.00 MB (  0% of 1302.60 MB archive)
	Extracted    2.00 MB (  0% of 1302.60 MB archive)
	Extracted 1302.00 MB ( 99% of 1302.60 MB archive)
	Extracted 1302.60 MB (100% of 1302.60 MB archive)
	Extraction complete

Postdeployment processing

No local customization defined

Customizing system files
	- Mount points table (/etc/vfstab)
	- Unselected disk mount points (/var/sadm/system/data/vfstab.unselected)
	- Network host addresses (/etc/hosts)
	- Environment variables (/etc/default/init)

Cleaning devices

Customizing system devices
	- Physical devices (/devices)
	- Logical devices (/dev)

Installing boot information
	- Installing boot blocks (c1t0d0s0)
	- Installing boot blocks (/dev/rdsk/c1t0d0s0)
	- Updating system firmware for automatic rebooting

Installation log location
	- /a/var/sadm/system/logs/install_log (before reboot)
	- /var/sadm/system/logs/install_log (after reboot)

Flash installation complete
Executing JumpStart postinstall phase...
Executing finish script "Utils/finish"...

YAY!!! Still going well. Now we'll see if the JET stuff actually works. At this point, we see lots of information about moving the scripts around, copying stuff in, and setting up for the reboots and activities that happen between those reboots. After the first reboot, we see:

Boot device: /pci@0/pci@0/pci@2/scsi@0/disk@0,0:a  File and args:
SunOS Release 5.10 Version Generic_127127-11 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
/dev/rdsk/c1t0d0s4 is clean
/dev/rdsk/c1t0d0s3 is clean
JumpStart (/var/opt/sun/jet/post_install/S99jumpstart) started @ Thu 
Jun 19 09:52:42 MST 2008
Loading JumpStart Server variables
Loading Client configuration file
Running additional install files for reboot Platform/1
NFS Mounting Media Directories
Mounting nfs:// on 
Mounting nfs:// on 
CUSTOM: Running 001.custom.001.set_etc_system
SDS: Running 002.sds.001.create_fmthard
SDS: Running 002.sds.001.set_boot_device
SDS: Running 002.sds.002.create_metadb
SDS: Running 002.sds.003.create_root_mirror
SDS: Running 002.sds.007.create_user_devices
SDS: Running 002.sds.008.create_filesystems
SDS: Running 002.sds.009.create_vfstab_entries
NFS Unmounting Media Directories
Unmounting /var/opt/sun/jet/js_media/pkg
Unmounting /var/opt/sun/jet/js_media/patch
Save existing system entries
Copying file /etc/system to /etc/system.prejs2
Jun 19 10:11:03 myhost reboot: rebooted by root

Jun 19 10:11:04 myhost syslogd: going down on signal 15

Jun 19 10:11:04 rpc.metad: Terminated

syncing file systems... done

YAY!!! Our disk configurations all executed without errors or warnings. Now let's see if it comes back clean after the reboot.

Boot device: rootdisk  File and args: 

SunOS Release 5.10 Version Generic_127127-11 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
/dev/md/rdsk/d4 is clean
/dev/md/rdsk/d3 is clean
/dev/md/rdsk/d1000 is clean
/dev/md/rdsk/d1001 is clean
/dev/md/rdsk/d1002 is clean
/dev/md/rdsk/d1003 is clean
/dev/md/rdsk/d1352 is clean
/dev/md/rdsk/d1353 is clean
/dev/md/rdsk/d1354 is clean
/dev/md/rdsk/d1355 is clean

... [lots of other activity deleted]

SDS: Running 003.sds.001.attach_mirrors
SDS: Running 003.sds.002.attach_user_mirrors
Disable & delete SMF tag svc:/site/jetjump
JumpStart is complete @ Thu Jun 19 10:15:55 MST 2008

myhost console login: 



Wednesday Jun 18, 2008

JET and SVM, a match made in heaven...

The flar that we created earlier now has all the packages and patches (oh yeah, JET patches the system for you as well) in it that we want, and has the basic filesystems installed for us on the root disk. We implemented the root disk layout (reserving s7 for metadb in the SVM sections later) with this section of the JET template:

#  Define Root disk configuration
#    Make sure that /var has a Live Upgrade slice
#    if /var is a separate filesystem.
#    Mount the LU spaces to make sure that someone
#    doesn't come along later and use that "free space".
#    s6 is the "freehog" partition to contain all free
#    space after the static slices are allocated
#    s7 is defined later and used as a metadb space.

Hint number one... Put lots of comments in your template files to remind yourself (and others who come along later) why and how you did things. These template files can be rather large and complex.

This section defines the rootdisk (reserved word in JET) with a / partition of 8G on s0, swap space of 32G on s1, /var of 8G on s5, and some Live Upgrade partitions for / and /var on s3 and s4 with matching sizes (important). The metadb space for SVM will be on s7 (defined later in the template), and any leftover space will be allocated to a partition on s6, but not mounted. We will add this free space to our pile of space for use by zones later on.
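A sketch of what such a section looks like, with the sizes from the description above; the variable names follow JET's base_config template conventions as I recall them, so verify them against your JET version:

```
base_config_profile_root="8192"
base_config_profile_swap="32768"
base_config_profile_s3_mtpt="/GZ_VAR_LU"
base_config_profile_s3_size="8192"
base_config_profile_s4_mtpt="/GZ_ROOT_LU"
base_config_profile_s4_size="8192"
base_config_profile_s5_mtpt="/var"
base_config_profile_s5_size="8192"
base_config_profile_s6_size="free"
```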

# Any devices we need to skip (spare slices on root disks etc)
# Format: c?t?d?s?
sds_skip_devices="c1t0d0s6 c1t1d0s6"

This tells JET that we don't want to use the root mirroring phase to set up the partitions that we will use later as part of the zone disk space.

#  Additional disks under JET control.  Skip the
#  root mirror disk (c1t1) as it is defined in the
#  mirroring steps below.
base_config_profile_additional_disks="c1t2d0 c1t3d0 \
     c1t4d0 c1t5d0 c1t6d0 c1t7d0 "

This is where comments come into play. Without this comment, you might think that the root mirror disk should be listed in "additional disks under JET control", it just makes sense. But no, that would make very ugly things happen.

Now that we have reserved the disks for JET to use, we need to layout a disk partition scheme for them. Again, we don't want to mount them, and we will use s7 later on to add metadbs for SVM to use:

#  Define layout of the additional disks to use
#  s6 as the freehog space.  s7 will have already
#  been reserved by the metadb definitions below.

That sets up all of our physical disks (except for the metadb stuff, but that will come along later). At this point, we could install the machine and make sure that all is well, and our rootdisk works as expected. The next step is to add in the SVM stuff. Let me repeat, now is a REALLY good time to stop, try things out, and make sure that your root disk and "additional disks" are configured the way you want them.

There are three basic pieces in the SVM configuration that we need to worry about. We need to set up the metadb copies, copy the configuration of the rootdisk and mirror it, and then we need to set up the leftover diskspace and put it into a big metadevice to use later for zone space soft partitions.

# Kernel options
# If you need to increase the number of metasets from the 
# default (4) or the number of metadevices per metaset from 
# the default (128), then enter the figures here.
# Please note that increasing these numbers can significantly 
# increase the time taken to do a reconfiguration boot, or a 
# drvconfig etc.

We needed to increase the default number of metadevices from 128 to something higher. We did some testing, and making this number in the thousands didn't hurt our performance, so we erred on the side of safety with 4000.
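On the installed system, the end result of this setting shows up in the SVM driver configuration, /kernel/drv/md.conf (fragment; 4000 metadevices, the default 4 sets):

```
name="md" parent="pseudo" nmd=4000 md_nsets=4;
```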

# This variable defines where SVM will create the metadbs.
# If any meta state databases are to be created, add 
# the c?t?d?s? number here. If you need multiple copies 
# (i.e. metadb -c 3), suffix the device with a : and the 
# number of copies. e.g. c0t0d0s7:3
# eg: sds_database_locations="c3t0d0s7:3 c1t0d0s7:3"
sds_database_locations="rootdisk.s7:3 c1t2d0s7:3 c1t3d0s7:3 \
     c1t4d0s7:3 c1t5d0s7:3 c1t6d0s7:3 c1t7d0s7:3"

In theory, according to the template comments, you don't need to specify the rootdisk or the root mirror disk in this variable. We didn't notice the comment until we already had a working configuration, and left things alone. Your mileage may vary, but we don't get warnings or errors with this configuration and everything is working fine for us.

We have set up metadb partitions in this step, and placed three copies on each metadb partition. Again, this was a part of the build specification, and traditional for the customer. That is 24 copies of the metadb, and I am not sure if I would have configured things this way if the choice was mine. Definitely do your due diligence and make your own configuration decisions wisely. Your mileage may vary.

# This variable ensures that partitions are created to 
# hold the metadbs defined above. 
# Specify locations in one of the following forms
#	s:size	       - creates s7 on the rootdisk
#	c?t?d?s:size    - creates slice on specified device
sds_database_partition="s7:32 c1t2d0s7:32 c1t3d0s7:32 \
      c1t4d0s7:32 c1t5d0s7:32 c1t6d0s7:32 c1t7d0s7:32"

We are creating eight metadb partitions, one on each disk, 32MB each. This section also reserves the proper space on s7 of the rootdisk. We had issues with the configuration when we tried to configure the metadb partition on s7 in the section of the template where we defined the rest of the rootdisk partitions. JET is smart enough to slice s7 out for us before calculating the "free" space for s6 on the root disk.

# If the boot device alias needs setting then do it here
# ie sds_root_alias="rootdisk"
# This will update the boot-device filed to ${sds_root_alias} 
# net and add the name to devalias, removing any previous 
# one of the same name

# If we do have a root mirror, then set the devalias device 
# to this name

# If you are using a two disk system and are mirroring the 
# root device,
# you may want to enable md:mirrored_root_flag in the 
# kernel (/etc/system).

# You should read the info doc about this and fully understand 
# the implications of setting it... i.e. it's not just a case 
# of always setting it!

Here we assign a name for the root mirror disk. This will be the alias used in the boot prom to set up the root mirror as a second bootable device. We also set the md:mirrored_root_flag in /etc/system. Definitely read the Infodoc that the template mentions and make your own decision on this one.

# By default, the root disk will be mirrored slice by 
# slice; the metadevices will start with d10 for the 
# first slice (sub-mirrors d11 and d12), d20 for
# the second slice (sub-mirrors d21 and d22) upwards.
# If you wish to use your own numbering scheme for the 
# metadevices, please specify them here, in the following 
# format- 
#    :mirror md:sub mirror 1:sub mirror 2

In this section, we are defining the device names for the partitions of the root disk. We were following a local numbering scheme, and this worked well for us. One interesting note here that took a couple of hours to debug: apparently metadevices may be named "d0", but they can't be named d0[anything]. So d01 is not allowed. d001 is not allowed. d02843 is not allowed. Oops. I'll definitely remember that one.

# If the root device is to be mirrored, define the mirror 
# here (c?t?d?).
# sds_use_fmthard can be set to "yes" | "no" | ""
# If sds_use_fmthard is set to "yes", then JET will create 
# metadb partitions and create the metadb as defined on 
# the root disk. You DO NOT need to specify the root mirror 
# in the sds_database_partition nor sds_database_locations 
# variable.
# If sds_use_fmthard is set to "no" or "", then JET will 
# create the data partitions for you, but you will have 
# to populate the sds_database_*  variables if you want a 
# metadb to exist on the root mirror.
# You MUST set fmthard=yes for Solaris 9 and above. 
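Based on the comment block above, the two settings were of the form (the mirror disk is c1t1d0 in our hardware layout):

```
sds_root_mirror="c1t1d0"
sds_use_fmthard="yes"
```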

Wow. That section is easy. Those two lines are all it takes to mirror the root disk. Just tell JET what disk to mirror it to, and tell JET to use "fmthard" to copy the partition table. This causes the installation to run a "prtvtoc" command against the configured root disk, and then feed that output to "fmthard -s" against the disk defined in sds_root_mirror. Simple.

For our configuration though, we will add specific naming for the root disk and root mirror disk partitions and metadevice names:

#  sds_metadevices_to_init="d81:1:1:/dev/dsk/c1t0d0s0 d80:-m:d81"
#	Equivalent to
#		metainit d81 1 1 /dev/dsk/c1t0d0s0
#		metainit d80 -m d81
#       This will create a one-way mirror on d80 to d81.
#       Example of combined "" and "command line" syntax:
#	(this would all be on one line, but has been split for 
#       clarity)
#  sds_metadevices_to_init="d71 d72 d70 d81:1:1:/dev/dsk/c1t0d0s0
#				d82:1:1:/dev/dsk/c2t0d0s0
#				d80:-m:d81 d91 d92 d90"
sds_metadevices_to_init="d91 d92 d0
			d11 d12 d1
			d31 d32 d3
			d41 d42 d4
			d51 d52 d5"

There we have it. Those pieces of the template define the metadbs and the root disk mirroring, and set up the partitions that we will use later to create the space for our zones. Fairly painless and straightforward, and definitely easier and safer than doing all of the twiddly bits by hand. Absolutely easier than repeating those manual tasks for 100 servers too! Next entry, I'll delve deeper into the zone partition space allocation, and the soft partitioning that we used to accomplish that piece.

This, again, is a REALLY good time to stop and try the template out. At this point, we have sliced up the root disk, created metadbs, sliced up the extra disks, and mirrored the root disk. These are the really tricky parts, the ones that will break a machine in interesting ways that are difficult to debug. If we know that these parts are working, then creating the spaces on our leftover partitions for the zones will happen on a running and (hopefully) stable system, making debugging much easier.
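A few standard SVM commands make that sanity check quick once the box is up (nothing JET-specific here):

```
metadb -i       # are the state database replicas all present and healthy?
metastat -p     # compact, md.tab-style listing of everything JET created
metastat d0     # root mirror detail: both submirrors should show "Okay"
```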


JET is your friend...

One of the challenges in this project is keeping an eye on the evolutionary qualities of the underlying components. In other words, we know that the basic elements of disk layout, packages added, configuration files, etc. will be in flux throughout the development and test cycles. In order to streamline these activities and make them a tad safer, we decided to implement as much as possible with the Jumpstart Enterprise Toolkit (JET).

JET allows you to create template files, describing not only basic Jumpstart configurations, but many of the most common "post install" tasks that administrators do manually. Things like JASS (Solaris Security Toolkit), Solaris Volume Manager (SVM), more complex network configurations like IPMP, and many other goodies are either implemented in JET already, or easily integrated as a repeatable and automated task. In our case, with IPMP, over 200 filesystems using SVM, and some post-install tasks (dropping in config files, setting up some standard /etc/system and ndd settings, etc.), JET is definitely a great first step on our road to our final goal of doing this through N1SPS/N1SM.

I won't cut and paste my whole template file into my blog, but I will note a few of the key sections. We started with an existing JET template from the current production environment. This had the benefit of bringing along several post-install scripts that installed default configuration files for syslog, NTP, and SSH, and gave us standard and proven locations and methods for installing some of the extra software that this project required (expect, some GNU tools, etc.). Since we are working with a flar and not using the standard pkgadd method, this line in the template file (from /opt/SUNWjet/Templates by default) gives us the starting system image that we want:

#       Identify flar to load
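That comment is followed in the real template by the location of the flar itself. The server and path below are made up for illustration, but the variable is the one the JET flash module uses:

```
flash_archive_locations="nfs://jumpstart-server/export/flars/s10-base.flar"
```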

That's it. Just that line, and all the stuff from our hand-built machine is installed on the new box. Of course, we still need to specify a bunch of other configuration things, like disk layout, SVM configs, etc., but I'll get to those later.

In addition to our standard flar, we want to install some changes to /etc/system, and twiddle some extra bits. This is accomplished with a script that we wrote and saved in /opt/SUNWjet/Clients/common.files called set_etc_system. We just place that script into the JET common.files directory and configure the template to run it in reboot 1 (JET installs do several reboots after the initial flar load):

#  Override default custom scripts.  
#  Set default locale to POSIX/C to get around buggy use of "tr"
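For reference, wiring the script into the template looks something like the line below. The variable naming follows my recollection of the JET custom module (scripts keyed by boot level, "1" being the first reboot), so double-check the comments in your own Template file:

```
custom_scripts_1="set_etc_system"
```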

Our set_etc_system script isn't super complicated or error-proof, but I am using it as a simple example. We wrote it to automate a way around a mistake we kept making: forgetting to clean out settings between flar revisions. So for the settings that we want to be there: if the setting exists, leave it alone; if it doesn't exist in /etc/system, put our setting into the file:

#!/bin/sh
#   set_etc_system
#   1.1 - Bill Walker < >
#   Set some sane /etc/system variables if they don't exist
BN=`/usr/bin/basename ${0}`
cat >> ${ROOTDIR}/etc/system << EOF
* Added by Jet set_etc_system `date +%d%b%y`
EOF

if grep " autoup" ${ROOTDIR}/etc/system >/dev/null
then
        echo "${BN} : autoup already in /etc/system..."
else
cat >> ${ROOTDIR}/etc/system << EOF
set autoup=480
EOF
fi

if grep " tune_t_fsflushr" ${ROOTDIR}/etc/system >/dev/null
then
        echo "${BN} : tune_t_fsflushr already in /etc/system..."
else
cat >> ${ROOTDIR}/etc/system << EOF
set tune_t_fsflushr=60
EOF
fi

if grep " rlim_fd_cur" ${ROOTDIR}/etc/system >/dev/null
then
        echo "${BN} : rlim_fd_cur already in /etc/system..."
else
cat >> ${ROOTDIR}/etc/system << EOF
set rlim_fd_cur=1024
EOF
fi

Simple but repetitive tasks such as this are amazingly easy with JET. Every time you find yourself doing some task more than once on an install or deployment, just take a couple of minutes to create a script that does it for you, and add it to the custom_scripts list in your JET template. Things like cleaning out the SSH known_hosts entries for users, creating a complex /etc/resolv.conf file, adding a new service, making sure that some services are disabled, or sending an email or message to let you know that things succeeded or failed can be great time savers and make deployment safer and faster.
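As a hypothetical example of such a custom script (not one from our actual template): empty every user's known_hosts under the newly installed root, so a redeployed image doesn't carry stale host keys. JET normally supplies ROOTDIR; the default and the demo file below just let the sketch run standalone.

```shell
#!/bin/sh
# ROOTDIR is set by JET during the install; default it for standalone testing.
ROOTDIR=${ROOTDIR:-/tmp/demo_root}

# --- demo setup only, so the sketch is runnable on its own ---
mkdir -p ${ROOTDIR}/export/home/demo/.ssh
echo "oldhost ssh-rsa AAAAB3..." > ${ROOTDIR}/export/home/demo/.ssh/known_hosts

# Truncate each known_hosts file found under the installed root.
for kh in ${ROOTDIR}/export/home/*/.ssh/known_hosts
do
        [ -f "$kh" ] && : > "$kh" && echo "cleaned $kh"
done
```

The same pattern (guarded, idempotent, loud about what it did) applies to any of the cleanup tasks mentioned above.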

Not much new and exciting information here if you already use JET, but I'll be digging a bit deeper into JET in this blahg later.




