Tupperware comes in sets...
By mrbill on Aug 13, 2008
Continuing where I left off, the previous blahg entries addressed installation of the Solaris 8 branded container. Those pieces covered the mechanics of the container itself. One of the key architectural decisions in this process was "where do we put the stuff?". Not just mountpoints and filesystems, we already covered that, but what pieces go on local disk storage, and what pieces go on the shared SAN storage?
Since the objective is to eventually integrate into a failover scenario, we looked at two options here. Each one has benefits and can supply a capability to our final solution. In the first case, we want to fail a container over to an alternate host system. In the second case, we want to fail a container over to an alternate datacenter. Think of these two as "Business Continuity" and "Disaster Recovery".
In the Business Continuity case, the capability to do "rolling upgrades" as part of the solution would be a huge added bonus. We decided to put the zone itself on local disk storage, and the application data on the shared SAN storage. This allows us to "upgrade" a container, roll the application in, and still maintain a "fallback" configuration in case the upgrade causes problems, with minimal downtime. Accomplishing this requires two copies of the container. Application data "rollback" and "fallback" scenarios are handled by the shared SAN storage itself through snapshots and point-in-time copies.
Similar to a cluster failover pair, both zones have their own patch levels and configurations, and a shared IP address can be used for accessing application services. Only one zone can be “live” at any time as these two zones are actually copies of the same zone “system”.
To migrate the branded container to another host system, the zone must be halted, and the shared SAN storage volumes must be unmounted and detached from the original host system:
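A minimal sketch of these steps, assuming an illustrative zone name of "s8zone" and Veritas-managed SAN volumes (the zone name, mountpoint, and disk group names are placeholders, not from the original configuration):

```shell
# Halt the zone and detach it from this host
zoneadm -z s8zone halt
zoneadm -z s8zone detach

# Unmount the application filesystems and release the SAN volumes
# (Veritas Volume Manager shown; other volume managers differ)
umount /app/data
vxdg deport appdg
```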
The detach operation saves information about the container and its configuration in an XML file in the ZONEPATH (/zones/[zonename] in our configuration). This will allow the container to be created on the target system with minimal manual configuration through zonecfg.
The detached container’s filesystem can now be safely copied to the new target system. The filesystem will be backed up and then restored onto the target system. There are many utilities that can create and extract backup images; this example uses the “pax” utility. A pax archive can preserve information about the filesystem, including ACLs, permissions, creation, access, and modification times, and most types of “special files” that are persistent. Make sure that there is enough space on both the source system and the target system to hold the pax archive (/path/to/[zonename].pax in the example), as the image may be several gigabytes in size. Some warnings may appear during pax archive creation: some transient special files cannot be archived, but they will be re-created on the target system when the zone boots.
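A sketch of the archive creation, again assuming the illustrative zone name "s8zone" (the archive path follows the /path/to/[zonename].pax convention mentioned above):

```shell
# Archive the detached zone's filesystem, preserving all attributes
# (-p e), in the extended ustar format (-x xustar) for large files
cd /zones/s8zone
pax -w -x xustar -p e -f /path/to/s8zone.pax .
```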
On the target system, the zone filesystem space must be configured and mounted, and have “700” permissions with owner root. The /zones loopback mount must also be in place, just as in the source system.
Since the zone filesystem is not on shared storage, and will remain local to the target system, the “mount at boot” option can be set to “yes”.
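Preparing the target filesystem might look like the following sketch; the device names and zone name are illustrative assumptions, not taken from the actual configuration:

```shell
# Create and mount the local zone filesystem on the target system
newfs /dev/rdsk/c1t0d0s6
mkdir -p /zones/s8zone
mount /dev/dsk/c1t0d0s6 /zones/s8zone

# The ZONEPATH must be owned by root with 700 permissions
chown root:root /zones/s8zone
chmod 700 /zones/s8zone
```

The corresponding /etc/vfstab entry would set the "mount at boot" field to "yes", since this filesystem is local to the target system:

```shell
# device to mount     device to fsck       mount point    FS type  pass  at boot  options
/dev/dsk/c1t0d0s6     /dev/rdsk/c1t0d0s6   /zones/s8zone  ufs      2     yes      -
```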
Storage for the applications and data should now be imported and mounted on the target system to replicate the configuration of the source system. All mountpoints, loopback filesystems, and targets of the “add fs” components of the zone must be replicated. Once the filesystems are mounted into the global zone, the zone pax archive can be extracted. Again, care must be taken to make sure that there is sufficient space on the zone filesystem for the extraction:
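Sketching this step with the same illustrative names as above (Veritas disk group, volume, mountpoint, and zone name are all placeholders):

```shell
# Import the shared SAN storage and mount it where the source system had it
vxdg import appdg
mount /dev/vx/dsk/appdg/appvol /app/data

# Extract the zone image into the zone filesystem, preserving attributes
cd /zones/s8zone
pax -r -p e -f /path/to/s8zone.pax
```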
The filesystem of the zone is now in place, but the zone is not yet configured into the target system. The zone must be created, modified as necessary (e.g. different network adapter hardware or device naming), and “attached” to the new host system. As a sanity check, it is highly recommended that the /usr/lib/brand/solaris8/s8_p2v command be run against the new zone to make sure that the new system “accepts” the attach of a zone created elsewhere:
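These steps can be sketched as follows, with "s8zone" again standing in for the real zone name. The "create -a" form reads the XML configuration that "detach" saved in the ZONEPATH:

```shell
# Recreate the zone configuration from the saved XML, then adjust
# host-specific resources (physical NIC names, etc.) via zonecfg
zonecfg -z s8zone 'create -a /zones/s8zone'
zonecfg -z s8zone    # interactive session to adjust as necessary

# Sanity-check the image against this host, then attach
/usr/lib/brand/solaris8/s8_p2v s8zone
zoneadm -z s8zone attach
```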
The “attach” command may fail with messages about patch version conflicts, as well as extra or missing patches. Even though this is a full root zone, the detach/attach functionality makes sure that the host systems are equivalent. Some patches will be missing or extra in some cases, especially where the machine types or CPU types are different (sun4u, sun4v, HBA types and models, Ethernet adapter hardware differences, etc.). It is possible to normalize all patch versions and instances across systems of different configurations and architectures, but this involves significant effort and planning, and has no real effect on the operation of the hosting systems or the hosted zones (patching software that will never run on a given machine).
Once all errors and warnings are accounted for as “accepted deltas” or resolved, a failed attach can be forced:
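For the illustrative zone name used above, the forced attach is:

```shell
# Force the attach after the reported patch deltas have been reviewed
zoneadm -z s8zone attach -F
```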
Zone migration can be toggled between the machines by halting the zone, detaching the zone, moving the shared SAN storage into the target system, attaching the zone and booting the zone. Once the zone has been installed, configured, and booted on both systems, there is no need to use the s8_p2v function for migration. Strictly speaking, the “detach/attach” function is not necessary since the zone itself resides locally, and is not actually migrating, but it does provide an extra layer of protection on the non-active machine to keep the halted zone from being booted while the shared storage is not active. By setting the zone state to “detached”, the zone will not boot unless the “attach” command is executed first, providing the check for the shared SAN storage configured with the “add fs” zone configuration.
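Putting the whole sequence together, one "toggle" of the zone between two hosts might look like this sketch (host names, zone name, and Veritas storage names are all illustrative):

```shell
# On the currently active host:
zoneadm -z s8zone halt
zoneadm -z s8zone detach
umount /app/data
vxdg deport appdg

# On the target host:
vxdg import appdg
mount /dev/vx/dsk/appdg/appvol /app/data
zoneadm -z s8zone attach
zoneadm -z s8zone boot
```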
Pretty simple, huh? In fact, if you look at the above diagram, it looks mysteriously like the functionality of a cluster failover. Once we modeled and tested these actions by hand, we integrated the pair of containers into Veritas Cluster Server and managed the zones through the VCS GUI. Online, offline, failover... It all just works. Very cool stuff.