Thursday Mar 04, 2010

Running VirtualBox VMs as Services in Windows

While I have been tinkering with VirtualBox on Windows hosts, one thing I really wanted to do is run VMs truly in the background, so that there are no console windows or anything left behind while the VM is running. VirtualBox has a utility called VBoxHeadless which you'd think would do the trick, but unfortunately it leaves a console window running, as far as I can tell, just to spew out a little information. This happens even when you've selected to access the screen via VRDP.

Well, I just found this handy little tool called VBoxVmService. With this package, you can easily create up to 127 VMs and run them as a service in any flavor of Windows (I tried it with Windows 7 x64). The setup instructions are pretty straightforward: essentially, unpack the software somewhere, edit the configuration file to add your VM(s), register VBoxVmService as a Windows service, and start the service.

The only odd thing for me was that initially the startup process complained that it could not find my VMs by name. After digging around a few posts on the VirtualBox forums, I found a workaround that fixed it for me. I had to add VBOX_USER_HOME as a System environment variable, and reboot. After that, everything has been running great.
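
For reference, setting the variable system-wide from an administrator command prompt looks something like this (the path is from my setup; point it at wherever your .VirtualBox directory lives), followed by the reboot:

setx VBOX_USER_HOME "C:\Users\Mocker\.VirtualBox" /M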

The VMs start up at boot, and I can access them with Windows Remote Desktop Connection client.
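
If you want to control which RDP port each VM listens on, something like the following should do it ("MyVM" is a hypothetical VM name, and these are the VRDP options as I understand them in the 3.x-era VBoxManage); then point the Remote Desktop client at host:3390:

VBoxManage modifyvm "MyVM" --vrdp on --vrdpport 3390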


Tuesday Feb 23, 2010

Stupid VirtualBox & ZFS Tricks

So the other day I had the idea to flip things around: instead of running a Windows 7 guest within an OpenSolaris host (with VirtualBox) on my desktop, run an OpenSolaris guest within a Windows 7 host.

I thought I'd do this on my primary desktop at work. The only problem is that I have a ZFS pool that contains the large majority of all my work at Sun over my nearly 20 years here. Without access to that, the whole "experiment" would be of little use.

Fortunately the pool was on a second set of disks in the desktop, so I could happily leave it alone while I did the first part of the switch: install Windows 7 (64-bit) and VirtualBox on bare metal, and then an OpenSolaris guest. This was all pretty straightforward; nothing additional I can offer over what is out there already.

Now the interesting part(s). First hurdle: how do I get VirtualBox to recognize the physical disks the ZFS pool resides on? After a little web searching, it ended up being pretty easy; the key is to use the VBoxManage internalcommands createrawvmdk command that VirtualBox provides.

Using this command, you can create a vmdk definition that points to a physical disk or partition. In my case, I wanted to point to the entire disk, so I used the following commands.

VBoxManage internalcommands createrawvmdk -filename "C:\Users\Mocker\.VirtualBox\HardDisks\PhysicalDrive2.vmdk" -rawdisk \\.\PhysicalDrive2 -register
VBoxManage internalcommands createrawvmdk -filename "C:\Users\Mocker\.VirtualBox\HardDisks\PhysicalDrive3.vmdk" -rawdisk \\.\PhysicalDrive3 -register

Now, here's the rub: in order to do this, VBoxManage needs to be run as administrator. Windows 7 apparently has some pretty tight restrictions on who/what is allowed to access raw disks. One of the easier ways to handle this is to run a command shell as administrator, which will then execute VBoxManage as administrator.

I did this by simply clicking the Start menu and entering "cmd" in the search box, which presents cmd.exe as a result; right-click on cmd.exe and select "Run as Administrator".

Ok, so now that I had created vmdks for the physical drives, I needed to attach them as virtual disks to my OpenSolaris guest. I just used the VirtualBox UI for this: click on the guest (make sure it's not running), then click on the Storage section of the configuration, add a couple of hard disks using the "Add Hard Disk" button of the IDE controller, then select the physical drive vmdks for each.

Note, the same rub as above applies: I needed to run VirtualBox as administrator. This is getting to be a drag.

Finally, the moment of truth, time to fire up the OpenSolaris guest, and see if it will recognize the ZFS pool...

After the guest booted up, I logged in, and using the format command I could see that the guest recognized the new disks. Good.

Next, I ran zpool import, and, drum roll, yes, it indeed found my ZFS pool!

Finally, I ran zpool import <poolname> and voila! OpenSolaris happily imported the pool - lock, stock and barrel:

# zpool status storage
  pool: storage
 state: ONLINE
 scrub: none requested
config:
    NAME        STATE     READ WRITE CKSUM
    storage     ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        c8d1s0  ONLINE       0     0     0
        c7d1s0  ONLINE       0     0     0

One caveat, I noticed Windows 7 doesn't play well with EFI labels. Although this initial attempt worked fine, later on I found that the pool became inaccessible after reboots of the guest. Running the Disk Management tool in Windows would complain about these two disks since they had EFI labels on them.

I managed to fix this pretty easily though. I simply booted back into OpenSolaris on bare metal (yeah, I actually created a dual-boot system), then through a process of zpool detach, format, and zpool attach, I relabeled the disks and Windows 7 was much happier.
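
Roughly, the cycle for one side of the mirror looked like the following; this is just a sketch with the device names from my pool, run from the bare-metal OpenSolaris boot:

zpool detach storage c7d1s0
format                               # relabel c7d1 with an SMI (VTOC) label in place of EFI
zpool attach storage c8d1s0 c7d1s0
# wait for the resilver to finish, then repeat for the other disk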


Friday Jan 29, 2010

How to Shrink a mirrored ZFS rpool

The other day I wanted to shrink the size of my OpenSolaris ZFS root pool. The root pool is generally pretty small, and when possible I try to keep real data out of it. In the case of the target system, it has 4 disks and I have a separate mirrored data pool. So I've got nearly 1 TB on two disks (root pool mirroring) just sitting idle, which could be used for something like, say, installing other O/Ses on bare iron (yeah, sometimes I still like to do that).

Anyways, maybe I am pathetic at web searches, but after a few minutes of searching I could not find an answer. So I thought about it for a while and came up with a plausible way to do it, without the need of a live CD or USB stick or anything (did I mention I'm lazy, and didn't really feel like burning a CD).

The method I came up with is this - I'll follow a summary with more detailed steps. Note that although this procedure is for shrinking a mirrored root pool, you can probably use the same method on a non-mirrored pool, as long as you have a spare hard drive or partition somewhere.

  1. Break the root pool mirror
  2. Create a temporary root pool, resized appropriately, and boot to it
  3. Destroy the real root pool, resize it, and boot back to it
  4. Destroy the temporary root pool.
  5. Reattach the mirror to the resized root pool

It sounds pretty simple, but there are a lot of steps involved. So here are the details.

In my case, I have a mirrored root pool made of the two devices

c2t0d0s0
c2t1d0s0

Break the root pool mirror.

zpool detach rpool c2t1d0s0

Create a temporary root pool, resized appropriately, and boot to it.

First, you need to resize the Solaris fdisk partition, I usually just use the fdisk option in format, the general outline for this procedure is the following:

  1. Run format
  2. Select the disk you just detached (c2t1d0)
  3. Start fdisk
  4. Delete the Solaris partition
  5. Create a new Solaris partition with a smaller size (say 50% of the disk, for example)
  6. Save and exit fdisk
  7. Start the partition tool
  8. Adjust the disk slices so that s0 contains the entire disk
  9. Label the disk
  10. Exit format.

Now, create a temporary root pool; I'll call it tpool

zpool create -f tpool c2t1d0s0

Copy the data from the root pool to the temporary pool with ZFS send & receive

zfs snapshot -r rpool@shrink
zfs send -vR rpool@shrink | zfs receive -vfd tpool

Now, before you can boot to the temporary pool, there are a couple of settings you need to change so it's properly identified as tpool.

First, change the boot sign of the temporary pool

rm /tpool/boot/grub/bootsign/pool_rpool
touch /tpool/boot/grub/bootsign/pool_tpool

Now, set the bootfs option on the pool

zfs set bootfs=tpool/ROOT/opensolaris-5 tpool

And finally, make sure to install GRUB on the disk

cd /boot/grub
/sbin/installgrub stage1 stage2 /dev/rdsk/c2t1d0s0

Now, reboot. Make sure to do a full PROM boot instead of the newer quick boot

reboot -p

At grub, edit the boot entry - by typing 'e' - for the BE you want to boot into, change references from rpool to tpool. In my case the boot commands ended up looking like

findroot (pool_tpool,0,a)
bootfs tpool/ROOT/opensolaris-5
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=text
module$ /platform/i86pc/$ISADIR/boot_archive

Then type 'b' to boot. When the system has completed booting, you should be booted off the temporary pool. You can verify this by

df -h /

Destroy the real root pool, resize it, and boot back to it.

First off, before you can destroy the root pool, note that the system might have some active references into the pool, like swap and dump, and maybe some datasets like /export. You need to deactivate these before destroying the rpool.

swap -d /dev/zvol/dsk/rpool/swap
dumpadm -d /dev/zvol/dsk/tpool/dump
zfs set mountpoint=none rpool/export
zfs set mountpoint=none rpool/export/home

Now you should be able to destroy the original rpool

zpool destroy rpool

If you get "busy" errors, try destroying the datasets, which should identify why it's busy, and deactivate them.

zfs destroy -r rpool
zpool destroy rpool

With the original root pool destroyed, you can now repartition the device in the original pool to the new size.

  1. Run format
  2. Select the disk that held the original rpool (c2t0d0)
  3. Start fdisk
  4. Delete the Solaris partition
  5. Create a new Solaris partition with a smaller size (say 50% of the disk, for example)
  6. Save and exit fdisk
  7. Start the partition tool
  8. Adjust the disk slices so that s0 contains the entire disk
  9. Label the disk
  10. Exit format.

With the Solaris partition resized, recreate rpool

zpool create rpool c2t0d0s0

And reverse the process of migrating to the temporary pool by moving the data back to the new root pool

zfs send -vR tpool@shrink | zfs receive -vfd rpool

Once again, set the bootfs option on the pool

zfs set bootfs=rpool/ROOT/opensolaris-5 rpool

Reinstall GRUB

cd /boot/grub
/sbin/installgrub stage1 stage2 /dev/rdsk/c2t0d0s0

And do a full PROM boot

reboot -p

Destroy the temporary root pool & remirror root pool.

At this point the rpool is resized smaller, but not mirrored. To mirror it again, first destroy the temporary pool

zpool destroy tpool

Attach the tpool device to the rpool

zpool attach rpool c2t0d0s0 c2t1d0s0

And finally, install GRUB once again for good measure

/sbin/installgrub stage1 stage2 /dev/rdsk/c2t1d0s0

Voila.

Saturday May 16, 2009

Paranoia and Java Cryptography

I started looking at rewriting a web app I wrote a while ago that does encryption in Java, this time trying to pay more attention to crossing the T's and dotting the I's, and I'm realizing encryption in Java is tricky.

The JCE itself makes things pretty easy, however where it gets a little tricky is how to deal with discarded information - passphrases, private keys, unencrypted data - when you are done with it. Unlike C code, for example, where you can simply zero out portions of memory when you are done with it, Java has very little available to do the same thing.

And it can get worse: Strings, for example, can get stuffed into a master String table - via String.intern() - squirreled away in the far reaches of the JDK, with no ability to destroy them. Thus, at the very least, never convert any sensitive information to a String if you want any hope of ever clearing it out of memory.

The JCE designers seem to have been pretty keen to this, and provide interfaces that never use Strings for any parameters that could be sensitive. A good foundation; however, dealing with sensitive data elsewhere can be tricky.
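
As a minimal sketch of the char[]-based flow the JCE encourages - the PBKDF2 algorithm name and parameters here are just plausible choices, not anything from my app:

import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PassphraseSketch {
    public static void main(String[] args) throws Exception {
        // Read the passphrase as a char[], never as a String
        char[] passphrase = System.console().readPassword("Passphrase: ");

        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);

        PBEKeySpec spec = new PBEKeySpec(passphrase, salt, 10000, 128);
        try {
            SecretKeyFactory skf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            SecretKey key = skf.generateSecret(spec);
            // ... hand the key to a Cipher and do the work ...
        } finally {
            // Zero out our copies when done; a String would give us no such option
            spec.clearPassword();
            Arrays.fill(passphrase, '\0');
        }
    }
}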

A simple example, back to the web app, is dealing with parameters in a servlet. Say you have a servlet which takes a passphrase and a chunk of text and encrypts it. Normally you retrieve parameters in a servlet with HttpServletRequest.getParameter(String name). The problem: getParameter() returns a String, and thus could get stuffed into the JVM's String table forevermore.

Although it would be unlikely for someone to gain access to the String table, and then figure out which String actually represented a passphrase, the paranoid side in me makes me a little nervous to allow that sensitive information to exist out of my control.

Even worse is the unencrypted data that you want to encrypt. Again, if you retrieve it with getParameter(), you won't be able to fully discard the unencrypted data until you restart the JVM.
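
One way around getParameter() is to read the raw body yourself and keep everything in byte form. A sketch, assuming a simple form POST with a known Content-Length (a ByteArrayOutputStream would leave behind internal copies you can't scrub, hence the single fixed buffer):

import java.io.IOException;
import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServletRequest;

public class RawBodyReader {
    // Read the whole POST body into one fixed buffer. The caller parses out
    // the parameters - staying in byte[]/char[] - and zeros the buffer with
    // java.util.Arrays.fill() when done.
    static byte[] readBody(HttpServletRequest request) throws IOException {
        int len = request.getContentLength();   // assumes the client sent a length
        byte[] body = new byte[len];
        ServletInputStream in = request.getInputStream();
        int off = 0;
        while (off < len) {
            int n = in.read(body, off, len - off);
            if (n < 0) {
                break;   // premature end of stream; body is partially filled
            }
            off += n;
        }
        return body;
    }
}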

I haven't quite figured out a plan for how paranoid I want to be. One thought would be to instantiate a new ClassLoader to manipulate sensitive data. Presumably when you get rid of the ClassLoader all the classes (including the String table) would at least be eligible for garbage collection.

Thursday Apr 16, 2009

My Own Private Cloud-aho - The Details

In my previous post, I went over some of the goals we are trying to attain through a data-center architecture based on virtualization with xVM Xen - essentially looking for ways to work smarter, work faster, and be more flexible. In this entry, I will attempt to go into the details of the infrastructure we built.

Network Layout

A good place to start is the general network layout; the first diagram shows what we have done. Nothing too fancy. At the heart of it are two independent physical networks. We do this to increase availability. Two load balancers are at the top of the stack, cross-connected to switches on each physical network, and they do something like talk VRRP or some other crazy protocol to each other to decide who is the master and who is on standby.

There are two logical networks that run throughout the infrastructure, the Public network and the Backend network. If the names aren't clear enough, the Public network is meant to serve traffic between us and our customers (via the internet). The Backend network is for various flavors of server-to-server communication within the data-center.

Each physical host, therefore, has two network connections, one to each logical network. The notable exception is the Sun Storage 7410 Unified Storage Cluster, which is connected only to the Backend network, and has connections to both physical networks. The cluster is configured in an active-passive mode, which means that only one of the two 7410 head units is doing the file serving at any one time. In order to ensure that it can still serve stuff to the Even segment even when the Odd segment is down, we need to give it a presence on each physical segment. I'll discuss more about what specifically we're doing with the Unified Storage Cluster later on.

Install Servers

Nothing really out of the ordinary here. We've set up a pair of PXE boot/install servers, mainly to handle installing additional Hypervisors. The install servers themselves are running Solaris Nevada 105; I don't recall what that translates to in terms of SXCE releases. Nothing special about that version, other than it happened to be the most recent drop we could get when we started assembling everything.

These are more or less independent of each other. Most of the installs work against the primary install server. The only real reason to switch to the other server is if the primary is down or if its network segment is down. But as you'll see, we're really not doing many installs at all.

DHCP Servers

The DHCP servers actually run on the same physical hosts as the install servers. They are using the DHCP daemons that come as part of Solaris/Nevada. They do play a key part in the whole PXE boot process, so they have had additional customizations made to allow for that - there are a few good articles out there describing the additional macros to accomplish PXE boot.

The one other interesting thing to note is that the pair of DHCP servers are configured as a cluster. The DHCP daemon doesn't really have a notion of being in a cluster; however, it can be configured to use a shared datastore, which each daemon can read and update. In our case, this is the first use of the Unified Storage Cluster: the DHCP servers are configured to use an NFS share on the cluster. The caveat here is that you must configure the daemons to use the SUNWfiles datastore. Neither SUNWbinfiles nor SUNWnisplus will work.
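
Pointing each daemon at the shared datastore goes something like this (the NFS path here is hypothetical):

dhcpconfig -D -r SUNWfiles -p /net/storage-cluster/dhcp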

A couple of other things to be aware of when clustering DHCP: make sure you set the OWNER_IP parameter in each daemon's local dhcpsvc.conf file to a comma-separated list of the IP addresses of all the interfaces that will serve DHCP requests, on all servers. Also make sure you set the RESCAN_INTERVAL to a reasonable value; in our case we just set it to 1 minute. Both of these values can be updated with the dhcpconfig -P command.

dhcpconfig -P OWNER_IP=192.168.78.14,192.168.78.16,192.168.76.25,192.168.76.25 
dhcpconfig -P RESCAN_INTERVAL=1 

The Hypervisors

First things first. The Hypervisors are all well-loaded Sun Fire X4150 servers. They have been installed with Solaris/Nevada 105, again just because that was current at the time. We chose Nevada instead of OpenSolaris primarily because it offers a better unattended/headless install environment. They are configured with ZFS root on local hard drives, and ZFS mirroring has been set up to increase uptime.

Not much in the way of customizations to Dom-0 or the Hypervisor. Obviously in Dom-0 we disable as many unnecessary services as possible. Since these are servers, we shut down all of the Desktop related services like the X server and the like.

We also limit Dom-0 memory to only 2G using the dom0_mem parameter to the hypervisor. This might be a little aggressive since ZFS is memory hungry, but we want to try to keep as much memory available for the guest domains as possible, and we haven't seen a problem with this yet.

We also set the hypervisor console to com1, in case we need to break into the console for any sort of debugging (knock on wood we don't have to do that.)

Both these parameters are set from the GRUB boot commands

kernel$ /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1 dom0_mem=2G

We also use the vanity device naming capabilities of Solaris - via dladm - to give all the Public and Backend interfaces the same names. So, although today we're using Sun Fire X4150 servers which all have Intel Pro/1000 network interface controllers in them, in the future, we can move to different platforms and still maintain a consistent device naming convention. This is actually pretty crucial for Xen Live Migration to work. Xen needs the same network interfaces to be available on the source and destination of the Live Migration. Without this, the migration will not succeed.

dladm rename-link e1000g0 fe0
dladm rename-link e1000g1 be0

Finally, while we're talking about Live Migration, it's something we need to enable in xend. A couple of simple SMF changes handle that.

svccfg -s xend setprop config/xend-relocation-address = 192.168.78.21
svccfg -s xend setprop config/xend-relocation-hosts-allow = astring: \
	'^localhost$ ^192\.168\.78\.[0-9]*$'

For what it's worth, we currently have 22 Hypervisors in the cloud - clearly not a huge deployment, yet.

Unified Storage Cluster

These units sort of fell into our lap at just the right time. Although we could have approached the level of availability they offer with some solution built on top of Solaris & SunCluster, the ease of installation, configuration, and maintenance that the Unified Storage Cluster affords should help keep the infrastructure simple and straightforward to maintain.

As I mentioned earlier, these units are configured in an active-passive cluster mode, and have a presence on each physical network segment. Configuring them as a cluster is amazingly simple, as any appliance should be. Once the CLUSTRON interface is connected between them, the initial boot to configure the first head node automatically detects the presence of the second head node, and prompts you to configure them as a cluster.

As far as how we are using them, well, they are used in a few capacities. First, as I mentioned earlier, they are used to house the DHCP servers' shared datastore. We also use them to house various administrative tools and bits.

But that's not the main use. Primarily we are using them as the virtual disk in which all the guest operating systems (virtual servers) are installed. This is done by creating LUNs in the Unified Storage and exposing them as iSCSI targets, which are then attached to the Dom-0's and made available to the Hypervisors.

This is another critical piece that makes some of the cool features of xVM Xen, like Live Migration, possible. Live Migration is the process of moving a running virtual server from one physical host to another. Did you read that, a running virtual server! For this to happen, the virtual disk must be available on the target physical host. Using iSCSI makes this a snap since all you need to do is attach the LUN to the target physical host and you are done. If you think of the alternative with local storage, you would have to somehow transfer the bits from the local storage on the source physical host to the target, while the virtual server is still running, which among other things is nearly impossible to do in a real-time manner.
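
On the Solaris side, making a LUN visible on a (new) target host boils down to a couple of iscsiadm calls; the IQN and portal address below are made up for illustration:

iscsiadm modify discovery --static enable
iscsiadm add static-config iqn.1986-03.com.sun:02:appl-1-vm-1,192.168.78.40:3260
devfsadm -i iscsi    # rebuild device nodes so the new disk shows up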

Ok, so one other use of the Unified Storage is for shared storage for applications running inside the virtual servers. Building horizontally scalable, redundant applications sometimes requires the use of storage that can be accessed by all nodes in the application cluster. We provide this on the Unified Storage cluster with NFS.

Putting it All Together

That's a summary of all the pieces, now how does it all fit together. Here's a diagram that shows all the interactions.

Let's look at the organization on the Unified Storage Cluster first. We use different projects within the Unified Storage to keep things manageable. The first thing we've done is build a set of master images of various guest operating systems like OpenSolaris - 08.11 & Dev - Nevada, and so forth. Those images are kept in a Masters project. These images are pre-installed instances of guest operating systems that reside on (iSCSI) LUNs in the Unified Storage. A snapshot of the image is taken after any revision we make to the image (note that snapshots should only be taken when the guest has been shut down).

When we're ready to spin up a new virtual machine, we select the current snapshot of the O/S image we want, and clone it as a new LUN in a separate Unified Storage project. The great thing here, is that the clones initially take up zero additional space, and only start to use their own space when the operating system changes anything on the virtual disk, or an application is installed, and the like. Quite a savings in disk space. We have some scripts that interact with the Unified Storage to do this.

These commands will create a new project called appl-1, and then clone a master image to the project as vm-1.

domu-project appl-1
domu-clone masters/osol-0811@version-01 appl-1/vm-1

After the image is cloned, we instruct the Unified Storage to export the cloned LUN as an iSCSI target. The target is then attached to a Dom-0 and Hypervisor. Currently we've been putting about 4-6 guests on each Hypervisor before moving onto a new one for additional virtual servers.

From here, the usual Xen commands are used to define a domain, that is, the xm create or virsh create commands. As part of the domain configuration, we specify the attached iSCSI LUN as the virtual disk for the domain. Again, this is simplified through scripting, using a pre-defined XML template for domain creation.

This example shows the creation of a paravirtualized guest - pv, which has both a Public and Backend interface - fe-be, on the Odd network segment - odd, with 1G of memory - 1024, and 1 CPU - 1.

domu-init pv fe-be odd 1024 1 appl-1/vm-1

I should point out that we're not really pre-attaching the iSCSI LUN to Dom-0, but instead using one of the enhancements of xVM Xen which will do this for us. We simply specify the iSCSI Qualified Name (IQN) and the IP address of the target as the virtual disk in the domain configuration, and let xVM Xen deal with attaching it when the domain starts up.

Now, we're almost there, but we need to assign IP addresses to the virtual server. We use DHCP for this. More specifically, we bind specific IP addresses to specific MAC addresses to ensure that a virtual server always uses its assigned address(es). Xen creates a sort of synthetic MAC address for each interface it configures and persists the address. We can grab these addresses before the domain starts, in order to update DHCP.
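
One quick way to pull them out of the persisted domain definition (domain name from the earlier example):

virsh dumpxml appl-1:vm-1 | grep 'mac address'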

This is a good time to point out that all the master images of guest operating systems have been configured as DHCP clients. This really simplifies the entire process, since there is no need to do any post-configuration of the cloned images to give them the correct network resources. With DHCP, it just happens. You can also see why it is critical to have a highly available DHCP cluster, since so much relies on it. Once again, this is scripted.

This example shows the assignment of a Public and Backend interface to the new guest

domu-assign-ip appl-1:vm-1 fe0 vm-host1 192.168.76.71
domu-assign-ip appl-1:vm-1 be0 vm-host1-be 192.168.78.71

And that's it. At this point, we're ready to fire up the guest.

virsh start appl-1:vm-1
virsh console appl-1:vm-1

Wrap Up

That's about it for the basic details of what we've done. It's all been working incredibly well, especially considering that roughly ¾ of the code we're running everywhere is still development code.

Before I forget, I would really like to thank the xVM Xen team for their support while we've been setting things up and tinkering around; they have all been very helpful and responsive to my questions on xen-discuss@opensolaris.org as well as in private threads. Mark Johnson deserves a special mention, since I glommed onto him the most.

Up next, a summary of how well we're doing on the previously outlined goals.

Tuesday Apr 14, 2009

My Own Private Cloud-aho - The Goals

For quite some time I have been informally tracking the progress of the xVM Xen team, playing with code drops on perhaps a quarterly basis, and doing benchmarking to determine if things are stable enough and efficient enough to run in a production data-center.

Probably about 6 months ago, performance looked pretty decent, and I implemented a tiny proof-of-concept "cloud" using some friends as guinea pigs. It's been running great for us. In fact, I just checked my virtual server and it's been running for 245 days so far. Pretty stable indeed.

Recently I have had the opportunity to build a small cloud in one of our data-centers, expanding on the concepts from the POC into a more full-blown and mature solution, suitable for a near-production environment. It's hard to really call it production quality when so many moving parts are still pre-release.

I'd like to share what I have done with xVM Xen, in hopes that it may be useful for others who wish to do the same. But before I get into the details, let me describe some of the goals that we hope to achieve with this new virtual environment.

The Goals

  • Replace the current virtual hosting environment

    I hate to knock Solaris Containers since it's a Sun technology, but we've struggled somewhat in our usage. The problem isn't really with the technology, but rather a scheduling problem when it comes to planned and unplanned maintenance. In our environment we typically load up 4-8 application zones within a single physical host. These applications are generally maintained by separate teams. When we want to do something like patch the system, the operations team needs to schedule downtime with all the application teams for the same time so patching can occur, and we've found that the operations team can spin quite a few cycles getting all the ducks in a row.

    The model with xVM Xen is quite a bit different. Since each domain, or virtual server, is more or less independent of the others, the operations team need only schedule downtime with a single team at a time. So although they are doing more scheduling and patching, overall we hope to reduce the total amount of real time they spend on it.

  • Simple, Rapid (Virtual) Server Creation

    The operations team has things dialed in pretty well with JASS, Jumpstart, and so on, for installing physical hosts, but the creation of virtual servers isn't as streamlined. The hope is that through the use of newer technologies available with ZFS - snapshots and cloning, for one - as well as a networked storage infrastructure and iSCSI, we can really streamline the process so that we can spin up virtual servers within minutes. The goal is to make the gating factor be how fast the system administrator can type.

  • Faster & Error Free Deployments

    Although most of the applications behind sun.com don't require huge clusters of servers, we do have a few that span perhaps a dozen or more physical hosts. The problem is that typically the process of deploying a new version of the application requires the same set of repeated steps for each instance of the application, introducing the fat-finger problem. What if, using those same ZFS and iSCSI technologies, we could install once, then clone the application for the other instances? As long as that initial install is done correctly, this can greatly reduce the possibility of errors when replicating the changes across the cluster.

  • Easier Horizontal Expansion

    Occasionally, during launches or, even worse, DOS attacks, applications can hit their capacity, which results in reduced quality of service for everyone. In those cases, it would be great if we could instantly increase our capacity. Are there technologies that we could employ to do this easily? We think there are.

  • Painless Migration to Newer, Bigger, Faster Hardware

    Although we've tried to employ some best practices which attempt to separate the O/S and the application on different areas of the filesystem(s), it still isn't the easiest exercise to upgrade an application to a new chunk of hardware. Essentially it becomes another case where the application team has to spend some cycles installing the service on the new hardware, test, verify, yadda, yadda, yadda.

    We think that the live migration capabilities of Xen have great potential here. Since the application would be installed in a virtual server, the process of upgrading simply becomes a push of the running application from one physical host to another. And, this could even be something the operations team does all by itself, unbeknownst to the application team at all!

  • Better Hardware Utilization

    A while ago I gave a talk with the xVM Xen team about what we had done. I don't think I really explained this one correctly, because their initial comment was something about Solaris Zones providing the most efficient method of squeezing every ounce out of a physical host.

    That's not really what this is about. Many of the physical hosts are really incredibly underutilized, perhaps peaking out somewhere near 30% sustained CPU even at the 95th percentile. With hundreds of hosts running that way, we're really just wasting power, cooling, and space when we don't need to.

    We're hoping that with the virtualization capabilities xVM Xen provides, we can make the practice of doubling or tripling up applications on a physical host more common, increasing sustained utilization to somewhere between 60% and 80% and lowering our datacenter footprint overall. Where an application begins to run hotter, monitoring would help us decide to move it, via Xen live migration, to a less used, larger, or private physical host.

  • More Resiliency

    What we are looking for here are ways to recover better from catastrophic failures. The server is on fire - how do we get the application off of it and up and running quickly on another physical host? How do we reduce the need for someone to be hands-on, physically fixing a problem on a piece of hardware? Again, we're hoping virtualization and other technologies will be helpful here.


I probably forgot a few goals but in general these are the bigger problems which we hope to solve with a more virtualized data-center.

In my next post, I'll describe the infrastructure we built in detail.

Thursday Oct 02, 2008

Changing WS 7 Admin Server Certificate

This is probably not supported and the WS team might slap me, but... Occasionally I install an instance of WS 7 on a host, and later rename the host. Usually because the server gets pushed to our production data-center, and thus, gets a new name.

When this happens, the subject name in the admin server certificate becomes invalid, because it essentially should be the host-name; when the host-name you use to access the server doesn't match what is presented in the certificate, the web client (Firefox, for example) tells you and asks you to accept this.

It's a minor annoyance, but it bugs me nonetheless. So I poked around and figured out how to change it. It's pretty simple really: just use the certutil command-line tool included with WS 7. Specifically, here's what I did.

Go to the Admin Server's config folder

cd $WS_HOME/admin-server/config

Add the Admin Server's bin folder to your path

setenv PATH $WS_HOME/bin:$PATH

Delete the old certificate, named Admin-Server-Cert

certutil -D -d . -n Admin-Server-Cert

Create the new certificate, specifying the new hostname in the -s parameter

certutil -S -d . -n Admin-Server-Cert -t u,u,u -s "CN=some.host.com" -c Admin-CA-Cert -v 120

Last, you will probably need to change any references to the old host-name to the new name in server.xml
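
A quick, if blunt, way to make that swap (host-names here are hypothetical; the stock sed has no in-place flag, hence the copy):

cp server.xml server.xml.bak
sed 's/old.host.com/some.host.com/g' server.xml.bak > server.xml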

Tuesday Sep 09, 2008

Running VNC over SSH on Windows

It's been a long time coming, but in keeping with my other write-ups on how to run VNC over SSH on Solaris and OS X, I've finally figured out a somewhat reasonable method on Windows.

I had been using STunnel when on Windows, but that has a few drawbacks, mainly that it is a less dynamic option because you have to edit a configuration file to set up new tunnels. It can't be done on the fly very easily. Plus, there is the maintenance of the config file.

So I worked out a method similar to the ones on Solaris and OS X, with SSH port forwarding, only in this case using the plink.exe utility that comes with PuTTY. First, the script. For someone who actually knows VBScript and Windows Script Host this is probably pretty trivial, but for me, one who's tried to stay clear of Windows development, it took a bit of hacking:

Dim WshShell
Set WshShell = CreateObject("WScript.Shell")

user = "mock"

result = InputBox("VNC Server to Connect to:", "VNC Server", "")

index = Instr(result, ":")

If index > 0 Then
	host = Left(result, index - 1)
	port = Right(result, Len(result) - index)

	vport1 = CStr(5900 + port)
	vport2 = CStr(5800 + port)

	cmd = "plink " + host + " -batch -l " + user + " -L " + vport1 + ":localhost:" + vport1 + " -L " + vport2 + ":localhost:" + vport2 + " sleep 15"

	WshShell.Run cmd, 0, false

	WScript.Sleep 2000

	WshShell.Exec("C:\Program Files\TightVNC\vncviewer.exe localhost:" + port)
End If

Make sure to save the file with a .vbs extension. I call mine vncv.vbs.

Also, the script assumes the use of TightVNC. It'll probably work with other clients, but you'd have to give that a try.

So, how do you use it? As with the method on the other operating systems, SSH public key authentication is used. On Windows, with PuTTY, this means firing up pageant.exe and loading your keys into the agent. I'll leave the specifics to the reader; I think I've mentioned it in other entries though.

After that, simply double click on the script, and a prompt will show up asking for the host:port of the VNC server to connect to. Give it a couple of seconds to make the necessary connections and boom, you should be presented with a dialog to enter the password of the VNC server.

If anyone finds this interesting, and would like to add some enhancements, one thing I would like to have is a connection history. So I don't have to type as much.

Monday Sep 08, 2008

Hacking Lightning

I've been using the Lightning calendaring extension for Mozilla Thunderbird for the last couple of months for basic calendaring and it works pretty awesome.

After Rama hassled me about keeping my task list on my whiteboard at work, I decided to try out the task list functionality in Lightning. It's pretty lightweight, easy to add a task - almost exactly what I need.

However, one thing that bugged me was the predefined views. There are about four or five, including a "Not Started" view and a "Completed" view, but none for what I really wanted: a "Not Completed" view. So I decided to add one.

What I really wanted was to add another view, but even though I appeared to tweak all the files necessary, it didn't work. So in the end I just changed the "Overdue" view to "Overdue & Open". And it was really easy once I found my way around.

The tweak involves modifying files in two jars; calendar.jar & calendar-en-US.jar both located in ~/.thunderbird/(profile)/extensions/(uuid)/chrome.

In calendar-en-US.jar I wanted to change the label of the button from "Overdue" to "Overdue & Open". This is done by unzipping the file, editing locale/en-US/calendar/calendar.dtd, and rezipping the file, changing the line

<!ENTITY calendar.task.filter.overdue.label         "Overdue">

to

<!ENTITY calendar.task.filter.overdue.label         "Overdue & Open">
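
The unzip/edit/rezip mechanics, roughly (zip replaces the existing entry in the jar):

cd ~/.thunderbird/(profile)/extensions/(uuid)/chrome
unzip calendar-en-US.jar locale/en-US/calendar/calendar.dtd
vi locale/en-US/calendar/calendar.dtd
zip calendar-en-US.jar locale/en-US/calendar/calendar.dtd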

Now, in the second jar, calendar.jar, I needed to change the logic for deciding what was "overdue". This logic is in content/calendar/calendar-task-view.js. The original logic is in the lines

        overdue: function filterOverdue(item) {
          // in case the item has no due date
          // it can't be overdue by definition
          if (item.dueDate == null) {
              return false;
          }
          return (percentCompleted(item) < 100) &&
                 !(item.dueDate.compare(now()) > 0);
        },

You can see they make a specific provision for excluding things with no dueDate. Many of my tasks do not have a dueDate, and I want to see those too. So I changed the logic to

        overdue: function filterOverdue(item) {
          return (percentCompleted(item) < 100) &&
                 (item.dueDate == null || !(item.dueDate.compare(now()) > 0));
        },

Once I repacked the jars, voila, I now have the behavior I desired.

Tuesday Sep 02, 2008

Converting Oracle Dates to UNIX Epoch Dates

Cleaning off my whiteboard, and I want to write these down somewhere before I erase them...

To convert from a UNIX date to Oracle date

TO_DATE('1970-01-01', 'YYYY-MM-DD') + UNIX_date_in_millis / 86400000 = Oracle_date

And to convert the other way

(Oracle_date - TO_DATE('1970-01-01', 'YYYY-MM-DD')) * 86400000 = UNIX_date_in_millis
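
As a quick sanity check from SQL*Plus (note an Oracle DATE only has second precision, so the millis come out in whole thousands):

SELECT (SYSDATE - TO_DATE('1970-01-01', 'YYYY-MM-DD')) * 86400000 AS unix_millis
  FROM dual;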

I don't remember why I needed this so long ago.

Tuesday May 06, 2008

My Partition Resizing Exercise

The other day I needed to install a non-Solaris operating system onto my Ultra 40 M2, but I still wanted Solaris and/or Nevada as my primary operating system. Even though there are a bunch of virtualization technologies out there, including the newly acquired VirtualBox, I wanted to run the operating system on the "iron", as they say.

The only problem: I had allocated 100% of all my disks to Solaris. I needed to shrink the Solaris fdisk partition on a couple of disks to make space to install other operating systems. My disk layout was roughly

c1t0d0s0  - boot environment #1 root (ufs)
c1t0d0s1  - boot environment #1 swap

c1t1d0s0  - boot environment #2 root (ufs)
c1t1d0s1  - boot environment #2 swap

And then a ZFS pool

        NAME         STATE     READ WRITE CKSUM
        storage      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t0d0s7 ONLINE       0     0     0
            c1t1d0s7 ONLINE       0     0     0

My thought was that since c1t0d0 and c1t1d0 were only partially allocated to the zpool (and the boot disks), that I would target those for repartitioning. Which meant, first step was to rebuild the zpool without them. I didn't really need them for more space in the zpool anyways.

First step: break the mirror on the partial space. This would allow me to keep a "backup" of the pool while I rebuilt it.

zpool detach storage c1t1d0s7
zpool create dataz c1t1d0s7
zfs snapshot -r storage@migration
zfs send -R storage@migration > /dataz/storage.zfs

Now, destroy and recreate the main pool without the partial devices, and restore the data

zpool destroy storage
zpool create storage mirror c1t2d0 c1t3d0
zfs receive -Fd storage < /dataz/storage.zfs

Easy enough. My pool no longer has the partial disks

          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

Now onto resizing the target disks. The goal was to do this without doing a reinstall at all. I have been using Live Upgrade for a while to continually keep my system up to date with Nevada/SXDE releases, so I figured I could use Live Upgrade while resizing the fdisk partitions to accomplish this.

Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
sol-nv-87                  yes      yes    yes       no     -         
sol-nv-85                  yes      no     no        yes    -         

So, first disk. First step remove the LU Boot Environment

ludelete sol-nv-85

Now, since nothing is using the disk, format and fdisk aren't going to complain. So, first thing, use fdisk to reduce the size of the Solaris partition. I just reduced it to 90% of the disk

fdisk /dev/rdsk/c1t1d0p0

             Total disk size is 36472 cylinders
             Cylinder size is 16065 (512 byte) blocks

                                               Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 Solaris           1  32824    32824     90

And then recreated the Solaris layout with format

Current partition table (original):
Total disk cylinders available: 32822 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm     132 -  1437       10.00GB    (1306/0/0)   20980890
  1       swap    wu       1 -   131        1.00GB    (131/0/0)     2104515
  2     backup    wu       0 - 32821      251.43GB    (32822/0/0) 527285430
  3 unassigned    wu       0                0         (0/0/0)             0
  4 unassigned    wu       0                0         (0/0/0)             0
  5 unassigned    wu       0                0         (0/0/0)             0
  6 unassigned    wu       0                0         (0/0/0)             0
  7       home    wm    1438 - 32821      240.41GB    (31384/0/0) 504183960
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wu       0                0         (0/0/0)             0

And finally, recreate the Boot Environment. I just made the new Boot Environment a copy of the current Boot Environment, which happens to be Nevada/SXDE Build 87

lucreate -m /:/dev/dsk/c1t1d0s0:ufs -m -:/dev/dsk/c1t1d0s1:swap -n sol-nv-87-copy

And activate it and boot into it

luactivate sol-nv-87-copy
init 6

All pretty straightforward so far. Now here comes what ends up being the sticky part. You would think that since I was booted into the new Boot Environment, I could delete the old Boot Environment and do the same thing to its disk, but...

ludelete sol-nv-87
The boot environment  contains the GRUB menu.
Attempting to relocate the GRUB menu.
ERROR: No suitable candidate slice for GRUB menu on boot disk: 
INFORMATION: You will need to create a new Live Upgrade boot environment on the boot disk to find a new candidate for the GRUB menu.
ERROR: Cannot relocate the GRUB menu in boot environment .
ERROR: Cannot delete boot environment .
Unable to delete boot environment.

That's what I was afraid of. The GRUB boot menu is sitting on the disk still. I searched around but could not find any way to move it. I found Slava Leanovich's blog entry which included instructions for moving the GRUB menu, and even though I tried that, ludelete still complained.

Well, gulp, here goes nothing. I decided to try to trick Solaris and not tell it I messed with the Boot Environment. First I made a ufsdump of root on that disk

ufsdump 0f /extra/holding/root.dump /dev/dsk/c1t0d0s0

Now with everything backed up from the disk, a big leap of faith: I went into fdisk and resized the Solaris partition just as before.

And then into format to recreate the disk layout. Of course this time I get the warnings

format c1t0d0
selecting c1t0d0
[disk formatted]
/dev/dsk/c1t0d0s0 is in use for live upgrade /. Please see ludelete(1M).
/dev/dsk/c1t0d0s1 is in use for live upgrade -. Please see ludelete(1M).

Yeah, you might think so, Solaris. I move along and set up the layout.

Then, ufsrestore the data

newfs /dev/rdsk/c1t0d0s0
mount /dev/dsk/c1t0d0s0 /mnt
cd /mnt
ufsrestore rf /extra/holding/root.dump

Sensing that this might not be enough, I decided to do a couple of things. First, reinstall GRUB manually, as described in Slava's entry

installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
stage1 written to partition 2 sector 0
stage2 written to partition 2, 235 sectors starting at 50

And then, just for grins, update this Boot Environment from the new Boot Environment I created just a little while earlier. I figured this should ensure that anything I possibly missed with ufsdump would be restored correctly

lumake -n sol-nv-87

Well, lumake returned success at least. Here goes nothing. Let's activate the Boot Environment and cross our fingers

luactivate sol-nv-87
init 6

Tah-Dah! Yet another reason why I love Solaris. You flex it in weird ways and it responds logically.

Friday Mar 21, 2008

Playing around with Apache HttpComponents

The other week while I was "stuck" at home watching our little baby boy, I spent a little time with ASF's HttpComponents. I have several HTTP utilities, but so far I have just been using the somewhat limited java.net.* classes.

As a goal of the investigation, I decided to write a multi-threaded web crawler, to some level of completeness.

Here's an overview of the heart of the crawler.

After looking around at the HttpComponents examples, it's pretty clear that the main entry point is the org.apache.http.client.HttpClient interface. This interface represents an "HTTP client", analogous to, say, the heart of a browser like Firefox. It mainly provides methods to allow you to execute HTTP requests. Various subclasses exist, the main one of interest being DefaultHttpClient, which has methods for setting up all the typical HTTP goodies like cookie stores, authentication methods, and connection managers.

The simplest instantiation to take the defaults is something like

HttpClient httpClient = new DefaultHttpClient();

But this isn't good enough for me, because it creates a client that has only a single-threaded Connection Manager. That will not work for my goal. A little bit more code will fix that and create an HttpClient with a multi-thread-safe Connection Manager.

HttpParams params = new BasicHttpParams();
HttpConnectionManagerParams.setMaxTotalConnections(params, 100);
HttpConnectionParams.setConnectionTimeout(params, 20 * 1000);
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);

// Create and initialize scheme registry 
SchemeRegistry schemeRegistry = new SchemeRegistry();
schemeRegistry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
schemeRegistry.register(new Scheme("https", SSLSocketFactory.getSocketFactory(), 443));

// Create an HttpClient with the ThreadSafeClientConnManager.
// This connection manager must be used if more than one thread will
// be using the HttpClient.
ClientConnectionManager cm = new ThreadSafeClientConnManager(params, schemeRegistry);

HttpClient httpClient = new DefaultHttpClient(cm, params);

What's the Connection Manager about? Well it provides more advanced connection management features, such as connection pools for things like keep alive connections.

Ok, so now that the HttpClient is set up, I can execute HTTP requests in various ways, but one of the easiest is

HttpGet httpget = new HttpGet(url);
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();

With the HttpResponse and HttpEntity objects I can interrogate the status of the response, find the Content-Type, and get the content of the response. I put this all in a class I called Downloader; the salient pieces are

public class Downloader implements Runnable {

    HttpClient httpClient;
    URL url;


    public Downloader(HttpClient client, URL url) {
        this.httpClient = client;
        this.url = url;
    }


    public void run() {

        HttpGet httpget = null;
        try {
            httpget = new HttpGet(url.toString());
        } catch (URISyntaxException ex) {
            // record error
            return;
        }

        BufferedReader reader = null;
        try {
            HttpResponse response = httpClient.execute(httpget);

            // Get hold of the response entity
            HttpEntity entity = response.getEntity();

            // If the response does not enclose an entity, there is no need
            // to bother about connection release
            if (entity != null) {
                if (entity.getContentType().getValue().startsWith("text/html")) {
                    // This is an HTML file, so process it for links.
                    try {
                        reader = new BufferedReader(
                            new InputStreamReader(entity.getContent()));

                        // [ Read the content from the InputStream ]

                        // [ Code using tagsoup to extract links from the HTML ]

                        // [ Submit links for crawling ]

                    } catch (RuntimeException ex) {

                        // In case of an unexpected exception you may want to abort
                        // the HTTP request in order to shut down the underlying
                        // connection and release it back to the connection manager.
                        httpget.abort();
                        throw ex;

                    } finally {

                        // Closing the input stream will trigger connection release
                        if (reader != null) {
                            reader.close();
                        }
                    }

                } else {
                    // Response is not HTML; read it from the InputStream -
                    // entity.getContent() - and do something with it, like save it.
                }
            }
        } catch (Exception ex) {
            // record error
        }
    }
}

My actual code is structured a little differently because I created the notion of filters to constrain the URLs that are processed - constraints like same host, same domain, exclude paths, etc.

The main thing to note here is that you need to release the underlying connection when you are done with it. This is done by closing the InputStream in the entity - HttpEntity.getContent().close() - essentially. You must be careful that you do this for every condition or you will start leaking connections.

The other curious thing you may have noticed is that I had the Downloader class implement the Runnable interface. Why is this? Well, it has to do with the multi-threading I wanted to do. It implements Runnable so I can schedule instances to execute in a pool of threads.

So how did I manage the thread pool? Why with the concurrent classes in java.util.concurrent - the ExecutorService specifically. The ExecutorService does all the hard work of managing the thread pool for you, all you really need to do is keep giving it work to do. To instantiate an ExecutorService I used

ExecutorService workerMgr = java.util.concurrent.Executors.newFixedThreadPool(threads);

And from there, giving more work to the service is as easy as

Downloader downloader = new Downloader(httpClient, url);
workerMgr.execute(downloader);

After I'm done adding items, I simply wait for all the work to finish - shutdown() alone just stops new work from being accepted, so awaitTermination() does the actual waiting:

workerMgr.shutdown();
workerMgr.awaitTermination(Long.MAX_VALUE, java.util.concurrent.TimeUnit.SECONDS);

Thursday Mar 06, 2008

some xVM notes before I forget

It's been about 6 months since I played around with xVM (then called just Xen), so I figured I'd refresh myself with Nevada Build 83. Previously I had been creating domains with xm create and pretty terse python-syntax(?) config files. Things are much easier now with virt-install.

A few notes on commands I used to create domains.

Installing Paravirtualized Fedora Core 7

zfs create -V 8G storage/xvm-fc7-instance-1
lofiadm -a /export/xen/install/iso/F-7-x86_64-DVD.iso
mount -F hsfs /dev/lofi/2 /export/xen/install/dvd/fc7-64
share /export/xen/install/dvd/fc7-64

virt-install --name fc7 --ram 1024 --paravirt --file /dev/zvol/dsk/storage/xvm-fc7-instance-1 \
  --location nfs:outpost.eng.sun.com:/export/xen/install/dvd/fc7-64 -x console=hvc0

Installing Paravirtualized Fedora Core 8

zfs create -V 8G storage/xvm-fc8-instance-1
virt-install --name fc8 --ram 1024 --paravirt --file /dev/zvol/dsk/storage/xvm-fc8-instance-1 \
  -l http://fedora.secsup.org/linux/releases/8/Fedora/x86_64/os -x console=hvc0

Duplicate a domain

zfs snapshot storage/xvm-fc8-instance-1@master
zfs clone storage/xvm-fc8-instance-1@master storage/xvm-fc8-instance-2

virsh dumpxml fc8-1 | grep -v uuid | sed -e 's/fc8-1/fc8-2/g' > /tmp/fc8-2
virsh define /tmp/fc8-2

Installing HVM Solaris to an NFS-hosted image

virt-install -n nv-81-1 --hvm --vnc --vncport 5901 \
  -f /net/derelict/extra/holding/xen-root-nv-81-1.img -r 1024 \
  -c /export/xen/install/iso/sol-nv-bld81-x86-dvd.iso --noautoconsole

Saturday Feb 16, 2008

Hopping on the VBox/Indiana Bandwagon

Ok, so I decided to take a look at Indiana inside of VirtualBox like many others already have. The main thing people have been talking about is the lack of the pcn driver, which is needed for networking inside of VirtualBox.

I found Alan Burlison's Entry describing how to get a copy of pcn into Indiana from an SXDE release.

Only one problem for me, I'm at home and don't happen to have an SXDE ISO around, just a few systems with SXDE installed that I can get to.

My first idea: use a thumb drive to copy the necessary files over - note, I'm running VirtualBox on top of XP. So I copied the files onto the thumb drive, but ran into problems making the drive accessible inside of the Indiana guest.

Next, my eye caught the "Floppy Drive" options. I don't have a floppy drive in my system, but you can build an image that can become a virtual floppy drive. So I hit up the ol' internet search to look for a Windows program to build a floppy image, and I found Build Floppy Image.

A quick look at the options and it appeared to fit the bill. So I copied pcn and pcn.conf into C:\temp, and ran

bfi -t=144 -f=c:\Img\pcn.img c:\temp

Done in a flash. I configured VirtualBox to use c:\Img\pcn.img as Indiana's floppy, booted it, and voila - the floppy and files were available to copy per Alan's instructions.

Tuesday Nov 06, 2007

The Sys Admin Adventure Game

Every once in a while I ball things up pretty good on one of my systems, and it takes a while to figure out how to fix things; the process validates that I still have at least some system administration capabilities.

Just yesterday I messed things up pretty good trying to upgrade to Solaris Express/Nevada 76. It's kind of funny 'cause I've never been one for those Zork adventure games, but I usually enjoy a good debugging session. It would be interesting to see someone write the Zolaris adventure game - ok, that was bad. Anyways, here's what happened.

As I mentioned, I wanted to upgrade my desktop to Nevada 76 so I could take a look at Erwann's new Solaris build of Compiz. I was currently running a hybrid build of Nevada 70 with the Xen/xVM bits in it. So I figured I would do a Live Upgrade to 76.

The Live Upgrade went pretty well, except one package did not install correctly - I think it was SUNWiiimr or SUNWiiimu. I initially blew that off and booted into 76, so I could install Compiz and give it a spin. I had been waiting for better multihead support, and although it's better, it's still not quite right; anyways, that's another story.

The problem came when I tried to run my regular desktop applications, specifically Thunderbird. Thunderbird was not stable on 76 for me. It would crash typically in less than a minute. So I looked back at the failed package, which is "Internet/Intranet Input Method Framework Package". Hmm, that's got to do with the desktop, maybe that is to blame.

So I attempted to pkgrm and pkgadd the package by hand. No dice. Fine, I'll just do a fresh install from some media. I burned 76 to a DVD and did the install. I wanted to make sure to keep the partitioning the same cause I had some ZFS partitions on there I wanted to preserve.

Everything went mostly fine, but I was installing on the non-boot drive in the workstation. The installer warned me that I would have to go into the BIOS and update the boot settings. Ok, that's easy enough. I made a mental note to do that when the install finished. And which I did after the install. Ok all looks good.

And now is where I hit the first problem. When the system boots, all I see is

Bad PBR Sig

Woah, pretty cool - it's in red, for one. Must be extra important. So after a bit of searching the web, I find a posting somewhere that mentions it can be a couple of problems, one of which is that there is no bootloader on the disk, and it mentions using installboot. So I boot from the other disk into Failsafe mode. And try running it.

Error: installboot is obsolete. Use installgrub(1M)

Doh. Ok, so I look at the manual page for installgrub and figure out the right thing to do. And attempt another boot. This time it boots GRUB, only there is no GRUB menu. Simply a prompt.

Dang! So I take a peek at another system I got running Solaris 10 to look at its /boot/grub/menu.lst file and come up with the following to run at the GRUB prompt.

root (hd1,1,a)
kernel /platform/i86pc/multiboot
module /platform/i86pc/boot_archive
boot

I give it a try, and guess what? Well, apparently multiboot is no longer supported, it tells me, and it suggests the following lines instead:

kernel$ /platform/i86pc/kernel/$ISADIR/unix
module$ /platform/i86pc/$ISADIR/boot_archive

Sweet. Just enter these and I'm off and running, I think. Bzzt. Not yet. I do get a little farther though. The kernel does start loading but then I get some message like "Couldn't find /devices". Wha?

Back to more web searching. I find some other random post saying the problem is that there is no bootpath in bootenv.rc, and it gives an example with a big device path name. So again I boot into Failsafe, and after I figure out the device path name, I update the bootenv.rc file and give it another go.

Hey, look at that, it's starting to fsck the /usr partition. Hey wait, I have no /usr, I have only a root partition. Again back to Failsafe to see what's going on. This one stumped me for a little bit, but I eventually figured it out. Essentially it was because there was no root entry in vfstab.

Finally fixed that up, gave it one final reboot, and Nevada finally booted into multiuser. Whew.

'Course that didn't fix the problem I was having with Thunderbird crashing, but it was a kinda fun exercise anyways.
