Wednesday Mar 03, 2010

Installing OpenSolaris in Oracle VM for x86

Yes, Virginia, you can install OpenSolaris as a guest in an Oracle VM for x86 environment!  Here is a little step-by-step guide on how to do it.  I am going to assume that you already have downloaded and installed Oracle VM Server.  Further, I am going to assume that you have already created an Oracle VM Manager system to manage your OVM farm.  All of that is more than I can tackle today.  But, to get you started, you can fetch the OVM and VM Manager bits from .  I found the simplest thing to do was to create an Oracle Enterprise Linux management system and install the VM Manager there.

Once you have the OVM environment established, you need to get some installation media to install the guests.  For OpenSolaris, here's the magic: Bootable AI.  Check out Alok's blog for more details on exactly what the Bootable AI project is about.  But in a nutshell, it makes it so that you can install OpenSolaris as if you were using a network install server, but while you are booted from installation media.  This gets around both the difficulty of trying to do an installation using a LiveCD in a tiny VNC window and the difficulty of trying to get a network, PXE-based installation working.  This is a quick and easy way to go.

Fetch the OpenSolaris AI iso rather than the regular LiveCD iso.  Install this into the VM Manager resource repository.  (Remember, I assume you know how to do this, or can figure it out pretty easily.  I did.)

Now, create a VM just as you always would for Oracle Enterprise Linux, Solaris, Windows, or whatever.  Select Install from installation media, and use the iso that you just added to the repository.  When you specify what operating system this VM will run, select "Other" since it isn't one of the pre-defined choices.  Start the creation and away you go.

As you have already figured out from Alok's blog, this is only half of the story.  You still must create an AI manifest.  The manifest details which packages to install and from where, along with the details for the first user created, root password, etc.  Check out the Automated Installation Project page for details on this.  The docs are pretty good and the minimum manifest needed for bootable AI is pretty basic.  Alok talks about how to specify booting from the development repository.  That was the only change to the default manifest that I made.

Put this manifest somewhere accessible via http from the VM you want to create.  The VM you created is sitting, waiting for you to tell it where to fetch its manifest so it can boot.  You really don't want to keep it waiting much longer.
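If it helps, here is one way to stage the manifest.  The directory and file name below are just placeholder assumptions for illustration, not anything OVM or AI requires:

```shell
# Drop the manifest into a directory your web server exports.
# DOCROOT and the manifest name are illustrative assumptions.
DOCROOT=/tmp/htdocs            # stand-in for something like /etc/apache2/2.2/htdocs
mkdir -p "$DOCROOT"
cat > "$DOCROOT"/bootable.xml <<'EOF'
<!-- placeholder skeleton; start from the default.xml shipped on the AI image -->
<ai_manifest name="bootable"/>
EOF
ls "$DOCROOT"
```

Any web server the VM can reach will do; the installer only needs to be able to fetch the file over http.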

Connect to the VM using VNC. You can use the built-in VNC client in VM Manager or whatever VNC client you like best. I tend to use vncviewer because it seems to manage the screen resolutions better than the Java client. When the installer prompts you for the manifest, enter the URL for the manifest you just made. The installer will fetch it, validate it, and then go on with its usual installation using that manifest. This is so simple and so cool!

Installation proceeds like it would with an install server.  You can log in on the console of the system being installed and monitor its progress.  Then, when it's done, reboot and you are done.

One note:  I have run into difficulty with OpenSolaris b133 and this approach.  When I used the b133 iso, even though I never got an error, the resulting VM was not bootable.  (No, I haven't gotten around to filing a bug on this.  I was going to wait until b134.)  However, when I used the b131 iso and a manifest that referenced installing entire@0.5.11-0.133, things worked out just fine.  So, give that a shot.

Once you have created a VM that you like within Oracle VM, you can do all of the cool Oracle VM things - convert it to a template and stamp out lots of copies, move it from server to server, etc.  But that's for another day.  Or that's something to look for in Honglin Su's blog.

Thursday Feb 18, 2010

ATLOSUG February Slides Posted

Slides from the February meeting of the Atlanta OpenSolaris User Group are posted at . The topic for this last meeting was "Oracle RAC on Solaris Logical Domains - Part 1". 

Don't miss part 2 of this presentation on March 9.

Tuesday Feb 16, 2010

It is indeed possible to install Oracle 11.2 on OpenSolaris

Found this cool post while checking my Google updates today.  Had not read this blog before, but I think I will give it a look. 

It is indeed possible to install Oracle 11.2 on OpenSolaris

Wednesday Feb 03, 2010

Oracle Virtual Machine Manager and OpenSolaris

This is pretty cool!  With a whole new set of products to become familiar with, I am jumping into Oracle VM Server and Oracle VM Manager.  So far, I've got 

  • Oracle VM Server installed on a pair of X8420 blades and have them set up as an HA pool
  • Oracle VM Manager installed on a v20z as my management interface
  • Oracle Enterprise Linux VMs - both PVM and HVM, installed from media and from pre-built templates
  • Nevada b130 as an HVM guest
  • Cloned VMs, migrated VMs on the fly between the blades
  • And now, I am installing OpenSolaris b131 using Bootable AI in a VM
Looks to me like the future is bright with lots of cool new tools that we can combine to get even more out of the technologies that both Sun and Oracle have created.

Tuesday Jan 26, 2010

Unnatural Acts with AI

I'm pretty sure this is not what the AI team had in mind when they gave us bootable AI.  But in my quest to see what the oldest piece of gear is that I can run OpenSolaris on, here's a fun one:

jack@opensolaris:~$ uname -a
SunOS opensolaris 5.11 snv_130 sun4u sparc SUNW,UltraAX-i2 Solaris
jack@opensolaris:~$ cat /etc/release
                      OpenSolaris Development snv_130 SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 18 December 2009
telnet> send brk
Type  'go' to resume
ok banner
Sun Fire V100 (UltraSPARC-IIe 500MHz), No Keyboard
OpenBoot 4.0, 1024 MB memory installed, Serial #51701117.
Ethernet address 0:3:ba:14:e5:7d, Host ID: 8314e57d.
ok go
jack@opensolaris:~$ prtdiag
System Configuration:  Sun Microsystems  sun4u Sun Fire V100 (UltraSPARC-IIe 500MHz)
System clock frequency: 100 MHz
Memory size: 1024 Megabytes

Pretty much took forever to install, but it works like a champ.  More news as it occurs!

Friday Jan 15, 2010

Bootable AI ISO is way cool

Alok Aggarwal posted, just before Christmas, a blog mentioning that the ISO images for the Auto Installer in OpenSolaris are now bootable.  Not just for x86 but also for SPARC.

This is huge!  While it does not provide a LiveCD desktop environment for SPARC, it does give us a way to easily install OpenSolaris on  SPARC gear.  Previously, it was necessary to set up an AI install server (running on an x86 platform since that was the only thing you could install natively) and use WAN Boot to install OpenSolaris on the SPARC boxes.  Well, that was a tough hurdle for some of us to get over.

Now, you can burn the AI ISO to a CD and boot it directly.  The default manifest on the disk will install a default system from the  release repository.   Or, better yet, build a simple AI manifest that changes the release repository to the dev repo and put it somewhere you can fetch via http.  When you boot up, you will be prompted for the URL of the manifest.  AI will fetch it and use it to install the system.

{2} ok boot cdrom - install prompt
Resetting ...

Sun Fire 480R, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.10.8, 16384 MB memory installed, Serial #57154911.
Ethernet address 0:3:ba:68:1d:5f, Host ID: 83681d5f.

Rebooting with command: boot cdrom - install prompt
Boot device: /pci@8,700000/ide@6/cdrom@0,0:f  File and args: - install prompt
SunOS Release 5.11 Version snv_130 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: opensolaris
Remounting root read/write
Probing for device nodes ...
Preparing automated install image for use
Done mounting automated install image
Configuring devices.
Enter the URL for the AI manifest [HTTP, default]: http://<my web server>/bootable.xml

See!  This is really easy and gives new life to really old gear.  In this case, the manifest is super simple, too.  I just grabbed the default manifest from an AI image and changed the repository and package to install.

$ pfexec lofiadm -a `pwd`/osol-dev-130-ai-x86.iso
$ pfexec mount -o ro -F hsfs /dev/lofi/1 /mnt
$ cp /mnt/auto_install/default.xml /etc/apache2/2.2/htdocs/bootable.xml

Edit this file and change

<main url="" publisher=""/>

to

<main url="" publisher=""/>

Or, as a speedup, add the mirror to the manifest:

<main url="" publisher=""/>
<mirror url=""/>

And change

<pkg name="entire"/>

to

<pkg name="entire@0.5.11-0.130"/>

You can add a mirror site for the repo in this manifest.  Or you can list other packages that you want to be installed as the system is installed.  The docs for the AutoInstaller talk about how to create and modify a manifest.
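The package pin described above can also be scripted.  This is just a sketch: it uses GNU sed (on OpenSolaris itself you'd use ed or perl -pi instead) and a one-line stand-in for the real default.xml:

```shell
# Recreate the edit on a scratch copy: pin "entire" to the b130 build.
# /tmp/bootable.xml here stands in for your copy of default.xml.
printf '<pkg name="entire"/>\n' > /tmp/bootable.xml
sed -i 's|<pkg name="entire"/>|<pkg name="entire@0.5.11-0.130"/>|' /tmp/bootable.xml
cat /tmp/bootable.xml
```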

Some caveats that I found:  First, NWAM and DHCP might take longer than you think.  If you quickly try to type in the URL for the manifest, you may find that you have no network yet and become concerned.  I spent the better part of a day on this.  Then, I let it sit for a couple of minutes before trying the manifest URL and life was good.  My DHCP server is particularly slow on my network.

Second, installing without the mirror on a slow system took a really long time.  I have not diagnosed whether that was network download time or processing time.  I think some of both, since things like the installation phase of babel_install took nearly an hour on one system.

Third, there must be a lower bound on what sort of system will work.  A T2000 works just fine.  The SF480R has worked fine.  My SF280R is busted - as soon as it's fixed, I'll try it.  Things are not so great on E220 and E420 systems.  They appear to work, but at the very end the installer reports a failure.  The only failure message I can see this time is due to the installer finding a former non-global zone environment on the disk.  But so far, my experience on UltraSPARC-II systems is that once the installation completes, it hangs on the first reboot or fails to boot at all.  I am not surprised that systems that are no longer supported are not supported by AI.  I think I saw in Alok's notes that OBP 4.17 was the minimum supported.  That means my USII boxes are right out, and I think even the SF280.  I hate doing firmware updates, so I have not updated the SF480.

Fourth, when I tried to install on a system that previously had the root disk mirrored with SVM, zpool create for the root pool failed.  I had to delete the metadbs and the metadevices before I could proceed.

But, I am very impressed!  Bootable AI media is way cool.  Keep your eyes and ears open, though, for more developments in the AutoInstaller in the coming months.

Wednesday Jan 13, 2010

ATLOSUG January Slides Posted

Slides from the January meeting of ATLOSUG - the Atlanta OpenSolaris User Group - are posted at

Next meeting will be February 9, 2010.  Check our web site for details.

Time to move to OpenSolaris completely

The last build of SXCE, the Solaris Express Community Edition, Build 130, has been released.  So what?

Well, this means that all of us laggards who have been basking in the glow of new features and capabilities given to us by the Solaris developers, but who have not been willing to take the plunge into OpenSolaris completely, need to get off the fence and move straight away to OpenSolaris.

I made that move over the holidays.  Got a new laptop.  Perfect time to make the move.  Used to be, I would run SXCE natively on my laptop and run OpenSolaris in a VirtualBox.  My rationale is that I do a lot of demonstrations for customers and I wanted my laptop to look as much like the production Solaris 10 as possible, while still getting the cool new stuff. 

Now, I run OpenSolaris native on the laptop and run Solaris 10 in a VirtualBox when I need it.

Turns out the migration has been remarkably painless.  My only hassle was actually moving my own data from one laptop to the other.

I guess that in the eggs and bacon breakfast of OSes, I have moved from being the chicken (involved in the process) to being the pig (fully committed).  And this is some tasty, thick sliced, smoked bacon!  Mmmm.

Monday Dec 14, 2009

Silly ZFS Dedup Experiment

Just for grins, I thought it would be fun to do some "extreme" deduping.  I started out by creating a pool from a pair of mirrored drives on a system running OpenSolaris build 129.  We'll call the pool p1.  Notice that everyone agrees on the size when we first create it.  zpool list, zfs list, and df -h all show 134G available, more or less.  Notice that when we created the pool, we turned deduplication on from the very start.

# zpool create -O dedup=on p1 mirror c0t2d0 c0t3d0
# zfs list p1
NAME   USED  AVAIL  REFER  MOUNTPOINT
p1      72K   134G    21K  /p1
# zpool list p1
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
p1     136G   126K   136G     0%  1.00x  ONLINE  -
# df -h /p1
Filesystem             size   used  avail capacity  Mounted on
p1                     134G    21K   134G     1%    /p1

So, what if we start copying a file over and over?  Well, we would expect that to dedup pretty well.  Let's get some data to play with.  We will create a set of 8 files, each one being made up of 128K of random data.  Then we will cat these together over and over and over and over and see what we get.

Why choose 128K for my file size?  Remember that we are trying to deduplicate as much as possible within this dataset.  As it turns out, the default recordsize for ZFS is 128K.  ZFS deduplication works at the ZFS block level.  By selecting a file size of 128K, each of the files I create fits exactly into a single ZFS block.  What if we picked a file size that was different from the ZFS block size? The blocks across the boundaries, where each file was cat-ed to another, would create some blocks that were not exactly the same as the other boundary blocks and would not deduplicate as well.

Here's an example.  Assume we have a file A whose contents are "aaaaaaaa", a file B containing "bbbbbbbb", and a file C containing "cccccccc".  If our blocksize is 6, while our files all have length 8, then each file spans more than 1 block.

# cat A B C > f1
# cat f1
aaaaaaaabbbbbbbbcccccccc
# cat B A C > f2
# cat f2
bbbbbbbbaaaaaaaacccccccc

The combined contents of the three files span across 4 blocks.  Notice that the only block in this example that is replicated is block 4 of f1 and block 4 of f2.  The other blocks all end up being different, even though the files were the same.  Think about how this would work as the number of files grew.
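You can see this boundary effect with ordinary shell tools; here fold(1) stands in for the 6-byte blocksize:

```shell
# Build the three files, concatenate them in two orders, and split
# each result into 6-byte "blocks" for comparison.
printf 'aaaaaaaa' > A
printf 'bbbbbbbb' > B
printf 'cccccccc' > C
cat A B C > f1
cat B A C > f2
fold -w6 f1    # prints aaaaaa, aabbbb, bbbbcc, cccccc (one per line)
fold -w6 f2    # prints bbbbbb, bbaaaa, aaaacc, cccccc (one per line)
```

Only the fourth chunk, cccccc, is common to both files, just as described above.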

So, if we want to make an example where things are guaranteed to dedup as well as possible, our files need to always line up on block boundaries (remember we're not trying to be a real world - we're trying to get silly dedupratios).  So, let's create a set of files that all match the ZFS blocksize.  We'll just create files b1-b8 full of blocks of random data from /dev/random.

# zfs get recordsize p1
p1    recordsize  128K     default
# dd if=/dev/random bs=1024 count=128 of=/p1/b1

# ls -ls b1 b2 b3 b4 b5 b6 b7 b8
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b1
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b2
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b3
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b4
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b5
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b6
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b7
 205 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b8
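Only the dd for b1 is shown above; the rest were made the same way.  Here is a loop version of that step as a sketch - it writes to the current directory rather than /p1, and reads /dev/urandom so it won't block the way /dev/random can:

```shell
# Create eight 128K files of random data, each exactly one ZFS record
# at the default 128K recordsize.
for i in 1 2 3 4 5 6 7 8; do
  dd if=/dev/urandom bs=1024 count=128 of=b$i 2>/dev/null
done
wc -c b1 b8    # each file is exactly 131072 bytes
```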

Now, let's make some big files out of these.

# cat b1 b2 b3 b4 b5 b6 b7 b8 > f1
# cat f1 f1 f1 f1 f1 f1 f1 f1 > f2
# cat f2 f2 f2 f2 f2 f2 f2 f2 > f3
# cat f3 f3 f3 f3 f3 f3 f3 f3 > f4
# cat f4 f4 f4 f4 f4 f4 f4 f4 > f5
# cat f5 f5 f5 f5 f5 f5 f5 f5 > f6
# cat f6 f6 f6 f6 f6 f6 f6 f6 > f7

# ls -lh
total 614027307
-rw-r--r--   1 root     root        128K Dec 14 15:28 b1
-rw-r--r--   1 root     root        128K Dec 14 15:28 b2
-rw-r--r--   1 root     root        128K Dec 14 15:28 b3
-rw-r--r--   1 root     root        128K Dec 14 15:28 b4
-rw-r--r--   1 root     root        128K Dec 14 15:28 b5
-rw-r--r--   1 root     root        128K Dec 14 15:28 b6
-rw-r--r--   1 root     root        128K Dec 14 15:28 b7
-rw-r--r--   1 root     root        128K Dec 14 15:28 b8
-rw-r--r--   1 root     root        1.0M Dec 14 15:28 f1
-rw-r--r--   1 root     root        8.0M Dec 14 15:28 f2
-rw-r--r--   1 root     root         64M Dec 14 15:28 f3
-rw-r--r--   1 root     root        512M Dec 14 15:28 f4
-rw-r--r--   1 root     root        4.0G Dec 14 15:28 f5
-rw-r--r--   1 root     root         32G Dec 14 15:30 f6
-rw-r--r--   1 root     root        256G Dec 14 15:49 f7

This looks pretty weird.  Remember our pool is only 134GB big.  Already the file f7 is 256G and we are not using any sort of compression.  What does df tell us?

# df -h /p1
Filesystem             size   used  avail capacity  Mounted on
p1                     422G   293G   129G    70%    /p1

Somehow, df now believes that the pool is 422GB instead of 134GB.  Why is that?  Well, rather than reporting the amount of available space by subtracting used from size, df now calculates its size dynamically as the sum of the space used plus the space available.  We have lots of space available since we have many many many duplicate references to the same blocks.
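You can check that arithmetic directly against the numbers df printed:

```shell
# df's reported "size" here is just used + avail, not the physical
# pool size: 293G used plus 129G available.
echo $(( 293 + 129 ))G    # prints 422G, matching df's size column
```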

# zfs list p1
NAME   USED  AVAIL  REFER  MOUNTPOINT
p1     293G   129G   293G  /p1
# zpool list p1
NAME   SIZE  ALLOC   FREE    CAP       DEDUP  HEALTH  ALTROOT
p1     136G   225M   136G     0%  299594.00x  ONLINE  -

zpool list tells us the actual size of the pool, along with the amount of space that it views as being allocated and the amount free.  So, the pool really has not changed size.  But the pool says that 225M are in use.  Metadata and pointer blocks, I presume.

Notice that the dedupratio is 299594!  That means that on average, there are almost 300,000 references to each actual block on the disk.
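That ratio checks out against the file sizes above.  Counting 128K blocks: b1-b8 contribute 8 in total, and f1 through f7 contribute 8, 64, 512, 4096, 32768, 262144, and 2097152 respectively, all referencing the same 8 unique blocks:

```shell
# Total referenced 128K blocks divided by the 8 unique blocks stored.
echo $(( (8 + 8 + 64 + 512 + 4096 + 32768 + 262144 + 2097152) / 8 ))
# prints 299594 - exactly the dedupratio zpool reports
```

The total, 2396752 blocks, also matches the 2.29M referenced blocks in the zdb histogram below.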

One last bit of interesting output comes from zdb.  Try zdb -DD on the pool.  This will give you a histogram of how many blocks are referenced how many times.  Not for the faint of heart, zdb will give you lots of ugly internal info on the pool and datasets. 

# zdb -DD p1
DDT-sha256-zap-duplicate: 8 entries, size 768 on disk, 1024 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced         
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
  256K        8      1M      1M      1M    2.29M    293G    293G    293G
 Total        8      1M      1M      1M    2.29M    293G    293G    293G

dedup = 299594.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 299594.00

So, what's my point?  I guess the point is that dedup really does work.  For data that has a commonality, it can save space.  For data that has a lot of commonality, it can save a lot of space.  With that come some surprises in terms of how some commands have had to adjust to changing sizes (or perceived sizes) of the storage they are reporting.

My suggestion?  Take a look at zfs dedup.  Think about where it might be helpful.  And then give it a try!

Friday Dec 11, 2009

ATLOSUG December Meeting slides posted

We had a great meeting of ATLOSUG, the Atlanta OpenSolaris User Group, this past Tuesday.  20+ people attended our first meeting with our new host, GCA Technology Services, at their training facility in Atlanta.  A big "Thank You" to Dawn and GCA for hosting our group.

Our topic this time was "What's New In ZFS" and we talked about some of the new features that have gone into ZFS recently, especially DeDupe.  George Wilson of the ZFS team was kind enough to share some slides that he had been working on and they are posted here.

Our next meeting will be Tuesday, January 12 at GCA.  Details and info can be found on the ATLOSUG website at

Monday Aug 17, 2009

ATLOSUG COMSTAR slides posted

Slides from last week's meeting of the Atlanta OpenSolaris User Group (ATLOSUG) are posted now on the group website -

We had a good group of about 16 people in attendance and a great discussion around how and why to use COMSTAR. 

The next meeting will be held on Sept. 8.  The topic will be how COMSTAR and other OpenSolaris technologies fit together in the Sun Unified Storage family of products.  Hope to see you there!

Monday Jun 29, 2009

Quick Review of Pro OpenSolaris

Pro OpenSolaris - Harry Foxwell and Christine Tran

Several (too many) weeks ago, I said that I was going to read and review Harry & Christine's new book, Pro OpenSolaris. Finally, I am getting around to doing this.

Overall, I was pleased with Pro OpenSolaris.  It does a good job at what it tries to do.  The key is to recognize when it is the right text and when another might be.  Right in the Introduction, the authors are clear that this is an orientation tour.  They say "We assume that you are a professional system administrator ... and that your learning style needs only an orientation and an indication of what should be learned first in order to take advantage of OpenSolaris."  That's a good summary of the main direction of the book.  And at this, it does a very nice job!

This means that Pro OpenSolaris is not an exhaustive reference manual on all of the features and nuances of OpenSolaris.  Instead, it's a broad overview of what OpenSolaris is, how it got to be what it is, what its key features and differentiators are, and why I might choose to use OpenSolaris instead of some other system.  That's important to realize from the outset.  If you are looking for the thousand-page reference guide, this is not the one.  If you have heard about OpenSolaris and want to explore a bit more deeply, to decide whether or not OpenSolaris is something that might help your business or might be a tool you can use, this is a great place to start.

Pro OpenSolaris spends a good bit of time on the preliminaries.  There is an extensive section on the philosophical differences between the approaches and requirements of different open source licenses and styles of licenses.  Pro OpenSolaris explains clearly why OpenSolaris uses the CDDL license as opposed to other licenses and how this fits in with the overall goal of the OpenSolaris project.

Pro OpenSolaris helps you get started, with a lengthy discussion of how to go about installing OpenSolaris either on  bare metal or in a virtual machine.

Compare this to the OpenSolaris Bible (Solter, Jelinek, & Miner), which really does aspire to be the thousand-page reference guide.  In the OpenSolaris Bible, licensing and installation are given only a short discussion, since they are not central to the book's focus.  Instead, the reader is directed to other places for that discussion.

But that's why it's important to have both books.  Pro OpenSolaris gives the tour of the important parts of the OpenSolaris operating system, how and why I might use them, and why they are important, but it does not go deeply into the details.  That's probably wise for an operating system that is still growing and changing substantially with each new release.

One thing that particularly interested me in Pro OpenSolaris was that it includes large sections on both the OpenSolaris Webstack and NetBeans.  The Webstack includes IPS-packaged versions of the commonly used pieces of an AMP stack - notably Apache, MySQL, PHP, lighttpd, nginx, Ruby, Rails, etc. - all compiled and optimized for OpenSolaris and including key add-ons such as DTrace providers where applicable.  The NetBeans chapter is a nice, long look at its role as part of an overall OpenSolaris development environment.

What's my take overall?  Pro OpenSolaris is a quick read that will give you a good understanding of what OpenSolaris is and why you would want to use it; what its key features are and why they are important; and how you can use these to your best advantage.  There are lots of examples and technical details so that you can see that what Harry & Christine talk about is for real.  I would recommend this as part of your library.  But I would also recommend the OpenSolaris Bible.  The two complement each other nicely to complete the picture.

Saturday Jun 13, 2009

June Atlanta OpenSolaris User Group meeting

Had a great Atlanta OpenSolaris User Group meeting this month.  We did an installfest, an update from CommunityOne, and a recap of what's new in OpenSolaris 2009.06.  About twenty folks showed up and about half loaded their laptops with the new build while we were there.

We got some great feedback for upcoming topics and are pushing forward with that.  We also decided to move back to monthly meetings starting in August.  Our next meeting is August 11 when we will talk about COMSTAR.  We are also considering a change in venue back to the Sun office in Alpharetta.  Matrix Resources has been very gracious in allowing us to use their facility, but I always feel bad that they have to have someone stick around until late at night to babysit us.

We're going to try an experiment to see if we can't get the word out a little better about our merry band via social networks.  We've started by creating a Meetup group at .  Hopefully this will generate more traffic to our meetings and help us find folks in the area.

Tuesday Jun 09, 2009

OpenSolaris User Group Leaders Bootcamp

The keepers of the OpenSolaris Community took advantage of having a number of the User Group leaders at the CommunityOne conference this last week to set aside a day for a User Group Leaders' Bootcamp.

What a great opportunity to get together in the same room with folks working to create and sustain OpenSolaris user groups around the world! We had folks from every continent - from Atlanta and Argentina, from Dallas and Serbia, from China and London, and on and on. Something like twenty-five to thirty of the OpenSolaris User Groups were represented.

The whole day was a great experience. It was great to see that as different as each group was, there were a lot of common themes for both successes and for challenges. And a lot of great ideas were shared as to how to boost participation, to improve meetings, and to improve the success of the groups overall. It will be exciting to hear a report back next year on how these ideas have played out.

Be sure to check out Jim Grisanzio's photos to see some of these characters and what all went on at CommunityOne and in the OSUG Bootcamp.

Jeff Jackson, Sr. VP for Solaris Engineering, started the day off with a greeting and charge to get the most out of this opportunity to meet with each other and with the OpenSolaris and Solaris headquarters teams.

Since the thing that brought this group together was a common focus on OpenSolaris User Groups and not the fact that we knew each other, we began the day with a bit of team-building exercise, courtesy of The Go Game. This is a cross between a scavenger hunt and an improvisational acting class. Teams criss-crossed downtown San Francisco trying to find and photograph places hinted at by clues on web pages. At some venues, the teams had to act out and film various tasks. For example, on the Yerba Buena lawn, the team had to engage in an impromptu Tai Chi exercise in order to find their long-lost phys ed teacher, Ms. Karpanski, who then led the team in creating a new exercise video. Once we all returned, all of our submissions were voted on by the team and a winning team chosen. Supposedly, we can see all these photos and videos. Haven't yet found out how. Perhaps, that's for the best!

In order for us to get to know each other's groups, each User Group prepared a poster describing the group, where we were located, what we do, what sort of members make up the group, and what makes us special. Many of these posters were really well done! We had a bit of a scavenger hunt for answers to questions found by careful reading of all of the posters. It was really cool to see what sorts of projects some of the groups had undertaken and how they were working with various university or other organizations.

But the main part of the day was spent in a big brainstorming session. We all identified our successes, our failures, our challenges, and ideas for the future. We put all of these on several hundred post-it notes and placed them on large posters. We grouped them by topic and then went through all of these. Even though this only had an hour on the agenda, it ended up taking the bulk of the day. Since this was the most important thing for us, we decided to rearrange the day to accommodate it.

From these sticky-notes, we found out that some of our groups were mostly focused on administrators but others had a large developer population. We all have some sort of issues around meeting locations - whether it's a matter of access in the evening, finding a convenient location, or providing network access and power. For most groups, having some sort of refreshments was important, though some groups felt like good refreshments attracted too many folks who just show up for the food.

There were a lot of good ideas around using a registration site to get access to the facility and order food, creating and using Facebook, LinkedIn, and Twitter, using IRC, interacting with the Sun Campus Ambassadors, using MeetUp to find new members. Many folks found it useful to video and make available presentations given at their meetings. Some groups (for example in Japan) have special sub-groups for beginners. Other groups are doing large-scale development projects, such as the Belenix project in Bangalore.

For me and the Atlanta OpenSolaris User Group, I have a lot of new ideas that I want to put out to our membership and our leaders - move back to monthly meetings, use a registration site, set up a presence on various social networks.

Many people said that folks come to the user groups in order to network and expand their circle of business acquaintances. In light of the current economic situation, with so many smart people out of work, I am thinking of promoting our group with some of the job networking groups around Atlanta. For example, my church, Roswell United Methodist Church, has one of the largest job networking groups in the Atlanta area. Every two weeks, nearly 500 people meet to network and help each other in their job search. Perhaps the many IT folks in this group might find this a way to get current and stay current in a whole new area.

At any rate, I am inspired to get things cranking at ATLOSUG!

After spending the afternoon working through our hundreds of sticky notes, the OpenSolaris Governing Board had a bit of a roundtable with us to talk about what they do and how we can work better together. It was really helpful for me to hear from them and to get to put faces to some of the names for the folks I did not already know.

We finished out the evening with a great dinner at the Crab House at Pier 39.  From what I have seen, many of the photos from dinner and the meeting are already on Facebook, Flickr, and likely elsewhere.  Jim Grisanzio, OpenSolaris Chief Photographer, was out in force with his camera!

Thanks so much to Teresa Giacomini, Lynn Rohrer, Dierdre Straughan, Jim Grisanzio, Tina Hartshorn, Wendy Ames, Kris Hake and everyone else who had a hand in organizing this event. Thanks to Jeff Jackson, Bill Franklin, Chris Armes, Dan Roberts and all the other HQ folks who took the time to come and listen and interact with the leaders of these groups. I know that I got a lot out of the meeting and am more eager than ever to promote and push forward with our user group.

CommunityOne Recap

Last week, I had the opportunity to attend CommunityOne West in San Francisco, along with a number of the other leaders of OpenSolaris User Groups. (I head up the Atlanta OpenSolaris User Group.) What a great meeting! Three days of OpenSolaris.

First off, I am sure that Teresa and the OpenSolaris team selected the Hotel Mosser because they knew it was a Solaris focused venue. As Dave Barry would say, I am not making this up! Even the toilet paper was Solaris-based. Bob Netherton and I were speculating that perhaps this was an example of Solaris Roll-Based Dump Management, new in OpenSolaris 2009.06.

CommunityOne Day One

Day One was a full day of OpenSolaris and related talks. The OpenSolaris teams maintained tracks around deploying OpenSolaris 2009.06 in the datacenter and around developing applications on OpenSolaris 2009.06. For the most part, I stuck with the operations-focused sessions, though I did step out into a few others. Some of the highlights included:

  • Peter Dennis and Brian Leonard's fun survey of what's new and exciting in OpenSolaris 2009.06. ATLOSUG folks should look for a reprise of this at our meeting on Tuesday.
  • Jerry Jelinek's discussion of the various virtualization techniques built into and onto OpenSolaris. This is the sort of talk that I give a lot. It was really helpful to hear how the folks in engineering approach this topic.
  • Scott Tracy & Dan Maslowski's COMSTAR discussion and demo. COMSTAR has been significantly expanded in recent builds, with more coolness still to come. I had not paid a lot of attention to this lately and this was a really helpful talk, especially since Teresa Giacomini had asked me to present this demo for the user group leaders on Wednesday. In any case, I have reproduced the iSCSI demo that Scott did using just VirtualBox, rather than requiring a server. Of course, the VB version is not something I would run my main storage server on. But it certainly is a great tool to understand the technology. I hope to have Ryan Matteson (Ryan, you volunteered!) give a talk at the ATLOSUG sometime soon.
  • I branched out of the main OpenSolaris path to see a few other things on Day One, as well. Ken Pepple, Scott Mattoon, and John Stanford gave a good talk on Practical Cloud Patterns. They talked about some of the typical ways that people do provisioning, application deployment, and monitoring within the cloud.
  • Karsten Wade, "Community Gardener" at Red Hat, gave a talk called Participate or Die. This was about the importance of participating in the Open Source projects that are important to your business. He talked about understanding the difference between participating (perhaps, using open source code) and influencing (helping to guide the project). By paying more attention to those who actively participate, active members of the community enhance their status and become influencers of the direction for a project. And it is important that this happen - in successful projects, the roadmap is driven by the participants rather than handed down from on high with the hope that people will line up behind it. Really, I think, his key message was that it is important not to just passively stand by when you care about or depend upon something, leaving its future in the hands of others.
  • Kevin Nilson and Michael Van Riper gave a great talk about building and maintaining a successful user group. This was built on their experiences with the Silicon Valley Java User Group and with the Google Technology User Group. They took a great approach by collecting videos from the leaders, hosts, and participants in these and other groups around the country. It was really helpful to hear people's perspectives on why they attend a group, why companies host group meetings, and why and how people continue to lead user groups. While a lot of what they had to say, and the successes that they have had, are a product of being in a very "target-rich environment" in Silicon Valley, it was interesting to see that some things are universal: a good location makes a lot of difference; having food matters. I got a lot of ideas from this and from the OpenSolaris User Group Bootcamp that I hope to get going in ATLOSUG.
  • The OpenSolaris 2009.06 Launch Party finished out the evening, with dodgeball and the Extra Action Marching Band. I thought these folks were the hit of the evening. You get the best of marching bands, big drums and loud brass, but add to that folks flailing around, throwing themselves at the dodgeball court nets. Much more exciting than your regular marching band, even some of the cool ones around Atlanta in the Battle of the Bands!

CommunityOne Day Two

Day Two was filled with OpenSolaris Deep Dives. These were very helpful, not just in content, but in helping me to hone my own OpenSolaris presentations. For this day, I stuck close to the Deploying OpenSolaris track, having learned in graduate school that I am not a developer. This track included:

  • Chris Armes kicked off the day with a talk on deploying OpenSolaris in your Data Centre (as he spells it).
  • Becoming a ZFS Ninja, presented by Ben Rockwood. Ben is an early adopter and a production user of ZFS. This was a two-hour, fairly in-depth talk about ZFS and its capabilities.
  • Nick Solter, co-author of the OpenSolaris Bible, talked about OpenHA Cluster, newly released and available for OpenSolaris. With OpenHA, enterprise-level availability is not just available, but also supported. He talked about how the cluster works and about extensions to the OpenHA cluster beyond the capabilities of Solaris Cluster, based on OpenSolaris technologies. Some of these include the use of Crossbow VNICs for private interconnects. I am still thinking about the availability implications of this and am not sure it's an answer for all configurations. But it's cool that it's there!
  • Jerry Jelinek rounded out the day talking about Resource Management with Containers, a topic near and dear to my heart and one I end up presenting a lot.
We finished out Day Two with a reunion dinner of some of the old team at Bucca di Beppo. Around the table, we had Vasu Karunanithi, Dawit Bereket, Matt Ingenthron, Scott Dickson (me), Bob Netherton, Isaac Rosenfeld, and Kimberly Chang. It was great to get at least part of the old gang together and catch up.

Day Three was the OpenSolaris User Group Leaders Bootcamp. But that's for another post....

Monday Apr 27, 2009

Just got my copy of Pro OpenSolaris

Just got my copy of Pro OpenSolaris by Harry Foxwell and Christine Tran in the mail today!  Can't wait to get a good look and post a review.  I wonder if I can get the authors to inscribe it to me!  

Also got a copy of OpenSolaris Bible by Nick Solter, Gerry Jelinek, and Dave Miner.  Looking forward to cracking into it as well.

Will post reviews shortly.

Monday Oct 08, 2007

ATLOSUG - Oct 9. Cancelled

Sun's Customer Engineering Conference is going on this week in Las Vegas. As a result, we've had to cancel our October meeting of ATLOSUG - We're all in Las Vegas.

Sorry for the inconvenience. We will pick up with our meetings in November. Ryan Matteson, from Ning, will be our speaker. Should be a really good meeting. Details on the topic to follow.

Friday Dec 01, 2006

Fun with zvols - UFS on a zvol

Continuing with some of the ideas around zvols, I wondered about UFS on a zvol.  On the surface, this appears to be sort of redundant and not really very sensible.  But thinking about it, there are some real advantages.

  • I can take advantage of the data integrity and self-healing features of ZFS since this is below the filesystem layer.
  • I can easily create new volumes for filesystems and grow existing ones.
  • I can make snapshots of the volume, sharing the ZFS snapshot flexibility with UFS - very cool.
  • In the future, I should be able to do things like have an encrypted UFS (sort-of) and secure deletion.

Creating UFS filesystems on zvols

Creating a UFS filesystem on a zvol is pretty trivial.  In this example, we'll create a mirrored pool and then build a UFS filesystem in a zvol.

bash-3.00# zpool create p mirror c2t10d0 c2t11d0 mirror c2t12d0 c2t13d0
bash-3.00# zfs create -V 2g p/v1
bash-3.00# zfs list
p       4.00G  29.0G  24.5K  /p
p/v1    22.5K  31.0G  22.5K  -
bash-3.00# newfs /dev/zvol/rdsk/p/v1
newfs: construct a new file system /dev/zvol/rdsk/p/v1: (y/n)? y
Warning: 2082 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/p/v1:    4194270 sectors in 683 cylinders of 48 tracks, 128 sectors
        2048.0MB in 43 cyl groups (16 c/g, 48.00MB/g, 11648 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
3248288, 3346720, 3445152, 3543584, 3642016, 3740448, 3838880, 3937312,
4035744, 4134176
bash-3.00# mkdir /fs1
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# df -h /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G   2.0M   1.9G     1%    /fs1

Nothing much to it. 

Growing UFS filesystems on zvols

But, what if I run out of space?  Well, just as you can add disks to a volume and grow the size of the volume, you can grow the size of a zvol.  Now, since the UFS filesystem is a data structure inside the zvol container, you have to grow it as well.  Were I using just ZFS, the size of the file system would grow and shrink dynamically with the size of the data in the file system.  But a UFS has a fixed size, so it has to be expanded manually to accommodate the enlarged volume.  This seems to have quit working between b45 and b53, so I just filed a bug on it.

bash-3.00# uname -a
SunOS atl-sewr-158-154 5.11 snv_45 sun4u sparc SUNW,Sun-Fire-480R
bash-3.00# zfs create -V 1g bsd/v1
bash-3.00# newfs /dev/zvol/rdsk/bsd/v1
bash-3.00# zfs set volsize=2g bsd/v1
bash-3.00# growfs /dev/zvol/rdsk/bsd/v1
Warning: 2048 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/bsd/v1:  4194304 sectors in 683 cylinders of 48 tracks, 128 sectors
        2048.0MB in 49 cyl groups (14 c/g, 42.00MB/g, 20160 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 86176, 172320, 258464, 344608, 430752, 516896, 603040, 689184, 775328,
3359648, 3445792, 3531936, 3618080, 3704224, 3790368, 3876512, 3962656,
4048800, 4134944

What about compression? 

Along the same lines as growing the file system, I suppose you could turn compression on for the zvol.  But since the UFS is of fixed size, it won't especially help with fitting more data in the file system.  You can't put more into the filesystem than the filesystem thinks it can hold, even if it isn't using that much space on the disk.  Here's a little demonstration of that.

First, we will loop through, creating 200MB files in a 1GB file system with no compression.  We will use blocks of zeros, since these will compress quite a bit the second time round. 

bash-3.00# zfs create -V 1g p/v1
bash-3.00# zfs get used,volsize,compressratio p/v1
p/v1  used           22.5K    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
bash-3.00# newfs /dev/zvol/rdsk/p/v1
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# for f in f1 f2 f3 f4 f5 f6 f7 ; do
> dd if=/dev/zero bs=1024k count=200 of=/fs1/$f
> df -h /fs1
> zfs get used,volsize,compressratio p/v1
> done

200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   201M   703M    23%    /fs1
p/v1  used           62.5M    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   401M   503M    45%    /fs1
p/v1  used           149M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   601M   303M    67%    /fs1
p/v1  used           377M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   801M   103M    89%    /fs1
p/v1  used           497M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
dd: unexpected short write, wrote 507904 bytes, expected 1048576
161+0 records in
161+0 records out
Dec  1 14:53:04 atl-sewr-158-122 ufs: NOTICE: alloc: /fs1: file system full

bash-3.00# zfs get used,volsize,compressratio p/v1
p/v1  used           1.00G    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -

So, you see that it fails as it writes the 5th 200MB chunk, which is what you would expect.  Now, let's do the same thing with compression turned on for the volume.

bash-3.00# zfs create -V 1g p/v2
bash-3.00# zfs set compression=on p/v2
bash-3.00# newfs /dev/zvol/rdsk/p/v2
bash-3.00# mount /dev/zvol/dsk/p/v2 /fs2
bash-3.00# for f in f1 f2 f3 f4 f5 f6 f7 ; do
> dd if=/dev/zero bs=1024k count=200 of=/fs2/$f
> df -h /fs2
> zfs get used,volsize,compressratio p/v2
> done
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   201M   703M    23%    /fs2
p/v2  used           8.58M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.65x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   401M   503M    45%    /fs2
p/v2  used           8.58M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.65x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   601M   303M    67%    /fs2
p/v2  used           8.83M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.50x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   801M   103M    89%    /fs2
p/v2  used           8.83M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.50x    -
dd: unexpected short write, wrote 507904 bytes, expected 1048576
161+0 records in
161+0 records out
Dec  1 15:16:42 atl-sewr-158-122 ufs: NOTICE: alloc: /fs2: file system full

bash-3.00# zfs get used,volsize,compressratio p/v2
p/v2  used           9.54M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.07x    -
bash-3.00# df -h /fs2
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   962M     0K   100%    /fs2

This time, even though the volume was not using much space at all, the file system was still full.  So compression in this case is not especially valuable from a space management standpoint - you can't fit any more data into the UFS filesystem than before.  Depending on the contents of the filesystem, compression may still help performance by converting multiple I/Os into single or fewer I/Os, though.

The Cool Stuff - Snapshots and Clones with UFS on Zvols

One of the things that is not available in UFS is the ability to create multiple snapshots quickly and easily.  The fssnap(1M) command allows me to create a single, read-only snapshot of a UFS file system, and it requires a separate location to maintain backing store for files changed or deleted in the master image during the lifetime of the snapshot.

ZFS offers the ability to create many snapshots of a ZFS filesystem quickly and easily.  This ability extends to zvols, as it turns out.

For this example, we will create a volume, fill it up with some data and then play around with taking some snapshots of it.  We will just tar over the Java JDK so there are some files in the file system. 

bash-3.00# zfs create -V 2g p/v1
bash-3.00# newfs /dev/zvol/rdsk/p/v1
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# tar cf -  ./jdk/ | (cd /fs1 ; tar xf - )
bash-3.00# df -h /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G   431M   1.5G    23%    /fs1
bash-3.00# zfs list
p       4.00G  29.0G  24.5K  /p
p/swap  22.5K  31.0G  22.5K  -
p/v1     531M  30.5G   531M  -

Now, we will create a snapshot of the volume, just like for any other ZFS file system.  As it turns out, this creates new device nodes in /dev/zvol for the block and character devices.  We can mount them as UFS file systems same as always.

bash-3.00# zfs snapshot p/v1@s1  # Make the snapshot
bash-3.00# zfs list # See that it's really there
p        4.00G  29.0G  24.5K  /p
p/swap   22.5K  31.0G  22.5K  -
p/v1      531M  30.5G   531M  -
p/v1@s1      0      -   531M  -
bash-3.00# mkdir /fs1-s1
bash-3.00# mount  /dev/zvol/dsk/p/v1@s1 /fs1-s1 # Mount it
mount: /dev/zvol/dsk/p/v1@s1 write-protected # Snapshots are read-only, so this fails
bash-3.00# mount -o ro  /dev/zvol/dsk/p/v1@s1 /fs1-s1 # Mount again read-only
bash-3.00# df -h /fs1-s1 /fs1
Filesystem             size   used  avail capacity  Mounted on
                       1.9G   431M   1.5G    23%    /fs1-s1
/dev/zvol/dsk/p/v1     1.9G   431M   1.5G    23%    /fs1

At this point /fs1-s1 is a read-only snapshot of /fs1.  If I delete files, create files, or change files in /fs1, that change will not be reflected in /fs1-s1.

bash-3.00# ls /fs1/jdk
instances    jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# rm -rf /fs1/jdk/instances
bash-3.00# df -h /fs1 /fs1-s1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G    61M   1.8G     4%    /fs1
                       1.9G   431M   1.5G    23%    /fs1-s1

You can create multiple snapshots in just the same way.  And as with any other ZFS file system, you can roll back a snapshot and make it the master again.  You have to unmount the filesystem in order to do this, since the rollback happens at the volume level.  Changing the volume underneath the UFS filesystem would leave UFS confused about the state of things.  But, ZFS catches this, too.


bash-3.00# ls /fs1/jdk/
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# rm /fs1/jdk/jdk1.6.0
bash-3.00# ls /fs1/jdk/
jdk1.5.0_08  latest       packages
bash-3.00# zfs list
p        4.00G  29.0G  24.5K  /p
p/swap   22.5K  31.0G  22.5K  -
p/v1      535M  30.5G   531M  -
p/v1@s1  4.33M      -   531M  -
bash-3.00# zfs rollback p/v1@s1 # /fs1 is still mounted.
cannot remove device links for 'p/v1': dataset is busy
bash-3.00# umount /fs1
bash-3.00# zfs rollback p/v1@s1
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# ls /fs1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages

I can create additional read-write instances of a volume by cloning the snapshot.  The clone and the master file system will share the same objects on-disk for data that remains unchanged, while new on-disk objects will be created for any files that are changed either in the master or in the clone.


bash-3.00# ls /fs1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# zfs snapshot p/v1@s1
bash-3.00# zfs clone p/v1@s1 p/c1
bash-3.00# zfs list
p        4.00G  29.0G  24.5K  /p
p/c1         0  29.0G   531M  -
p/swap   22.5K  31.0G  22.5K  -
p/v1      531M  30.5G   531M  -
p/v1@s1      0      -   531M  -
bash-3.00# mkdir /c1
bash-3.00# mount /dev/zvol/dsk/p/c1 /c1
bash-3.00# ls /c1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# df -h /fs1 /c1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G    61M   1.8G     4%    /fs1
/dev/zvol/dsk/p/c1     1.9G    61M   1.8G     4%    /c1

I am pretty sure that this isn't exactly what the ZFS guys had in mind when they set out to build all of this, but it is pretty cool.  Now, I can create UFS snapshots without having to specify a backing store.  I can create clones, promote the clones to the master, and do the other things that I can do in ZFS.  I still have to manage the mounts myself, but I'm better off than before.

I have not tried any sort of performance testing on these.  Dominic Kay has just written a nice blog about using filebench to compare ZFS and VxFS.  Maybe I can use some of that work to see how things go with UFS on top of ZFS.

As always, comments, etc. are welcome!

Fun with zvols - Swap on a zvol

I mentioned recently that I just spent a week in a ZFS internals TOI. Got a few ideas to play with there that I will share. Hopefully folks might have suggestions as to how to improve / test / validate some of these things.

ZVOLs as Swap

The first thing that I thought about was using a zvol as a swap device. Of course, this is right there in the zfs(1M) man page as an example, but it still deserves a mention here.  There has been some discussion of this on the zfs-discuss list. (I just retyped that dot four times thinking it was a comma. Turns out there was crud on my laptop screen.)  The dump device cannot be on a zvol (at least if you want to catch a crash dump), but this still gives a lot of flexibility.  With root on ZFS (coming before too long), ZFS swap makes a lot of sense and is the natural choice. We were talking in class that maybe it would be nice if there were a way to turn off ZFS's caching for the swap device to improve performance, but that remains to be seen.

At any rate, setting up mirrored swap with ZFS is way simple! Much simpler even than with SVM, which in turn is simpler than VxVM. Here's all it takes:

bash-3.00# zpool create -f p mirror c2t10d0 c2t11d0
bash-3.00# zfs create -V 2g p/swap
bash-3.00# swap -a /dev/zvol/dsk/p/swap

Pretty darn simple, if you ask me. You can make it permanent by changing the lines for swap in your /etc/vfstab (below).  Notice that you use the path to the zvol in the /dev tree rather than the ZFS dataset name.

bash-3.00# cat /etc/vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#/dev/dsk/c1t0d0s1 - - swap - no -
/dev/zvol/dsk/p/swap - - swap - no -

I would like to do some performance testing to see what kind of performance you can get with swap on a zvol.  I am curious about how this will affect kernel memory usage, and about the effect of things like compression on the swap volume - though thinking about it, compressing swap doesn't make a lot of sense.  I am also curious about the ability to dynamically change the size of the swap space.  At first glance, changing the size of the volume does not automatically change the amount of available swap space.  That makes sense for expanding swap space.  But if you reduce the size of the volume and the kernel doesn't notice, that sounds like it could be a problem.  Maybe I should file a bug.
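Until that shakes out, the manual dance seems safe enough: pull the swap device out of service, resize the volume, and add it back so the kernel sees the new size. Here's a sketch, untested for the shrink case, using the same pool and volume names as the example above:

```
bash-3.00# swap -d /dev/zvol/dsk/p/swap   # remove it from the swap list first
bash-3.00# zfs set volsize=4g p/swap      # grow (or shrink) the zvol
bash-3.00# swap -a /dev/zvol/dsk/p/swap   # add it back at the new size
```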

Suggestions for things to try and ways to measure overhead and performance for this are welcomed.

Thursday Nov 30, 2006

ZILs and ZAPs and ARCs - Oh My!

I just spent the last four days in a ZFS Internals TOI, given by George Wilson from RPE.  This just reinforces my belief that the folks who build OpenSolaris (and most any complex software product, actually) have a special gift.  How one can conceive of all of the various parts and pieces to bring together something as cool as OpenSolaris or ZFS or DTrace, etc., is beyond me.

By way of full disclosure, I ought to admit that the main thing I learned in graduate school and while working as a developer in a CO-OP job at IBM was that I hate development.  I am not cut out for it and have no patience for it.

Anyway, though, spending a week in the ZFS source actually helps you figure out how to best use the tool at a user level.  You see how things fit together, and this helps to figure out how to build solutions.  I got a ton of good ideas about some things that you might do with ZFS even without moving all of your data to ZFS.  Don't know whether they will pan out or not, but some ideas to play around with.  More about that later.

The same kind of thing applies for internals of the kernel.  Whether or not you are a kernel programmer, you can be a better developer and a better system administrator if you have a notion of how the pieces of the kernel fit together.  Sun Education is now offering a class called Solaris 10 Operating System, previously only offered internally at Sun.  Since Solaris has been open-sourced, the internal Internals is now an external Internals!  If you have a chance, take this class!  I take it every couple of Solaris releases and never regret it.

But, mostly I want to say a special thanks to George Wilson and the RPE team for putting together a fantastic training event and for allowing me, from the SE / non-developer side of the house to sit in and bask in the glow of those who actually make things for a living.

Monday Sep 18, 2006

Thanks, Ryan, for a great OpenSolaris User Group meeting

The Atlanta OpenSolaris User Group met last Tuesday, and it has taken me a week to get my head above water enough to mention it. Ryan Matteson from Earthlink, battling a nasty cold, did a great job. His presentation was on Brendan Gregg's DTrace Toolkit and how system administrators can make good use of DTrace. His slides are posted on his blog. We ended up with about 25 people for this meeting.

Thanks to our sponsor for this meeting, Forsythe Systems, for providing refreshments.

The next meeting of the Atlanta OpenSolaris User Group will be Tuesday, Oct. 10, in the Sun office in Alpharetta, GA. More details are here.

Thursday Sep 07, 2006

Atlanta OpenSolaris User Group - September 12

The next meeting of the Atlanta OpenSolaris User Group will be Tuesday, Sept 12, at 7 PM in the offices of Sun Microsystems. Sun is located at 3655 North Point Parkway in Alpharetta, GA. For directions and details, see the ATLOSUG web site.

The topic for this meeting will be the DTrace Toolkit. Ryan Matteson, of Earthlink, will present. The DTrace Toolkit is a collection of tools built on top of DTrace for system and application monitoring and observation.

Please RSVP if you plan to attend. We need to have at least a rough count for refreshments.

I've built it. But now what?

ZFS on a box like the SunFire X4500 is way cool. But what if all you have is old, controller-based storage devices? George Wilson and I were wondering about that and thought it might be useful to do some experimentation down that line. So, we collected all of the currently unused storage in my lab and built a big ZFS farm. We've got a V480 with 7 T3B and 8 T3 bricks connected via Sanbox-1 switches, along with a couple of TB of IBM Shark storage recabled to be JBOD. I have a 3510 and maybe some Adaptec RAID storage that I can hook up eventually.

So, the server is up and running with 3 racks of storage, keeping the lab nice and toasty. Now what?!

What might be the best way to manage the T3s in a ZFS world? As a first pass, I split each brick into 2 RAID5 LUNs with a shared spare drive. But maybe I would be better off just creating a single stripe with no RAID in the T3 and letting ZFS handle the mirroring. Then again, I've had a number of disk errors (these are all really, really, really old) that the T3 fixed on its own without bothering the ZFS pool. Maybe RAID5 in the brick is the right approach. I could argue it either way.
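For the all-ZFS approach, the setup would look something like this: each brick exports plain-stripe LUNs, and the pool mirrors LUNs across bricks so ZFS owns the redundancy and can self-heal. A sketch, with made-up device names:

```
bash-3.00# # mirror LUN n on one brick against LUN n on another brick
bash-3.00# zpool create tank mirror c4t1d0 c5t1d0 mirror c4t2d0 c5t2d0
bash-3.00# zpool status tank
```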

Feel free to share your suggestions on what might be a good configuration here and why. I'm happy to test out several different approaches.

Wednesday Jul 12, 2006

Atlanta OpenSolaris User Group - July 12

The Atlanta OpenSolaris User Group had a great monthly meeting last night. Alok Aggarwal presented on NFSv4 to a group of about 15 and fielded quite a lot of questions, ranging from how NFSv4 works to how to use the DTrace provider for NFSv4. Good meeting. Check the ATLOSUG web site for slides and meeting details.

No cake and pictures this month, but a big Thank-You to Intelligent Technology Systems for sponsoring us this month and bringing the pizza.

Our next meeting will be August 8 in the Alpharetta, Georgia Sun office. Check the ATLOSUG web site for details and directions.

Wednesday Jun 14, 2006

Hotlanta Celebrates One Year of OpenSolaris

The Atlanta OpenSolaris User Group launched a bit of an early birthday celebration for our good friend, OpenSolaris, last night with a rousing meeting. George Wilson, from Sun's Revenue Products Engineering group, gave us an update on what's new in ZFS lately. I have to say that I am more and more impressed with the things that you can and will be able to do with ZFS. George and I were talking about how one might use promotion of cloned ZFS filesystems as a part of a Q/A and patching process, especially for zones sitting in a ZFS filesystem. I am not yet sure of exactly how all of this might work, but I think it has promise.

George also talked about using ZFS for the root filesystem and booting from a ZFS filesystem. Also very cool. Seems to me like this has a lot of benefits. You never will have to go through the pain of backing up and restoring a root drive to resize /var or /opt! Plus, you get the added safety and security of ZFS. Old-timers who want to see a root disk that looks like a simple disk may have to rethink things a little, but I think the added benefits will outweigh the effort of change.

After George's talk, I took the stage and talked about integrating Zones and ZFS. I'm pretty excited about this. On the one hand, being able to use ZFS to provide application data space to a zone allows the zone administrator to take on the responsibility of managing their own filesystems to fit their needs, without bothering the global platform administrator. On the other hand, using ZFS for the zoneroot, I can easily and quickly create new zones, cloning them from a master, using ZFS properties to keep them from stomping on one another. All very cool. I have to congratulate the whole ZFS team (and the Zones team).
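The zone-cloning part boils down to just a few commands. A sketch, with made-up pool, dataset, and zone names, assuming the master zoneroot lives in its own dataset:

```
bash-3.00# zfs snapshot zones/master@gold        # freeze the configured master zoneroot
bash-3.00# zfs clone zones/master@gold zones/z2  # instant, copy-on-write root for a new zone
bash-3.00# zfs set quota=4g zones/z2             # keep the new zone from hogging the pool
```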

I am looking forward to our next meeting - July 11 - when we will hear from Alok Aggarwal on NFSv4. We got a good list of suggested topics that should keep us going through the fall.

Friday Jun 09, 2006

Atlanta OpenSolaris User Group - June 13

The Atlanta OpenSolaris User Group is having its next meeting on Tuesday, June 13, at 7:00 PM in the Alpharetta, GA Sun Office. Details and directions can be found on the ATLOSUG web site.

Our speakers for this meeting will be George Wilson from Sun's engineering group talking about ZFS as a Root Filesystem, and Scott Dickson talking about integrating ZFS and Zones.

Come out and help us celebrate the 1st birthday of OpenSolaris!

Monday May 15, 2006

It's neat, but is it useful?

Sometimes weird ideas occur to me while I'm on airplanes. The other day, while flying to a customer engagement, I was thinking about the fact that customers often ask about how to manage usernames and passwords between the global zone and non-global zones in Solaris 10. Certainly, you can use a centrally managed solution such as LDAP or NIS, but many of these customers don't have anything like that. Moreover, they only have a few users on any particular system and want all of the users in the global zone to be known in the non-global zones as well.

So, this got me to thinking. What if we use loopback mounts for things like /etc/passwd and /etc/shadow? Hey, yeah! That's the ticket! That might work! If I make a read-only mount of these files, I bet I can access them in the non-global zone. And since they are read-only, they end up being managed from the global zone and are less likely to be a security problem.

And what about /etc/hosts? Well, probably there's DNS, but not necessarily. I have customers who have 50,000+ line host files. They would love to share these, too. So, why not mount /etc/inet while we're at it?

Here's what I did. I have a zone called z4 whose zoneroot is located at /zones/z4. I had already created this zone previously, so I will just use zonecfg to make some modifications to the existing zone:

global# mv /zones/z4/root/etc/passwd /zones/z4/root/etc/
global# mv /zones/z4/root/etc/shadow /zones/z4/root/etc/
global# zonecfg -z z4
zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/etc/passwd
zonecfg:z4:fs> set special=/etc/passwd
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> add options [ro,nodevices]
zonecfg:z4:fs> end
zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/etc/shadow
zonecfg:z4:fs> set special=/etc/shadow
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> add options [ro,nodevices]
zonecfg:z4:fs> end
zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/etc/inet
zonecfg:z4:fs> set special=/etc/inet
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> add options [ro,nodevices]
zonecfg:z4:fs> end
zonecfg:z4> verify
zonecfg:z4> commit
zonecfg:z4> ^D

When I boot up the zone and take a look at what's mounted, I now see this:

# uname -a
SunOS z4 5.10 Generic_Patch i86pc i386 i86pc
# zonename
z4
# df -h
Filesystem             size   used  avail capacity  Mounted on
/                      5.9G   3.5G   2.3G    61%    /
/dev                   5.9G   3.5G   2.3G    61%    /dev
/etc/inet              5.9G   3.5G   2.3G    61%    /etc/inet
/etc/passwd            5.9G   3.5G   2.3G    61%    /etc/passwd
/etc/shadow            5.9G   3.5G   2.3G    61%    /etc/shadow
/lib                   5.9G   3.5G   2.3G    61%    /lib
/opt                   3.9G   1.6G   2.3G    42%    /opt
/platform              5.9G   3.5G   2.3G    61%    /platform
/sbin                  5.9G   3.5G   2.3G    61%    /sbin
/usr                   5.9G   3.5G   2.3G    61%    /usr
proc                     0K     0K     0K     0%    /proc
ctfs                     0K     0K     0K     0%    /system/contract
swap                   1.5G   240K   1.5G     1%    /etc/svc/volatile
mnttab                   0K     0K     0K     0%    /etc/mnttab
/usr/lib/libc/libc_hwcap1.so.1
                       5.9G   3.5G   2.3G    61%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   1.5G     0K   1.5G     0%    /tmp
swap                   1.5G    16K   1.5G     1%    /var/run

Now, I can log directly into the zone using the same username and password as in the global zone. /etc/passwd, /etc/shadow, and /etc/inet are all mount points from the global zone. This seems like it could be pretty cool, but I am not sure that it's really useful. What does anyone else think? Is this a technique that should be strongly discouraged? Or something that we need to document and encourage?
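For anyone trying this at home, a quick sanity check is to confirm from inside the zone that the mounts really are read-only. This is a hypothetical transcript, not output I captured for this post; the exact error text may vary by shell and release:

z4# echo "test:x:100:100::/home/test:/bin/sh" >> /etc/passwd
/etc/passwd: cannot create

Attempts to edit the file with passwd(1) or vi from inside the zone should fail the same way, which is exactly the point: the files stay managed from the global zone.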

One thing that this makes me think of is a potential RFE for zonecfg. It would be nice to have some kind of include operator, so that you can pull common segments into each zone's configuration. But maybe the right way is to just do this in a script.
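As a sketch of the script approach (the file name and zone names here are made up for illustration): zonecfg can read its subcommands from a file with the -f option, so the common fs segments could live in one file and be applied to each zone in a loop. Something like:

global# cat common-mounts.cfg
add fs
set dir=/etc/passwd
set special=/etc/passwd
set type=lofs
add options [ro,nodevices]
end
add fs
set dir=/etc/shadow
set special=/etc/shadow
set type=lofs
add options [ro,nodevices]
end
commit
global# for z in z1 z2 z4 ; do zonecfg -z $z -f common-mounts.cfg ; done

A similar block would cover /etc/inet. Not as clean as a real include operator, but it keeps the common configuration in one place.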

Thoughts? Comments?

Saturday May 06, 2006

Atlanta OpenSolaris User Group - May 9

I can't believe that I've let things get so far away from me that my last post was in November. Here it is May already! Lots of news from ATLOSUG (Atlanta OpenSolaris User Group). Since November, we have had meetings in January and March, moving around trying to find a better venue.

The last meeting was a great overview of ZFS, given by George Wilson, one of the engineers involved in the port of ZFS from Nevada back to Solaris 10. This was the first meeting held at the Sun office in Alpharetta, Georgia. Had a great turnout. Lots of discussion and questions. We could have gone on for another hour or more. Expect to hear more from George on ZFS in the coming months.

The next meeting of ATLOSUG will be May 9 at 7PM at the Sun office. Check the ATLOSUG site for directions and details. Matrix Resources is sponsoring this meeting for us, and we thank them for their support (and for the refreshments!). Our topic for this meeting is BrandZ - Running Linux applications in a Solaris zone. Expect to see some slides, and then a bunch of demos: building and installing branded zones, running applications in them, and some cool interactions of zones with ZFS and with DTrace.

Another piece of news regarding ATLOSUG: starting with the May meeting, we are altering the schedule to meet monthly rather than bimonthly. Seems like there is enough going on and enough people interested to keep us going at that pace. So, the next meeting will be June 12, at the Sun office.

Regarding the location, admittedly, in the Atlanta area, meeting locations are a challenge. As a stand-alone user group, we need a meeting location that doesn't cost very much, is accessible in the evening, and is as convenient to some large portion of the city as possible. Meeting downtown or midtown is inconvenient to many folks on the north side. Meeting on the north side (at the Sun office, for example) makes attendance near impossible for ITP folks. Clearly, around the Perimeter is the best bet, but everything we have found so far is expensive. So, if you have an idea for a location on the top end of the perimeter that's cheap, accessible, and available, please let me know.

And to anyone in Atlanta, we look forward to seeing you on May 9!

Wednesday Nov 09, 2005

All I can say is Wow!

The Atlanta OpenSolaris User Group kicked off last night with just over 50 attendees! There were about 30 who had signed up beforehand, and I would have been happy with 20 for this first meeting. I was floored that we had SRO. All the food was gone; all the soda was gone; all the shirts were gone!

Scott & George demo OpenSolaris

The crowd braved the fierce Atlanta traffic to convene at the Crowne Plaza Ravinia hotel. Our future meetings will be held on campus at Georgia Tech, where we hope that students will get involved with OpenSolaris. As it turns out, the Atlanta Telecom Professionals were having their annual awards Gala at the same hotel, so it really was a case of braving the crowds and traffic.

But, just over 50 people turned out from all over town. Customers, iForce partners, recruiters, integrators, universities, commercial, Sun engineers - all sorts of folks.

As this was an organizational event, we talked about meeting mechanics, frequency, etc. As I said, our future meetings, held the 2nd Tuesday of odd-numbered months, will be in the Georgia Tech Student Center in mid-town Atlanta at 7:00PM, with networking and refreshments starting around 6:30. We're taking a lesson from the venerable Atlanta Unix Users Group and not trying to get complicated or fancy in our structure. Each meeting will include time for discussion, Q&A, and a presentation. We invite partners to sponsor meetings and help defray the cost of the refreshments, etc.

Our kickoff presentation was an overview of OpenSolaris. Much thanks to Jerry Jelinek, whose slides provided a lot of background. You can find a recap of the meeting with photos and the slides here.

I think we're off to a great start! We have sponsors fighting over who gets to sponsor upcoming meetings, and we have speakers volunteering for most of the next year already! We may have to meet more frequently to get the speakers in.

Thanks so much to the folks who have been a great help, and will continue to be a great help - Crystal Nichols from Intelligent Technology Systems for covering logistics, and to George Wilson and Don Deal from Sun's Sustaining Engineering group for technical backup.

We'll see everyone at the next meeting on January 10!

Tuesday Oct 18, 2005

Atlanta OpenSolaris User Group Launches

We are kicking off an OpenSolaris User Group in the Atlanta area. Several customers have asked me about whether such a thing existed. Since it didn't, we're starting one! The first meeting will be on Tuesday, November 8, 7:00-9:00 PM, at the Crowne Plaza Ravinia hotel in Atlanta.

Subsequent meetings will be on the campus of Georgia Tech, in the Student Center (the room was already booked for the first meeting). We will meet on the second Tuesday of January, March, May, July, September, and November (every other month).

Both the web site and email discussion list are live already.

For the kickoff meeting, I will talk about OpenSolaris, what it is, and how to get involved. But we hope to have community members present at many of our subsequent meetings. Ryan Matteson from Earthlink has already volunteered to speak about an article he wrote on the DTrace Toolkit.

We hope that anyone in the Atlanta area interested in OpenSolaris, as well as the commercially distributed Solaris, will come by and help us get off to a great start!


Interesting bits about Solaris, Virtualization, and Ops Center
